Merge pull request #226 from AutoForgeAI/feat/batch-size-limits-and-testing-batch-setting

feat: increase batch size limits to 15 and add testing_batch_size setting
version patch
2026-03-21 21:03:08 +00:00 · 2026-03-20 13:41:54 +02:00 · 2026-03-20 13:39:56 +02:00 · 2026-03-20 13:39:23 +02:00 · 2026-03-20 13:39:19 +02:00 · 2026-02-26 14:10:02 +02:00
76 changed files with 4136 additions and 671 deletions
--- a/.claude/commands/review-pr.md
+++ b/.claude/commands/review-pr.md
@@ -12,7 +12,13 @@ Pull request(s): $ARGUMENTS
 1. **Retrieve PR Details**
   - Use the GH CLI tool to retrieve the details (descriptions, diffs, comments, feedback, reviews, etc)
-2. **Assess PR Complexity**
+2. **Check for Merge Conflicts**
   - After retrieving PR details, check whether the PR has merge conflicts against the target branch
   - Use `gh pr view <number> --json mergeable,mergeStateStatus` or attempt a local merge check with `git merge-tree`
   - If conflicts exist, note the conflicting files — these must be resolved on the PR branch before merging
   - Surface conflicts early so they inform the rest of the review (don't discover them as a surprise at merge time)
 3. **Assess PR Complexity**
   After retrieving PR details, assess complexity based on:
   - Number of files changed
@@ -34,13 +40,13 @@ Pull request(s): $ARGUMENTS
   - >15 files, OR >500 lines, OR >2 contributors, OR touches core architecture
   - Spawn up to 3 agents to analyze different aspects (e.g., security, performance, architecture)
-3. **Analyze Codebase Impact**
+4. **Analyze Codebase Impact**
   - Based on the complexity tier determined above, spawn the appropriate number of deep dive subagents
   - For Simple PRs: analyze directly without spawning agents
   - For Medium PRs: spawn 1-2 agents focusing on the most impacted areas
   - For Complex PRs: spawn up to 3 agents to cover security, performance, and architectural concerns
-4. **PR Scope & Title Alignment Check**
+5. **PR Scope & Title Alignment Check**
   - Compare the PR title and description against the actual diff content
   - Check whether the PR is focused on a single coherent change or contains multiple unrelated changes
   - If the title/description describe one thing but the PR contains significantly more (e.g., title says "fix typo in README" but the diff touches 20 files across multiple domains), flag this as a **scope mismatch**
@@ -48,28 +54,53 @@ Pull request(s): $ARGUMENTS
   - Suggest specific ways to split the PR (e.g., "separate the refactor from the feature addition")
   - Reviewing large, unfocused PRs is impractical and error-prone; the review cannot provide adequate assurance for such changes
-5. **Vision Alignment Check**
+6. **Vision Alignment Check**
-   - Read the project's README.md and CLAUDE.md to understand the application's core purpose
+   - **VISION.md protection**: First, check whether the PR diff modifies `VISION.md` in any way (edits, deletions, renames). If it does, **stop the review immediately** — verdict is **DON'T MERGE**. VISION.md is immutable and no PR is permitted to alter it. Explain this to the user and skip all remaining steps.
-   - Assess whether this PR aligns with the application's intended functionality
+   - Read the project's `VISION.md`, `README.md`, and `CLAUDE.md` to understand the application's core purpose and mandatory architectural constraints
-   - If the changes deviate significantly from the core vision or add functionality that doesn't serve the application's purpose, note this in the review
+   - Assess whether this PR aligns with the vision defined in `VISION.md`
-   - This is not a blocker, but should be flagged for the reviewer's consideration
+   - **Vision deviation is a merge blocker.** If the PR introduces functionality, integrations, or architectural changes that conflict with `VISION.md`, the verdict must be **DON'T MERGE**. This is not negotiable — the vision document takes precedence over any PR rationale.
-6. **Safety Assessment**
+7. **Safety Assessment**
   - Provide a review on whether the PR is safe to merge as-is
   - Provide any feedback in terms of risk level
-7. **Improvements**
+8. **Improvements**
   - Propose any improvements in terms of importance and complexity
-8. **Merge Recommendation**
+9. **Merge Recommendation**
-   - Based on all findings, provide a clear merge/don't-merge recommendation
+   - Based on all findings (including merge conflict status from step 2), provide a clear recommendation
-   - If all concerns are minor (cosmetic issues, naming suggestions, small style nits, missing comments, etc.), recommend **merging the PR** and note that the reviewer can address these minor concerns themselves with a quick follow-up commit pushed directly to master
+   - **If no concerns and no conflicts**: recommend merging as-is
-   - If there are significant concerns (bugs, security issues, architectural problems, scope mismatch), recommend **not merging** and explain what needs to be resolved first
+   - **If concerns are minor/fixable and/or merge conflicts exist**: recommend fixing on the PR branch first, then merging. Never merge a PR with known issues to main — always fix on the PR branch first
   - **If there are significant concerns** (bugs, security issues, architectural problems, scope mismatch) that require author input or are too risky to fix: recommend **not merging** and explain what needs to be resolved
-9. **TLDR**
+10. **TLDR**
-   - End the review with a `## TLDR` section
+    - End the review with a `## TLDR` section
-   - In 3-5 bullet points maximum, summarize:
+    - In 3-5 bullet points maximum, summarize:
-     - What this PR is actually about (one sentence)
+      - What this PR is actually about (one sentence)
-     - The key concerns, if any (or "no significant concerns")
+      - Merge conflict status (clean or conflicting files)
-     - **Verdict: MERGE** / **MERGE (with minor follow-up)** / **DON'T MERGE** with a one-line reason
+      - The key concerns, if any (or "no significant concerns")
-   - This section should be scannable in under 10 seconds
+      - **Verdict: MERGE** / **MERGE (after fixes)** / **DON'T MERGE** with a one-line reason
    - This section should be scannable in under 10 seconds
    Verdict definitions:
    - **MERGE** — no issues, clean to merge as-is
    - **MERGE (after fixes)** — minor issues and/or conflicts exist, but can be resolved on the PR branch first, then merged
    - **DON'T MERGE** — needs author attention, too complex or risky to fix without their input
 11. **Post-Review Action**
    - Immediately after the TLDR, provide a `## Recommended Action` section
    - Based on the verdict, recommend one of the following actions:
    **If verdict is MERGE (no concerns):**
    - Merge as-is. No further action needed.
    **If verdict is MERGE (after fixes):**
    - List the specific changes that need to be made (fixes, conflict resolutions, etc.)
    - Offer to: check out the PR branch, resolve any merge conflicts, apply the minor fixes identified during review, push the updated branch, then merge the now-clean PR
    - Ask the user: *"Should I check out the PR branch, apply these fixes, and then merge?"*
    - **Never merge first and fix on main later** — always fix on the PR branch before merging
    **If verdict is DON'T MERGE:**
    - If the issues are contained and you are confident you can fix them: offer the same workflow as "MERGE (after fixes)" — check out the PR branch, apply fixes, push, then merge
    - If the issues are too complex, risky, or require author input (e.g., design decisions, major refactors, unclear intent): recommend sending the PR back to the author with specific feedback on what needs to change
    - Be honest about your confidence level — if you're unsure whether you can address the concerns correctly, say so and defer to the author
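The conflict check in step 2 and the verdict rules in steps 9-11 can be sketched as two small shell functions. This is only an illustrative sketch, not part of the command file: the function names `conflict_status` and `review_verdict`, and the `none`/`minor`/`major` concern levels, are hypothetical. The `MERGEABLE` and `CONFLICTING` values are what the `mergeable` field of `gh pr view --json` reports.

```bash
# Map the `mergeable` field from `gh pr view <number> --json mergeable`
# to the wording used in the TLDR. The function name is hypothetical.
conflict_status() {
  case "$1" in
    MERGEABLE)   echo "clean" ;;
    CONFLICTING) echo "conflicting" ;;
    *)           echo "unknown" ;;
  esac
}

# Pick a verdict from three inputs (all hypothetical encodings):
#   $1  conflict status: "clean", or anything else (assumed to need work
#       on the PR branch before merging)
#   $2  concern level: none | minor | major
#   $3  "yes" if the diff modifies VISION.md or conflicts with it
review_verdict() {
  if [ "$3" = "yes" ] || [ "$2" = "major" ]; then
    echo "DON'T MERGE"
  elif [ "$1" = "clean" ] && [ "$2" = "none" ]; then
    echo "MERGE"
  else
    echo "MERGE (after fixes)"
  fi
}
```

One possible invocation, assuming `gh` is installed and authenticated: `review_verdict "$(conflict_status "$(gh pr view 226 --json mergeable --jq .mergeable)")" minor no`. Checking the vision and major-concern inputs first mirrors the rule that vision deviation blocks a merge regardless of anything else.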
--- a/.claude/launch.json
+++ b/.claude/launch.json
@@ -0,0 +1,18 @@
 {
  "version": "0.0.1",
  "configurations": [
    {
      "name": "backend",
      "runtimeExecutable": "python",
      "runtimeArgs": ["-m", "uvicorn", "server.main:app", "--host", "127.0.0.1", "--port", "8888", "--reload"],
      "port": 8888
    },
    {
      "name": "frontend",
      "runtimeExecutable": "cmd",
      "runtimeArgs": ["/c", "cd ui && npx vite"],
      "port": 5173
    }
  ],
  "autoVerify": true
 }
--- a/.claude/skills/playwright-cli/SKILL.md
+++ b/.claude/skills/playwright-cli/SKILL.md
@@ -0,0 +1,259 @@
 ---
 name: playwright-cli
 description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
 allowed-tools: Bash(playwright-cli:*)
 ---
 # Browser Automation with playwright-cli
 ## Quick start
 ```bash
 # open new browser
 playwright-cli open
 # navigate to a page
 playwright-cli goto https://playwright.dev
 # interact with the page using refs from the snapshot
 playwright-cli click e15
 playwright-cli type "page.click"
 playwright-cli press Enter
 # take a screenshot
 playwright-cli screenshot
 # close the browser
 playwright-cli close
 ```
 ## Commands
 ### Core
 ```bash
 playwright-cli open
 # open and navigate right away
 playwright-cli open https://example.com/
 playwright-cli goto https://playwright.dev
 playwright-cli type "search query"
 playwright-cli click e3
 playwright-cli dblclick e7
 playwright-cli fill e5 "user@example.com"
 playwright-cli drag e2 e8
 playwright-cli hover e4
 playwright-cli select e9 "option-value"
 playwright-cli upload ./document.pdf
 playwright-cli check e12
 playwright-cli uncheck e12
 playwright-cli snapshot
 playwright-cli snapshot --filename=after-click.yaml
 playwright-cli eval "document.title"
 playwright-cli eval "el => el.textContent" e5
 playwright-cli dialog-accept
 playwright-cli dialog-accept "confirmation text"
 playwright-cli dialog-dismiss
 playwright-cli resize 1920 1080
 playwright-cli close
 ```
 ### Navigation
 ```bash
 playwright-cli go-back
 playwright-cli go-forward
 playwright-cli reload
 ```
 ### Keyboard
 ```bash
 playwright-cli press Enter
 playwright-cli press ArrowDown
 playwright-cli keydown Shift
 playwright-cli keyup Shift
 ```
 ### Mouse
 ```bash
 playwright-cli mousemove 150 300
 playwright-cli mousedown
 playwright-cli mousedown right
 playwright-cli mouseup
 playwright-cli mouseup right
 playwright-cli mousewheel 0 100
 ```
 ### Save as
 ```bash
 playwright-cli screenshot
 playwright-cli screenshot e5
 playwright-cli screenshot --filename=page.png
 playwright-cli pdf --filename=page.pdf
 ```
 ### Tabs
 ```bash
 playwright-cli tab-list
 playwright-cli tab-new
 playwright-cli tab-new https://example.com/page
 playwright-cli tab-close
 playwright-cli tab-close 2
 playwright-cli tab-select 0
 ```
 ### Storage
 ```bash
 playwright-cli state-save
 playwright-cli state-save auth.json
 playwright-cli state-load auth.json
 # Cookies
 playwright-cli cookie-list
 playwright-cli cookie-list --domain=example.com
 playwright-cli cookie-get session_id
 playwright-cli cookie-set session_id abc123
 playwright-cli cookie-set session_id abc123 --domain=example.com --httpOnly --secure
 playwright-cli cookie-delete session_id
 playwright-cli cookie-clear
 # LocalStorage
 playwright-cli localstorage-list
 playwright-cli localstorage-get theme
 playwright-cli localstorage-set theme dark
 playwright-cli localstorage-delete theme
 playwright-cli localstorage-clear
 # SessionStorage
 playwright-cli sessionstorage-list
 playwright-cli sessionstorage-get step
 playwright-cli sessionstorage-set step 3
 playwright-cli sessionstorage-delete step
 playwright-cli sessionstorage-clear
 ```
 ### Network
 ```bash
 playwright-cli route "**/*.jpg" --status=404
 playwright-cli route "https://api.example.com/**" --body='{"mock": true}'
 playwright-cli route-list
 playwright-cli unroute "**/*.jpg"
 playwright-cli unroute
 ```
 ### DevTools
 ```bash
 playwright-cli console
 playwright-cli console warning
 playwright-cli network
 playwright-cli run-code "async page => await page.context().grantPermissions(['geolocation'])"
 playwright-cli tracing-start
 playwright-cli tracing-stop
 playwright-cli video-start
 playwright-cli video-stop video.webm
 ```
 ### Install
 ```bash
 playwright-cli install --skills
 playwright-cli install-browser
 ```
 ### Configuration
 ```bash
 # Use specific browser when creating session
 playwright-cli open --browser=chrome
 playwright-cli open --browser=firefox
 playwright-cli open --browser=webkit
 playwright-cli open --browser=msedge
 # Connect to browser via extension
 playwright-cli open --extension
 # Use persistent profile (by default profile is in-memory)
 playwright-cli open --persistent
 # Use persistent profile with custom directory
 playwright-cli open --profile=/path/to/profile
 # Start with config file
 playwright-cli open --config=my-config.json
 # Close the browser
 playwright-cli close
 # Delete user data for the default session
 playwright-cli delete-data
 ```
 ### Browser Sessions
 ```bash
 # create new browser session named "mysession" with persistent profile
 playwright-cli -s=mysession open example.com --persistent
 # same with manually specified profile directory (use when requested explicitly)
 playwright-cli -s=mysession open example.com --profile=/path/to/profile
 playwright-cli -s=mysession click e6
 playwright-cli -s=mysession close  # stop a named browser
 playwright-cli -s=mysession delete-data  # delete user data for persistent session
 playwright-cli list
 # Close all browsers
 playwright-cli close-all
 # Forcefully kill all browser processes
 playwright-cli kill-all
 ```
 ## Example: Form submission
 ```bash
 playwright-cli open https://example.com/form
 playwright-cli snapshot
 playwright-cli fill e1 "user@example.com"
 playwright-cli fill e2 "password123"
 playwright-cli click e3
 playwright-cli snapshot
 playwright-cli close
 ```
 ## Example: Multi-tab workflow
 ```bash
 playwright-cli open https://example.com
 playwright-cli tab-new https://example.com/other
 playwright-cli tab-list
 playwright-cli tab-select 0
 playwright-cli snapshot
 playwright-cli close
 ```
 ## Example: Debugging with DevTools
 ```bash
 playwright-cli open https://example.com
 playwright-cli click e4
 playwright-cli fill e7 "test"
 playwright-cli console
 playwright-cli network
 playwright-cli close
 ```
 ```bash
 playwright-cli open https://example.com
 playwright-cli tracing-start
 playwright-cli click e4
 playwright-cli fill e7 "test"
 playwright-cli tracing-stop
 playwright-cli close
 ```
 ## Specific tasks
 * **Request mocking** [references/request-mocking.md](references/request-mocking.md)
 * **Running Playwright code** [references/running-code.md](references/running-code.md)
 * **Browser session management** [references/session-management.md](references/session-management.md)
 * **Storage state (cookies, localStorage)** [references/storage-state.md](references/storage-state.md)
 * **Test generation** [references/test-generation.md](references/test-generation.md)
 * **Tracing** [references/tracing.md](references/tracing.md)
 * **Video recording** [references/video-recording.md](references/video-recording.md)
--- a/.claude/skills/playwright-cli/references/request-mocking.md
+++ b/.claude/skills/playwright-cli/references/request-mocking.md
@@ -0,0 +1,87 @@
 # Request Mocking
 Intercept, mock, modify, and block network requests.
 ## CLI Route Commands
 ```bash
 # Mock with custom status
 playwright-cli route "**/*.jpg" --status=404
 # Mock with JSON body
 playwright-cli route "**/api/users" --body='[{"id":1,"name":"Alice"}]' --content-type=application/json
 # Mock with custom headers
 playwright-cli route "**/api/data" --body='{"ok":true}' --header="X-Custom: value"
 # Remove headers from requests
 playwright-cli route "**/*" --remove-header=cookie,authorization
 # List active routes
 playwright-cli route-list
 # Remove a route or all routes
 playwright-cli unroute "**/*.jpg"
 playwright-cli unroute
 ```
 ## URL Patterns
 ```
 **/api/users           - Exact path match
 **/api/*/details       - Wildcard in path
 **/*.{png,jpg,jpeg}    - Match file extensions
 **/search?q=*          - Match query parameters
 ```
 ## Advanced Mocking with run-code
 For conditional responses, request body inspection, response modification, or delays:
 ### Conditional Response Based on Request
 ```bash
 playwright-cli run-code "async page => {
  await page.route('**/api/login', route => {
    const body = route.request().postDataJSON();
    if (body.username === 'admin') {
      route.fulfill({ body: JSON.stringify({ token: 'mock-token' }) });
    } else {
      route.fulfill({ status: 401, body: JSON.stringify({ error: 'Invalid' }) });
    }
  });
 }"
 ```
 ### Modify Real Response
 ```bash
 playwright-cli run-code "async page => {
  await page.route('**/api/user', async route => {
    const response = await route.fetch();
    const json = await response.json();
    json.isPremium = true;
    await route.fulfill({ response, json });
  });
 }"
 ```
 ### Simulate Network Failures
 ```bash
 playwright-cli run-code "async page => {
  await page.route('**/api/offline', route => route.abort('internetdisconnected'));
 }"
 # Options: connectionrefused, timedout, connectionreset, internetdisconnected
 ```
 ### Delayed Response
 ```bash
 playwright-cli run-code "async page => {
  await page.route('**/api/slow', async route => {
    await new Promise(r => setTimeout(r, 3000));
    route.fulfill({ body: JSON.stringify({ data: 'loaded' }) });
  });
 }"
 ```
--- a/.claude/skills/playwright-cli/references/running-code.md
+++ b/.claude/skills/playwright-cli/references/running-code.md
@@ -0,0 +1,232 @@
 # Running Custom Playwright Code
 Use `run-code` to execute arbitrary Playwright code for advanced scenarios not covered by CLI commands.
 ## Syntax
 ```bash
 playwright-cli run-code "async page => {
  // Your Playwright code here
  // Access page.context() for browser context operations
 }"
 ```
 ## Geolocation
 ```bash
 # Grant geolocation permission and set location
 playwright-cli run-code "async page => {
  await page.context().grantPermissions(['geolocation']);
  await page.context().setGeolocation({ latitude: 37.7749, longitude: -122.4194 });
 }"
 # Set location to London
 playwright-cli run-code "async page => {
  await page.context().grantPermissions(['geolocation']);
  await page.context().setGeolocation({ latitude: 51.5074, longitude: -0.1278 });
 }"
 # Clear granted permissions, including geolocation
 playwright-cli run-code "async page => {
  await page.context().clearPermissions();
 }"
 ```
 ## Permissions
 ```bash
 # Grant multiple permissions
 playwright-cli run-code "async page => {
  await page.context().grantPermissions([
    'geolocation',
    'notifications',
    'camera',
    'microphone'
  ]);
 }"
 # Grant permissions for specific origin
 playwright-cli run-code "async page => {
  await page.context().grantPermissions(['clipboard-read'], {
    origin: 'https://example.com'
  });
 }"
 ```
 ## Media Emulation
 ```bash
 # Emulate dark color scheme
 playwright-cli run-code "async page => {
  await page.emulateMedia({ colorScheme: 'dark' });
 }"
 # Emulate light color scheme
 playwright-cli run-code "async page => {
  await page.emulateMedia({ colorScheme: 'light' });
 }"
 # Emulate reduced motion
 playwright-cli run-code "async page => {
  await page.emulateMedia({ reducedMotion: 'reduce' });
 }"
 # Emulate print media
 playwright-cli run-code "async page => {
  await page.emulateMedia({ media: 'print' });
 }"
 ```
 ## Wait Strategies
 ```bash
 # Wait for network idle
 playwright-cli run-code "async page => {
  await page.waitForLoadState('networkidle');
 }"
 # Wait for a specific element to become hidden
 playwright-cli run-code "async page => {
  await page.waitForSelector('.loading', { state: 'hidden' });
 }"
 # Wait for function to return true
 playwright-cli run-code "async page => {
  await page.waitForFunction(() => window.appReady === true);
 }"
 # Wait with timeout
 playwright-cli run-code "async page => {
  await page.waitForSelector('.result', { timeout: 10000 });
 }"
 ```
 ## Frames and Iframes
 ```bash
 # Work with iframe
 playwright-cli run-code "async page => {
  const frame = page.locator('iframe#my-iframe').contentFrame();
  await frame.locator('button').click();
 }"
 # Get all frames
 playwright-cli run-code "async page => {
  const frames = page.frames();
  return frames.map(f => f.url());
 }"
 ```
 ## File Downloads
 ```bash
 # Handle file download
 playwright-cli run-code "async page => {
  const [download] = await Promise.all([
    page.waitForEvent('download'),
    page.click('a.download-link')
  ]);
  await download.saveAs('./downloaded-file.pdf');
  return download.suggestedFilename();
 }"
 ```
 ## Clipboard
 ```bash
 # Read clipboard (requires permission)
 playwright-cli run-code "async page => {
  await page.context().grantPermissions(['clipboard-read']);
  return await page.evaluate(() => navigator.clipboard.readText());
 }"
 # Write to clipboard
 playwright-cli run-code "async page => {
  await page.evaluate(text => navigator.clipboard.writeText(text), 'Hello clipboard!');
 }"
 ```
 ## Page Information
 ```bash
 # Get page title
 playwright-cli run-code "async page => {
  return await page.title();
 }"
 # Get current URL
 playwright-cli run-code "async page => {
  return page.url();
 }"
 # Get page content
 playwright-cli run-code "async page => {
  return await page.content();
 }"
 # Get viewport size
 playwright-cli run-code "async page => {
  return page.viewportSize();
 }"
 ```
 ## JavaScript Execution
 ```bash
 # Execute JavaScript and return result
 playwright-cli run-code "async page => {
  return await page.evaluate(() => {
    return {
      userAgent: navigator.userAgent,
      language: navigator.language,
      cookiesEnabled: navigator.cookieEnabled
    };
  });
 }"
 # Pass arguments to evaluate
 playwright-cli run-code "async page => {
  const multiplier = 5;
  return await page.evaluate(m => document.querySelectorAll('li').length * m, multiplier);
 }"
 ```
 ## Error Handling
 ```bash
 # Try-catch in run-code
 playwright-cli run-code "async page => {
  try {
    await page.click('.maybe-missing', { timeout: 1000 });
    return 'clicked';
  } catch (e) {
    return 'element not found';
  }
 }"
 ```
 ## Complex Workflows
 ```bash
 # Login and save state
 playwright-cli run-code "async page => {
  await page.goto('https://example.com/login');
  await page.fill('input[name=email]', 'user@example.com');
  await page.fill('input[name=password]', 'secret');
  await page.click('button[type=submit]');
  await page.waitForURL('**/dashboard');
  await page.context().storageState({ path: 'auth.json' });
  return 'Login successful';
 }"
 # Scrape data from multiple pages
 playwright-cli run-code "async page => {
  const results = [];
  for (let i = 1; i <= 3; i++) {
    await page.goto(\`https://example.com/page/\${i}\`);
    const items = await page.locator('.item').allTextContents();
    results.push(...items);
  }
  return results;
 }"
 ```
--- a/.claude/skills/playwright-cli/references/session-management.md
+++ b/.claude/skills/playwright-cli/references/session-management.md
@@ -0,0 +1,169 @@
 # Browser Session Management
 Run multiple isolated browser sessions concurrently with state persistence.
 ## Named Browser Sessions
 Use the `-s` flag to isolate browser sessions:
 ```bash
 # Browser 1: Authentication flow
 playwright-cli -s=auth open https://app.example.com/login
 # Browser 2: Public browsing (separate cookies, storage)
 playwright-cli -s=public open https://example.com
 # Commands are isolated by browser session
 playwright-cli -s=auth fill e1 "user@example.com"
 playwright-cli -s=public snapshot
 ```
 ## Browser Session Isolation Properties
 Each browser session has independent:
 - Cookies
 - LocalStorage / SessionStorage
 - IndexedDB
 - Cache
 - Browsing history
 - Open tabs
 ## Browser Session Commands
 ```bash
 # List all browser sessions
 playwright-cli list
 # Stop a browser session (close the browser)
 playwright-cli close                # stop the default browser
 playwright-cli -s=mysession close   # stop a named browser
 # Stop all browser sessions
 playwright-cli close-all
 # Forcefully kill all daemon processes (for stale/zombie processes)
 playwright-cli kill-all
 # Delete browser session user data (profile directory)
 playwright-cli delete-data                # delete default browser data
 playwright-cli -s=mysession delete-data   # delete named browser data
 ```
 ## Environment Variable
 Set a default browser session name via environment variable:
 ```bash
 export PLAYWRIGHT_CLI_SESSION="mysession"
 playwright-cli open example.com  # Uses "mysession" automatically
 ```
 ## Common Patterns
 ### Concurrent Scraping
 ```bash
 #!/bin/bash
 # Scrape multiple sites concurrently
 # Start all browsers
 playwright-cli -s=site1 open https://site1.com &
 playwright-cli -s=site2 open https://site2.com &
 playwright-cli -s=site3 open https://site3.com &
 wait
 # Take snapshots from each
 playwright-cli -s=site1 snapshot
 playwright-cli -s=site2 snapshot
 playwright-cli -s=site3 snapshot
 # Cleanup
 playwright-cli close-all
 ```
 ### A/B Testing Sessions
 ```bash
 # Test different user experiences
 playwright-cli -s=variant-a open "https://app.com?variant=a"
 playwright-cli -s=variant-b open "https://app.com?variant=b"
 # Compare
 playwright-cli -s=variant-a screenshot
 playwright-cli -s=variant-b screenshot
 ```
 ### Persistent Profile
 By default, the browser profile is kept in memory only. Use the `--persistent` flag on `open` to persist it to disk:
 ```bash
 # Use persistent profile (auto-generated location)
 playwright-cli open https://example.com --persistent
 # Use persistent profile with custom directory
 playwright-cli open https://example.com --profile=/path/to/profile
 ```
 ## Default Browser Session
 When `-s` is omitted, commands use the default browser session:
 ```bash
 # These use the same default browser session
 playwright-cli open https://example.com
 playwright-cli snapshot
 playwright-cli close  # Stops default browser
 ```
 ## Browser Session Configuration
 Configure a browser session with specific settings when opening:
 ```bash
 # Open with config file
 playwright-cli open https://example.com --config=.playwright/my-cli.json
 # Open with specific browser
 playwright-cli open https://example.com --browser=firefox
 # Open in headed mode
 playwright-cli open https://example.com --headed
 # Open with persistent profile
 playwright-cli open https://example.com --persistent
 ```
 ## Best Practices
 ### 1. Name Browser Sessions Semantically
 ```bash
 # GOOD: Clear purpose
 playwright-cli -s=github-auth open https://github.com
 playwright-cli -s=docs-scrape open https://docs.example.com
 # AVOID: Generic names
 playwright-cli -s=s1 open https://github.com
 ```
 ### 2. Always Clean Up
 ```bash
 # Stop browsers when done
 playwright-cli -s=auth close
 playwright-cli -s=scrape close
 # Or stop all at once
 playwright-cli close-all
 # If browsers become unresponsive or zombie processes remain
 playwright-cli kill-all
 ```
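One way to make the cleanup hard to forget is to wrap the work in a helper that always closes the session afterwards. The sketch below is an assumption-laden illustration: `with_session` and the `PWCLI` override variable are hypothetical names, and only the `-s=<name> close` command comes from the reference above.

```bash
# Assumed override hook: lets the sketch be dry-run with PWCLI=echo.
PWCLI="${PWCLI:-playwright-cli}"

# Hypothetical helper: run a command, then close the named session
# whether or not the command succeeded, and pass its exit status through.
with_session() {
  session="$1"
  shift
  status=0
  "$@" || status=$?
  $PWCLI -s="$session" close
  return "$status"
}
```

For example, `with_session auth playwright-cli -s=auth snapshot` would take a snapshot and then close the `auth` session even if the snapshot command fails.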
 ### 3. Delete Stale Browser Data
 ```bash
 # Remove old browser data to free disk space
 playwright-cli -s=oldsession delete-data
 ```
--- a/.claude/skills/playwright-cli/references/storage-state.md
+++ b/.claude/skills/playwright-cli/references/storage-state.md
@@ -0,0 +1,275 @@
 # Storage Management
 Manage cookies, localStorage, sessionStorage, and browser storage state.
 ## Storage State
 Save and restore complete browser state including cookies and storage.
 ### Save Storage State
 ```bash
 # Save to auto-generated filename (storage-state-{timestamp}.json)
 playwright-cli state-save
 # Save to specific filename
 playwright-cli state-save my-auth-state.json
 ```
 ### Restore Storage State
 ```bash
 # Load storage state from file
 playwright-cli state-load my-auth-state.json
 # Open the page so the restored cookies apply
 playwright-cli open https://example.com
 ```
 ### Storage State File Format
 The saved file contains:
 ```json
 {
  "cookies": [
    {
      "name": "session_id",
      "value": "abc123",
      "domain": "example.com",
      "path": "/",
      "expires": 1735689600,
      "httpOnly": true,
      "secure": true,
      "sameSite": "Lax"
    }
  ],
  "origins": [
    {
      "origin": "https://example.com",
      "localStorage": [
        { "name": "theme", "value": "dark" },
        { "name": "user_id", "value": "12345" }
      ]
    }
  ]
 }
 ```
 ## Cookies
 ### List All Cookies
 ```bash
 playwright-cli cookie-list
 ```
 ### Filter Cookies by Domain
 ```bash
 playwright-cli cookie-list --domain=example.com
 ```
 ### Filter Cookies by Path
 ```bash
 playwright-cli cookie-list --path=/api
 ```
 ### Get Specific Cookie
 ```bash
 playwright-cli cookie-get session_id
 ```
 ### Set a Cookie
 ```bash
 # Basic cookie
 playwright-cli cookie-set session abc123
 # Cookie with options
 playwright-cli cookie-set session abc123 --domain=example.com --path=/ --httpOnly --secure --sameSite=Lax
 # Cookie with expiration (Unix timestamp)
 playwright-cli cookie-set remember_me token123 --expires=1735689600
 ```
 ### Delete a Cookie
 ```bash
 playwright-cli cookie-delete session_id
 ```
 ### Clear All Cookies
 ```bash
 playwright-cli cookie-clear
 ```
 ### Advanced: Multiple Cookies or Custom Options
 For complex scenarios like adding multiple cookies at once, use `run-code`:
 ```bash
 playwright-cli run-code "async page => {
  await page.context().addCookies([
    { name: 'session_id', value: 'sess_abc123', domain: 'example.com', path: '/', httpOnly: true },
    { name: 'preferences', value: JSON.stringify({ theme: 'dark' }), domain: 'example.com', path: '/' }
  ]);
 }"
 ```
 ## Local Storage
 ### List All localStorage Items
 ```bash
 playwright-cli localstorage-list
 ```
 ### Get Single Value
 ```bash
 playwright-cli localstorage-get token
 ```
 ### Set Value
 ```bash
 playwright-cli localstorage-set theme dark
 ```
 ### Set JSON Value
 ```bash
 playwright-cli localstorage-set user_settings '{"theme":"dark","language":"en"}'
 ```
 ### Delete Single Item
 ```bash
 playwright-cli localstorage-delete token
 ```
 ### Clear All localStorage
 ```bash
 playwright-cli localstorage-clear
 ```
 ### Advanced: Multiple Operations
 For complex scenarios like setting multiple values at once, use `run-code`:
 ```bash
 playwright-cli run-code "async page => {
  await page.evaluate(() => {
    localStorage.setItem('token', 'jwt_abc123');
    localStorage.setItem('user_id', '12345');
    localStorage.setItem('expires_at', Date.now() + 3600000);
  });
 }"
 ```
 ## Session Storage
 ### List All sessionStorage Items
 ```bash
 playwright-cli sessionstorage-list
 ```
 ### Get Single Value
 ```bash
 playwright-cli sessionstorage-get form_data
 ```
 ### Set Value
 ```bash
 playwright-cli sessionstorage-set step 3
 ```
 ### Delete Single Item
 ```bash
 playwright-cli sessionstorage-delete step
 ```
 ### Clear sessionStorage
 ```bash
 playwright-cli sessionstorage-clear
 ```
 ## IndexedDB
 ### List Databases
 ```bash
 playwright-cli run-code "async page => {
  return await page.evaluate(async () => {
    const databases = await indexedDB.databases();
    return databases;
  });
 }"
 ```
 ### Delete Database
 ```bash
 playwright-cli run-code "async page => {
  await page.evaluate(() => {
    indexedDB.deleteDatabase('myDatabase');
  });
 }"
 ```
 ## Common Patterns
 ### Authentication State Reuse
 ```bash
 # Step 1: Login and save state
 playwright-cli open https://app.example.com/login
 playwright-cli snapshot
 playwright-cli fill e1 "user@example.com"
 playwright-cli fill e2 "password123"
 playwright-cli click e3
 # Save the authenticated state
 playwright-cli state-save auth.json
 # Step 2: Later, restore state and skip login
 playwright-cli state-load auth.json
 playwright-cli open https://app.example.com/dashboard
 # Already logged in!
 ```
 ### Save and Restore Roundtrip
 ```bash
 # Set up authentication state
 playwright-cli open https://example.com
 playwright-cli eval "() => { document.cookie = 'session=abc123'; localStorage.setItem('user', 'john'); }"
 # Save state to file
 playwright-cli state-save my-session.json
 # ... later, in a new session ...
 # Restore state
 playwright-cli state-load my-session.json
 playwright-cli open https://example.com
 # Cookies and localStorage are restored!
 ```
 ## Security Notes
 - Never commit storage state files containing auth tokens
 - Add a pattern that matches your state files (e.g. `*.auth-state.json`) to `.gitignore`, and name saved state files to match it
 - Delete state files after automation completes
 - Use environment variables for sensitive data
 - By default, sessions run in in-memory mode, which is safer for sensitive operations
--- a/.claude/skills/playwright-cli/references/test-generation.md
+++ b/.claude/skills/playwright-cli/references/test-generation.md
@@ -0,0 +1,88 @@
 # Test Generation
 Generate Playwright test code automatically as you interact with the browser.
 ## How It Works
 Every action you perform with `playwright-cli` generates corresponding Playwright TypeScript code.
 This code appears in the output and can be copied directly into your test files.
 ## Example Workflow
 ```bash
 # Start a session
 playwright-cli open https://example.com/login
 # Take a snapshot to see elements
 playwright-cli snapshot
 # Output shows: e1 [textbox "Email"], e2 [textbox "Password"], e3 [button "Sign In"]
 # Fill form fields - generates code automatically
 playwright-cli fill e1 "user@example.com"
 # Ran Playwright code:
 # await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');
 playwright-cli fill e2 "password123"
 # Ran Playwright code:
 # await page.getByRole('textbox', { name: 'Password' }).fill('password123');
 playwright-cli click e3
 # Ran Playwright code:
 # await page.getByRole('button', { name: 'Sign In' }).click();
 ```
 ## Building a Test File
 Collect the generated code into a Playwright test:
 ```typescript
 import { test, expect } from '@playwright/test';
 test('login flow', async ({ page }) => {
  // Generated code from playwright-cli session:
  await page.goto('https://example.com/login');
  await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');
  await page.getByRole('textbox', { name: 'Password' }).fill('password123');
  await page.getByRole('button', { name: 'Sign In' }).click();
  // Add assertions
  await expect(page).toHaveURL(/.*dashboard/);
 });
 ```
 ## Best Practices
 ### 1. Use Semantic Locators
 The generated code uses role-based locators when possible, which are more resilient:
 ```typescript
 // Generated (good - semantic)
 await page.getByRole('button', { name: 'Submit' }).click();
 // Avoid (fragile - CSS selectors)
 await page.locator('#submit-btn').click();
 ```
 ### 2. Explore Before Recording
 Take snapshots to understand the page structure before recording actions:
 ```bash
 playwright-cli open https://example.com
 playwright-cli snapshot
 # Review the element structure
 playwright-cli click e5
 ```
 ### 3. Add Assertions Manually
 Generated code captures actions but not assertions. Add expectations in your test:
 ```typescript
 // Generated action
 await page.getByRole('button', { name: 'Submit' }).click();
 // Manual assertion
 await expect(page.getByText('Success')).toBeVisible();
 ```
--- a/.claude/skills/playwright-cli/references/tracing.md
+++ b/.claude/skills/playwright-cli/references/tracing.md
@@ -0,0 +1,139 @@
 # Tracing
 Capture detailed execution traces for debugging and analysis. Traces include DOM snapshots, screenshots, network activity, and console logs.
 ## Basic Usage
 ```bash
 # Start trace recording
 playwright-cli tracing-start
 # Perform actions
 playwright-cli open https://example.com
 playwright-cli click e1
 playwright-cli fill e2 "test"
 # Stop trace recording
 playwright-cli tracing-stop
 ```
 ## Trace Output Files
 When you start tracing, Playwright creates a `traces/` directory with several files:
 ### `trace-{timestamp}.trace`
 **Action log** - The main trace file containing:
 - Every action performed (clicks, fills, navigations)
 - DOM snapshots before and after each action
 - Screenshots at each step
 - Timing information
 - Console messages
 - Source locations
 ### `trace-{timestamp}.network`
 **Network log** - Complete network activity:
 - All HTTP requests and responses
 - Request headers and bodies
 - Response headers and bodies
 - Timing (DNS, connect, TLS, TTFB, download)
 - Resource sizes
 - Failed requests and errors
 ### `resources/`
 **Resources directory** - Cached resources:
 - Images, fonts, stylesheets, scripts
 - Response bodies for replay
 - Assets needed to reconstruct page state
 ## What Traces Capture
 | Category | Details |
 |----------|---------|
 | **Actions** | Clicks, fills, hovers, keyboard input, navigations |
 | **DOM** | Full DOM snapshot before/after each action |
 | **Screenshots** | Visual state at each step |
 | **Network** | All requests, responses, headers, bodies, timing |
 | **Console** | All console.log, warn, error messages |
 | **Timing** | Precise timing for each operation |
 ## Use Cases
 ### Debugging Failed Actions
 ```bash
 playwright-cli tracing-start
 playwright-cli open https://app.example.com
 # This click fails - why?
 playwright-cli click e5
 playwright-cli tracing-stop
 # Open trace to see DOM state when click was attempted
 ```
 ### Analyzing Performance
 ```bash
 playwright-cli tracing-start
 playwright-cli open https://slow-site.com
 playwright-cli tracing-stop
 # View network waterfall to identify slow resources
 ```
 ### Capturing Evidence
 ```bash
 # Record a complete user flow for documentation
 playwright-cli tracing-start
 playwright-cli open https://app.example.com/checkout
 playwright-cli fill e1 "4111111111111111"
 playwright-cli fill e2 "12/25"
 playwright-cli fill e3 "123"
 playwright-cli click e4
 playwright-cli tracing-stop
 # Trace shows exact sequence of events
 ```
 ## Trace vs Video vs Screenshot
 | Feature | Trace | Video | Screenshot |
 |---------|-------|-------|------------|
 | **Format** | .trace file | .webm video | .png/.jpeg image |
 | **DOM inspection** | Yes | No | No |
 | **Network details** | Yes | No | No |
 | **Step-by-step replay** | Yes | Continuous | Single frame |
 | **File size** | Medium | Large | Small |
 | **Best for** | Debugging | Demos | Quick capture |
 ## Best Practices
 ### 1. Start Tracing Before the Problem
 ```bash
 # Trace the entire flow, not just the failing step
 playwright-cli tracing-start
 playwright-cli open https://example.com
 # ... all steps leading to the issue ...
 playwright-cli tracing-stop
 ```
 ### 2. Clean Up Old Traces
 Traces can consume significant disk space:
 ```bash
 # Remove traces older than 7 days
 find .playwright-cli/traces -type f -mtime +7 -delete
 ```
 ## Limitations
 - Traces add overhead to automation
 - Large traces can consume significant disk space
 - Some dynamic content may not replay perfectly
--- a/.claude/skills/playwright-cli/references/video-recording.md
+++ b/.claude/skills/playwright-cli/references/video-recording.md
@@ -0,0 +1,43 @@
 # Video Recording
 Capture browser automation sessions as video for debugging, documentation, or verification. Produces WebM (VP8/VP9 codec).
 ## Basic Recording
 ```bash
 # Start recording
 playwright-cli video-start
 # Perform actions
 playwright-cli open https://example.com
 playwright-cli snapshot
 playwright-cli click e1
 playwright-cli fill e2 "test input"
 # Stop and save
 playwright-cli video-stop demo.webm
 ```
 ## Best Practices
 ### 1. Use Descriptive Filenames
 ```bash
 # Include context in filename
 playwright-cli video-stop recordings/login-flow-2024-01-15.webm
 playwright-cli video-stop recordings/checkout-test-run-42.webm
 ```
 ## Tracing vs Video
 | Feature | Video | Tracing |
 |---------|-------|---------|
 | Output | WebM file | Trace file (viewable in Trace Viewer) |
 | Shows | Visual recording | DOM snapshots, network, console, actions |
 | Use case | Demos, documentation | Debugging, analysis |
 | Size | Larger | Smaller |
 ## Limitations
 - Recording adds slight overhead to automation
 - Large recordings can consume significant disk space
--- a/.claude/templates/coding_prompt.template.md
+++ b/.claude/templates/coding_prompt.template.md
@@ -86,24 +86,33 @@ Implement the chosen feature thoroughly:
 **CRITICAL:** You MUST verify features through the actual UI.
-Use browser automation tools:
+Use `playwright-cli` for browser automation:
- Navigate to the app in a real browser
+- Open the browser: `playwright-cli open http://localhost:PORT`
- Interact like a human user (click, type, scroll)
+- Take a snapshot to see page elements: `playwright-cli snapshot`
- Take screenshots at each step (use inline screenshots only -- do NOT save screenshot files to disk)
+- Read the snapshot YAML file to see element refs
- Verify both functionality AND visual appearance
+- Click elements by ref: `playwright-cli click e5`
 - Type text: `playwright-cli type "search query"`
 - Fill form fields: `playwright-cli fill e3 "value"`
 - Take screenshots: `playwright-cli screenshot`
 - Read the screenshot file to verify visual appearance
 - Check console errors: `playwright-cli console`
 - Close browser when done: `playwright-cli close`
 **Token-efficient workflow:** `playwright-cli screenshot` and `snapshot` save files
 to `.playwright-cli/`. You will see a file link in the output. Read the file only
 when you need to verify visual appearance or find element refs.
 **DO:**
 - Test through the UI with clicks and keyboard input
- Take screenshots to verify visual appearance (inline only, never save to disk)
+- Take screenshots and read them to verify visual appearance
- Check for console errors in browser
+- Check for console errors with `playwright-cli console`
 - Verify complete user workflows end-to-end
 - Always run `playwright-cli close` when finished testing
 **DON'T:**
-
+- Only test with curl commands
- Only test with curl commands (backend testing alone is insufficient)
+- Use JavaScript evaluation to bypass UI (`eval` and `run-code` are blocked)
 - Use JavaScript evaluation to bypass UI (no shortcuts)
 - Skip visual verification
 - Mark tests passing without thorough verification
@@ -145,7 +154,7 @@ Use the feature_mark_passing tool with feature_id=42
 - Combine or consolidate features
 - Reorder features
-**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**
+**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**
 ### STEP 7: COMMIT YOUR PROGRESS
@@ -192,11 +201,15 @@ Before context fills up:
 ## BROWSER AUTOMATION
-Use Playwright MCP tools (`browser_*`) for UI verification. Key tools: `navigate`, `click`, `type`, `fill_form`, `take_screenshot`, `console_messages`, `network_requests`. All tools have auto-wait built in.
+Use `playwright-cli` commands for UI verification. Key commands: `open`, `goto`,
 `snapshot`, `click`, `type`, `fill`, `screenshot`, `console`, `close`.
-**Screenshot rule:** Always use inline mode (base64). NEVER save screenshots as files to disk.
+**How it works:** `playwright-cli` uses a persistent browser daemon. `open` starts it,
 subsequent commands interact via socket, `close` shuts it down. Screenshots and snapshots
 save to `.playwright-cli/` -- read the files when you need to verify content.
-Test like a human user with mouse and keyboard. Use `browser_console_messages` to detect errors. Don't bypass UI with JavaScript evaluation.
+Test like a human user with mouse and keyboard. Use `playwright-cli console` to detect
 JS errors. Don't bypass UI with JavaScript evaluation.
 ---
--- a/.claude/templates/testing_prompt.template.md
+++ b/.claude/templates/testing_prompt.template.md
@@ -31,26 +31,32 @@ For the feature returned:
 1. Read and understand the feature's verification steps
 2. Navigate to the relevant part of the application
 3. Execute each verification step using browser automation
-4. Take screenshots to document the verification (inline only -- do NOT save to disk)
+4. Take screenshots and read them to verify visual appearance
 5. Check for console errors
-Use browser automation tools:
+### Browser Automation (Playwright CLI)
 **Navigation & Screenshots:**
- browser_navigate - Navigate to a URL
+- `playwright-cli open <url>` - Open browser and navigate
- browser_take_screenshot - Capture screenshot (inline mode only -- never save to disk)
+- `playwright-cli goto <url>` - Navigate to URL
- browser_snapshot - Get accessibility tree snapshot
+- `playwright-cli screenshot` - Save screenshot to `.playwright-cli/`
 - `playwright-cli snapshot` - Save page snapshot with element refs to `.playwright-cli/`
 **Element Interaction:**
- browser_click - Click elements
+- `playwright-cli click <ref>` - Click elements (ref from snapshot)
- browser_type - Type text into editable elements
+- `playwright-cli type <text>` - Type text
- browser_fill_form - Fill multiple form fields
+- `playwright-cli fill <ref> <text>` - Fill form fields
- browser_select_option - Select dropdown options
+- `playwright-cli select <ref> <val>` - Select dropdown
- browser_press_key - Press keyboard keys
+- `playwright-cli press <key>` - Keyboard input
 **Debugging:**
- browser_console_messages - Get browser console output (check for errors)
+- `playwright-cli console` - Check for JS errors
- browser_network_requests - Monitor API calls
+- `playwright-cli network` - Monitor API calls
 **Cleanup:**
 - `playwright-cli close` - Close browser when done (ALWAYS do this)
 **Note:** Screenshots and snapshots save to files. Read the file to see the content.
 ### STEP 3: HANDLE RESULTS
@@ -79,7 +85,7 @@ A regression has been introduced. You MUST fix it:
 4. **Verify the fix:**
   - Run through all verification steps again
-   - Take screenshots confirming the fix (inline only, never save to disk)
+   - Take screenshots and read them to confirm the fix
 5. **Mark as passing after fix:**
   ```
@@ -98,7 +104,7 @@ A regression has been introduced. You MUST fix it:
 ---
-## AVAILABLE MCP TOOLS
+## AVAILABLE TOOLS
 ### Feature Management
 - `feature_get_stats` - Get progress overview (passing/in_progress/total counts)
@@ -106,19 +112,17 @@ A regression has been introduced. You MUST fix it:
 - `feature_mark_failing` - Mark a feature as failing (when you find a regression)
 - `feature_mark_passing` - Mark a feature as passing (after fixing a regression)
-### Browser Automation (Playwright)
+### Browser Automation (Playwright CLI)
-All interaction tools have **built-in auto-wait** -- no manual timeouts needed.
+Use `playwright-cli` commands for browser interaction. Key commands:
-
+- `playwright-cli open <url>` - Open browser
- `browser_navigate` - Navigate to URL
+- `playwright-cli goto <url>` - Navigate to URL
- `browser_take_screenshot` - Capture screenshot (inline only, never save to disk)
+- `playwright-cli screenshot` - Take screenshot (saved to `.playwright-cli/`)
- `browser_snapshot` - Get accessibility tree
+- `playwright-cli snapshot` - Get page snapshot with element refs
- `browser_click` - Click elements
+- `playwright-cli click <ref>` - Click element
- `browser_type` - Type text
+- `playwright-cli type <text>` - Type text
- `browser_fill_form` - Fill form fields
+- `playwright-cli fill <ref> <text>` - Fill form field
- `browser_select_option` - Select dropdown
+- `playwright-cli console` - Check for JS errors
- `browser_press_key` - Keyboard input
+- `playwright-cli close` - Close browser (always do this when done)
 - `browser_console_messages` - Check for JS errors
 - `browser_network_requests` - Monitor API calls
 ---
--- a/.env.example
+++ b/.env.example
@@ -30,11 +30,18 @@
 # ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-3-5-haiku@20241022
 # ===================
-# Alternative API Providers (GLM, Ollama, Kimi, Custom)
+# Alternative API Providers (Azure, GLM, Ollama, Kimi, Custom)
 # ===================
 # Configure via Settings UI (recommended) or set env vars below.
 # When both are set, env vars take precedence.
 #
 # Azure Anthropic (Claude):
 # ANTHROPIC_BASE_URL=https://your-resource.services.ai.azure.com/anthropic
 # ANTHROPIC_API_KEY=your-azure-api-key
 # ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-6
 # ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-5
 # ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5
 #
 # GLM (Zhipu AI):
 # ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
 # ANTHROPIC_AUTH_TOKEN=your-glm-api-key
--- a/.gitignore
+++ b/.gitignore
@@ -10,6 +10,10 @@ issues/
 # Browser profiles for parallel agent execution
 .browser-profiles/
 # Playwright CLI daemon artifacts
 .playwright-cli/
 .playwright/
 # Log files
 logs/
 *.log
--- a/.npmignore
+++ b/.npmignore
@@ -28,5 +28,4 @@ start.sh
 start_ui.sh
 start_ui.py
 .claude/agents/
 .claude/skills/
 .claude/settings.json
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -65,7 +65,7 @@ python autonomous_agent_demo.py --project-dir my-app --yolo
 # Parallel mode: run multiple agents concurrently (1-5 agents)
 python autonomous_agent_demo.py --project-dir my-app --parallel --max-concurrency 3
-# Batch mode: implement multiple features per agent session (1-3)
+# Batch mode: implement multiple features per agent session (1-15)
 python autonomous_agent_demo.py --project-dir my-app --batch-size 3
 # Batch specific features by ID
@@ -85,7 +85,7 @@ python autonomous_agent_demo.py --project-dir my-app --yolo
 **What's different in YOLO mode:**
 - No regression testing
- No Playwright MCP server (browser automation disabled)
+- No Playwright CLI (browser automation disabled)
 - Features marked passing after lint/type-check succeeds
 - Faster iteration for prototyping
@@ -163,7 +163,7 @@ Publishing: `npm publish` (triggers `prepublishOnly` which builds UI, then publi
 - `autonomous_agent_demo.py` - Entry point for running the agent (supports `--yolo`, `--parallel`, `--batch-size`, `--batch-features`)
 - `autoforge_paths.py` - Central path resolution with dual-path backward compatibility and migration
 - `agent.py` - Agent session loop using Claude Agent SDK
- `client.py` - ClaudeSDKClient configuration with security hooks, MCP servers, and Vertex AI support
+- `client.py` - ClaudeSDKClient configuration with security hooks, feature MCP server, and Vertex AI support
 - `security.py` - Bash command allowlist validation (ALLOWED_COMMANDS whitelist)
 - `prompts.py` - Prompt template loading with project-specific fallback and batch feature prompts
 - `progress.py` - Progress tracking, database queries, webhook notifications
@@ -288,6 +288,9 @@ Projects can be stored in any directory (registered in `~/.autoforge/registry.db
 - `.autoforge/.agent.lock` - Lock file to prevent multiple agent instances
 - `.autoforge/allowed_commands.yaml` - Project-specific bash command allowlist (optional)
 - `.autoforge/.gitignore` - Ignores runtime files
 - `.claude/skills/playwright-cli/` - Playwright CLI skill for browser automation
 - `.playwright/cli.config.json` - Browser configuration (headless, viewport, etc.)
 - `.playwright-cli/` - Playwright CLI daemon artifacts (screenshots, snapshots) - gitignored
 - `CLAUDE.md` - Stays at project root (SDK convention)
 - `app_spec.txt` - Root copy for agent template compatibility
@@ -445,6 +448,7 @@ Alternative providers are configured via the **Settings UI** (gear icon > API Pr
 **Skills** (`.claude/skills/`):
 - `frontend-design` - Distinctive, production-grade UI design
 - `gsd-to-autoforge-spec` - Convert GSD codebase mapping to AutoForge app_spec format
 - `playwright-cli` - Browser automation via Playwright CLI (copied to each project)
 **Other:**
 - `.claude/templates/` - Prompt templates copied to new projects
@@ -479,7 +483,7 @@ When running with `--parallel`, the orchestrator:
 1. Spawns multiple Claude agents as subprocesses (up to `--max-concurrency`)
 2. Each agent claims features atomically via `feature_claim_and_get`
 3. Features blocked by unmet dependencies are skipped
-4. Browser contexts are isolated per agent using `--isolated` flag
+4. Browser sessions are isolated per agent via `PLAYWRIGHT_CLI_SESSION` environment variable
 5. AgentTracker parses output and emits `agent_update` messages for UI
 ### Process Limits (Parallel Mode)
@@ -492,9 +496,9 @@ The orchestrator enforces strict bounds on concurrent processes:
 ### Multi-Feature Batching
-Agents can implement multiple features per session using `--batch-size` (1-3, default: 3):
+Agents can implement multiple features per session using `--batch-size` (1-15, default: 3):
 - `--batch-size N` - Max features per coding agent batch
- `--testing-batch-size N` - Features per testing batch (1-5, default: 3)
+- `--testing-batch-size N` - Features per testing batch (1-15, default: 3)
 - `--batch-features 1,2,3` - Specific feature IDs for batch implementation
 - `--testing-batch-features 1,2,3` - Specific feature IDs for batch regression testing
 - `prompts.py` provides `get_batch_feature_prompt()` for multi-feature prompt generation
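Note on the 1-15 bounds documented above: a minimal sketch of how such a range could be enforced at argument-parsing time. The helper `bounded_int` and the parser built here are hypothetical illustrations, not the actual code in `autonomous_agent_demo.py`; only the flag names, range, and defaults come from the CLAUDE.md text.

```python
import argparse


def bounded_int(low: int, high: int):
    """Return an argparse type function accepting integers in [low, high]."""
    def parse(value: str) -> int:
        try:
            number = int(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"{value!r} is not an integer")
        if not low <= number <= high:
            raise argparse.ArgumentTypeError(
                f"must be between {low} and {high}, got {number}"
            )
        return number
    return parse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names and defaults follow CLAUDE.md,
    # but the real argument handling may be implemented differently.
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch-size", type=bounded_int(1, 15), default=3)
    parser.add_argument("--testing-batch-size", type=bounded_int(1, 15), default=3)
    return parser


args = build_parser().parse_args(["--batch-size", "15"])
print(args.batch_size, args.testing_batch_size)
```

Validating in the `type` callback keeps the range check next to the flag definition, so an out-of-range value is rejected before any agent session starts.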
--- a/VISION.md
+++ b/VISION.md
@@ -0,0 +1,22 @@
 # VISION
 This document defines the mandatory project vision for AutoForge. All contributions must align with these principles. PRs that deviate from this vision will be rejected. This file itself is immutable via PR — any PR that modifies VISION.md will be rejected outright.
 ## Claude Agent SDK Exclusivity
 AutoForge is a wrapper around the **Claude Agent SDK**. This is a foundational architectural decision, not a preference.
 **What this means:**
 - AutoForge only supports providers, models, and integrations that work through the Claude Agent SDK.
 - We will not integrate with, accommodate, or add support for other AI SDKs, CLIs, or coding agent platforms (e.g., Codex, OpenCode, Aider, Continue, Cursor agents, or similar tools).
 **Why:**
 Each platform has its own approach to MCP tools, skills, context management, and feature integration. Attempting to support multiple agent frameworks creates an unsustainable maintenance burden and dilutes the quality of the core experience. By committing to the Claude Agent SDK exclusively, we can build deep, reliable integration rather than shallow compatibility across many targets.
 **In practice:**
 - PRs adding support for non-Claude agent frameworks will be rejected.
 - PRs introducing abstractions designed to make AutoForge "agent-agnostic" will be rejected.
 - Alternative API providers (e.g., Vertex AI, AWS Bedrock) are acceptable only when accessed through the Claude Agent SDK's own configuration.
--- a/agent.py
+++ b/agent.py
@@ -74,46 +74,65 @@ async def run_agent_session(
        await client.query(message)
        # Collect response text and show tool use
        # Retry receive_response() on MessageParseError — the SDK raises this for
        # unknown CLI message types (e.g. "rate_limit_event") which kills the async
        # generator.  The subprocess is still alive so we restart to read remaining
        # messages from the buffered channel.
        response_text = ""
-        async for msg in client.receive_response():
+        max_parse_retries = 50
-            msg_type = type(msg).__name__
+        parse_retries = 0
        while True:
            try:
                async for msg in client.receive_response():
                    msg_type = type(msg).__name__
-            # Handle AssistantMessage (text and tool use)
+                    # Handle AssistantMessage (text and tool use)
-            if msg_type == "AssistantMessage" and hasattr(msg, "content"):
+                    if msg_type == "AssistantMessage" and hasattr(msg, "content"):
-                for block in msg.content:
+                        for block in msg.content:
-                    block_type = type(block).__name__
+                            block_type = type(block).__name__
-                    if block_type == "TextBlock" and hasattr(block, "text"):
+                            if block_type == "TextBlock" and hasattr(block, "text"):
-                        response_text += block.text
+                                response_text += block.text
-                        print(block.text, end="", flush=True)
+                                print(block.text, end="", flush=True)
-                    elif block_type == "ToolUseBlock" and hasattr(block, "name"):
+                            elif block_type == "ToolUseBlock" and hasattr(block, "name"):
-                        print(f"\n[Tool: {block.name}]", flush=True)
+                                print(f"\n[Tool: {block.name}]", flush=True)
-                        if hasattr(block, "input"):
+                                if hasattr(block, "input"):
-                            input_str = str(block.input)
+                                    input_str = str(block.input)
-                            if len(input_str) > 200:
+                                    if len(input_str) > 200:
-                                print(f"   Input: {input_str[:200]}...", flush=True)
+                                        print(f"   Input: {input_str[:200]}...", flush=True)
-                            else:
+                                    else:
-                                print(f"   Input: {input_str}", flush=True)
+                                        print(f"   Input: {input_str}", flush=True)
-            # Handle UserMessage (tool results)
+                    # Handle UserMessage (tool results)
-            elif msg_type == "UserMessage" and hasattr(msg, "content"):
+                    elif msg_type == "UserMessage" and hasattr(msg, "content"):
-                for block in msg.content:
+                        for block in msg.content:
-                    block_type = type(block).__name__
+                            block_type = type(block).__name__
-                    if block_type == "ToolResultBlock":
+                            if block_type == "ToolResultBlock":
-                        result_content = getattr(block, "content", "")
+                                result_content = getattr(block, "content", "")
-                        is_error = getattr(block, "is_error", False)
+                                is_error = getattr(block, "is_error", False)
-                        # Check if command was blocked by security hook
+                                # Check if command was blocked by security hook
-                        if "blocked" in str(result_content).lower():
+                                if "blocked" in str(result_content).lower():
-                            print(f"   [BLOCKED] {result_content}", flush=True)
+                                    print(f"   [BLOCKED] {result_content}", flush=True)
-                        elif is_error:
+                                elif is_error:
-                            # Show errors (truncated)
+                                    # Show errors (truncated)
-                            error_str = str(result_content)[:500]
+                                    error_str = str(result_content)[:500]
-                            print(f"   [Error] {error_str}", flush=True)
+                                    print(f"   [Error] {error_str}", flush=True)
-                        else:
+                                else:
-                            # Tool succeeded - just show brief confirmation
+                                    # Tool succeeded - just show brief confirmation
-                            print("   [Done]", flush=True)
+                                    print("   [Done]", flush=True)
                break  # Normal completion
            except Exception as inner_exc:
                if type(inner_exc).__name__ == "MessageParseError":
                    parse_retries += 1
                    if parse_retries > max_parse_retries:
                        print(f"Too many unrecognized CLI messages ({parse_retries}), stopping")
                        break
                    print(f"Ignoring unrecognized message from Claude CLI: {inner_exc}")
                    continue
                raise  # Re-raise to outer except
        print("\n" + "-" * 70 + "\n")
        return "continue", response_text
@@ -222,7 +241,7 @@ async def run_autonomous_agent(
        # Check if all features are already complete (before starting a new session)
        # Skip this check if running as initializer (needs to create features first)
        if not is_initializer and iteration == 1:
-            passing, in_progress, total = count_passing_tests(project_dir)
+            passing, in_progress, total, _nhi = count_passing_tests(project_dir)
            if total > 0 and passing == total:
                print("\n" + "=" * 70)
                print("  ALL FEATURES ALREADY COMPLETE!")
@@ -240,17 +259,7 @@ async def run_autonomous_agent(
        print_session_header(iteration, is_initializer)
        # Create client (fresh context)
-        # Pass agent_id for browser isolation in multi-agent scenarios
+        client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_type=agent_type)
        import os
        if agent_type == "testing":
            agent_id = f"testing-{os.getpid()}"  # Unique ID for testing agents
        elif feature_ids and len(feature_ids) > 1:
            agent_id = f"batch-{feature_ids[0]}"
        elif feature_id:
            agent_id = f"feature-{feature_id}"
        else:
            agent_id = None
        client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_id=agent_id, agent_type=agent_type)
        # Choose prompt based on agent type
        if agent_type == "initializer":
@@ -358,7 +367,7 @@ async def run_autonomous_agent(
            print_progress_summary(project_dir)
            # Check if all features are complete - exit gracefully if done
-            passing, in_progress, total = count_passing_tests(project_dir)
+            passing, in_progress, total, _nhi = count_passing_tests(project_dir)
            if total > 0 and passing == total:
                print("\n" + "=" * 70)
                print("  ALL FEATURES COMPLETE!")
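Note on the `receive_response()` retry loop in `run_agent_session`: the sketch below reproduces the pattern in isolation, assuming (as the diff comment states) that buffered messages survive after the generator raises. `FakeClient` and the local `MessageParseError` are stand-ins invented for illustration; the real classes come from the Claude Agent SDK and may behave differently.

```python
import asyncio
from collections import deque


class MessageParseError(Exception):
    """Stand-in for the SDK error raised on unknown message types."""


class FakeClient:
    """Hypothetical client whose buffered messages outlive a failed generator."""

    def __init__(self, messages):
        self._buffer = deque(messages)

    async def receive_response(self):
        while self._buffer:
            message = self._buffer.popleft()
            if message.startswith("unknown:"):
                # The generator dies here, but the remaining buffer is untouched.
                raise MessageParseError(message)
            yield message


async def collect(client, max_parse_retries: int = 50) -> list[str]:
    received = []
    parse_retries = 0
    while True:
        try:
            async for message in client.receive_response():
                received.append(message)
            break  # Normal completion
        except MessageParseError:
            parse_retries += 1
            if parse_retries > max_parse_retries:
                break  # Give up rather than loop forever
            continue  # Start a fresh generator and keep reading
    return received


messages = ["hello", "unknown:rate_limit_event", "world"]
print(asyncio.run(collect(FakeClient(messages))))  # ['hello', 'world']
```

The retry cap matters because a raised exception finishes the generator: each retry creates a new one, and without a bound a stream of unparseable messages would keep the loop running.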
--- a/api/database.py
+++ b/api/database.py
@@ -43,10 +43,10 @@ class Feature(Base):
    __tablename__ = "features"
-    # Composite index for common status query pattern (passes, in_progress)
+    # Composite index for common status query pattern (passes, in_progress, needs_human_input)
    # Used by feature_get_stats, get_ready_features, and other status queries
    __table_args__ = (
-        Index('ix_feature_status', 'passes', 'in_progress'),
+        Index('ix_feature_status', 'passes', 'in_progress', 'needs_human_input'),
    )
    id = Column(Integer, primary_key=True, index=True)
@@ -61,6 +61,11 @@ class Feature(Base):
    # NULL/empty = no dependencies (backwards compatible)
    dependencies = Column(JSON, nullable=True, default=None)
    # Human input: agent can request structured input from a human
    needs_human_input = Column(Boolean, nullable=False, default=False, index=True)
    human_input_request = Column(JSON, nullable=True, default=None)   # Agent's structured request
    human_input_response = Column(JSON, nullable=True, default=None)  # Human's response
    def to_dict(self) -> dict:
        """Convert feature to dictionary for JSON serialization."""
        return {
@@ -75,6 +80,10 @@ class Feature(Base):
            "in_progress": self.in_progress if self.in_progress is not None else False,
            # Dependencies: NULL/empty treated as empty list for backwards compat
            "dependencies": self.dependencies if self.dependencies else [],
            # Human input fields
            "needs_human_input": self.needs_human_input if self.needs_human_input is not None else False,
            "human_input_request": self.human_input_request,
            "human_input_response": self.human_input_response,
        }
    def get_dependencies_safe(self) -> list[int]:
@@ -302,6 +311,21 @@ def _is_network_path(path: Path) -> bool:
    return False
 def _migrate_add_human_input_columns(engine) -> None:
    """Add human input columns to existing databases that don't have them."""
    with engine.connect() as conn:
        result = conn.execute(text("PRAGMA table_info(features)"))
        columns = [row[1] for row in result.fetchall()]
        if "needs_human_input" not in columns:
            conn.execute(text("ALTER TABLE features ADD COLUMN needs_human_input BOOLEAN DEFAULT 0"))
        if "human_input_request" not in columns:
            conn.execute(text("ALTER TABLE features ADD COLUMN human_input_request TEXT DEFAULT NULL"))
        if "human_input_response" not in columns:
            conn.execute(text("ALTER TABLE features ADD COLUMN human_input_response TEXT DEFAULT NULL"))
        conn.commit()
 def _migrate_add_schedules_tables(engine) -> None:
    """Create schedules and schedule_overrides tables if they don't exist."""
    from sqlalchemy import inspect
@@ -425,6 +449,7 @@ def create_database(project_dir: Path) -> tuple:
    _migrate_fix_null_boolean_fields(engine)
    _migrate_add_dependencies_column(engine)
    _migrate_add_testing_columns(engine)
    _migrate_add_human_input_columns(engine)
    # Migrate to add schedules tables
    _migrate_add_schedules_tables(engine)
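
Review note: `_migrate_add_human_input_columns` follows a check-then-alter pattern, which is what makes it safe to call on every startup. Below is a minimal sketch of that pattern using the standard-library `sqlite3` module instead of the SQLAlchemy engine. The helper name `add_missing_columns` and the two-column starting schema are assumptions for illustration, not part of the PR.

```python
import sqlite3


def add_missing_columns(conn: sqlite3.Connection, table: str, columns: dict) -> list:
    """Add each column (name -> SQL type/default) only if it is absent.

    Hypothetical helper: returns the names that were actually added,
    so a second call on the same database returns an empty list.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, ddl in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ddl}")
            added.append(name)
    conn.commit()
    return added


# Column definitions copied from the ALTER TABLE statements in the diff.
HUMAN_INPUT_COLUMNS = {
    "needs_human_input": "BOOLEAN DEFAULT 0",
    "human_input_request": "TEXT DEFAULT NULL",
    "human_input_response": "TEXT DEFAULT NULL",
}

# Simulate a pre-migration database that already holds one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO features (name) VALUES ('login page')")

first_run = add_missing_columns(conn, "features", HUMAN_INPUT_COLUMNS)
second_run = add_missing_columns(conn, "features", HUMAN_INPUT_COLUMNS)
existing_row = conn.execute("SELECT needs_human_input FROM features").fetchone()
```

In this sketch the row that existed before the migration picks up the column default, so the `needs_human_input = 0` filters added elsewhere in the PR would still match it.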
--- a/autoforge_paths.py
+++ b/autoforge_paths.py
@@ -39,10 +39,12 @@ assistant.db-wal
 assistant.db-shm
 .agent.lock
 .devserver.lock
 .pause_drain
 .claude_settings.json
 .claude_assistant_settings.json
 .claude_settings.expand.*.json
 .progress_cache
 .migration_version
 """
@@ -145,6 +147,15 @@ def get_claude_assistant_settings_path(project_dir: Path) -> Path:
    return _resolve_path(project_dir, ".claude_assistant_settings.json")
 def get_pause_drain_path(project_dir: Path) -> Path:
    """Return the path to the ``.pause_drain`` signal file.
    This file is created to request a graceful pause (drain mode).
    Always uses the new location since it's a transient signal file.
    """
    return project_dir / ".autoforge" / ".pause_drain"
 def get_progress_cache_path(project_dir: Path) -> Path:
    """Resolve the path to ``.progress_cache``."""
    return _resolve_path(project_dir, ".progress_cache")
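
Review note: the `.pause_drain` file acts as a presence-only signal: creating it requests a graceful pause, and removing it lets the orchestrator resume. A minimal sketch of that protocol follows. Only the path layout comes from the diff; the function names `request_pause`, `resume`, and `is_pause_requested` are hypothetical, and the PR may create and remove the file differently.

```python
import tempfile
from pathlib import Path


def pause_drain_path(project_dir: Path) -> Path:
    # Same location that get_pause_drain_path returns in the diff.
    return project_dir / ".autoforge" / ".pause_drain"


def request_pause(project_dir: Path) -> None:
    """Hypothetical: create the signal file to ask for a graceful pause."""
    path = pause_drain_path(project_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


def resume(project_dir: Path) -> None:
    """Hypothetical: remove the signal file; a missing file is not an error."""
    pause_drain_path(project_dir).unlink(missing_ok=True)


def is_pause_requested(project_dir: Path) -> bool:
    """Hypothetical: the signal carries no payload, only its existence matters."""
    return pause_drain_path(project_dir).exists()


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    before = is_pause_requested(project)
    request_pause(project)
    during = is_pause_requested(project)
    resume(project)
    after = is_pause_requested(project)
```

Because the file is transient, clearing it at startup (as the orchestrator change later in this PR does) keeps a stale signal from a previous session from pausing a new run.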
--- a/autonomous_agent_demo.py
+++ b/autonomous_agent_demo.py
@@ -176,14 +176,14 @@ Authentication:
        "--testing-batch-size",
        type=int,
        default=3,
-        help="Number of features per testing batch (1-5, default: 3)",
+        help="Number of features per testing batch (1-15, default: 3)",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=3,
-        help="Max features per coding agent batch (1-3, default: 3)",
+        help="Max features per coding agent batch (1-15, default: 3)",
    )
    return parser.parse_args()
@@ -237,6 +237,12 @@ def main() -> None:
    if migrated:
        print(f"Migrated project files to .autoforge/: {', '.join(migrated)}", flush=True)
    # Migrate project to current AutoForge version (idempotent, safe)
    from prompts import migrate_project_to_current
    version_migrated = migrate_project_to_current(project_dir)
    if version_migrated:
        print(f"Upgraded project: {', '.join(version_migrated)}", flush=True)
    # Parse batch testing feature IDs (comma-separated string -> list[int])
    testing_feature_ids: list[int] | None = None
    if args.testing_feature_ids:
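
Review note: both options keep `type=int`, so as far as this diff shows the 1-15 range in the help text is not checked at parse time; out-of-range values are clamped later in `ParallelOrchestrator.__init__`. If rejecting bad values up front is preferred, one possible approach is a custom argparse type. The `bounded_int` factory below is a hypothetical helper, not something this PR adds.

```python
import argparse


def bounded_int(low: int, high: int):
    """Hypothetical factory: build an argparse type that enforces low <= n <= high."""
    def parse(value: str) -> int:
        number = int(value)  # a ValueError here is reported by argparse as an invalid value
        if not low <= number <= high:
            raise argparse.ArgumentTypeError(
                f"must be between {low} and {high}, got {number}"
            )
        return number
    return parse


parser = argparse.ArgumentParser()
parser.add_argument(
    "--testing-batch-size",
    type=bounded_int(1, 15),
    default=3,
    help="Number of features per testing batch (1-15, default: 3)",
)
parser.add_argument(
    "--batch-size",
    type=bounded_int(1, 15),
    default=3,
    help="Max features per coding agent batch (1-15, default: 3)",
)

args = parser.parse_args(["--batch-size", "15"])
```

With this sketch, `--batch-size 16` exits with a usage error instead of being silently reduced to 15. Whether that is preferable to clamping is a design choice for the maintainers.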
--- a/client.py
+++ b/client.py
@@ -21,16 +21,6 @@ from security import SENSITIVE_DIRECTORIES, bash_security_hook
 # Load environment variables from .env file if present
 load_dotenv()
 # Default Playwright headless mode - can be overridden via PLAYWRIGHT_HEADLESS env var
 # When True, browser runs invisibly in background (default - saves CPU)
 # When False, browser window is visible (useful for monitoring agent progress)
 DEFAULT_PLAYWRIGHT_HEADLESS = True
 # Default browser for Playwright - can be overridden via PLAYWRIGHT_BROWSER env var
 # Options: chrome, firefox, webkit, msedge
 # Firefox is recommended for lower CPU usage
 DEFAULT_PLAYWRIGHT_BROWSER = "firefox"
 # Extra read paths for cross-project file access (read-only)
 # Set EXTRA_READ_PATHS environment variable with comma-separated absolute paths
 # Example: EXTRA_READ_PATHS=/Volumes/Data/dev,/Users/shared/libs
@@ -41,6 +31,7 @@ EXTRA_READ_PATHS_VAR = "EXTRA_READ_PATHS"
 # this blocklist and the filesystem browser API share a single source of truth.
 EXTRA_READ_PATHS_BLOCKLIST = SENSITIVE_DIRECTORIES
 def convert_model_for_vertex(model: str) -> str:
    """
    Convert model name format for Vertex AI compatibility.
@@ -72,43 +63,6 @@ def convert_model_for_vertex(model: str) -> str:
    return model
 def get_playwright_headless() -> bool:
    """
    Get the Playwright headless mode setting.
    Reads from PLAYWRIGHT_HEADLESS environment variable, defaults to True.
    Returns True for headless mode (invisible browser), False for visible browser.
    """
    value = os.getenv("PLAYWRIGHT_HEADLESS", str(DEFAULT_PLAYWRIGHT_HEADLESS).lower()).strip().lower()
    truthy = {"true", "1", "yes", "on"}
    falsy = {"false", "0", "no", "off"}
    if value not in truthy | falsy:
        print(f"   - Warning: Invalid PLAYWRIGHT_HEADLESS='{value}', defaulting to {DEFAULT_PLAYWRIGHT_HEADLESS}")
        return DEFAULT_PLAYWRIGHT_HEADLESS
    return value in truthy
 # Valid browsers supported by Playwright MCP
 VALID_PLAYWRIGHT_BROWSERS = {"chrome", "firefox", "webkit", "msedge"}
 def get_playwright_browser() -> str:
    """
    Get the browser to use for Playwright.
    Reads from PLAYWRIGHT_BROWSER environment variable, defaults to firefox.
    Options: chrome, firefox, webkit, msedge
    Firefox is recommended for lower CPU usage.
    """
    value = os.getenv("PLAYWRIGHT_BROWSER", DEFAULT_PLAYWRIGHT_BROWSER).strip().lower()
    if value not in VALID_PLAYWRIGHT_BROWSERS:
        print(f"   - Warning: Invalid PLAYWRIGHT_BROWSER='{value}', "
              f"valid options: {', '.join(sorted(VALID_PLAYWRIGHT_BROWSERS))}. "
              f"Defaulting to {DEFAULT_PLAYWRIGHT_BROWSER}")
        return DEFAULT_PLAYWRIGHT_BROWSER
    return value
 def get_extra_read_paths() -> list[Path]:
    """
    Get extra read-only paths from EXTRA_READ_PATHS environment variable.
@@ -187,7 +141,6 @@ def get_extra_read_paths() -> list[Path]:
 # overhead and preventing agents from calling tools meant for other roles.
 #
 # Tools intentionally omitted from ALL agent lists (UI/orchestrator only):
 #   feature_get_ready, feature_get_blocked, feature_get_graph,
 #   feature_remove_dependency
 #
 # The ghost tool "feature_release_testing" was removed entirely -- it was
@@ -197,6 +150,9 @@ CODING_AGENT_TOOLS = [
    "mcp__features__feature_get_stats",
    "mcp__features__feature_get_by_id",
    "mcp__features__feature_get_summary",
    "mcp__features__feature_get_ready",
    "mcp__features__feature_get_blocked",
    "mcp__features__feature_get_graph",
    "mcp__features__feature_claim_and_get",
    "mcp__features__feature_mark_in_progress",
    "mcp__features__feature_mark_passing",
@@ -209,12 +165,18 @@ TESTING_AGENT_TOOLS = [
    "mcp__features__feature_get_stats",
    "mcp__features__feature_get_by_id",
    "mcp__features__feature_get_summary",
    "mcp__features__feature_get_ready",
    "mcp__features__feature_get_blocked",
    "mcp__features__feature_get_graph",
    "mcp__features__feature_mark_passing",
    "mcp__features__feature_mark_failing",
 ]
 INITIALIZER_AGENT_TOOLS = [
    "mcp__features__feature_get_stats",
    "mcp__features__feature_get_ready",
    "mcp__features__feature_get_blocked",
    "mcp__features__feature_get_graph",
    "mcp__features__feature_create_bulk",
    "mcp__features__feature_create",
    "mcp__features__feature_add_dependency",
@@ -228,41 +190,6 @@ ALL_FEATURE_MCP_TOOLS = sorted(
    set(CODING_AGENT_TOOLS) | set(TESTING_AGENT_TOOLS) | set(INITIALIZER_AGENT_TOOLS)
 )
 # Playwright MCP tools for browser automation.
 # Full set of tools for comprehensive UI testing including drag-and-drop,
 # hover menus, file uploads, tab management, etc.
 PLAYWRIGHT_TOOLS = [
    # Core navigation & screenshots
    "mcp__playwright__browser_navigate",
    "mcp__playwright__browser_navigate_back",
    "mcp__playwright__browser_take_screenshot",
    "mcp__playwright__browser_snapshot",
    # Element interaction
    "mcp__playwright__browser_click",
    "mcp__playwright__browser_type",
    "mcp__playwright__browser_fill_form",
    "mcp__playwright__browser_select_option",
    "mcp__playwright__browser_press_key",
    "mcp__playwright__browser_drag",
    "mcp__playwright__browser_hover",
    "mcp__playwright__browser_file_upload",
    # JavaScript & debugging
    "mcp__playwright__browser_evaluate",
    # "mcp__playwright__browser_run_code",  # REMOVED - causes Playwright MCP server crash
    "mcp__playwright__browser_console_messages",
    "mcp__playwright__browser_network_requests",
    # Browser management
    "mcp__playwright__browser_resize",
    "mcp__playwright__browser_wait_for",
    "mcp__playwright__browser_handle_dialog",
    "mcp__playwright__browser_install",
    "mcp__playwright__browser_close",
    "mcp__playwright__browser_tabs",
 ]
 # Built-in tools available to agents.
 # WebFetch and WebSearch are included so coding agents can look up current
 # documentation for frameworks and libraries they are implementing.
@@ -282,7 +209,6 @@ def create_client(
    project_dir: Path,
    model: str,
    yolo_mode: bool = False,
    agent_id: str | None = None,
    agent_type: str = "coding",
 ):
    """
@@ -291,9 +217,7 @@ def create_client(
    Args:
        project_dir: Directory for the project
        model: Claude model to use
-        yolo_mode: If True, skip Playwright MCP server for rapid prototyping
+        yolo_mode: If True, skip browser testing for rapid prototyping
        agent_id: Optional unique identifier for browser isolation in parallel mode.
                  When provided, each agent gets its own browser profile.
        agent_type: One of "coding", "testing", or "initializer". Controls which
                    MCP tools are exposed and the max_turns limit.
@@ -327,11 +251,8 @@ def create_client(
    }
    max_turns = max_turns_map.get(agent_type, 300)
-    # Build allowed tools list based on mode and agent type.
+    # Build allowed tools list based on agent type.
    # In YOLO mode, exclude Playwright tools for faster prototyping.
    allowed_tools = [*BUILTIN_TOOLS, *feature_tools]
    if not yolo_mode:
        allowed_tools.extend(PLAYWRIGHT_TOOLS)
    # Build permissions list.
    # We permit ALL feature MCP tools at the security layer (so the MCP server
@@ -363,10 +284,6 @@ def create_client(
        permissions_list.append(f"Glob({path}/**)")
        permissions_list.append(f"Grep({path}/**)")
    if not yolo_mode:
        # Allow Playwright MCP tools for browser automation (standard mode only)
        permissions_list.extend(PLAYWRIGHT_TOOLS)
    # Create comprehensive security settings
    # Note: Using relative paths ("./**") restricts access to project directory
    # since cwd is set to project_dir
@@ -395,9 +312,9 @@ def create_client(
        print(f"   - Extra read paths (validated): {', '.join(str(p) for p in extra_read_paths)}")
    print("   - Bash commands restricted to allowlist (see security.py)")
    if yolo_mode:
-        print("   - MCP servers: features (database) - YOLO MODE (no Playwright)")
+        print("   - MCP servers: features (database) - YOLO MODE (no browser testing)")
    else:
-        print("   - MCP servers: playwright (browser), features (database)")
+        print("   - MCP servers: features (database)")
    print("   - Project settings enabled (skills, commands, CLAUDE.md)")
    print()
@@ -421,36 +338,6 @@ def create_client(
            },
        },
    }
    if not yolo_mode:
        # Include Playwright MCP server for browser automation (standard mode only)
        # Browser and headless mode configurable via environment variables
        browser = get_playwright_browser()
        playwright_args = [
            "@playwright/mcp@latest",
            "--viewport-size", "1280x720",
            "--browser", browser,
        ]
        if get_playwright_headless():
            playwright_args.append("--headless")
        print(f"   - Browser: {browser} (headless={get_playwright_headless()})")
        # Browser isolation for parallel execution
        # Each agent gets its own isolated browser context to prevent tab conflicts
        if agent_id:
            # Use --isolated for ephemeral browser context
            # This creates a fresh, isolated context without persistent state
            # Note: --isolated and --user-data-dir are mutually exclusive
            playwright_args.append("--isolated")
            print(f"   - Browser isolation enabled for agent: {agent_id}")
        mcp_servers["playwright"] = {
            "command": "npx",
            "args": playwright_args,
            "env": {
                "NODE_COMPILE_CACHE": "",  # Disable V8 compile caching to prevent .node file accumulation in %TEMP%
            },
        }
    # Build environment overrides for API endpoint configuration
    # Uses get_effective_sdk_env() which reads provider settings from the database,
    # ensuring UI-configured alternative providers (GLM, Ollama, Kimi, Custom) propagate
@@ -463,6 +350,7 @@ def create_client(
    is_vertex = sdk_env.get("CLAUDE_CODE_USE_VERTEX") == "1"
    is_alternative_api = bool(base_url) or is_vertex
    is_ollama = "localhost:11434" in base_url or "127.0.0.1:11434" in base_url
    is_azure = "services.ai.azure.com" in base_url
    model = convert_model_for_vertex(model)
    if sdk_env:
        print(f"   - API overrides: {', '.join(sdk_env.keys())}")
@@ -472,8 +360,10 @@ def create_client(
            print(f"   - Vertex AI Mode: Using GCP project '{project_id}' with model '{model}' in region '{region}'")
        elif is_ollama:
            print("   - Ollama Mode: Using local models")
        elif is_azure:
            print(f"   - Azure Mode: Using {base_url}")
        elif "ANTHROPIC_BASE_URL" in sdk_env:
-            print(f"   - GLM Mode: Using {sdk_env['ANTHROPIC_BASE_URL']}")
+            print(f"   - Alternative API: Using {sdk_env['ANTHROPIC_BASE_URL']}")
    # Create a wrapper for bash_security_hook that passes project_dir via context
    async def bash_hook_with_context(input_data, tool_use_id=None, context=None):
--- a/lib/cli.js
+++ b/lib/cli.js
@@ -517,6 +517,41 @@ function killProcess(pid) {
  }
 }
 // ---------------------------------------------------------------------------
 // Playwright CLI
 // ---------------------------------------------------------------------------
 /**
 * Ensure playwright-cli is available globally for browser automation.
 * Returns true if available (already installed or freshly installed).
 *
 * @param {boolean} showProgress - If true, print install progress
 */
 function ensurePlaywrightCli(showProgress) {
  try {
    execSync('playwright-cli --version', {
      timeout: 10_000,
      stdio: ['pipe', 'pipe', 'pipe'],
    });
    return true;
  } catch {
    // Not installed — try to install
  }
  if (showProgress) {
    log('      Installing playwright-cli for browser automation...');
  }
  try {
    execSync('npm install -g @playwright/cli', {
      timeout: 120_000,
      stdio: ['pipe', 'pipe', 'pipe'],
    });
    return true;
  } catch {
    return false;
  }
 }
 // ---------------------------------------------------------------------------
 // CLI commands
 // ---------------------------------------------------------------------------
@@ -613,6 +648,14 @@ function startServer(opts) {
  }
  const wasAlreadyReady = ensureVenv(python, repair);
  // Ensure playwright-cli for browser automation (quick check, installs once)
  if (!ensurePlaywrightCli(!wasAlreadyReady)) {
    log('');
    log('  Note: playwright-cli not available (browser automation will be limited)');
    log('  Install manually: npm install -g @playwright/cli');
    log('');
  }
  // Step 3: Config file
  const configCreated = ensureEnvFile();
--- a/mcp_server/feature_mcp.py
+++ b/mcp_server/feature_mcp.py
@@ -151,17 +151,20 @@ def feature_get_stats() -> str:
        result = session.query(
            func.count(Feature.id).label('total'),
            func.sum(case((Feature.passes == True, 1), else_=0)).label('passing'),
-            func.sum(case((Feature.in_progress == True, 1), else_=0)).label('in_progress')
+            func.sum(case((Feature.in_progress == True, 1), else_=0)).label('in_progress'),
            func.sum(case((Feature.needs_human_input == True, 1), else_=0)).label('needs_human_input')
        ).first()
        total = result.total or 0
        passing = int(result.passing or 0)
        in_progress = int(result.in_progress or 0)
        needs_human_input = int(result.needs_human_input or 0)
        percentage = round((passing / total) * 100, 1) if total > 0 else 0.0
        return json.dumps({
            "passing": passing,
            "in_progress": in_progress,
            "needs_human_input": needs_human_input,
            "total": total,
            "percentage": percentage
        })
@@ -221,6 +224,7 @@ def feature_get_summary(
            "name": feature.name,
            "passes": feature.passes,
            "in_progress": feature.in_progress,
            "needs_human_input": feature.needs_human_input if feature.needs_human_input is not None else False,
            "dependencies": feature.dependencies or []
        })
    finally:
@@ -401,11 +405,11 @@ def feature_mark_in_progress(
    """
    session = get_session()
    try:
-        # Atomic claim: only succeeds if feature is not already claimed or passing
+        # Atomic claim: only succeeds if feature is not already claimed, passing, or blocked for human input
        result = session.execute(text("""
            UPDATE features
            SET in_progress = 1
-            WHERE id = :id AND passes = 0 AND in_progress = 0
+            WHERE id = :id AND passes = 0 AND in_progress = 0 AND needs_human_input = 0
        """), {"id": feature_id})
        session.commit()
@@ -418,6 +422,8 @@ def feature_mark_in_progress(
                return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
            if feature.in_progress:
                return json.dumps({"error": f"Feature with ID {feature_id} is already in-progress"})
            if getattr(feature, 'needs_human_input', False):
                return json.dumps({"error": f"Feature with ID {feature_id} is blocked waiting for human input"})
            return json.dumps({"error": "Failed to mark feature in-progress for unknown reason"})
        # Fetch the claimed feature
@@ -455,11 +461,14 @@ def feature_claim_and_get(
        if feature.passes:
            return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
-        # Try atomic claim: only succeeds if not already claimed
+        if getattr(feature, 'needs_human_input', False):
            return json.dumps({"error": f"Feature with ID {feature_id} is blocked waiting for human input"})
        # Try atomic claim: only succeeds if not already claimed and not blocked for human input
        result = session.execute(text("""
            UPDATE features
            SET in_progress = 1
-            WHERE id = :id AND passes = 0 AND in_progress = 0
+            WHERE id = :id AND passes = 0 AND in_progress = 0 AND needs_human_input = 0
        """), {"id": feature_id})
        session.commit()
@@ -806,6 +815,8 @@ def feature_get_ready(
        for f in all_features:
            if f.passes or f.in_progress:
                continue
            if getattr(f, 'needs_human_input', False):
                continue
            deps = f.dependencies or []
            if all(dep_id in passing_ids for dep_id in deps):
                ready.append(f.to_dict())
@@ -888,6 +899,8 @@ def feature_get_graph() -> str:
            if f.passes:
                status = "done"
            elif getattr(f, 'needs_human_input', False):
                status = "needs_human_input"
            elif blocking:
                status = "blocked"
            elif f.in_progress:
@@ -984,6 +997,116 @@ def feature_set_dependencies(
        return json.dumps({"error": f"Failed to set dependencies: {str(e)}"})
@mcp.tool()
 def feature_request_human_input(
    feature_id: Annotated[int, Field(description="The ID of the feature that needs human input", ge=1)],
    prompt: Annotated[str, Field(min_length=1, description="Explain what you need from the human and why")],
    fields: Annotated[list[dict], Field(min_length=1, description="List of input fields to collect")]
 ) -> str:
    """Request structured input from a human for a feature that is blocked.
    Use this ONLY when the feature genuinely cannot proceed without human intervention:
    - Creating API keys or external accounts
    - Choosing between design approaches that require human preference
    - Configuring external services the agent cannot access
    - Providing credentials or secrets
    Do NOT use this for issues you can solve yourself (debugging, reading docs, etc.).
    The feature will be moved out of in_progress and into a "needs human input" state.
    Once the human provides their response, the feature returns to the pending queue
    and will include the human's response when you pick it up again.
    Args:
        feature_id: The ID of the feature that needs human input
        prompt: A clear explanation of what you need and why
        fields: List of input fields, each with:
            - id (str): Unique field identifier
            - label (str): Human-readable label
            - type (str): "text", "textarea", "select", or "boolean" (default: "text")
            - required (bool): Whether the field is required (default: true)
            - placeholder (str, optional): Placeholder text
            - options (list, optional): For select type: [{value, label}]
    Returns:
        JSON with success confirmation or error message
    """
    # Validate fields
    VALID_FIELD_TYPES = {"text", "textarea", "select", "boolean"}
    seen_ids: set[str] = set()
    for i, field in enumerate(fields):
        if "id" not in field or "label" not in field:
            return json.dumps({"error": f"Field at index {i} missing required 'id' or 'label'"})
        fid = field["id"]
        flabel = field["label"]
        if not isinstance(fid, str) or not fid.strip():
            return json.dumps({"error": f"Field at index {i} has empty or invalid 'id'"})
        if not isinstance(flabel, str) or not flabel.strip():
            return json.dumps({"error": f"Field at index {i} has empty or invalid 'label'"})
        if fid in seen_ids:
            return json.dumps({"error": f"Duplicate field id '{fid}' at index {i}"})
        seen_ids.add(fid)
        ftype = field.get("type", "text")
        if ftype not in VALID_FIELD_TYPES:
            return json.dumps({"error": f"Field at index {i} has invalid type '{ftype}'. Must be one of: {', '.join(sorted(VALID_FIELD_TYPES))}"})
        if ftype == "select":
            options = field.get("options")
            if not options or not isinstance(options, list):
                return json.dumps({"error": f"Field at index {i} is type 'select' but missing or invalid 'options' array"})
            for j, opt in enumerate(options):
                if not isinstance(opt, dict):
                    return json.dumps({"error": f"Field at index {i}, option {j} must be an object with 'value' and 'label'"})
                if "value" not in opt or "label" not in opt:
                    return json.dumps({"error": f"Field at index {i}, option {j} missing required 'value' or 'label'"})
                if not isinstance(opt["value"], str) or not opt["value"].strip():
                    return json.dumps({"error": f"Field at index {i}, option {j} has empty or invalid 'value'"})
                if not isinstance(opt["label"], str) or not opt["label"].strip():
                    return json.dumps({"error": f"Field at index {i}, option {j} has empty or invalid 'label'"})
        elif field.get("options"):
            return json.dumps({"error": f"Field at index {i} has 'options' but type is '{ftype}' (only 'select' uses options)"})
    request_data = {
        "prompt": prompt,
        "fields": fields,
    }
    session = get_session()
    try:
        # Atomically set needs_human_input, clear in_progress, store request, clear previous response
        result = session.execute(text("""
            UPDATE features
            SET needs_human_input = 1,
                in_progress = 0,
                human_input_request = :request,
                human_input_response = NULL
            WHERE id = :id AND passes = 0 AND in_progress = 1
        """), {"id": feature_id, "request": json.dumps(request_data)})
        session.commit()
        if result.rowcount == 0:
            feature = session.query(Feature).filter(Feature.id == feature_id).first()
            if feature is None:
                return json.dumps({"error": f"Feature with ID {feature_id} not found"})
            if feature.passes:
                return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
            if not feature.in_progress:
                return json.dumps({"error": f"Feature with ID {feature_id} is not in progress"})
            return json.dumps({"error": "Failed to request human input for unknown reason"})
        feature = session.query(Feature).filter(Feature.id == feature_id).first()
        return json.dumps({
            "success": True,
            "feature_id": feature_id,
            "name": feature.name,
            "message": f"Feature '{feature.name}' is now blocked waiting for human input"
        })
    except Exception as e:
        session.rollback()
        return json.dumps({"error": f"Failed to request human input: {str(e)}"})
    finally:
        session.close()
@mcp.tool()
 def ask_user(
    questions: Annotated[list[dict], Field(description="List of questions to ask, each with question, header, options (list of {label, description}), and multiSelect (bool)")]
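
Review note: the `fields` argument of `feature_request_human_input` has several interlocking rules (unique non-empty ids, a fixed set of types, `options` only on `select`). The sketch below shows one payload that should satisfy them, plus a condensed restatement of the checks. `check_fields` is a hypothetical stand-in for the inline validation, its error strings are simplified, and the field contents are made-up sample data.

```python
import json
from typing import Optional

VALID_FIELD_TYPES = {"text", "textarea", "select", "boolean"}


def check_fields(fields: list) -> Optional[str]:
    """Condensed version of the validation in the diff: error string, or None if valid."""
    seen = set()
    for i, field in enumerate(fields):
        fid, label = field.get("id"), field.get("label")
        if not isinstance(fid, str) or not fid.strip():
            return f"field {i}: missing or empty 'id'"
        if not isinstance(label, str) or not label.strip():
            return f"field {i}: missing or empty 'label'"
        if fid in seen:
            return f"field {i}: duplicate id '{fid}'"
        seen.add(fid)
        ftype = field.get("type", "text")  # type defaults to "text"
        if ftype not in VALID_FIELD_TYPES:
            return f"field {i}: invalid type '{ftype}'"
        options = field.get("options")
        if ftype == "select":
            if not isinstance(options, list) or not options:
                return f"field {i}: 'select' needs a non-empty 'options' list"
            for j, opt in enumerate(options):
                if not isinstance(opt, dict):
                    return f"field {i}, option {j}: must be an object"
                for key in ("value", "label"):
                    text_value = opt.get(key)
                    if not isinstance(text_value, str) or not text_value.strip():
                        return f"field {i}, option {j}: empty or invalid '{key}'"
        elif options:
            return f"field {i}: only 'select' fields may carry 'options'"
    return None


# Illustrative payload (sample data, not taken from the PR).
fields = [
    {"id": "api_key", "label": "Payment provider API key", "type": "text", "required": True},
    {
        "id": "environment",
        "label": "Target environment",
        "type": "select",
        "options": [
            {"value": "staging", "label": "Staging"},
            {"value": "production", "label": "Production"},
        ],
    },
]

error = check_fields(fields)
# The tool stores the request as JSON with these two keys.
request_json = json.dumps({"prompt": "Need payment credentials to continue", "fields": fields})
```

A payload like this would move the feature out of `in_progress` and into the needs-human-input state, provided the feature is currently in progress and not already passing.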
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "autoforge-ai",
-  "version": "0.1.10",
+  "version": "0.1.17",
  "description": "Autonomous coding agent with web UI - build complete apps with AI",
  "license": "AGPL-3.0",
  "bin": {
@@ -19,6 +19,7 @@
    "ui/dist/",
    "ui/package.json",
    ".claude/commands/",
    ".claude/skills/",
    ".claude/templates/",
    "examples/",
    "start.py",
--- a/parallel_orchestrator.py
+++ b/parallel_orchestrator.py
@@ -131,7 +131,7 @@ def _dump_database_state(feature_dicts: list[dict], label: str = ""):
 MAX_PARALLEL_AGENTS = 5
 MAX_TOTAL_AGENTS = 10
 DEFAULT_CONCURRENCY = 3
-DEFAULT_TESTING_BATCH_SIZE = 3  # Number of features per testing batch (1-5)
+DEFAULT_TESTING_BATCH_SIZE = 3  # Number of features per testing batch (1-15)
 POLL_INTERVAL = 5  # seconds between checking for ready features
 MAX_FEATURE_RETRIES = 3  # Maximum times to retry a failed feature
 INITIALIZER_TIMEOUT = 1800  # 30 minutes timeout for initializer
@@ -168,7 +168,7 @@ class ParallelOrchestrator:
            yolo_mode: Whether to run in YOLO mode (skip testing agents entirely)
            testing_agent_ratio: Number of regression testing agents to maintain (0-3).
                0 = disabled, 1-3 = maintain that many testing agents running independently.
-            testing_batch_size: Number of features to include per testing session (1-5).
+            testing_batch_size: Number of features to include per testing session (1-15).
                Each testing agent receives this many features to regression test.
            on_output: Callback for agent output (feature_id, line)
            on_status: Callback for agent status changes (feature_id, status)
@@ -178,8 +178,8 @@ class ParallelOrchestrator:
        self.model = model
        self.yolo_mode = yolo_mode
        self.testing_agent_ratio = min(max(testing_agent_ratio, 0), 3)  # Clamp 0-3
-        self.testing_batch_size = min(max(testing_batch_size, 1), 5)  # Clamp 1-5
+        self.testing_batch_size = min(max(testing_batch_size, 1), 15)  # Clamp 1-15
-        self.batch_size = min(max(batch_size, 1), 3)  # Clamp 1-3
+        self.batch_size = min(max(batch_size, 1), 15)  # Clamp 1-15
        self.on_output = on_output
        self.on_status = on_status
@@ -194,6 +194,7 @@ class ParallelOrchestrator:
        # Legacy alias for backward compatibility
        self.running_agents = self.running_coding_agents
        self.abort_events: dict[int, threading.Event] = {}
        self._testing_session_counter = 0
        self.is_running = False
        # Track feature failures to prevent infinite retry loops
@@ -212,6 +213,9 @@ class ParallelOrchestrator:
        # Signal handlers only set this flag; cleanup happens in the main loop
        self._shutdown_requested = False
        # Graceful pause (drain mode) flag
        self._drain_requested = False
        # Session tracking for logging/debugging
        self.session_start_time: datetime | None = None
@@ -492,6 +496,9 @@ class ParallelOrchestrator:
        for fd in feature_dicts:
            if not fd.get("in_progress") or fd.get("passes"):
                continue
            # Skip if blocked for human input
            if fd.get("needs_human_input"):
                continue
            # Skip if already running in this orchestrator instance
            if fd["id"] in running_ids:
                continue
@@ -536,11 +543,14 @@ class ParallelOrchestrator:
                running_ids.update(batch_ids)
        ready = []
-        skipped_reasons = {"passes": 0, "in_progress": 0, "running": 0, "failed": 0, "deps": 0}
+        skipped_reasons = {"passes": 0, "in_progress": 0, "running": 0, "failed": 0, "deps": 0, "needs_human_input": 0}
        for fd in feature_dicts:
            if fd.get("passes"):
                skipped_reasons["passes"] += 1
                continue
            if fd.get("needs_human_input"):
                skipped_reasons["needs_human_input"] += 1
                continue
            if fd.get("in_progress"):
                skipped_reasons["in_progress"] += 1
                continue
@@ -846,7 +856,7 @@ class ParallelOrchestrator:
                "encoding": "utf-8",
                "errors": "replace",
                "cwd": str(self.project_dir),  # Run from project dir so CLI creates .claude/ in project
-                "env": {**os.environ, "PYTHONUNBUFFERED": "1", "NODE_COMPILE_CACHE": ""},
+                "env": {**os.environ, "PYTHONUNBUFFERED": "1", "NODE_COMPILE_CACHE": "", "PLAYWRIGHT_CLI_SESSION": f"coding-{feature_id}"},
            }
            if sys.platform == "win32":
                popen_kwargs["creationflags"] = subprocess.CREATE_NO_WINDOW
@@ -909,7 +919,7 @@ class ParallelOrchestrator:
                "encoding": "utf-8",
                "errors": "replace",
                "cwd": str(self.project_dir),  # Run from project dir so CLI creates .claude/ in project
-                "env": {**os.environ, "PYTHONUNBUFFERED": "1", "NODE_COMPILE_CACHE": ""},
+                "env": {**os.environ, "PYTHONUNBUFFERED": "1", "NODE_COMPILE_CACHE": "", "PLAYWRIGHT_CLI_SESSION": f"coding-{primary_id}"},
            }
            if sys.platform == "win32":
                popen_kwargs["creationflags"] = subprocess.CREATE_NO_WINDOW
@@ -1013,8 +1023,9 @@ class ParallelOrchestrator:
                    "encoding": "utf-8",
                    "errors": "replace",
                    "cwd": str(self.project_dir),  # Run from project dir so CLI creates .claude/ in project
-                    "env": {**os.environ, "PYTHONUNBUFFERED": "1", "NODE_COMPILE_CACHE": ""},
+                    "env": {**os.environ, "PYTHONUNBUFFERED": "1", "NODE_COMPILE_CACHE": "", "PLAYWRIGHT_CLI_SESSION": f"testing-{self._testing_session_counter}"},
                }
                self._testing_session_counter += 1
                if sys.platform == "win32":
                    popen_kwargs["creationflags"] = subprocess.CREATE_NO_WINDOW
@@ -1385,6 +1396,9 @@ class ParallelOrchestrator:
        # Must happen before any debug_log.log() calls
        debug_log.start_session()
        # Clear any stale drain signal from a previous session
        self._clear_drain_signal()
        # Log startup to debug file
        debug_log.section("ORCHESTRATOR STARTUP")
        debug_log.log("STARTUP", "Orchestrator run_loop starting",
@@ -1506,6 +1520,34 @@ class ParallelOrchestrator:
                    print("\nAll features complete!", flush=True)
                    break
                # --- Graceful pause (drain mode) ---
                if not self._drain_requested and self._check_drain_signal():
                    self._drain_requested = True
                    print("Graceful pause requested - draining running agents...", flush=True)
                    debug_log.log("DRAIN", "Graceful pause requested, draining running agents")
                if self._drain_requested:
                    with self._lock:
                        coding_count = len(self.running_coding_agents)
                        testing_count = len(self.running_testing_agents)
                    if coding_count == 0 and testing_count == 0:
                        print("All agents drained - paused.", flush=True)
                        debug_log.log("DRAIN", "All agents drained, entering paused state")
                        # Wait until signal file is removed (resume) or shutdown
                        while self._check_drain_signal() and self.is_running and not self._shutdown_requested:
                            await asyncio.sleep(1)
                        if not self.is_running or self._shutdown_requested:
                            break
                        self._drain_requested = False
                        print("Resuming from graceful pause...", flush=True)
                        debug_log.log("DRAIN", "Resuming from graceful pause")
                        continue
                    else:
                        debug_log.log("DRAIN", f"Waiting for agents to finish: coding={coding_count}, testing={testing_count}")
                        await self._wait_for_agent_completion()
                        continue
                # Maintain testing agents independently (runs every iteration)
                self._maintain_testing_agents(feature_dicts)
@@ -1630,6 +1672,17 @@ class ParallelOrchestrator:
                "yolo_mode": self.yolo_mode,
            }
    def _check_drain_signal(self) -> bool:
        """Check if the graceful pause (drain) signal file exists."""
        from autoforge_paths import get_pause_drain_path
        return get_pause_drain_path(self.project_dir).exists()
    def _clear_drain_signal(self) -> None:
        """Delete the drain signal file and reset the flag."""
        from autoforge_paths import get_pause_drain_path
        get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
        self._drain_requested = False
    def cleanup(self) -> None:
        """Clean up database resources. Safe to call multiple times.
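Review note: the graceful-pause hunks above rely on a signal file that the orchestrator polls. The following is a minimal, standalone sketch of that pattern, not the project's implementation. The file name `.pause_drain`, the helper names, and the `should_stop` callback are hypothetical stand-ins for `get_pause_drain_path`, `_check_drain_signal`, and the `is_running` / `_shutdown_requested` flags.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Callable

SIGNAL_NAME = ".pause_drain"  # hypothetical file name, chosen for illustration


def drain_requested(project_dir: Path) -> bool:
    """Return True while the pause signal file exists."""
    return (project_dir / SIGNAL_NAME).exists()


def clear_drain(project_dir: Path) -> None:
    """Remove the signal file; safe to call when it is already gone."""
    (project_dir / SIGNAL_NAME).unlink(missing_ok=True)


async def wait_while_paused(
    project_dir: Path,
    should_stop: Callable[[], bool] = lambda: False,
    poll_seconds: float = 0.01,
) -> bool:
    """Poll until the signal file is removed or a stop is requested.

    Returns True when the loop may resume and False when it should exit.
    """
    while drain_requested(project_dir) and not should_stop():
        await asyncio.sleep(poll_seconds)
    return not should_stop()


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    (demo_dir / SIGNAL_NAME).touch()          # request a pause
    print(drain_requested(demo_dir))          # signal is visible
    clear_drain(demo_dir)                     # resume
    print(asyncio.run(wait_while_paused(demo_dir)))
```

Clearing the signal at startup, as the `run_loop` hunk does, keeps a file left over from a crashed session from pausing the next run immediately.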
--- a/progress.py
+++ b/progress.py
@@ -62,54 +62,71 @@ def has_features(project_dir: Path) -> bool:
        return False
-def count_passing_tests(project_dir: Path) -> tuple[int, int, int]:
+def count_passing_tests(project_dir: Path) -> tuple[int, int, int, int]:
    """
-    Count passing, in_progress, and total tests via direct database access.
+    Count passing, in_progress, total, and needs_human_input tests via direct database access.
    Args:
        project_dir: Directory containing the project
    Returns:
-        (passing_count, in_progress_count, total_count)
+        (passing_count, in_progress_count, total_count, needs_human_input_count)
    """
    from autoforge_paths import get_features_db_path
    db_file = get_features_db_path(project_dir)
    if not db_file.exists():
-        return 0, 0, 0
+        return 0, 0, 0, 0
    try:
        with closing(_get_connection(db_file)) as conn:
            cursor = conn.cursor()
-            # Single aggregate query instead of 3 separate COUNT queries
+            # Single aggregate query instead of separate COUNT queries
-            # Handle case where in_progress column doesn't exist yet (legacy DBs)
+            # Handle case where columns don't exist yet (legacy DBs)
            try:
                cursor.execute("""
                    SELECT
                        COUNT(*) as total,
                        SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing,
-                        SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress
+                        SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress,
                        SUM(CASE WHEN needs_human_input = 1 THEN 1 ELSE 0 END) as needs_human_input
                    FROM features
                """)
                row = cursor.fetchone()
                total = row[0] or 0
                passing = row[1] or 0
                in_progress = row[2] or 0
                needs_human_input = row[3] or 0
            except sqlite3.OperationalError:
-                # Fallback for databases without in_progress column
+                # Fallback for databases without newer columns
-                cursor.execute("""
-                    SELECT
-                        COUNT(*) as total,
-                        SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing
-                    FROM features
-                """)
-                row = cursor.fetchone()
-                total = row[0] or 0
-                passing = row[1] or 0
-                in_progress = 0
-            return passing, in_progress, total
+                try:
+                    cursor.execute("""
+                        SELECT
+                            COUNT(*) as total,
+                            SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing,
+                            SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress
+                        FROM features
+                    """)
+                    row = cursor.fetchone()
+                    total = row[0] or 0
+                    passing = row[1] or 0
                    in_progress = row[2] or 0
                    needs_human_input = 0
                except sqlite3.OperationalError:
                    cursor.execute("""
                        SELECT
                            COUNT(*) as total,
                            SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing
                        FROM features
                    """)
                    row = cursor.fetchone()
                    total = row[0] or 0
                    passing = row[1] or 0
                    in_progress = 0
                    needs_human_input = 0
            return passing, in_progress, total, needs_human_input
    except Exception as e:
        print(f"[Database error in count_passing_tests: {e}]")
-        return 0, 0, 0
+        return 0, 0, 0, 0
 def get_all_passing_features(project_dir: Path) -> list[dict]:
@@ -234,7 +251,7 @@ def print_session_header(session_num: int, is_initializer: bool) -> None:
 def print_progress_summary(project_dir: Path) -> None:
    """Print a summary of current progress."""
-    passing, in_progress, total = count_passing_tests(project_dir)
+    passing, in_progress, total, _needs_human_input = count_passing_tests(project_dir)
    if total > 0:
        percentage = (passing / total) * 100
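Review note: the nested `try`/`except sqlite3.OperationalError` fallback in `count_passing_tests` grows by one level for every new optional column. One possible alternative, sketched here under the assumption that the optional columns were added in a fixed order, is to loop over progressively shorter column sets. The function name `count_features` and the `_OPTIONAL_COLUMNS` tuple are hypothetical and not part of the PR.

```python
import sqlite3

# Optional columns in the order they were (assumed to be) introduced.
_OPTIONAL_COLUMNS = ("in_progress", "needs_human_input")


def count_features(conn: sqlite3.Connection) -> tuple[int, int, int, int]:
    """Return (passing, in_progress, total, needs_human_input).

    Tries the newest schema first and drops one optional column per retry,
    reporting 0 for any column the database does not have yet.
    """
    for n in range(len(_OPTIONAL_COLUMNS), -1, -1):
        columns = ("passes",) + _OPTIONAL_COLUMNS[:n]
        # Column names come from the constant above, never from user input.
        sums = ", ".join(
            f"SUM(CASE WHEN {c} = 1 THEN 1 ELSE 0 END)" for c in columns
        )
        try:
            row = conn.execute(f"SELECT COUNT(*), {sums} FROM features").fetchone()
        except sqlite3.OperationalError:
            continue  # a column (or the table) is missing; try a smaller set
        counts = [v or 0 for v in row] + [0] * (len(_OPTIONAL_COLUMNS) - n)
        total, passing, in_progress, needs_human_input = counts
        return passing, in_progress, total, needs_human_input
    return 0, 0, 0, 0


if __name__ == "__main__":
    legacy = sqlite3.connect(":memory:")
    legacy.execute("CREATE TABLE features (id INTEGER, passes INTEGER)")
    legacy.executemany("INSERT INTO features VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
    print(count_features(legacy))
```

The trade-off is that the SQL is built dynamically, which is acceptable only because the column names are hard-coded constants.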
--- a/prompts.py
+++ b/prompts.py
@@ -16,6 +16,9 @@ from pathlib import Path
 # Base templates location (generic templates)
 TEMPLATES_DIR = Path(__file__).parent / ".claude" / "templates"
 # Migration version — bump when adding new migration steps
 CURRENT_MIGRATION_VERSION = 1
 def get_project_prompts_dir(project_dir: Path) -> Path:
    """Get the prompts directory for a specific project."""
@@ -99,9 +102,9 @@ def _strip_browser_testing_sections(prompt: str) -> str:
        flags=re.DOTALL,
    )
-    # Replace the screenshots-only marking rule with YOLO-appropriate wording
+    # Replace the marking rule with YOLO-appropriate wording
    prompt = prompt.replace(
-        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**",
+        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**",
        "**YOLO mode: Mark a feature as passing after lint/type-check succeeds and server starts cleanly.**",
    )
@@ -351,9 +354,70 @@ def scaffold_project_prompts(project_dir: Path) -> Path:
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not copy allowed_commands.yaml: {e}")
    # Copy Playwright CLI skill for browser automation
    skills_src = Path(__file__).parent / ".claude" / "skills" / "playwright-cli"
    skills_dest = project_dir / ".claude" / "skills" / "playwright-cli"
    if skills_src.exists() and not skills_dest.exists():
        try:
            shutil.copytree(skills_src, skills_dest)
            copied_files.append(".claude/skills/playwright-cli/")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not copy playwright-cli skill: {e}")
    # Ensure .playwright-cli/ and .playwright/ are in project .gitignore
    project_gitignore = project_dir / ".gitignore"
    entries_to_add = [".playwright-cli/", ".playwright/"]
    existing_lines: list[str] = []
    if project_gitignore.exists():
        try:
            existing_lines = project_gitignore.read_text(encoding="utf-8").splitlines()
        except (OSError, PermissionError):
            pass
    missing_entries = [e for e in entries_to_add if e not in existing_lines]
    if missing_entries:
        try:
            with open(project_gitignore, "a", encoding="utf-8") as f:
                # Write a separating newline first when the last existing line is non-empty
                if existing_lines and existing_lines[-1].strip():
                    f.write("\n")
                for entry in missing_entries:
                    f.write(f"{entry}\n")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not update .gitignore: {e}")
    # Scaffold .playwright/cli.config.json for browser settings
    playwright_config_dir = project_dir / ".playwright"
    playwright_config_file = playwright_config_dir / "cli.config.json"
    if not playwright_config_file.exists():
        try:
            playwright_config_dir.mkdir(parents=True, exist_ok=True)
            import json
            config = {
                "browser": {
                    "browserName": "chromium",
                    "launchOptions": {
                        "channel": "chrome",
                        "headless": True,
                    },
                    "contextOptions": {
                        "viewport": {"width": 1280, "height": 720},
                    },
                    "isolated": True,
                },
            }
            with open(playwright_config_file, "w", encoding="utf-8") as f:
                json.dump(config, f, indent=2)
                f.write("\n")
            copied_files.append(".playwright/cli.config.json")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not create playwright config: {e}")
    if copied_files:
        print(f"  Created project files: {', '.join(copied_files)}")
    # Stamp new projects at the current migration version so they never trigger migration
    _set_migration_version(project_dir, CURRENT_MIGRATION_VERSION)
    return project_prompts
@@ -425,3 +489,330 @@ def copy_spec_to_project(project_dir: Path) -> None:
            return
    print("Warning: No app_spec.txt found to copy to project directory")
 # ---------------------------------------------------------------------------
 # Project version migration
 # ---------------------------------------------------------------------------
 # Replacement content: coding_prompt.md STEP 5 section (Playwright CLI)
 _CLI_STEP5_CONTENT = """\
 ### STEP 5: VERIFY WITH BROWSER AUTOMATION
 **CRITICAL:** You MUST verify features through the actual UI.
 Use `playwright-cli` for browser automation:
 - Open the browser: `playwright-cli open http://localhost:PORT`
 - Take a snapshot to see page elements: `playwright-cli snapshot`
 - Read the snapshot YAML file to see element refs
 - Click elements by ref: `playwright-cli click e5`
 - Type text: `playwright-cli type "search query"`
 - Fill form fields: `playwright-cli fill e3 "value"`
 - Take screenshots: `playwright-cli screenshot`
 - Read the screenshot file to verify visual appearance
 - Check console errors: `playwright-cli console`
 - Close browser when done: `playwright-cli close`
 **Token-efficient workflow:** `playwright-cli screenshot` and `snapshot` save files
 to `.playwright-cli/`. You will see a file link in the output. Read the file only
 when you need to verify visual appearance or find element refs.
 **DO:**
 - Test through the UI with clicks and keyboard input
 - Take screenshots and read them to verify visual appearance
 - Check for console errors with `playwright-cli console`
 - Verify complete user workflows end-to-end
 - Always run `playwright-cli close` when finished testing
 **DON'T:**
 - Only test with curl commands
 - Use JavaScript evaluation to bypass UI (`eval` and `run-code` are blocked)
 - Skip visual verification
 - Mark tests passing without thorough verification
 """
 # Replacement content: coding_prompt.md BROWSER AUTOMATION reference section
 _CLI_BROWSER_SECTION = """\
 ## BROWSER AUTOMATION
 Use `playwright-cli` commands for UI verification. Key commands: `open`, `goto`,
 `snapshot`, `click`, `type`, `fill`, `screenshot`, `console`, `close`.
 **How it works:** `playwright-cli` uses a persistent browser daemon. `open` starts it,
 subsequent commands interact via socket, `close` shuts it down. Screenshots and snapshots
 save to `.playwright-cli/` -- read the files when you need to verify content.
 Test like a human user with mouse and keyboard. Use `playwright-cli console` to detect
 JS errors. Don't bypass UI with JavaScript evaluation.
 """
 # Replacement content: testing_prompt.md STEP 2 section (Playwright CLI)
 _CLI_TESTING_STEP2 = """\
 ### STEP 2: VERIFY THE FEATURE
 **CRITICAL:** You MUST verify the feature through the actual UI using browser automation.
 For the feature returned:
 1. Read and understand the feature's verification steps
 2. Navigate to the relevant part of the application
 3. Execute each verification step using browser automation
 4. Take screenshots and read them to verify visual appearance
 5. Check for console errors
 ### Browser Automation (Playwright CLI)
 **Navigation & Screenshots:**
 - `playwright-cli open <url>` - Open browser and navigate
 - `playwright-cli goto <url>` - Navigate to URL
 - `playwright-cli screenshot` - Save screenshot to `.playwright-cli/`
 - `playwright-cli snapshot` - Save page snapshot with element refs to `.playwright-cli/`
 **Element Interaction:**
 - `playwright-cli click <ref>` - Click elements (ref from snapshot)
 - `playwright-cli type <text>` - Type text
 - `playwright-cli fill <ref> <text>` - Fill form fields
 - `playwright-cli select <ref> <val>` - Select dropdown
 - `playwright-cli press <key>` - Keyboard input
 **Debugging:**
 - `playwright-cli console` - Check for JS errors
 - `playwright-cli network` - Monitor API calls
 **Cleanup:**
 - `playwright-cli close` - Close browser when done (ALWAYS do this)
 **Note:** Screenshots and snapshots save to files. Read the file to see the content.
 """
 # Replacement content: testing_prompt.md AVAILABLE TOOLS browser subsection
 _CLI_TESTING_TOOLS = """\
 ### Browser Automation (Playwright CLI)
 Use `playwright-cli` commands for browser interaction. Key commands:
 - `playwright-cli open <url>` - Open browser
 - `playwright-cli goto <url>` - Navigate to URL
 - `playwright-cli screenshot` - Take screenshot (saved to `.playwright-cli/`)
 - `playwright-cli snapshot` - Get page snapshot with element refs
 - `playwright-cli click <ref>` - Click element
 - `playwright-cli type <text>` - Type text
 - `playwright-cli fill <ref> <text>` - Fill form field
 - `playwright-cli console` - Check for JS errors
 - `playwright-cli close` - Close browser (always do this when done)
 """
 def _get_migration_version(project_dir: Path) -> int:
    """Read the migration version from .autoforge/.migration_version."""
    from autoforge_paths import get_autoforge_dir
    version_file = get_autoforge_dir(project_dir) / ".migration_version"
    if not version_file.exists():
        return 0
    try:
        return int(version_file.read_text().strip())
    except (ValueError, OSError):
        return 0
 def _set_migration_version(project_dir: Path, version: int) -> None:
    """Write the migration version to .autoforge/.migration_version."""
    from autoforge_paths import get_autoforge_dir
    version_file = get_autoforge_dir(project_dir) / ".migration_version"
    version_file.parent.mkdir(parents=True, exist_ok=True)
    version_file.write_text(str(version))
 def _migrate_coding_prompt_to_cli(content: str) -> str:
    """Replace MCP-based Playwright sections with CLI-based content in coding prompt."""
    # Replace STEP 5 section (from header to just before STEP 5.5)
    content = re.sub(
        r"### STEP 5: VERIFY WITH BROWSER AUTOMATION.*?(?=### STEP 5\.5:)",
        _CLI_STEP5_CONTENT,
        content,
        count=1,
        flags=re.DOTALL,
    )
    # Replace BROWSER AUTOMATION reference section (from header to next ---)
    content = re.sub(
        r"## BROWSER AUTOMATION\n\n.*?(?=---)",
        _CLI_BROWSER_SECTION,
        content,
        count=1,
        flags=re.DOTALL,
    )
    # Replace inline screenshot rule
    content = content.replace(
        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**",
        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**",
    )
    # Replace inline screenshot references (various phrasings from old templates)
    for old_phrase in (
        "(inline only -- do NOT save to disk)",
        "(inline only, never save to disk)",
        "(inline mode only -- never save to disk)",
    ):
        content = content.replace(old_phrase, "(saved to `.playwright-cli/`)")
    return content
 def _migrate_testing_prompt_to_cli(content: str) -> str:
    """Replace MCP-based Playwright sections with CLI-based content in testing prompt."""
    # Replace AVAILABLE TOOLS browser subsection FIRST (before STEP 2, to avoid
    # matching the new CLI subsection header that the STEP 2 replacement inserts).
    # In old prompts, ### Browser Automation (Playwright) only exists in AVAILABLE TOOLS.
    content = re.sub(
        r"### Browser Automation \(Playwright[^)]*\)\n.*?(?=---)",
        _CLI_TESTING_TOOLS,
        content,
        count=1,
        flags=re.DOTALL,
    )
    # Replace STEP 2 verification section (from header to just before STEP 3)
    content = re.sub(
        r"### STEP 2: VERIFY THE FEATURE.*?(?=### STEP 3:)",
        _CLI_TESTING_STEP2,
        content,
        count=1,
        flags=re.DOTALL,
    )
    # Replace inline screenshot references (various phrasings from old templates)
    for old_phrase in (
        "(inline only -- do NOT save to disk)",
        "(inline only, never save to disk)",
        "(inline mode only -- never save to disk)",
    ):
        content = content.replace(old_phrase, "(saved to `.playwright-cli/`)")
    return content
 def _migrate_v0_to_v1(project_dir: Path) -> list[str]:
    """Migrate from v0 (MCP-based Playwright) to v1 (Playwright CLI).
    Four idempotent sub-steps:
    A. Copy playwright-cli skill to project
    B. Scaffold .playwright/cli.config.json
    C. Update .gitignore with .playwright-cli/ and .playwright/
    D. Update coding_prompt.md and testing_prompt.md
    """
    import json
    migrated: list[str] = []
    # A. Copy Playwright CLI skill
    skills_src = Path(__file__).parent / ".claude" / "skills" / "playwright-cli"
    skills_dest = project_dir / ".claude" / "skills" / "playwright-cli"
    if skills_src.exists() and not skills_dest.exists():
        try:
            shutil.copytree(skills_src, skills_dest)
            migrated.append("Copied playwright-cli skill")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not copy playwright-cli skill: {e}")
    # B. Scaffold .playwright/cli.config.json
    playwright_config_dir = project_dir / ".playwright"
    playwright_config_file = playwright_config_dir / "cli.config.json"
    if not playwright_config_file.exists():
        try:
            playwright_config_dir.mkdir(parents=True, exist_ok=True)
            config = {
                "browser": {
                    "browserName": "chromium",
                    "launchOptions": {
                        "channel": "chrome",
                        "headless": True,
                    },
                    "contextOptions": {
                        "viewport": {"width": 1280, "height": 720},
                    },
                    "isolated": True,
                },
            }
            with open(playwright_config_file, "w", encoding="utf-8") as f:
                json.dump(config, f, indent=2)
                f.write("\n")
            migrated.append("Created .playwright/cli.config.json")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not create playwright config: {e}")
    # C. Update .gitignore
    project_gitignore = project_dir / ".gitignore"
    entries_to_add = [".playwright-cli/", ".playwright/"]
    existing_lines: list[str] = []
    if project_gitignore.exists():
        try:
            existing_lines = project_gitignore.read_text(encoding="utf-8").splitlines()
        except (OSError, PermissionError):
            pass
    missing_entries = [e for e in entries_to_add if e not in existing_lines]
    if missing_entries:
        try:
            with open(project_gitignore, "a", encoding="utf-8") as f:
                if existing_lines and existing_lines[-1].strip():
                    f.write("\n")
                for entry in missing_entries:
                    f.write(f"{entry}\n")
            migrated.append(f"Added {', '.join(missing_entries)} to .gitignore")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not update .gitignore: {e}")
    # D. Update prompts
    prompts_dir = get_project_prompts_dir(project_dir)
    # D1. Update coding_prompt.md
    coding_prompt_path = prompts_dir / "coding_prompt.md"
    if coding_prompt_path.exists():
        try:
            content = coding_prompt_path.read_text(encoding="utf-8")
            if "Playwright MCP" in content or "browser_navigate" in content or "browser_take_screenshot" in content:
                updated = _migrate_coding_prompt_to_cli(content)
                if updated != content:
                    coding_prompt_path.write_text(updated, encoding="utf-8")
                    migrated.append("Updated coding_prompt.md to Playwright CLI")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not update coding_prompt.md: {e}")
    # D2. Update testing_prompt.md
    testing_prompt_path = prompts_dir / "testing_prompt.md"
    if testing_prompt_path.exists():
        try:
            content = testing_prompt_path.read_text(encoding="utf-8")
            if "browser_navigate" in content or "browser_take_screenshot" in content:
                updated = _migrate_testing_prompt_to_cli(content)
                if updated != content:
                    testing_prompt_path.write_text(updated, encoding="utf-8")
                    migrated.append("Updated testing_prompt.md to Playwright CLI")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not update testing_prompt.md: {e}")
    return migrated
 def migrate_project_to_current(project_dir: Path) -> list[str]:
    """Migrate an existing project to the current AutoForge version.
    Idempotent — safe to call on every agent start. Returns list of
    human-readable descriptions of what was migrated.
    """
    current = _get_migration_version(project_dir)
    if current >= CURRENT_MIGRATION_VERSION:
        return []
    migrated: list[str] = []
    if current < 1:
        migrated.extend(_migrate_v0_to_v1(project_dir))
    # Future: if current < 2: migrated.extend(_migrate_v1_to_v2(project_dir))
    _set_migration_version(project_dir, CURRENT_MIGRATION_VERSION)
    return migrated
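Review note: `migrate_project_to_current` follows a common "version stamp plus ordered steps" pattern. The sketch below shows that pattern in isolation so the control flow is easy to test. `run_migrations`, `read_version`, and the step table are hypothetical names, and the real code reads its stamp from `.autoforge/.migration_version` through `get_autoforge_dir`.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Maps a target version to the step that upgrades a project to it (illustrative).
Steps = dict[int, Callable[[], list[str]]]


def read_version(version_file: Path) -> int:
    """Return the stamped version, treating a missing or corrupt file as 0."""
    try:
        return int(version_file.read_text(encoding="utf-8").strip())
    except (OSError, ValueError):
        return 0


def run_migrations(version_file: Path, steps: Steps) -> list[str]:
    """Run every step above the stamped version, then stamp the newest one."""
    current = read_version(version_file)
    latest = max(steps, default=0)
    if current >= latest:
        return []  # already up to date, nothing to do
    done: list[str] = []
    for target in sorted(steps):
        if current < target:
            done.extend(steps[target]())
    version_file.parent.mkdir(parents=True, exist_ok=True)
    version_file.write_text(str(latest), encoding="utf-8")
    return done


if __name__ == "__main__":
    stamp = Path(tempfile.mkdtemp()) / ".migration_version"
    steps: Steps = {1: lambda: ["v0 -> v1"], 2: lambda: ["v1 -> v2"]}
    print(run_migrations(stamp, steps))  # first call runs both steps
    print(run_migrations(stamp, steps))  # second call is a no-op
```

Each step should stay idempotent on its own, as the PR's sub-steps are, because a crash between a step and the stamp write causes that step to run again.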
--- a/registry.py
+++ b/registry.py
@@ -671,11 +671,24 @@ API_PROVIDERS: dict[str, dict[str, Any]] = {
        "requires_auth": True,
        "auth_env_var": "ANTHROPIC_AUTH_TOKEN",
        "models": [
            {"id": "glm-5", "name": "GLM 5"},
            {"id": "glm-4.7", "name": "GLM 4.7"},
            {"id": "glm-4.5-air", "name": "GLM 4.5 Air"},
        ],
        "default_model": "glm-4.7",
    },
    "azure": {
        "name": "Azure Anthropic (Claude)",
        "base_url": "",
        "requires_auth": True,
        "auth_env_var": "ANTHROPIC_API_KEY",
        "models": [
            {"id": "claude-opus-4-6", "name": "Claude Opus"},
            {"id": "claude-sonnet-4-5", "name": "Claude Sonnet"},
            {"id": "claude-haiku-4-5", "name": "Claude Haiku"},
        ],
        "default_model": "claude-opus-4-6",
    },
    "ollama": {
        "name": "Ollama (Local)",
        "base_url": "http://localhost:11434",
@@ -731,7 +744,7 @@ def get_effective_sdk_env() -> dict[str, str]:
                sdk_env[var] = value
        return sdk_env
-    sdk_env: dict[str, str] = {}
+    sdk_env = {}
    # Explicitly clear credentials that could leak from the server process env.
    # For providers using ANTHROPIC_AUTH_TOKEN (GLM, Custom), clear ANTHROPIC_API_KEY.
--- a/requirements-prod.txt
+++ b/requirements-prod.txt
@@ -1,6 +1,6 @@
 # Production runtime dependencies only
 # For development, use requirements.txt (includes ruff, mypy, pytest)
-claude-agent-sdk>=0.1.0,<0.2.0
+claude-agent-sdk>=0.1.39,<0.2.0
 python-dotenv>=1.0.0
 sqlalchemy>=2.0.0
 fastapi>=0.115.0
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
-claude-agent-sdk>=0.1.0,<0.2.0
+claude-agent-sdk>=0.1.39,<0.2.0
 python-dotenv>=1.0.0
 sqlalchemy>=2.0.0
 fastapi>=0.115.0
--- a/security.py
+++ b/security.py
@@ -66,10 +66,12 @@ ALLOWED_COMMANDS = {
    "bash",
    # Script execution
    "init.sh",  # Init scripts; validated separately
    # Browser automation
    "playwright-cli",  # Playwright CLI for browser testing; validated separately
 }
 # Commands that need additional validation even when in the allowlist
-COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh"}
+COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh", "playwright-cli"}
 # Commands that are NEVER allowed, even with user approval
 # These commands can cause permanent system damage or security breaches
@@ -438,6 +440,37 @@ def validate_init_script(command_string: str) -> tuple[bool, str]:
    return False, f"Only ./init.sh is allowed, got: {script}"
 def validate_playwright_command(command_string: str) -> tuple[bool, str]:
    """
    Validate playwright-cli commands - block dangerous subcommands.
    Blocks `run-code` (arbitrary Node.js execution) and `eval` (arbitrary JS
    evaluation) which bypass the security sandbox.
    Returns:
        Tuple of (is_allowed, reason_if_blocked)
    """
    try:
        tokens = shlex.split(command_string)
    except ValueError:
        return False, "Could not parse playwright-cli command"
    if not tokens:
        return False, "Empty command"
    BLOCKED_SUBCOMMANDS = {"run-code", "eval"}
    # Find the subcommand: first non-flag token after 'playwright-cli'
    for token in tokens[1:]:
        if token.startswith("-"):
            continue  # skip flags like -s=agent-1
        if token in BLOCKED_SUBCOMMANDS:
            return False, f"playwright-cli '{token}' is not allowed"
        break  # first non-flag token is the subcommand
    return True, ""
 def matches_pattern(command: str, pattern: str) -> bool:
    """
    Check if a command matches a pattern.
@@ -955,5 +988,9 @@ async def bash_security_hook(input_data, tool_use_id=None, context=None):
                allowed, reason = validate_init_script(cmd_segment)
                if not allowed:
                    return {"decision": "block", "reason": reason}
            elif cmd == "playwright-cli":
                allowed, reason = validate_playwright_command(cmd_segment)
                if not allowed:
                    return {"decision": "block", "reason": reason}
    return {}
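Review note: `validate_playwright_command` treats the first non-flag token as the subcommand. If `playwright-cli` also accepts flag values as a separate token (for example `-s agent-1` rather than `-s=agent-1`; this is an assumption, not something the diff confirms), the value would be taken for the subcommand and a later `eval` would pass. A stricter sketch that rejects a blocked word anywhere after the program name could look like this. `validate_playwright_command_strict` is a hypothetical name.

```python
import shlex

BLOCKED_SUBCOMMANDS = frozenset({"run-code", "eval"})


def validate_playwright_command_strict(command_string: str) -> tuple[bool, str]:
    """Reject the command if any argument token is a blocked subcommand.

    Stricter than checking only the first non-flag token: it does not need
    to know which flags consume a following value.
    """
    try:
        tokens = shlex.split(command_string)
    except ValueError:
        return False, "Could not parse playwright-cli command"
    if not tokens:
        return False, "Empty command"
    for token in tokens[1:]:
        if token in BLOCKED_SUBCOMMANDS:
            return False, f"playwright-cli '{token}' is not allowed"
    return True, ""


if __name__ == "__main__":
    print(validate_playwright_command_strict('playwright-cli -s agent-1 eval "1+1"'))
    print(validate_playwright_command_strict("playwright-cli click e5"))
```

The cost is a false positive: a harmless `playwright-cli type eval` is rejected too, so this variant only makes sense if that restriction is acceptable.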
--- a/server/main.py
+++ b/server/main.py
@@ -36,6 +36,7 @@ from .routers import (
    features_router,
    filesystem_router,
    projects_router,
    scaffold_router,
    schedules_router,
    settings_router,
    spec_creation_router,
@@ -169,6 +170,7 @@ app.include_router(filesystem_router)
 app.include_router(assistant_chat_router)
 app.include_router(settings_router)
 app.include_router(terminal_router)
 app.include_router(scaffold_router)
 # ============================================================================
--- a/server/routers/__init__.py
+++ b/server/routers/__init__.py
@@ -12,6 +12,7 @@ from .expand_project import router as expand_project_router
 from .features import router as features_router
 from .filesystem import router as filesystem_router
 from .projects import router as projects_router
 from .scaffold import router as scaffold_router
 from .schedules import router as schedules_router
 from .settings import router as settings_router
 from .spec_creation import router as spec_creation_router
@@ -29,4 +30,5 @@ __all__ = [
    "assistant_chat_router",
    "settings_router",
    "terminal_router",
    "scaffold_router",
 ]
--- a/server/routers/agent.py
+++ b/server/routers/agent.py
@@ -17,11 +17,11 @@ from ..utils.project_helpers import get_project_path as _get_project_path
 from ..utils.validation import validate_project_name
-def _get_settings_defaults() -> tuple[bool, str, int, bool, int]:
+def _get_settings_defaults() -> tuple[bool, str, int, bool, int, int]:
    """Get defaults from global settings.
    Returns:
-        Tuple of (yolo_mode, model, testing_agent_ratio, playwright_headless, batch_size)
+        Tuple of (yolo_mode, model, testing_agent_ratio, playwright_headless, batch_size, testing_batch_size)
    """
    import sys
    root = Path(__file__).parent.parent.parent
@@ -47,7 +47,12 @@ def _get_settings_defaults() -> tuple[bool, str, int, bool, int]:
    except (ValueError, TypeError):
        batch_size = 3
-    return yolo_mode, model, testing_agent_ratio, playwright_headless, batch_size
+    try:
        testing_batch_size = int(settings.get("testing_batch_size", "3"))
    except (ValueError, TypeError):
        testing_batch_size = 3
    return yolo_mode, model, testing_agent_ratio, playwright_headless, batch_size, testing_batch_size
 router = APIRouter(prefix="/api/projects/{project_name}/agent", tags=["agent"])
@@ -96,7 +101,7 @@ async def start_agent(
    manager = get_project_manager(project_name)
    # Get defaults from global settings if not provided in request
-    default_yolo, default_model, default_testing_ratio, playwright_headless, default_batch_size = _get_settings_defaults()
+    default_yolo, default_model, default_testing_ratio, playwright_headless, default_batch_size, default_testing_batch_size = _get_settings_defaults()
    yolo_mode = request.yolo_mode if request.yolo_mode is not None else default_yolo
    model = request.model if request.model else default_model
@@ -104,6 +109,7 @@ async def start_agent(
    testing_agent_ratio = request.testing_agent_ratio if request.testing_agent_ratio is not None else default_testing_ratio
    batch_size = default_batch_size
    testing_batch_size = default_testing_batch_size
    success, message = await manager.start(
        yolo_mode=yolo_mode,
@@ -112,6 +118,7 @@ async def start_agent(
        testing_agent_ratio=testing_agent_ratio,
        playwright_headless=playwright_headless,
        batch_size=batch_size,
+        testing_batch_size=testing_batch_size,
    )
    # Notify scheduler of manual start (to prevent auto-stop during scheduled window)
@@ -175,3 +182,31 @@ async def resume_agent(project_name: str):
        status=manager.status,
        message=message,
    )
+@router.post("/graceful-pause", response_model=AgentActionResponse)
+async def graceful_pause_agent(project_name: str):
+    """Request a graceful pause (drain mode) - finish current work then pause."""
+    manager = get_project_manager(project_name)
+    success, message = await manager.graceful_pause()
+    return AgentActionResponse(
+        success=success,
+        status=manager.status,
+        message=message,
+    )
+@router.post("/graceful-resume", response_model=AgentActionResponse)
+async def graceful_resume_agent(project_name: str):
+    """Resume from a graceful pause."""
+    manager = get_project_manager(project_name)
+    success, message = await manager.graceful_resume()
+    return AgentActionResponse(
+        success=success,
+        status=manager.status,
+        message=message,
+    )
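The fallback parsing that `_get_settings_defaults()` applies to `batch_size` and `testing_batch_size` can be sketched in isolation. This is a minimal illustration, not the project's code: `parse_int_setting` is a hypothetical helper name, and the settings store is assumed to behave like a plain dict of strings. Note that, as in the diff, the parsed value is not range-checked here; the 1-15 bound is only enforced by the `SettingsUpdate` validators.

```python
def parse_int_setting(settings: dict, key: str, default: int) -> int:
    """Return settings[key] as an int, or `default` if it is missing or malformed.

    Hypothetical helper: mirrors the try/except pattern in the diff above.
    """
    try:
        # int() raises ValueError for text like "many" and TypeError for None.
        return int(settings.get(key, default))
    except (ValueError, TypeError):
        return default


if __name__ == "__main__":
    print(parse_int_setting({"testing_batch_size": "5"}, "testing_batch_size", 3))
    print(parse_int_setting({}, "testing_batch_size", 3))
```

Factoring the pattern out like this would avoid repeating one try/except per setting, at the cost of one more indirection.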
--- a/server/routers/features.py
+++ b/server/routers/features.py
@@ -23,6 +23,7 @@ from ..schemas import (
    FeatureListResponse,
    FeatureResponse,
    FeatureUpdate,
    HumanInputResponse,
 )
 from ..utils.project_helpers import get_project_path as _get_project_path
 from ..utils.validation import validate_project_name
@@ -104,6 +105,9 @@ def feature_to_response(f, passing_ids: set[int] | None = None) -> FeatureRespon
        in_progress=f.in_progress if f.in_progress is not None else False,
        blocked=blocked,
        blocking_dependencies=blocking,
        needs_human_input=getattr(f, 'needs_human_input', False) or False,
        human_input_request=getattr(f, 'human_input_request', None),
        human_input_response=getattr(f, 'human_input_response', None),
    )
@@ -143,11 +147,14 @@ async def list_features(project_name: str):
            pending = []
            in_progress = []
            done = []
            needs_human_input_list = []
            for f in all_features:
                feature_response = feature_to_response(f, passing_ids)
                if f.passes:
                    done.append(feature_response)
                elif getattr(f, 'needs_human_input', False):
                    needs_human_input_list.append(feature_response)
                elif f.in_progress:
                    in_progress.append(feature_response)
                else:
@@ -157,6 +164,7 @@ async def list_features(project_name: str):
                pending=pending,
                in_progress=in_progress,
                done=done,
                needs_human_input=needs_human_input_list,
            )
    except HTTPException:
        raise
@@ -341,9 +349,11 @@ async def get_dependency_graph(project_name: str):
                deps = f.dependencies or []
                blocking = [d for d in deps if d not in passing_ids]
-                status: Literal["pending", "in_progress", "done", "blocked"]
+                status: Literal["pending", "in_progress", "done", "blocked", "needs_human_input"]
                if f.passes:
                    status = "done"
                elif getattr(f, 'needs_human_input', False):
                    status = "needs_human_input"
                elif blocking:
                    status = "blocked"
                elif f.in_progress:
@@ -564,6 +574,71 @@ async def skip_feature(project_name: str, feature_id: int):
        raise HTTPException(status_code=500, detail="Failed to skip feature")
@router.post("/{feature_id}/resolve-human-input", response_model=FeatureResponse)
 async def resolve_human_input(project_name: str, feature_id: int, response: HumanInputResponse):
    """Resolve a human input request for a feature.
    Validates all required fields have values, stores the response,
    and returns the feature to the pending queue for agents to pick up.
    """
    project_name = validate_project_name(project_name)
    project_dir = _get_project_path(project_name)
    if not project_dir:
        raise HTTPException(status_code=404, detail=f"Project '{project_name}' not found in registry")
    if not project_dir.exists():
        raise HTTPException(status_code=404, detail="Project directory not found")
    _, Feature = _get_db_classes()
    try:
        with get_db_session(project_dir) as session:
            feature = session.query(Feature).filter(Feature.id == feature_id).first()
            if not feature:
                raise HTTPException(status_code=404, detail=f"Feature {feature_id} not found")
            if not getattr(feature, 'needs_human_input', False):
                raise HTTPException(status_code=400, detail="Feature is not waiting for human input")
            # Validate required fields
            request_data = feature.human_input_request
            if request_data and isinstance(request_data, dict):
                for field_def in request_data.get("fields", []):
                    if field_def.get("required", True):
                        field_id = field_def.get("id")
                        if field_id not in response.fields or response.fields[field_id] in (None, ""):
                            raise HTTPException(
                                status_code=400,
                                detail=f"Required field '{field_def.get('label', field_id)}' is missing"
                            )
            # Store response and return to pending queue
            from datetime import datetime, timezone
            response_data = {
                "fields": {k: v for k, v in response.fields.items()},
                "responded_at": datetime.now(timezone.utc).isoformat(),
            }
            feature.human_input_response = response_data
            feature.needs_human_input = False
            # Keep in_progress=False, passes=False so it returns to pending
            session.commit()
            session.refresh(feature)
            # Compute passing IDs for response
            all_features = session.query(Feature).all()
            passing_ids = {f.id for f in all_features if f.passes}
            return feature_to_response(feature, passing_ids)
    except HTTPException:
        raise
    except Exception:
        logger.exception("Failed to resolve human input")
        raise HTTPException(status_code=500, detail="Failed to resolve human input")
 # ============================================================================
 # Dependency Management Endpoints
 # ============================================================================
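The required-field check inside `resolve_human_input` can be read as a small pure function. The sketch below is an assumption-laden restatement for illustration: `find_missing_required` is a hypothetical name, and it collects every missing label instead of raising on the first one as the endpoint does.

```python
from typing import Optional


def find_missing_required(request_data: Optional[dict], answers: dict) -> list:
    """Return labels of required fields that have no usable answer.

    Hypothetical helper: same rules as the endpoint above. A field is
    required unless it says otherwise, and None or "" count as missing.
    """
    missing = []
    if not isinstance(request_data, dict):
        return missing
    for field_def in request_data.get("fields", []):
        if not field_def.get("required", True):
            continue
        field_id = field_def.get("id")
        # A boolean False is a valid answer; only None and "" are rejected.
        if field_id not in answers or answers[field_id] in (None, ""):
            missing.append(field_def.get("label", field_id))
    return missing


if __name__ == "__main__":
    request = {
        "fields": [
            {"id": "api_key", "label": "API key"},
            {"id": "notes", "required": False},
            {"id": "confirm", "label": "Confirm"},
        ]
    }
    print(find_missing_required(request, {"api_key": "", "confirm": False}))
```

Returning all missing labels at once would let a client highlight every empty field in one round trip; the endpoint in the diff reports only the first.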
--- a/server/routers/projects.py
+++ b/server/routers/projects.py
@@ -102,7 +102,7 @@ def get_project_stats(project_dir: Path) -> ProjectStats:
    """Get statistics for a project."""
    _init_imports()
    assert _count_passing_tests is not None  # guaranteed by _init_imports()
-    passing, in_progress, total = _count_passing_tests(project_dir)
+    passing, in_progress, total, _needs_human_input = _count_passing_tests(project_dir)
    percentage = (passing / total * 100) if total > 0 else 0.0
    return ProjectStats(
        passing=passing,
--- a/server/routers/scaffold.py
+++ b/server/routers/scaffold.py
@@ -0,0 +1,136 @@
 """
 Scaffold Router
 ================
 SSE streaming endpoint for running project scaffold commands.
 Supports templated project creation (e.g., Next.js agentic starter).
 """
 import asyncio
 import json
 import logging
 import shutil
 import subprocess
 import sys
 from pathlib import Path
 from fastapi import APIRouter, Request
 from fastapi.responses import StreamingResponse
 from pydantic import BaseModel
 from .filesystem import is_path_blocked
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/api/scaffold", tags=["scaffold"])
 # Hardcoded templates — no arbitrary commands allowed
 TEMPLATES: dict[str, list[str]] = {
    "agentic-starter": ["npx", "create-agentic-app@latest", ".", "-y", "-p", "npm", "--skip-git"],
 }
 class ScaffoldRequest(BaseModel):
    template: str
    target_path: str
 def _sse_event(data: dict) -> str:
    """Format a dict as an SSE data line."""
    return f"data: {json.dumps(data)}\n\n"
 async def _stream_scaffold(template: str, target_path: str, request: Request):
    """Run the scaffold command and yield SSE events."""
    # Validate template
    if template not in TEMPLATES:
        yield _sse_event({"type": "error", "message": f"Unknown template: {template}"})
        return
    # Validate path
    path = Path(target_path)
    try:
        path = path.resolve()
    except (OSError, ValueError) as e:
        yield _sse_event({"type": "error", "message": f"Invalid path: {e}"})
        return
    if is_path_blocked(path):
        yield _sse_event({"type": "error", "message": "Access to this directory is not allowed"})
        return
    if not path.exists() or not path.is_dir():
        yield _sse_event({"type": "error", "message": "Target directory does not exist"})
        return
    # Check npx is available
    npx_name = "npx"
    if sys.platform == "win32":
        npx_name = "npx.cmd"
    if not shutil.which(npx_name):
        yield _sse_event({"type": "error", "message": "npx is not available. Please install Node.js."})
        return
    # Build command
    argv = list(TEMPLATES[template])
    if sys.platform == "win32" and not argv[0].lower().endswith(".cmd"):
        argv[0] = argv[0] + ".cmd"
    process = None
    try:
        popen_kwargs: dict = {
            "stdout": subprocess.PIPE,
            "stderr": subprocess.STDOUT,
            "stdin": subprocess.DEVNULL,
            "cwd": str(path),
        }
        if sys.platform == "win32":
            popen_kwargs["creationflags"] = subprocess.CREATE_NO_WINDOW
        process = subprocess.Popen(argv, **popen_kwargs)
        logger.info("Scaffold process started: pid=%s, template=%s, path=%s", process.pid, template, target_path)
        # Stream stdout lines
        assert process.stdout is not None
        for raw_line in iter(process.stdout.readline, b""):
            # Check if client disconnected
            if await request.is_disconnected():
                logger.info("Client disconnected during scaffold, terminating process")
                break
            line = raw_line.decode("utf-8", errors="replace").rstrip("\n\r")
            yield _sse_event({"type": "output", "line": line})
            # Yield control to event loop so disconnect checks work
            await asyncio.sleep(0)
        process.wait()
        exit_code = process.returncode
        success = exit_code == 0
        logger.info("Scaffold process completed: exit_code=%s, template=%s", exit_code, template)
        yield _sse_event({"type": "complete", "success": success, "exit_code": exit_code})
    except Exception as e:
        logger.error("Scaffold error: %s", e)
        yield _sse_event({"type": "error", "message": str(e)})
    finally:
        if process and process.poll() is None:
            try:
                process.terminate()
                process.wait(timeout=5)
            except Exception:
                process.kill()
@router.post("/run")
 async def run_scaffold(body: ScaffoldRequest, request: Request):
    """Run a scaffold template command with SSE streaming output."""
    return StreamingResponse(
        _stream_scaffold(body.template, body.target_path, request),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",
        },
    )
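The streaming loop in `_stream_scaffold` reads from a `subprocess.Popen` pipe with a blocking `readline`, which can hold up the event loop while the child is silent. One possible alternative, shown here only as a sketch, is `asyncio.create_subprocess_exec`, whose `readline` is awaitable. The names `sse_event`, `stream_command` and `collect_events` are hypothetical, and the path validation, Windows `.cmd` handling and disconnect check from the router are left out.

```python
import asyncio
import json
import sys
from typing import AsyncIterator, List


def sse_event(data: dict) -> str:
    """Format a dict as a single SSE data frame."""
    return f"data: {json.dumps(data)}\n\n"


async def stream_command(argv: List[str]) -> AsyncIterator[str]:
    """Run argv and yield one SSE frame per output line, then a completion frame."""
    process = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        stdin=asyncio.subprocess.DEVNULL,
    )
    assert process.stdout is not None
    try:
        while True:
            # Awaiting here lets other tasks run while the child is quiet.
            raw_line = await process.stdout.readline()
            if not raw_line:
                break
            line = raw_line.decode("utf-8", errors="replace").rstrip("\n\r")
            yield sse_event({"type": "output", "line": line})
        exit_code = await process.wait()
        yield sse_event({"type": "complete", "success": exit_code == 0, "exit_code": exit_code})
    finally:
        # If the consumer stopped early, make sure the child does not linger.
        if process.returncode is None:
            process.terminate()
            await process.wait()


async def collect_events(argv: List[str]) -> List[str]:
    """Drain the stream into a list (convenience for scripts and tests)."""
    return [event async for event in stream_command(argv)]


if __name__ == "__main__":
    for frame in asyncio.run(collect_events([sys.executable, "-c", "print('hello')"])):
        print(frame, end="")
```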
--- a/server/routers/settings.py
+++ b/server/routers/settings.py
@@ -113,6 +113,7 @@ async def get_settings():
        testing_agent_ratio=_parse_int(all_settings.get("testing_agent_ratio"), 1),
        playwright_headless=_parse_bool(all_settings.get("playwright_headless"), default=True),
        batch_size=_parse_int(all_settings.get("batch_size"), 3),
+        testing_batch_size=_parse_int(all_settings.get("testing_batch_size"), 3),
        api_provider=api_provider,
        api_base_url=all_settings.get("api_base_url"),
        api_has_auth_token=bool(all_settings.get("api_auth_token")),
@@ -138,6 +139,9 @@ async def update_settings(update: SettingsUpdate):
    if update.batch_size is not None:
        set_setting("batch_size", str(update.batch_size))
+    if update.testing_batch_size is not None:
+        set_setting("testing_batch_size", str(update.testing_batch_size))
    # API provider settings
    if update.api_provider is not None:
        old_provider = get_setting("api_provider", "claude")
@@ -177,6 +181,7 @@ async def update_settings(update: SettingsUpdate):
        testing_agent_ratio=_parse_int(all_settings.get("testing_agent_ratio"), 1),
        playwright_headless=_parse_bool(all_settings.get("playwright_headless"), default=True),
        batch_size=_parse_int(all_settings.get("batch_size"), 3),
+        testing_batch_size=_parse_int(all_settings.get("testing_batch_size"), 3),
        api_provider=api_provider,
        api_base_url=all_settings.get("api_base_url"),
        api_has_auth_token=bool(all_settings.get("api_auth_token")),
--- a/server/schemas.py
+++ b/server/schemas.py
@@ -120,16 +120,41 @@ class FeatureResponse(FeatureBase):
    in_progress: bool
    blocked: bool = False  # Computed: has unmet dependencies
    blocking_dependencies: list[int] = Field(default_factory=list)  # Computed
    needs_human_input: bool = False
    human_input_request: dict | None = None
    human_input_response: dict | None = None
    class Config:
        from_attributes = True
 class HumanInputField(BaseModel):
    """Schema for a single human input field."""
    id: str
    label: str
    type: Literal["text", "textarea", "select", "boolean"] = "text"
    required: bool = True
    placeholder: str | None = None
    options: list[dict] | None = None  # For select: [{value, label}]
 class HumanInputRequest(BaseModel):
    """Schema for an agent's human input request."""
    prompt: str
    fields: list[HumanInputField]
 class HumanInputResponse(BaseModel):
    """Schema for a human's response to an input request."""
    fields: dict[str, str | bool | list[str]]
 class FeatureListResponse(BaseModel):
    """Response containing list of features organized by status."""
    pending: list[FeatureResponse]
    in_progress: list[FeatureResponse]
    done: list[FeatureResponse]
    needs_human_input: list[FeatureResponse] = Field(default_factory=list)
 class FeatureBulkCreate(BaseModel):
@@ -153,7 +178,7 @@ class DependencyGraphNode(BaseModel):
    id: int
    name: str
    category: str
-    status: Literal["pending", "in_progress", "done", "blocked"]
+    status: Literal["pending", "in_progress", "done", "blocked", "needs_human_input"]
    priority: int
    dependencies: list[int]
@@ -217,7 +242,7 @@ class AgentStartRequest(BaseModel):
 class AgentStatus(BaseModel):
    """Current agent status."""
-    status: Literal["stopped", "running", "paused", "crashed"]
+    status: Literal["stopped", "running", "paused", "crashed", "pausing", "paused_graceful"]
    pid: int | None = None
    started_at: datetime | None = None
    yolo_mode: bool = False
@@ -257,6 +282,7 @@ class WSProgressMessage(BaseModel):
    in_progress: int
    total: int
    percentage: float
    needs_human_input: int = 0
 class WSFeatureUpdateMessage(BaseModel):
@@ -418,7 +444,8 @@ class SettingsResponse(BaseModel):
    ollama_mode: bool = False  # True when api_provider is "ollama"
    testing_agent_ratio: int = 1  # Regression testing agents (0-3)
    playwright_headless: bool = True
-    batch_size: int = 3  # Features per coding agent batch (1-3)
+    batch_size: int = 3  # Features per coding agent batch (1-15)
+    testing_batch_size: int = 3  # Features per testing agent batch (1-15)
    api_provider: str = "claude"
    api_base_url: str | None = None
    api_has_auth_token: bool = False  # Never expose actual token
@@ -437,7 +464,8 @@ class SettingsUpdate(BaseModel):
    model: str | None = None
    testing_agent_ratio: int | None = None  # 0-3
    playwright_headless: bool | None = None
-    batch_size: int | None = None  # Features per agent batch (1-3)
+    batch_size: int | None = None  # Features per agent batch (1-15)
+    testing_batch_size: int | None = None  # Features per testing agent batch (1-15)
    api_provider: str | None = None
    api_base_url: str | None = Field(None, max_length=500)
    api_auth_token: str | None = Field(None, max_length=500)  # Write-only, never returned
@@ -474,8 +502,15 @@ class SettingsUpdate(BaseModel):
    @field_validator('batch_size')
    @classmethod
    def validate_batch_size(cls, v: int | None) -> int | None:
-        if v is not None and (v < 1 or v > 3):
+        if v is not None and (v < 1 or v > 15):
-            raise ValueError("batch_size must be between 1 and 3")
+            raise ValueError("batch_size must be between 1 and 15")
        return v
+    @field_validator('testing_batch_size')
+    @classmethod
+    def validate_testing_batch_size(cls, v: int | None) -> int | None:
+        if v is not None and (v < 1 or v > 15):
+            raise ValueError("testing_batch_size must be between 1 and 15")
+        return v
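Both validators in `SettingsUpdate` apply the same 1-15 bound. The rule can be sketched without Pydantic as a shared function; `validate_batch`, `MIN_BATCH` and `MAX_BATCH` are hypothetical names introduced for illustration, not part of the PR.

```python
from typing import Optional

# Assumed bounds, taken from the error messages in the diff above.
MIN_BATCH = 1
MAX_BATCH = 15


def validate_batch(name: str, value: Optional[int]) -> Optional[int]:
    """Accept None (field not being updated) or an int within the allowed range."""
    if value is not None and not (MIN_BATCH <= value <= MAX_BATCH):
        raise ValueError(f"{name} must be between {MIN_BATCH} and {MAX_BATCH}")
    return value


if __name__ == "__main__":
    print(validate_batch("testing_batch_size", 15))
    try:
        validate_batch("batch_size", 16)
    except ValueError as err:
        print(err)
```

If the two settings are always meant to share a limit, a single helper like this would keep the bound in one place; if they may diverge later, the separate validators in the diff are the simpler choice.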
--- a/server/services/assistant_chat_session.py
+++ b/server/services/assistant_chat_session.py
@@ -25,7 +25,11 @@ from .assistant_database import (
    create_conversation,
    get_messages,
 )
-from .chat_constants import ROOT_DIR
+from .chat_constants import (
+    ROOT_DIR,
+    check_rate_limit_error,
+    safe_receive_response,
+)
 # Load environment variables from .env file if present
 load_dotenv()
@@ -394,38 +398,46 @@ class AssistantChatSession:
        full_response = ""
        # Stream the response
-        async for msg in self.client.receive_response():
+        try:
-            msg_type = type(msg).__name__
+            async for msg in safe_receive_response(self.client, logger):
                msg_type = type(msg).__name__
-            if msg_type == "AssistantMessage" and hasattr(msg, "content"):
+                if msg_type == "AssistantMessage" and hasattr(msg, "content"):
-                for block in msg.content:
+                    for block in msg.content:
-                    block_type = type(block).__name__
+                        block_type = type(block).__name__
-                    if block_type == "TextBlock" and hasattr(block, "text"):
+                        if block_type == "TextBlock" and hasattr(block, "text"):
-                        text = block.text
+                            text = block.text
-                        if text:
+                            if text:
-                            full_response += text
+                                full_response += text
-                            yield {"type": "text", "content": text}
+                                yield {"type": "text", "content": text}
-                    elif block_type == "ToolUseBlock" and hasattr(block, "name"):
+                        elif block_type == "ToolUseBlock" and hasattr(block, "name"):
-                        tool_name = block.name
+                            tool_name = block.name
-                        tool_input = getattr(block, "input", {})
+                            tool_input = getattr(block, "input", {})
-                        # Intercept ask_user tool calls -> yield as question message
+                            # Intercept ask_user tool calls -> yield as question message
-                        if tool_name == "mcp__features__ask_user":
+                            if tool_name == "mcp__features__ask_user":
-                            questions = tool_input.get("questions", [])
+                                questions = tool_input.get("questions", [])
-                            if questions:
+                                if questions:
-                                yield {
+                                    yield {
-                                    "type": "question",
+                                        "type": "question",
-                                    "questions": questions,
+                                        "questions": questions,
-                                }
+                                    }
-                                continue
+                                    continue
-                        yield {
+                            yield {
-                            "type": "tool_call",
+                                "type": "tool_call",
-                            "tool": tool_name,
+                                "tool": tool_name,
-                            "input": tool_input,
+                                "input": tool_input,
-                        }
+                            }
        except Exception as exc:
            is_rate_limit, _ = check_rate_limit_error(exc)
            if is_rate_limit:
                logger.warning(f"Rate limited: {exc}")
                yield {"type": "error", "content": "Rate limited. Please try again later."}
                return
            raise
        # Store the complete response in the database
        if full_response and self.conversation_id:
--- a/server/services/chat_constants.py
+++ b/server/services/chat_constants.py
@@ -9,9 +9,10 @@ project root and is re-exported here for convenience so that existing
 imports (``from .chat_constants import API_ENV_VARS``) continue to work.
 """
 import logging
 import sys
 from pathlib import Path
-from typing import AsyncGenerator
+from typing import Any, AsyncGenerator
 # -------------------------------------------------------------------
 # Root directory of the autoforge project (repository root).
@@ -32,6 +33,59 @@ if _root_str not in sys.path:
 # imports continue to work unchanged.
 # -------------------------------------------------------------------
 from env_constants import API_ENV_VARS  # noqa: E402, F401
 from rate_limit_utils import is_rate_limit_error, parse_retry_after  # noqa: E402, F401
 logger = logging.getLogger(__name__)
 def check_rate_limit_error(exc: Exception) -> tuple[bool, int | None]:
    """Inspect an exception and determine if it represents a rate-limit.
    Returns ``(is_rate_limit, retry_seconds)``.  ``retry_seconds`` is the
    parsed Retry-After value when available, otherwise ``None`` (caller
    should use exponential backoff).
    """
    # MessageParseError = unknown CLI message type (e.g. "rate_limit_event").
    # These are informational events, NOT actual rate limit errors.
    # The word "rate_limit" in the type name would false-positive the regex.
    if type(exc).__name__ == "MessageParseError":
        return False, None
    # For all other exceptions: match error text against known rate-limit patterns
    exc_str = str(exc)
    if is_rate_limit_error(exc_str):
        retry = parse_retry_after(exc_str)
        return True, retry
    return False, None
 async def safe_receive_response(client: Any, log: logging.Logger) -> AsyncGenerator:
    """Wrap ``client.receive_response()`` to skip ``MessageParseError``.
    The Claude Code CLI may emit message types (e.g. ``rate_limit_event``)
    that the installed Python SDK does not recognise, causing
    ``MessageParseError`` which kills the async generator.  The CLI
    subprocess is still alive and the SDK uses a buffered memory channel,
    so we restart ``receive_response()`` to continue reading remaining
    messages without losing data.
    """
    max_retries = 50
    retries = 0
    while True:
        try:
            async for msg in client.receive_response():
                yield msg
            return  # Normal completion
        except Exception as exc:
            if type(exc).__name__ == "MessageParseError":
                retries += 1
                if retries > max_retries:
                    log.error(f"Too many unrecognized CLI messages ({retries}), stopping")
                    return
                log.warning(f"Ignoring unrecognized message from Claude CLI: {exc}")
                continue
            raise
 async def make_multimodal_message(content_blocks: list[dict]) -> AsyncGenerator[dict, None]:
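The restart-on-parse-error behaviour of `safe_receive_response` can be exercised without the SDK. The sketch below makes several assumptions: `MessageParseError` here is a local stand-in class, `FakeClient` is a hypothetical test double whose queue plays the role of the SDK's buffered channel, and `resilient_receive` is an illustrative name for the same loop.

```python
import asyncio


class MessageParseError(Exception):
    """Local stand-in for the SDK exception; matched by class name only."""


class FakeClient:
    """Hypothetical client: replays queued items, raising any that are exceptions."""

    def __init__(self, items):
        self._queue = list(items)

    async def receive_response(self):
        # The queue survives across calls, so a restart resumes where it left off.
        while self._queue:
            item = self._queue.pop(0)
            if isinstance(item, Exception):
                raise item
            yield item


async def resilient_receive(client, max_retries: int = 50):
    """Yield messages, restarting the stream after each MessageParseError."""
    retries = 0
    while True:
        try:
            async for msg in client.receive_response():
                yield msg
            return  # normal completion
        except Exception as exc:
            if type(exc).__name__ != "MessageParseError":
                raise  # unrelated errors still propagate
            retries += 1
            if retries > max_retries:
                return  # give up instead of looping forever


async def collect(client):
    """Drain the wrapped stream into a list."""
    return [msg async for msg in resilient_receive(client)]


if __name__ == "__main__":
    client = FakeClient(["a", MessageParseError("rate_limit_event"), "b"])
    print(asyncio.run(collect(client)))
```

Matching on the class name, as the diff does, avoids importing the exception type from the SDK, at the cost of also matching any unrelated class that happens to share the name.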
--- a/server/services/expand_chat_session.py
+++ b/server/services/expand_chat_session.py
@@ -22,7 +22,12 @@ from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
 from dotenv import load_dotenv
 from ..schemas import ImageAttachment
-from .chat_constants import ROOT_DIR, make_multimodal_message
+from .chat_constants import (
+    ROOT_DIR,
+    check_rate_limit_error,
+    make_multimodal_message,
+    safe_receive_response,
+)
 # Load environment variables from .env file if present
 load_dotenv()
@@ -299,23 +304,31 @@ class ExpandChatSession:
            await self.client.query(message)
        # Stream the response
-        async for msg in self.client.receive_response():
+        try:
-            msg_type = type(msg).__name__
+            async for msg in safe_receive_response(self.client, logger):
                msg_type = type(msg).__name__
-            if msg_type == "AssistantMessage" and hasattr(msg, "content"):
+                if msg_type == "AssistantMessage" and hasattr(msg, "content"):
-                for block in msg.content:
+                    for block in msg.content:
-                    block_type = type(block).__name__
+                        block_type = type(block).__name__
-                    if block_type == "TextBlock" and hasattr(block, "text"):
+                        if block_type == "TextBlock" and hasattr(block, "text"):
-                        text = block.text
+                            text = block.text
-                        if text:
+                            if text:
-                            yield {"type": "text", "content": text}
+                                yield {"type": "text", "content": text}
-                            self.messages.append({
+                                self.messages.append({
-                                "role": "assistant",
+                                    "role": "assistant",
-                                "content": text,
+                                    "content": text,
-                                "timestamp": datetime.now().isoformat()
+                                    "timestamp": datetime.now().isoformat()
-                            })
+                                })
        except Exception as exc:
            is_rate_limit, _ = check_rate_limit_error(exc)
            if is_rate_limit:
                logger.warning(f"Rate limited: {exc}")
                yield {"type": "error", "content": "Rate limited. Please try again later."}
                return
            raise
    def get_features_created(self) -> int:
        """Get the total number of features created in this session."""
--- a/server/services/process_manager.py
+++ b/server/services/process_manager.py
@@ -77,7 +77,7 @@ class AgentProcessManager:
        self.project_dir = project_dir
        self.root_dir = root_dir
        self.process: subprocess.Popen | None = None
-        self._status: Literal["stopped", "running", "paused", "crashed"] = "stopped"
+        self._status: Literal["stopped", "running", "paused", "crashed", "pausing", "paused_graceful"] = "stopped"
        self.started_at: datetime | None = None
        self._output_task: asyncio.Task | None = None
        self.yolo_mode: bool = False  # YOLO mode for rapid prototyping
@@ -96,11 +96,11 @@ class AgentProcessManager:
        self.lock_file = get_agent_lock_path(self.project_dir)
    @property
-    def status(self) -> Literal["stopped", "running", "paused", "crashed"]:
+    def status(self) -> Literal["stopped", "running", "paused", "crashed", "pausing", "paused_graceful"]:
        return self._status
    @status.setter
-    def status(self, value: Literal["stopped", "running", "paused", "crashed"]):
+    def status(self, value: Literal["stopped", "running", "paused", "crashed", "pausing", "paused_graceful"]):
        old_status = self._status
        self._status = value
        if old_status != value:
@@ -227,6 +227,28 @@ class AgentProcessManager:
        """Remove lock file."""
        self.lock_file.unlink(missing_ok=True)
    def _apply_playwright_headless(self, headless: bool) -> None:
        """Update .playwright/cli.config.json with the current headless setting.
        playwright-cli reads this config file on each ``open`` command, so
        updating it before the agent starts is sufficient.
        """
        config_file = self.project_dir / ".playwright" / "cli.config.json"
        if not config_file.exists():
            return
        try:
            import json
            config = json.loads(config_file.read_text(encoding="utf-8"))
            launch_opts = config.get("browser", {}).get("launchOptions", {})
            if launch_opts.get("headless") == headless:
                return  # already correct
            launch_opts["headless"] = headless
            config.setdefault("browser", {})["launchOptions"] = launch_opts
            config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
            logger.info("Set playwright headless=%s for %s", headless, self.project_name)
        except Exception:
            logger.warning("Failed to update playwright config", exc_info=True)
    def _cleanup_stale_features(self) -> None:
        """Clear in_progress flag for all features when agent stops/crashes.
@@ -255,7 +277,7 @@ class AgentProcessManager:
                ).all()
                if stuck:
                    for f in stuck:
-                        f.in_progress = False
+                        f.in_progress = False  # type: ignore[assignment]
                    session.commit()
                    logger.info(
                        "Cleaned up %d stuck feature(s) for %s",
@@ -308,6 +330,12 @@ class AgentProcessManager:
                    for help_line in AUTH_ERROR_HELP.strip().split('\n'):
                        await self._broadcast_output(help_line)
                # Detect graceful pause status transitions from orchestrator output
                if "All agents drained - paused." in decoded:
                    self.status = "paused_graceful"
                elif "Resuming from graceful pause..." in decoded:
                    self.status = "running"
                await self._broadcast_output(sanitized)
        except asyncio.CancelledError:
@@ -318,7 +346,7 @@ class AgentProcessManager:
            # Check if process ended
            if self.process and self.process.poll() is not None:
                exit_code = self.process.returncode
-                if exit_code != 0 and self.status == "running":
+                if exit_code != 0 and self.status in ("running", "pausing", "paused_graceful"):
                    # Check buffered output for auth errors if we haven't detected one yet
                    if not auth_error_detected:
                        combined_output = '\n'.join(output_buffer)
@@ -326,10 +354,16 @@ class AgentProcessManager:
                            for help_line in AUTH_ERROR_HELP.strip().split('\n'):
                                await self._broadcast_output(help_line)
                    self.status = "crashed"
-                elif self.status == "running":
+                elif self.status in ("running", "pausing", "paused_graceful"):
                    self.status = "stopped"
                self._cleanup_stale_features()
                self._remove_lock()
                # Clean up drain signal file if present
                try:
                    from autoforge_paths import get_pause_drain_path
                    get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
                except Exception:
                    pass
    async def start(
        self,
@@ -340,6 +374,7 @@ class AgentProcessManager:
        testing_agent_ratio: int = 1,
        playwright_headless: bool = True,
        batch_size: int = 3,
        testing_batch_size: int = 3,
    ) -> tuple[bool, str]:
        """
        Start the agent as a subprocess.
@@ -355,12 +390,21 @@ class AgentProcessManager:
        Returns:
            Tuple of (success, message)
        """
-        if self.status in ("running", "paused"):
+        if self.status in ("running", "paused", "pausing", "paused_graceful"):
            return False, f"Agent is already {self.status}"
        if not self._check_lock():
            return False, "Another agent instance is already running for this project"
        # Clean up stale browser daemons from previous runs
        try:
            subprocess.run(
                ["playwright-cli", "kill-all"],
                timeout=5, capture_output=True,
            )
        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
            pass
        # Clean up features stuck from a previous crash/stop
        self._cleanup_stale_features()
@@ -397,6 +441,13 @@ class AgentProcessManager:
        # Add --batch-size flag for multi-feature batching
        cmd.extend(["--batch-size", str(batch_size)])
        # Add --testing-batch-size flag for testing agent batching
        cmd.extend(["--testing-batch-size", str(testing_batch_size)])
        # Apply headless setting to .playwright/cli.config.json so playwright-cli
        # picks it up (the only mechanism it supports for headless control)
        self._apply_playwright_headless(playwright_headless)
        try:
            # Start subprocess with piped stdout/stderr
            # Use project_dir as cwd so Claude SDK sandbox allows access to project files
@@ -409,7 +460,7 @@ class AgentProcessManager:
            subprocess_env = {
                **os.environ,
                "PYTHONUNBUFFERED": "1",
-                "PLAYWRIGHT_HEADLESS": "true" if playwright_headless else "false",
+                "PLAYWRIGHT_CLI_SESSION": f"agent-{self.project_name}-{os.getpid()}",
                "NODE_COMPILE_CACHE": "",  # Disable V8 compile caching to prevent .node file accumulation in %TEMP%
                **api_env,
            }
@@ -469,6 +520,15 @@ class AgentProcessManager:
                except asyncio.CancelledError:
                    pass
            # Kill browser daemons before stopping agent
            try:
                subprocess.run(
                    ["playwright-cli", "kill-all"],
                    timeout=5, capture_output=True,
                )
            except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
                pass
            # CRITICAL: Kill entire process tree, not just orchestrator
            # This ensures all spawned coding/testing agents are also terminated
            proc = self.process  # Capture reference before async call
@@ -482,6 +542,12 @@ class AgentProcessManager:
            self._cleanup_stale_features()
            self._remove_lock()
            # Clean up drain signal file if present
            try:
                from autoforge_paths import get_pause_drain_path
                get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
            except Exception:
                pass
            self.status = "stopped"
            self.process = None
            self.started_at = None
@@ -542,6 +608,47 @@ class AgentProcessManager:
            logger.exception("Failed to resume agent")
            return False, f"Failed to resume agent: {e}"
    async def graceful_pause(self) -> tuple[bool, str]:
        """Request a graceful pause (drain mode).
        Creates a signal file that the orchestrator polls. Running agents
        finish their current work before the orchestrator enters a paused state.
        Returns:
            Tuple of (success, message)
        """
        if not self.process or self.status not in ("running",):
            return False, "Agent is not running"
        try:
            from autoforge_paths import get_pause_drain_path
            drain_path = get_pause_drain_path(self.project_dir)
            drain_path.parent.mkdir(parents=True, exist_ok=True)
            drain_path.write_text(str(self.process.pid))
            self.status = "pausing"
            return True, "Graceful pause requested"
        except Exception as e:
            logger.exception("Failed to request graceful pause")
            return False, f"Failed to request graceful pause: {e}"
    async def graceful_resume(self) -> tuple[bool, str]:
        """Resume from a graceful pause by removing the drain signal file.
        Returns:
            Tuple of (success, message)
        """
        if not self.process or self.status not in ("pausing", "paused_graceful"):
            return False, "Agent is not in a graceful pause state"
        try:
            from autoforge_paths import get_pause_drain_path
            get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
            self.status = "running"
            return True, "Agent resumed from graceful pause"
        except Exception as e:
            logger.exception("Failed to resume from graceful pause")
            return False, f"Failed to resume: {e}"
    async def healthcheck(self) -> bool:
        """
        Check if the agent process is still alive.
@@ -557,8 +664,14 @@ class AgentProcessManager:
        poll = self.process.poll()
        if poll is not None:
            # Process has terminated
-            if self.status in ("running", "paused"):
+            if self.status in ("running", "paused", "pausing", "paused_graceful"):
                self._cleanup_stale_features()
                # Clean up drain signal file if present
                try:
                    from autoforge_paths import get_pause_drain_path
                    get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
                except Exception:
                    pass
                self.status = "crashed"
                self._remove_lock()
            return False
@@ -643,8 +756,14 @@ def cleanup_orphaned_locks() -> int:
            if not project_path.exists():
                continue
            # Clean up stale drain signal files
            from autoforge_paths import get_autoforge_dir, get_pause_drain_path
            drain_file = get_pause_drain_path(project_path)
            if drain_file.exists():
                drain_file.unlink(missing_ok=True)
                logger.info("Removed stale drain signal file for project '%s'", name)
            # Check both legacy and new locations for lock files
            from autoforge_paths import get_autoforge_dir
            lock_locations = [
                project_path / ".agent.lock",
                get_autoforge_dir(project_path) / ".agent.lock",
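
Reviewer note: the graceful pause added above is a file-based handshake. The manager writes a drain signal file, the orchestrator polls for it, and removing the file resumes scheduling. The sketch below illustrates that handshake only. The file location and every function name in it are hypothetical stand-ins (the real path comes from `get_pause_drain_path`, which this diff does not show), and the orchestrator-side check is an assumption about how the polling might look.

```python
import tempfile
from pathlib import Path


def drain_path(project_dir: Path) -> Path:
    # Hypothetical location; the real one is returned by get_pause_drain_path().
    return project_dir / ".autoforge" / ".pause_drain"


def request_pause(project_dir: Path, pid: int) -> None:
    """Manager side: create the signal file, as graceful_pause() does."""
    path = drain_path(project_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(str(pid))


def request_resume(project_dir: Path) -> None:
    """Manager side: remove the signal file, as graceful_resume() does."""
    drain_path(project_dir).unlink(missing_ok=True)


def should_drain(project_dir: Path) -> bool:
    """Orchestrator side (assumed): check for the file before scheduling new work."""
    return drain_path(project_dir).exists()


def demo() -> list[bool]:
    """Walk through idle -> pause requested -> resumed in a temporary project."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        observed = [should_drain(project)]
        request_pause(project, pid=12345)
        observed.append(should_drain(project))
        request_resume(project)
        observed.append(should_drain(project))
        return observed
```

Because the signal is only a file, a crash on either side can leave it behind, which is presumably why the diff also removes it on stop, on process exit, in `healthcheck`, and in `cleanup_orphaned_locks`.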
--- a/server/services/spec_chat_session.py
+++ b/server/services/spec_chat_session.py
@@ -19,7 +19,12 @@ from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
 from dotenv import load_dotenv
 from ..schemas import ImageAttachment
-from .chat_constants import ROOT_DIR, make_multimodal_message
+from .chat_constants import (
    ROOT_DIR,
    check_rate_limit_error,
    make_multimodal_message,
    safe_receive_response,
 )
 # Load environment variables from .env file if present
 load_dotenv()
@@ -304,117 +309,125 @@ class SpecChatSession:
        # Store paths for the completion message
        spec_path = None
-        # Stream the response using receive_response
+        # Stream the response
-        async for msg in self.client.receive_response():
+        try:
-            msg_type = type(msg).__name__
+            async for msg in safe_receive_response(self.client, logger):
                msg_type = type(msg).__name__
-            if msg_type == "AssistantMessage" and hasattr(msg, "content"):
+                if msg_type == "AssistantMessage" and hasattr(msg, "content"):
-                # Process content blocks in the assistant message
+                    # Process content blocks in the assistant message
-                for block in msg.content:
+                    for block in msg.content:
-                    block_type = type(block).__name__
+                        block_type = type(block).__name__
-                    if block_type == "TextBlock" and hasattr(block, "text"):
+                        if block_type == "TextBlock" and hasattr(block, "text"):
-                        # Accumulate text and yield it
+                            # Accumulate text and yield it
-                        text = block.text
+                            text = block.text
-                        if text:
+                            if text:
-                            current_text += text
+                                current_text += text
-                            yield {"type": "text", "content": text}
+                                yield {"type": "text", "content": text}
-                            # Store in message history
+                                # Store in message history
-                            self.messages.append({
+                                self.messages.append({
-                                "role": "assistant",
+                                    "role": "assistant",
-                                "content": text,
+                                    "content": text,
-                                "timestamp": datetime.now().isoformat()
+                                    "timestamp": datetime.now().isoformat()
-                            })
+                                })
-                    elif block_type == "ToolUseBlock" and hasattr(block, "name"):
+                        elif block_type == "ToolUseBlock" and hasattr(block, "name"):
-                        tool_name = block.name
+                            tool_name = block.name
-                        tool_input = getattr(block, "input", {})
+                            tool_input = getattr(block, "input", {})
-                        tool_id = getattr(block, "id", "")
+                            tool_id = getattr(block, "id", "")
-                        if tool_name in ("Write", "Edit"):
+                            if tool_name in ("Write", "Edit"):
-                            # File being written or edited - track for verification
+                                # File being written or edited - track for verification
-                            file_path = tool_input.get("file_path", "")
+                                file_path = tool_input.get("file_path", "")
-                            # Track app_spec.txt
+                                # Track app_spec.txt
-                            if "app_spec.txt" in str(file_path):
+                                if "app_spec.txt" in str(file_path):
-                                pending_writes["app_spec"] = {
+                                    pending_writes["app_spec"] = {
-                                    "tool_id": tool_id,
+                                        "tool_id": tool_id,
-                                    "path": file_path
+                                        "path": file_path
                                }
                                logger.info(f"{tool_name} tool called for app_spec.txt: {file_path}")
                            # Track initializer_prompt.md
                            elif "initializer_prompt.md" in str(file_path):
                                pending_writes["initializer"] = {
                                    "tool_id": tool_id,
                                    "path": file_path
                                }
                                logger.info(f"{tool_name} tool called for initializer_prompt.md: {file_path}")
            elif msg_type == "UserMessage" and hasattr(msg, "content"):
                # Tool results - check for write confirmations and errors
                for block in msg.content:
                    block_type = type(block).__name__
                    if block_type == "ToolResultBlock":
                        is_error = getattr(block, "is_error", False)
                        tool_use_id = getattr(block, "tool_use_id", "")
                        if is_error:
                            content = getattr(block, "content", "Unknown error")
                            logger.warning(f"Tool error: {content}")
                            # Clear any pending writes that failed
                            for key in pending_writes:
                                pending_write = pending_writes[key]
                                if pending_write is not None and tool_use_id == pending_write.get("tool_id"):
                                    logger.error(f"{key} write failed: {content}")
                                    pending_writes[key] = None
                        else:
                            # Tool succeeded - check which file was written
                            # Check app_spec.txt
                            if pending_writes["app_spec"] and tool_use_id == pending_writes["app_spec"].get("tool_id"):
                                file_path = pending_writes["app_spec"]["path"]
                                full_path = Path(file_path) if Path(file_path).is_absolute() else self.project_dir / file_path
                                if full_path.exists():
                                    logger.info(f"app_spec.txt verified at: {full_path}")
                                    files_written["app_spec"] = True
                                    spec_path = file_path
                                    # Notify about file write (but NOT completion yet)
                                    yield {
                                        "type": "file_written",
                                        "path": str(file_path)
                                    }
-                                else:
+                                    logger.info(f"{tool_name} tool called for app_spec.txt: {file_path}")
                                    logger.error(f"app_spec.txt not found after write: {full_path}")
                                pending_writes["app_spec"] = None
-                            # Check initializer_prompt.md
+                                # Track initializer_prompt.md
-                            if pending_writes["initializer"] and tool_use_id == pending_writes["initializer"].get("tool_id"):
+                                elif "initializer_prompt.md" in str(file_path):
-                                file_path = pending_writes["initializer"]["path"]
+                                    pending_writes["initializer"] = {
-                                full_path = Path(file_path) if Path(file_path).is_absolute() else self.project_dir / file_path
+                                        "tool_id": tool_id,
-                                if full_path.exists():
+                                        "path": file_path
                                    logger.info(f"initializer_prompt.md verified at: {full_path}")
                                    files_written["initializer"] = True
                                    # Notify about file write
                                    yield {
                                        "type": "file_written",
                                        "path": str(file_path)
                                    }
-                                else:
+                                    logger.info(f"{tool_name} tool called for initializer_prompt.md: {file_path}")
                                    logger.error(f"initializer_prompt.md not found after write: {full_path}")
                                pending_writes["initializer"] = None
-                            # Check if BOTH files are now written - only then signal completion
+                elif msg_type == "UserMessage" and hasattr(msg, "content"):
-                            if files_written["app_spec"] and files_written["initializer"]:
+                    # Tool results - check for write confirmations and errors
-                                logger.info("Both app_spec.txt and initializer_prompt.md verified - signaling completion")
+                    for block in msg.content:
-                                self.complete = True
+                        block_type = type(block).__name__
-                                yield {
+                        if block_type == "ToolResultBlock":
-                                    "type": "spec_complete",
+                            is_error = getattr(block, "is_error", False)
-                                    "path": str(spec_path)
+                            tool_use_id = getattr(block, "tool_use_id", "")
-                                }
+
                            if is_error:
                                content = getattr(block, "content", "Unknown error")
                                logger.warning(f"Tool error: {content}")
                                # Clear any pending writes that failed
                                for key in pending_writes:
                                    pending_write = pending_writes[key]
                                    if pending_write is not None and tool_use_id == pending_write.get("tool_id"):
                                        logger.error(f"{key} write failed: {content}")
                                        pending_writes[key] = None
                            else:
                                # Tool succeeded - check which file was written
                                # Check app_spec.txt
                                if pending_writes["app_spec"] and tool_use_id == pending_writes["app_spec"].get("tool_id"):
                                    file_path = pending_writes["app_spec"]["path"]
                                    full_path = Path(file_path) if Path(file_path).is_absolute() else self.project_dir / file_path
                                    if full_path.exists():
                                        logger.info(f"app_spec.txt verified at: {full_path}")
                                        files_written["app_spec"] = True
                                        spec_path = file_path
                                        # Notify about file write (but NOT completion yet)
                                        yield {
                                            "type": "file_written",
                                            "path": str(file_path)
                                        }
                                    else:
                                        logger.error(f"app_spec.txt not found after write: {full_path}")
                                    pending_writes["app_spec"] = None
                                # Check initializer_prompt.md
                                if pending_writes["initializer"] and tool_use_id == pending_writes["initializer"].get("tool_id"):
                                    file_path = pending_writes["initializer"]["path"]
                                    full_path = Path(file_path) if Path(file_path).is_absolute() else self.project_dir / file_path
                                    if full_path.exists():
                                        logger.info(f"initializer_prompt.md verified at: {full_path}")
                                        files_written["initializer"] = True
                                        # Notify about file write
                                        yield {
                                            "type": "file_written",
                                            "path": str(file_path)
                                        }
                                    else:
                                        logger.error(f"initializer_prompt.md not found after write: {full_path}")
                                    pending_writes["initializer"] = None
                                # Check if BOTH files are now written - only then signal completion
                                if files_written["app_spec"] and files_written["initializer"]:
                                    logger.info("Both app_spec.txt and initializer_prompt.md verified - signaling completion")
                                    self.complete = True
                                    yield {
                                        "type": "spec_complete",
                                        "path": str(spec_path)
                                    }
        except Exception as exc:
            is_rate_limit, _ = check_rate_limit_error(exc)
            if is_rate_limit:
                logger.warning(f"Rate limited: {exc}")
                yield {"type": "error", "content": "Rate limited. Please try again later."}
                return
            raise
    def is_complete(self) -> bool:
        """Check if spec creation is complete."""
--- a/server/websocket.py
+++ b/server/websocket.py
@@ -61,7 +61,7 @@ THOUGHT_PATTERNS = [
    (re.compile(r'(?:Testing|Verifying|Running tests|Validating)\s+(.+)', re.I), 'testing'),
    (re.compile(r'(?:Error|Failed|Cannot|Unable to|Exception)\s+(.+)', re.I), 'struggling'),
    # Test results
-    (re.compile(r'(?:PASS|passed|success)', re.I), 'success'),
+    (re.compile(r'(?:PASS|passed|success)', re.I), 'testing'),
    (re.compile(r'(?:FAIL|failed|error)', re.I), 'struggling'),
 ]
@@ -78,6 +78,9 @@ ORCHESTRATOR_PATTERNS = {
    'testing_complete': re.compile(r'Feature #(\d+) testing (completed|failed)'),
    'all_complete': re.compile(r'All features complete'),
    'blocked_features': re.compile(r'(\d+) blocked by dependencies'),
    'drain_start': re.compile(r'Graceful pause requested'),
    'drain_complete': re.compile(r'All agents drained'),
    'drain_resume': re.compile(r'Resuming from graceful pause'),
 }
@@ -562,6 +565,30 @@ class OrchestratorTracker:
                    'All features complete!'
                )
            # Graceful pause (drain mode) events
            elif ORCHESTRATOR_PATTERNS['drain_start'].search(line):
                self.state = 'draining'
                update = self._create_update(
                    'drain_start',
                    'Draining active agents...'
                )
            elif ORCHESTRATOR_PATTERNS['drain_complete'].search(line):
                self.state = 'paused'
                self.coding_agents = 0
                self.testing_agents = 0
                update = self._create_update(
                    'drain_complete',
                    'All agents drained. Paused.'
                )
            elif ORCHESTRATOR_PATTERNS['drain_resume'].search(line):
                self.state = 'scheduling'
                update = self._create_update(
                    'drain_resume',
                    'Resuming feature scheduling'
                )
            return update
    def _create_update(
@@ -689,15 +716,19 @@ async def poll_progress(websocket: WebSocket, project_name: str, project_dir: Pa
    last_in_progress = -1
    last_total = -1
    last_needs_human_input = -1
    while True:
        try:
-            passing, in_progress, total = count_passing_tests(project_dir)
+            passing, in_progress, total, needs_human_input = count_passing_tests(project_dir)
            # Only send if changed
-            if passing != last_passing or in_progress != last_in_progress or total != last_total:
+            if (passing != last_passing or in_progress != last_in_progress
                    or total != last_total or needs_human_input != last_needs_human_input):
                last_passing = passing
                last_in_progress = in_progress
                last_total = total
                last_needs_human_input = needs_human_input
                percentage = (passing / total * 100) if total > 0 else 0
                await websocket.send_json({
@@ -706,6 +737,7 @@ async def poll_progress(websocket: WebSocket, project_name: str, project_dir: Pa
                    "in_progress": in_progress,
                    "total": total,
                    "percentage": round(percentage, 1),
                    "needs_human_input": needs_human_input,
                })
            await asyncio.sleep(2)  # Poll every 2 seconds
@@ -858,7 +890,7 @@ async def project_websocket(websocket: WebSocket, project_name: str):
        # Send initial progress
        count_passing_tests = _get_count_passing_tests()
-        passing, in_progress, total = count_passing_tests(project_dir)
+        passing, in_progress, total, needs_human_input = count_passing_tests(project_dir)
        percentage = (passing / total * 100) if total > 0 else 0
        await websocket.send_json({
            "type": "progress",
@@ -866,6 +898,7 @@ async def project_websocket(websocket: WebSocket, project_name: str):
            "in_progress": in_progress,
            "total": total,
            "percentage": round(percentage, 1),
            "needs_human_input": needs_human_input,
        })
        # Keep connection alive and handle incoming messages
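
Reviewer note: the three drain patterns added to `ORCHESTRATOR_PATTERNS` drive a small state machine in `OrchestratorTracker`. The sketch below replays that mapping in isolation. The regexes and state names are copied from the diff, while `next_state` and `replay` are hypothetical helpers written for illustration, not part of the tracker.

```python
import re

# Patterns and target states as they appear in the diff above.
DRAIN_PATTERNS = {
    "drain_start": re.compile(r"Graceful pause requested"),
    "drain_complete": re.compile(r"All agents drained"),
    "drain_resume": re.compile(r"Resuming from graceful pause"),
}
STATE_AFTER = {
    "drain_start": "draining",
    "drain_complete": "paused",
    "drain_resume": "scheduling",
}


def next_state(current: str, line: str) -> str:
    """Return the state after seeing one output line (unchanged if no match)."""
    for event, pattern in DRAIN_PATTERNS.items():
        if pattern.search(line):
            return STATE_AFTER[event]
    return current


def replay(lines: list[str], state: str = "scheduling") -> list[str]:
    """Feed lines through next_state and record the state after each one."""
    history = []
    for line in lines:
        state = next_state(state, line)
        history.append(state)
    return history
```

Note that the process manager matches the longer literals `"All agents drained - paused."` and `"Resuming from graceful pause..."`; the shorter regexes here match the same lines, so the two detectors should stay in agreement as long as the orchestrator's wording does not change.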
--- a/start.bat
+++ b/start.bat
@@ -54,5 +54,15 @@ REM Install dependencies
 echo Installing dependencies...
 pip install -r requirements.txt --quiet
 REM Ensure playwright-cli is available for browser automation
 where playwright-cli >nul 2>&1
 if %ERRORLEVEL% neq 0 (
    echo Installing playwright-cli for browser automation...
    call npm install -g @playwright/cli >nul 2>&1
    if errorlevel 1 (
        echo Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli
    )
 )
 REM Run the app
 python start.py
--- a/start.sh
+++ b/start.sh
@@ -74,5 +74,14 @@ fi
 echo "Installing dependencies..."
 pip install -r requirements.txt --quiet
 # Ensure playwright-cli is available for browser automation
 if ! command -v playwright-cli &> /dev/null; then
    echo "Installing playwright-cli for browser automation..."
    npm install -g @playwright/cli --quiet 2>/dev/null
    if [ $? -ne 0 ]; then
        echo "Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli"
    fi
 fi
 # Run the app
 python start.py
--- a/start_ui.bat
+++ b/start_ui.bat
@@ -37,5 +37,15 @@ REM Install dependencies
 echo Installing dependencies...
 pip install -r requirements.txt --quiet
 REM Ensure playwright-cli is available for browser automation
 where playwright-cli >nul 2>&1
 if %ERRORLEVEL% neq 0 (
    echo Installing playwright-cli for browser automation...
    call npm install -g @playwright/cli >nul 2>&1
    if errorlevel 1 (
        echo Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli
    )
 )
 REM Run the Python launcher
 python "%~dp0start_ui.py" %*
--- a/start_ui.sh
+++ b/start_ui.sh
@@ -80,5 +80,14 @@ fi
 echo "Installing dependencies..."
 pip install -r requirements.txt --quiet
 # Ensure playwright-cli is available for browser automation
 if ! command -v playwright-cli &> /dev/null; then
    echo "Installing playwright-cli for browser automation..."
    npm install -g @playwright/cli --quiet 2>/dev/null
    if [ $? -ne 0 ]; then
        echo "Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli"
    fi
 fi
 # Run the Python launcher
 python start_ui.py "$@"
--- a/temp_cleanup.py
+++ b/temp_cleanup.py
@@ -125,14 +125,18 @@ def cleanup_stale_temp(max_age_seconds: int = MAX_AGE_SECONDS) -> dict:
 def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -> dict:
    """
-    Clean up stale screenshot files from the project root.
+    Clean up stale Playwright CLI artifacts from the project.
-    Playwright browser verification can leave .png files in the project
+    The Playwright CLI daemon saves screenshots, snapshots, and other artifacts
-    directory. This removes them after they've aged out (default 5 minutes).
+    to `{project_dir}/.playwright-cli/`. This removes them after they've aged
    out (default 5 minutes).
    Also cleans up legacy screenshot patterns from the project root (from the
    old Playwright MCP server approach).
    Args:
        project_dir: Path to the project directory.
-        max_age_seconds: Maximum age in seconds before a screenshot is deleted.
+        max_age_seconds: Maximum age in seconds before an artifact is deleted.
                        Defaults to 5 minutes (300 seconds).
    Returns:
@@ -141,13 +145,33 @@ def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -
    cutoff_time = time.time() - max_age_seconds
    stats: dict = {"files_deleted": 0, "bytes_freed": 0, "errors": []}
-    screenshot_patterns = [
+    # Clean up .playwright-cli/ directory (new CLI approach)
    playwright_cli_dir = project_dir / ".playwright-cli"
    if playwright_cli_dir.exists():
        for item in playwright_cli_dir.iterdir():
            if not item.is_file():
                continue
            try:
                mtime = item.stat().st_mtime
                if mtime < cutoff_time:
                    size = item.stat().st_size
                    item.unlink(missing_ok=True)
                    if not item.exists():
                        stats["files_deleted"] += 1
                        stats["bytes_freed"] += size
                        logger.debug(f"Deleted playwright-cli artifact: {item}")
            except Exception as e:
                stats["errors"].append(f"Failed to delete {item}: {e}")
                logger.debug(f"Failed to delete artifact {item}: {e}")
    # Legacy cleanup: root-level screenshot patterns (from old MCP server approach)
    legacy_patterns = [
        "feature*-*.png",
        "screenshot-*.png",
        "step-*.png",
    ]
-    for pattern in screenshot_patterns:
+    for pattern in legacy_patterns:
        for item in project_dir.glob(pattern):
            if not item.is_file():
                continue
@@ -159,14 +183,14 @@ def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -
                    if not item.exists():
                        stats["files_deleted"] += 1
                        stats["bytes_freed"] += size
-                        logger.debug(f"Deleted project screenshot: {item}")
+                        logger.debug(f"Deleted legacy screenshot: {item}")
            except Exception as e:
                stats["errors"].append(f"Failed to delete {item}: {e}")
                logger.debug(f"Failed to delete screenshot {item}: {e}")
    if stats["files_deleted"] > 0:
        mb_freed = stats["bytes_freed"] / (1024 * 1024)
-        logger.info(f"Screenshot cleanup: {stats['files_deleted']} files, {mb_freed:.1f} MB freed")
+        logger.info(f"Artifact cleanup: {stats['files_deleted']} files, {mb_freed:.1f} MB freed")
    return stats
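
Reviewer note: the `.playwright-cli/` cleanup above is an age-based sweep: compare each file's mtime with a cutoff and delete anything older. The sketch below shows that technique on a temporary directory. `delete_older_than` and `demo` are hypothetical names, and the sketch calls `stat()` once per file rather than twice, which is a small variation on the code in the diff rather than a copy of it.

```python
import os
import tempfile
import time
from pathlib import Path


def delete_older_than(directory: Path, max_age_seconds: int) -> dict:
    """Delete regular files in `directory` whose mtime is older than the cutoff."""
    cutoff = time.time() - max_age_seconds
    stats: dict = {"files_deleted": 0, "bytes_freed": 0, "errors": []}
    if not directory.exists():
        return stats
    for item in directory.iterdir():
        if not item.is_file():
            continue
        try:
            info = item.stat()  # one stat call gives both mtime and size
            if info.st_mtime < cutoff:
                item.unlink(missing_ok=True)
                stats["files_deleted"] += 1
                stats["bytes_freed"] += info.st_size
        except OSError as exc:
            stats["errors"].append(f"Failed to delete {item}: {exc}")
    return stats


def demo() -> tuple[dict, list[str]]:
    """Create one stale and one fresh file, sweep, and report what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        artifacts = Path(tmp) / ".playwright-cli"
        artifacts.mkdir()
        old = artifacts / "old.png"
        old.write_bytes(b"x" * 10)
        fresh = artifacts / "fresh.png"
        fresh.write_bytes(b"y" * 4)
        stale_time = time.time() - 600  # backdate by 10 minutes
        os.utime(old, (stale_time, stale_time))
        stats = delete_older_than(artifacts, max_age_seconds=300)
        return stats, sorted(p.name for p in artifacts.iterdir())
```

Like the diff, the sweep is not recursive: subdirectories under `.playwright-cli/` are skipped by the `is_file()` check.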
--- a/test_security.py
+++ b/test_security.py
@@ -25,6 +25,7 @@ from security import (
    validate_chmod_command,
    validate_init_script,
    validate_pkill_command,
    validate_playwright_command,
    validate_project_command,
 )
@@ -923,6 +924,70 @@ pkill_processes:
    return passed, failed
 def test_playwright_cli_validation():
    """Test playwright-cli subcommand validation."""
    print("\nTesting playwright-cli validation:\n")
    passed = 0
    failed = 0
    # Test cases: (command, should_be_allowed, description)
    test_cases = [
        # Allowed cases
        ("playwright-cli screenshot", True, "screenshot allowed"),
        ("playwright-cli snapshot", True, "snapshot allowed"),
        ("playwright-cli click e5", True, "click with ref"),
        ("playwright-cli open http://localhost:3000", True, "open URL"),
        ("playwright-cli -s=agent-1 click e5", True, "session flag with click"),
        ("playwright-cli close", True, "close browser"),
        ("playwright-cli goto http://localhost:3000/page", True, "goto URL"),
        ("playwright-cli fill e3 'test value'", True, "fill form field"),
        ("playwright-cli console", True, "console messages"),
        # Blocked cases
        ("playwright-cli run-code 'await page.evaluate(() => {})'", False, "run-code blocked"),
        ("playwright-cli eval 'document.title'", False, "eval blocked"),
        ("playwright-cli -s=test eval 'document.title'", False, "eval with session flag blocked"),
    ]
    for cmd, should_allow, description in test_cases:
        allowed, reason = validate_playwright_command(cmd)
        if allowed == should_allow:
            print(f"  PASS: {cmd!r} ({description})")
            passed += 1
        else:
            expected = "allowed" if should_allow else "blocked"
            actual = "allowed" if allowed else "blocked"
            print(f"  FAIL: {cmd!r} ({description})")
            print(f"         Expected: {expected}, Got: {actual}")
            if reason:
                print(f"         Reason: {reason}")
            failed += 1
    # Integration test: verify through the security hook
    print("\n  Integration tests (via security hook):\n")
    # playwright-cli screenshot should be allowed
    input_data = {"tool_name": "Bash", "tool_input": {"command": "playwright-cli screenshot"}}
    result = asyncio.run(bash_security_hook(input_data))
    if result.get("decision") != "block":
        print("  PASS: playwright-cli screenshot allowed via hook")
        passed += 1
    else:
        print(f"  FAIL: playwright-cli screenshot should be allowed: {result.get('reason')}")
        failed += 1
    # playwright-cli run-code should be blocked
    input_data = {"tool_name": "Bash", "tool_input": {"command": "playwright-cli run-code 'code'"}}
    result = asyncio.run(bash_security_hook(input_data))
    if result.get("decision") == "block":
        print("  PASS: playwright-cli run-code blocked via hook")
        passed += 1
    else:
        print("  FAIL: playwright-cli run-code should be blocked via hook")
        failed += 1
    return passed, failed
 def main():
    print("=" * 70)
    print("  SECURITY HOOK TESTS")
@@ -991,6 +1056,11 @@ def main():
    passed += pkill_passed
    failed += pkill_failed
    # Test playwright-cli validation
    pw_passed, pw_failed = test_playwright_cli_validation()
    passed += pw_passed
    failed += pw_failed
    # Commands that SHOULD be blocked
    # Note: blocklisted commands (sudo, shutdown, dd, aws) are tested in
    # test_blocklist_enforcement(). chmod validation is tested in
@@ -1012,6 +1082,9 @@ def main():
        # Shell injection attempts
        "$(echo pkill) node",
        'eval "pkill node"',
        # playwright-cli dangerous subcommands
        "playwright-cli run-code 'await page.goto(\"http://evil.com\")'",
        "playwright-cli eval 'document.cookie'",
    ]
    for cmd in dangerous:
@@ -1077,6 +1150,12 @@ def main():
        "/usr/local/bin/node app.js",
        # Combined chmod and init.sh (integration test for both validators)
        "chmod +x init.sh && ./init.sh",
        # Playwright CLI allowed commands
        "playwright-cli open http://localhost:3000",
        "playwright-cli screenshot",
        "playwright-cli snapshot",
        "playwright-cli click e5",
        "playwright-cli -s=agent-1 close",
    ]
    for cmd in safe:
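
Note on the test_security.py changes above: the new test cases imply that `validate_playwright_command` returns an `(allowed, reason)` pair, accepts a leading `-s=<session>` flag, and rejects the `run-code` and `eval` subcommands. The real implementation in `security.py` is not part of this excerpt, so the following is only a minimal sketch of one way such a check could behave. The name `sketch_validate_playwright_command`, the allowlist approach, and the contents of `ALLOWED_SUBCOMMANDS` are assumptions taken from the subcommands the tests exercise, not the project's actual code.

```python
import shlex
from typing import Tuple

# Assumption: an allowlist built only from the subcommands that the tests
# above expect to pass. The real validator may use a different list.
ALLOWED_SUBCOMMANDS = {
    "screenshot", "snapshot", "click", "open",
    "close", "goto", "fill", "console",
}


def sketch_validate_playwright_command(command: str) -> Tuple[bool, str]:
    """Hypothetical stand-in for validate_playwright_command.

    Returns (allowed, reason); reason is empty when the command is allowed.
    """
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        # Unbalanced quotes and similar parse errors fail closed.
        return False, f"could not parse command: {exc}"

    if not tokens or tokens[0] != "playwright-cli":
        return False, "not a playwright-cli command"

    # Skip leading flags such as -s=agent-1; the first non-flag token is
    # treated as the subcommand.
    subcommand = next((t for t in tokens[1:] if not t.startswith("-")), None)
    if subcommand is None:
        return False, "no subcommand given"

    if subcommand not in ALLOWED_SUBCOMMANDS:
        return False, f"subcommand {subcommand!r} is not allowed"

    return True, ""


if __name__ == "__main__":
    for cmd in (
        "playwright-cli -s=agent-1 click e5",
        "playwright-cli eval 'document.title'",
    ):
        print(cmd, "->", sketch_validate_playwright_command(cmd))
```

An allowlist fails closed: a subcommand that is not listed is rejected without needing an explicit block entry. A flag written with a separate value (`-s agent-1` rather than `-s=agent-1`) would have its value read as the subcommand and be rejected, which this sketch accepts as the safer default.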
--- a/ui/package-lock.json
+++ b/ui/package-lock.json
@@ -56,7 +56,7 @@
    },
    "..": {
      "name": "autoforge-ai",
-      "version": "0.1.10",
+      "version": "0.1.17",
      "license": "AGPL-3.0",
      "bin": {
        "autoforge": "bin/autoforge.js"
@@ -1991,9 +1991,9 @@
      "license": "MIT"
    },
    "node_modules/@rollup/rollup-android-arm-eabi": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.59.0.tgz",
-      "integrity": "sha512-OywsdRHrFvCdvsewAInDKCNyR3laPA2mc9bRYJ6LBp5IyvF3fvXbbNR0bSzHlZVFtn6E0xw2oZlyjg4rKCVcng==",
+      "integrity": "sha512-upnNBkA6ZH2VKGcBj9Fyl9IGNPULcjXRlg0LLeaioQWueH30p6IXtJEbKAgvyv+mJaMxSm1l6xwDXYjpEMiLMg==",
      "cpu": [
        "arm"
      ],
@@ -2005,9 +2005,9 @@
      ]
    },
    "node_modules/@rollup/rollup-android-arm64": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.59.0.tgz",
-      "integrity": "sha512-Skx39Uv+u7H224Af+bDgNinitlmHyQX1K/atIA32JP3JQw6hVODX5tkbi2zof/E69M1qH2UoN3Xdxgs90mmNYw==",
+      "integrity": "sha512-hZ+Zxj3SySm4A/DylsDKZAeVg0mvi++0PYVceVyX7hemkw7OreKdCvW2oQ3T1FMZvCaQXqOTHb8qmBShoqk69Q==",
      "cpu": [
        "arm64"
      ],
@@ -2019,9 +2019,9 @@
      ]
    },
    "node_modules/@rollup/rollup-darwin-arm64": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.59.0.tgz",
-      "integrity": "sha512-k43D4qta/+6Fq+nCDhhv9yP2HdeKeP56QrUUTW7E6PhZP1US6NDqpJj4MY0jBHlJivVJD5P8NxrjuobZBJTCRw==",
+      "integrity": "sha512-W2Psnbh1J8ZJw0xKAd8zdNgF9HRLkdWwwdWqubSVk0pUuQkoHnv7rx4GiF9rT4t5DIZGAsConRE3AxCdJ4m8rg==",
      "cpu": [
        "arm64"
      ],
@@ -2033,9 +2033,9 @@
      ]
    },
    "node_modules/@rollup/rollup-darwin-x64": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.59.0.tgz",
-      "integrity": "sha512-cOo7biqwkpawslEfox5Vs8/qj83M/aZCSSNIWpVzfU2CYHa2G3P1UN5WF01RdTHSgCkri7XOlTdtk17BezlV3A==",
+      "integrity": "sha512-ZW2KkwlS4lwTv7ZVsYDiARfFCnSGhzYPdiOU4IM2fDbL+QGlyAbjgSFuqNRbSthybLbIJ915UtZBtmuLrQAT/w==",
      "cpu": [
        "x64"
      ],
@@ -2047,9 +2047,9 @@
      ]
    },
    "node_modules/@rollup/rollup-freebsd-arm64": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-freebsd-arm64/-/rollup-freebsd-arm64-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-freebsd-arm64/-/rollup-freebsd-arm64-4.59.0.tgz",
-      "integrity": "sha512-miSvuFkmvFbgJ1BevMa4CPCFt5MPGw094knM64W9I0giUIMMmRYcGW/JWZDriaw/k1kOBtsWh1z6nIFV1vPNtA==",
+      "integrity": "sha512-EsKaJ5ytAu9jI3lonzn3BgG8iRBjV4LxZexygcQbpiU0wU0ATxhNVEpXKfUa0pS05gTcSDMKpn3Sx+QB9RlTTA==",
      "cpu": [
        "arm64"
      ],
@@ -2061,9 +2061,9 @@
      ]
    },
    "node_modules/@rollup/rollup-freebsd-x64": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-freebsd-x64/-/rollup-freebsd-x64-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-freebsd-x64/-/rollup-freebsd-x64-4.59.0.tgz",
-      "integrity": "sha512-KGXIs55+b/ZfZsq9aR026tmr/+7tq6VG6MsnrvF4H8VhwflTIuYh+LFUlIsRdQSgrgmtM3fVATzEAj4hBQlaqQ==",
+      "integrity": "sha512-d3DuZi2KzTMjImrxoHIAODUZYoUUMsuUiY4SRRcJy6NJoZ6iIqWnJu9IScV9jXysyGMVuW+KNzZvBLOcpdl3Vg==",
      "cpu": [
        "x64"
      ],
@@ -2075,9 +2075,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-arm-gnueabihf": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.59.0.tgz",
-      "integrity": "sha512-EHMUcDwhtdRGlXZsGSIuXSYwD5kOT9NVnx9sqzYiwAc91wfYOE1g1djOEDseZJKKqtHAHGwnGPQu3kytmfaXLQ==",
+      "integrity": "sha512-t4ONHboXi/3E0rT6OZl1pKbl2Vgxf9vJfWgmUoCEVQVxhW6Cw/c8I6hbbu7DAvgp82RKiH7TpLwxnJeKv2pbsw==",
      "cpu": [
        "arm"
      ],
@@ -2089,9 +2089,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-arm-musleabihf": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.59.0.tgz",
-      "integrity": "sha512-+pBrqEjaakN2ySv5RVrj/qLytYhPKEUwk+e3SFU5jTLHIcAtqh2rLrd/OkbNuHJpsBgxsD8ccJt5ga/SeG0JmA==",
+      "integrity": "sha512-CikFT7aYPA2ufMD086cVORBYGHffBo4K8MQ4uPS/ZnY54GKj36i196u8U+aDVT2LX4eSMbyHtyOh7D7Zvk2VvA==",
      "cpu": [
        "arm"
      ],
@@ -2103,9 +2103,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-arm64-gnu": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.59.0.tgz",
-      "integrity": "sha512-NSqc7rE9wuUaRBsBp5ckQ5CVz5aIRKCwsoa6WMF7G01sX3/qHUw/z4pv+D+ahL1EIKy6Enpcnz1RY8pf7bjwng==",
+      "integrity": "sha512-jYgUGk5aLd1nUb1CtQ8E+t5JhLc9x5WdBKew9ZgAXg7DBk0ZHErLHdXM24rfX+bKrFe+Xp5YuJo54I5HFjGDAA==",
      "cpu": [
        "arm64"
      ],
@@ -2117,9 +2117,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-arm64-musl": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.59.0.tgz",
-      "integrity": "sha512-gr5vDbg3Bakga5kbdpqx81m2n9IX8M6gIMlQQIXiLTNeQW6CucvuInJ91EuCJ/JYvc+rcLLsDFcfAD1K7fMofg==",
+      "integrity": "sha512-peZRVEdnFWZ5Bh2KeumKG9ty7aCXzzEsHShOZEFiCQlDEepP1dpUl/SrUNXNg13UmZl+gzVDPsiCwnV1uI0RUA==",
      "cpu": [
        "arm64"
      ],
@@ -2131,9 +2131,23 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-loong64-gnu": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-loong64-gnu/-/rollup-linux-loong64-gnu-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-loong64-gnu/-/rollup-linux-loong64-gnu-4.59.0.tgz",
-      "integrity": "sha512-gsrtB1NA3ZYj2vq0Rzkylo9ylCtW/PhpLEivlgWe0bpgtX5+9j9EZa0wtZiCjgu6zmSeZWyI/e2YRX1URozpIw==",
+      "integrity": "sha512-gbUSW/97f7+r4gHy3Jlup8zDG190AuodsWnNiXErp9mT90iCy9NKKU0Xwx5k8VlRAIV2uU9CsMnEFg/xXaOfXg==",
      "cpu": [
        "loong64"
      ],
      "dev": true,
      "license": "MIT",
      "optional": true,
      "os": [
        "linux"
      ]
    },
    "node_modules/@rollup/rollup-linux-loong64-musl": {
      "version": "4.59.0",
      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-loong64-musl/-/rollup-linux-loong64-musl-4.59.0.tgz",
      "integrity": "sha512-yTRONe79E+o0FWFijasoTjtzG9EBedFXJMl888NBEDCDV9I2wGbFFfJQQe63OijbFCUZqxpHz1GzpbtSFikJ4Q==",
      "cpu": [
        "loong64"
      ],
@@ -2145,9 +2159,23 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-ppc64-gnu": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-gnu/-/rollup-linux-ppc64-gnu-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-gnu/-/rollup-linux-ppc64-gnu-4.59.0.tgz",
-      "integrity": "sha512-y3qNOfTBStmFNq+t4s7Tmc9hW2ENtPg8FeUD/VShI7rKxNW7O4fFeaYbMsd3tpFlIg1Q8IapFgy7Q9i2BqeBvA==",
+      "integrity": "sha512-sw1o3tfyk12k3OEpRddF68a1unZ5VCN7zoTNtSn2KndUE+ea3m3ROOKRCZxEpmT9nsGnogpFP9x6mnLTCaoLkA==",
      "cpu": [
        "ppc64"
      ],
      "dev": true,
      "license": "MIT",
      "optional": true,
      "os": [
        "linux"
      ]
    },
    "node_modules/@rollup/rollup-linux-ppc64-musl": {
      "version": "4.59.0",
      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-musl/-/rollup-linux-ppc64-musl-4.59.0.tgz",
      "integrity": "sha512-+2kLtQ4xT3AiIxkzFVFXfsmlZiG5FXYW7ZyIIvGA7Bdeuh9Z0aN4hVyXS/G1E9bTP/vqszNIN/pUKCk/BTHsKA==",
      "cpu": [
        "ppc64"
      ],
@@ -2159,9 +2187,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-riscv64-gnu": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.59.0.tgz",
-      "integrity": "sha512-89sepv7h2lIVPsFma8iwmccN7Yjjtgz0Rj/Ou6fEqg3HDhpCa+Et+YSufy27i6b0Wav69Qv4WBNl3Rs6pwhebQ==",
+      "integrity": "sha512-NDYMpsXYJJaj+I7UdwIuHHNxXZ/b/N2hR15NyH3m2qAtb/hHPA4g4SuuvrdxetTdndfj9b1WOmy73kcPRoERUg==",
      "cpu": [
        "riscv64"
      ],
@@ -2173,9 +2201,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-riscv64-musl": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-musl/-/rollup-linux-riscv64-musl-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-musl/-/rollup-linux-riscv64-musl-4.59.0.tgz",
-      "integrity": "sha512-ZcU77ieh0M2Q8Ur7D5X7KvK+UxbXeDHwiOt/CPSBTI1fBmeDMivW0dPkdqkT4rOgDjrDDBUed9x4EgraIKoR2A==",
+      "integrity": "sha512-nLckB8WOqHIf1bhymk+oHxvM9D3tyPndZH8i8+35p/1YiVoVswPid2yLzgX7ZJP0KQvnkhM4H6QZ5m0LzbyIAg==",
      "cpu": [
        "riscv64"
      ],
@@ -2187,9 +2215,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-s390x-gnu": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.59.0.tgz",
-      "integrity": "sha512-2AdWy5RdDF5+4YfG/YesGDDtbyJlC9LHmL6rZw6FurBJ5n4vFGupsOBGfwMRjBYH7qRQowT8D/U4LoSvVwOhSQ==",
+      "integrity": "sha512-oF87Ie3uAIvORFBpwnCvUzdeYUqi2wY6jRFWJAy1qus/udHFYIkplYRW+wo+GRUP4sKzYdmE1Y3+rY5Gc4ZO+w==",
      "cpu": [
        "s390x"
      ],
@@ -2201,9 +2229,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-x64-gnu": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.59.0.tgz",
-      "integrity": "sha512-WGt5J8Ij/rvyqpFexxk3ffKqqbLf9AqrTBbWDk7ApGUzaIs6V+s2s84kAxklFwmMF/vBNGrVdYgbblCOFFezMQ==",
+      "integrity": "sha512-3AHmtQq/ppNuUspKAlvA8HtLybkDflkMuLK4DPo77DfthRb71V84/c4MlWJXixZz4uruIH4uaa07IqoAkG64fg==",
      "cpu": [
        "x64"
      ],
@@ -2215,9 +2243,9 @@
      ]
    },
    "node_modules/@rollup/rollup-linux-x64-musl": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.59.0.tgz",
-      "integrity": "sha512-JzQmb38ATzHjxlPHuTH6tE7ojnMKM2kYNzt44LO/jJi8BpceEC8QuXYA908n8r3CNuG/B3BV8VR3Hi1rYtmPiw==",
+      "integrity": "sha512-2UdiwS/9cTAx7qIUZB/fWtToJwvt0Vbo0zmnYt7ED35KPg13Q0ym1g442THLC7VyI6JfYTP4PiSOWyoMdV2/xg==",
      "cpu": [
        "x64"
      ],
@@ -2228,10 +2256,24 @@
        "linux"
      ]
    },
    "node_modules/@rollup/rollup-openbsd-x64": {
      "version": "4.59.0",
      "resolved": "https://registry.npmjs.org/@rollup/rollup-openbsd-x64/-/rollup-openbsd-x64-4.59.0.tgz",
      "integrity": "sha512-M3bLRAVk6GOwFlPTIxVBSYKUaqfLrn8l0psKinkCFxl4lQvOSz8ZrKDz2gxcBwHFpci0B6rttydI4IpS4IS/jQ==",
      "cpu": [
        "x64"
      ],
      "dev": true,
      "license": "MIT",
      "optional": true,
      "os": [
        "openbsd"
      ]
    },
    "node_modules/@rollup/rollup-openharmony-arm64": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-openharmony-arm64/-/rollup-openharmony-arm64-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-openharmony-arm64/-/rollup-openharmony-arm64-4.59.0.tgz",
-      "integrity": "sha512-huT3fd0iC7jigGh7n3q/+lfPcXxBi+om/Rs3yiFxjvSxbSB6aohDFXbWvlspaqjeOh+hx7DDHS+5Es5qRkWkZg==",
+      "integrity": "sha512-tt9KBJqaqp5i5HUZzoafHZX8b5Q2Fe7UjYERADll83O4fGqJ49O1FsL6LpdzVFQcpwvnyd0i+K/VSwu/o/nWlA==",
      "cpu": [
        "arm64"
      ],
@@ -2243,9 +2285,9 @@
      ]
    },
    "node_modules/@rollup/rollup-win32-arm64-msvc": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.59.0.tgz",
-      "integrity": "sha512-c2V0W1bsKIKfbLMBu/WGBz6Yci8nJ/ZJdheE0EwB73N3MvHYKiKGs3mVilX4Gs70eGeDaMqEob25Tw2Gb9Nqyw==",
+      "integrity": "sha512-V5B6mG7OrGTwnxaNUzZTDTjDS7F75PO1ae6MJYdiMu60sq0CqN5CVeVsbhPxalupvTX8gXVSU9gq+Rx1/hvu6A==",
      "cpu": [
        "arm64"
      ],
@@ -2257,9 +2299,9 @@
      ]
    },
    "node_modules/@rollup/rollup-win32-ia32-msvc": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.59.0.tgz",
-      "integrity": "sha512-woEHgqQqDCkAzrDhvDipnSirm5vxUXtSKDYTVpZG3nUdW/VVB5VdCYA2iReSj/u3yCZzXID4kuKG7OynPnB3WQ==",
+      "integrity": "sha512-UKFMHPuM9R0iBegwzKF4y0C4J9u8C6MEJgFuXTBerMk7EJ92GFVFYBfOZaSGLu6COf7FxpQNqhNS4c4icUPqxA==",
      "cpu": [
        "ia32"
      ],
@@ -2271,9 +2313,9 @@
      ]
    },
    "node_modules/@rollup/rollup-win32-x64-gnu": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-gnu/-/rollup-win32-x64-gnu-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-gnu/-/rollup-win32-x64-gnu-4.59.0.tgz",
-      "integrity": "sha512-dzAc53LOuFvHwbCEOS0rPbXp6SIhAf2txMP5p6mGyOXXw5mWY8NGGbPMPrs4P1WItkfApDathBj/NzMLUZ9rtQ==",
+      "integrity": "sha512-laBkYlSS1n2L8fSo1thDNGrCTQMmxjYY5G0WFWjFFYZkKPjsMBsgJfGf4TLxXrF6RyhI60L8TMOjBMvXiTcxeA==",
      "cpu": [
        "x64"
      ],
@@ -2285,9 +2327,9 @@
      ]
    },
    "node_modules/@rollup/rollup-win32-x64-msvc": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.59.0.tgz",
-      "integrity": "sha512-hYT5d3YNdSh3mbCU1gwQyPgQd3T2ne0A3KG8KSBdav5TiBg6eInVmV+TeR5uHufiIgSFg0XsOWGW5/RhNcSvPg==",
+      "integrity": "sha512-2HRCml6OztYXyJXAvdDXPKcawukWY2GpR5/nxKp4iBgiO3wcoEGkAaqctIbZcNB6KlUQBIqt8VYkNSj2397EfA==",
      "cpu": [
        "x64"
      ],
@@ -3042,24 +3084,37 @@
        "typescript": ">=4.8.4 <6.0.0"
      }
    },
    "node_modules/@typescript-eslint/typescript-estree/node_modules/balanced-match": {
      "version": "4.0.4",
      "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-4.0.4.tgz",
      "integrity": "sha512-BLrgEcRTwX2o6gGxGOCNyMvGSp35YofuYzw9h1IMTRmKqttAZZVU67bdb9Pr2vUHA8+j3i2tJfjO6C6+4myGTA==",
      "dev": true,
      "license": "MIT",
      "engines": {
        "node": "18 || 20 || >=22"
      }
    },
    "node_modules/@typescript-eslint/typescript-estree/node_modules/brace-expansion": {
-      "version": "2.0.2",
+      "version": "5.0.3",
-      "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-2.0.2.tgz",
+      "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-5.0.3.tgz",
-      "integrity": "sha512-Jt0vHyM+jmUBqojB7E1NIYadt0vI0Qxjxd2TErW94wDz+E2LAm5vKMXXwg6ZZBTHPuUlDgQHKXvjGBdfcF1ZDQ==",
+      "integrity": "sha512-fy6KJm2RawA5RcHkLa1z/ScpBeA762UF9KmZQxwIbDtRJrgLzM10depAiEQ+CXYcoiqW1/m96OAAoke2nE9EeA==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
-        "balanced-match": "^1.0.0"
+        "balanced-match": "^4.0.2"
      },
      "engines": {
        "node": "18 || 20 || >=22"
      }
    },
    "node_modules/@typescript-eslint/typescript-estree/node_modules/minimatch": {
-      "version": "9.0.5",
+      "version": "9.0.8",
-      "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-9.0.5.tgz",
+      "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-9.0.8.tgz",
-      "integrity": "sha512-G6T0ZX48xgozx7587koeX9Ys2NYy6Gmv//P89sEte9V9whIapMNF4idKxnW2QtCcLiTWlb/wfCabAtAFWhhBow==",
+      "integrity": "sha512-reYkDYtj/b19TeqbNZCV4q9t+Yxylf/rYBsLb42SXJatTv4/ylq5lEiAmhA/IToxO7NI2UzNMghHoHuaqDkAjw==",
      "dev": true,
      "license": "ISC",
      "dependencies": {
-        "brace-expansion": "^2.0.1"
+        "brace-expansion": "^5.0.2"
      },
      "engines": {
        "node": ">=16 || 14 >=14.17"
@@ -3227,9 +3282,9 @@
      }
    },
    "node_modules/ajv": {
-      "version": "6.12.6",
+      "version": "6.14.0",
-      "resolved": "https://registry.npmjs.org/ajv/-/ajv-6.12.6.tgz",
+      "resolved": "https://registry.npmjs.org/ajv/-/ajv-6.14.0.tgz",
-      "integrity": "sha512-j3fVLgvTo527anyYyJOGTYJbG+vnnQYvE0m5mmkc1TK+nxAppkCLMIL0aZ4dblVCNoGShhm+kzE4ZUykBoMg4g==",
+      "integrity": "sha512-IWrosm/yrn43eiKqkfkHis7QioDleaXQHdDVPKg0FSwwd/DuvyX79TZnFOnYpB7dcsFAMmtFztZuXPDvSePkFw==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
@@ -4757,9 +4812,9 @@
      }
    },
    "node_modules/lodash": {
-      "version": "4.17.21",
+      "version": "4.17.23",
-      "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
+      "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.23.tgz",
-      "integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
+      "integrity": "sha512-LgVTMpQtIopCi79SJeDiP0TfWi5CNEc/L/aRdTh3yIvmZXTnheWpKjSZhnvMl8iXbC1tFg9gdHHDMLoV7CnG+w==",
      "license": "MIT"
    },
    "node_modules/lodash.merge": {
@@ -5664,9 +5719,9 @@
      "license": "MIT"
    },
    "node_modules/minimatch": {
-      "version": "3.1.2",
+      "version": "3.1.5",
-      "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.2.tgz",
+      "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.5.tgz",
-      "integrity": "sha512-J7p63hRiAjw1NDEww1W7i37+ByIrOWO5XQQAzZ3VOcL0PNybwpfmV/N05zFAzwQ9USyEcX6t3UO+K5aqBQOIHw==",
+      "integrity": "sha512-VgjWUsnnT6n+NUk6eZq77zeFdpW2LWDzP6zFGrCbHXiYNul5Dzqk2HHQ5uFH2DNW5Xbp8+jVzaeNt94ssEEl4w==",
      "dev": true,
      "license": "ISC",
      "dependencies": {
@@ -6150,9 +6205,9 @@
      }
    },
    "node_modules/rollup": {
-      "version": "4.54.0",
+      "version": "4.59.0",
-      "resolved": "https://registry.npmjs.org/rollup/-/rollup-4.54.0.tgz",
+      "resolved": "https://registry.npmjs.org/rollup/-/rollup-4.59.0.tgz",
-      "integrity": "sha512-3nk8Y3a9Ea8szgKhinMlGMhGMw89mqule3KWczxhIzqudyHdCIOHw8WJlj/r329fACjKLEh13ZSk7oE22kyeIw==",
+      "integrity": "sha512-2oMpl67a3zCH9H79LeMcbDhXW/UmWG/y2zuqnF2jQq5uq9TbM9TVyXvA4+t+ne2IIkBdrLpAaRQAvo7YI/Yyeg==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
@@ -6166,28 +6221,31 @@
        "npm": ">=8.0.0"
      },
      "optionalDependencies": {
-        "@rollup/rollup-android-arm-eabi": "4.54.0",
+        "@rollup/rollup-android-arm-eabi": "4.59.0",
-        "@rollup/rollup-android-arm64": "4.54.0",
+        "@rollup/rollup-android-arm64": "4.59.0",
-        "@rollup/rollup-darwin-arm64": "4.54.0",
+        "@rollup/rollup-darwin-arm64": "4.59.0",
-        "@rollup/rollup-darwin-x64": "4.54.0",
+        "@rollup/rollup-darwin-x64": "4.59.0",
-        "@rollup/rollup-freebsd-arm64": "4.54.0",
+        "@rollup/rollup-freebsd-arm64": "4.59.0",
-        "@rollup/rollup-freebsd-x64": "4.54.0",
+        "@rollup/rollup-freebsd-x64": "4.59.0",
-        "@rollup/rollup-linux-arm-gnueabihf": "4.54.0",
+        "@rollup/rollup-linux-arm-gnueabihf": "4.59.0",
-        "@rollup/rollup-linux-arm-musleabihf": "4.54.0",
+        "@rollup/rollup-linux-arm-musleabihf": "4.59.0",
-        "@rollup/rollup-linux-arm64-gnu": "4.54.0",
+        "@rollup/rollup-linux-arm64-gnu": "4.59.0",
-        "@rollup/rollup-linux-arm64-musl": "4.54.0",
+        "@rollup/rollup-linux-arm64-musl": "4.59.0",
-        "@rollup/rollup-linux-loong64-gnu": "4.54.0",
+        "@rollup/rollup-linux-loong64-gnu": "4.59.0",
-        "@rollup/rollup-linux-ppc64-gnu": "4.54.0",
+        "@rollup/rollup-linux-loong64-musl": "4.59.0",
-        "@rollup/rollup-linux-riscv64-gnu": "4.54.0",
+        "@rollup/rollup-linux-ppc64-gnu": "4.59.0",
-        "@rollup/rollup-linux-riscv64-musl": "4.54.0",
+        "@rollup/rollup-linux-ppc64-musl": "4.59.0",
-        "@rollup/rollup-linux-s390x-gnu": "4.54.0",
+        "@rollup/rollup-linux-riscv64-gnu": "4.59.0",
-        "@rollup/rollup-linux-x64-gnu": "4.54.0",
+        "@rollup/rollup-linux-riscv64-musl": "4.59.0",
-        "@rollup/rollup-linux-x64-musl": "4.54.0",
+        "@rollup/rollup-linux-s390x-gnu": "4.59.0",
-        "@rollup/rollup-openharmony-arm64": "4.54.0",
+        "@rollup/rollup-linux-x64-gnu": "4.59.0",
-        "@rollup/rollup-win32-arm64-msvc": "4.54.0",
+        "@rollup/rollup-linux-x64-musl": "4.59.0",
-        "@rollup/rollup-win32-ia32-msvc": "4.54.0",
+        "@rollup/rollup-openbsd-x64": "4.59.0",
-        "@rollup/rollup-win32-x64-gnu": "4.54.0",
+        "@rollup/rollup-openharmony-arm64": "4.59.0",
-        "@rollup/rollup-win32-x64-msvc": "4.54.0",
+        "@rollup/rollup-win32-arm64-msvc": "4.59.0",
        "@rollup/rollup-win32-ia32-msvc": "4.59.0",
        "@rollup/rollup-win32-x64-gnu": "4.59.0",
        "@rollup/rollup-win32-x64-msvc": "4.59.0",
        "fsevents": "~2.3.2"
      }
    },
--- a/ui/src/App.tsx
+++ b/ui/src/App.tsx
@@ -130,7 +130,8 @@ function App() {
    const allFeatures = [
      ...(features?.pending ?? []),
      ...(features?.in_progress ?? []),
-      ...(features?.done ?? [])
+      ...(features?.done ?? []),
      ...(features?.needs_human_input ?? [])
    ]
    const feature = allFeatures.find(f => f.id === nodeId)
    if (feature) setSelectedFeature(feature)
@@ -181,7 +182,7 @@ function App() {
      // E : Expand project with AI (when project selected, has spec and has features)
      if ((e.key === 'e' || e.key === 'E') && selectedProject && hasSpec && features &&
-          (features.pending.length + features.in_progress.length + features.done.length) > 0) {
+          (features.pending.length + features.in_progress.length + features.done.length + (features.needs_human_input?.length || 0)) > 0) {
        e.preventDefault()
        setShowExpandProject(true)
      }
@@ -210,8 +211,8 @@ function App() {
        setShowKeyboardHelp(true)
      }
-      // R : Open reset modal (when project selected and agent not running)
+      // R : Open reset modal (when project selected and agent not running/draining)
-      if ((e.key === 'r' || e.key === 'R') && selectedProject && wsState.agentStatus !== 'running') {
+      if ((e.key === 'r' || e.key === 'R') && selectedProject && !['running', 'pausing', 'paused_graceful'].includes(wsState.agentStatus)) {
        e.preventDefault()
        setShowResetModal(true)
      }
@@ -245,7 +246,7 @@ function App() {
  // Combine WebSocket progress with feature data
  const progress = wsState.progress.total > 0 ? wsState.progress : {
    passing: features?.done.length ?? 0,
-    total: (features?.pending.length ?? 0) + (features?.in_progress.length ?? 0) + (features?.done.length ?? 0),
+    total: (features?.pending.length ?? 0) + (features?.in_progress.length ?? 0) + (features?.done.length ?? 0) + (features?.needs_human_input?.length ?? 0),
    percentage: 0,
  }
@@ -380,7 +381,7 @@ function App() {
                      variant="outline"
                      size="sm"
                      aria-label="Reset Project"
-                      disabled={wsState.agentStatus === 'running'}
+                      disabled={['running', 'pausing', 'paused_graceful'].includes(wsState.agentStatus)}
                    >
                      <RotateCcw size={18} />
                    </Button>
@@ -443,6 +444,7 @@ function App() {
             features.pending.length === 0 &&
             features.in_progress.length === 0 &&
             features.done.length === 0 &&
             (features.needs_human_input?.length || 0) === 0 &&
             wsState.agentStatus === 'running' && (
              <Card className="p-8 text-center">
                <CardContent className="p-0">
@@ -458,7 +460,7 @@ function App() {
            )}
            {/* View Toggle - only show when there are features */}
-            {features && (features.pending.length + features.in_progress.length + features.done.length) > 0 && (
+            {features && (features.pending.length + features.in_progress.length + features.done.length + (features.needs_human_input?.length || 0)) > 0 && (
              <div className="flex justify-center">
                <ViewToggle viewMode={viewMode} onViewModeChange={setViewMode} />
              </div>
--- a/ui/src/components/AgentControl.tsx
+++ b/ui/src/components/AgentControl.tsx
@@ -1,8 +1,10 @@
 import { useState, useEffect, useRef, useCallback } from 'react'
-import { Play, Square, Loader2, GitBranch, Clock } from 'lucide-react'
+import { Play, Square, Loader2, GitBranch, Clock, Pause, PlayCircle } from 'lucide-react'
 import {
  useStartAgent,
  useStopAgent,
  useGracefulPauseAgent,
  useGracefulResumeAgent,
  useSettings,
  useUpdateProjectSettings,
 } from '../hooks/useProjects'
@@ -60,12 +62,14 @@ export function AgentControl({ projectName, status, defaultConcurrency = 3 }: Ag
  const startAgent = useStartAgent(projectName)
  const stopAgent = useStopAgent(projectName)
  const gracefulPause = useGracefulPauseAgent(projectName)
  const gracefulResume = useGracefulResumeAgent(projectName)
  const { data: nextRun } = useNextScheduledRun(projectName)
  const [showScheduleModal, setShowScheduleModal] = useState(false)
-  const isLoading = startAgent.isPending || stopAgent.isPending
+  const isLoading = startAgent.isPending || stopAgent.isPending || gracefulPause.isPending || gracefulResume.isPending
-  const isRunning = status === 'running' || status === 'paused'
+  const isRunning = status === 'running' || status === 'paused' || status === 'pausing' || status === 'paused_graceful'
  const isLoadingStatus = status === 'loading'
  const isParallel = concurrency > 1
@@ -126,7 +130,7 @@ export function AgentControl({ projectName, status, defaultConcurrency = 3 }: Ag
          </Badge>
        )}
-        {/* Start/Stop button */}
+        {/* Start/Stop/Pause/Resume buttons */}
        {isLoadingStatus ? (
          <Button disabled variant="outline" size="sm">
            <Loader2 size={18} className="animate-spin" />
@@ -146,19 +150,69 @@ export function AgentControl({ projectName, status, defaultConcurrency = 3 }: Ag
            )}
          </Button>
        ) : (
-          <Button
+          <div className="flex items-center gap-1.5">
-            onClick={handleStop}
+            {/* Pausing indicator */}
-            disabled={isLoading}
+            {status === 'pausing' && (
-            variant="destructive"
+              <Badge variant="secondary" className="gap-1 animate-pulse">
-            size="sm"
+                <Loader2 size={12} className="animate-spin" />
-            title={yoloMode ? 'Stop Agent (YOLO Mode)' : 'Stop Agent'}
+                Pausing...
-          >
+              </Badge>
-            {isLoading ? (
-              <Loader2 size={18} className="animate-spin" />
-            ) : (
-              <Square size={18} />
            )}
-          </Button>
+
            {/* Paused indicator + Resume button */}
            {status === 'paused_graceful' && (
              <>
                <Badge variant="outline" className="gap-1">
                  Paused
                </Badge>
                <Button
                  onClick={() => gracefulResume.mutate()}
                  disabled={isLoading}
                  variant="default"
                  size="sm"
                  title="Resume agent"
                >
                  {gracefulResume.isPending ? (
                    <Loader2 size={18} className="animate-spin" />
                  ) : (
                    <PlayCircle size={18} />
                  )}
                </Button>
              </>
            )}
            {/* Graceful pause button (only when running normally) */}
            {status === 'running' && (
              <Button
                onClick={() => gracefulPause.mutate()}
                disabled={isLoading}
                variant="outline"
                size="sm"
                title="Pause agent (finish current work first)"
              >
                {gracefulPause.isPending ? (
                  <Loader2 size={18} className="animate-spin" />
                ) : (
                  <Pause size={18} />
                )}
              </Button>
            )}
            {/* Stop button (always available) */}
            <Button
              onClick={handleStop}
              disabled={isLoading}
              variant="destructive"
              size="sm"
              title="Stop Agent (immediate)"
            >
              {stopAgent.isPending ? (
                <Loader2 size={18} className="animate-spin" />
              ) : (
                <Square size={18} />
              )}
            </Button>
          </div>
        )}
        {/* Clock button to open schedule modal */}
--- a/ui/src/components/AgentMissionControl.tsx
+++ b/ui/src/components/AgentMissionControl.tsx
@@ -72,9 +72,13 @@ export function AgentMissionControl({
              ? `${agents.length} ${agents.length === 1 ? 'agent' : 'agents'} active`
              : orchestratorStatus?.state === 'initializing'
                ? 'Initializing'
-                : orchestratorStatus?.state === 'complete'
+                : orchestratorStatus?.state === 'draining'
-                  ? 'Complete'
+                  ? 'Draining'
-                  : 'Orchestrating'
+                  : orchestratorStatus?.state === 'paused'
                    ? 'Paused'
                    : orchestratorStatus?.state === 'complete'
                      ? 'Complete'
                      : 'Orchestrating'
            }
          </Badge>
        </div>
--- a/ui/src/components/AgentThought.tsx
+++ b/ui/src/components/AgentThought.tsx
@@ -63,7 +63,7 @@ export function AgentThought({ logs, agentStatus }: AgentThoughtProps) {
  // Determine if component should be visible
  const shouldShow = useMemo(() => {
    if (!thought) return false
-    if (agentStatus === 'running') return true
+    if (agentStatus === 'running' || agentStatus === 'pausing') return true
    if (agentStatus === 'paused') {
      return Date.now() - lastLogTimestamp < IDLE_TIMEOUT
    }
--- a/ui/src/components/DependencyGraph.tsx
+++ b/ui/src/components/DependencyGraph.tsx
@@ -15,7 +15,7 @@ import {
  Handle,
 } from '@xyflow/react'
 import dagre from 'dagre'
-import { CheckCircle2, Circle, Loader2, AlertTriangle, RefreshCw } from 'lucide-react'
+import { CheckCircle2, Circle, Loader2, AlertTriangle, RefreshCw, UserCircle } from 'lucide-react'
 import type { DependencyGraph as DependencyGraphData, GraphNode, ActiveAgent, AgentMascot, AgentState } from '../lib/types'
 import { AgentAvatar } from './AgentAvatar'
 import { Button } from '@/components/ui/button'
@@ -93,18 +93,20 @@ class GraphErrorBoundary extends Component<ErrorBoundaryProps, ErrorBoundaryStat
 // Custom node component
 function FeatureNode({ data }: { data: GraphNode & { onClick?: () => void; agent?: NodeAgentInfo } }) {
-  const statusColors = {
+  const statusColors: Record<string, string> = {
    pending: 'bg-yellow-100 border-yellow-300 dark:bg-yellow-900/30 dark:border-yellow-700',
    in_progress: 'bg-cyan-100 border-cyan-300 dark:bg-cyan-900/30 dark:border-cyan-700',
    done: 'bg-green-100 border-green-300 dark:bg-green-900/30 dark:border-green-700',
    blocked: 'bg-red-50 border-red-300 dark:bg-red-900/20 dark:border-red-700',
    needs_human_input: 'bg-amber-100 border-amber-300 dark:bg-amber-900/30 dark:border-amber-700',
  }
-  const textColors = {
+  const textColors: Record<string, string> = {
    pending: 'text-yellow-900 dark:text-yellow-100',
    in_progress: 'text-cyan-900 dark:text-cyan-100',
    done: 'text-green-900 dark:text-green-100',
    blocked: 'text-red-900 dark:text-red-100',
    needs_human_input: 'text-amber-900 dark:text-amber-100',
  }
  const StatusIcon = () => {
@@ -115,6 +117,8 @@ function FeatureNode({ data }: { data: GraphNode & { onClick?: () => void; agent
        return <Loader2 size={16} className={`${textColors[data.status]} animate-spin`} />
      case 'blocked':
        return <AlertTriangle size={16} className="text-destructive" />
      case 'needs_human_input':
        return <UserCircle size={16} className={textColors[data.status]} />
      default:
        return <Circle size={16} className={textColors[data.status]} />
    }
@@ -323,6 +327,8 @@ function DependencyGraphInner({ graphData, onNodeClick, activeAgents = [] }: Dep
        return '#06b6d4' // cyan-500
      case 'blocked':
        return '#ef4444' // red-500
      case 'needs_human_input':
        return '#f59e0b' // amber-500
      default:
        return '#eab308' // yellow-500
    }
--- a/ui/src/components/FeatureCard.tsx
+++ b/ui/src/components/FeatureCard.tsx
@@ -1,4 +1,4 @@
-import { CheckCircle2, Circle, Loader2, MessageCircle } from 'lucide-react'
+import { CheckCircle2, Circle, Loader2, MessageCircle, UserCircle } from 'lucide-react'
 import type { Feature, ActiveAgent } from '../lib/types'
 import { DependencyBadge } from './DependencyBadge'
 import { AgentAvatar } from './AgentAvatar'
@@ -45,7 +45,8 @@ export function FeatureCard({ feature, onClick, isInProgress, allFeatures = [],
        cursor-pointer transition-all hover:border-primary py-3
        ${isInProgress ? 'animate-pulse' : ''}
        ${feature.passes ? 'border-primary/50' : ''}
-        ${isBlocked && !feature.passes ? 'border-destructive/50 opacity-80' : ''}
+        ${feature.needs_human_input ? 'border-amber-500/50' : ''}
        ${isBlocked && !feature.passes && !feature.needs_human_input ? 'border-destructive/50 opacity-80' : ''}
        ${hasActiveAgent ? 'ring-2 ring-primary ring-offset-2' : ''}
      `}
    >
@@ -105,6 +106,11 @@ export function FeatureCard({ feature, onClick, isInProgress, allFeatures = [],
              <CheckCircle2 size={16} className="text-primary" />
              <span className="text-primary font-medium">Complete</span>
            </>
          ) : feature.needs_human_input ? (
            <>
              <UserCircle size={16} className="text-amber-500" />
              <span className="text-amber-500 font-medium">Needs Your Input</span>
            </>
          ) : isBlocked ? (
            <>
              <Circle size={16} className="text-destructive" />
--- a/ui/src/components/FeatureModal.tsx
+++ b/ui/src/components/FeatureModal.tsx
@@ -1,7 +1,8 @@
 import { useState } from 'react'
-import { X, CheckCircle2, Circle, SkipForward, Trash2, Loader2, AlertCircle, Pencil, Link2, AlertTriangle } from 'lucide-react'
+import { X, CheckCircle2, Circle, SkipForward, Trash2, Loader2, AlertCircle, Pencil, Link2, AlertTriangle, UserCircle } from 'lucide-react'
-import { useSkipFeature, useDeleteFeature, useFeatures } from '../hooks/useProjects'
+import { useSkipFeature, useDeleteFeature, useFeatures, useResolveHumanInput } from '../hooks/useProjects'
 import { EditFeatureForm } from './EditFeatureForm'
 import { HumanInputForm } from './HumanInputForm'
 import type { Feature } from '../lib/types'
 import {
  Dialog,
@@ -50,10 +51,12 @@ export function FeatureModal({ feature, projectName, onClose }: FeatureModalProp
  const deleteFeature = useDeleteFeature(projectName)
  const { data: allFeatures } = useFeatures(projectName)
  const resolveHumanInput = useResolveHumanInput(projectName)
  // Build a map of feature ID to feature for looking up dependency names
  const featureMap = new Map<number, Feature>()
  if (allFeatures) {
-    ;[...allFeatures.pending, ...allFeatures.in_progress, ...allFeatures.done].forEach(f => {
+    ;[...allFeatures.pending, ...allFeatures.in_progress, ...allFeatures.done, ...(allFeatures.needs_human_input || [])].forEach(f => {
      featureMap.set(f.id, f)
    })
  }
@@ -141,6 +144,11 @@ export function FeatureModal({ feature, projectName, onClose }: FeatureModalProp
                <CheckCircle2 size={24} className="text-primary" />
                <span className="font-semibold text-primary">COMPLETE</span>
              </>
            ) : feature.needs_human_input ? (
              <>
                <UserCircle size={24} className="text-amber-500" />
                <span className="font-semibold text-amber-500">NEEDS YOUR INPUT</span>
              </>
            ) : (
              <>
                <Circle size={24} className="text-muted-foreground" />
@@ -152,6 +160,38 @@ export function FeatureModal({ feature, projectName, onClose }: FeatureModalProp
            </span>
          </div>
          {/* Human Input Request */}
          {feature.needs_human_input && feature.human_input_request && (
            <HumanInputForm
              request={feature.human_input_request}
              onSubmit={async (fields) => {
                setError(null)
                try {
                  await resolveHumanInput.mutateAsync({ featureId: feature.id, fields })
                  onClose()
                } catch (err) {
                  setError(err instanceof Error ? err.message : 'Failed to submit response')
                }
              }}
              isLoading={resolveHumanInput.isPending}
            />
          )}
          {/* Previous Human Input Response */}
          {feature.human_input_response && !feature.needs_human_input && (
            <Alert className="border-green-500 bg-green-50 dark:bg-green-950/20">
              <CheckCircle2 className="h-4 w-4 text-green-600" />
              <AlertDescription>
                <h4 className="font-semibold mb-1 text-green-700 dark:text-green-400">Human Input Provided</h4>
                <p className="text-sm text-green-600 dark:text-green-300">
                  Response submitted{feature.human_input_response.responded_at
                    ? ` at ${new Date(feature.human_input_response.responded_at).toLocaleString()}`
                    : ''}.
                </p>
              </AlertDescription>
            </Alert>
          )}
          {/* Description */}
          <div>
            <h3 className="font-semibold mb-2 text-sm uppercase tracking-wide text-muted-foreground">
--- a/ui/src/components/HumanInputForm.tsx
+++ b/ui/src/components/HumanInputForm.tsx
@@ -0,0 +1,150 @@
 import { useState } from 'react'
 import { Loader2, UserCircle, Send } from 'lucide-react'
 import type { HumanInputRequest } from '../lib/types'
 import { Button } from '@/components/ui/button'
 import { Input } from '@/components/ui/input'
 import { Textarea } from '@/components/ui/textarea'
 import { Label } from '@/components/ui/label'
 import { Alert, AlertDescription } from '@/components/ui/alert'
 import { Switch } from '@/components/ui/switch'
 interface HumanInputFormProps {
  request: HumanInputRequest
  onSubmit: (fields: Record<string, string | boolean | string[]>) => Promise<void>
  isLoading: boolean
 }
 export function HumanInputForm({ request, onSubmit, isLoading }: HumanInputFormProps) {
  const [values, setValues] = useState<Record<string, string | boolean | string[]>>(() => {
    const initial: Record<string, string | boolean | string[]> = {}
    for (const field of request.fields) {
      if (field.type === 'boolean') {
        initial[field.id] = false
      } else {
        initial[field.id] = ''
      }
    }
    return initial
  })
  const [validationError, setValidationError] = useState<string | null>(null)
  const handleSubmit = async () => {
    // Validate required fields
    for (const field of request.fields) {
      if (field.required) {
        const val = values[field.id]
        if (val === undefined || val === null || val === '') {
          setValidationError(`"${field.label}" is required`)
          return
        }
      }
    }
    setValidationError(null)
    await onSubmit(values)
  }
  return (
    <Alert className="border-amber-500 bg-amber-50 dark:bg-amber-950/20">
      <UserCircle className="h-5 w-5 text-amber-600" />
      <AlertDescription className="space-y-4">
        <div>
          <h4 className="font-semibold text-amber-700 dark:text-amber-400">Agent needs your help</h4>
          <p className="text-sm text-amber-600 dark:text-amber-300 mt-1">
            {request.prompt}
          </p>
        </div>
        <div className="space-y-3">
          {request.fields.map((field) => (
            <div key={field.id} className="space-y-1.5">
              <Label htmlFor={`human-input-${field.id}`} className="text-sm font-medium text-foreground">
                {field.label}
                {field.required && <span className="text-destructive ml-1">*</span>}
              </Label>
              {field.type === 'text' && (
                <Input
                  id={`human-input-${field.id}`}
                  value={values[field.id] as string}
                  onChange={(e) => setValues(prev => ({ ...prev, [field.id]: e.target.value }))}
                  placeholder={field.placeholder || ''}
                  disabled={isLoading}
                />
              )}
              {field.type === 'textarea' && (
                <Textarea
                  id={`human-input-${field.id}`}
                  value={values[field.id] as string}
                  onChange={(e) => setValues(prev => ({ ...prev, [field.id]: e.target.value }))}
                  placeholder={field.placeholder || ''}
                  disabled={isLoading}
                  rows={3}
                />
              )}
              {field.type === 'select' && field.options && (
                <div className="space-y-1.5">
                  {field.options.map((option) => (
                    <label
                      key={option.value}
                      className={`flex items-center gap-2 p-2 rounded-md border cursor-pointer transition-colors
                        ${values[field.id] === option.value
                          ? 'border-primary bg-primary/10'
                          : 'border-border hover:border-primary/50'}`}
                    >
                      <input
                        type="radio"
                        name={`human-input-${field.id}`}
                        value={option.value}
                        checked={values[field.id] === option.value}
                        onChange={(e) => setValues(prev => ({ ...prev, [field.id]: e.target.value }))}
                        disabled={isLoading}
                        className="accent-primary"
                      />
                      <span className="text-sm">{option.label}</span>
                    </label>
                  ))}
                </div>
              )}
              {field.type === 'boolean' && (
                <div className="flex items-center gap-2">
                  <Switch
                    id={`human-input-${field.id}`}
                    checked={values[field.id] as boolean}
                    onCheckedChange={(checked) => setValues(prev => ({ ...prev, [field.id]: checked }))}
                    disabled={isLoading}
                  />
                  <Label htmlFor={`human-input-${field.id}`} className="text-sm">
                    {values[field.id] ? 'Yes' : 'No'}
                  </Label>
                </div>
              )}
            </div>
          ))}
        </div>
        {validationError && (
          <p className="text-sm text-destructive">{validationError}</p>
        )}
        <Button
          onClick={handleSubmit}
          disabled={isLoading}
          className="w-full"
        >
          {isLoading ? (
            <Loader2 size={16} className="animate-spin" />
          ) : (
            <>
              <Send size={16} />
              Submit Response
            </>
          )}
        </Button>
      </AlertDescription>
    </Alert>
  )
 }
--- a/ui/src/components/KanbanBoard.tsx
+++ b/ui/src/components/KanbanBoard.tsx
@@ -13,13 +13,16 @@ interface KanbanBoardProps {
 }
 export function KanbanBoard({ features, onFeatureClick, onAddFeature, onExpandProject, activeAgents = [], onCreateSpec, hasSpec = true }: KanbanBoardProps) {
-  const hasFeatures = features && (features.pending.length + features.in_progress.length + features.done.length) > 0
+  const hasFeatures = features && (features.pending.length + features.in_progress.length + features.done.length + (features.needs_human_input?.length || 0)) > 0
  // Combine all features for dependency status calculation
  const allFeatures = features
-    ? [...features.pending, ...features.in_progress, ...features.done]
+    ? [...features.pending, ...features.in_progress, ...features.done, ...(features.needs_human_input || [])]
    : []
  const needsInputCount = features?.needs_human_input?.length || 0
  const showNeedsInput = needsInputCount > 0
  if (!features) {
    return (
      <div className="grid grid-cols-1 md:grid-cols-3 gap-6">
@@ -40,7 +43,7 @@ export function KanbanBoard({ features, onFeatureClick, onAddFeature, onExpandPr
  }
  return (
-    <div className="grid grid-cols-1 md:grid-cols-3 gap-6">
+    <div className={`grid grid-cols-1 ${showNeedsInput ? 'md:grid-cols-4' : 'md:grid-cols-3'} gap-6`}>
      <KanbanColumn
        title="Pending"
        count={features.pending.length}
@@ -64,6 +67,17 @@ export function KanbanBoard({ features, onFeatureClick, onAddFeature, onExpandPr
        color="progress"
        onFeatureClick={onFeatureClick}
      />
      {showNeedsInput && (
        <KanbanColumn
          title="Needs Input"
          count={needsInputCount}
          features={features.needs_human_input}
          allFeatures={allFeatures}
          activeAgents={activeAgents}
          color="human_input"
          onFeatureClick={onFeatureClick}
        />
      )}
      <KanbanColumn
        title="Done"
        count={features.done.length}
--- a/ui/src/components/KanbanColumn.tsx
+++ b/ui/src/components/KanbanColumn.tsx
@@ -11,7 +11,7 @@ interface KanbanColumnProps {
  features: Feature[]
  allFeatures?: Feature[]
  activeAgents?: ActiveAgent[]
-  color: 'pending' | 'progress' | 'done'
+  color: 'pending' | 'progress' | 'done' | 'human_input'
  onFeatureClick: (feature: Feature) => void
  onAddFeature?: () => void
  onExpandProject?: () => void
@@ -24,6 +24,7 @@ const colorMap = {
  pending: 'border-t-4 border-t-muted',
  progress: 'border-t-4 border-t-primary',
  done: 'border-t-4 border-t-primary',
  human_input: 'border-t-4 border-t-amber-500',
 }
 export function KanbanColumn({
--- a/ui/src/components/NewProjectModal.tsx
+++ b/ui/src/components/NewProjectModal.tsx
@@ -4,14 +4,15 @@
 * Multi-step modal for creating new projects:
 * 1. Enter project name
 * 2. Select project folder
- * 3. Choose spec method (Claude or manual)
+ * 3. Choose project template (blank or agentic starter)
- * 4a. If Claude: Show SpecCreationChat
+ * 4. Choose spec method (Claude or manual)
- * 4b. If manual: Create project and close
+ * 5a. If Claude: Show SpecCreationChat
 * 5b. If manual: Create project and close
 */
-import { useState } from 'react'
+import { useRef, useState } from 'react'
 import { createPortal } from 'react-dom'
-import { Bot, FileEdit, ArrowRight, ArrowLeft, Loader2, CheckCircle2, Folder } from 'lucide-react'
+import { Bot, FileEdit, ArrowRight, ArrowLeft, Loader2, CheckCircle2, Folder, Zap, FileCode2, AlertCircle, RotateCcw } from 'lucide-react'
 import { useCreateProject } from '../hooks/useProjects'
 import { SpecCreationChat } from './SpecCreationChat'
 import { FolderBrowser } from './FolderBrowser'
@@ -32,8 +33,9 @@ import { Badge } from '@/components/ui/badge'
 import { Card, CardContent } from '@/components/ui/card'
 type InitializerStatus = 'idle' | 'starting' | 'error'
 type ScaffoldStatus = 'idle' | 'running' | 'success' | 'error'
-type Step = 'name' | 'folder' | 'method' | 'chat' | 'complete'
+type Step = 'name' | 'folder' | 'template' | 'method' | 'chat' | 'complete'
 type SpecMethod = 'claude' | 'manual'
 interface NewProjectModalProps {
@@ -57,6 +59,10 @@ export function NewProjectModal({
  const [initializerStatus, setInitializerStatus] = useState<InitializerStatus>('idle')
  const [initializerError, setInitializerError] = useState<string | null>(null)
  const [yoloModeSelected, setYoloModeSelected] = useState(false)
  const [scaffoldStatus, setScaffoldStatus] = useState<ScaffoldStatus>('idle')
  const [scaffoldOutput, setScaffoldOutput] = useState<string[]>([])
  const [scaffoldError, setScaffoldError] = useState<string | null>(null)
  const scaffoldLogRef = useRef<HTMLDivElement>(null)
  // Suppress unused variable warning - specMethod may be used in future
  void _specMethod
@@ -91,13 +97,84 @@ export function NewProjectModal({
  const handleFolderSelect = (path: string) => {
    setProjectPath(path)
-    changeStep('method')
+    changeStep('template')
  }
  const handleFolderCancel = () => {
    changeStep('name')
  }
  const handleTemplateSelect = async (choice: 'blank' | 'agentic-starter') => {
    if (choice === 'blank') {
      changeStep('method')
      return
    }
    if (!projectPath) return
    setScaffoldStatus('running')
    setScaffoldOutput([])
    setScaffoldError(null)
    try {
      const res = await fetch('/api/scaffold/run', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ template: 'agentic-starter', target_path: projectPath }),
      })
      if (!res.ok || !res.body) {
        setScaffoldStatus('error')
        setScaffoldError(`Server error: ${res.status}`)
        return
      }
      const reader = res.body.getReader()
      const decoder = new TextDecoder()
      let buffer = ''
      while (true) {
        const { done, value } = await reader.read()
        if (done) break
        buffer += decoder.decode(value, { stream: true })
        const lines = buffer.split('\n')
        buffer = lines.pop() || ''
        for (const line of lines) {
          if (!line.startsWith('data: ')) continue
          try {
            const event = JSON.parse(line.slice(6))
            if (event.type === 'output') {
              setScaffoldOutput(prev => {
                const next = [...prev, event.line]
                return next.length > 100 ? next.slice(-100) : next
              })
              // Auto-scroll
              setTimeout(() => scaffoldLogRef.current?.scrollTo(0, scaffoldLogRef.current.scrollHeight), 0)
            } else if (event.type === 'complete') {
              if (event.success) {
                setScaffoldStatus('success')
                setTimeout(() => changeStep('method'), 1200)
              } else {
                setScaffoldStatus('error')
                setScaffoldError(`Scaffold exited with code ${event.exit_code}`)
              }
            } else if (event.type === 'error') {
              setScaffoldStatus('error')
              setScaffoldError(event.message)
            }
          } catch {
            // skip malformed SSE lines
          }
        }
      }
    } catch (err) {
      setScaffoldStatus('error')
      setScaffoldError(err instanceof Error ? err.message : 'Failed to run scaffold')
    }
  }
  const handleMethodSelect = async (method: SpecMethod) => {
    setSpecMethod(method)
@@ -188,13 +265,21 @@ export function NewProjectModal({
    setInitializerStatus('idle')
    setInitializerError(null)
    setYoloModeSelected(false)
    setScaffoldStatus('idle')
    setScaffoldOutput([])
    setScaffoldError(null)
    onClose()
  }
  const handleBack = () => {
    if (step === 'method') {
-      changeStep('folder')
+      changeStep('template')
      setSpecMethod(null)
    } else if (step === 'template') {
      changeStep('folder')
      setScaffoldStatus('idle')
      setScaffoldOutput([])
      setScaffoldError(null)
    } else if (step === 'folder') {
      changeStep('name')
      setProjectPath(null)
@@ -255,6 +340,7 @@ export function NewProjectModal({
        <DialogHeader>
          <DialogTitle>
            {step === 'name' && 'Create New Project'}
            {step === 'template' && 'Choose Project Template'}
            {step === 'method' && 'Choose Setup Method'}
            {step === 'complete' && 'Project Created!'}
          </DialogTitle>
@@ -294,7 +380,127 @@ export function NewProjectModal({
          </form>
        )}
-        {/* Step 2: Spec Method */}
+        {/* Step 2: Project Template */}
        {step === 'template' && (
          <div className="space-y-4">
            {scaffoldStatus === 'idle' && (
              <>
                <DialogDescription>
                  Start with a blank project or use a pre-configured template.
                </DialogDescription>
                <div className="space-y-3">
                  <Card
                    className="cursor-pointer hover:border-primary transition-colors"
                    onClick={() => handleTemplateSelect('blank')}
                  >
                    <CardContent className="p-4">
                      <div className="flex items-start gap-4">
                        <div className="p-2 bg-secondary rounded-lg">
                          <FileCode2 size={24} className="text-secondary-foreground" />
                        </div>
                        <div className="flex-1">
                          <span className="font-semibold">Blank Project</span>
                          <p className="text-sm text-muted-foreground mt-1">
                            Start from scratch. AutoForge will scaffold your app based on the spec you define.
                          </p>
                        </div>
                      </div>
                    </CardContent>
                  </Card>
                  <Card
                    className="cursor-pointer hover:border-primary transition-colors"
                    onClick={() => handleTemplateSelect('agentic-starter')}
                  >
                    <CardContent className="p-4">
                      <div className="flex items-start gap-4">
                        <div className="p-2 bg-primary/10 rounded-lg">
                          <Zap size={24} className="text-primary" />
                        </div>
                        <div className="flex-1">
                          <div className="flex items-center gap-2">
                            <span className="font-semibold">Agentic Starter</span>
                            <Badge variant="secondary">Next.js</Badge>
                          </div>
                          <p className="text-sm text-muted-foreground mt-1">
                            Pre-configured Next.js app with BetterAuth, Drizzle ORM, Postgres, and AI capabilities.
                          </p>
                        </div>
                      </div>
                    </CardContent>
                  </Card>
                </div>
                <DialogFooter className="sm:justify-start">
                  <Button variant="ghost" onClick={handleBack}>
                    <ArrowLeft size={16} />
                    Back
                  </Button>
                </DialogFooter>
              </>
            )}
            {scaffoldStatus === 'running' && (
              <div className="space-y-3">
                <div className="flex items-center gap-2">
                  <Loader2 size={16} className="animate-spin text-primary" />
                  <span className="font-medium">Setting up Agentic Starter...</span>
                </div>
                <div
                  ref={scaffoldLogRef}
                  className="bg-muted rounded-lg p-3 max-h-60 overflow-y-auto font-mono text-xs leading-relaxed"
                >
                  {scaffoldOutput.map((line, i) => (
                    <div key={i} className="whitespace-pre-wrap break-all">{line}</div>
                  ))}
                </div>
              </div>
            )}
            {scaffoldStatus === 'success' && (
              <div className="text-center py-6">
                <div className="inline-flex items-center justify-center w-12 h-12 bg-primary/10 rounded-full mb-3">
                  <CheckCircle2 size={24} className="text-primary" />
                </div>
                <p className="font-medium">Template ready!</p>
                <p className="text-sm text-muted-foreground mt-1">Proceeding to setup method...</p>
              </div>
            )}
            {scaffoldStatus === 'error' && (
              <div className="space-y-3">
                <Alert variant="destructive">
                  <AlertCircle size={16} />
                  <AlertDescription>
                    {scaffoldError || 'An unknown error occurred'}
                  </AlertDescription>
                </Alert>
                {scaffoldOutput.length > 0 && (
                  <div className="bg-muted rounded-lg p-3 max-h-40 overflow-y-auto font-mono text-xs leading-relaxed">
                    {scaffoldOutput.slice(-10).map((line, i) => (
                      <div key={i} className="whitespace-pre-wrap break-all">{line}</div>
                    ))}
                  </div>
                )}
                <DialogFooter className="sm:justify-start gap-2">
                  <Button variant="ghost" onClick={handleBack}>
                    <ArrowLeft size={16} />
                    Back
                  </Button>
                  <Button variant="outline" onClick={() => handleTemplateSelect('agentic-starter')}>
                    <RotateCcw size={16} />
                    Retry
                  </Button>
                </DialogFooter>
              </div>
            )}
          </div>
        )}
        {/* Step 3: Spec Method */}
        {step === 'method' && (
          <div className="space-y-4">
            <DialogDescription>
--- a/ui/src/components/OrchestratorAvatar.tsx
+++ b/ui/src/components/OrchestratorAvatar.tsx
@@ -103,6 +103,10 @@ function getStateAnimation(state: OrchestratorState): string {
      return 'animate-working'
    case 'monitoring':
      return 'animate-bounce-gentle'
    case 'draining':
      return 'animate-thinking'
    case 'paused':
      return ''
    case 'complete':
      return 'animate-celebrate'
    default:
@@ -121,6 +125,10 @@ function getStateGlow(state: OrchestratorState): string {
      return 'shadow-[0_0_16px_rgba(124,58,237,0.6)]'
    case 'monitoring':
      return 'shadow-[0_0_8px_rgba(167,139,250,0.4)]'
    case 'draining':
      return 'shadow-[0_0_10px_rgba(251,191,36,0.5)]'
    case 'paused':
      return ''
    case 'complete':
      return 'shadow-[0_0_20px_rgba(112,224,0,0.6)]'
    default:
@@ -141,6 +149,10 @@ function getStateDescription(state: OrchestratorState): string {
      return 'spawning agents'
    case 'monitoring':
      return 'monitoring progress'
    case 'draining':
      return 'draining active agents'
    case 'paused':
      return 'paused'
    case 'complete':
      return 'all features complete'
    default:
--- a/ui/src/components/OrchestratorStatusCard.tsx
+++ b/ui/src/components/OrchestratorStatusCard.tsx
@@ -25,6 +25,10 @@ function getStateText(state: OrchestratorState): string {
      return 'Watching progress...'
    case 'complete':
      return 'Mission accomplished!'
    case 'draining':
      return 'Draining agents...'
    case 'paused':
      return 'Paused'
    default:
      return 'Orchestrating...'
  }
@@ -42,6 +46,10 @@ function getStateColor(state: OrchestratorState): string {
      return 'text-primary'
    case 'initializing':
      return 'text-yellow-600 dark:text-yellow-400'
    case 'draining':
      return 'text-amber-600 dark:text-amber-400'
    case 'paused':
      return 'text-muted-foreground'
    default:
      return 'text-muted-foreground'
  }
--- a/ui/src/components/ProgressDashboard.tsx
+++ b/ui/src/components/ProgressDashboard.tsx
@@ -55,7 +55,7 @@ export function ProgressDashboard({
  const showThought = useMemo(() => {
    if (!thought) return false
-    if (agentStatus === 'running') return true
+    if (agentStatus === 'running' || agentStatus === 'pausing') return true
    if (agentStatus === 'paused') {
      return Date.now() - lastLogTimestamp < IDLE_TIMEOUT
    }
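The `showThought` change in this hunk can be read as a small decision function. A minimal sketch, assuming a standalone helper named `shouldShowThought` (hypothetical, not part of the PR). The idle timeout is passed in because the value of `IDLE_TIMEOUT` is not shown in the hunk, and the final `return false` is an assumption about the lines that follow it:

```typescript
// Mirrors the AgentStatus union from ui/src/lib/types.ts after this change.
type AgentStatus =
  | 'stopped' | 'running' | 'paused' | 'crashed'
  | 'loading' | 'pausing' | 'paused_graceful'

// Hypothetical helper: same checks as the useMemo body, with time injected
// so the behavior can be exercised without a clock.
function shouldShowThought(
  thought: string | null,
  agentStatus: AgentStatus,
  lastLogTimestamp: number,
  now: number,
  idleTimeoutMs: number, // assumed parameter; stands in for IDLE_TIMEOUT
): boolean {
  if (!thought) return false
  // 'pausing' is treated like 'running': agents are still finishing work.
  if (agentStatus === 'running' || agentStatus === 'pausing') return true
  if (agentStatus === 'paused') {
    return now - lastLogTimestamp < idleTimeoutMs
  }
  return false // assumption: every other status hides the thought bubble
}

console.log(shouldShowThought('Planning next step', 'pausing', 0, 0, 30_000)) // true
```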
--- a/ui/src/components/ProjectSelector.tsx
+++ b/ui/src/components/ProjectSelector.tsx
@@ -75,6 +75,7 @@ export function ProjectSelector({
            variant="outline"
            className="min-w-[140px] sm:min-w-[200px] justify-between"
            disabled={isLoading}
            title={selectedProjectData?.path}
          >
            {isLoading ? (
              <Loader2 size={18} className="animate-spin" />
@@ -101,6 +102,7 @@ export function ProjectSelector({
              {projects.map(project => (
                <DropdownMenuItem
                  key={project.name}
                  title={project.path}
                  className={`flex items-center justify-between cursor-pointer ${
                    project.name === selectedProject ? 'bg-primary/10' : ''
                  }`}
--- a/ui/src/components/SettingsModal.tsx
+++ b/ui/src/components/SettingsModal.tsx
@@ -10,6 +10,7 @@ import {
  DialogTitle,
 } from '@/components/ui/dialog'
 import { Switch } from '@/components/ui/switch'
 import { Slider } from '@/components/ui/slider'
 import { Label } from '@/components/ui/label'
 import { Alert, AlertDescription } from '@/components/ui/alert'
 import { Button } from '@/components/ui/button'
@@ -63,6 +64,12 @@ export function SettingsModal({ isOpen, onClose }: SettingsModalProps) {
    }
  }
  const handleTestingBatchSizeChange = (size: number) => {
    if (!updateSettings.isPending) {
      updateSettings.mutate({ testing_batch_size: size })
    }
  }
  const handleProviderChange = (providerId: string) => {
    if (!updateSettings.isPending) {
      updateSettings.mutate({ api_provider: providerId })
@@ -85,6 +92,7 @@ export function SettingsModal({ isOpen, onClose }: SettingsModalProps) {
  const handleSaveCustomBaseUrl = () => {
    if (customBaseUrlInput.trim() && !updateSettings.isPending) {
      updateSettings.mutate({ api_base_url: customBaseUrlInput.trim() })
      setCustomBaseUrlInput('')
    }
  }
@@ -102,12 +110,12 @@ export function SettingsModal({ isOpen, onClose }: SettingsModalProps) {
  const currentProviderInfo: ProviderInfo | undefined = providers.find(p => p.id === currentProvider)
  const isAlternativeProvider = currentProvider !== 'claude'
  const showAuthField = isAlternativeProvider && currentProviderInfo?.requires_auth
-  const showBaseUrlField = currentProvider === 'custom'
+  const showBaseUrlField = currentProvider === 'custom' || currentProvider === 'azure'
  const showCustomModelInput = currentProvider === 'custom' || currentProvider === 'ollama'
  return (
    <Dialog open={isOpen} onOpenChange={(open) => !open && onClose()}>
-      <DialogContent aria-describedby={undefined} className="sm:max-w-sm max-h-[85vh] overflow-y-auto">
+      <DialogContent aria-describedby={undefined} className="sm:max-w-lg max-h-[90vh] overflow-y-auto">
        <DialogHeader>
          <DialogTitle className="flex items-center gap-2">
            Settings
@@ -289,22 +297,38 @@ export function SettingsModal({ isOpen, onClose }: SettingsModalProps) {
              {showBaseUrlField && (
                <div className="space-y-2 pt-1">
                  <Label className="text-sm">Base URL</Label>
-                  <div className="flex gap-2">
-                    <input
-                      type="text"
-                      value={customBaseUrlInput || settings.api_base_url || ''}
-                      onChange={(e) => setCustomBaseUrlInput(e.target.value)}
-                      placeholder="https://api.example.com/v1"
-                      className="flex-1 py-1.5 px-3 text-sm border rounded-md bg-background"
-                    />
-                    <Button
-                      size="sm"
-                      onClick={handleSaveCustomBaseUrl}
-                      disabled={!customBaseUrlInput.trim() || isSaving}
-                    >
-                      Save
-                    </Button>
-                  </div>
+                  {settings.api_base_url && !customBaseUrlInput && (
+                    <div className="flex items-center gap-2 text-sm text-muted-foreground">
+                      <ShieldCheck size={14} className="text-green-500" />
+                      <span className="truncate">{settings.api_base_url}</span>
+                      <Button
+                        variant="ghost"
+                        size="sm"
+                        className="h-auto py-0.5 px-2 text-xs shrink-0"
+                        onClick={() => setCustomBaseUrlInput(settings.api_base_url || '')}
+                      >
+                        Change
+                      </Button>
+                    </div>
+                  )}
+                  {(!settings.api_base_url || customBaseUrlInput) && (
+                    <div className="flex gap-2">
+                      <input
+                        type="text"
+                        value={customBaseUrlInput}
+                        onChange={(e) => setCustomBaseUrlInput(e.target.value)}
+                        placeholder={currentProvider === 'azure' ? 'https://your-resource.services.ai.azure.com/anthropic' : 'https://api.example.com/v1'}
+                        className="flex-1 py-1.5 px-3 text-sm border rounded-md bg-background"
+                      />
+                      <Button
+                        size="sm"
+                        onClick={handleSaveCustomBaseUrl}
+                        disabled={!customBaseUrlInput.trim() || isSaving}
+                      >
+                        Save
+                      </Button>
+                    </div>
+                  )}
                </div>
              )}
            </div>
@@ -415,28 +439,34 @@ export function SettingsModal({ isOpen, onClose }: SettingsModalProps) {
              </div>
            </div>
-            {/* Features per Agent */}
+            {/* Features per Coding Agent */}
            <div className="space-y-2">
-              <Label className="font-medium">Features per Agent</Label>
+              <Label className="font-medium">Features per Coding Agent</Label>
              <p className="text-sm text-muted-foreground">
-                Number of features assigned to each coding agent
+                Number of features assigned to each coding agent session
              </p>
-              <div className="flex rounded-lg border overflow-hidden">
-                {[1, 2, 3].map((size) => (
-                  <button
-                    key={size}
-                    onClick={() => handleBatchSizeChange(size)}
-                    disabled={isSaving}
-                    className={`flex-1 py-2 px-3 text-sm font-medium transition-colors ${
-                      (settings.batch_size ?? 1) === size
-                        ? 'bg-primary text-primary-foreground'
-                        : 'bg-background text-foreground hover:bg-muted'
-                    } ${isSaving ? 'opacity-50 cursor-not-allowed' : ''}`}
-                  >
-                    {size}
-                  </button>
-                ))}
-              </div>
+              <Slider
+                min={1}
+                max={15}
+                value={settings.batch_size ?? 3}
+                onChange={handleBatchSizeChange}
+                disabled={isSaving}
+              />
+            </div>
+
+            {/* Features per Testing Agent */}
+            <div className="space-y-2">
+              <Label className="font-medium">Features per Testing Agent</Label>
+              <p className="text-sm text-muted-foreground">
+                Number of features assigned to each testing agent session
+              </p>
+              <Slider
+                min={1}
+                max={15}
+                value={settings.testing_batch_size ?? 3}
+                onChange={handleTestingBatchSizeChange}
+                disabled={isSaving}
+              />
            </div>
            {/* Update Error */}
--- a/ui/src/components/ui/slider.tsx
+++ b/ui/src/components/ui/slider.tsx
@@ -0,0 +1,44 @@
 import * as React from "react"
 import { cn } from "@/lib/utils"
 interface SliderProps extends Omit<React.InputHTMLAttributes<HTMLInputElement>, 'onChange'> {
  min: number
  max: number
  value: number
  onChange: (value: number) => void
  label?: string
 }
 function Slider({
  className,
  min,
  max,
  value,
  onChange,
  disabled,
  ...props
 }: SliderProps) {
  return (
    <div className={cn("flex items-center gap-3", className)}>
      <input
        type="range"
        min={min}
        max={max}
        value={value}
        onChange={(e) => onChange(Number(e.target.value))}
        disabled={disabled}
        className={cn(
          "slider-input h-2 w-full cursor-pointer appearance-none rounded-full bg-input transition-colors",
          "focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring",
          disabled && "cursor-not-allowed opacity-50"
        )}
        {...props}
      />
      <span className="min-w-[2ch] text-center text-sm font-semibold tabular-nums">
        {value}
      </span>
    </div>
  )
 }
 export { Slider }
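The component converts the input's string value with `Number(e.target.value)` and leaves range enforcement to the browser's `min`/`max` handling. A value that arrives from elsewhere, such as a stale settings payload, gets no such enforcement, so a caller may want to clamp it first. A minimal sketch, assuming a hypothetical helper `toBatchSize` that is not part of this PR; the 1-15 range and the default of 3 are taken from the settings in this change:

```typescript
// Hypothetical helper: coerce a raw string into an integer batch size
// within [min, max], falling back to a default for non-numeric input.
function toBatchSize(raw: string, min = 1, max = 15, fallback = 3): number {
  const n = Number(raw)
  if (!Number.isFinite(n)) return fallback
  return Math.min(max, Math.max(min, Math.round(n)))
}

console.log(toBatchSize('7'))   // 7
console.log(toBatchSize('99'))  // 15
console.log(toBatchSize('abc')) // 3
```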
--- a/ui/src/hooks/useCelebration.ts
+++ b/ui/src/hooks/useCelebration.ts
@@ -137,6 +137,7 @@ function isAllComplete(features: FeatureListResponse | undefined): boolean {
  return (
    features.pending.length === 0 &&
    features.in_progress.length === 0 &&
    (features.needs_human_input?.length || 0) === 0 &&
    features.done.length > 0
  )
 }
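With the added condition, a project that still has features waiting on human input no longer counts as complete. A minimal sketch of that check, assuming trimmed-down types (`FeatureLists` is a hypothetical stand-in for `FeatureListResponse`) and an early return for `undefined` that is not visible in this hunk:

```typescript
// Hypothetical reduced shape; the real FeatureListResponse holds Feature[].
interface FeatureLists {
  pending: unknown[]
  in_progress: unknown[]
  done: unknown[]
  needs_human_input?: unknown[]
}

function isAllComplete(features: FeatureLists | undefined): boolean {
  if (!features) return false // assumption: guard not shown in the diff
  return (
    features.pending.length === 0 &&
    features.in_progress.length === 0 &&
    (features.needs_human_input?.length || 0) === 0 &&
    features.done.length > 0
  )
}

// One feature is done, but another is blocked on human input.
console.log(isAllComplete({ pending: [], in_progress: [], done: [1], needs_human_input: [2] })) // false
```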
--- a/ui/src/hooks/useProjects.ts
+++ b/ui/src/hooks/useProjects.ts
@@ -133,6 +133,18 @@ export function useUpdateFeature(projectName: string) {
  })
 }
 export function useResolveHumanInput(projectName: string) {
  const queryClient = useQueryClient()
  return useMutation({
    mutationFn: ({ featureId, fields }: { featureId: number; fields: Record<string, string | boolean | string[]> }) =>
      api.resolveHumanInput(projectName, featureId, { fields }),
    onSuccess: () => {
      queryClient.invalidateQueries({ queryKey: ['features', projectName] })
    },
  })
 }
 // ============================================================================
 // Agent
 // ============================================================================
@@ -197,6 +209,28 @@ export function useResumeAgent(projectName: string) {
  })
 }
 export function useGracefulPauseAgent(projectName: string) {
  const queryClient = useQueryClient()
  return useMutation({
    mutationFn: () => api.gracefulPauseAgent(projectName),
    onSuccess: () => {
      queryClient.invalidateQueries({ queryKey: ['agent-status', projectName] })
    },
  })
 }
 export function useGracefulResumeAgent(projectName: string) {
  const queryClient = useQueryClient()
  return useMutation({
    mutationFn: () => api.gracefulResumeAgent(projectName),
    onSuccess: () => {
      queryClient.invalidateQueries({ queryKey: ['agent-status', projectName] })
    },
  })
 }
 // ============================================================================
 // Setup
 // ============================================================================
@@ -268,6 +302,7 @@ const DEFAULT_SETTINGS: Settings = {
  testing_agent_ratio: 1,
  playwright_headless: true,
  batch_size: 3,
  testing_batch_size: 3,
  api_provider: 'claude',
  api_base_url: null,
  api_has_auth_token: false,
--- a/ui/src/hooks/useWebSocket.ts
+++ b/ui/src/hooks/useWebSocket.ts
@@ -33,6 +33,7 @@ interface WebSocketState {
  progress: {
    passing: number
    in_progress: number
    needs_human_input: number
    total: number
    percentage: number
  }
@@ -60,7 +61,7 @@ const MAX_AGENT_LOGS = 500 // Keep last 500 log lines per agent
 export function useProjectWebSocket(projectName: string | null) {
  const [state, setState] = useState<WebSocketState>({
-    progress: { passing: 0, in_progress: 0, total: 0, percentage: 0 },
+    progress: { passing: 0, in_progress: 0, needs_human_input: 0, total: 0, percentage: 0 },
    agentStatus: 'loading',
    logs: [],
    isConnected: false,
@@ -107,6 +108,7 @@ export function useProjectWebSocket(projectName: string | null) {
                progress: {
                  passing: message.passing,
                  in_progress: message.in_progress,
                  needs_human_input: message.needs_human_input ?? 0,
                  total: message.total,
                  percentage: message.percentage,
                },
@@ -385,7 +387,7 @@ export function useProjectWebSocket(projectName: string | null) {
    // Reset state when project changes to clear stale data
    // Use 'loading' for agentStatus to show loading indicator until WebSocket provides actual status
    setState({
-      progress: { passing: 0, in_progress: 0, total: 0, percentage: 0 },
+      progress: { passing: 0, in_progress: 0, needs_human_input: 0, total: 0, percentage: 0 },
      agentStatus: 'loading',
      logs: [],
      isConnected: false,
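The progress handler reads `needs_human_input` with a `?? 0` fallback, presumably so that a message without the field still yields a number in state. A minimal sketch of that mapping, assuming a hypothetical pure function `toProgress` and interfaces trimmed to the fields shown in this diff:

```typescript
// Trimmed to the fields visible in the diff; the real message type may carry more.
interface ProgressMessage {
  passing: number
  in_progress: number
  total: number
  percentage: number
  needs_human_input?: number
}

interface Progress {
  passing: number
  in_progress: number
  needs_human_input: number
  total: number
  percentage: number
}

// Hypothetical helper: copy the counters and default the optional one to 0.
function toProgress(message: ProgressMessage): Progress {
  return {
    passing: message.passing,
    in_progress: message.in_progress,
    needs_human_input: message.needs_human_input ?? 0,
    total: message.total,
    percentage: message.percentage,
  }
}

console.log(toProgress({ passing: 2, in_progress: 1, total: 5, percentage: 40 }).needs_human_input) // 0
```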
--- a/ui/src/lib/api.ts
+++ b/ui/src/lib/api.ts
@@ -181,6 +181,17 @@ export async function createFeaturesBulk(
  })
 }
 export async function resolveHumanInput(
  projectName: string,
  featureId: number,
  response: { fields: Record<string, string | boolean | string[]> }
 ): Promise<Feature> {
  return fetchJSON(`/projects/${encodeURIComponent(projectName)}/features/${featureId}/resolve-human-input`, {
    method: 'POST',
    body: JSON.stringify(response),
  })
 }
 // ============================================================================
 // Dependency Graph API
 // ============================================================================
@@ -271,6 +282,18 @@ export async function resumeAgent(projectName: string): Promise<AgentActionRespo
  })
 }
 export async function gracefulPauseAgent(projectName: string): Promise<AgentActionResponse> {
  return fetchJSON(`/projects/${encodeURIComponent(projectName)}/agent/graceful-pause`, {
    method: 'POST',
  })
 }
 export async function gracefulResumeAgent(projectName: string): Promise<AgentActionResponse> {
  return fetchJSON(`/projects/${encodeURIComponent(projectName)}/agent/graceful-resume`, {
    method: 'POST',
  })
 }
 // ============================================================================
 // Spec Creation API
 // ============================================================================
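Each new endpoint interpolates the project name through `encodeURIComponent`, which keeps names containing spaces or slashes from changing the route. A minimal sketch of the path construction only, assuming a hypothetical helper `resolveHumanInputPath`; the actual functions pass the path straight to `fetchJSON`:

```typescript
// Hypothetical helper: build the resolve-human-input path for a feature.
function resolveHumanInputPath(projectName: string, featureId: number): string {
  return `/projects/${encodeURIComponent(projectName)}/features/${featureId}/resolve-human-input`
}

console.log(resolveHumanInputPath('my app/v2', 7))
// /projects/my%20app%2Fv2/features/7/resolve-human-input
```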
--- a/ui/src/lib/types.ts
+++ b/ui/src/lib/types.ts
@@ -57,6 +57,26 @@ export interface ProjectPrompts {
  coding_prompt: string
 }
 // Human input types
 export interface HumanInputField {
  id: string
  label: string
  type: 'text' | 'textarea' | 'select' | 'boolean'
  required: boolean
  placeholder?: string
  options?: { value: string; label: string }[]
 }
 export interface HumanInputRequest {
  prompt: string
  fields: HumanInputField[]
 }
 export interface HumanInputResponseData {
  fields: Record<string, string | boolean | string[]>
  responded_at?: string
 }
 // Feature types
 export interface Feature {
  id: number
@@ -70,10 +90,13 @@ export interface Feature {
  dependencies?: number[]           // Optional for backwards compat
  blocked?: boolean                 // Computed by API
  blocking_dependencies?: number[]  // Computed by API
  needs_human_input?: boolean
  human_input_request?: HumanInputRequest | null
  human_input_response?: HumanInputResponseData | null
 }
 // Status type for graph nodes
-export type FeatureStatus = 'pending' | 'in_progress' | 'done' | 'blocked'
+export type FeatureStatus = 'pending' | 'in_progress' | 'done' | 'blocked' | 'needs_human_input'
 // Graph visualization types
 export interface GraphNode {
@@ -99,6 +122,7 @@ export interface FeatureListResponse {
  pending: Feature[]
  in_progress: Feature[]
  done: Feature[]
  needs_human_input: Feature[]
 }
 export interface FeatureCreate {
@@ -120,7 +144,7 @@ export interface FeatureUpdate {
 }
 // Agent types
-export type AgentStatus = 'stopped' | 'running' | 'paused' | 'crashed' | 'loading'
+export type AgentStatus = 'stopped' | 'running' | 'paused' | 'crashed' | 'loading' | 'pausing' | 'paused_graceful'
 export interface AgentStatusResponse {
  status: AgentStatus
@@ -216,6 +240,8 @@ export type OrchestratorState =
  | 'spawning'
  | 'monitoring'
  | 'complete'
  | 'draining'
  | 'paused'
 // Orchestrator event for recent activity
 export interface OrchestratorEvent {
@@ -248,6 +274,7 @@ export interface WSProgressMessage {
  in_progress: number
  total: number
  percentage: number
  needs_human_input?: number
 }
 export interface WSFeatureUpdateMessage {
@@ -552,7 +579,8 @@ export interface Settings {
  ollama_mode: boolean
  testing_agent_ratio: number  // Regression testing agents (0-3)
  playwright_headless: boolean
-  batch_size: number  // Features per coding agent batch (1-3)
+  batch_size: number  // Features per coding agent batch (1-15)
  testing_batch_size: number  // Features per testing agent batch (1-15)
  api_provider: string
  api_base_url: string | null
  api_has_auth_token: boolean
@@ -565,6 +593,7 @@ export interface SettingsUpdate {
  testing_agent_ratio?: number
  playwright_headless?: boolean
  batch_size?: number
  testing_batch_size?: number
  api_provider?: string
  api_base_url?: string
  api_auth_token?: string
--- a/ui/src/styles/globals.css
+++ b/ui/src/styles/globals.css
@@ -1472,3 +1472,53 @@
 ::-webkit-scrollbar-thumb:hover {
  background: var(--muted-foreground);
 }
 /* ============================================================================
   Slider (range input) styling
   ============================================================================ */
 .slider-input::-webkit-slider-thumb {
  -webkit-appearance: none;
  appearance: none;
  width: 16px;
  height: 16px;
  border-radius: 50%;
  background: var(--primary);
  border: 2px solid var(--primary-foreground);
  box-shadow: var(--shadow-sm);
  cursor: pointer;
  transition: transform 150ms, box-shadow 150ms;
 }
 .slider-input::-webkit-slider-thumb:hover {
  transform: scale(1.15);
  box-shadow: var(--shadow);
 }
 .slider-input::-moz-range-thumb {
  width: 16px;
  height: 16px;
  border-radius: 50%;
  background: var(--primary);
  border: 2px solid var(--primary-foreground);
  box-shadow: var(--shadow-sm);
  cursor: pointer;
  transition: transform 150ms, box-shadow 150ms;
 }
 .slider-input::-moz-range-thumb:hover {
  transform: scale(1.15);
  box-shadow: var(--shadow);
 }
 .slider-input::-webkit-slider-runnable-track {
  height: 8px;
  border-radius: 9999px;
  background: var(--input);
 }
 .slider-input::-moz-range-track {
  height: 8px;
  border-radius: 9999px;
  background: var(--input);
 }
Author	SHA1	Message	Date
Leon van Zyl	fca1f6a5e2	Merge pull request #226 from AutoForgeAI/feat/batch-size-limits-and-testing-batch-setting feat: increase batch size limits to 15 and add testing_batch_size setting	2026-03-20 13:41:54 +02:00
Auto	b15f45c094	version patch	2026-03-20 13:39:56 +02:00
Auto	f999e1937d	0.1.17	2026-03-20 13:39:23 +02:00
Auto	8b2251331d	feat: increase batch size limits to 15 and add testing_batch_size setting Batch size configuration: - Increase coding agent batch size limit from 1-3 to 1-15 - Increase testing agent batch size limit from 1-5 to 1-15 - Add separate `testing_batch_size` setting (previously only CLI-configurable) - Pass testing_batch_size through full stack: schema → settings router → agent router → process manager → CLI flag UI changes: - Replace 3-button batch size selector with range slider (1-15) - Add new Slider component (ui/src/components/ui/slider.tsx) - Add "Features per Testing Agent" slider in settings panel - Add custom slider CSS styling for webkit and mozilla Updated across: CLAUDE.md, autonomous_agent_demo.py, parallel_orchestrator.py, server/{schemas,routers,services}, and UI types/hooks/components. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:39:19 +02:00
Leon van Zyl	7f875c3bbd	Merge pull request #214 from AutoForgeAI/fix/npm-audit-vulnerabilities Fix 4 npm audit vulnerabilities in UI dependencies	2026-02-26 14:10:02 +02:00
Auto	e26ca3761b	fix: resolve 4 npm audit vulnerabilities in UI dependencies Update rollup, minimatch, ajv, and lodash to patched versions via npm audit fix (2 high, 2 moderate → 0 vulnerabilities). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 14:09:08 +02:00
Auto	5d3c04a3c7	0.1.16	2026-02-26 14:04:56 +02:00
Leon van Zyl	df23a978cb	Merge pull request #213 from AutoForgeAI/feat/scaffold-template-selection feat: add scaffold router and project template selection	2026-02-26 14:03:16 +02:00
Auto	41c1a14ae3	feat: add scaffold router and project template selection step Add a new scaffold system that lets users choose a project template (blank or agentic starter) during project creation. This inserts a template selection step between folder selection and spec method choice. Backend: - New server/routers/scaffold.py with SSE streaming endpoint for running hardcoded scaffold commands (npx create-agentic-app) - Path validation, security checks, and cross-platform npx resolution - Registered scaffold_router in server/main.py and routers/__init__.py Frontend (NewProjectModal.tsx): - New "template" step with Blank Project and Agentic Starter cards - Real-time scaffold output streaming with auto-scroll log viewer - Success, error, and retry states with proper back-navigation - Updated step flow: name → folder → template → method → chat/complete Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 13:18:55 +02:00
Leon van Zyl	472064c3da	Merge pull request #212 from AutoForgeAI/fix/rate-limit-and-version-bump fix: resolve false-positive rate limit and version bump to 0.1.15	2026-02-23 13:18:02 +02:00
Auto	afc2f4ac3c	version patch	2026-02-23 13:01:20 +02:00
Auto	dceb535ade	0.1.15	2026-02-23 13:00:47 +02:00
Auto	4f102e7bc2	fix: resolve false-positive rate limit and one-message-behind in chat sessions The Claude Code CLI v2.1.45+ emits a `rate_limit_event` message type that the Python SDK v0.1.19 cannot parse, raising MessageParseError. Two bugs resulted: 1. False-positive rate limit: check_rate_limit_error() matched "rate_limit" in the exception string "Unknown message type: rate_limit_event" via both an explicit type check and a regex fallback, triggering 15-19s backoff + query re-send on every session. 2. One-message-behind: The MessageParseError killed the receive_response() async generator, but the CLI subprocess was still alive with buffered response data. Catching and returning meant the response was never consumed. The next send_message() would read the previous response first, creating a one-behind offset. Changes: - chat_constants.py: check_rate_limit_error() now returns (False, None) for any MessageParseError, blocking both false-positive paths. Added safe_receive_response() helper that retries receive_response() on MessageParseError — the SDK's decoupled producer/consumer architecture (anyio memory channel) allows the new generator to continue reading remaining messages without data loss. Removed calculate_rate_limit_backoff re-export and MAX_CHAT_RATE_LIMIT_RETRIES constant. - spec_chat_session.py, assistant_chat_session.py, expand_chat_session.py: Replaced retry-with-backoff loops with safe_receive_response() wrapper. Removed asyncio.sleep backoff, query re-send, and rate_limited yield. Cleaned up unused imports (asyncio, calculate_rate_limit_backoff, MAX_CHAT_RATE_LIMIT_RETRIES). - agent.py: Added inner retry loop around receive_response() with same MessageParseError skip-and-restart pattern. Removed early-return that truncated responses. - types.ts: Removed SpecChatRateLimitedMessage, AssistantChatRateLimitedMessage, and their union entries. - useSpecChat.ts, useAssistantChat.ts, useExpandChat.ts: Removed dead 'rate_limited' case handlers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 13:00:16 +02:00
Leon van Zyl	9af0f309b7	Merge pull request #211 from AutoForgeAI/fix/rate-limit-event-crash fix: handle rate_limit_event crash in chat sessions	2026-02-23 12:28:28 +02:00
Auto	49442f0d43	version patch	2026-02-23 12:23:02 +02:00
Auto	f786879908	0.1.14	2026-02-23 12:22:06 +02:00
Auto	dcdd06e02e	fix: handle rate_limit_event crash in chat sessions The Claude CLI sends `rate_limit_event` messages that the SDK's `parse_message()` doesn't recognize, raising `MessageParseError` and crashing all three chat session types (spec, assistant, expand). Changes: - Bump claude-agent-sdk minimum from 0.1.0 to 0.1.39 - Add `check_rate_limit_error()` helper in chat_constants.py that detects rate limits from both MessageParseError data payloads and error message text patterns - Wrap `receive_response()` loops in all three `_query_claude()` methods with retry-on-rate-limit logic (up to 3 retries with backoff) - Gracefully log and skip non-rate-limit MessageParseError instead of crashing the session - Add `rate_limited` message type to frontend TypeScript types and handle it in useSpecChat, useAssistantChat, useExpandChat hooks to show "Rate limited. Retrying in Xs..." system messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 12:21:49 +02:00
Auto	b7aef15c3b	feat: add VISION.md and enforce Claude Agent SDK exclusivity in PR reviews - Create VISION.md establishing AutoForge as a Claude Agent SDK wrapper exclusively, rejecting integrations with other AI SDKs/CLIs/platforms - Update review-pr.md step 6 to make vision deviation a merge blocker (previously informational only) and auto-reject PRs modifying VISION.md - Add .claude/launch.json with backend (uvicorn) and frontend (Vite) dev server configurations Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 10:23:42 +02:00
Leon van Zyl	d65fa0ca56	Merge pull request #196 from CaitlynByrne/fix/pr-184-feedback Clean, well-scoped validation improvement. Thanks for the contribution, @CaitlynByrne! 🎉	2026-02-15 10:37:26 +02:00
Caitlyn Byrne	d712e58ff5	fix: stricter field validation for human input requests (#184 feedback) Validate select option structure (value/label keys, non-empty strings) and reject options on non-select field types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 07:44:30 -05:00
Auto	69d9313c07	version patch	2026-02-12 09:48:51 +02:00
Auto	a434767b41	0.1.13	2026-02-12 09:48:21 +02:00
Auto	090dcf977b	chore: enhance PR review workflow and add GLM 5 model - Add merge conflict detection as step 2 in PR review command, surfacing conflicts early before the rest of the review proceeds - Refine merge recommendations: always fix issues on the PR branch before merging rather than merging first and fixing on main afterward - Update verdict definitions (MERGE / MERGE after fixes / DON'T MERGE) with clearer action guidance for each outcome - Add GLM 5 model to the GLM API provider in registry - Clean up ui/package-lock.json (remove unnecessary peer flags) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 09:48:13 +02:00
Auto	ca5fc48443	Merge pull request #184 from CaitlynByrne/feature/blocked-for-human-input feat: add blocked for human input feature	2026-02-12 07:37:11 +02:00
Auto	d846a021b8	fix: address PR #184 review findings for blocked-for-human-input feature A) Graph view: add needs_human_input bucket to handleGraphNodeClick so clicking blocked nodes opens the feature modal B) MCP validation: validate field type enum, require options for select, enforce unique non-empty field IDs and labels C) Progress fallback: include needs_human_input in non-WebSocket total D) WebSocket: track needs_human_input count in progress state E) Cleanup guard: remove unnecessary needs_human_input check in _cleanup_stale_features (resolved via merge conflict) F) Defensive SQL: require in_progress=1 in feature_request_human_input Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 07:36:48 +02:00
Auto	819ebcd112	Merge remote-tracking branch 'origin/master' into feature/blocked-for-human-input # Conflicts: # server/services/process_manager.py	2026-02-12 07:36:11 +02:00
Auto	f4636fdfd5	fix: handle pausing/draining states in UI guards and process cleanup Follow-up fixes after merging PR #183 (graceful pause/drain mode): - process_manager: _stream_output finally block now transitions from pausing/paused_graceful to crashed/stopped (not just running), and cleans up the drain signal file on process exit - App.tsx: block Reset button and R shortcut during pausing/paused_graceful - AgentThought/ProgressDashboard: keep thought bubble visible while pausing - OrchestratorAvatar: add draining/paused cases to animation, glow, and description switch statements - AgentMissionControl: show Draining/Paused badge text for new states - registry.py: remove redundant type annotation to fix mypy no-redef - process_manager.py: add type:ignore for SQLAlchemy Column assignment - websocket.py: reclassify test-pass lines as 'testing' not 'success' - review-pr.md: add post-review recommended action guidance Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 07:28:37 +02:00
Leon van Zyl	c114248b09	Merge pull request #183 from CaitlynByrne/feat/pause-drain feat: add graceful pause (drain mode) for running agents	2026-02-12 07:22:01 +02:00
Auto	76dd4b8d80	version patch	2026-02-11 18:48:44 +02:00
Auto	4e84de3839	0.1.12	2026-02-11 18:48:21 +02:00
Auto	8a934c3374	fix: isolate Playwright CLI browser sessions per agent in parallel mode Set unique PLAYWRIGHT_CLI_SESSION environment variable for each spawned agent subprocess to prevent concurrent agents from sharing a single browser instance and interfering with each other's navigation. - _spawn_coding_agent: session named "coding-{feature_id}" - _spawn_coding_agent_batch: session named "coding-{primary_id}" - _spawn_testing_agent: session named "testing-{counter}" using an incrementing counter (since multiple testing agents can test overlapping features, feature ID alone isn't sufficient) Previously, after migrating from Playwright MCP to CLI, all parallel agents shared the default browser session, causing them to navigate away from each other's pages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 18:48:19 +02:00
Auto	81e8c37f29	feat: expose read-only MCP tools to all agent types, fix settings base URL handling Add feature_get_ready, feature_get_blocked, and feature_get_graph to CODING_AGENT_TOOLS, TESTING_AGENT_TOOLS, and INITIALIZER_AGENT_TOOLS. These read-only tools were available on the MCP server but blocked by the allowed_tools lists, causing "blocked/not allowed" errors when agents tried to query project state. Fix SettingsModal custom base URL input: - Remove fallback to current settings value when saving, so empty input is not silently replaced with the existing URL - Remove .trim() on the input value to prevent cursor jumping while typing - Fix "Change" button pre-fill using empty string instead of space Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 17:09:22 +02:00
Leon van Zyl	6ffbf09b91	Merge pull request #190 from nogataka/feature/azure-claude-provider feat: add Azure Anthropic (Claude) provider support	2026-02-11 16:59:35 +02:00
Auto	d1b0b73b20	version patch	2026-02-11 13:38:55 +02:00
Auto	9fb7926df1	0.1.11	2026-02-11 13:38:30 +02:00
Auto	e9873a2642	feat: migrate browser automation from Playwright MCP to CLI, fix headless setting Major changes across 21 files (755 additions, 196 deletions): Browser Automation Migration: - Add versioned project migration system (prompts.py) with content-based detection and section-level regex replacement for coding/testing prompts - Migrate STEP 5 (browser verification) and BROWSER AUTOMATION sections in coding prompt template to use playwright-cli commands - Migrate STEP 2 and AVAILABLE TOOLS sections in testing prompt template - Migration auto-runs at agent startup (autonomous_agent_demo.py), copies playwright-cli skill, scaffolds .playwright/cli.config.json, updates .gitignore, and stamps .migration_version file - Add playwright-cli command validation to security allowlist (security.py) with tests for allowed subcommands and blocked eval/run-code Headless Browser Setting Fix: - Add _apply_playwright_headless() to process_manager.py that reads/updates .playwright/cli.config.json before agent subprocess launch - Remove dead PLAYWRIGHT_HEADLESS env var that was never consumed - Settings UI toggle now correctly controls visible browser window Playwright CLI Auto-Install: - Add ensurePlaywrightCli() to lib/cli.js for npm global entry point - Add playwright-cli detection + npm install to start.bat, start.sh, start_ui.bat, start_ui.sh for all startup paths Other Improvements: - Add project folder path tooltip to ProjectSelector.tsx dropdown items - Remove legacy Playwright MCP server configuration from client.py - Update CLAUDE.md with playwright-cli skill documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 13:37:03 +02:00
Auto	f285db1ad3	add playwright cli skill	2026-02-11 08:38:53 +02:00
nogataka	d2b3ba9aee	feat: add Azure Anthropic (Claude) provider support - Add "Azure Anthropic (Claude)" to API_PROVIDERS in registry.py with ANTHROPIC_API_KEY auth (required for Claude CLI to route through custom base URL instead of default Anthropic endpoint) - Add Azure env var template to .env.example - Show Base URL input field for Azure provider in Settings UI with "Configured" state and Azure-specific placeholder - Widen Settings modal for better readability with long URLs - Add Azure endpoint detection and "Azure Mode" log label - Rename misleading "GLM Mode" fallback label to "Alternative API" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-10 21:29:05 +09:00
Caitlyn Byrne	656df0fd9a	feat: add "blocked for human input" feature across full stack Agents can now request structured human input when they encounter genuine blockers (API keys, design choices, external configs). The request is displayed in the UI with a dynamic form, and the human's response is stored and made available when the agent resumes. Changes span 21 files + 1 new component: - Database: 3 new columns (needs_human_input, human_input_request, human_input_response) with migration - MCP: new feature_request_human_input tool + guards on existing tools - API: new resolve-human-input endpoint, 4th feature bucket - Orchestrator: skip needs_human_input features in scheduling - Progress: 4-tuple return from count_passing_tests - WebSocket: needs_human_input count in progress messages - UI: conditional 4th Kanban column, HumanInputForm component, amber status indicators, dependency graph support Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 14:11:35 -05:00
Caitlyn Byrne	9721368188	feat: add graceful pause (drain mode) for running agents File-based signal (.pause_drain) lets the orchestrator finish current work before pausing instead of hard-freezing the process tree. New status states pausing/paused_graceful flow through WebSocket to the UI where a Pause button, draining indicator, and Resume button are shown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 13:37:22 -05:00