feat: migrate browser automation from Playwright MCP to CLI, fix headless setting

Major changes across 21 files (755 additions, 196 deletions):

Browser Automation Migration:
- Add versioned project migration system (prompts.py) with content-based detection and section-level regex replacement for coding/testing prompts
- Migrate STEP 5 (browser verification) and BROWSER AUTOMATION sections in coding prompt template to use playwright-cli commands
- Migrate STEP 2 and AVAILABLE TOOLS sections in testing prompt template
- Migration auto-runs at agent startup (autonomous_agent_demo.py), copies playwright-cli skill, scaffolds .playwright/cli.config.json, updates .gitignore, and stamps .migration_version file
- Add playwright-cli command validation to security allowlist (security.py) with tests for allowed subcommands and blocked eval/run-code

Headless Browser Setting Fix:
- Add _apply_playwright_headless() to process_manager.py that reads/updates .playwright/cli.config.json before agent subprocess launch
- Remove dead PLAYWRIGHT_HEADLESS env var that was never consumed
- Settings UI toggle now correctly controls visible browser window

Playwright CLI Auto-Install:
- Add ensurePlaywrightCli() to lib/cli.js for npm global entry point
- Add playwright-cli detection + npm install to start.bat, start.sh, start_ui.bat, start_ui.sh for all startup paths

Other Improvements:
- Add project folder path tooltip to ProjectSelector.tsx dropdown items
- Remove legacy Playwright MCP server configuration from client.py
- Update CLAUDE.md with playwright-cli skill documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 18:33:08 +00:00 · 2026-02-11 13:37:03 +02:00
parent f285db1ad3
commit e9873a2642
21 changed files with 754 additions and 195 deletions
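
The commit message says `_apply_playwright_headless()` reads and updates `.playwright/cli.config.json` before the agent subprocess launches. That function lives in process_manager.py and is not shown in this part of the diff, so the following is only a minimal sketch of the idea. It assumes the config layout matches the file scaffolded in prompts.py (`browser.launchOptions.headless`); the name `apply_headless` and its return convention are hypothetical.

```python
import json
from pathlib import Path


def apply_headless(project_dir: Path, headless: bool) -> bool:
    """Set browser.launchOptions.headless in .playwright/cli.config.json.

    Hypothetical helper, not the commit's implementation. Returns True if the
    file was rewritten, False if it was missing, unreadable, malformed, or
    already had the requested value.
    """
    config_file = project_dir / ".playwright" / "cli.config.json"
    try:
        config = json.loads(config_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file or invalid JSON: leave everything untouched.
        return False
    if not isinstance(config, dict):
        return False

    browser = config.get("browser")
    if not isinstance(browser, dict):
        browser = config["browser"] = {}
    launch_options = browser.get("launchOptions")
    if not isinstance(launch_options, dict):
        launch_options = browser["launchOptions"] = {}

    if launch_options.get("headless") is headless:
        return False  # Already in the requested state, skip the write.

    launch_options["headless"] = headless
    config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Writing the setting into the config file instead of an environment variable matches the commit's stated reason for the fix: the old `PLAYWRIGHT_HEADLESS` variable was never consumed.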
--- a/.claude/templates/coding_prompt.template.md
+++ b/.claude/templates/coding_prompt.template.md
@@ -86,24 +86,33 @@ Implement the chosen feature thoroughly:
 **CRITICAL:** You MUST verify features through the actual UI.
-Use browser automation tools:
+Use `playwright-cli` for browser automation:
- Navigate to the app in a real browser
+- Open the browser: `playwright-cli open http://localhost:PORT`
- Interact like a human user (click, type, scroll)
+- Take a snapshot to see page elements: `playwright-cli snapshot`
- Take screenshots at each step (use inline screenshots only -- do NOT save screenshot files to disk)
+- Read the snapshot YAML file to see element refs
- Verify both functionality AND visual appearance
+- Click elements by ref: `playwright-cli click e5`
 - Type text: `playwright-cli type "search query"`
 - Fill form fields: `playwright-cli fill e3 "value"`
 - Take screenshots: `playwright-cli screenshot`
 - Read the screenshot file to verify visual appearance
 - Check console errors: `playwright-cli console`
 - Close browser when done: `playwright-cli close`
 **Token-efficient workflow:** `playwright-cli screenshot` and `snapshot` save files
 to `.playwright-cli/`. You will see a file link in the output. Read the file only
 when you need to verify visual appearance or find element refs.
 **DO:**
 - Test through the UI with clicks and keyboard input
- Take screenshots to verify visual appearance (inline only, never save to disk)
+- Take screenshots and read them to verify visual appearance
- Check for console errors in browser
+- Check for console errors with `playwright-cli console`
 - Verify complete user workflows end-to-end
 - Always run `playwright-cli close` when finished testing
 **DON'T:**
-
+- Only test with curl commands
- Only test with curl commands (backend testing alone is insufficient)
+- Use JavaScript evaluation to bypass UI (`eval` and `run-code` are blocked)
 - Use JavaScript evaluation to bypass UI (no shortcuts)
 - Skip visual verification
 - Mark tests passing without thorough verification
@@ -145,7 +154,7 @@ Use the feature_mark_passing tool with feature_id=42
 - Combine or consolidate features
 - Reorder features
-**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**
+**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**
 ### STEP 7: COMMIT YOUR PROGRESS
@@ -192,11 +201,15 @@ Before context fills up:
 ## BROWSER AUTOMATION
-Use Playwright MCP tools (`browser_*`) for UI verification. Key tools: `navigate`, `click`, `type`, `fill_form`, `take_screenshot`, `console_messages`, `network_requests`. All tools have auto-wait built in.
+Use `playwright-cli` commands for UI verification. Key commands: `open`, `goto`,
 `snapshot`, `click`, `type`, `fill`, `screenshot`, `console`, `close`.
-**Screenshot rule:** Always use inline mode (base64). NEVER save screenshots as files to disk.
+**How it works:** `playwright-cli` uses a persistent browser daemon. `open` starts it,
 subsequent commands interact via socket, `close` shuts it down. Screenshots and snapshots
 save to `.playwright-cli/` -- read the files when you need to verify content.
-Test like a human user with mouse and keyboard. Use `browser_console_messages` to detect errors. Don't bypass UI with JavaScript evaluation.
+Test like a human user with mouse and keyboard. Use `playwright-cli console` to detect
 JS errors. Don't bypass UI with JavaScript evaluation.
 ---
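
The prompt above tells the agent that `eval` and `run-code` are blocked, and the commit message attributes that check to security.py, which is not shown in this part of the diff. As a rough sketch of how such a subcommand check could look: the allowlist below is assembled from the commands named in the prompt templates, `validate_playwright_cli` is a hypothetical name, and the code assumes the subcommand is always the first argument after the binary name.

```python
import shlex

# Assumed allowlist: the subcommands named in the prompt templates.
ALLOWED_SUBCOMMANDS = {
    "open", "goto", "snapshot", "click", "type", "fill",
    "select", "press", "screenshot", "console", "network", "close",
}
# Subcommands the prompt describes as blocked.
BLOCKED_SUBCOMMANDS = {"eval", "run-code"}


def validate_playwright_cli(command: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a single playwright-cli command string."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse failures are rejected outright.
        return False, "could not parse command"
    if not tokens or tokens[0] != "playwright-cli":
        return False, "not a playwright-cli command"
    if len(tokens) < 2:
        return False, "missing subcommand"

    subcommand = tokens[1]
    if subcommand in BLOCKED_SUBCOMMANDS:
        return False, f"subcommand '{subcommand}' is blocked"
    if subcommand not in ALLOWED_SUBCOMMANDS:
        return False, f"subcommand '{subcommand}' is not on the allowlist"
    return True, ""
```

Checking the blocked set before the allowlist is redundant in this sketch, but it yields a more specific rejection message for the two commands the prompt calls out.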
--- a/.claude/templates/testing_prompt.template.md
+++ b/.claude/templates/testing_prompt.template.md
@@ -31,26 +31,32 @@ For the feature returned:
 1. Read and understand the feature's verification steps
 2. Navigate to the relevant part of the application
 3. Execute each verification step using browser automation
-4. Take screenshots to document the verification (inline only -- do NOT save to disk)
+4. Take screenshots and read them to verify visual appearance
 5. Check for console errors
-Use browser automation tools:
+### Browser Automation (Playwright CLI)
 **Navigation & Screenshots:**
- browser_navigate - Navigate to a URL
+- `playwright-cli open <url>` - Open browser and navigate
- browser_take_screenshot - Capture screenshot (inline mode only -- never save to disk)
+- `playwright-cli goto <url>` - Navigate to URL
- browser_snapshot - Get accessibility tree snapshot
+- `playwright-cli screenshot` - Save screenshot to `.playwright-cli/`
 - `playwright-cli snapshot` - Save page snapshot with element refs to `.playwright-cli/`
 **Element Interaction:**
- browser_click - Click elements
+- `playwright-cli click <ref>` - Click elements (ref from snapshot)
- browser_type - Type text into editable elements
+- `playwright-cli type <text>` - Type text
- browser_fill_form - Fill multiple form fields
+- `playwright-cli fill <ref> <text>` - Fill form fields
- browser_select_option - Select dropdown options
+- `playwright-cli select <ref> <val>` - Select dropdown
- browser_press_key - Press keyboard keys
+- `playwright-cli press <key>` - Keyboard input
 **Debugging:**
- browser_console_messages - Get browser console output (check for errors)
+- `playwright-cli console` - Check for JS errors
- browser_network_requests - Monitor API calls
+- `playwright-cli network` - Monitor API calls
 **Cleanup:**
 - `playwright-cli close` - Close browser when done (ALWAYS do this)
 **Note:** Screenshots and snapshots save to files. Read the file to see the content.
 ### STEP 3: HANDLE RESULTS
@@ -79,7 +85,7 @@ A regression has been introduced. You MUST fix it:
 4. **Verify the fix:**
   - Run through all verification steps again
-   - Take screenshots confirming the fix (inline only, never save to disk)
+   - Take screenshots and read them to confirm the fix
 5. **Mark as passing after fix:**
   ```
@@ -98,7 +104,7 @@ A regression has been introduced. You MUST fix it:
 ---
-## AVAILABLE MCP TOOLS
+## AVAILABLE TOOLS
 ### Feature Management
 - `feature_get_stats` - Get progress overview (passing/in_progress/total counts)
@@ -106,19 +112,17 @@ A regression has been introduced. You MUST fix it:
 - `feature_mark_failing` - Mark a feature as failing (when you find a regression)
 - `feature_mark_passing` - Mark a feature as passing (after fixing a regression)
-### Browser Automation (Playwright)
+### Browser Automation (Playwright CLI)
-All interaction tools have **built-in auto-wait** -- no manual timeouts needed.
+Use `playwright-cli` commands for browser interaction. Key commands:
-
+- `playwright-cli open <url>` - Open browser
- `browser_navigate` - Navigate to URL
+- `playwright-cli goto <url>` - Navigate to URL
- `browser_take_screenshot` - Capture screenshot (inline only, never save to disk)
+- `playwright-cli screenshot` - Take screenshot (saved to `.playwright-cli/`)
- `browser_snapshot` - Get accessibility tree
+- `playwright-cli snapshot` - Get page snapshot with element refs
- `browser_click` - Click elements
+- `playwright-cli click <ref>` - Click element
- `browser_type` - Type text
+- `playwright-cli type <text>` - Type text
- `browser_fill_form` - Fill form fields
+- `playwright-cli fill <ref> <text>` - Fill form field
- `browser_select_option` - Select dropdown
+- `playwright-cli console` - Check for JS errors
- `browser_press_key` - Keyboard input
+- `playwright-cli close` - Close browser (always do this when done)
 - `browser_console_messages` - Check for JS errors
 - `browser_network_requests` - Monitor API calls
 ---
--- a/.gitignore
+++ b/.gitignore
@@ -10,6 +10,10 @@ issues/
 # Browser profiles for parallel agent execution
 .browser-profiles/
 # Playwright CLI daemon artifacts
 .playwright-cli/
 .playwright/
 # Log files
 logs/
 *.log
--- a/.npmignore
+++ b/.npmignore
@@ -28,5 +28,4 @@ start.sh
 start_ui.sh
 start_ui.py
 .claude/agents/
 .claude/skills/
 .claude/settings.json
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -85,7 +85,7 @@ python autonomous_agent_demo.py --project-dir my-app --yolo
 **What's different in YOLO mode:**
 - No regression testing
- No Playwright MCP server (browser automation disabled)
+- No Playwright CLI (browser automation disabled)
 - Features marked passing after lint/type-check succeeds
 - Faster iteration for prototyping
@@ -163,7 +163,7 @@ Publishing: `npm publish` (triggers `prepublishOnly` which builds UI, then publi
 - `autonomous_agent_demo.py` - Entry point for running the agent (supports `--yolo`, `--parallel`, `--batch-size`, `--batch-features`)
 - `autoforge_paths.py` - Central path resolution with dual-path backward compatibility and migration
 - `agent.py` - Agent session loop using Claude Agent SDK
- `client.py` - ClaudeSDKClient configuration with security hooks, MCP servers, and Vertex AI support
+- `client.py` - ClaudeSDKClient configuration with security hooks, feature MCP server, and Vertex AI support
 - `security.py` - Bash command allowlist validation (ALLOWED_COMMANDS whitelist)
 - `prompts.py` - Prompt template loading with project-specific fallback and batch feature prompts
 - `progress.py` - Progress tracking, database queries, webhook notifications
@@ -288,6 +288,9 @@ Projects can be stored in any directory (registered in `~/.autoforge/registry.db
 - `.autoforge/.agent.lock` - Lock file to prevent multiple agent instances
 - `.autoforge/allowed_commands.yaml` - Project-specific bash command allowlist (optional)
 - `.autoforge/.gitignore` - Ignores runtime files
 - `.claude/skills/playwright-cli/` - Playwright CLI skill for browser automation
 - `.playwright/cli.config.json` - Browser configuration (headless, viewport, etc.)
 - `.playwright-cli/` - Playwright CLI daemon artifacts (screenshots, snapshots) - gitignored
 - `CLAUDE.md` - Stays at project root (SDK convention)
 - `app_spec.txt` - Root copy for agent template compatibility
@@ -445,6 +448,7 @@ Alternative providers are configured via the **Settings UI** (gear icon > API Pr
 **Skills** (`.claude/skills/`):
 - `frontend-design` - Distinctive, production-grade UI design
 - `gsd-to-autoforge-spec` - Convert GSD codebase mapping to AutoForge app_spec format
 - `playwright-cli` - Browser automation via Playwright CLI (copied to each project)
 **Other:**
 - `.claude/templates/` - Prompt templates copied to new projects
@@ -479,7 +483,7 @@ When running with `--parallel`, the orchestrator:
 1. Spawns multiple Claude agents as subprocesses (up to `--max-concurrency`)
 2. Each agent claims features atomically via `feature_claim_and_get`
 3. Features blocked by unmet dependencies are skipped
-4. Browser contexts are isolated per agent using `--isolated` flag
+4. Browser sessions are isolated per agent via `PLAYWRIGHT_CLI_SESSION` environment variable
 5. AgentTracker parses output and emits `agent_update` messages for UI
 ### Process Limits (Parallel Mode)
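
The updated CLAUDE.md text says parallel agents are isolated through the `PLAYWRIGHT_CLI_SESSION` environment variable. The code that sets it is not part of this section, so the snippet below is only an illustration of the general pattern: each agent subprocess gets its own copy of the environment with a distinct session name. The helper name `build_agent_env` and the session naming scheme are assumptions.

```python
import os
from typing import Mapping, Optional


def build_agent_env(
    session_name: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of the environment with a per-agent browser session.

    Hypothetical helper: the returned dict is meant to be passed as `env=`
    when launching an agent subprocess. The base mapping is never mutated.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PLAYWRIGHT_CLI_SESSION"] = session_name
    return env
```

For example, `build_agent_env("feature-42")` and `build_agent_env("feature-43")` produce two environments that differ only in the session name, so two agents launched with them would not share a browser session under the behavior CLAUDE.md describes.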
--- a/agent.py
+++ b/agent.py
@@ -240,17 +240,7 @@ async def run_autonomous_agent(
        print_session_header(iteration, is_initializer)
        # Create client (fresh context)
-        # Pass agent_id for browser isolation in multi-agent scenarios
+        client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_type=agent_type)
        import os
        if agent_type == "testing":
            agent_id = f"testing-{os.getpid()}"  # Unique ID for testing agents
        elif feature_ids and len(feature_ids) > 1:
            agent_id = f"batch-{feature_ids[0]}"
        elif feature_id:
            agent_id = f"feature-{feature_id}"
        else:
            agent_id = None
        client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_id=agent_id, agent_type=agent_type)
        # Choose prompt based on agent type
        if agent_type == "initializer":
--- a/autoforge_paths.py
+++ b/autoforge_paths.py
@@ -43,6 +43,7 @@ assistant.db-shm
 .claude_assistant_settings.json
 .claude_settings.expand.*.json
 .progress_cache
 .migration_version
 """
--- a/autonomous_agent_demo.py
+++ b/autonomous_agent_demo.py
@@ -237,6 +237,12 @@ def main() -> None:
    if migrated:
        print(f"Migrated project files to .autoforge/: {', '.join(migrated)}", flush=True)
    # Migrate project to current AutoForge version (idempotent, safe)
    from prompts import migrate_project_to_current
    version_migrated = migrate_project_to_current(project_dir)
    if version_migrated:
        print(f"Upgraded project: {', '.join(version_migrated)}", flush=True)
    # Parse batch testing feature IDs (comma-separated string -> list[int])
    testing_feature_ids: list[int] | None = None
    if args.testing_feature_ids:
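
The startup hook above calls the migration "idempotent, safe". One common way to get that property is a version stamp: run only the steps newer than the stamped version, then record the new version. The sketch below illustrates that pattern under stated assumptions. It is not the body of `migrate_project_to_current`; the names `read_version` and `run_migrations` are hypothetical, and the stamp location `.autoforge/.migration_version` is taken from the docstrings in prompts.py.

```python
from pathlib import Path
from typing import Callable, Dict, List

# A step receives the project directory and returns the names of files it changed.
MigrationStep = Callable[[Path], List[str]]


def read_version(version_file: Path) -> int:
    """Return the stamped version, or 0 if the stamp is missing or invalid."""
    try:
        return int(version_file.read_text(encoding="utf-8").strip())
    except (OSError, ValueError):
        return 0


def run_migrations(project_dir: Path, steps: Dict[int, MigrationStep]) -> List[str]:
    """Run every step whose version is newer than the stamp, in order."""
    version_file = project_dir / ".autoforge" / ".migration_version"
    start = read_version(version_file)
    current = start
    changed: List[str] = []
    for version in sorted(steps):
        if version > current:
            changed.extend(steps[version](project_dir))
            current = version
    if current != start:
        # Stamp only after the steps ran, so a crash mid-way retries next time.
        version_file.parent.mkdir(parents=True, exist_ok=True)
        version_file.write_text(f"{current}\n", encoding="utf-8")
    return changed
```

A second call with the same steps finds the stamp already at the latest version and returns an empty list, which is why running it on every agent startup is cheap.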
--- a/client.py
+++ b/client.py
@@ -21,16 +21,6 @@ from security import SENSITIVE_DIRECTORIES, bash_security_hook
 # Load environment variables from .env file if present
 load_dotenv()
 # Default Playwright headless mode - can be overridden via PLAYWRIGHT_HEADLESS env var
 # When True, browser runs invisibly in background (default - saves CPU)
 # When False, browser window is visible (useful for monitoring agent progress)
 DEFAULT_PLAYWRIGHT_HEADLESS = True
 # Default browser for Playwright - can be overridden via PLAYWRIGHT_BROWSER env var
 # Options: chrome, firefox, webkit, msedge
 # Firefox is recommended for lower CPU usage
 DEFAULT_PLAYWRIGHT_BROWSER = "firefox"
 # Extra read paths for cross-project file access (read-only)
 # Set EXTRA_READ_PATHS environment variable with comma-separated absolute paths
 # Example: EXTRA_READ_PATHS=/Volumes/Data/dev,/Users/shared/libs
@@ -41,6 +31,7 @@ EXTRA_READ_PATHS_VAR = "EXTRA_READ_PATHS"
 # this blocklist and the filesystem browser API share a single source of truth.
 EXTRA_READ_PATHS_BLOCKLIST = SENSITIVE_DIRECTORIES
 def convert_model_for_vertex(model: str) -> str:
    """
    Convert model name format for Vertex AI compatibility.
@@ -72,43 +63,6 @@ def convert_model_for_vertex(model: str) -> str:
    return model
 def get_playwright_headless() -> bool:
    """
    Get the Playwright headless mode setting.
    Reads from PLAYWRIGHT_HEADLESS environment variable, defaults to True.
    Returns True for headless mode (invisible browser), False for visible browser.
    """
    value = os.getenv("PLAYWRIGHT_HEADLESS", str(DEFAULT_PLAYWRIGHT_HEADLESS).lower()).strip().lower()
    truthy = {"true", "1", "yes", "on"}
    falsy = {"false", "0", "no", "off"}
    if value not in truthy | falsy:
        print(f"   - Warning: Invalid PLAYWRIGHT_HEADLESS='{value}', defaulting to {DEFAULT_PLAYWRIGHT_HEADLESS}")
        return DEFAULT_PLAYWRIGHT_HEADLESS
    return value in truthy
 # Valid browsers supported by Playwright MCP
 VALID_PLAYWRIGHT_BROWSERS = {"chrome", "firefox", "webkit", "msedge"}
 def get_playwright_browser() -> str:
    """
    Get the browser to use for Playwright.
    Reads from PLAYWRIGHT_BROWSER environment variable, defaults to firefox.
    Options: chrome, firefox, webkit, msedge
    Firefox is recommended for lower CPU usage.
    """
    value = os.getenv("PLAYWRIGHT_BROWSER", DEFAULT_PLAYWRIGHT_BROWSER).strip().lower()
    if value not in VALID_PLAYWRIGHT_BROWSERS:
        print(f"   - Warning: Invalid PLAYWRIGHT_BROWSER='{value}', "
              f"valid options: {', '.join(sorted(VALID_PLAYWRIGHT_BROWSERS))}. "
              f"Defaulting to {DEFAULT_PLAYWRIGHT_BROWSER}")
        return DEFAULT_PLAYWRIGHT_BROWSER
    return value
 def get_extra_read_paths() -> list[Path]:
    """
    Get extra read-only paths from EXTRA_READ_PATHS environment variable.
@@ -228,41 +182,6 @@ ALL_FEATURE_MCP_TOOLS = sorted(
    set(CODING_AGENT_TOOLS) | set(TESTING_AGENT_TOOLS) | set(INITIALIZER_AGENT_TOOLS)
 )
 # Playwright MCP tools for browser automation.
 # Full set of tools for comprehensive UI testing including drag-and-drop,
 # hover menus, file uploads, tab management, etc.
 PLAYWRIGHT_TOOLS = [
    # Core navigation & screenshots
    "mcp__playwright__browser_navigate",
    "mcp__playwright__browser_navigate_back",
    "mcp__playwright__browser_take_screenshot",
    "mcp__playwright__browser_snapshot",
    # Element interaction
    "mcp__playwright__browser_click",
    "mcp__playwright__browser_type",
    "mcp__playwright__browser_fill_form",
    "mcp__playwright__browser_select_option",
    "mcp__playwright__browser_press_key",
    "mcp__playwright__browser_drag",
    "mcp__playwright__browser_hover",
    "mcp__playwright__browser_file_upload",
    # JavaScript & debugging
    "mcp__playwright__browser_evaluate",
    # "mcp__playwright__browser_run_code",  # REMOVED - causes Playwright MCP server crash
    "mcp__playwright__browser_console_messages",
    "mcp__playwright__browser_network_requests",
    # Browser management
    "mcp__playwright__browser_resize",
    "mcp__playwright__browser_wait_for",
    "mcp__playwright__browser_handle_dialog",
    "mcp__playwright__browser_install",
    "mcp__playwright__browser_close",
    "mcp__playwright__browser_tabs",
 ]
 # Built-in tools available to agents.
 # WebFetch and WebSearch are included so coding agents can look up current
 # documentation for frameworks and libraries they are implementing.
@@ -282,7 +201,6 @@ def create_client(
    project_dir: Path,
    model: str,
    yolo_mode: bool = False,
    agent_id: str | None = None,
    agent_type: str = "coding",
 ):
    """
@@ -291,9 +209,7 @@ def create_client(
    Args:
        project_dir: Directory for the project
        model: Claude model to use
-        yolo_mode: If True, skip Playwright MCP server for rapid prototyping
+        yolo_mode: If True, skip browser testing for rapid prototyping
        agent_id: Optional unique identifier for browser isolation in parallel mode.
                  When provided, each agent gets its own browser profile.
        agent_type: One of "coding", "testing", or "initializer". Controls which
                    MCP tools are exposed and the max_turns limit.
@@ -327,11 +243,8 @@ def create_client(
    }
    max_turns = max_turns_map.get(agent_type, 300)
-    # Build allowed tools list based on mode and agent type.
+    # Build allowed tools list based on agent type.
    # In YOLO mode, exclude Playwright tools for faster prototyping.
    allowed_tools = [*BUILTIN_TOOLS, *feature_tools]
    if not yolo_mode:
        allowed_tools.extend(PLAYWRIGHT_TOOLS)
    # Build permissions list.
    # We permit ALL feature MCP tools at the security layer (so the MCP server
@@ -363,10 +276,6 @@ def create_client(
        permissions_list.append(f"Glob({path}/**)")
        permissions_list.append(f"Grep({path}/**)")
    if not yolo_mode:
        # Allow Playwright MCP tools for browser automation (standard mode only)
        permissions_list.extend(PLAYWRIGHT_TOOLS)
    # Create comprehensive security settings
    # Note: Using relative paths ("./**") restricts access to project directory
    # since cwd is set to project_dir
@@ -395,9 +304,9 @@ def create_client(
        print(f"   - Extra read paths (validated): {', '.join(str(p) for p in extra_read_paths)}")
    print("   - Bash commands restricted to allowlist (see security.py)")
    if yolo_mode:
-        print("   - MCP servers: features (database) - YOLO MODE (no Playwright)")
+        print("   - MCP servers: features (database) - YOLO MODE (no browser testing)")
    else:
-        print("   - MCP servers: playwright (browser), features (database)")
+        print("   - MCP servers: features (database)")
    print("   - Project settings enabled (skills, commands, CLAUDE.md)")
    print()
@@ -421,36 +330,6 @@ def create_client(
            },
        },
    }
    if not yolo_mode:
        # Include Playwright MCP server for browser automation (standard mode only)
        # Browser and headless mode configurable via environment variables
        browser = get_playwright_browser()
        playwright_args = [
            "@playwright/mcp@latest",
            "--viewport-size", "1280x720",
            "--browser", browser,
        ]
        if get_playwright_headless():
            playwright_args.append("--headless")
        print(f"   - Browser: {browser} (headless={get_playwright_headless()})")
        # Browser isolation for parallel execution
        # Each agent gets its own isolated browser context to prevent tab conflicts
        if agent_id:
            # Use --isolated for ephemeral browser context
            # This creates a fresh, isolated context without persistent state
            # Note: --isolated and --user-data-dir are mutually exclusive
            playwright_args.append("--isolated")
            print(f"   - Browser isolation enabled for agent: {agent_id}")
        mcp_servers["playwright"] = {
            "command": "npx",
            "args": playwright_args,
            "env": {
                "NODE_COMPILE_CACHE": "",  # Disable V8 compile caching to prevent .node file accumulation in %TEMP%
            },
        }
    # Build environment overrides for API endpoint configuration
    # Uses get_effective_sdk_env() which reads provider settings from the database,
    # ensuring UI-configured alternative providers (GLM, Ollama, Kimi, Custom) propagate
--- a/lib/cli.js
+++ b/lib/cli.js
@@ -517,6 +517,41 @@ function killProcess(pid) {
  }
 }
 // ---------------------------------------------------------------------------
 // Playwright CLI
 // ---------------------------------------------------------------------------
 /**
 * Ensure playwright-cli is available globally for browser automation.
 * Returns true if available (already installed or freshly installed).
 *
 * @param {boolean} showProgress - If true, print install progress
 */
 function ensurePlaywrightCli(showProgress) {
  try {
    execSync('playwright-cli --version', {
      timeout: 10_000,
      stdio: ['pipe', 'pipe', 'pipe'],
    });
    return true;
  } catch {
    // Not installed — try to install
  }
  if (showProgress) {
    log('      Installing playwright-cli for browser automation...');
  }
  try {
    execSync('npm install -g @playwright/cli', {
      timeout: 120_000,
      stdio: ['pipe', 'pipe', 'pipe'],
    });
    return true;
  } catch {
    return false;
  }
 }
 // ---------------------------------------------------------------------------
 // CLI commands
 // ---------------------------------------------------------------------------
@@ -613,6 +648,14 @@ function startServer(opts) {
  }
  const wasAlreadyReady = ensureVenv(python, repair);
  // Ensure playwright-cli for browser automation (quick check, installs once)
  if (!ensurePlaywrightCli(!wasAlreadyReady)) {
    log('');
    log('  Note: playwright-cli not available (browser automation will be limited)');
    log('  Install manually: npm install -g @playwright/cli');
    log('');
  }
  // Step 3: Config file
  const configCreated = ensureEnvFile();
--- a/package.json
+++ b/package.json
@@ -19,6 +19,7 @@
    "ui/dist/",
    "ui/package.json",
    ".claude/commands/",
    ".claude/skills/",
    ".claude/templates/",
    "examples/",
    "start.py",
--- a/prompts.py
+++ b/prompts.py
@@ -16,6 +16,9 @@ from pathlib import Path
 # Base templates location (generic templates)
 TEMPLATES_DIR = Path(__file__).parent / ".claude" / "templates"
 # Migration version — bump when adding new migration steps
 CURRENT_MIGRATION_VERSION = 1
 def get_project_prompts_dir(project_dir: Path) -> Path:
    """Get the prompts directory for a specific project."""
@@ -99,9 +102,9 @@ def _strip_browser_testing_sections(prompt: str) -> str:
        flags=re.DOTALL,
    )
-    # Replace the screenshots-only marking rule with YOLO-appropriate wording
+    # Replace the marking rule with YOLO-appropriate wording
    prompt = prompt.replace(
-        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**",
+        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**",
        "**YOLO mode: Mark a feature as passing after lint/type-check succeeds and server starts cleanly.**",
    )
@@ -351,9 +354,70 @@ def scaffold_project_prompts(project_dir: Path) -> Path:
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not copy allowed_commands.yaml: {e}")
    # Copy Playwright CLI skill for browser automation
    skills_src = Path(__file__).parent / ".claude" / "skills" / "playwright-cli"
    skills_dest = project_dir / ".claude" / "skills" / "playwright-cli"
    if skills_src.exists() and not skills_dest.exists():
        try:
            shutil.copytree(skills_src, skills_dest)
            copied_files.append(".claude/skills/playwright-cli/")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not copy playwright-cli skill: {e}")
    # Ensure .playwright-cli/ and .playwright/ are in project .gitignore
    project_gitignore = project_dir / ".gitignore"
    entries_to_add = [".playwright-cli/", ".playwright/"]
    existing_lines: list[str] = []
    if project_gitignore.exists():
        try:
            existing_lines = project_gitignore.read_text(encoding="utf-8").splitlines()
        except (OSError, PermissionError):
            pass
    missing_entries = [e for e in entries_to_add if e not in existing_lines]
    if missing_entries:
        try:
            with open(project_gitignore, "a", encoding="utf-8") as f:
                # Add newline before entries if file doesn't end with one
                if existing_lines and existing_lines[-1].strip():
                    f.write("\n")
                for entry in missing_entries:
                    f.write(f"{entry}\n")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not update .gitignore: {e}")
    # Scaffold .playwright/cli.config.json for browser settings
    playwright_config_dir = project_dir / ".playwright"
    playwright_config_file = playwright_config_dir / "cli.config.json"
    if not playwright_config_file.exists():
        try:
            playwright_config_dir.mkdir(parents=True, exist_ok=True)
            import json
            config = {
                "browser": {
                    "browserName": "chromium",
                    "launchOptions": {
                        "channel": "chrome",
                        "headless": True,
                    },
                    "contextOptions": {
                        "viewport": {"width": 1280, "height": 720},
                    },
                    "isolated": True,
                },
            }
            with open(playwright_config_file, "w", encoding="utf-8") as f:
                json.dump(config, f, indent=2)
                f.write("\n")
            copied_files.append(".playwright/cli.config.json")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not create playwright config: {e}")
    if copied_files:
        print(f"  Created project files: {', '.join(copied_files)}")
    # Stamp new projects at the current migration version so they never trigger migration
    _set_migration_version(project_dir, CURRENT_MIGRATION_VERSION)
    return project_prompts
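
The commit message describes the prompt migration as "section-level regex replacement". The actual patterns are not visible in this part of the diff, so the following is a sketch of one way to swap a single Markdown section while leaving the rest of a prompt file untouched. The function name `replace_section` and the rule for where a section ends (the next heading of the same or higher level, a `---` rule, or end of file) are assumptions.

```python
import re
from typing import Tuple


def replace_section(text: str, heading: str, replacement: str) -> Tuple[str, bool]:
    """Replace one Markdown section, from its heading up to the next boundary.

    `heading` is the literal start of the heading line, e.g. "## BROWSER AUTOMATION".
    Returns the new text and whether a replacement happened.
    """
    level = len(heading) - len(heading.lstrip("#"))
    pattern = re.compile(
        # Heading line, then the shortest body that ends right before a
        # same-or-higher-level heading, a horizontal rule, or end of text.
        rf"^{re.escape(heading)}[^\n]*\n.*?(?=^#{{1,{level}}} |^---[ \t]*$|\Z)",
        flags=re.DOTALL | re.MULTILINE,
    )
    # A callable replacement keeps backslashes in `replacement` literal.
    new_text, count = pattern.subn(lambda _match: replacement, text, count=1)
    return new_text, count == 1
```

Because the lookahead does not consume the boundary, the `---` rule or the next heading stays in place, and the caller controls spacing by ending `replacement` with the blank line it wants.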
@@ -425,3 +489,330 @@ def copy_spec_to_project(project_dir: Path) -> None:
            return
    print("Warning: No app_spec.txt found to copy to project directory")
 # ---------------------------------------------------------------------------
 # Project version migration
 # ---------------------------------------------------------------------------
 # Replacement content: coding_prompt.md STEP 5 section (Playwright CLI)
 _CLI_STEP5_CONTENT = """\
 ### STEP 5: VERIFY WITH BROWSER AUTOMATION
 **CRITICAL:** You MUST verify features through the actual UI.
 Use `playwright-cli` for browser automation:
 - Open the browser: `playwright-cli open http://localhost:PORT`
 - Take a snapshot to see page elements: `playwright-cli snapshot`
 - Read the snapshot YAML file to see element refs
 - Click elements by ref: `playwright-cli click e5`
 - Type text: `playwright-cli type "search query"`
 - Fill form fields: `playwright-cli fill e3 "value"`
 - Take screenshots: `playwright-cli screenshot`
 - Read the screenshot file to verify visual appearance
 - Check console errors: `playwright-cli console`
 - Close browser when done: `playwright-cli close`
 **Token-efficient workflow:** `playwright-cli screenshot` and `snapshot` save files
 to `.playwright-cli/`. You will see a file link in the output. Read the file only
 when you need to verify visual appearance or find element refs.
 **DO:**
 - Test through the UI with clicks and keyboard input
 - Take screenshots and read them to verify visual appearance
 - Check for console errors with `playwright-cli console`
 - Verify complete user workflows end-to-end
 - Always run `playwright-cli close` when finished testing
 **DON'T:**
 - Only test with curl commands
 - Use JavaScript evaluation to bypass UI (`eval` and `run-code` are blocked)
 - Skip visual verification
 - Mark tests passing without thorough verification
 """
 # Replacement content: coding_prompt.md BROWSER AUTOMATION reference section
 _CLI_BROWSER_SECTION = """\
 ## BROWSER AUTOMATION
 Use `playwright-cli` commands for UI verification. Key commands: `open`, `goto`,
 `snapshot`, `click`, `type`, `fill`, `screenshot`, `console`, `close`.
 **How it works:** `playwright-cli` uses a persistent browser daemon. `open` starts it,
 subsequent commands interact via socket, `close` shuts it down. Screenshots and snapshots
 save to `.playwright-cli/` -- read the files when you need to verify content.
 Test like a human user with mouse and keyboard. Use `playwright-cli console` to detect
 JS errors. Don't bypass UI with JavaScript evaluation.
 """
 # Replacement content: testing_prompt.md STEP 2 section (Playwright CLI)
 _CLI_TESTING_STEP2 = """\
 ### STEP 2: VERIFY THE FEATURE
 **CRITICAL:** You MUST verify the feature through the actual UI using browser automation.
 For the feature returned:
 1. Read and understand the feature's verification steps
 2. Navigate to the relevant part of the application
 3. Execute each verification step using browser automation
 4. Take screenshots and read them to verify visual appearance
 5. Check for console errors
 ### Browser Automation (Playwright CLI)
 **Navigation & Screenshots:**
 - `playwright-cli open <url>` - Open browser and navigate
 - `playwright-cli goto <url>` - Navigate to URL
 - `playwright-cli screenshot` - Save screenshot to `.playwright-cli/`
 - `playwright-cli snapshot` - Save page snapshot with element refs to `.playwright-cli/`
 **Element Interaction:**
 - `playwright-cli click <ref>` - Click elements (ref from snapshot)
 - `playwright-cli type <text>` - Type text
 - `playwright-cli fill <ref> <text>` - Fill form fields
 - `playwright-cli select <ref> <val>` - Select dropdown
 - `playwright-cli press <key>` - Keyboard input
 **Debugging:**
 - `playwright-cli console` - Check for JS errors
 - `playwright-cli network` - Monitor API calls
 **Cleanup:**
 - `playwright-cli close` - Close browser when done (ALWAYS do this)
 **Note:** Screenshots and snapshots save to files. Read the file to see the content.
 """
 # Replacement content: testing_prompt.md AVAILABLE TOOLS browser subsection
 _CLI_TESTING_TOOLS = """\
 ### Browser Automation (Playwright CLI)
 Use `playwright-cli` commands for browser interaction. Key commands:
 - `playwright-cli open <url>` - Open browser
 - `playwright-cli goto <url>` - Navigate to URL
 - `playwright-cli screenshot` - Take screenshot (saved to `.playwright-cli/`)
 - `playwright-cli snapshot` - Get page snapshot with element refs
 - `playwright-cli click <ref>` - Click element
 - `playwright-cli type <text>` - Type text
 - `playwright-cli fill <ref> <text>` - Fill form field
 - `playwright-cli console` - Check for JS errors
 - `playwright-cli close` - Close browser (always do this when done)
 """
 def _get_migration_version(project_dir: Path) -> int:
    """Read the migration version from .autoforge/.migration_version."""
    from autoforge_paths import get_autoforge_dir
    version_file = get_autoforge_dir(project_dir) / ".migration_version"
    if not version_file.exists():
        return 0
    try:
        return int(version_file.read_text().strip())
    except (ValueError, OSError):
        return 0
 def _set_migration_version(project_dir: Path, version: int) -> None:
    """Write the migration version to .autoforge/.migration_version."""
    from autoforge_paths import get_autoforge_dir
    version_file = get_autoforge_dir(project_dir) / ".migration_version"
    version_file.parent.mkdir(parents=True, exist_ok=True)
    version_file.write_text(str(version))
 def _migrate_coding_prompt_to_cli(content: str) -> str:
    """Replace MCP-based Playwright sections with CLI-based content in coding prompt."""
    # Replace STEP 5 section (from header to just before STEP 5.5)
    content = re.sub(
        r"### STEP 5: VERIFY WITH BROWSER AUTOMATION.*?(?=### STEP 5\.5:)",
        _CLI_STEP5_CONTENT,
        content,
        count=1,
        flags=re.DOTALL,
    )
    # Replace BROWSER AUTOMATION reference section (from header to next ---)
    content = re.sub(
        r"## BROWSER AUTOMATION\n\n.*?(?=---)",
        _CLI_BROWSER_SECTION,
        content,
        count=1,
        flags=re.DOTALL,
    )
    # Replace inline screenshot rule
    content = content.replace(
        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**",
        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**",
    )
    # Replace inline screenshot references (various phrasings from old templates)
    for old_phrase in (
        "(inline only -- do NOT save to disk)",
        "(inline only, never save to disk)",
        "(inline mode only -- never save to disk)",
    ):
        content = content.replace(old_phrase, "(saved to `.playwright-cli/`)")
    return content
 def _migrate_testing_prompt_to_cli(content: str) -> str:
    """Replace MCP-based Playwright sections with CLI-based content in testing prompt."""
    # Replace AVAILABLE TOOLS browser subsection FIRST (before STEP 2, to avoid
    # matching the new CLI subsection header that the STEP 2 replacement inserts).
    # In old prompts, ### Browser Automation (Playwright) only exists in AVAILABLE TOOLS.
    content = re.sub(
        r"### Browser Automation \(Playwright[^)]*\)\n.*?(?=---)",
        _CLI_TESTING_TOOLS,
        content,
        count=1,
        flags=re.DOTALL,
    )
    # Replace STEP 2 verification section (from header to just before STEP 3)
    content = re.sub(
        r"### STEP 2: VERIFY THE FEATURE.*?(?=### STEP 3:)",
        _CLI_TESTING_STEP2,
        content,
        count=1,
        flags=re.DOTALL,
    )
    # Replace inline screenshot references (various phrasings from old templates)
    for old_phrase in (
        "(inline only -- do NOT save to disk)",
        "(inline only, never save to disk)",
        "(inline mode only -- never save to disk)",
    ):
        content = content.replace(old_phrase, "(saved to `.playwright-cli/`)")
    return content
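Reviewer note: both prompt migrations above use the same technique, a non-greedy `re.DOTALL` match from a section header up to a lookahead on the next header. The sketch below isolates that pattern on toy data. `OLD_PROMPT`, `NEW_STEP2`, and `replace_section` are hypothetical names for illustration and are not part of this commit; passing a function as the replacement is a choice made here so that backslashes in the new text are never read as regex escapes.

```python
import re

# Hypothetical stand-ins for an old prompt and its replacement section.
OLD_PROMPT = (
    "### STEP 1: PICK A FEATURE\n"
    "Choose one.\n\n"
    "### STEP 2: VERIFY THE FEATURE\n"
    "Use browser_navigate and browser_take_screenshot.\n\n"
    "### STEP 3: REPORT\n"
    "Summarize.\n"
)
NEW_STEP2 = "### STEP 2: VERIFY THE FEATURE\nUse `playwright-cli` commands.\n\n"


def replace_section(content: str, start_header: str, next_header: str, replacement: str) -> str:
    """Replace everything from start_header up to (not including) next_header."""
    pattern = re.escape(start_header) + r".*?(?=" + re.escape(next_header) + r")"
    # A function replacement keeps any backslashes in the new text literal.
    return re.sub(pattern, lambda _m: replacement, content, count=1, flags=re.DOTALL)


migrated = replace_section(
    OLD_PROMPT, "### STEP 2: VERIFY THE FEATURE", "### STEP 3:", NEW_STEP2
)
# Applying the same replacement to already-migrated text leaves it unchanged.
migrated_again = replace_section(
    migrated, "### STEP 2: VERIFY THE FEATURE", "### STEP 3:", NEW_STEP2
)
```

Because the lookahead does not consume the next header, the following section survives intact, and a second pass is a no-op.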
 def _migrate_v0_to_v1(project_dir: Path) -> list[str]:
    """Migrate from v0 (MCP-based Playwright) to v1 (Playwright CLI).
    Four idempotent sub-steps:
    A. Copy playwright-cli skill to project
    B. Scaffold .playwright/cli.config.json
    C. Update .gitignore with .playwright-cli/ and .playwright/
    D. Update coding_prompt.md and testing_prompt.md
    """
    import json
    migrated: list[str] = []
    # A. Copy Playwright CLI skill
    skills_src = Path(__file__).parent / ".claude" / "skills" / "playwright-cli"
    skills_dest = project_dir / ".claude" / "skills" / "playwright-cli"
    if skills_src.exists() and not skills_dest.exists():
        try:
            shutil.copytree(skills_src, skills_dest)
            migrated.append("Copied playwright-cli skill")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not copy playwright-cli skill: {e}")
    # B. Scaffold .playwright/cli.config.json
    playwright_config_dir = project_dir / ".playwright"
    playwright_config_file = playwright_config_dir / "cli.config.json"
    if not playwright_config_file.exists():
        try:
            playwright_config_dir.mkdir(parents=True, exist_ok=True)
            config = {
                "browser": {
                    "browserName": "chromium",
                    "launchOptions": {
                        "channel": "chrome",
                        "headless": True,
                    },
                    "contextOptions": {
                        "viewport": {"width": 1280, "height": 720},
                    },
                    "isolated": True,
                },
            }
            with open(playwright_config_file, "w", encoding="utf-8") as f:
                json.dump(config, f, indent=2)
                f.write("\n")
            migrated.append("Created .playwright/cli.config.json")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not create playwright config: {e}")
    # C. Update .gitignore
    project_gitignore = project_dir / ".gitignore"
    entries_to_add = [".playwright-cli/", ".playwright/"]
    existing_lines: list[str] = []
    if project_gitignore.exists():
        try:
            existing_lines = project_gitignore.read_text(encoding="utf-8").splitlines()
        except (OSError, PermissionError):
            pass
    missing_entries = [e for e in entries_to_add if e not in existing_lines]
    if missing_entries:
        try:
            with open(project_gitignore, "a", encoding="utf-8") as f:
                if existing_lines and existing_lines[-1].strip():
                    f.write("\n")
                for entry in missing_entries:
                    f.write(f"{entry}\n")
            migrated.append(f"Added {', '.join(missing_entries)} to .gitignore")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not update .gitignore: {e}")
    # D. Update prompts
    prompts_dir = get_project_prompts_dir(project_dir)
    # D1. Update coding_prompt.md
    coding_prompt_path = prompts_dir / "coding_prompt.md"
    if coding_prompt_path.exists():
        try:
            content = coding_prompt_path.read_text(encoding="utf-8")
            if "Playwright MCP" in content or "browser_navigate" in content or "browser_take_screenshot" in content:
                updated = _migrate_coding_prompt_to_cli(content)
                if updated != content:
                    coding_prompt_path.write_text(updated, encoding="utf-8")
                    migrated.append("Updated coding_prompt.md to Playwright CLI")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not update coding_prompt.md: {e}")
    # D2. Update testing_prompt.md
    testing_prompt_path = prompts_dir / "testing_prompt.md"
    if testing_prompt_path.exists():
        try:
            content = testing_prompt_path.read_text(encoding="utf-8")
            if "browser_navigate" in content or "browser_take_screenshot" in content:
                updated = _migrate_testing_prompt_to_cli(content)
                if updated != content:
                    testing_prompt_path.write_text(updated, encoding="utf-8")
                    migrated.append("Updated testing_prompt.md to Playwright CLI")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not update testing_prompt.md: {e}")
    return migrated
 def migrate_project_to_current(project_dir: Path) -> list[str]:
    """Migrate an existing project to the current AutoForge version.
    Idempotent — safe to call on every agent start. Returns list of
    human-readable descriptions of what was migrated.
    """
    current = _get_migration_version(project_dir)
    if current >= CURRENT_MIGRATION_VERSION:
        return []
    migrated: list[str] = []
    if current < 1:
        migrated.extend(_migrate_v0_to_v1(project_dir))
    # Future: if current < 2: migrated.extend(_migrate_v1_to_v2(project_dir))
    _set_migration_version(project_dir, CURRENT_MIGRATION_VERSION)
    return migrated
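Reviewer note: the version gate in `migrate_project_to_current` can be exercised in isolation. The following is a minimal sketch, not the code in this commit: it keeps the stamp file directly in the project directory rather than under `.autoforge/`, and `CURRENT_VERSION`, `read_version`, and `migrate` are hypothetical names.

```python
import tempfile
from pathlib import Path

CURRENT_VERSION = 1  # hypothetical stand-in for CURRENT_MIGRATION_VERSION


def read_version(version_file: Path) -> int:
    """Treat a missing, unreadable, or malformed stamp as version 0."""
    try:
        return int(version_file.read_text().strip())
    except (ValueError, OSError):
        return 0


def migrate(project_dir: Path) -> list[str]:
    """Run each pending step once, then stamp the current version."""
    version_file = project_dir / ".migration_version"
    current = read_version(version_file)
    if current >= CURRENT_VERSION:
        return []
    done: list[str] = []
    if current < 1:
        done.append("v0 -> v1")  # a real step would do its work here
    version_file.write_text(str(CURRENT_VERSION))
    return done


with tempfile.TemporaryDirectory() as tmp:
    first_run = migrate(Path(tmp))
    second_run = migrate(Path(tmp))
```

The first call reports the step and writes the stamp; the second call sees the stamp and returns an empty list, which is what makes running the migration on every agent start cheap.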
--- a/security.py
+++ b/security.py
@@ -66,10 +66,12 @@ ALLOWED_COMMANDS = {
    "bash",
    # Script execution
    "init.sh",  # Init scripts; validated separately
    # Browser automation
    "playwright-cli",  # Playwright CLI for browser testing; validated separately
 }
 # Commands that need additional validation even when in the allowlist
-COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh"}
+COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh", "playwright-cli"}
 # Commands that are NEVER allowed, even with user approval
 # These commands can cause permanent system damage or security breaches
@@ -438,6 +440,37 @@ def validate_init_script(command_string: str) -> tuple[bool, str]:
    return False, f"Only ./init.sh is allowed, got: {script}"
 def validate_playwright_command(command_string: str) -> tuple[bool, str]:
    """
    Validate playwright-cli commands - block dangerous subcommands.
    Blocks `run-code` (arbitrary Node.js execution) and `eval` (arbitrary JS
    evaluation) which bypass the security sandbox.
    Returns:
        Tuple of (is_allowed, reason_if_blocked)
    """
    try:
        tokens = shlex.split(command_string)
    except ValueError:
        return False, "Could not parse playwright-cli command"
    if not tokens:
        return False, "Empty command"
    BLOCKED_SUBCOMMANDS = {"run-code", "eval"}
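    # Find the subcommand: first non-flag token after 'playwright-cli'.
    # Assumes flag values are attached (-s=agent-1); a detached value would be read as the subcommand.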
    for token in tokens[1:]:
        if token.startswith("-"):
            continue  # skip flags like -s=agent-1
        if token in BLOCKED_SUBCOMMANDS:
            return False, f"playwright-cli '{token}' is not allowed"
        break  # first non-flag token is the subcommand
    return True, ""
 def matches_pattern(command: str, pattern: str) -> bool:
    """
    Check if a command matches a pattern.
@@ -955,5 +988,9 @@ async def bash_security_hook(input_data, tool_use_id=None, context=None):
                allowed, reason = validate_init_script(cmd_segment)
                if not allowed:
                    return {"decision": "block", "reason": reason}
            elif cmd == "playwright-cli":
                allowed, reason = validate_playwright_command(cmd_segment)
                if not allowed:
                    return {"decision": "block", "reason": reason}
    return {}
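Reviewer note: the subcommand scan in `validate_playwright_command` treats the first token that does not start with `-` as the subcommand. The sketch below reproduces that rule on its own to show one consequence: a flag whose value is passed as a separate token hides the real subcommand. Whether `playwright-cli` accepts the detached form `-s agent-1` is an assumption that this diff does not confirm; `first_subcommand` and `is_allowed` are hypothetical names.

```python
import shlex
from typing import Optional

BLOCKED = {"run-code", "eval"}


def first_subcommand(command: str) -> Optional[str]:
    """Return the first non-flag token after the program name, if any."""
    tokens = shlex.split(command)
    for token in tokens[1:]:
        if not token.startswith("-"):
            return token
    return None


def is_allowed(command: str) -> bool:
    """Allow the command unless its detected subcommand is blocked."""
    return first_subcommand(command) not in BLOCKED


results = {
    cmd: is_allowed(cmd)
    for cmd in (
        "playwright-cli click e5",
        "playwright-cli -s=agent-1 eval 'document.title'",
        # Detached flag value: 'agent-1' is taken as the subcommand.
        "playwright-cli -s agent-1 eval 'document.title'",
    )
}
```

If the detached form turns out to be valid CLI syntax, a test case for it in `test_playwright_cli_validation` would be worth adding before relying on this check.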
--- a/server/services/process_manager.py
+++ b/server/services/process_manager.py
@@ -227,6 +227,28 @@ class AgentProcessManager:
        """Remove lock file."""
        self.lock_file.unlink(missing_ok=True)
    def _apply_playwright_headless(self, headless: bool) -> None:
        """Update .playwright/cli.config.json with the current headless setting.
        playwright-cli reads this config file on each ``open`` command, so
        updating it before the agent starts is sufficient.
        """
        config_file = self.project_dir / ".playwright" / "cli.config.json"
        if not config_file.exists():
            return
        try:
            import json
            config = json.loads(config_file.read_text(encoding="utf-8"))
            launch_opts = config.get("browser", {}).get("launchOptions", {})
            if launch_opts.get("headless") == headless:
                return  # already correct
            launch_opts["headless"] = headless
            config.setdefault("browser", {})["launchOptions"] = launch_opts
            config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
            logger.info("Set playwright headless=%s for %s", headless, self.project_name)
        except Exception:
            logger.warning("Failed to update playwright config", exc_info=True)
    def _cleanup_stale_features(self) -> None:
        """Clear in_progress flag for all features when agent stops/crashes.
@@ -361,6 +383,15 @@ class AgentProcessManager:
        if not self._check_lock():
            return False, "Another agent instance is already running for this project"
        # Clean up stale browser daemons from previous runs
        try:
            subprocess.run(
                ["playwright-cli", "kill-all"],
                timeout=5, capture_output=True,
            )
        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
            pass
        # Clean up features stuck from a previous crash/stop
        self._cleanup_stale_features()
@@ -397,6 +428,10 @@ class AgentProcessManager:
        # Add --batch-size flag for multi-feature batching
        cmd.extend(["--batch-size", str(batch_size)])
        # Apply headless setting to .playwright/cli.config.json so playwright-cli
        # picks it up (the only mechanism it supports for headless control)
        self._apply_playwright_headless(playwright_headless)
        try:
            # Start subprocess with piped stdout/stderr
            # Use project_dir as cwd so Claude SDK sandbox allows access to project files
@@ -409,7 +444,7 @@ class AgentProcessManager:
            subprocess_env = {
                **os.environ,
                "PYTHONUNBUFFERED": "1",
-                "PLAYWRIGHT_HEADLESS": "true" if playwright_headless else "false",
+                "PLAYWRIGHT_CLI_SESSION": f"agent-{self.project_name}-{os.getpid()}",
                "NODE_COMPILE_CACHE": "",  # Disable V8 compile caching to prevent .node file accumulation in %TEMP%
                **api_env,
            }
@@ -469,6 +504,15 @@ class AgentProcessManager:
                except asyncio.CancelledError:
                    pass
            # Kill browser daemons before stopping agent
            try:
                subprocess.run(
                    ["playwright-cli", "kill-all"],
                    timeout=5, capture_output=True,
                )
            except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
                pass
            # CRITICAL: Kill entire process tree, not just orchestrator
            # This ensures all spawned coding/testing agents are also terminated
            proc = self.process  # Capture reference before async call
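Reviewer note: the headless fix comes down to a read-modify-write of `browser.launchOptions.headless` in the JSON config, skipped when the value already matches. A minimal standalone sketch of that update follows; `set_headless` is a hypothetical helper and, unlike `_apply_playwright_headless`, it lets exceptions propagate instead of logging them.

```python
import json
import tempfile
from pathlib import Path


def set_headless(config_file: Path, headless: bool) -> bool:
    """Set browser.launchOptions.headless; return True if the file was rewritten."""
    config = json.loads(config_file.read_text(encoding="utf-8"))
    launch_opts = config.setdefault("browser", {}).setdefault("launchOptions", {})
    if launch_opts.get("headless") == headless:
        return False  # already correct, leave the file untouched
    launch_opts["headless"] = headless
    config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "cli.config.json"
    cfg.write_text(
        json.dumps({"browser": {"launchOptions": {"headless": True}}}),
        encoding="utf-8",
    )
    changed = set_headless(cfg, False)
    unchanged = set_headless(cfg, False)
    final = json.loads(cfg.read_text(encoding="utf-8"))
```

Skipping the write when nothing changed avoids touching the file's modification time on every agent start.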
--- a/start.bat
+++ b/start.bat
@@ -54,5 +54,15 @@ REM Install dependencies
 echo Installing dependencies...
 pip install -r requirements.txt --quiet
 REM Ensure playwright-cli is available for browser automation
 where playwright-cli >nul 2>&1
 if %ERRORLEVEL% neq 0 (
    echo Installing playwright-cli for browser automation...
    call npm install -g @playwright/cli >nul 2>&1
    REM Use "if errorlevel" here: percent variables are expanded once, when the block is parsed.
    if errorlevel 1 (
        echo Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli
    )
 )
 REM Run the app
 python start.py
--- a/start.sh
+++ b/start.sh
@@ -74,5 +74,14 @@ fi
 echo "Installing dependencies..."
 pip install -r requirements.txt --quiet
 # Ensure playwright-cli is available for browser automation
 if ! command -v playwright-cli &> /dev/null; then
    echo "Installing playwright-cli for browser automation..."
    npm install -g @playwright/cli --quiet 2>/dev/null
    if [ $? -ne 0 ]; then
        echo "Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli"
    fi
 fi
 # Run the app
 python start.py
--- a/start_ui.bat
+++ b/start_ui.bat
@@ -37,5 +37,15 @@ REM Install dependencies
 echo Installing dependencies...
 pip install -r requirements.txt --quiet
 REM Ensure playwright-cli is available for browser automation
 where playwright-cli >nul 2>&1
 if %ERRORLEVEL% neq 0 (
    echo Installing playwright-cli for browser automation...
    call npm install -g @playwright/cli >nul 2>&1
    REM Use "if errorlevel" here: percent variables are expanded once, when the block is parsed.
    if errorlevel 1 (
        echo Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli
    )
 )
 REM Run the Python launcher
 python "%~dp0start_ui.py" %*
--- a/start_ui.sh
+++ b/start_ui.sh
@@ -80,5 +80,14 @@ fi
 echo "Installing dependencies..."
 pip install -r requirements.txt --quiet
 # Ensure playwright-cli is available for browser automation
 if ! command -v playwright-cli &> /dev/null; then
    echo "Installing playwright-cli for browser automation..."
    npm install -g @playwright/cli --quiet 2>/dev/null
    if [ $? -ne 0 ]; then
        echo "Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli"
    fi
 fi
 # Run the Python launcher
 python start_ui.py "$@"
--- a/temp_cleanup.py
+++ b/temp_cleanup.py
@@ -125,14 +125,18 @@ def cleanup_stale_temp(max_age_seconds: int = MAX_AGE_SECONDS) -> dict:
 def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -> dict:
    """
-    Clean up stale screenshot files from the project root.
+    Clean up stale Playwright CLI artifacts from the project.
-    Playwright browser verification can leave .png files in the project
+    The Playwright CLI daemon saves screenshots, snapshots, and other artifacts
-    directory. This removes them after they've aged out (default 5 minutes).
+    to `{project_dir}/.playwright-cli/`. This removes them after they've aged
    out (default 5 minutes).
    Also cleans up legacy screenshot patterns from the project root (from the
    old Playwright MCP server approach).
    Args:
        project_dir: Path to the project directory.
-        max_age_seconds: Maximum age in seconds before a screenshot is deleted.
+        max_age_seconds: Maximum age in seconds before an artifact is deleted.
                        Defaults to 5 minutes (300 seconds).
    Returns:
@@ -141,13 +145,33 @@ def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -
    cutoff_time = time.time() - max_age_seconds
    stats: dict = {"files_deleted": 0, "bytes_freed": 0, "errors": []}
-    screenshot_patterns = [
+    # Clean up .playwright-cli/ directory (new CLI approach)
    playwright_cli_dir = project_dir / ".playwright-cli"
    if playwright_cli_dir.exists():
        for item in playwright_cli_dir.iterdir():
            if not item.is_file():
                continue
            try:
                mtime = item.stat().st_mtime
                if mtime < cutoff_time:
                    size = item.stat().st_size
                    item.unlink(missing_ok=True)
                    if not item.exists():
                        stats["files_deleted"] += 1
                        stats["bytes_freed"] += size
                        logger.debug(f"Deleted playwright-cli artifact: {item}")
            except Exception as e:
                stats["errors"].append(f"Failed to delete {item}: {e}")
                logger.debug(f"Failed to delete artifact {item}: {e}")
    # Legacy cleanup: root-level screenshot patterns (from old MCP server approach)
    legacy_patterns = [
        "feature*-*.png",
        "screenshot-*.png",
        "step-*.png",
    ]
-    for pattern in screenshot_patterns:
+    for pattern in legacy_patterns:
        for item in project_dir.glob(pattern):
            if not item.is_file():
                continue
@@ -159,14 +183,14 @@ def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -
                    if not item.exists():
                        stats["files_deleted"] += 1
                        stats["bytes_freed"] += size
-                        logger.debug(f"Deleted project screenshot: {item}")
+                        logger.debug(f"Deleted legacy screenshot: {item}")
            except Exception as e:
                stats["errors"].append(f"Failed to delete {item}: {e}")
                logger.debug(f"Failed to delete screenshot {item}: {e}")
    if stats["files_deleted"] > 0:
        mb_freed = stats["bytes_freed"] / (1024 * 1024)
-        logger.info(f"Screenshot cleanup: {stats['files_deleted']} files, {mb_freed:.1f} MB freed")
+        logger.info(f"Artifact cleanup: {stats['files_deleted']} files, {mb_freed:.1f} MB freed")
    return stats
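Reviewer note: the age-based cleanup of `.playwright-cli/` can be checked without a running browser by back-dating a file's modification time. The sketch below is an illustration under that setup, not the code in this commit; `delete_older_than` is a hypothetical helper that omits the stats dictionary and error collection of `cleanup_project_screenshots`.

```python
import os
import tempfile
import time
from pathlib import Path


def delete_older_than(directory: Path, max_age_seconds: int) -> list[str]:
    """Delete regular files whose mtime is older than the cutoff; return their names."""
    cutoff = time.time() - max_age_seconds
    deleted = []
    # sorted() materializes the listing before any file is removed.
    for item in sorted(directory.iterdir()):
        if item.is_file() and item.stat().st_mtime < cutoff:
            item.unlink(missing_ok=True)
            deleted.append(item.name)
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old, new = root / "old.png", root / "new.png"
    old.write_bytes(b"x")
    new.write_bytes(b"x")
    stale = time.time() - 600  # ten minutes ago
    os.utime(old, (stale, stale))
    deleted = delete_older_than(root, 300)
    remaining = [p.name for p in root.iterdir()]
```

With the default five-minute window, only the back-dated file is removed and the fresh one stays.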
--- a/test_security.py
+++ b/test_security.py
@@ -25,6 +25,7 @@ from security import (
    validate_chmod_command,
    validate_init_script,
    validate_pkill_command,
    validate_playwright_command,
    validate_project_command,
 )
@@ -923,6 +924,70 @@ pkill_processes:
    return passed, failed
 def test_playwright_cli_validation():
    """Test playwright-cli subcommand validation."""
    print("\nTesting playwright-cli validation:\n")
    passed = 0
    failed = 0
    # Test cases: (command, should_be_allowed, description)
    test_cases = [
        # Allowed cases
        ("playwright-cli screenshot", True, "screenshot allowed"),
        ("playwright-cli snapshot", True, "snapshot allowed"),
        ("playwright-cli click e5", True, "click with ref"),
        ("playwright-cli open http://localhost:3000", True, "open URL"),
        ("playwright-cli -s=agent-1 click e5", True, "session flag with click"),
        ("playwright-cli close", True, "close browser"),
        ("playwright-cli goto http://localhost:3000/page", True, "goto URL"),
        ("playwright-cli fill e3 'test value'", True, "fill form field"),
        ("playwright-cli console", True, "console messages"),
        # Blocked cases
        ("playwright-cli run-code 'await page.evaluate(() => {})'", False, "run-code blocked"),
        ("playwright-cli eval 'document.title'", False, "eval blocked"),
        ("playwright-cli -s=test eval 'document.title'", False, "eval with session flag blocked"),
    ]
    for cmd, should_allow, description in test_cases:
        allowed, reason = validate_playwright_command(cmd)
        if allowed == should_allow:
            print(f"  PASS: {cmd!r} ({description})")
            passed += 1
        else:
            expected = "allowed" if should_allow else "blocked"
            actual = "allowed" if allowed else "blocked"
            print(f"  FAIL: {cmd!r} ({description})")
            print(f"         Expected: {expected}, Got: {actual}")
            if reason:
                print(f"         Reason: {reason}")
            failed += 1
    # Integration test: verify through the security hook
    print("\n  Integration tests (via security hook):\n")
    # playwright-cli screenshot should be allowed
    input_data = {"tool_name": "Bash", "tool_input": {"command": "playwright-cli screenshot"}}
    result = asyncio.run(bash_security_hook(input_data))
    if result.get("decision") != "block":
        print("  PASS: playwright-cli screenshot allowed via hook")
        passed += 1
    else:
        print(f"  FAIL: playwright-cli screenshot should be allowed: {result.get('reason')}")
        failed += 1
    # playwright-cli run-code should be blocked
    input_data = {"tool_name": "Bash", "tool_input": {"command": "playwright-cli run-code 'code'"}}
    result = asyncio.run(bash_security_hook(input_data))
    if result.get("decision") == "block":
        print("  PASS: playwright-cli run-code blocked via hook")
        passed += 1
    else:
        print("  FAIL: playwright-cli run-code should be blocked via hook")
        failed += 1
    return passed, failed
 def main():
    print("=" * 70)
    print("  SECURITY HOOK TESTS")
@@ -991,6 +1056,11 @@ def main():
    passed += pkill_passed
    failed += pkill_failed
    # Test playwright-cli validation
    pw_passed, pw_failed = test_playwright_cli_validation()
    passed += pw_passed
    failed += pw_failed
    # Commands that SHOULD be blocked
    # Note: blocklisted commands (sudo, shutdown, dd, aws) are tested in
    # test_blocklist_enforcement(). chmod validation is tested in
@@ -1012,6 +1082,9 @@ def main():
        # Shell injection attempts
        "$(echo pkill) node",
        'eval "pkill node"',
        # playwright-cli dangerous subcommands
        "playwright-cli run-code 'await page.goto(\"http://evil.com\")'",
        "playwright-cli eval 'document.cookie'",
    ]
    for cmd in dangerous:
@@ -1077,6 +1150,12 @@ def main():
        "/usr/local/bin/node app.js",
        # Combined chmod and init.sh (integration test for both validators)
        "chmod +x init.sh && ./init.sh",
        # Playwright CLI allowed commands
        "playwright-cli open http://localhost:3000",
        "playwright-cli screenshot",
        "playwright-cli snapshot",
        "playwright-cli click e5",
        "playwright-cli -s=agent-1 close",
    ]
    for cmd in safe:
--- a/ui/src/components/ProjectSelector.tsx
+++ b/ui/src/components/ProjectSelector.tsx
@@ -75,6 +75,7 @@ export function ProjectSelector({
            variant="outline"
            className="min-w-[140px] sm:min-w-[200px] justify-between"
            disabled={isLoading}
            title={selectedProjectData?.path}
          >
            {isLoading ? (
              <Loader2 size={18} className="animate-spin" />
@@ -101,6 +102,7 @@ export function ProjectSelector({
              {projects.map(project => (
                <DropdownMenuItem
                  key={project.name}
                  title={project.path}
                  className={`flex items-center justify-between cursor-pointer ${
                    project.name === selectedProject ? 'bg-primary/10' : ''
                  }`}