diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md
index 65243d8..832eb59 100644
--- a/.claude/templates/coding_prompt.template.md
+++ b/.claude/templates/coding_prompt.template.md
@@ -86,24 +86,33 @@ Implement the chosen feature thoroughly:
 
 **CRITICAL:** You MUST verify features through the actual UI.
 
-Use browser automation tools:
+Use `playwright-cli` for browser automation:
 
-- Navigate to the app in a real browser
-- Interact like a human user (click, type, scroll)
-- Take screenshots at each step (use inline screenshots only -- do NOT save screenshot files to disk)
-- Verify both functionality AND visual appearance
+- Open the browser: `playwright-cli open http://localhost:PORT`
+- Take a snapshot to see page elements: `playwright-cli snapshot`
+- Read the snapshot YAML file to see element refs
+- Click elements by ref: `playwright-cli click e5`
+- Type text: `playwright-cli type "search query"`
+- Fill form fields: `playwright-cli fill e3 "value"`
+- Take screenshots: `playwright-cli screenshot`
+- Read the screenshot file to verify visual appearance
+- Check console errors: `playwright-cli console`
+- Close browser when done: `playwright-cli close`
+
+**Token-efficient workflow:** `playwright-cli screenshot` and `snapshot` save files
+to `.playwright-cli/`. You will see a file link in the output. Read the file only
+when you need to verify visual appearance or find element refs.
 
 **DO:**
-
 - Test through the UI with clicks and keyboard input
-- Take screenshots to verify visual appearance (inline only, never save to disk)
-- Check for console errors in browser
+- Take screenshots and read them to verify visual appearance
+- Check for console errors with `playwright-cli console`
 - Verify complete user workflows end-to-end
+- Always run `playwright-cli close` when finished testing
 
 **DON'T:**
-
-- Only test with curl commands (backend testing alone is insufficient)
-- Use JavaScript evaluation to bypass UI (no shortcuts)
+- Only test with curl commands
+- Use JavaScript evaluation to bypass UI (`eval` and `run-code` are blocked)
 - Skip visual verification
 - Mark tests passing without thorough verification
 
@@ -145,7 +154,7 @@ Use the feature_mark_passing tool with feature_id=42
 - Combine or consolidate features
 - Reorder features
 
-**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**
+**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**
 
 ### STEP 7: COMMIT YOUR PROGRESS
 
@@ -192,11 +201,15 @@ Before context fills up:
 
 ## BROWSER AUTOMATION
 
-Use Playwright MCP tools (`browser_*`) for UI verification. Key tools: `navigate`, `click`, `type`, `fill_form`, `take_screenshot`, `console_messages`, `network_requests`. All tools have auto-wait built in.
+Use `playwright-cli` commands for UI verification. Key commands: `open`, `goto`,
+`snapshot`, `click`, `type`, `fill`, `screenshot`, `console`, `close`.
 
-**Screenshot rule:** Always use inline mode (base64). NEVER save screenshots as files to disk.
+**How it works:** `playwright-cli` uses a persistent browser daemon. `open` starts it,
+subsequent commands interact via socket, `close` shuts it down. Screenshots and snapshots
+save to `.playwright-cli/` -- read the files when you need to verify content.
 
-Test like a human user with mouse and keyboard. Use `browser_console_messages` to detect errors. Don't bypass UI with JavaScript evaluation.
+Test like a human user with mouse and keyboard. Use `playwright-cli console` to detect
+JS errors. Don't bypass UI with JavaScript evaluation.
 
 ---
 
diff --git a/.claude/templates/testing_prompt.template.md b/.claude/templates/testing_prompt.template.md
index 3714d47..ee6a08f 100644
--- a/.claude/templates/testing_prompt.template.md
+++ b/.claude/templates/testing_prompt.template.md
@@ -31,26 +31,32 @@ For the feature returned:
 1. Read and understand the feature's verification steps
 2. Navigate to the relevant part of the application
 3. Execute each verification step using browser automation
-4. Take screenshots to document the verification (inline only -- do NOT save to disk)
+4. Take screenshots and read them to verify visual appearance
 5. Check for console errors
 
-Use browser automation tools:
+### Browser Automation (Playwright CLI)
 
 **Navigation & Screenshots:**
-- browser_navigate - Navigate to a URL
-- browser_take_screenshot - Capture screenshot (inline mode only -- never save to disk)
-- browser_snapshot - Get accessibility tree snapshot
+- `playwright-cli open <url>` - Open browser and navigate
+- `playwright-cli goto <url>` - Navigate to URL
+- `playwright-cli screenshot` - Save screenshot to `.playwright-cli/`
+- `playwright-cli snapshot` - Save page snapshot with element refs to `.playwright-cli/`
 
 **Element Interaction:**
-- browser_click - Click elements
-- browser_type - Type text into editable elements
-- browser_fill_form - Fill multiple form fields
-- browser_select_option - Select dropdown options
-- browser_press_key - Press keyboard keys
+- `playwright-cli click <ref>` - Click elements (ref from snapshot)
+- `playwright-cli type <text>` - Type text
+- `playwright-cli fill <ref> <text>` - Fill form fields
+- `playwright-cli select <ref> <value>` - Select dropdown
+- `playwright-cli press <key>` - Keyboard input
 
 **Debugging:**
-- browser_console_messages - Get browser console output (check for errors)
-- browser_network_requests - Monitor API calls
+- `playwright-cli console` - Check for JS errors
+- `playwright-cli network` - Monitor API calls
+
+**Cleanup:**
+- `playwright-cli close` - Close browser when done (ALWAYS do this)
+
+**Note:** Screenshots and snapshots save to files. Read the file to see the content.
 
 ### STEP 3: HANDLE RESULTS
 
@@ -79,7 +85,7 @@ A regression has been introduced. You MUST fix it:
 
 4. **Verify the fix:**
    - Run through all verification steps again
-   - Take screenshots confirming the fix (inline only, never save to disk)
+   - Take screenshots and read them to confirm the fix
 
 5. **Mark as passing after fix:**
    ```
@@ -98,7 +104,7 @@ A regression has been introduced. You MUST fix it:
 
 ---
 
-## AVAILABLE MCP TOOLS
+## AVAILABLE TOOLS
 
 ### Feature Management
 - `feature_get_stats` - Get progress overview (passing/in_progress/total counts)
@@ -106,19 +112,17 @@ A regression has been introduced. You MUST fix it:
 - `feature_mark_failing` - Mark a feature as failing (when you find a regression)
 - `feature_mark_passing` - Mark a feature as passing (after fixing a regression)
 
-### Browser Automation (Playwright)
-All interaction tools have **built-in auto-wait** -- no manual timeouts needed.
-
-- `browser_navigate` - Navigate to URL
-- `browser_take_screenshot` - Capture screenshot (inline only, never save to disk)
-- `browser_snapshot` - Get accessibility tree
-- `browser_click` - Click elements
-- `browser_type` - Type text
-- `browser_fill_form` - Fill form fields
-- `browser_select_option` - Select dropdown
-- `browser_press_key` - Keyboard input
-- `browser_console_messages` - Check for JS errors
-- `browser_network_requests` - Monitor API calls
+### Browser Automation (Playwright CLI)
+Use `playwright-cli` commands for browser interaction. Key commands:
+- `playwright-cli open <url>` - Open browser
+- `playwright-cli goto <url>` - Navigate to URL
+- `playwright-cli screenshot` - Take screenshot (saved to `.playwright-cli/`)
+- `playwright-cli snapshot` - Get page snapshot with element refs
+- `playwright-cli click <ref>` - Click element
+- `playwright-cli type <text>` - Type text
+- `playwright-cli fill <ref> <text>` - Fill form field
+- `playwright-cli console` - Check for JS errors
+- `playwright-cli close` - Close browser (always do this when done)
 
 ---
 
diff --git a/.gitignore b/.gitignore
index 6a01793..d63e64e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,6 +10,10 @@ issues/
 # Browser profiles for parallel agent execution
 .browser-profiles/
 
+# Playwright CLI daemon artifacts
+.playwright-cli/
+.playwright/
+
 # Log files
 logs/
 *.log
diff --git a/.npmignore b/.npmignore
index 9c4ada3..6bf112b 100644
--- a/.npmignore
+++ b/.npmignore
@@ -28,5 +28,4 @@ start.sh
 start_ui.sh
 start_ui.py
 .claude/agents/
-.claude/skills/
 .claude/settings.json
diff --git a/CLAUDE.md b/CLAUDE.md
index e0f9ea3..8665260 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -85,7 +85,7 @@ python autonomous_agent_demo.py --project-dir my-app --yolo
 
 **What's different in YOLO mode:**
 - No regression testing
-- No Playwright MCP server (browser automation disabled)
+- No Playwright CLI (browser automation disabled)
 - Features marked passing after lint/type-check succeeds
 - Faster iteration for prototyping
 
@@ -163,7 +163,7 @@ Publishing: `npm publish` (triggers `prepublishOnly` which builds UI, then publi
 - `autonomous_agent_demo.py` - Entry point for running the agent (supports `--yolo`, `--parallel`, `--batch-size`, `--batch-features`)
 - `autoforge_paths.py` - Central path resolution with dual-path backward compatibility and migration
 - `agent.py` - Agent session loop using Claude Agent SDK
-- `client.py` - ClaudeSDKClient configuration with security hooks, MCP servers, and Vertex AI support
+- `client.py` - ClaudeSDKClient configuration with security hooks, feature MCP server, and Vertex AI support
 - `security.py` - Bash command allowlist validation (ALLOWED_COMMANDS whitelist)
 - `prompts.py` - Prompt template loading with project-specific fallback and batch feature prompts
 - `progress.py` - Progress tracking, database queries, webhook notifications
@@ -288,6 +288,9 @@ Projects can be stored in any directory (registered in `~/.autoforge/registry.db
 - `.autoforge/.agent.lock` - Lock file to prevent multiple agent instances
 - `.autoforge/allowed_commands.yaml` - Project-specific bash command allowlist (optional)
 - `.autoforge/.gitignore` - Ignores runtime files
+- `.claude/skills/playwright-cli/` - Playwright CLI skill for browser automation
+- `.playwright/cli.config.json` - Browser configuration (headless, viewport, etc.)
+- `.playwright-cli/` - Playwright CLI daemon artifacts (screenshots, snapshots) - gitignored
 - `CLAUDE.md` - Stays at project root (SDK convention)
 - `app_spec.txt` - Root copy for agent template compatibility
 
@@ -445,6 +448,7 @@ Alternative providers are configured via the **Settings UI** (gear icon > API Pr
 **Skills** (`.claude/skills/`):
 - `frontend-design` - Distinctive, production-grade UI design
 - `gsd-to-autoforge-spec` - Convert GSD codebase mapping to AutoForge app_spec format
+- `playwright-cli` - Browser automation via Playwright CLI (copied to each project)
 
 **Other:**
 - `.claude/templates/` - Prompt templates copied to new projects
@@ -479,7 +483,7 @@ When running with `--parallel`, the orchestrator:
 1. Spawns multiple Claude agents as subprocesses (up to `--max-concurrency`)
 2. Each agent claims features atomically via `feature_claim_and_get`
 3. Features blocked by unmet dependencies are skipped
-4. Browser contexts are isolated per agent using `--isolated` flag
+4. Browser sessions are isolated per agent via `PLAYWRIGHT_CLI_SESSION` environment variable
 5. AgentTracker parses output and emits `agent_update` messages for UI
 
 ### Process Limits (Parallel Mode)
diff --git a/agent.py b/agent.py
index a3daaf8..e837628 100644
--- a/agent.py
+++ b/agent.py
@@ -240,17 +240,7 @@ async def run_autonomous_agent(
         print_session_header(iteration, is_initializer)
 
         # Create client (fresh context)
-        # Pass agent_id for browser isolation in multi-agent scenarios
-        import os
-        if agent_type == "testing":
-            agent_id = f"testing-{os.getpid()}"  # Unique ID for testing agents
-        elif feature_ids and len(feature_ids) > 1:
-            agent_id = f"batch-{feature_ids[0]}"
-        elif feature_id:
-            agent_id = f"feature-{feature_id}"
-        else:
-            agent_id = None
-        client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_id=agent_id, agent_type=agent_type)
+        client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_type=agent_type)
 
         # Choose prompt based on agent type
         if agent_type == "initializer":
diff --git a/autoforge_paths.py b/autoforge_paths.py
index 8283a9b..076d0be 100644
--- a/autoforge_paths.py
+++ b/autoforge_paths.py
@@ -43,6 +43,7 @@ assistant.db-shm
 .claude_assistant_settings.json
 .claude_settings.expand.*.json
 .progress_cache
+.migration_version
 """
 
 
diff --git a/autonomous_agent_demo.py b/autonomous_agent_demo.py
index 918b2c1..f24908f 100644
--- a/autonomous_agent_demo.py
+++ b/autonomous_agent_demo.py
@@ -237,6 +237,12 @@ def main() -> None:
     if migrated:
         print(f"Migrated project files to .autoforge/: {', '.join(migrated)}", flush=True)
 
+    # Migrate project to current AutoForge version (idempotent, safe)
+    from prompts import migrate_project_to_current
+    version_migrated = migrate_project_to_current(project_dir)
+    if version_migrated:
+        print(f"Upgraded project: {', '.join(version_migrated)}", flush=True)
+
     # Parse batch testing feature IDs (comma-separated string -> list[int])
     testing_feature_ids: list[int] | None = None
     if args.testing_feature_ids:
diff --git a/client.py b/client.py
index 4d06816..e752cad 100644
--- a/client.py
+++ b/client.py
@@ -21,16 +21,6 @@ from security import SENSITIVE_DIRECTORIES, bash_security_hook
 # Load environment variables from .env file if present
 load_dotenv()
 
-# Default Playwright headless mode - can be overridden via PLAYWRIGHT_HEADLESS env var
-# When True, browser runs invisibly in background (default - saves CPU)
-# When False, browser window is visible (useful for monitoring agent progress)
-DEFAULT_PLAYWRIGHT_HEADLESS = True
-
-# Default browser for Playwright - can be overridden via PLAYWRIGHT_BROWSER env var
-# Options: chrome, firefox, webkit, msedge
-# Firefox is recommended for lower CPU usage
-DEFAULT_PLAYWRIGHT_BROWSER = "firefox"
-
 # Extra read paths for cross-project file access (read-only)
 # Set EXTRA_READ_PATHS environment variable with comma-separated absolute paths
 # Example: EXTRA_READ_PATHS=/Volumes/Data/dev,/Users/shared/libs
@@ -41,6 +31,7 @@ EXTRA_READ_PATHS_VAR = "EXTRA_READ_PATHS"
 # this blocklist and the filesystem browser API share a single source of truth.
 EXTRA_READ_PATHS_BLOCKLIST = SENSITIVE_DIRECTORIES
 
+
 def convert_model_for_vertex(model: str) -> str:
     """
     Convert model name format for Vertex AI compatibility.
@@ -72,43 +63,6 @@ def convert_model_for_vertex(model: str) -> str:
     return model
 
 
-def get_playwright_headless() -> bool:
-    """
-    Get the Playwright headless mode setting.
-
-    Reads from PLAYWRIGHT_HEADLESS environment variable, defaults to True.
-    Returns True for headless mode (invisible browser), False for visible browser.
-    """
-    value = os.getenv("PLAYWRIGHT_HEADLESS", str(DEFAULT_PLAYWRIGHT_HEADLESS).lower()).strip().lower()
-    truthy = {"true", "1", "yes", "on"}
-    falsy = {"false", "0", "no", "off"}
-    if value not in truthy | falsy:
-        print(f" - Warning: Invalid PLAYWRIGHT_HEADLESS='{value}', defaulting to {DEFAULT_PLAYWRIGHT_HEADLESS}")
-        return DEFAULT_PLAYWRIGHT_HEADLESS
-    return value in truthy
-
-
-# Valid browsers supported by Playwright MCP
-VALID_PLAYWRIGHT_BROWSERS = {"chrome", "firefox", "webkit", "msedge"}
-
-
-def get_playwright_browser() -> str:
-    """
-    Get the browser to use for Playwright.
-
-    Reads from PLAYWRIGHT_BROWSER environment variable, defaults to firefox.
-    Options: chrome, firefox, webkit, msedge
-    Firefox is recommended for lower CPU usage.
-    """
-    value = os.getenv("PLAYWRIGHT_BROWSER", DEFAULT_PLAYWRIGHT_BROWSER).strip().lower()
-    if value not in VALID_PLAYWRIGHT_BROWSERS:
-        print(f" - Warning: Invalid PLAYWRIGHT_BROWSER='{value}', "
-              f"valid options: {', '.join(sorted(VALID_PLAYWRIGHT_BROWSERS))}. "
-              f"Defaulting to {DEFAULT_PLAYWRIGHT_BROWSER}")
-        return DEFAULT_PLAYWRIGHT_BROWSER
-    return value
-
-
 def get_extra_read_paths() -> list[Path]:
     """
     Get extra read-only paths from EXTRA_READ_PATHS environment variable.
@@ -228,41 +182,6 @@ ALL_FEATURE_MCP_TOOLS = sorted(
     set(CODING_AGENT_TOOLS) | set(TESTING_AGENT_TOOLS) | set(INITIALIZER_AGENT_TOOLS)
 )
 
-# Playwright MCP tools for browser automation.
-# Full set of tools for comprehensive UI testing including drag-and-drop,
-# hover menus, file uploads, tab management, etc.
-PLAYWRIGHT_TOOLS = [
-    # Core navigation & screenshots
-    "mcp__playwright__browser_navigate",
-    "mcp__playwright__browser_navigate_back",
-    "mcp__playwright__browser_take_screenshot",
-    "mcp__playwright__browser_snapshot",
-
-    # Element interaction
-    "mcp__playwright__browser_click",
-    "mcp__playwright__browser_type",
-    "mcp__playwright__browser_fill_form",
-    "mcp__playwright__browser_select_option",
-    "mcp__playwright__browser_press_key",
-    "mcp__playwright__browser_drag",
-    "mcp__playwright__browser_hover",
-    "mcp__playwright__browser_file_upload",
-
-    # JavaScript & debugging
-    "mcp__playwright__browser_evaluate",
-    # "mcp__playwright__browser_run_code",  # REMOVED - causes Playwright MCP server crash
-    "mcp__playwright__browser_console_messages",
-    "mcp__playwright__browser_network_requests",
-
-    # Browser management
-    "mcp__playwright__browser_resize",
-    "mcp__playwright__browser_wait_for",
-    "mcp__playwright__browser_handle_dialog",
-    "mcp__playwright__browser_install",
-    "mcp__playwright__browser_close",
-    "mcp__playwright__browser_tabs",
-]
-
 # Built-in tools available to agents.
 # WebFetch and WebSearch are included so coding agents can look up current
 # documentation for frameworks and libraries they are implementing.
@@ -282,7 +201,6 @@ def create_client(
     project_dir: Path,
     model: str,
     yolo_mode: bool = False,
-    agent_id: str | None = None,
     agent_type: str = "coding",
 ):
     """
@@ -291,9 +209,7 @@ def create_client(
     Args:
         project_dir: Directory for the project
         model: Claude model to use
-        yolo_mode: If True, skip Playwright MCP server for rapid prototyping
-        agent_id: Optional unique identifier for browser isolation in parallel mode.
-            When provided, each agent gets its own browser profile.
+        yolo_mode: If True, skip browser testing for rapid prototyping
         agent_type: One of "coding", "testing", or "initializer". Controls which
             MCP tools are exposed and the max_turns limit.
 
@@ -327,11 +243,8 @@ def create_client(
     }
     max_turns = max_turns_map.get(agent_type, 300)
 
-    # Build allowed tools list based on mode and agent type.
-    # In YOLO mode, exclude Playwright tools for faster prototyping.
+    # Build allowed tools list based on agent type.
     allowed_tools = [*BUILTIN_TOOLS, *feature_tools]
-    if not yolo_mode:
-        allowed_tools.extend(PLAYWRIGHT_TOOLS)
 
     # Build permissions list.
     # We permit ALL feature MCP tools at the security layer (so the MCP server
@@ -363,10 +276,6 @@ def create_client(
             permissions_list.append(f"Glob({path}/**)")
             permissions_list.append(f"Grep({path}/**)")
 
-    if not yolo_mode:
-        # Allow Playwright MCP tools for browser automation (standard mode only)
-        permissions_list.extend(PLAYWRIGHT_TOOLS)
-
     # Create comprehensive security settings
     # Note: Using relative paths ("./**") restricts access to project directory
     # since cwd is set to project_dir
@@ -395,9 +304,9 @@ def create_client(
         print(f" - Extra read paths (validated): {', '.join(str(p) for p in extra_read_paths)}")
     print(" - Bash commands restricted to allowlist (see security.py)")
     if yolo_mode:
-        print(" - MCP servers: features (database) - YOLO MODE (no Playwright)")
+        print(" - MCP servers: features (database) - YOLO MODE (no browser testing)")
     else:
-        print(" - MCP servers: playwright (browser), features (database)")
+        print(" - MCP servers: features (database)")
     print(" - Project settings enabled (skills, commands, CLAUDE.md)")
     print()
 
@@ -421,36 +330,6 @@ def create_client(
             },
         },
     }
-    if not yolo_mode:
-        # Include Playwright MCP server for browser automation (standard mode only)
-        # Browser and headless mode configurable via environment variables
-        browser = get_playwright_browser()
-        playwright_args = [
-            "@playwright/mcp@latest",
-            "--viewport-size", "1280x720",
-            "--browser", browser,
-        ]
-        if get_playwright_headless():
-            playwright_args.append("--headless")
-        print(f" - Browser: {browser} (headless={get_playwright_headless()})")
-
-        # Browser isolation for parallel execution
-        # Each agent gets its own isolated browser context to prevent tab conflicts
-        if agent_id:
-            # Use --isolated for ephemeral browser context
-            # This creates a fresh, isolated context without persistent state
-            # Note: --isolated and --user-data-dir are mutually exclusive
-            playwright_args.append("--isolated")
-            print(f" - Browser isolation enabled for agent: {agent_id}")
-
-        mcp_servers["playwright"] = {
-            "command": "npx",
-            "args": playwright_args,
-            "env": {
-                "NODE_COMPILE_CACHE": "",  # Disable V8 compile caching to prevent .node file accumulation in %TEMP%
-            },
-        }
-
     # Build environment overrides for API endpoint configuration
     # Uses get_effective_sdk_env() which reads provider settings from the database,
     # ensuring UI-configured alternative providers (GLM, Ollama, Kimi, Custom) propagate
diff --git a/lib/cli.js b/lib/cli.js
index d0d4789..682ba84 100644
--- a/lib/cli.js
+++ b/lib/cli.js
@@ -517,6 +517,41 @@ function killProcess(pid) {
   }
 }
 
+// ---------------------------------------------------------------------------
+// Playwright CLI
+// ---------------------------------------------------------------------------
+
+/**
+ * Ensure playwright-cli is available globally for browser automation.
+ * Returns true if available (already installed or freshly installed).
+ *
+ * @param {boolean} showProgress - If true, print install progress
+ */
+function ensurePlaywrightCli(showProgress) {
+  try {
+    execSync('playwright-cli --version', {
+      timeout: 10_000,
+      stdio: ['pipe', 'pipe', 'pipe'],
+    });
+    return true;
+  } catch {
+    // Not installed — try to install
+  }
+
+  if (showProgress) {
+    log(' Installing playwright-cli for browser automation...');
+  }
+  try {
+    execSync('npm install -g @playwright/cli', {
+      timeout: 120_000,
+      stdio: ['pipe', 'pipe', 'pipe'],
+    });
+    return true;
+  } catch {
+    return false;
+  }
+}
+
 // ---------------------------------------------------------------------------
 // CLI commands
 // ---------------------------------------------------------------------------
@@ -613,6 +648,14 @@ function startServer(opts) {
   }
   const wasAlreadyReady = ensureVenv(python, repair);
 
+  // Ensure playwright-cli for browser automation (quick check, installs once)
+  if (!ensurePlaywrightCli(!wasAlreadyReady)) {
+    log('');
+    log(' Note: playwright-cli not available (browser automation will be limited)');
+    log(' Install manually: npm install -g @playwright/cli');
+    log('');
+  }
+
   // Step 3: Config file
   const configCreated = ensureEnvFile();
 
diff --git a/package.json b/package.json
index f9a47c6..638e803 100644
--- a/package.json
+++ b/package.json
@@ -19,6 +19,7 @@
     "ui/dist/",
     "ui/package.json",
     ".claude/commands/",
+    ".claude/skills/",
     ".claude/templates/",
     "examples/",
     "start.py",
diff --git a/prompts.py b/prompts.py
index 40d0494..dedead0 100644
--- a/prompts.py
+++ b/prompts.py
@@ -16,6 +16,9 @@ from pathlib import Path
 # Base templates location (generic templates)
 TEMPLATES_DIR = Path(__file__).parent / ".claude" / "templates"
 
+# Migration version — bump when adding new migration steps
+CURRENT_MIGRATION_VERSION = 1
+
 
 def get_project_prompts_dir(project_dir: Path) -> Path:
     """Get the prompts directory for a specific project."""
@@ -99,9 +102,9 @@ def _strip_browser_testing_sections(prompt: str) -> str:
         flags=re.DOTALL,
     )
 
-    # Replace the screenshots-only marking rule with YOLO-appropriate wording
+    # Replace the marking rule with YOLO-appropriate wording
     prompt = prompt.replace(
-        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**",
+        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**",
         "**YOLO mode: Mark a feature as passing after lint/type-check succeeds and server starts cleanly.**",
     )
 
@@ -351,9 +354,70 @@ def scaffold_project_prompts(project_dir: Path) -> Path:
         except (OSError, PermissionError) as e:
             print(f" Warning: Could not copy allowed_commands.yaml: {e}")
 
+    # Copy Playwright CLI skill for browser automation
+    skills_src = Path(__file__).parent / ".claude" / "skills" / "playwright-cli"
+    skills_dest = project_dir / ".claude" / "skills" / "playwright-cli"
+    if skills_src.exists() and not skills_dest.exists():
+        try:
+            shutil.copytree(skills_src, skills_dest)
+            copied_files.append(".claude/skills/playwright-cli/")
+        except (OSError, PermissionError) as e:
+            print(f" Warning: Could not copy playwright-cli skill: {e}")
+
+    # Ensure .playwright-cli/ and .playwright/ are in project .gitignore
+    project_gitignore = project_dir / ".gitignore"
+    entries_to_add = [".playwright-cli/", ".playwright/"]
+    existing_lines: list[str] = []
+    if project_gitignore.exists():
+        try:
+            existing_lines = project_gitignore.read_text(encoding="utf-8").splitlines()
+        except (OSError, PermissionError):
+            pass
+    missing_entries = [e for e in entries_to_add if e not in existing_lines]
+    if missing_entries:
+        try:
+            with open(project_gitignore, "a", encoding="utf-8") as f:
+                # Add newline before entries if file doesn't end with one
+                if existing_lines and existing_lines[-1].strip():
+                    f.write("\n")
+                for entry in missing_entries:
+                    f.write(f"{entry}\n")
+        except (OSError, PermissionError) as e:
+            print(f" Warning: Could not update .gitignore: {e}")
+
+    # Scaffold .playwright/cli.config.json for browser settings
+    playwright_config_dir = project_dir / ".playwright"
+    playwright_config_file = playwright_config_dir / "cli.config.json"
+    if not playwright_config_file.exists():
+        try:
+            playwright_config_dir.mkdir(parents=True, exist_ok=True)
+            import json
+            config = {
+                "browser": {
+                    "browserName": "chromium",
+                    "launchOptions": {
+                        "channel": "chrome",
+                        "headless": True,
+                    },
+                    "contextOptions": {
+                        "viewport": {"width": 1280, "height": 720},
+                    },
+                    "isolated": True,
+                },
+            }
+            with open(playwright_config_file, "w", encoding="utf-8") as f:
+                json.dump(config, f, indent=2)
+                f.write("\n")
+            copied_files.append(".playwright/cli.config.json")
+        except (OSError, PermissionError) as e:
+            print(f" Warning: Could not create playwright config: {e}")
+
     if copied_files:
         print(f" Created project files: {', '.join(copied_files)}")
 
+    # Stamp new projects at the current migration version so they never trigger migration
+    _set_migration_version(project_dir, CURRENT_MIGRATION_VERSION)
+
     return project_prompts
 
 
@@ -425,3 +489,330 @@ def copy_spec_to_project(project_dir: Path) -> None:
             return
 
     print("Warning: No app_spec.txt found to copy to project directory")
+
+
+# ---------------------------------------------------------------------------
+# Project version migration
+# ---------------------------------------------------------------------------
+
+# Replacement content: coding_prompt.md STEP 5 section (Playwright CLI)
+_CLI_STEP5_CONTENT = """\
+### STEP 5: VERIFY WITH BROWSER AUTOMATION
+
+**CRITICAL:** You MUST verify features through the actual UI.
+
+Use `playwright-cli` for browser automation:
+
+- Open the browser: `playwright-cli open http://localhost:PORT`
+- Take a snapshot to see page elements: `playwright-cli snapshot`
+- Read the snapshot YAML file to see element refs
+- Click elements by ref: `playwright-cli click e5`
+- Type text: `playwright-cli type "search query"`
+- Fill form fields: `playwright-cli fill e3 "value"`
+- Take screenshots: `playwright-cli screenshot`
+- Read the screenshot file to verify visual appearance
+- Check console errors: `playwright-cli console`
+- Close browser when done: `playwright-cli close`
+
+**Token-efficient workflow:** `playwright-cli screenshot` and `snapshot` save files
+to `.playwright-cli/`. You will see a file link in the output. Read the file only
+when you need to verify visual appearance or find element refs.
+
+**DO:**
+- Test through the UI with clicks and keyboard input
+- Take screenshots and read them to verify visual appearance
+- Check for console errors with `playwright-cli console`
+- Verify complete user workflows end-to-end
+- Always run `playwright-cli close` when finished testing
+
+**DON'T:**
+- Only test with curl commands
+- Use JavaScript evaluation to bypass UI (`eval` and `run-code` are blocked)
+- Skip visual verification
+- Mark tests passing without thorough verification
+
+"""
+
+# Replacement content: coding_prompt.md BROWSER AUTOMATION reference section
+_CLI_BROWSER_SECTION = """\
+## BROWSER AUTOMATION
+
+Use `playwright-cli` commands for UI verification. Key commands: `open`, `goto`,
+`snapshot`, `click`, `type`, `fill`, `screenshot`, `console`, `close`.
+
+**How it works:** `playwright-cli` uses a persistent browser daemon. `open` starts it,
+subsequent commands interact via socket, `close` shuts it down. Screenshots and snapshots
+save to `.playwright-cli/` -- read the files when you need to verify content.
+
+Test like a human user with mouse and keyboard. Use `playwright-cli console` to detect
+JS errors. Don't bypass UI with JavaScript evaluation.
+
+"""
+
+# Replacement content: testing_prompt.md STEP 2 section (Playwright CLI)
+_CLI_TESTING_STEP2 = """\
+### STEP 2: VERIFY THE FEATURE
+
+**CRITICAL:** You MUST verify the feature through the actual UI using browser automation.
+
+For the feature returned:
+1. Read and understand the feature's verification steps
+2. Navigate to the relevant part of the application
+3. Execute each verification step using browser automation
+4. Take screenshots and read them to verify visual appearance
+5. Check for console errors
+
+### Browser Automation (Playwright CLI)
+
+**Navigation & Screenshots:**
+- `playwright-cli open <url>` - Open browser and navigate
+- `playwright-cli goto <url>` - Navigate to URL
+- `playwright-cli screenshot` - Save screenshot to `.playwright-cli/`
+- `playwright-cli snapshot` - Save page snapshot with element refs to `.playwright-cli/`
+
+**Element Interaction:**
+- `playwright-cli click <ref>` - Click elements (ref from snapshot)
+- `playwright-cli type <text>` - Type text
+- `playwright-cli fill <ref> <text>` - Fill form fields
+- `playwright-cli select <ref> <value>` - Select dropdown
+- `playwright-cli press <key>` - Keyboard input
+
+**Debugging:**
+- `playwright-cli console` - Check for JS errors
+- `playwright-cli network` - Monitor API calls
+
+**Cleanup:**
+- `playwright-cli close` - Close browser when done (ALWAYS do this)
+
+**Note:** Screenshots and snapshots save to files. Read the file to see the content.
+
+"""
+
+# Replacement content: testing_prompt.md AVAILABLE TOOLS browser subsection
+_CLI_TESTING_TOOLS = """\
+### Browser Automation (Playwright CLI)
+Use `playwright-cli` commands for browser interaction. Key commands:
+- `playwright-cli open <url>` - Open browser
+- `playwright-cli goto <url>` - Navigate to URL
+- `playwright-cli screenshot` - Take screenshot (saved to `.playwright-cli/`)
+- `playwright-cli snapshot` - Get page snapshot with element refs
+- `playwright-cli click <ref>` - Click element
+- `playwright-cli type <text>` - Type text
+- `playwright-cli fill <ref> <text>` - Fill form field
+- `playwright-cli console` - Check for JS errors
+- `playwright-cli close` - Close browser (always do this when done)
+
+"""
+
+
+def _get_migration_version(project_dir: Path) -> int:
+    """Read the migration version from .autoforge/.migration_version."""
+    from autoforge_paths import get_autoforge_dir
+    version_file = get_autoforge_dir(project_dir) / ".migration_version"
+    if not version_file.exists():
+        return 0
+    try:
+        return int(version_file.read_text().strip())
+    except (ValueError, OSError):
+        return 0
+
+
+def _set_migration_version(project_dir: Path, version: int) -> None:
+    """Write the migration version to .autoforge/.migration_version."""
+    from autoforge_paths import get_autoforge_dir
+    version_file = get_autoforge_dir(project_dir) / ".migration_version"
+    version_file.parent.mkdir(parents=True, exist_ok=True)
+    version_file.write_text(str(version))
+
+
+def _migrate_coding_prompt_to_cli(content: str) -> str:
+    """Replace MCP-based Playwright sections with CLI-based content in coding prompt."""
+    # Replace STEP 5 section (from header to just before STEP 5.5)
+    content = re.sub(
+        r"### STEP 5: VERIFY WITH BROWSER AUTOMATION.*?(?=### STEP 5\.5:)",
+        _CLI_STEP5_CONTENT,
+        content,
+        count=1,
+        flags=re.DOTALL,
+    )
+
+    # Replace BROWSER AUTOMATION reference section (from header to next ---)
+    content = re.sub(
+        r"## BROWSER AUTOMATION\n\n.*?(?=---)",
+        _CLI_BROWSER_SECTION,
+        content,
+        count=1,
+        flags=re.DOTALL,
+    )
+
+    # Replace inline screenshot rule
+    content = content.replace(
+        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**",
+        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**",
+    )
+
+    # Replace inline screenshot references (various phrasings from old templates)
+    for old_phrase in (
+        "(inline only -- do NOT save to disk)",
+        "(inline only, never save to disk)",
+        "(inline mode only -- never save to disk)",
+    ):
+        content = content.replace(old_phrase, "(saved to `.playwright-cli/`)")
+
+    return content
+
+
+def _migrate_testing_prompt_to_cli(content: str) -> str:
+    """Replace MCP-based Playwright sections with CLI-based content in testing prompt."""
+    # Replace AVAILABLE TOOLS browser subsection FIRST (before STEP 2, to avoid
+    # matching the new CLI subsection header that the STEP 2 replacement inserts).
+    # In old prompts, ### Browser Automation (Playwright) only exists in AVAILABLE TOOLS.
+    content = re.sub(
+        r"### Browser Automation \(Playwright[^)]*\)\n.*?(?=---)",
+        _CLI_TESTING_TOOLS,
+        content,
+        count=1,
+        flags=re.DOTALL,
+    )
+
+    # Replace STEP 2 verification section (from header to just before STEP 3)
+    content = re.sub(
+        r"### STEP 2: VERIFY THE FEATURE.*?(?=### STEP 3:)",
+        _CLI_TESTING_STEP2,
+        content,
+        count=1,
+        flags=re.DOTALL,
+    )
+
+    # Replace inline screenshot references (various phrasings from old templates)
+    for old_phrase in (
+        "(inline only -- do NOT save to disk)",
+        "(inline only, never save to disk)",
+        "(inline mode only -- never save to disk)",
+    ):
+        content = content.replace(old_phrase, "(saved to `.playwright-cli/`)")
+
+    return content
+
+
+def _migrate_v0_to_v1(project_dir: Path) -> list[str]:
+    """Migrate from v0 (MCP-based Playwright) to v1 (Playwright CLI).
+
+    Four idempotent sub-steps:
+      A. Copy playwright-cli skill to project
+      B. Scaffold .playwright/cli.config.json
+      C. Update .gitignore with .playwright-cli/ and .playwright/
+      D. Update coding_prompt.md and testing_prompt.md
+    """
+    import json
+
+    migrated: list[str] = []
+
+    # A. Copy Playwright CLI skill
+    skills_src = Path(__file__).parent / ".claude" / "skills" / "playwright-cli"
+    skills_dest = project_dir / ".claude" / "skills" / "playwright-cli"
+    if skills_src.exists() and not skills_dest.exists():
+        try:
+            shutil.copytree(skills_src, skills_dest)
+            migrated.append("Copied playwright-cli skill")
+        except (OSError, PermissionError) as e:
+            print(f" Warning: Could not copy playwright-cli skill: {e}")
+
+    # B. Scaffold .playwright/cli.config.json
+    playwright_config_dir = project_dir / ".playwright"
+    playwright_config_file = playwright_config_dir / "cli.config.json"
+    if not playwright_config_file.exists():
+        try:
+            playwright_config_dir.mkdir(parents=True, exist_ok=True)
+            config = {
+                "browser": {
+                    "browserName": "chromium",
+                    "launchOptions": {
+                        "channel": "chrome",
+                        "headless": True,
+                    },
+                    "contextOptions": {
+                        "viewport": {"width": 1280, "height": 720},
+                    },
+                    "isolated": True,
+                },
+            }
+            with open(playwright_config_file, "w", encoding="utf-8") as f:
+                json.dump(config, f, indent=2)
+                f.write("\n")
+            migrated.append("Created .playwright/cli.config.json")
+        except (OSError, PermissionError) as e:
+            print(f" Warning: Could not create playwright config: {e}")
+
+    # C.
Update .gitignore + project_gitignore = project_dir / ".gitignore" + entries_to_add = [".playwright-cli/", ".playwright/"] + existing_lines: list[str] = [] + if project_gitignore.exists(): + try: + existing_lines = project_gitignore.read_text(encoding="utf-8").splitlines() + except (OSError, PermissionError): + pass + missing_entries = [e for e in entries_to_add if e not in existing_lines] + if missing_entries: + try: + with open(project_gitignore, "a", encoding="utf-8") as f: + if existing_lines and existing_lines[-1].strip(): + f.write("\n") + for entry in missing_entries: + f.write(f"{entry}\n") + migrated.append(f"Added {', '.join(missing_entries)} to .gitignore") + except (OSError, PermissionError) as e: + print(f" Warning: Could not update .gitignore: {e}") + + # D. Update prompts + prompts_dir = get_project_prompts_dir(project_dir) + + # D1. Update coding_prompt.md + coding_prompt_path = prompts_dir / "coding_prompt.md" + if coding_prompt_path.exists(): + try: + content = coding_prompt_path.read_text(encoding="utf-8") + if "Playwright MCP" in content or "browser_navigate" in content or "browser_take_screenshot" in content: + updated = _migrate_coding_prompt_to_cli(content) + if updated != content: + coding_prompt_path.write_text(updated, encoding="utf-8") + migrated.append("Updated coding_prompt.md to Playwright CLI") + except (OSError, PermissionError) as e: + print(f" Warning: Could not update coding_prompt.md: {e}") + + # D2. 
Update testing_prompt.md + testing_prompt_path = prompts_dir / "testing_prompt.md" + if testing_prompt_path.exists(): + try: + content = testing_prompt_path.read_text(encoding="utf-8") + if "browser_navigate" in content or "browser_take_screenshot" in content: + updated = _migrate_testing_prompt_to_cli(content) + if updated != content: + testing_prompt_path.write_text(updated, encoding="utf-8") + migrated.append("Updated testing_prompt.md to Playwright CLI") + except (OSError, PermissionError) as e: + print(f" Warning: Could not update testing_prompt.md: {e}") + + return migrated + + +def migrate_project_to_current(project_dir: Path) -> list[str]: + """Migrate an existing project to the current AutoForge version. + + Idempotent — safe to call on every agent start. Returns list of + human-readable descriptions of what was migrated. + """ + current = _get_migration_version(project_dir) + if current >= CURRENT_MIGRATION_VERSION: + return [] + + migrated: list[str] = [] + + if current < 1: + migrated.extend(_migrate_v0_to_v1(project_dir)) + + # Future: if current < 2: migrated.extend(_migrate_v1_to_v2(project_dir)) + + _set_migration_version(project_dir, CURRENT_MIGRATION_VERSION) + return migrated diff --git a/security.py b/security.py index 8ed9ce7..9d928b5 100644 --- a/security.py +++ b/security.py @@ -66,10 +66,12 @@ ALLOWED_COMMANDS = { "bash", # Script execution "init.sh", # Init scripts; validated separately + # Browser automation + "playwright-cli", # Playwright CLI for browser testing; validated separately } # Commands that need additional validation even when in the allowlist -COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh"} +COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh", "playwright-cli"} # Commands that are NEVER allowed, even with user approval # These commands can cause permanent system damage or security breaches @@ -438,6 +440,37 @@ def validate_init_script(command_string: str) -> tuple[bool, str]: return False, 
f"Only ./init.sh is allowed, got: {script}" +def validate_playwright_command(command_string: str) -> tuple[bool, str]: + """ + Validate playwright-cli commands - block dangerous subcommands. + + Blocks `run-code` (arbitrary Node.js execution) and `eval` (arbitrary JS + evaluation) which bypass the security sandbox. + + Returns: + Tuple of (is_allowed, reason_if_blocked) + """ + try: + tokens = shlex.split(command_string) + except ValueError: + return False, "Could not parse playwright-cli command" + + if not tokens: + return False, "Empty command" + + BLOCKED_SUBCOMMANDS = {"run-code", "eval"} + + # Find the subcommand: first non-flag token after 'playwright-cli' + for token in tokens[1:]: + if token.startswith("-"): + continue # skip flags like -s=agent-1 + if token in BLOCKED_SUBCOMMANDS: + return False, f"playwright-cli '{token}' is not allowed" + break # first non-flag token is the subcommand + + return True, "" + + def matches_pattern(command: str, pattern: str) -> bool: """ Check if a command matches a pattern. @@ -955,5 +988,9 @@ async def bash_security_hook(input_data, tool_use_id=None, context=None): allowed, reason = validate_init_script(cmd_segment) if not allowed: return {"decision": "block", "reason": reason} + elif cmd == "playwright-cli": + allowed, reason = validate_playwright_command(cmd_segment) + if not allowed: + return {"decision": "block", "reason": reason} return {} diff --git a/server/services/process_manager.py b/server/services/process_manager.py index 9a4bd5c..3054add 100644 --- a/server/services/process_manager.py +++ b/server/services/process_manager.py @@ -227,6 +227,28 @@ class AgentProcessManager: """Remove lock file.""" self.lock_file.unlink(missing_ok=True) + def _apply_playwright_headless(self, headless: bool) -> None: + """Update .playwright/cli.config.json with the current headless setting. + + playwright-cli reads this config file on each ``open`` command, so + updating it before the agent starts is sufficient. 
+        """
+        config_file = self.project_dir / ".playwright" / "cli.config.json"
+        if not config_file.exists():
+            return
+        try:
+            import json
+            config = json.loads(config_file.read_text(encoding="utf-8"))
+            launch_opts = config.get("browser", {}).get("launchOptions", {})
+            if launch_opts.get("headless") == headless:
+                return  # already correct
+            launch_opts["headless"] = headless
+            config.setdefault("browser", {})["launchOptions"] = launch_opts
+            config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
+            logger.info("Set playwright headless=%s for %s", headless, self.project_name)
+        except Exception:
+            logger.warning("Failed to update playwright config", exc_info=True)
+
     def _cleanup_stale_features(self) -> None:
         """Clear in_progress flag for all features when agent stops/crashes.
 
@@ -361,6 +383,15 @@ class AgentProcessManager:
         if not self._check_lock():
             return False, "Another agent instance is already running for this project"
 
+        # Clean up stale browser daemons from previous runs
+        try:
+            subprocess.run(
+                ["playwright-cli", "kill-all"],
+                timeout=5, capture_output=True,
+            )
+        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+            pass
+
         # Clean up features stuck from a previous crash/stop
         self._cleanup_stale_features()
 
@@ -397,6 +428,10 @@ class AgentProcessManager:
             # Add --batch-size flag for multi-feature batching
             cmd.extend(["--batch-size", str(batch_size)])
 
+        # Apply headless setting to .playwright/cli.config.json so playwright-cli
+        # picks it up (the only mechanism it supports for headless control)
+        self._apply_playwright_headless(playwright_headless)
+
         try:
             # Start subprocess with piped stdout/stderr
             # Use project_dir as cwd so Claude SDK sandbox allows access to project files
@@ -409,7 +444,7 @@ class AgentProcessManager:
             subprocess_env = {
                 **os.environ,
                 "PYTHONUNBUFFERED": "1",
-                "PLAYWRIGHT_HEADLESS": "true" if playwright_headless else "false",
+                "PLAYWRIGHT_CLI_SESSION": f"agent-{self.project_name}-{os.getpid()}",
                 "NODE_COMPILE_CACHE": "",  # Disable V8 compile caching to prevent .node file accumulation in %TEMP%
                 **api_env,
             }
@@ -469,6 +504,15 @@ class AgentProcessManager:
             except asyncio.CancelledError:
                 pass
 
+        # Kill browser daemons before stopping agent
+        try:
+            subprocess.run(
+                ["playwright-cli", "kill-all"],
+                timeout=5, capture_output=True,
+            )
+        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+            pass
+
         # CRITICAL: Kill entire process tree, not just orchestrator
         # This ensures all spawned coding/testing agents are also terminated
         proc = self.process  # Capture reference before async call
diff --git a/start.bat b/start.bat
index 9931c38..9d1e95d 100644
--- a/start.bat
+++ b/start.bat
@@ -54,5 +54,15 @@ REM Install dependencies
 echo Installing dependencies...
 pip install -r requirements.txt --quiet
 
+REM Ensure playwright-cli is available for browser automation
+where playwright-cli >nul 2>&1
+if %ERRORLEVEL% neq 0 (
+    echo Installing playwright-cli for browser automation...
+    call npm install -g @playwright/cli >nul 2>&1
+    if errorlevel 1 (
+        echo Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli
+    )
+)
+
 REM Run the app
 python start.py
diff --git a/start.sh b/start.sh
index 25c8751..9b938af 100755
--- a/start.sh
+++ b/start.sh
@@ -74,5 +74,14 @@ fi
 echo "Installing dependencies..."
 pip install -r requirements.txt --quiet
 
+# Ensure playwright-cli is available for browser automation
+if ! command -v playwright-cli &> /dev/null; then
+    echo "Installing playwright-cli for browser automation..."
+    npm install -g @playwright/cli --quiet 2>/dev/null
+    if [ $? -ne 0 ]; then
+        echo "Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli"
+    fi
+fi
+
 # Run the app
 python start.py
diff --git a/start_ui.bat b/start_ui.bat
index 3fc67f5..edbc60a 100644
--- a/start_ui.bat
+++ b/start_ui.bat
@@ -37,5 +37,15 @@ REM Install dependencies
 echo Installing dependencies...
 pip install -r requirements.txt --quiet
 
+REM Ensure playwright-cli is available for browser automation
+where playwright-cli >nul 2>&1
+if %ERRORLEVEL% neq 0 (
+    echo Installing playwright-cli for browser automation...
+    call npm install -g @playwright/cli >nul 2>&1
+    if errorlevel 1 (
+        echo Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli
+    )
+)
+
 REM Run the Python launcher
 python "%~dp0start_ui.py" %*
diff --git a/start_ui.sh b/start_ui.sh
index 4381bbe..8c63ff9 100755
--- a/start_ui.sh
+++ b/start_ui.sh
@@ -80,5 +80,14 @@ fi
 echo "Installing dependencies..."
 pip install -r requirements.txt --quiet
 
+# Ensure playwright-cli is available for browser automation
+if ! command -v playwright-cli &> /dev/null; then
+    echo "Installing playwright-cli for browser automation..."
+    npm install -g @playwright/cli --quiet 2>/dev/null
+    if [ $? -ne 0 ]; then
+        echo "Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli"
+    fi
+fi
+
 # Run the Python launcher
 python start_ui.py "$@"
diff --git a/temp_cleanup.py b/temp_cleanup.py
index 5cfda06..5907908 100644
--- a/temp_cleanup.py
+++ b/temp_cleanup.py
@@ -125,14 +125,18 @@ def cleanup_stale_temp(max_age_seconds: int = MAX_AGE_SECONDS) -> dict:
 
 def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -> dict:
     """
-    Clean up stale screenshot files from the project root.
+    Clean up stale Playwright CLI artifacts from the project.
 
-    Playwright browser verification can leave .png files in the project
-    directory. This removes them after they've aged out (default 5 minutes).
+    The Playwright CLI daemon saves screenshots, snapshots, and other artifacts
+    to `{project_dir}/.playwright-cli/`. This removes them after they've aged
+    out (default 5 minutes).
+
+    Also cleans up legacy screenshot patterns from the project root (from the
+    old Playwright MCP server approach).
 
     Args:
         project_dir: Path to the project directory.
-        max_age_seconds: Maximum age in seconds before a screenshot is deleted.
+        max_age_seconds: Maximum age in seconds before an artifact is deleted.
             Defaults to 5 minutes (300 seconds).
 
     Returns:
@@ -141,13 +145,33 @@ def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -
     cutoff_time = time.time() - max_age_seconds
     stats: dict = {"files_deleted": 0, "bytes_freed": 0, "errors": []}
 
-    screenshot_patterns = [
+    # Clean up .playwright-cli/ directory (new CLI approach)
+    playwright_cli_dir = project_dir / ".playwright-cli"
+    if playwright_cli_dir.exists():
+        for item in playwright_cli_dir.iterdir():
+            if not item.is_file():
+                continue
+            try:
+                mtime = item.stat().st_mtime
+                if mtime < cutoff_time:
+                    size = item.stat().st_size
+                    item.unlink(missing_ok=True)
+                    if not item.exists():
+                        stats["files_deleted"] += 1
+                        stats["bytes_freed"] += size
+                        logger.debug(f"Deleted playwright-cli artifact: {item}")
+            except Exception as e:
+                stats["errors"].append(f"Failed to delete {item}: {e}")
+                logger.debug(f"Failed to delete artifact {item}: {e}")
+
+    # Legacy cleanup: root-level screenshot patterns (from old MCP server approach)
+    legacy_patterns = [
         "feature*-*.png",
         "screenshot-*.png",
         "step-*.png",
     ]
 
-    for pattern in screenshot_patterns:
+    for pattern in legacy_patterns:
         for item in project_dir.glob(pattern):
             if not item.is_file():
                 continue
@@ -159,14 +183,14 @@ def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -
                     if not item.exists():
                         stats["files_deleted"] += 1
                         stats["bytes_freed"] += size
-                        logger.debug(f"Deleted project screenshot: {item}")
+                        logger.debug(f"Deleted legacy screenshot: {item}")
             except Exception as e:
                 stats["errors"].append(f"Failed to delete {item}: {e}")
                 logger.debug(f"Failed to delete screenshot {item}: {e}")
 
     if stats["files_deleted"] > 0:
         mb_freed = stats["bytes_freed"] / (1024 * 1024)
-        logger.info(f"Screenshot cleanup: {stats['files_deleted']} files, {mb_freed:.1f} MB freed")
+        logger.info(f"Artifact cleanup: {stats['files_deleted']} files, {mb_freed:.1f} MB freed")
 
     return stats
 
diff --git a/test_security.py b/test_security.py
index 1017d1b..ccd2346 100644
--- a/test_security.py
+++ b/test_security.py
@@ -25,6 +25,7 @@ from security import (
     validate_chmod_command,
     validate_init_script,
     validate_pkill_command,
+    validate_playwright_command,
     validate_project_command,
 )
 
@@ -923,6 +924,70 @@ pkill_processes:
     return passed, failed
 
 
+def test_playwright_cli_validation():
+    """Test playwright-cli subcommand validation."""
+    print("\nTesting playwright-cli validation:\n")
+    passed = 0
+    failed = 0
+
+    # Test cases: (command, should_be_allowed, description)
+    test_cases = [
+        # Allowed cases
+        ("playwright-cli screenshot", True, "screenshot allowed"),
+        ("playwright-cli snapshot", True, "snapshot allowed"),
+        ("playwright-cli click e5", True, "click with ref"),
+        ("playwright-cli open http://localhost:3000", True, "open URL"),
+        ("playwright-cli -s=agent-1 click e5", True, "session flag with click"),
+        ("playwright-cli close", True, "close browser"),
+        ("playwright-cli goto http://localhost:3000/page", True, "goto URL"),
+        ("playwright-cli fill e3 'test value'", True, "fill form field"),
+        ("playwright-cli console", True, "console messages"),
+        # Blocked cases
+        ("playwright-cli run-code 'await page.evaluate(() => {})'", False, "run-code blocked"),
+        ("playwright-cli eval 'document.title'", False, "eval blocked"),
+        ("playwright-cli -s=test eval 'document.title'", False, "eval with session flag blocked"),
+    ]
+
+    for cmd, should_allow, description in test_cases:
+        allowed, reason = validate_playwright_command(cmd)
+        if allowed == should_allow:
+            print(f" PASS: {cmd!r} ({description})")
+            passed += 1
+        else:
+            expected = "allowed" if should_allow else "blocked"
+            actual = "allowed" if allowed else "blocked"
+            print(f" FAIL: {cmd!r} ({description})")
+            print(f" Expected: {expected}, Got: {actual}")
+            if reason:
+                print(f" Reason: {reason}")
+            failed += 1
+
+    # Integration test: verify through the security hook
+    print("\n Integration tests (via security hook):\n")
+
+    # playwright-cli screenshot should be allowed
+    input_data = {"tool_name": "Bash", "tool_input": {"command": "playwright-cli screenshot"}}
+    result = asyncio.run(bash_security_hook(input_data))
+    if result.get("decision") != "block":
+        print(" PASS: playwright-cli screenshot allowed via hook")
+        passed += 1
+    else:
+        print(f" FAIL: playwright-cli screenshot should be allowed: {result.get('reason')}")
+        failed += 1
+
+    # playwright-cli run-code should be blocked
+    input_data = {"tool_name": "Bash", "tool_input": {"command": "playwright-cli run-code 'code'"}}
+    result = asyncio.run(bash_security_hook(input_data))
+    if result.get("decision") == "block":
+        print(" PASS: playwright-cli run-code blocked via hook")
+        passed += 1
+    else:
+        print(" FAIL: playwright-cli run-code should be blocked via hook")
+        failed += 1
+
+    return passed, failed
+
+
 def main():
     print("=" * 70)
     print(" SECURITY HOOK TESTS")
@@ -991,6 +1056,11 @@ def main():
     passed += pkill_passed
     failed += pkill_failed
 
+    # Test playwright-cli validation
+    pw_passed, pw_failed = test_playwright_cli_validation()
+    passed += pw_passed
+    failed += pw_failed
+
     # Commands that SHOULD be blocked
     # Note: blocklisted commands (sudo, shutdown, dd, aws) are tested in
     # test_blocklist_enforcement(). chmod validation is tested in
@@ -1012,6 +1082,9 @@ def main():
         # Shell injection attempts
         "$(echo pkill) node",
         'eval "pkill node"',
+        # playwright-cli dangerous subcommands
+        "playwright-cli run-code 'await page.goto(\"http://evil.com\")'",
+        "playwright-cli eval 'document.cookie'",
     ]
 
     for cmd in dangerous:
@@ -1077,6 +1150,12 @@ def main():
         "/usr/local/bin/node app.js",
         # Combined chmod and init.sh (integration test for both validators)
         "chmod +x init.sh && ./init.sh",
+        # Playwright CLI allowed commands
+        "playwright-cli open http://localhost:3000",
+        "playwright-cli screenshot",
+        "playwright-cli snapshot",
+        "playwright-cli click e5",
+        "playwright-cli -s=agent-1 close",
     ]
 
     for cmd in safe:
diff --git a/ui/src/components/ProjectSelector.tsx b/ui/src/components/ProjectSelector.tsx
index 10b4839..06eb8bf 100644
--- a/ui/src/components/ProjectSelector.tsx
+++ b/ui/src/components/ProjectSelector.tsx
@@ -75,6 +75,7 @@ export function ProjectSelector({
             variant="outline"
             className="min-w-[140px] sm:min-w-[200px] justify-between"
             disabled={isLoading}
+            title={selectedProjectData?.path}
           >
             {isLoading ? (
@@ -101,6 +102,7 @@ export function ProjectSelector({
           {projects.map(project => (