diff --git a/CLAUDE.md b/CLAUDE.md index a6857db..02308bc 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -169,19 +169,99 @@ Projects can be stored in any directory (registered in `~/.autocoder/registry.db - `prompts/coding_prompt.md` - Continuation session prompt - `features.db` - SQLite database with feature test cases - `.agent.lock` - Lock file to prevent multiple agent instances +- `.autocoder/allowed_commands.yaml` - Project-specific bash command allowlist (optional) ### Security Model Defense-in-depth approach configured in `client.py`: 1. OS-level sandbox for bash commands 2. Filesystem restricted to project directory only -3. Bash commands validated against `ALLOWED_COMMANDS` in `security.py` +3. Bash commands validated using hierarchical allowlist system + +#### Per-Project Allowed Commands + +The agent's bash command access is controlled through a hierarchical configuration system: + +**Command Hierarchy (highest to lowest priority):** +1. **Hardcoded Blocklist** (`security.py`) - NEVER allowed (dd, sudo, shutdown, etc.) +2. **Org Blocklist** (`~/.autocoder/config.yaml`) - Cannot be overridden by projects +3. **Org Allowlist** (`~/.autocoder/config.yaml`) - Available to all projects +4. **Global Allowlist** (`security.py`) - Default commands (npm, git, curl, etc.) +5. 
**Project Allowlist** (`.autocoder/allowed_commands.yaml`) - Project-specific commands + +**Project Configuration:** + +Each project can define custom allowed commands in `.autocoder/allowed_commands.yaml`: + +```yaml +version: 1 +commands: + # Exact command names + - name: swift + description: Swift compiler + + # Prefix wildcards (matches swiftc, swiftlint, swiftformat) + - name: swift* + description: All Swift development tools + + # Local project scripts + - name: ./scripts/build.sh + description: Project build script +``` + +**Organization Configuration:** + +System administrators can set org-wide policies in `~/.autocoder/config.yaml`: + +```yaml +version: 1 + +# Commands available to ALL projects +allowed_commands: + - name: jq + description: JSON processor + +# Commands blocked across ALL projects (cannot be overridden) +blocked_commands: + - aws # Prevent accidental cloud operations + - kubectl # Block production deployments +``` + +**Pattern Matching:** +- Exact: `swift` matches only `swift` +- Wildcard: `swift*` matches `swift`, `swiftc`, `swiftlint`, etc. +- Scripts: `./scripts/build.sh` matches the script by name from any directory + +**Limits:** +- Maximum 50 commands per project config +- Blocklisted commands (sudo, dd, shutdown, etc.) 
can NEVER be allowed +- Org-level blocked commands cannot be overridden by project configs + +**Testing:** +```bash +# Unit tests (136 tests - fast) +python test_security.py + +# Integration tests (9 tests - uses real hooks) +python test_security_integration.py +``` + +**Files:** +- `security.py` - Command validation logic and hardcoded blocklist +- `test_security.py` - Unit tests for security system (136 tests) +- `test_security_integration.py` - Integration tests with real hooks (9 tests) +- `TEST_SECURITY.md` - Quick testing reference guide +- `examples/project_allowed_commands.yaml` - Project config example (all commented by default) +- `examples/org_config.yaml` - Org config example (all commented by default) +- `examples/README.md` - Comprehensive guide with use cases, testing, and troubleshooting +- `PHASE3_SPEC.md` - Specification for mid-session approval feature (future enhancement) ## Claude Code Integration - `.claude/commands/create-spec.md` - `/create-spec` slash command for interactive spec creation - `.claude/skills/frontend-design/SKILL.md` - Skill for distinctive UI design - `.claude/templates/` - Prompt templates copied to new projects +- `examples/` - Configuration examples and documentation for security settings ## Key Patterns diff --git a/PHASE3_SPEC.md b/PHASE3_SPEC.md new file mode 100644 index 0000000..7c0c64b --- /dev/null +++ b/PHASE3_SPEC.md @@ -0,0 +1,1591 @@ +# Phase 3: Mid-Session Command Approval - Implementation Specification + +**Status:** Not yet implemented (Phases 1 & 2 complete) +**Estimated Effort:** 2-3 days for experienced developer +**Priority:** Medium (nice-to-have, not blocking) + +--- + +## Table of Contents + +- [Executive Summary](#executive-summary) +- [User Experience](#user-experience) +- [Technical Architecture](#technical-architecture) +- [Implementation Checklist](#implementation-checklist) +- [Detailed Implementation Guide](#detailed-implementation-guide) +- [Testing Strategy](#testing-strategy) +- [Security 
Considerations](#security-considerations) +- [Future Enhancements](#future-enhancements) + +--- + +## Executive Summary + +### What is Phase 3? + +Phase 3 adds **mid-session approval** for bash commands that aren't in the allowlist. Instead of immediately blocking unknown commands, the agent can request user approval in real-time. + +### Current State (Phases 1 & 2) + +The agent can only run commands that are: +1. In the hardcoded allowlist (npm, git, ls, etc.) +2. In project config (`.autocoder/allowed_commands.yaml`) +3. In org config (`~/.autocoder/config.yaml`) + +If the agent tries an unknown command → **immediately blocked**. + +### Phase 3 Vision + +If the agent tries an unknown command → **request approval**: +- **CLI mode**: Rich TUI overlay shows approval dialog +- **UI mode**: React banner/toast prompts user +- **User decides**: Session-only, Permanent (save to YAML), or Deny +- **Timeout**: Auto-deny after 5 minutes (configurable) + +### Benefits + +1. **Flexibility**: Don't need to pre-configure every possible command +2. **Discovery**: See what commands the agent actually needs +3. **Safety**: Still requires explicit approval (not automatic) +4. **Persistence**: Can save approved commands to config for future sessions + +### Non-Goals + +- **NOT** auto-approval (always requires user confirmation) +- **NOT** bypassing hardcoded blocklist (sudo, dd, etc. are NEVER allowed) +- **NOT** bypassing org-level blocklist (those remain final) + +--- + +## User Experience + +### CLI Mode Flow + +``` +Agent is working... +Agent tries: xcodebuild -project MyApp.xcodeproj + +┌─────────────────────────────────────────────────────────────┐ +│ ⚠️ COMMAND APPROVAL REQUIRED │ +├─────────────────────────────────────────────────────────────┤ +│ The agent is requesting permission to run: │ +│ │ +│ xcodebuild -project MyApp.xcodeproj │ +│ │ +│ This command is not in your allowed commands list. 
│ +│ │ +│ Options: │ +│ [S] Allow for this Session only │ +│ [P] Allow Permanently (save to config) │ +│ [D] Deny (default in 5 minutes) │ +│ │ +│ Your choice (S/P/D): │ +└─────────────────────────────────────────────────────────────┘ +``` + +**For dangerous commands** (aws, kubectl, sudo*): + +``` +╔═══════════════════════════════════════════════════════════════╗ +║ ⚠️ DANGER: PRIVILEGED COMMAND REQUESTED ║ +╠═══════════════════════════════════════════════════════════════╣ +║ The agent is requesting: aws s3 ls ║ +║ ║ +║ aws is a CLOUD CLI that can: ║ +║ • Access production infrastructure ║ +║ • Modify or delete cloud resources ║ +║ • Incur significant costs ║ +║ ║ +║ This action could have SERIOUS consequences. ║ +║ ║ +║ Type CONFIRM to allow, or press Enter to deny: ║ +╚═══════════════════════════════════════════════════════════════╝ +``` + +*Note: sudo would still be in hardcoded blocklist, but this shows the UX pattern + +### UI Mode Flow + +**React UI Banner** (top of screen): + +``` +┌─────────────────────────────────────────────────────────────┐ +│ ⚠️ Agent requesting permission: xcodebuild │ +│ │ +│ [Session Only] [Save to Config] [Deny] │ +│ │ +│ Auto-denies in: 4:32 │ +└─────────────────────────────────────────────────────────────┘ +``` + +**Multiple requests queued:** + +``` +┌─────────────────────────────────────────────────────────────┐ +│ ⚠️ 3 approval requests pending │ +│ │ +│ 1. xcodebuild -project MyApp.xcodeproj │ +│ [Session] [Save] [Deny] │ +│ │ +│ 2. swift package resolve │ +│ [Session] [Save] [Deny] │ +│ │ +│ 3. 
xcrun simctl list devices │ +│ [Session] [Save] [Deny] │ +└─────────────────────────────────────────────────────────────┘ +``` + +### Response Behavior + +| User Action | Agent Behavior | Config Updated | +|-------------|----------------|----------------| +| Session Only | Command allowed this session | No | +| Permanent | Command allowed forever | Yes - appended to YAML | +| Deny | Command blocked, agent sees error | No | +| Timeout (5 min) | Command blocked, agent sees timeout | No | + +--- + +## Technical Architecture + +### Data Flow + +``` +┌─────────────────────────────────────────────────────────────┐ +│ 1. Agent tries command: xcodebuild │ +└────────────────────┬────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────┐ +│ 2. bash_security_hook() checks allowlist │ +│ → Not found, not in blocklist │ +└────────────────────┬────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────┐ +│ 3. Hook returns: {"decision": "pending", │ +│ "request_id": "req_123", │ +│ "command": "xcodebuild"} │ +└────────────────────┬────────────────────────────────────────┘ + │ + ┌──────────┴──────────┐ + │ │ + ▼ ▼ +┌─────────────────────┐ ┌─────────────────────┐ +│ CLI Mode │ │ UI Mode │ +│ │ │ │ +│ approval_tui.py │ │ WebSocket message │ +│ shows Rich dialog │ │ → React banner │ +└──────────┬──────────┘ └──────────┬──────────┘ + │ │ + └────────┬───────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────┐ +│ 4. User responds: "session" / "permanent" / "deny" │ +└────────────────────┬────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────┐ +│ 5. 
approval_manager.respond(request_id, decision)           │
+│    → If permanent: persist_command()                        │
+│    → If session: add to in-memory set                       │
+└────────────────────┬────────────────────────────────────────┘
+                     │
+                     ▼
+┌─────────────────────────────────────────────────────────────┐
+│ 6. Hook gets response, returns to agent:                    │
+│    → "allow" or "block"                                     │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### State Management
+
+**ApprovalManager** (new class in `security.py`):
+
+```python
+class ApprovalManager:
+    """
+    Manages pending approval requests and responses.
+    Thread-safe for concurrent access.
+    """
+
+    def __init__(self):
+        self._pending: Dict[str, PendingRequest] = {}
+        self._session_allowed: Set[str] = set()
+        self._lock = threading.Lock()
+
+    def request_approval(
+        self,
+        command: str,
+        is_dangerous: bool = False
+    ) -> str:
+        """
+        Create a new approval request.
+        Returns request_id.
+        """
+        ...
+
+    def wait_for_response(
+        self,
+        request_id: str,
+        timeout_seconds: int = 300
+    ) -> ApprovalDecision:
+        """
+        Block until user responds or timeout.
+        Returns: "allow_session", "allow_permanent", "deny", "timeout"
+        """
+        ...
+
+    def respond(
+        self,
+        request_id: str,
+        decision: ApprovalDecision
+    ):
+        """
+        Called by UI/CLI to respond to a request.
+        """
+        ...
+```
+
+### File Locking for Persistence
+
+When the user chooses "Permanent", append to the project YAML while holding an exclusive file lock:
+
+```python
+import sys
+from pathlib import Path
+from typing import Optional
+
+import yaml
+
+# Import only the locking module that exists on this platform;
+# importing both unconditionally raises ImportError on either OS.
+if sys.platform == "win32":
+    import msvcrt
+else:
+    import fcntl
+
+def persist_command(project_dir: Path, command: str, description: Optional[str] = None):
+    """
+    Atomically append command to project YAML.
+    Uses platform-specific file locking.
+    """
+    config_path = project_dir / ".autocoder" / "allowed_commands.yaml"
+
+    # Ensure the config directory and file exist
+    if not config_path.exists():
+        config_path.parent.mkdir(parents=True, exist_ok=True)
+        config_path.write_text("version: 1\ncommands: []\n")
+
+    with open(config_path, "r+") as f:
+        # Acquire exclusive lock (msvcrt locks from the current position, here offset 0)
+        if sys.platform == "win32":
+            msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)
+        else:
+            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
+
+        try:
+            # Load current config
+            config = yaml.safe_load(f) or {"version": 1, "commands": []}
+
+            # Add new command
+            new_entry = {"name": command}
+            if description:
+                new_entry["description"] = description
+
+            config.setdefault("commands", []).append(new_entry)
+
+            # Validate doesn't exceed 50 commands
+            if len(config["commands"]) > 50:
+                raise ValueError("Cannot add command: 50 command limit reached")
+
+            # Write back
+            f.seek(0)
+            f.truncate()
+            yaml.dump(config, f, default_flow_style=False)
+
+        finally:
+            # Release lock
+            if sys.platform == "win32":
+                # Rewind first: the unlock must cover the same byte that was locked
+                f.seek(0)
+                msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
+            else:
+                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
+```
+
+---
+
+## Implementation Checklist
+
+### Core Security Module
+
+- [ ] Create `ApprovalManager` class in `security.py`
+  - [ ] Thread-safe pending request storage
+  - [ ] Session-only allowed commands set
+  - [ ] Timeout handling with threading.Timer
+  - [ ] Request/response API
+
+- [ ] Modify `bash_security_hook()` to support pending state
+  - [ ] Check if command needs approval
+  - [ ] Create approval request
+  - [ ] Wait for response (with timeout)
+  - [ ] Return appropriate decision
+
+- [ ] Implement `persist_command()` with file locking
+  - [ ] Platform-specific locking (fcntl/msvcrt)
+  - [ ] Atomic YAML append
+  - [ ] 50 command limit validation
+  - [ ] Auto-generate description if not provided
+
+- [ ] Add `is_dangerous_command()` helper
+  - [ ] Check against DANGEROUS_COMMANDS set
+  - [ ] Return emphatic warning text
+
+- [ ] Update DANGEROUS_COMMANDS set
+  - [ ] Move from hardcoded blocklist to dangerous list
+  - [ ] Commands: 
aws, gcloud, az, kubectl, docker-compose + - [ ] Keep sudo, dd, etc. in BLOCKED_COMMANDS (never allowed) + +### CLI Approval Interface + +- [ ] Create `approval_tui.py` module + - [ ] Use Rich library for TUI + - [ ] Overlay design (doesn't clear screen) + - [ ] Keyboard input handling (S/P/D keys) + - [ ] Timeout display (countdown timer) + - [ ] Different layouts for normal vs dangerous commands + +- [ ] Integrate with agent.py + - [ ] Detect if running in CLI mode (not UI) + - [ ] Pass approval callback to client + - [ ] Handle approval responses + +- [ ] Add `rich` to requirements.txt + - [ ] Version: `rich>=13.0.0` + +### React UI Components + +- [ ] Create `ApprovalBanner.tsx` component + - [ ] Banner at top of screen + - [ ] Queue multiple requests + - [ ] Session/Permanent/Deny buttons + - [ ] Countdown timer display + - [ ] Dangerous command warning variant + +- [ ] Update `useWebSocket.ts` hook + - [ ] Handle `approval_request` message type + - [ ] Send `approval_response` message + - [ ] Queue management for multiple requests + +- [ ] Update WebSocket message types in `types.ts` + ```typescript + type ApprovalRequest = { + request_id: string; + command: string; + is_dangerous: boolean; + timeout_seconds: number; + warning_text?: string; + }; + + type ApprovalResponse = { + request_id: string; + decision: "session" | "permanent" | "deny"; + }; + ``` + +### Backend WebSocket Integration + +- [ ] Update `server/routers/agent.py` + - [ ] Add `approval_request` message sender + - [ ] Add `approval_response` message handler + - [ ] Wire to ApprovalManager + +- [ ] Thread-safe WebSocket message queue + - [ ] Handle approval requests from agent thread + - [ ] Handle approval responses from WebSocket thread + +### MCP Tool for Agent Introspection + +- [ ] Add `list_allowed_commands` tool to feature MCP + - [ ] Returns current allowed commands + - [ ] Indicates which are from project/org/global + - [ ] Shows if approval is available + - [ ] Agent can proactively 
query before trying commands + +- [ ] Tool response format: + ```python + { + "commands": [ + {"name": "swift", "source": "project"}, + {"name": "npm", "source": "global"}, + {"name": "jq", "source": "org"} + ], + "blocked_count": 15, + "can_request_approval": True, + "approval_timeout_minutes": 5 + } + ``` + +### Configuration + +- [ ] Add approval settings to org config + - [ ] `approval_timeout_minutes` (default: 5) + - [ ] `approval_enabled` (default: true) + - [ ] `dangerous_command_requires_confirmation` (default: true) + +- [ ] Validate org config settings + - [ ] Timeout must be 1-30 minutes + - [ ] Boolean flags properly typed + +### Testing + +- [ ] Unit tests for ApprovalManager + - [ ] Request creation + - [ ] Response handling + - [ ] Timeout behavior + - [ ] Thread safety + +- [ ] Unit tests for file locking + - [ ] Concurrent append operations + - [ ] Platform-specific locking + - [ ] Error handling + +- [ ] Integration tests for approval flow + - [ ] CLI approval (mocked input) + - [ ] WebSocket approval (mocked messages) + - [ ] Session vs permanent vs deny + - [ ] Timeout scenarios + +- [ ] UI component tests + - [ ] ApprovalBanner rendering + - [ ] Queue management + - [ ] Button interactions + - [ ] Timer countdown + +### Documentation + +- [ ] Update `CLAUDE.md` + - [ ] Document approval flow + - [ ] Update security model section + - [ ] Add Phase 3 to architecture + +- [ ] Update `examples/README.md` + - [ ] Add mid-session approval examples + - [ ] Document timeout configuration + - [ ] Troubleshooting approval issues + +- [ ] Create user guide for approvals + - [ ] When/why to use session vs permanent + - [ ] How to handle dangerous commands + - [ ] Keyboard shortcuts for CLI + +--- + +## Detailed Implementation Guide + +### Step 1: Core ApprovalManager (2-3 hours) + +**File:** `security.py` + +```python +from dataclasses import dataclass +from enum import Enum +import threading +import time +from typing import Dict, Set, Optional +import 
uuid + +class ApprovalDecision(Enum): + ALLOW_SESSION = "session" + ALLOW_PERMANENT = "permanent" + DENY = "deny" + TIMEOUT = "timeout" + +@dataclass +class PendingRequest: + request_id: str + command: str + is_dangerous: bool + timestamp: float + response_event: threading.Event + decision: Optional[ApprovalDecision] = None + +class ApprovalManager: + """ + Singleton manager for approval requests. + Thread-safe for concurrent access from agent and UI. + """ + + _instance = None + _lock = threading.Lock() + + def __new__(cls): + if cls._instance is None: + with cls._lock: + if cls._instance is None: + cls._instance = super().__new__(cls) + cls._instance._initialized = False + return cls._instance + + def __init__(self): + if self._initialized: + return + + self._pending: Dict[str, PendingRequest] = {} + self._session_allowed: Set[str] = set() + self._state_lock = threading.Lock() + self._initialized = True + + def request_approval( + self, + command: str, + is_dangerous: bool = False, + timeout_seconds: int = 300 + ) -> str: + """ + Create a new approval request. + + Args: + command: The command needing approval + is_dangerous: True if command is in DANGEROUS_COMMANDS + timeout_seconds: How long to wait before auto-deny + + Returns: + request_id to use for waiting/responding + """ + request_id = f"req_{uuid.uuid4().hex[:8]}" + + with self._state_lock: + request = PendingRequest( + request_id=request_id, + command=command, + is_dangerous=is_dangerous, + timestamp=time.time(), + response_event=threading.Event() + ) + self._pending[request_id] = request + + # Start timeout timer + timer = threading.Timer( + timeout_seconds, + self._handle_timeout, + args=[request_id] + ) + timer.daemon = True + timer.start() + + # Emit notification (CLI or WebSocket) + self._emit_approval_request(request) + + return request_id + + def wait_for_response( + self, + request_id: str, + timeout_seconds: int = 300 + ) -> ApprovalDecision: + """ + Block until user responds or timeout. 
+ + Returns: + ApprovalDecision (session/permanent/deny/timeout) + """ + with self._state_lock: + request = self._pending.get(request_id) + if not request: + return ApprovalDecision.DENY + + # Wait for response event + request.response_event.wait(timeout=timeout_seconds) + + with self._state_lock: + request = self._pending.get(request_id) + if not request or not request.decision: + return ApprovalDecision.TIMEOUT + + decision = request.decision + + # Handle permanent approval + if decision == ApprovalDecision.ALLOW_PERMANENT: + # This will be handled by caller (needs project_dir) + pass + elif decision == ApprovalDecision.ALLOW_SESSION: + self._session_allowed.add(request.command) + + # Clean up + del self._pending[request_id] + + return decision + + def respond( + self, + request_id: str, + decision: ApprovalDecision + ): + """ + Called by UI/CLI to respond to a request. + """ + with self._state_lock: + request = self._pending.get(request_id) + if not request: + return + + request.decision = decision + request.response_event.set() + + def is_session_allowed(self, command: str) -> bool: + """Check if command was approved for this session.""" + with self._state_lock: + return command in self._session_allowed + + def _handle_timeout(self, request_id: str): + """Called by timer thread when request times out.""" + self.respond(request_id, ApprovalDecision.TIMEOUT) + + def _emit_approval_request(self, request: PendingRequest): + """ + Emit approval request to CLI or WebSocket. + To be implemented based on execution mode. 
+        """
+        # This is called by approval_callback in client.py
+        pass
+
+# Global singleton instance
+_approval_manager = ApprovalManager()
+
+def get_approval_manager() -> ApprovalManager:
+    """Get the global ApprovalManager singleton."""
+    return _approval_manager
+```
+
+### Step 2: Modify bash_security_hook (1 hour)
+
+**File:** `security.py`
+
+```python
+import asyncio
+
+async def bash_security_hook(input_data, tool_use_id=None, context=None):
+    """
+    Pre-tool-use hook that validates bash commands.
+
+    Phase 3: Supports mid-session approval for unknown commands.
+    """
+    if input_data.get("tool_name") != "Bash":
+        return {}
+
+    command = input_data.get("tool_input", {}).get("command", "")
+    if not command:
+        return {}
+
+    # Extract commands
+    commands = extract_commands(command)
+    if not commands:
+        return {
+            "decision": "block",
+            "reason": f"Could not parse command: {command}",
+        }
+
+    # Get project directory and effective commands
+    project_dir = None
+    if context and isinstance(context, dict):
+        project_dir_str = context.get("project_dir")
+        if project_dir_str:
+            project_dir = Path(project_dir_str)
+
+    allowed_commands, blocked_commands = get_effective_commands(project_dir)
+    segments = split_command_segments(command)
+
+    # Check each command
+    for cmd in commands:
+        # Check blocklist (highest priority)
+        if cmd in blocked_commands:
+            return {
+                "decision": "block",
+                "reason": f"Command '{cmd}' is blocked and cannot be approved.",
+            }
+
+        # Check if allowed (allowlist or session)
+        approval_mgr = get_approval_manager()
+        if is_command_allowed(cmd, allowed_commands) or approval_mgr.is_session_allowed(cmd):
+            # Additional validation for sensitive commands
+            if cmd in COMMANDS_NEEDING_EXTRA_VALIDATION:
+                cmd_segment = get_command_for_validation(cmd, segments)
+                # ... existing validation code ...
+            continue
+
+        # PHASE 3: Request approval
+        is_dangerous = cmd in DANGEROUS_COMMANDS
+        timeout_seconds = 300  # TODO: Get from org config
+        request_id = approval_mgr.request_approval(
+            command=cmd,
+            is_dangerous=is_dangerous,
+            timeout_seconds=timeout_seconds,
+        )
+
+        # wait_for_response() blocks, so run it in a worker thread. Calling it
+        # directly would stall the event loop and could prevent the user's
+        # response (e.g. over the WebSocket in UI mode) from being delivered.
+        decision = await asyncio.to_thread(
+            approval_mgr.wait_for_response, request_id, timeout_seconds
+        )
+
+        if decision == ApprovalDecision.DENY:
+            return {
+                "decision": "block",
+                "reason": f"Command '{cmd}' was denied.",
+            }
+        elif decision == ApprovalDecision.TIMEOUT:
+            return {
+                "decision": "block",
+                "reason": (
+                    f"Command '{cmd}' was denied "
+                    f"(approval timeout after {timeout_seconds // 60} minutes)."
+                ),
+            }
+        elif decision == ApprovalDecision.ALLOW_PERMANENT:
+            # Persist to YAML
+            if project_dir:
+                try:
+                    persist_command(
+                        project_dir,
+                        cmd,
+                        description="Added via mid-session approval"
+                    )
+                except Exception as e:
+                    # If persist fails, still allow this invocation
+                    print(f"Warning: Could not save to config: {e}")
+        # If ALLOW_SESSION, already added to session set by wait_for_response
+
+    return {}  # Allow
+```
+
+### Step 3: CLI Approval Interface (3-4 hours)
+
+**File:** `approval_tui.py`
+
+```python
+"""
+CLI approval interface using Rich library.
+Displays an overlay when approval is needed.
+"""
+
+from rich.console import Console
+from rich.panel import Panel
+from rich.prompt import Prompt
+from rich.live import Live
+from rich.text import Text
+import sys
+import threading
+import time
+
+console = Console()
+
+def show_approval_dialog(
+    command: str,
+    is_dangerous: bool,
+    timeout_seconds: int,
+    on_response: callable
+):
+    """
+    Show approval dialog in CLI.
+ + Args: + command: The command requesting approval + is_dangerous: True if dangerous command + timeout_seconds: Timeout in seconds + on_response: Callback(decision: str) - "session"/"permanent"/"deny" + """ + + if is_dangerous: + _show_dangerous_dialog(command, timeout_seconds, on_response) + else: + _show_normal_dialog(command, timeout_seconds, on_response) + +def _show_normal_dialog(command: str, timeout_seconds: int, on_response: callable): + """Standard approval dialog.""" + + start_time = time.time() + + while True: + elapsed = time.time() - start_time + remaining = timeout_seconds - elapsed + + if remaining <= 0: + on_response("deny") + console.print("[red]⏱️ Request timed out - command denied[/red]") + return + + # Build dialog + content = f"""[bold yellow]⚠️ COMMAND APPROVAL REQUIRED[/bold yellow] + +The agent is requesting permission to run: + + [cyan]{command}[/cyan] + +This command is not in your allowed commands list. + +Options: + [green][S][/green] Allow for this [green]Session only[/green] + [blue][P][/blue] Allow [blue]Permanently[/blue] (save to config) + [red][D][/red] [red]Deny[/red] (default in {int(remaining)}s) + +Your choice (S/P/D): """ + + console.print(Panel(content, border_style="yellow", expand=False)) + + # Get input with timeout + choice = _get_input_with_timeout("", timeout=1.0) + + if choice: + choice = choice.upper() + if choice == "S": + on_response("session") + console.print("[green]✅ Allowed for this session[/green]") + return + elif choice == "P": + on_response("permanent") + console.print("[blue]✅ Saved to config permanently[/blue]") + return + elif choice == "D": + on_response("deny") + console.print("[red]❌ Command denied[/red]") + return + else: + console.print("[yellow]Invalid choice. 
Use S, P, or D.[/yellow]") + +def _show_dangerous_dialog(command: str, timeout_seconds: int, on_response: callable): + """Emphatic dialog for dangerous commands.""" + + # Determine warning text based on command + warnings = { + "aws": "AWS CLI can:\n • Access production infrastructure\n • Modify or delete cloud resources\n • Incur significant costs", + "gcloud": "Google Cloud CLI can:\n • Access production GCP resources\n • Modify or delete cloud infrastructure\n • Incur significant costs", + "kubectl": "Kubernetes CLI can:\n • Access production clusters\n • Deploy or delete workloads\n • Disrupt running services", + } + + cmd_name = command.split()[0] + warning = warnings.get(cmd_name, "This command can make significant system changes.") + + content = f"""[bold red on white] ⚠️ DANGER: PRIVILEGED COMMAND REQUESTED [/bold red on white] + +The agent is requesting: [red bold]{command}[/red bold] + +[yellow]{warning}[/yellow] + +[bold]This action could have SERIOUS consequences.[/bold] + +Type [bold]CONFIRM[/bold] to allow, or press Enter to deny:""" + + console.print(Panel(content, border_style="red", expand=False)) + + confirmation = Prompt.ask("", default="deny") + + if confirmation.upper() == "CONFIRM": + # Ask session vs permanent + choice = Prompt.ask( + "Allow for [S]ession or [P]ermanent?", + choices=["S", "P", "s", "p"], + default="S" + ) + if choice.upper() == "P": + on_response("permanent") + console.print("[blue]✅ Saved to config permanently[/blue]") + else: + on_response("session") + console.print("[green]✅ Allowed for this session[/green]") + else: + on_response("deny") + console.print("[red]❌ Command denied[/red]") + +def _get_input_with_timeout(prompt: str, timeout: float) -> str: + """ + Get input with timeout (non-blocking). + Returns empty string if timeout. 
+    """
+    import select
+
+    sys.stdout.write(prompt)
+    sys.stdout.flush()
+
+    # Check if input available (Unix only, Windows needs different approach)
+    if sys.platform != "win32":
+        ready, _, _ = select.select([sys.stdin], [], [], timeout)
+        if ready:
+            return sys.stdin.readline().strip()
+    else:
+        # Windows: use msvcrt.kbhit() and msvcrt.getch()
+        import msvcrt
+        start = time.time()
+        chars = []
+        while time.time() - start < timeout:
+            if msvcrt.kbhit():
+                char = msvcrt.getch()
+                if char == b'\r':  # Enter
+                    return ''.join(chars)
+                elif char == b'\x08':  # Backspace
+                    if chars:
+                        chars.pop()
+                        sys.stdout.write('\b \b')
+                else:
+                    chars.append(char.decode('utf-8'))
+                    sys.stdout.write(char.decode('utf-8'))
+            time.sleep(0.01)
+
+    return ""
+```
+
+### Step 4: React UI Components (4-5 hours)
+
+**File:** `ui/src/components/ApprovalBanner.tsx`
+
+```tsx
+import React, { useState, useEffect } from 'react';
+import { X, AlertTriangle, Clock } from 'lucide-react';
+
+interface ApprovalRequest {
+  request_id: string;
+  command: string;
+  is_dangerous: boolean;
+  timeout_seconds: number;
+  warning_text?: string;
+  timestamp: number;
+}
+
+interface ApprovalBannerProps {
+  requests: ApprovalRequest[];
+  onRespond: (requestId: string, decision: 'session' | 'permanent' | 'deny') => void;
+}
+
+export function ApprovalBanner({ requests, onRespond }: ApprovalBannerProps) {
+  const [remainingTimes, setRemainingTimes] = useState<Record<string, number>>({});
+
+  // Update countdown timers
+  useEffect(() => {
+    const interval = setInterval(() => {
+      const now = Date.now();
+      const newTimes: Record<string, number> = {};
+
+      requests.forEach(req => {
+        const elapsed = (now - req.timestamp) / 1000;
+        const remaining = Math.max(0, req.timeout_seconds - elapsed);
+        newTimes[req.request_id] = remaining;
+
+        // Auto-deny on timeout
+        if (remaining === 0) {
+          onRespond(req.request_id, 'deny');
+        }
+      });
+
+      setRemainingTimes(newTimes);
+    }, 100);
+
+    return () => clearInterval(interval);
+  }, [requests, onRespond]);
+
+  if (requests.length === 0) return null;
+
+  const formatTime = (seconds: number): string => {
+    const mins = Math.floor(seconds / 60);
+    const secs = Math.floor(seconds % 60);
+    return `${mins}:${secs.toString().padStart(2, '0')}`;
+  };
+
+  // Layout sketch: the element structure here and in the views below is
+  // minimal and assumed; styling classes are omitted.
+  return (
+    <div>
+      <div>
+        {requests.length === 1 ? (
+          <SingleRequestView
+            request={requests[0]}
+            remaining={remainingTimes[requests[0].request_id] || 0}
+            onRespond={onRespond}
+            formatTime={formatTime}
+          />
+        ) : (
+          <MultipleRequestsView
+            requests={requests}
+            remainingTimes={remainingTimes}
+            onRespond={onRespond}
+            formatTime={formatTime}
+          />
+        )}
+      </div>
+    </div>
+  );
+}
+
+function SingleRequestView({
+  request,
+  remaining,
+  onRespond,
+  formatTime,
+}: {
+  request: ApprovalRequest;
+  remaining: number;
+  onRespond: (requestId: string, decision: 'session' | 'permanent' | 'deny') => void;
+  formatTime: (seconds: number) => string;
+}) {
+  const isDangerous = request.is_dangerous;
+
+  return (
+    <div>
+      {isDangerous && (
+        <div>
+          <AlertTriangle />
+          DANGER: PRIVILEGED COMMAND
+        </div>
+      )}
+
+      <div>
+        <div>
+          <div>
+            Agent requesting permission:
+            <code>
+              {request.command}
+            </code>
+          </div>
+
+          {request.warning_text && (
+            <div>
+              <pre>
+                {request.warning_text}
+              </pre>
+            </div>
+          )}
+        </div>
+
+        <div>
+          <button onClick={() => onRespond(request.request_id, 'session')}>
+            Session Only
+          </button>
+
+          <button onClick={() => onRespond(request.request_id, 'permanent')}>
+            Save to Config
+          </button>
+
+          <button onClick={() => onRespond(request.request_id, 'deny')}>
+            Deny
+          </button>
+
+          <div>
+            <Clock />
+            {formatTime(remaining)}
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+}
+
+function MultipleRequestsView({
+  requests,
+  remainingTimes,
+  onRespond,
+  formatTime,
+}: {
+  requests: ApprovalRequest[];
+  remainingTimes: Record<string, number>;
+  onRespond: (requestId: string, decision: 'session' | 'permanent' | 'deny') => void;
+  formatTime: (seconds: number) => string;
+}) {
+  return (
+    <div>
+      <div>
+        ⚠️ {requests.length} approval requests pending
+      </div>
+
+      <div>
+        {requests.map(req => (
+          <div key={req.request_id}>
+            <code>
+              {req.command}
+            </code>
+
+            <div>
+              <button onClick={() => onRespond(req.request_id, 'session')}>
+                Session
+              </button>
+
+              <button onClick={() => onRespond(req.request_id, 'permanent')}>
+                Save
+              </button>
+
+              <button onClick={() => onRespond(req.request_id, 'deny')}>
+                Deny
+              </button>
+
+              <span>
+                {formatTime(remainingTimes[req.request_id] || 0)}
+              </span>
+            </div>
+          </div>
+        ))}
+      </div>
+    </div>
+  );
+}
+```
+
+**File:** `ui/src/hooks/useWebSocket.ts` (add approval handling)
+
+```typescript
+// Add to message types
+type ApprovalRequestMessage = {
+  type: 'approval_request';
+  request_id: string;
+  command: string;
+  is_dangerous: boolean;
+  timeout_seconds: number;
+  warning_text?: string;
+};
+
+// Add to useWebSocket hook
+const [approvalRequests, setApprovalRequests] = useState<ApprovalRequest[]>([]);
+
+// In message handler
+if (data.type === 'approval_request') {
+  setApprovalRequests(prev => [
+    ...prev,
+    {
+      ...data,
+      timestamp: Date.now(),
+    },
+  ]);
+}
+
+// Approval response function
+const respondToApproval = useCallback(
+  (requestId: string, decision: 'session' | 'permanent' | 'deny') => {
+    if (ws.current?.readyState === WebSocket.OPEN) {
+      ws.current.send(
+        JSON.stringify({
+          type: 'approval_response',
+          request_id: requestId,
+          decision,
+        })
+      );
+    }
+
+    // Remove from queue
+    setApprovalRequests(prev =>
+      prev.filter(req => req.request_id !== requestId)
+    );
+  },
+  []
+);
+
+return {
+  // ... existing returns
+  approvalRequests,
+  respondToApproval,
+};
+```
+
+### Step 5: Backend WebSocket (2-3 hours)
+
+**File:** `server/routers/agent.py`
+
+```python
+# Add to WebSocket message handlers
+
+async def handle_approval_response(websocket: WebSocket, data: dict):
+    """
+    Handle approval response from UI.
+ + Message format: + { + "type": "approval_response", + "request_id": "req_abc123", + "decision": "session" | "permanent" | "deny" + } + """ + request_id = data.get("request_id") + decision = data.get("decision") + + if not request_id or not decision: + return + + # Convert string to enum + decision_map = { + "session": ApprovalDecision.ALLOW_SESSION, + "permanent": ApprovalDecision.ALLOW_PERMANENT, + "deny": ApprovalDecision.DENY, + } + + approval_decision = decision_map.get(decision, ApprovalDecision.DENY) + + # Respond to approval manager + from security import get_approval_manager + approval_mgr = get_approval_manager() + approval_mgr.respond(request_id, approval_decision) + + +async def send_approval_request( + websocket: WebSocket, + request_id: str, + command: str, + is_dangerous: bool, + timeout_seconds: int, + warning_text: str = None +): + """ + Send approval request to UI via WebSocket. + """ + await websocket.send_json({ + "type": "approval_request", + "request_id": request_id, + "command": command, + "is_dangerous": is_dangerous, + "timeout_seconds": timeout_seconds, + "warning_text": warning_text, + }) +``` + +--- + +## Testing Strategy + +### Unit Tests + +**File:** `test_approval.py` + +```python +def test_approval_manager_request(): + """Test creating approval request.""" + mgr = ApprovalManager() + request_id = mgr.request_approval("swift", is_dangerous=False) + assert request_id.startswith("req_") + +def test_approval_manager_respond(): + """Test responding to approval.""" + mgr = ApprovalManager() + request_id = mgr.request_approval("swift", is_dangerous=False, timeout_seconds=1) + + # Respond in separate thread + import threading + def respond(): + time.sleep(0.1) + mgr.respond(request_id, ApprovalDecision.ALLOW_SESSION) + + t = threading.Thread(target=respond) + t.start() + + decision = mgr.wait_for_response(request_id, timeout_seconds=2) + assert decision == ApprovalDecision.ALLOW_SESSION + t.join() + +def test_approval_timeout(): + """Test 
approval timeout.""" + mgr = ApprovalManager() + request_id = mgr.request_approval("swift", is_dangerous=False, timeout_seconds=1) + + # Don't respond, let it timeout + decision = mgr.wait_for_response(request_id, timeout_seconds=2) + assert decision == ApprovalDecision.TIMEOUT + +def test_session_allowed(): + """Test session-allowed commands.""" + mgr = ApprovalManager() + assert not mgr.is_session_allowed("swift") + + # Approve for session + request_id = mgr.request_approval("swift", is_dangerous=False, timeout_seconds=1) + mgr.respond(request_id, ApprovalDecision.ALLOW_SESSION) + mgr.wait_for_response(request_id) + + assert mgr.is_session_allowed("swift") +``` + +### Integration Tests + +**File:** `test_security_integration.py` (add Phase 3 tests) + +```python +def test_approval_flow_session(): + """Test mid-session approval with session-only.""" + # Create project with no config + # Mock approval response: session + # Try command → should be allowed + # Try same command again → should still be allowed (session) + pass + +def test_approval_flow_permanent(): + """Test mid-session approval with permanent save.""" + # Create project with empty config + # Mock approval response: permanent + # Try command → should be allowed + # Check YAML file → command should be added + # Create new session → command should still be allowed + pass + +def test_approval_flow_deny(): + """Test mid-session approval denial.""" + # Create project + # Mock approval response: deny + # Try command → should be blocked + pass + +def test_approval_timeout(): + """Test approval timeout auto-deny.""" + # Create project + # Don't respond to approval + # Wait for timeout + # Command should be blocked with timeout message + pass + +def test_concurrent_approvals(): + """Test multiple simultaneous approval requests.""" + # Create project + # Try 3 commands at once + # All should queue + # Respond to each individually + # Verify all handled correctly + pass +``` + +### Manual Testing Checklist + +- [ 
] CLI mode: Request approval for unknown command +- [ ] CLI mode: Press S → command works this session +- [ ] CLI mode: Press P → command saved to YAML +- [ ] CLI mode: Press D → command denied +- [ ] CLI mode: Wait 5 minutes → timeout, command denied +- [ ] CLI mode: Dangerous command shows emphatic warning +- [ ] UI mode: Banner appears at top +- [ ] UI mode: Click "Session Only" → command works +- [ ] UI mode: Click "Save to Config" → YAML updated +- [ ] UI mode: Click "Deny" → command blocked +- [ ] UI mode: Multiple requests → all shown in queue +- [ ] UI mode: Countdown timer updates +- [ ] Concurrent access: Multiple agents, file locking works +- [ ] Config validation: 50 command limit enforced +- [ ] Session persistence: Session commands available until restart +- [ ] Permanent persistence: Saved commands available after restart + +--- + +## Security Considerations + +### 1. Hardcoded Blocklist is Final + +**NEVER** allow approval for hardcoded blocklist commands: +- `sudo`, `su`, `doas` +- `dd`, `mkfs`, `fdisk` +- `shutdown`, `reboot`, `halt` +- etc. + +These bypass approval entirely - immediate block. + +### 2. Org Blocklist Cannot Be Overridden + +If org config blocks a command, approval is not even requested. + +### 3. Dangerous Commands Require Extra Confirmation + +Commands like `aws`, `kubectl` should: +- Show emphatic warning +- Require typing "CONFIRM" (not just button click) +- Explain potential consequences + +### 4. Timeout is Critical + +Default 5-minute timeout prevents: +- Stale approval requests +- Forgotten dialogs +- Unattended approval accumulation + +### 5. Session vs Permanent + +**Session-only:** +- ✅ Safe for experimentation +- ✅ Doesn't persist across restarts +- ✅ Good for one-off commands + +**Permanent:** +- ⚠️ Saved to YAML forever +- ⚠️ Available to all future sessions +- ⚠️ User should understand impact + +### 6. 
File Locking is Essential + +Multiple agents or concurrent modifications require: +- Exclusive file locks (fcntl/msvcrt) +- Atomic read-modify-write +- Proper error handling + +Without locking → race conditions → corrupted YAML + +### 7. Audit Trail + +Consider logging all approval decisions: +``` +[2026-01-22 10:30:45] User approved 'swift' (session-only) +[2026-01-22 10:32:12] User approved 'xcodebuild' (permanent) +[2026-01-22 10:35:00] Approval timeout for 'wget' (denied) +``` + +--- + +## Future Enhancements + +Beyond Phase 3 scope, but possible extensions: + +### 1. Approval Profiles + +Pre-defined approval sets: +```yaml +profiles: + ios-dev: + - swift* + - xcodebuild + - xcrun + + rust-dev: + - cargo + - rustc + - clippy +``` + +User can activate profile with one click. + +### 2. Smart Recommendations + +Agent AI suggests commands to add based on: +- Project type detection (iOS, Rust, Python) +- Frequently denied commands +- Similar projects + +### 3. Approval History + +Show past approvals in UI: +- What was approved +- When +- Session vs permanent +- By which agent + +### 4. Bulk Approve/Deny + +When agent requests multiple commands: +- "Approve all for session" +- "Save all to config" +- "Deny all" + +### 5. Temporary Time-Based Approval + +"Allow for next 1 hour" option: +- Not session-only (survives restarts) +- Not permanent (expires) +- Good for contractors/temporary access + +### 6. Command Arguments Validation + +Phase 1 has placeholder, could be fully implemented: +```yaml +- name: rm + description: Remove files + args_whitelist: + - "-rf ./build/*" + - "-rf ./dist/*" +``` + +### 7. Remote Approval + +For team environments: +- Agent requests approval +- Notification sent to team lead +- Lead approves/denies remotely +- Agent proceeds based on decision + +--- + +## Questions for Implementer + +Before starting Phase 3, consider: + +1. **CLI vs UI priority?** + - Implement CLI first (simpler)? + - Or UI first (more users)? + +2. 
**Approval persistence format?** + - Separate log file for audit trail? + - Just YAML modifications? + +3. **Dangerous commands list?** + - Current list correct? + - Need org-specific dangerous commands? + +4. **Timeout default?** + - 5 minutes reasonable? + - Different for dangerous commands? + +5. **UI placement?** + - Top banner (blocks view)? + - Modal dialog (more prominent)? + - Sidebar notification? + +6. **Multiple agents?** + - How to attribute approvals? + - Show which agent requested? + +7. **Undo permanent approvals?** + - UI for removing saved commands? + - Or manual YAML editing only? + +--- + +## Success Criteria + +Phase 3 is complete when: + +- ✅ Agent can request approval for unknown commands +- ✅ CLI shows Rich TUI dialog with countdown +- ✅ UI shows React banner with buttons +- ✅ Session-only approval works (in-memory) +- ✅ Permanent approval persists to YAML +- ✅ Dangerous commands show emphatic warnings +- ✅ Timeout auto-denies after configured time +- ✅ Multiple requests can queue +- ✅ File locking prevents corruption +- ✅ All tests pass (unit + integration) +- ✅ Documentation updated +- ✅ Backward compatible (Phase 1/2 still work) + +--- + +## Estimated Timeline + +| Task | Time | Dependencies | +|------|------|--------------| +| ApprovalManager core | 2-3 hours | None | +| Modify bash_security_hook | 1 hour | ApprovalManager | +| File locking + persist | 1-2 hours | None | +| CLI approval TUI | 3-4 hours | ApprovalManager | +| React components | 4-5 hours | None | +| WebSocket integration | 2-3 hours | React components | +| Unit tests | 3-4 hours | All core features | +| Integration tests | 2-3 hours | Full implementation | +| Documentation | 2-3 hours | None | +| Manual testing + polish | 4-6 hours | Full implementation | + +**Total: 24-36 hours (3-4.5 days)** + +--- + +## Getting Started + +To implement Phase 3: + +1. **Read this document fully** +2. **Review Phase 1 & 2 code** (`security.py`, `client.py`) +3. 
**Run existing tests** to understand current behavior +4. **Start with ApprovalManager** (core functionality) +5. **Add file locking** (critical for safety) +6. **Choose CLI or UI** (whichever you're more comfortable with) +7. **Write tests as you go** (don't leave for end) +8. **Manual test frequently** (approval UX needs polish) + +Good luck! 🚀 + +--- + +**Document Version:** 1.0 +**Last Updated:** 2026-01-22 +**Author:** Phase 1 & 2 implementation team +**Status:** Ready for implementation diff --git a/client.py b/client.py index ef7dc34..173abce 100644 --- a/client.py +++ b/client.py @@ -261,6 +261,14 @@ def create_client( if "ANTHROPIC_BASE_URL" in sdk_env: print(f" - GLM Mode: Using {sdk_env['ANTHROPIC_BASE_URL']}") + # Create a wrapper for bash_security_hook that passes project_dir via context + async def bash_hook_with_context(input_data, tool_use_id=None, context=None): + """Wrapper that injects project_dir into context for security hook.""" + if context is None: + context = {} + context["project_dir"] = str(project_dir.resolve()) + return await bash_security_hook(input_data, tool_use_id, context) + return ClaudeSDKClient( options=ClaudeAgentOptions( model=model, @@ -272,7 +280,7 @@ def create_client( mcp_servers=mcp_servers, hooks={ "PreToolUse": [ - HookMatcher(matcher="Bash", hooks=[bash_security_hook]), + HookMatcher(matcher="Bash", hooks=[bash_hook_with_context]), ], }, max_turns=1000, diff --git a/examples/README.md b/examples/README.md new file mode 100644 index 0000000..b995f5e --- /dev/null +++ b/examples/README.md @@ -0,0 +1,531 @@ +# AutoCoder Security Configuration Examples + +This directory contains example configuration files for controlling which bash commands the autonomous coding agent can execute. 
+ +## Table of Contents + +- [Quick Start](#quick-start) +- [Project-Level Configuration](#project-level-configuration) +- [Organization-Level Configuration](#organization-level-configuration) +- [Command Hierarchy](#command-hierarchy) +- [Pattern Matching](#pattern-matching) +- [Common Use Cases](#common-use-cases) +- [Security Best Practices](#security-best-practices) + +--- + +## Quick Start + +### For a Single Project (Most Common) + +When you create a new project with AutoCoder, it automatically creates: + +``` +my-project/ + .autocoder/ + allowed_commands.yaml ← Automatically created from template +``` + +**Edit this file** to add project-specific commands (Swift tools, Rust compiler, etc.). + +### For All Projects (Organization-Wide) + +If you want commands available across **all projects**, manually create: + +```bash +# Copy the example to your home directory +cp examples/org_config.yaml ~/.autocoder/config.yaml + +# Edit it to add org-wide commands +nano ~/.autocoder/config.yaml +``` + +--- + +## Project-Level Configuration + +**File:** `{project_dir}/.autocoder/allowed_commands.yaml` + +**Purpose:** Define commands needed for THIS specific project. + +**Example** (iOS project): + +```yaml +version: 1 +commands: + - name: swift + description: Swift compiler + + - name: xcodebuild + description: Xcode build system + + - name: swift* + description: All Swift tools (swiftc, swiftlint, swiftformat) + + - name: ./scripts/build.sh + description: Project build script +``` + +**When to use:** +- ✅ Project uses a specific language toolchain (Swift, Rust, Go) +- ✅ Project has custom build scripts +- ✅ Temporary tools needed during development + +**Limits:** +- Maximum 50 commands per project +- Cannot override org-level blocked commands +- Cannot allow hardcoded blocklist commands (sudo, dd, etc.) + +**See:** `examples/project_allowed_commands.yaml` for full example with Rust, Python, iOS, etc. 
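The limits and pattern rules for a project config can be sketched in a few lines of Python. This is a minimal illustration, not the code in `security.py`: the names `validate_project_config`, `pattern_matches`, `MAX_COMMANDS`, and the three-entry `HARDCODED_BLOCKLIST` are assumptions made for the example, and the config is assumed to be already parsed into a dict (for instance with PyYAML's `yaml.safe_load`).

```python
import os

# Hypothetical names for illustration only; the real checks live in security.py.
MAX_COMMANDS = 50
HARDCODED_BLOCKLIST = {"sudo", "dd", "shutdown"}  # small subset, for the sketch


def validate_project_config(config):
    """Return a list of problems found in an already-parsed config dict."""
    if not isinstance(config, dict):
        return ["config must be a mapping"]
    problems = []
    if "version" not in config:
        problems.append("missing 'version'")
    commands = config.get("commands")
    if not isinstance(commands, list):
        return problems + ["'commands' must be a list"]
    if len(commands) > MAX_COMMANDS:
        problems.append(f"too many commands: {len(commands)} > {MAX_COMMANDS}")
    for entry in commands:
        name = entry.get("name") if isinstance(entry, dict) else None
        if not isinstance(name, str) or not name:
            problems.append(f"entry without a usable 'name': {entry!r}")
        elif name.rstrip("*") in HARDCODED_BLOCKLIST:
            # Covers both "sudo" and a wildcard such as "sudo*".
            problems.append(f"'{name}' is on the hardcoded blocklist")
    return problems


def pattern_matches(pattern, command):
    """Exact match, trailing-* prefix match, or script match by filename."""
    if pattern.endswith("*"):
        return command.startswith(pattern[:-1])
    if "/" in pattern:
        # Local scripts are compared by filename, regardless of directory.
        return os.path.basename(command) == os.path.basename(pattern)
    return command == pattern
```

Returning a list of problems rather than raising on the first one is a choice made for this sketch, so that a user can fix every issue in the file in one pass; the actual validator may behave differently.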
+ +--- + +## Organization-Level Configuration + +**File:** `~/.autocoder/config.yaml` + +**Purpose:** Define commands and policies for ALL projects. + +**Example** (startup team): + +```yaml +version: 1 + +# Available to all projects +allowed_commands: + - name: jq + description: JSON processor + + - name: python3 + description: Python interpreter + +# Blocked across all projects (cannot be overridden) +blocked_commands: + - aws + - kubectl + - terraform +``` + +**When to use:** +- ✅ Multiple projects need the same tools (jq, python3, etc.) +- ✅ Enforce organization-wide security policies +- ✅ Block dangerous commands across all projects + +**See:** `examples/org_config.yaml` for full example with enterprise/startup configurations. + +--- + +## Command Hierarchy + +When the agent tries to run a command, the system checks in this order: + +``` +┌─────────────────────────────────────────────────────┐ +│ 1. HARDCODED BLOCKLIST (highest priority) │ +│ sudo, dd, shutdown, reboot, chown, etc. │ +│ ❌ NEVER allowed, even with user approval │ +└─────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────┐ +│ 2. ORG BLOCKLIST (~/.autocoder/config.yaml) │ +│ Commands you block organization-wide │ +│ ❌ Projects CANNOT override these │ +└─────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────┐ +│ 3. ORG ALLOWLIST (~/.autocoder/config.yaml) │ +│ Commands available to all projects │ +│ ✅ Automatically available │ +└─────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────┐ +│ 4. GLOBAL ALLOWLIST (security.py) │ +│ Default commands: npm, git, curl, ls, cat, etc. │ +│ ✅ Always available │ +└─────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────┐ +│ 5. 
PROJECT ALLOWLIST (.autocoder/allowed_commands) │ +│ Project-specific commands │ +│ ✅ Available only to this project │ +└─────────────────────────────────────────────────────┘ +``` + +**Key Rules:** +- If a command is BLOCKED at any level above, it cannot be allowed below +- If a command is ALLOWED at any level, it's available (unless blocked above) +- Blocklist always wins over allowlist + +--- + +## Pattern Matching + +You can use patterns to match multiple commands: + +### Exact Match +```yaml +- name: swift + description: Swift compiler only +``` +Matches: `swift` +Does NOT match: `swiftc`, `swiftlint` + +### Prefix Wildcard +```yaml +- name: swift* + description: All Swift tools +``` +Matches: `swift`, `swiftc`, `swiftlint`, `swiftformat` +Does NOT match: `npm`, `rustc` + +### Local Scripts +```yaml +- name: ./scripts/build.sh + description: Build script +``` +Matches: +- `./scripts/build.sh` +- `scripts/build.sh` +- `/full/path/to/scripts/build.sh` +- Running `build.sh` from any directory (matched by filename) + +--- + +## Common Use Cases + +### iOS Development + +**Project config** (`.autocoder/allowed_commands.yaml`): +```yaml +version: 1 +commands: + - name: swift* + description: All Swift tools + - name: xcodebuild + description: Xcode build system + - name: xcrun + description: Xcode tools runner + - name: simctl + description: iOS Simulator control +``` + +### Rust CLI Project + +**Project config**: +```yaml +version: 1 +commands: + - name: cargo + description: Rust package manager + - name: rustc + description: Rust compiler + - name: rustfmt + description: Rust formatter + - name: clippy + description: Rust linter + - name: ./target/debug/my-cli + description: Debug build + - name: ./target/release/my-cli + description: Release build +``` + +### API Testing Project + +**Project config**: +```yaml +version: 1 +commands: + - name: jq + description: JSON processor + - name: httpie + description: HTTP client + - name: ./scripts/test-api.sh + description: 
API test runner +``` + +### Enterprise Organization (Restrictive) + +**Org config** (`~/.autocoder/config.yaml`): +```yaml +version: 1 + +allowed_commands: + - name: jq + description: JSON processor + +blocked_commands: + - aws # No cloud access + - gcloud + - az + - kubectl # No k8s access + - terraform # No infrastructure changes + - psql # No production DB access + - mysql +``` + +### Startup Team (Permissive) + +**Org config** (`~/.autocoder/config.yaml`): +```yaml +version: 1 + +allowed_commands: + - name: python3 + description: Python interpreter + - name: jq + description: JSON processor + - name: pytest + description: Python tests + +blocked_commands: [] # Rely on hardcoded blocklist only +``` + +--- + +## Security Best Practices + +### ✅ DO + +1. **Start restrictive, add as needed** + - Begin with default commands only + - Add project-specific tools when required + - Review the agent's blocked command errors to understand what's needed + +2. **Use org-level config for shared tools** + - If 3+ projects need `jq`, add it to org config + - Reduces duplication across project configs + +3. **Block dangerous commands at org level** + - Prevent accidental production deployments (`kubectl`, `terraform`) + - Block cloud CLIs if appropriate (`aws`, `gcloud`, `az`) + +4. **Use descriptive command names** + - Good: `description: "Swift compiler for iOS builds"` + - Bad: `description: "Compiler"` + +5. **Prefer patterns for tool families** + - `swift*` instead of listing `swift`, `swiftc`, `swiftlint` separately + - Automatically includes future tools (e.g., new Swift utilities) + +### ❌ DON'T + +1. **Don't add commands "just in case"** + - Only add when the agent actually needs them + - Empty config is fine - defaults are usually enough + +2. **Don't try to allow blocklisted commands** + - Commands like `sudo`, `dd`, `shutdown` can NEVER be allowed + - The system will reject these in validation + +3. 
**Don't use org config for project-specific tools** + - Bad: Adding `xcodebuild` to org config when only one project uses it + - Good: Add `xcodebuild` to that project's config + +4. **Don't exceed the 50 command limit per project** + - If you need more, you're probably being too specific + - Use wildcards instead: `npm-*` covers many npm tools + +5. **Don't ignore validation errors** + - If your YAML is rejected, fix the structure + - Common issues: missing `version`, malformed lists, over 50 commands + +--- + +## Default Allowed Commands + +These commands are **always available** to all projects: + +**File Operations:** +- `ls`, `cat`, `head`, `tail`, `wc`, `grep`, `cp`, `mkdir`, `mv`, `rm`, `touch` + +**Shell:** +- `pwd`, `echo`, `sh`, `bash`, `sleep` + +**Version Control:** +- `git` + +**Process Management:** +- `ps`, `lsof`, `kill`, `pkill` (dev processes only: node, npm, vite) + +**Network:** +- `curl` + +**Node.js:** +- `npm`, `npx`, `pnpm`, `node` + +**Docker:** +- `docker` + +**Special:** +- `chmod` (only `+x` mode for making scripts executable) + +--- + +## Hardcoded Blocklist + +These commands are **NEVER allowed**, even with user approval: + +**Disk Operations:** +- `dd`, `mkfs`, `fdisk`, `parted` + +**System Control:** +- `shutdown`, `reboot`, `poweroff`, `halt`, `init` + +**Privilege Escalation:** +- `sudo`, `su`, `doas` + +**System Services:** +- `systemctl`, `service`, `launchctl` + +**Network Security:** +- `iptables`, `ufw` + +**Ownership Changes:** +- `chown`, `chgrp` + +**Dangerous Commands** (Phase 3 will add approval): +- `aws`, `gcloud`, `az`, `kubectl`, `docker-compose` + +--- + +## Troubleshooting + +### Error: "Command 'X' is not allowed" + +**Solution:** Add the command to your project config: +```yaml +# In .autocoder/allowed_commands.yaml +commands: + - name: X + description: What this command does +``` + +### Error: "Command 'X' is blocked at organization level" + +**Cause:** The command is in the org blocklist or hardcoded blocklist. 
+ +**Solution:** +- If in org blocklist: Edit `~/.autocoder/config.yaml` to remove it +- If in hardcoded blocklist: Cannot be allowed (by design) + +### Error: "Could not parse YAML config" + +**Cause:** YAML syntax error. + +**Solution:** Check for: +- Missing colons after keys +- Incorrect indentation (use 2 spaces, not tabs) +- Missing quotes around special characters + +### Config not taking effect + +**Solution:** +1. Restart the agent (changes are loaded on startup) +2. Verify file location: + - Project: `{project}/.autocoder/allowed_commands.yaml` + - Org: `~/.autocoder/config.yaml` (must be manually created) +3. Check YAML is valid (run through a YAML validator) + +--- + +## Testing + +### Running the Tests + +AutoCoder has comprehensive tests for the security system: + +**Unit Tests** (136 tests - fast): +```bash +source venv/bin/activate +python test_security.py +``` + +Tests: +- Pattern matching (exact, wildcards, scripts) +- YAML loading and validation +- Blocklist enforcement +- Project and org config hierarchy +- All existing security validations + +**Integration Tests** (9 tests - uses real security hooks): +```bash +source venv/bin/activate +python test_security_integration.py +``` + +Tests: +- Blocked commands are rejected (sudo, shutdown, etc.) +- Default commands work (ls, git, npm, etc.) +- Non-allowed commands are blocked (wget, python, etc.) +- Project config allows commands (swift, xcodebuild, etc.) +- Pattern matching works (swift* matches swiftlint) +- Org blocklist cannot be overridden +- Org allowlist is inherited by projects +- Invalid YAML is safely ignored +- 50 command limit is enforced + +### Manual Testing + +To manually test the security system: + +**1. Create a test project:** +```bash +python start.py +# Choose "Create new project" +# Name it "security-test" +``` + +**2. 
Edit the project config:** +```bash +# Navigate to the project directory +cd path/to/security-test + +# Edit the config +nano .autocoder/allowed_commands.yaml +``` + +**3. Add a test command (e.g., Swift):** +```yaml +version: 1 +commands: + - name: swift + description: Swift compiler +``` + +**4. Run the agent and observe:** +- Try a blocked command: `"Run sudo apt install nginx"` → Should be blocked +- Try an allowed command: `"Run ls -la"` → Should work +- Try your config command: `"Run swift --version"` → Should work +- Try a non-allowed command: `"Run wget https://example.com"` → Should be blocked + +**5. Check the agent output:** + +The agent will show security hook messages like: +``` +Command 'sudo' is blocked at organization level and cannot be approved. +``` + +Or: +``` +Command 'wget' is not allowed. +To allow this command: + 1. Add to .autocoder/allowed_commands.yaml for this project, OR + 2. Request mid-session approval (the agent can ask) +``` + +--- + +## Files Reference + +- **`examples/project_allowed_commands.yaml`** - Full project config template +- **`examples/org_config.yaml`** - Full org config template +- **`security.py`** - Implementation and hardcoded blocklist +- **`test_security.py`** - Unit tests (136 tests) +- **`test_security_integration.py`** - Integration tests (9 tests) +- **`CLAUDE.md`** - Full system documentation + +--- + +## Questions? + +See the main documentation in `CLAUDE.md` for architecture details and implementation specifics. diff --git a/examples/org_config.yaml b/examples/org_config.yaml new file mode 100644 index 0000000..f86d9f3 --- /dev/null +++ b/examples/org_config.yaml @@ -0,0 +1,172 @@ +# Organization-Level AutoCoder Configuration +# ============================================ +# Location: ~/.autocoder/config.yaml +# +# IMPORTANT: This file is OPTIONAL and must be manually created by you. +# It does NOT exist by default. +# +# Org-level config applies to ALL projects and provides: +# 1. 
Organization-wide allowed commands (available to all projects) +# 2. Organization-wide blocked commands (cannot be overridden by projects) +# 3. Global settings (approval timeout, etc.) +# +# Use this to: +# - Add commands that ALL your projects need (jq, python3, etc.) +# - Block dangerous commands across ALL projects (aws, kubectl, etc.) +# - Enforce organization-wide security policies + +version: 1 + + +# ========================================== +# Organization-Wide Allowed Commands +# ========================================== +# These commands become available to ALL projects automatically. +# Projects don't need to add them to their own .autocoder/allowed_commands.yaml +# +# By default, this is empty. Uncomment and add commands as needed. + +allowed_commands: [] + + # Common development utilities + # - name: jq + # description: JSON processor for API responses + + # - name: python3 + # description: Python 3 interpreter + + # - name: pip3 + # description: Python package installer + + # - name: pytest + # description: Python testing framework + + # - name: black + # description: Python code formatter + + # Database CLIs (if safe in your environment) + # - name: psql + # description: PostgreSQL client + + # - name: mysql + # description: MySQL client + + +# ========================================== +# Organization-Wide Blocked Commands +# ========================================== +# Commands listed here are BLOCKED across ALL projects. +# Projects CANNOT override these blocks - this is the final word. +# +# Use this to enforce security policies, such as: +# - Preventing accidental production deployments +# - Blocking cloud CLI tools to avoid infrastructure changes +# - Preventing access to production databases +# +# By default, this is empty. Uncomment commands you want to block. 
+ +blocked_commands: [] + + # Block cloud CLIs to prevent accidental production changes + # - aws + # - gcloud + # - az + + # Block container orchestration to prevent production deployments + # - kubectl + # - docker-compose + + # Block infrastructure-as-code tools + # - terraform + # - pulumi + + # Block database CLIs to prevent production data access + # - psql + # - mysql + # - mongosh + + # Block other potentially dangerous tools + # - ansible + # - chef + # - puppet + + +# ========================================== +# Global Settings (Phase 3 feature) +# ========================================== +# These settings control approval behavior when agents request +# commands that aren't in the allowlist. + +# How long to wait for user approval before denying a command request +approval_timeout_minutes: 5 + + +# ========================================== +# Command Hierarchy (for reference) +# ========================================== +# When the agent tries to run a bash command, the system checks in this order: +# +# 1. Hardcoded Blocklist (in security.py) - HIGHEST PRIORITY +# Commands like: sudo, dd, shutdown, reboot, etc. +# These can NEVER be allowed, even with user approval. +# +# 2. Org Blocked Commands (this file) +# Commands you specify in "blocked_commands:" above. +# Projects cannot override these. +# +# 3. Org Allowed Commands (this file) +# Commands you specify in "allowed_commands:" above. +# Available to all projects automatically. +# +# 4. Global Allowed Commands (in security.py) +# Default commands: npm, git, curl, ls, cat, etc. +# Always available to all projects. +# +# 5. Project Allowed Commands (.autocoder/allowed_commands.yaml) +# Project-specific commands defined in each project. +# LOWEST PRIORITY (can't override blocks above). +# +# If a command is in BOTH allowed and blocked lists, BLOCKED wins. 
+ + +# ========================================== +# Example Configurations by Organization Type +# ========================================== + +# Startup / Small Team (permissive): +# allowed_commands: +# - name: python3 +# - name: jq +# blocked_commands: [] # Empty - rely on hardcoded blocklist only + +# Enterprise / Regulated (restrictive): +# allowed_commands: [] # Empty - projects must explicitly request each tool +# blocked_commands: +# - aws +# - gcloud +# - az +# - kubectl +# - terraform +# - psql +# - mysql +# - mongosh + +# Development Team (balanced): +# allowed_commands: +# - name: jq +# - name: python3 +# - name: pytest +# blocked_commands: +# - aws # Block production access +# - kubectl # Block deployments +# - terraform + + +# ========================================== +# To Create This File +# ========================================== +# 1. Copy this example to: ~/.autocoder/config.yaml +# 2. Uncomment and customize the sections you need +# 3. Leave empty lists if you don't need org-level controls +# +# To learn more, see: examples/README.md diff --git a/examples/project_allowed_commands.yaml b/examples/project_allowed_commands.yaml new file mode 100644 index 0000000..48b8bd6 --- /dev/null +++ b/examples/project_allowed_commands.yaml @@ -0,0 +1,139 @@ +# Project-Specific Allowed Commands +# ================================== +# Location: {project_dir}/.autocoder/allowed_commands.yaml +# +# This file defines bash commands that the autonomous coding agent can use +# for THIS SPECIFIC PROJECT, beyond the default allowed commands. +# +# When you create a new project, AutoCoder automatically creates this file +# in your project's .autocoder/ directory. You can customize it for your +# project's specific needs (iOS, Rust, Python, etc.). + +version: 1 + +# Uncomment the commands you need for your specific project. +# By default, this file has NO commands enabled - you must explicitly add them. 
+ +commands: [] + + # ========================================== + # iOS Development Example + # ========================================== + # Uncomment these if building an iOS app: + + # - name: xcodebuild + # description: Xcode build system for compiling iOS apps + + # - name: swift + # description: Swift compiler and REPL + + # - name: swiftc + # description: Swift compiler command-line interface + + # - name: xcrun + # description: Run Xcode developer tools + + # - name: simctl + # description: iOS Simulator control tool + + # Pattern matching with wildcard + # This matches: swift, swiftc, swiftformat, swiftlint, etc. + # - name: swift* + # description: All Swift development tools + + + # ========================================== + # Rust Development Example + # ========================================== + # Uncomment these if building a Rust project: + + # - name: cargo + # description: Rust package manager and build tool + + # - name: rustc + # description: Rust compiler + + # - name: rustfmt + # description: Rust code formatter + + # - name: clippy + # description: Rust linter + + + # ========================================== + # Python Development Example + # ========================================== + # Uncomment these if building a Python project: + + # - name: python3 + # description: Python 3 interpreter + + # - name: pip3 + # description: Python package installer + + # - name: pytest + # description: Python testing framework + + + # ========================================== + # Database Tools Example + # ========================================== + # Uncomment these if you need database access: + + # - name: psql + # description: PostgreSQL command-line client + + # - name: sqlite3 + # description: SQLite database CLI + + + # ========================================== + # Project-Specific Scripts + # ========================================== + # Local scripts are matched by filename, so these work from any directory + # Uncomment and customize 
for your project: + + # - name: ./scripts/build.sh + # description: Project build script + + # - name: ./scripts/test.sh + # description: Run all project tests + + # - name: ./scripts/deploy-staging.sh + # description: Deploy to staging environment + + +# ========================================== +# Notes and Best Practices +# ========================================== +# +# Pattern Matching: +# - Exact: "swift" matches only "swift" +# - Wildcard: "swift*" matches "swift", "swiftc", "swiftlint", etc. +# - Scripts: "./scripts/build.sh" matches the script by name +# +# Limits: +# - Maximum 50 commands per project +# - Commands in the blocklist (sudo, dd, shutdown, etc.) can NEVER be allowed +# - Org-level blocked commands (see ~/.autocoder/config.yaml) cannot be overridden +# +# Default Allowed Commands (always available): +# File operations: ls, cat, head, tail, wc, grep, cp, mkdir, mv, rm, touch +# Shell: pwd, echo, sh, bash, sleep +# Version control: git +# Process management: ps, lsof, kill, pkill (dev processes only) +# Network: curl +# Node.js: npm, npx, pnpm, node +# Docker: docker +# chmod: Only +x mode (making scripts executable) +# +# Hardcoded Blocklist (NEVER allowed): +# Disk operations: dd, mkfs, fdisk, parted +# System control: shutdown, reboot, poweroff, halt, init +# Privilege escalation: sudo, su, doas +# System services: systemctl, service, launchctl +# Network security: iptables, ufw +# Ownership changes: chown, chgrp +# Cloud and orchestration CLIs: aws, gcloud, az, kubectl, docker-compose (an org allowlist cannot enable these) +# +# To learn more, see: examples/README.md diff --git a/prompts.py b/prompts.py index ad76ff0..b03bd47 100644 --- a/prompts.py +++ b/prompts.py @@ -180,6 +180,10 @@ def scaffold_project_prompts(project_dir: Path) -> Path: project_prompts = get_project_prompts_dir(project_dir) project_prompts.mkdir(parents=True, exist_ok=True) + # Create .autocoder directory for configuration files + autocoder_dir = project_dir / ".autocoder" + autocoder_dir.mkdir(parents=True,
exist_ok=True) + # Define template mappings: (source_template, destination_name) templates = [ ("app_spec.template.txt", "app_spec.txt"), @@ -201,8 +205,19 @@ def scaffold_project_prompts(project_dir: Path) -> Path: except (OSError, PermissionError) as e: print(f" Warning: Could not copy {dest_name}: {e}") + # Copy allowed_commands.yaml template to .autocoder/ + examples_dir = Path(__file__).parent / "examples" + allowed_commands_template = examples_dir / "project_allowed_commands.yaml" + allowed_commands_dest = autocoder_dir / "allowed_commands.yaml" + if allowed_commands_template.exists() and not allowed_commands_dest.exists(): + try: + shutil.copy(allowed_commands_template, allowed_commands_dest) + copied_files.append(".autocoder/allowed_commands.yaml") + except (OSError, PermissionError) as e: + print(f" Warning: Could not copy allowed_commands.yaml: {e}") + if copied_files: - print(f" Created prompt files: {', '.join(copied_files)}") + print(f" Created project files: {', '.join(copied_files)}") return project_prompts diff --git a/requirements.txt b/requirements.txt index 6e32cdb..9cf420e 100644 --- a/requirements.txt +++ b/requirements.txt @@ -9,6 +9,7 @@ psutil>=6.0.0 aiofiles>=24.0.0 apscheduler>=3.10.0,<4.0.0 pywinpty>=2.0.0; sys_platform == "win32" +pyyaml>=6.0.0 # Dev dependencies ruff>=0.8.0 diff --git a/security.py b/security.py index 4e03117..0b8fd57 100644 --- a/security.py +++ b/security.py @@ -8,6 +8,10 @@ Uses an allowlist approach - only explicitly permitted commands can run. 
import os import shlex +from pathlib import Path +from typing import Optional + +import yaml # Allowed commands for development tasks # Minimal set needed for the autonomous coding demo @@ -58,6 +62,48 @@ ALLOWED_COMMANDS = { # Commands that need additional validation even when in the allowlist COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh"} +# Commands that are NEVER allowed, even with user approval +# These commands can cause permanent system damage or security breaches +BLOCKED_COMMANDS = { + # Disk operations + "dd", + "mkfs", + "fdisk", + "parted", + # System control + "shutdown", + "reboot", + "poweroff", + "halt", + "init", + # Ownership changes + "chown", + "chgrp", + # System services + "systemctl", + "service", + "launchctl", + # Network security + "iptables", + "ufw", +} + +# Commands that trigger emphatic warnings but CAN be approved (Phase 3) +# For now, these are blocked like BLOCKED_COMMANDS until Phase 3 implements approval +DANGEROUS_COMMANDS = { + # Privilege escalation + "sudo", + "su", + "doas", + # Cloud CLIs (can modify production infrastructure) + "aws", + "gcloud", + "az", + # Container and orchestration + "kubectl", + "docker-compose", +} + def split_command_segments(command_string: str) -> list[str]: """ @@ -309,16 +355,298 @@ def get_command_for_validation(cmd: str, segments: list[str]) -> str: return "" +def matches_pattern(command: str, pattern: str) -> bool: + """ + Check if a command matches a pattern. 
+ + Supports: + - Exact match: "swift" + - Prefix wildcard: "swift*" matches "swift", "swiftc", "swiftformat" + - Local script paths: "./scripts/build.sh" (pattern must start with "./" or "../") + + Args: + command: The command to check + pattern: The pattern to match against + + Returns: + True if command matches pattern + """ + # Exact match + if command == pattern: + return True + + # Prefix wildcard (e.g., "swift*" matches "swiftc"); a bare "*" matches nothing + if pattern.endswith("*"): + prefix = pattern[:-1] + return bool(prefix) and command.startswith(prefix) + + # Local script paths (./scripts/build.sh matches build.sh) + if pattern.startswith("./") or pattern.startswith("../"): + # Extract the script name from the pattern + pattern_name = os.path.basename(pattern) + return command == pattern or command == pattern_name or command.endswith("/" + pattern_name) + + return False + + +def get_org_config_path() -> Path: + """ + Get the organization-level config file path. + + Returns: + Path to ~/.autocoder/config.yaml + """ + return Path.home() / ".autocoder" / "config.yaml" + + +def load_org_config() -> Optional[dict]: + """ + Load organization-level config from ~/.autocoder/config.yaml.
+ + Returns: + Dict with parsed org config, or None if file doesn't exist or is invalid + """ + config_path = get_org_config_path() + + if not config_path.exists(): + return None + + try: + with open(config_path, "r", encoding="utf-8") as f: + config = yaml.safe_load(f) + + if not config: + return None + + # Validate structure + if not isinstance(config, dict): + return None + + if "version" not in config: + return None + + # Validate allowed_commands if present + if "allowed_commands" in config: + allowed = config["allowed_commands"] + if not isinstance(allowed, list): + return None + for cmd in allowed: + if not isinstance(cmd, dict): + return None + if "name" not in cmd: + return None + + # Validate blocked_commands if present + if "blocked_commands" in config: + blocked = config["blocked_commands"] + if not isinstance(blocked, list): + return None + for cmd in blocked: + if not isinstance(cmd, str): + return None + + return config + + except (yaml.YAMLError, IOError, OSError): + return None + + +def load_project_commands(project_dir: Path) -> Optional[dict]: + """ + Load allowed commands from project-specific YAML config. 
+ + Args: + project_dir: Path to the project directory + + Returns: + Dict with parsed YAML config, or None if file doesn't exist or is invalid + """ + config_path = project_dir / ".autocoder" / "allowed_commands.yaml" + + if not config_path.exists(): + return None + + try: + with open(config_path, "r", encoding="utf-8") as f: + config = yaml.safe_load(f) + + if not config: + return None + + # Validate structure + if not isinstance(config, dict): + return None + + if "version" not in config: + return None + + commands = config.get("commands", []) + if not isinstance(commands, list): + return None + + # Enforce 50 command limit + if len(commands) > 50: + return None + + # Validate each command entry + for cmd in commands: + if not isinstance(cmd, dict): + return None + if "name" not in cmd: + return None + # Validate name is a string + if not isinstance(cmd["name"], str): + return None + + return config + + except (yaml.YAMLError, IOError, OSError): + return None + + +def validate_project_command(cmd_config: dict) -> tuple[bool, str]: + """ + Validate a single command entry from project config. 
+ + Args: + cmd_config: Dict with command configuration (name, description, args) + + Returns: + Tuple of (is_valid, error_message) + """ + if not isinstance(cmd_config, dict): + return False, "Command must be a dict" + + if "name" not in cmd_config: + return False, "Command must have 'name' field" + + name = cmd_config["name"] + if not isinstance(name, str) or not name: + return False, "Command name must be a non-empty string" + + # Check if command is in the blocklist or dangerous commands + base_cmd = os.path.basename(name.rstrip("*")) + if base_cmd in BLOCKED_COMMANDS: + return False, f"Command '{name}' is in the blocklist and cannot be allowed" + if base_cmd in DANGEROUS_COMMANDS: + return False, f"Command '{name}' is in the blocklist and cannot be allowed" + + # Description is optional + if "description" in cmd_config and not isinstance(cmd_config["description"], str): + return False, "Description must be a string" + + # Args validation (Phase 1 - just check structure) + if "args" in cmd_config: + args = cmd_config["args"] + if not isinstance(args, list): + return False, "Args must be a list" + for arg in args: + if not isinstance(arg, str): + return False, "Each arg must be a string" + + return True, "" + + +def get_effective_commands(project_dir: Optional[Path]) -> tuple[set[str], set[str]]: + """ + Get effective allowed and blocked commands after hierarchy resolution. + + Hierarchy (highest to lowest priority): + 1. BLOCKED_COMMANDS (hardcoded) - always blocked + 2. Org blocked_commands - cannot be unblocked + 3. Org allowed_commands - adds to global + 4. 
Project allowed_commands - adds to global + org + + Args: + project_dir: Path to the project directory, or None + + Returns: + Tuple of (allowed_commands, blocked_commands) + """ + # Start with global allowed commands + allowed = ALLOWED_COMMANDS.copy() + blocked = BLOCKED_COMMANDS.copy() + + # Add dangerous commands to blocked (Phase 3 will add approval flow) + blocked |= DANGEROUS_COMMANDS + + # Load org config and apply + org_config = load_org_config() + if org_config: + # Add org-level blocked commands (cannot be overridden) + org_blocked = org_config.get("blocked_commands", []) + blocked |= set(org_blocked) + + # Add org-level allowed commands + for cmd_config in org_config.get("allowed_commands", []): + if isinstance(cmd_config, dict) and "name" in cmd_config: + allowed.add(cmd_config["name"]) + + # Load project config and apply + if project_dir: + project_config = load_project_commands(project_dir) + if project_config: + # Add project-specific commands + for cmd_config in project_config.get("commands", []): + valid, error = validate_project_command(cmd_config) + if valid: + allowed.add(cmd_config["name"]) + + # Remove blocked commands from allowed (blocklist takes precedence) + allowed -= blocked + + return allowed, blocked + + +def get_project_allowed_commands(project_dir: Optional[Path]) -> set[str]: + """ + Get the set of allowed commands for a project. + + Uses hierarchy resolution from get_effective_commands(). + + Args: + project_dir: Path to the project directory, or None + + Returns: + Set of allowed command names (including patterns) + """ + allowed, blocked = get_effective_commands(project_dir) + return allowed + + +def is_command_allowed(command: str, allowed_commands: set[str]) -> bool: + """ + Check if a command is allowed (supports patterns). 
+ + Args: + command: The command to check + allowed_commands: Set of allowed commands (may include patterns) + + Returns: + True if command is allowed + """ + # Check exact match first + if command in allowed_commands: + return True + + # Check pattern matches + for pattern in allowed_commands: + if matches_pattern(command, pattern): + return True + + return False + + async def bash_security_hook(input_data, tool_use_id=None, context=None): """ Pre-tool-use hook that validates bash commands using an allowlist. - Only commands in ALLOWED_COMMANDS are permitted. + Only commands in ALLOWED_COMMANDS plus org- and project-level allowlists are permitted. Args: input_data: Dict containing tool_name and tool_input tool_use_id: Optional tool use ID - context: Optional context + context: Optional context dict with 'project_dir' key Returns: Empty dict to allow, or {"decision": "block", "reason": "..."} to block @@ -340,15 +668,39 @@ async def bash_security_hook(input_data, tool_use_id=None, context=None): "reason": f"Could not parse command for security validation: {command}", } + # Get project directory from context + project_dir = None + if context and isinstance(context, dict): + project_dir_str = context.get("project_dir") + if project_dir_str: + project_dir = Path(project_dir_str) + + # Get effective commands using hierarchy resolution + allowed_commands, blocked_commands = get_effective_commands(project_dir) + # Split into segments for per-command validation segments = split_command_segments(command) - # Check each command against the allowlist + # Check each command against the blocklist and allowlist for cmd in commands: - if cmd not in ALLOWED_COMMANDS: + # Check blocklist first (highest priority) + if cmd in blocked_commands: return { "decision": "block", - "reason": f"Command '{cmd}' is not in the allowed commands list", + "reason": f"Command '{cmd}' is blocked by the hardcoded or organization-level blocklist and cannot be approved.", + } + + # Check allowlist (with pattern matching) + if not
is_command_allowed(cmd, allowed_commands): + # Provide helpful error message with config hint + error_msg = f"Command '{cmd}' is not allowed.\n" + error_msg += "To allow this command:\n" + error_msg += " 1. Add to .autocoder/allowed_commands.yaml for this project, OR\n" + error_msg += " 2. Request mid-session approval (the agent can ask)\n" + error_msg += "Note: Some commands are blocked at org-level and cannot be overridden." + return { + "decision": "block", + "reason": error_msg, } # Additional validation for sensitive commands diff --git a/test_security.py b/test_security.py index 6788a6d..27a1521 100644 --- a/test_security.py +++ b/test_security.py @@ -9,12 +9,19 @@ Run with: python test_security.py import asyncio import sys +import tempfile +from pathlib import Path from security import ( bash_security_hook, extract_commands, + get_effective_commands, + load_org_config, + load_project_commands, + matches_pattern, validate_chmod_command, validate_init_script, + validate_project_command, ) @@ -151,6 +158,440 @@ def test_validate_init_script(): return passed, failed +def test_pattern_matching(): + """Test command pattern matching.""" + print("\nTesting pattern matching:\n") + passed = 0 + failed = 0 + + # Test cases: (command, pattern, should_match, description) + test_cases = [ + # Exact matches + ("swift", "swift", True, "exact match"), + ("npm", "npm", True, "exact npm"), + ("xcodebuild", "xcodebuild", True, "exact xcodebuild"), + + # Prefix wildcards + ("swiftc", "swift*", True, "swiftc matches swift*"), + ("swiftlint", "swift*", True, "swiftlint matches swift*"), + ("swiftformat", "swift*", True, "swiftformat matches swift*"), + ("swift", "swift*", True, "swift matches swift*"), + ("npm", "swift*", False, "npm doesn't match swift*"), + + # Local script paths + ("build.sh", "./scripts/build.sh", True, "script name matches path"), + ("./scripts/build.sh", "./scripts/build.sh", True, "exact script path"), + ("scripts/build.sh", "./scripts/build.sh", True, 
"relative script path"), + ("/abs/path/scripts/build.sh", "./scripts/build.sh", True, "absolute path matches"), + ("test.sh", "./scripts/build.sh", False, "different script name"), + + # Non-matches + ("go", "swift*", False, "go doesn't match swift*"), + ("rustc", "swift*", False, "rustc doesn't match swift*"), + ] + + for command, pattern, should_match, description in test_cases: + result = matches_pattern(command, pattern) + if result == should_match: + print(f" PASS: {command!r} vs {pattern!r} ({description})") + passed += 1 + else: + expected = "match" if should_match else "no match" + actual = "match" if result else "no match" + print(f" FAIL: {command!r} vs {pattern!r} ({description})") + print(f" Expected: {expected}, Got: {actual}") + failed += 1 + + return passed, failed + + +def test_yaml_loading(): + """Test YAML config loading and validation.""" + print("\nTesting YAML loading:\n") + passed = 0 + failed = 0 + + with tempfile.TemporaryDirectory() as tmpdir: + project_dir = Path(tmpdir) + autocoder_dir = project_dir / ".autocoder" + autocoder_dir.mkdir() + + # Test 1: Valid YAML + config_path = autocoder_dir / "allowed_commands.yaml" + config_path.write_text("""version: 1 +commands: + - name: swift + description: Swift compiler + - name: xcodebuild + description: Xcode build + - name: swift* + description: All Swift tools +""") + config = load_project_commands(project_dir) + if config and config["version"] == 1 and len(config["commands"]) == 3: + print(" PASS: Load valid YAML") + passed += 1 + else: + print(" FAIL: Load valid YAML") + print(f" Got: {config}") + failed += 1 + + # Test 2: Missing file returns None + (project_dir / ".autocoder" / "allowed_commands.yaml").unlink() + config = load_project_commands(project_dir) + if config is None: + print(" PASS: Missing file returns None") + passed += 1 + else: + print(" FAIL: Missing file returns None") + print(f" Got: {config}") + failed += 1 + + # Test 3: Invalid YAML returns None + 
config_path.write_text("invalid: yaml: content:") + config = load_project_commands(project_dir) + if config is None: + print(" PASS: Invalid YAML returns None") + passed += 1 + else: + print(" FAIL: Invalid YAML returns None") + print(f" Got: {config}") + failed += 1 + + # Test 4: Over limit (50 commands) + commands = [f" - name: cmd{i}\n description: Command {i}" for i in range(51)] + config_path.write_text("version: 1\ncommands:\n" + "\n".join(commands)) + config = load_project_commands(project_dir) + if config is None: + print(" PASS: Over limit rejected") + passed += 1 + else: + print(" FAIL: Over limit rejected") + print(f" Got: {config}") + failed += 1 + + return passed, failed + + +def test_command_validation(): + """Test project command validation.""" + print("\nTesting command validation:\n") + passed = 0 + failed = 0 + + # Test cases: (cmd_config, should_be_valid, description) + test_cases = [ + # Valid commands + ({"name": "swift", "description": "Swift compiler"}, True, "valid command"), + ({"name": "swift"}, True, "command without description"), + ({"name": "swift*", "description": "All Swift tools"}, True, "pattern command"), + ({"name": "./scripts/build.sh", "description": "Build script"}, True, "local script"), + + # Invalid commands + ({}, False, "missing name"), + ({"description": "No name"}, False, "missing name field"), + ({"name": ""}, False, "empty name"), + ({"name": 123}, False, "non-string name"), + + # Blocklisted commands + ({"name": "sudo"}, False, "blocklisted sudo"), + ({"name": "shutdown"}, False, "blocklisted shutdown"), + ({"name": "dd"}, False, "blocklisted dd"), + ] + + for cmd_config, should_be_valid, description in test_cases: + valid, error = validate_project_command(cmd_config) + if valid == should_be_valid: + print(f" PASS: {description}") + passed += 1 + else: + expected = "valid" if should_be_valid else "invalid" + actual = "valid" if valid else "invalid" + print(f" FAIL: {description}") + print(f" Expected: {expected}, 
Got: {actual}") + if error: + print(f" Error: {error}") + failed += 1 + + return passed, failed + + +def test_blocklist_enforcement(): + """Test blocklist enforcement in security hook.""" + print("\nTesting blocklist enforcement:\n") + passed = 0 + failed = 0 + + # All blocklisted commands should be rejected + for cmd in ["sudo apt install", "shutdown now", "dd if=/dev/zero", "aws s3 ls"]: + input_data = {"tool_name": "Bash", "tool_input": {"command": cmd}} + result = asyncio.run(bash_security_hook(input_data)) + if result.get("decision") == "block": + print(f" PASS: Blocked {cmd.split()[0]}") + passed += 1 + else: + print(f" FAIL: Should block {cmd.split()[0]}") + failed += 1 + + return passed, failed + + +def test_project_commands(): + """Test project-specific commands in security hook.""" + print("\nTesting project-specific commands:\n") + passed = 0 + failed = 0 + + with tempfile.TemporaryDirectory() as tmpdir: + project_dir = Path(tmpdir) + autocoder_dir = project_dir / ".autocoder" + autocoder_dir.mkdir() + + # Create a config with Swift commands + config_path = autocoder_dir / "allowed_commands.yaml" + config_path.write_text("""version: 1 +commands: + - name: swift + description: Swift compiler + - name: xcodebuild + description: Xcode build + - name: swift* + description: All Swift tools +""") + + # Test 1: Project command should be allowed + input_data = {"tool_name": "Bash", "tool_input": {"command": "swift --version"}} + context = {"project_dir": str(project_dir)} + result = asyncio.run(bash_security_hook(input_data, context=context)) + if result.get("decision") != "block": + print(" PASS: Project command 'swift' allowed") + passed += 1 + else: + print(" FAIL: Project command 'swift' should be allowed") + print(f" Reason: {result.get('reason')}") + failed += 1 + + # Test 2: Pattern match should work + input_data = {"tool_name": "Bash", "tool_input": {"command": "swiftlint"}} + result = asyncio.run(bash_security_hook(input_data, context=context)) + if 
result.get("decision") != "block": + print(" PASS: Pattern 'swift*' matches 'swiftlint'") + passed += 1 + else: + print(" FAIL: Pattern 'swift*' should match 'swiftlint'") + print(f" Reason: {result.get('reason')}") + failed += 1 + + # Test 3: Non-allowed command should be blocked + input_data = {"tool_name": "Bash", "tool_input": {"command": "rustc"}} + result = asyncio.run(bash_security_hook(input_data, context=context)) + if result.get("decision") == "block": + print(" PASS: Non-allowed command 'rustc' blocked") + passed += 1 + else: + print(" FAIL: Non-allowed command 'rustc' should be blocked") + failed += 1 + + return passed, failed + + +def test_org_config_loading(): + """Test organization-level config loading.""" + print("\nTesting org config loading:\n") + passed = 0 + failed = 0 + + # Save original org config path + original_home = Path.home() + + with tempfile.TemporaryDirectory() as tmpdir: + # Temporarily override home directory for testing + import os + os.environ["HOME"] = tmpdir + + org_dir = Path(tmpdir) / ".autocoder" + org_dir.mkdir() + org_config_path = org_dir / "config.yaml" + + # Test 1: Valid org config + org_config_path.write_text("""version: 1 +allowed_commands: + - name: jq + description: JSON processor +blocked_commands: + - aws + - kubectl +""") + config = load_org_config() + if config and config["version"] == 1: + if len(config["allowed_commands"]) == 1 and len(config["blocked_commands"]) == 2: + print(" PASS: Load valid org config") + passed += 1 + else: + print(" FAIL: Load valid org config (wrong counts)") + failed += 1 + else: + print(" FAIL: Load valid org config") + print(f" Got: {config}") + failed += 1 + + # Test 2: Missing file returns None + org_config_path.unlink() + config = load_org_config() + if config is None: + print(" PASS: Missing org config returns None") + passed += 1 + else: + print(" FAIL: Missing org config returns None") + failed += 1 + + # Restore HOME + os.environ["HOME"] = str(original_home) + + return 
passed, failed + + +def test_hierarchy_resolution(): + """Test command hierarchy resolution.""" + print("\nTesting hierarchy resolution:\n") + passed = 0 + failed = 0 + + with tempfile.TemporaryDirectory() as tmphome: + with tempfile.TemporaryDirectory() as tmpproject: + # Setup fake home directory + import os + original_home = os.environ.get("HOME") + os.environ["HOME"] = tmphome + + org_dir = Path(tmphome) / ".autocoder" + org_dir.mkdir() + org_config_path = org_dir / "config.yaml" + + # Create org config with allowed and blocked commands + org_config_path.write_text("""version: 1 +allowed_commands: + - name: jq + description: JSON processor + - name: python3 + description: Python interpreter +blocked_commands: + - terraform + - kubectl +""") + + project_dir = Path(tmpproject) + project_autocoder = project_dir / ".autocoder" + project_autocoder.mkdir() + project_config = project_autocoder / "allowed_commands.yaml" + + # Create project config + project_config.write_text("""version: 1 +commands: + - name: swift + description: Swift compiler +""") + + # Test 1: Org allowed commands are included + allowed, blocked = get_effective_commands(project_dir) + if "jq" in allowed and "python3" in allowed: + print(" PASS: Org allowed commands included") + passed += 1 + else: + print(" FAIL: Org allowed commands included") + print(f" jq in allowed: {'jq' in allowed}") + print(f" python3 in allowed: {'python3' in allowed}") + failed += 1 + + # Test 2: Org blocked commands are in blocklist + if "terraform" in blocked and "kubectl" in blocked: + print(" PASS: Org blocked commands in blocklist") + passed += 1 + else: + print(" FAIL: Org blocked commands in blocklist") + failed += 1 + + # Test 3: Project commands are included + if "swift" in allowed: + print(" PASS: Project commands included") + passed += 1 + else: + print(" FAIL: Project commands included") + failed += 1 + + # Test 4: Global commands are included + if "npm" in allowed and "git" in allowed: + print(" PASS: Global 
commands included") + passed += 1 + else: + print(" FAIL: Global commands included") + failed += 1 + + # Test 5: Hardcoded blocklist cannot be overridden + if "sudo" in blocked and "shutdown" in blocked: + print(" PASS: Hardcoded blocklist enforced") + passed += 1 + else: + print(" FAIL: Hardcoded blocklist enforced") + failed += 1 + + # Restore HOME + if original_home: + os.environ["HOME"] = original_home + else: + del os.environ["HOME"] + + return passed, failed + + +def test_org_blocklist_enforcement(): + """Test that org-level blocked commands cannot be used.""" + print("\nTesting org blocklist enforcement:\n") + passed = 0 + failed = 0 + + with tempfile.TemporaryDirectory() as tmphome: + with tempfile.TemporaryDirectory() as tmpproject: + # Setup fake home directory + import os + original_home = os.environ.get("HOME") + os.environ["HOME"] = tmphome + + org_dir = Path(tmphome) / ".autocoder" + org_dir.mkdir() + org_config_path = org_dir / "config.yaml" + + # Create org config that blocks terraform + org_config_path.write_text("""version: 1 +blocked_commands: + - terraform +""") + + project_dir = Path(tmpproject) + project_autocoder = project_dir / ".autocoder" + project_autocoder.mkdir() + + # Try to use terraform (should be blocked) + input_data = {"tool_name": "Bash", "tool_input": {"command": "terraform apply"}} + context = {"project_dir": str(project_dir)} + result = asyncio.run(bash_security_hook(input_data, context=context)) + + if result.get("decision") == "block": + print(" PASS: Org blocked command 'terraform' rejected") + passed += 1 + else: + print(" FAIL: Org blocked command 'terraform' should be rejected") + failed += 1 + + # Restore HOME + if original_home: + os.environ["HOME"] = original_home + else: + del os.environ["HOME"] + + return passed, failed + + def main(): print("=" * 70) print(" SECURITY HOOK TESTS") @@ -174,6 +615,46 @@ def main(): passed += init_passed failed += init_failed + # Test pattern matching (Phase 1) + pattern_passed, 
pattern_failed = test_pattern_matching() + passed += pattern_passed + failed += pattern_failed + + # Test YAML loading (Phase 1) + yaml_passed, yaml_failed = test_yaml_loading() + passed += yaml_passed + failed += yaml_failed + + # Test command validation (Phase 1) + validation_passed, validation_failed = test_command_validation() + passed += validation_passed + failed += validation_failed + + # Test blocklist enforcement (Phase 1) + blocklist_passed, blocklist_failed = test_blocklist_enforcement() + passed += blocklist_passed + failed += blocklist_failed + + # Test project commands (Phase 1) + project_passed, project_failed = test_project_commands() + passed += project_passed + failed += project_failed + + # Test org config loading (Phase 2) + org_loading_passed, org_loading_failed = test_org_config_loading() + passed += org_loading_passed + failed += org_loading_failed + + # Test hierarchy resolution (Phase 2) + hierarchy_passed, hierarchy_failed = test_hierarchy_resolution() + passed += hierarchy_passed + failed += hierarchy_failed + + # Test org blocklist enforcement (Phase 2) + org_block_passed, org_block_failed = test_org_blocklist_enforcement() + passed += org_block_passed + failed += org_block_failed + # Commands that SHOULD be blocked print("\nCommands that should be BLOCKED:\n") dangerous = [ diff --git a/test_security_integration.py b/test_security_integration.py new file mode 100644 index 0000000..4a13329 --- /dev/null +++ b/test_security_integration.py @@ -0,0 +1,411 @@ +#!/usr/bin/env python3 +""" +Security Integration Tests +=========================== + +Integration tests that call the bash security hook directly and verify +bash command security policies are enforced correctly.
+ +These tests exercise the hook end to end without launching the agent; they: +- Create real temporary projects +- Configure real YAML files +- Call bash_security_hook with test commands +- Inspect the returned decision to verify behavior + +Run with: python test_security_integration.py +""" + +import asyncio +import os +import sys +import tempfile +from pathlib import Path + +from security import bash_security_hook + + +def test_blocked_command_via_hook(): + """Test that hardcoded blocked commands are rejected by the security hook.""" + print("\n" + "=" * 70) + print("TEST 1: Hardcoded blocked command (sudo)") + print("=" * 70) + + with tempfile.TemporaryDirectory() as tmpdir: + project_dir = Path(tmpdir) + + # Create minimal project structure + autocoder_dir = project_dir / ".autocoder" + autocoder_dir.mkdir() + (autocoder_dir / "allowed_commands.yaml").write_text( + "version: 1\ncommands: []" + ) + + # Try to run sudo (should be blocked) + input_data = { + "tool_name": "Bash", + "tool_input": {"command": "sudo apt install nginx"}, + } + context = {"project_dir": str(project_dir)} + + result = asyncio.run(bash_security_hook(input_data, context=context)) + + if result.get("decision") == "block": + print("✅ PASS: sudo was blocked") + print(f" Reason: {result.get('reason', 'N/A')[:80]}...") + return True + else: + print("❌ FAIL: sudo should have been blocked") + print(f" Got: {result}") + return False + + +def test_allowed_command_via_hook(): + """Test that default allowed commands work.""" + print("\n" + "=" * 70) + print("TEST 2: Default allowed command (ls)") + print("=" * 70) + + with tempfile.TemporaryDirectory() as tmpdir: + project_dir = Path(tmpdir) + + # Create minimal project structure + autocoder_dir = project_dir / ".autocoder" + autocoder_dir.mkdir() + (autocoder_dir / "allowed_commands.yaml").write_text( + "version: 1\ncommands: []" + ) + + # Try to run ls (should be allowed - in default allowlist) + input_data = {"tool_name": "Bash", "tool_input": {"command": "ls -la"}} +
context = {"project_dir": str(project_dir)} + + result = asyncio.run(bash_security_hook(input_data, context=context)) + + if result.get("decision") != "block": + print("✅ PASS: ls was allowed (default allowlist)") + return True + else: + print("❌ FAIL: ls should have been allowed") + print(f" Reason: {result.get('reason', 'N/A')}") + return False + + +def test_non_allowed_command_via_hook(): + """Test that commands not in any allowlist are blocked.""" + print("\n" + "=" * 70) + print("TEST 3: Non-allowed command (wget)") + print("=" * 70) + + with tempfile.TemporaryDirectory() as tmpdir: + project_dir = Path(tmpdir) + + # Create minimal project structure + autocoder_dir = project_dir / ".autocoder" + autocoder_dir.mkdir() + (autocoder_dir / "allowed_commands.yaml").write_text( + "version: 1\ncommands: []" + ) + + # Try to run wget (not in default allowlist) + input_data = { + "tool_name": "Bash", + "tool_input": {"command": "wget https://example.com"}, + } + context = {"project_dir": str(project_dir)} + + result = asyncio.run(bash_security_hook(input_data, context=context)) + + if result.get("decision") == "block": + print("✅ PASS: wget was blocked (not in allowlist)") + print(f" Reason: {result.get('reason', 'N/A')[:80]}...") + return True + else: + print("❌ FAIL: wget should have been blocked") + return False + + +def test_project_config_allows_command(): + """Test that adding a command to project config allows it.""" + print("\n" + "=" * 70) + print("TEST 4: Project config allows command (swift)") + print("=" * 70) + + with tempfile.TemporaryDirectory() as tmpdir: + project_dir = Path(tmpdir) + + # Create project config with swift allowed + autocoder_dir = project_dir / ".autocoder" + autocoder_dir.mkdir() + (autocoder_dir / "allowed_commands.yaml").write_text("""version: 1 +commands: + - name: swift + description: Swift compiler + - name: xcodebuild + description: Xcode build system +""") + + # Try to run swift (should be allowed via project config) + 
+        input_data = {"tool_name": "Bash", "tool_input": {"command": "swift --version"}}
+        context = {"project_dir": str(project_dir)}
+
+        result = asyncio.run(bash_security_hook(input_data, context=context))
+
+        if result.get("decision") != "block":
+            print("✅ PASS: swift was allowed (project config)")
+            return True
+        else:
+            print("❌ FAIL: swift should have been allowed")
+            print(f" Reason: {result.get('reason', 'N/A')}")
+            return False
+
+
+def test_pattern_matching():
+    """Test that wildcard patterns work correctly."""
+    print("\n" + "=" * 70)
+    print("TEST 5: Pattern matching (swift*)")
+    print("=" * 70)
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        project_dir = Path(tmpdir)
+
+        # Create project config with swift* pattern
+        autocoder_dir = project_dir / ".autocoder"
+        autocoder_dir.mkdir()
+        (autocoder_dir / "allowed_commands.yaml").write_text("""version: 1
+commands:
+  - name: swift*
+    description: All Swift tools
+""")
+
+        # Try to run swiftlint (should match swift* pattern)
+        input_data = {"tool_name": "Bash", "tool_input": {"command": "swiftlint"}}
+        context = {"project_dir": str(project_dir)}
+
+        result = asyncio.run(bash_security_hook(input_data, context=context))
+
+        if result.get("decision") != "block":
+            print("✅ PASS: swiftlint matched swift* pattern")
+            return True
+        else:
+            print("❌ FAIL: swiftlint should have matched swift*")
+            print(f" Reason: {result.get('reason', 'N/A')}")
+            return False
+
+
+def test_org_blocklist_enforcement():
+    """Test that org-level blocked commands cannot be overridden."""
+    print("\n" + "=" * 70)
+    print("TEST 6: Org blocklist enforcement (terraform)")
+    print("=" * 70)
+
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            # Setup fake home directory with org config
+            original_home = os.environ.get("HOME")
+            os.environ["HOME"] = tmphome
+
+            org_dir = Path(tmphome) / ".autocoder"
+            org_dir.mkdir()
+            (org_dir / "config.yaml").write_text("""version: 1
+allowed_commands: []
+blocked_commands:
+  - terraform
+  - kubectl
+""")
+
+            project_dir = Path(tmpproject)
+            autocoder_dir = project_dir / ".autocoder"
+            autocoder_dir.mkdir()
+
+            # Try to allow terraform in project config (should fail - org blocked)
+            (autocoder_dir / "allowed_commands.yaml").write_text("""version: 1
+commands:
+  - name: terraform
+    description: Infrastructure as code
+""")
+
+            # Try to run terraform (should be blocked by org config)
+            input_data = {
+                "tool_name": "Bash",
+                "tool_input": {"command": "terraform apply"},
+            }
+            context = {"project_dir": str(project_dir)}
+
+            try:
+                result = asyncio.run(bash_security_hook(input_data, context=context))
+            finally:
+                # Restore HOME even if the hook raises, so later tests are unaffected
+                if original_home is not None:
+                    os.environ["HOME"] = original_home
+                else:
+                    del os.environ["HOME"]
+
+            if result.get("decision") == "block":
+                print("✅ PASS: terraform blocked by org config (cannot override)")
+                print(f" Reason: {result.get('reason', 'N/A')[:80]}...")
+                return True
+            else:
+                print("❌ FAIL: terraform should have been blocked by org config")
+                return False
+
+
+def test_org_allowlist_inheritance():
+    """Test that org-level allowed commands are available to projects."""
+    print("\n" + "=" * 70)
+    print("TEST 7: Org allowlist inheritance (jq)")
+    print("=" * 70)
+
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            # Setup fake home directory with org config
+            original_home = os.environ.get("HOME")
+            os.environ["HOME"] = tmphome
+
+            org_dir = Path(tmphome) / ".autocoder"
+            org_dir.mkdir()
+            (org_dir / "config.yaml").write_text("""version: 1
+allowed_commands:
+  - name: jq
+    description: JSON processor
+blocked_commands: []
+""")
+
+            project_dir = Path(tmpproject)
+            autocoder_dir = project_dir / ".autocoder"
+            autocoder_dir.mkdir()
+            (autocoder_dir / "allowed_commands.yaml").write_text(
+                "version: 1\ncommands: []"
+            )
+
+            # Try to run jq (should be allowed via org config)
+            input_data = {"tool_name": "Bash", "tool_input": {"command": "jq '.data'"}}
+            context
= {"project_dir": str(project_dir)}
+
+            try:
+                result = asyncio.run(bash_security_hook(input_data, context=context))
+            finally:
+                # Restore HOME even if the hook raises, so later tests are unaffected
+                if original_home is not None:
+                    os.environ["HOME"] = original_home
+                else:
+                    del os.environ["HOME"]
+
+            if result.get("decision") != "block":
+                print("✅ PASS: jq allowed via org config")
+                return True
+            else:
+                print("❌ FAIL: jq should have been allowed via org config")
+                print(f" Reason: {result.get('reason', 'N/A')}")
+                return False
+
+
+def test_invalid_yaml_ignored():
+    """Test that invalid YAML config is safely ignored."""
+    print("\n" + "=" * 70)
+    print("TEST 8: Invalid YAML safely ignored")
+    print("=" * 70)
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        project_dir = Path(tmpdir)
+
+        # Create invalid YAML
+        autocoder_dir = project_dir / ".autocoder"
+        autocoder_dir.mkdir()
+        (autocoder_dir / "allowed_commands.yaml").write_text("invalid: yaml: content:")
+
+        # Try to run ls (should still work - falls back to defaults)
+        input_data = {"tool_name": "Bash", "tool_input": {"command": "ls"}}
+        context = {"project_dir": str(project_dir)}
+
+        result = asyncio.run(bash_security_hook(input_data, context=context))
+
+        if result.get("decision") != "block":
+            print("✅ PASS: Invalid YAML ignored, defaults still work")
+            return True
+        else:
+            print("❌ FAIL: Should fall back to defaults when YAML is invalid")
+            print(f" Reason: {result.get('reason', 'N/A')}")
+            return False
+
+
+def test_50_command_limit():
+    """Test that configs with >50 commands are rejected."""
+    print("\n" + "=" * 70)
+    print("TEST 9: 50 command limit enforced")
+    print("=" * 70)
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        project_dir = Path(tmpdir)
+
+        # Create config with 51 commands
+        autocoder_dir = project_dir / ".autocoder"
+        autocoder_dir.mkdir()
+
+        # Indent entries so the YAML parses and only the count limit is exercised
+        commands = [
+            f"  - name: cmd{i}\n    description: Command {i}" for i in range(51)
+        ]
+        (autocoder_dir / "allowed_commands.yaml").write_text(
+            "version: 1\ncommands:\n" + "\n".join(commands)
+        )
+
+        # Try to run cmd0
+        # (should be blocked - config is invalid)
+        input_data = {"tool_name": "Bash", "tool_input": {"command": "cmd0"}}
+        context = {"project_dir": str(project_dir)}
+
+        result = asyncio.run(bash_security_hook(input_data, context=context))
+
+        if result.get("decision") == "block":
+            print("✅ PASS: Config with >50 commands rejected")
+            return True
+        else:
+            print("❌ FAIL: Config with >50 commands should be rejected")
+            return False
+
+
+def main():
+    print("=" * 70)
+    print(" SECURITY INTEGRATION TESTS")
+    print("=" * 70)
+    print("\nThese tests verify bash command security policies using real hooks.")
+    print("They test the actual security.py implementation, not just unit tests.\n")
+
+    tests = [
+        test_blocked_command_via_hook,
+        test_allowed_command_via_hook,
+        test_non_allowed_command_via_hook,
+        test_project_config_allows_command,
+        test_pattern_matching,
+        test_org_blocklist_enforcement,
+        test_org_allowlist_inheritance,
+        test_invalid_yaml_ignored,
+        test_50_command_limit,
+    ]
+
+    passed = 0
+    failed = 0
+
+    for test in tests:
+        try:
+            if test():
+                passed += 1
+            else:
+                failed += 1
+        except Exception as e:
+            print(f"❌ FAIL: Test raised exception: {e}")
+            import traceback
+
+            traceback.print_exc()
+            failed += 1
+
+    print("\n" + "=" * 70)
+    print(f" RESULTS: {passed} passed, {failed} failed")
+    print("=" * 70)
+
+    if failed == 0:
+        print("\n✅ ALL INTEGRATION TESTS PASSED")
+        return 0
+    else:
+        print(f"\n❌ {failed} INTEGRATION TEST(S) FAILED")
+        return 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())