## Summary - Put the existing git and test workflows on rails: a repeatable, automated process that can run autonomously, with guardrails and a compact TUI for visibility. - Flow: for a selected task, create a branch named with the tag + task id → generate tests for the first subtask (red) using the Surgical Test Generator → implement code (green) → verify tests → commit → repeat per subtask → final verify → push → open PR against the default branch. - Build on existing rules: .cursor/rules/git_workflow.mdc, .cursor/rules/test_workflow.mdc, .claude/agents/surgical-test-generator.md, and existing CLI/core services. ## Goals - Deterministic, resumable automation to execute the TDD loop per subtask with minimal human intervention. - Strong guardrails: never commit to the default branch; only commit when tests pass; enforce status transitions; persist logs/state for debuggability. - Visibility: a compact terminal UI (like lazygit) to pick tag, view tasks, and start work; right-side pane opens an executor terminal (via tmux) for agent coding. - Extensible: framework-agnostic test generation via the Surgical Test Generator; detect and use the repo’s test command for execution with coverage thresholds. ## Non‑Goals (initial) - Full multi-language runner parity beyond detection and executing the project’s test command. - Complex GUI; start with CLI/TUI + tmux pane. IDE/extension can hook into the same state later. - Rich executor selection UX (codex/gemini/claude) — we’ll prompt per run; defaults can come later. ## Success Criteria - One command can autonomously complete a task's subtasks via TDD and open a PR when done. - All commits made on a branch that includes the tag and task id (see Branch Naming); no commits to the default branch directly. - Every subtask iteration: failing tests added first (red), then code added to pass them (green), commit only after green. - End-to-end logs + artifacts stored in .taskmaster/reports/runs//. ## Success Metrics (Phase 1) - **Adoption**: 80% of tasks in a pilot repo completed via `tm autopilot` - **Safety**: 0 commits to default branch; 100% of commits have green tests - **Efficiency**: Average time from task start to PR < 30min for simple subtasks - **Reliability**: < 5% of runs require manual intervention (timeout/conflicts) ## User Stories - As a developer, I can run tm autopilot and watch a structured, safe workflow execute. - As a reviewer, I can inspect commits per subtask, and a PR summarizing the work when the task completes. - As an operator, I can see current step, active subtask, tests status, and logs in a compact CLI view and read a final run report. ## Example Workflow Traces ### Happy Path: Complete a 3-subtask feature ```bash # Developer starts $ tm autopilot 42 → Checks preflight: ✓ clean tree, ✓ npm test detected → Creates branch: analytics/task-42-user-metrics → Subtask 42.1: "Add metrics schema" RED: generates test_metrics_schema.test.js → 3 failures GREEN: implements schema.js → all pass COMMIT: "feat(metrics): add metrics schema (task 42.1)" → Subtask 42.2: "Add collection endpoint" RED: generates test_metrics_endpoint.test.js → 5 failures GREEN: implements api/metrics.js → all pass COMMIT: "feat(metrics): add collection endpoint (task 42.2)" → Subtask 42.3: "Add dashboard widget" RED: generates test_metrics_widget.test.js → 4 failures GREEN: implements components/MetricsWidget.jsx → all pass COMMIT: "feat(metrics): add dashboard widget (task 42.3)" → Final: all 3 subtasks complete ✓ Run full test suite → all pass ✓ Coverage check → 85% (meets 80% threshold) PUSH: confirms with user → pushed to origin PR: opens #123 "Task #42 [analytics]: User metrics tracking" ✓ Task 42 complete. PR: https://github.com/org/repo/pull/123 Run report: .taskmaster/reports/runs/2025-01-15-142033/ ``` ### Error Recovery: Failing tests timeout ```bash $ tm autopilot 42 → Subtask 42.2 GREEN phase: attempt 1 fails (2 tests still red) → Subtask 42.2 GREEN phase: attempt 2 fails (1 test still red) → Subtask 42.2 GREEN phase: attempt 3 fails (1 test still red) ⚠️ Paused: Could not achieve green state after 3 attempts 📋 State saved to: .taskmaster/reports/runs/2025-01-15-142033/ Last error: "POST /api/metrics returns 500 instead of 201" Next steps: - Review diff: git diff HEAD - Inspect logs: cat .taskmaster/reports/runs/2025-01-15-142033/log.jsonl - Check test output: cat .taskmaster/reports/runs/2025-01-15-142033/test-results/subtask-42.2-green-attempt3.json - Resume after manual fix: tm autopilot --resume # Developer manually fixes the issue, then: $ tm autopilot --resume → Resuming subtask 42.2 GREEN phase GREEN: all tests pass COMMIT: "feat(metrics): add collection endpoint (task 42.2)" → Continuing to subtask 42.3... ``` ### Dry Run: Preview before execution ```bash $ tm autopilot 42 --dry-run Autopilot Plan for Task #42 [analytics]: User metrics tracking ───────────────────────────────────────────────────────────── Preflight: ✓ Working tree is clean ✓ Test command detected: npm test ✓ Tools available: git, gh, node, npm ✓ Current branch: main (will create new branch) Branch & Tag: → Create branch: analytics/task-42-user-metrics → Set active tag: analytics Subtasks (3 pending): 1. 42.1: Add metrics schema - RED: generate tests in src/__tests__/schema.test.js - GREEN: implement src/schema.js - COMMIT: "feat(metrics): add metrics schema (task 42.1)" 2. 42.2: Add collection endpoint [depends on 42.1] - RED: generate tests in src/api/__tests__/metrics.test.js - GREEN: implement src/api/metrics.js - COMMIT: "feat(metrics): add collection endpoint (task 42.2)" 3. 42.3: Add dashboard widget [depends on 42.2] - RED: generate tests in src/components/__tests__/MetricsWidget.test.jsx - GREEN: implement src/components/MetricsWidget.jsx - COMMIT: "feat(metrics): add dashboard widget (task 42.3)" Finalization: → Run full test suite with coverage → Push branch to origin (will confirm) → Create PR targeting main Run without --dry-run to execute. ``` ## High‑Level Workflow 1) Pre‑flight - Verify clean working tree or confirm staging/commit policy (configurable). - Detect repo type and the project’s test command (e.g., npm test, pnpm test, pytest, go test). - Validate tools: git, gh (optional for PR), node/npm, and (if used) claude CLI. - Load TaskMaster state and selected task; if no subtasks exist, automatically run “expand” before working. 2) Branch & Tag Setup - Checkout default branch and update (optional), then create a branch using Branch Naming (below). - Map branch ↔ tag via existing tag management; explicitly set active tag to the branch’s tag. 3) Subtask Loop (for each pending/in-progress subtask in dependency order) - Select next eligible subtask using tm-core TaskService getNextTask() and subtask eligibility logic. - Red: generate or update failing tests for the subtask - Use the Surgical Test Generator system prompt .claude/agents/surgical-test-generator.md) to produce high-signal tests following project conventions. - Run tests to confirm red; record results. If not red (already passing), skip to next subtask or escalate. - Green: implement code to pass tests - Use executor to implement changes (initial: claude CLI prompt with focused context). - Re-run tests until green or timeout/backoff policy triggers. - Commit: when green - Commit tests + code with conventional commit message. Optionally update subtask status to done. - Persist run step metadata/logs. 4) Finalization - Run full test suite and coverage (if configured); optionally lint/format. - Commit any final adjustments. - Push branch (ask user to confirm); create PR (via gh pr create) targeting the default branch. Title format: Task # []: . 5) Post‑Run - Update task status if desired (e.g., review). - Persist run report (JSON + markdown summary) to .taskmaster/reports/runs/<run-id>/. ## Guardrails - Never commit to the default branch. - Commit only if all tests (targeted and suite) pass; allow override flags. - Enforce 80% coverage thresholds (lines/branches/functions/statements) by default; configurable. - Timebox/model ops and retries; if not green within N attempts, pause with actionable state for resume. - Always log actions, commands, and outcomes; include dry-run mode. - Ask before branch creation, pushing, and opening a PR unless --no-confirm is set. ## Integration Points (Current Repo) - CLI: apps/cli provides command structure and UI components. - New command: tm autopilot (alias: task-master autopilot). - Reuse UI components under apps/cli/src/ui/components/ for headers/task details/next-task. - Core services: packages/tm-core - TaskService for selection, status, tags. - TaskExecutionService for prompt formatting and executor prep. - Executors: claude executor and ExecutorFactory to run external tools. - Proposed new: WorkflowOrchestrator to drive the autonomous loop and emit progress events. - Tag/Git utilities: scripts/modules/utils/git-utils.js and scripts/modules/task-manager/tag-management.js for branch→tag mapping and explicit tag switching. - Rules: .cursor/rules/git_workflow.mdc and .cursor/rules/test_workflow.mdc to steer behavior and ensure consistency. - Test generation prompt: .claude/agents/surgical-test-generator.md. ## Proposed Components - Orchestrator (tm-core): WorkflowOrchestrator (new) - State machine driving phases: Preflight → Branch/Tag → SubtaskIter (Red/Green/Commit) → Finalize → PR. - Exposes an evented API (progress events) that the CLI can render. - Stores run state artifacts. - Test Runner Adapter - Detects and runs tests via the project’s test command (e.g., npm test), with targeted runs where feasible. - API: runTargeted(files/pattern), runAll(), report summary (failures, duration, coverage), enforce 80% threshold by default. - Git/PR Adapter - Encapsulates git ops: branch create/checkout, add/commit, push. - Optional gh integration to open PR; fallback to instructions if gh unavailable. - Confirmation gates for branch creation and pushes. - Prompt/Exec Adapter - Uses existing executor service to call the selected coding assistant (initially claude) with tight prompts: task/subtask context, surgical tests first, then minimal code to green. - Run State + Reporting - JSONL log of steps, timestamps, commands, test results. - Markdown summary for PR description and post-run artifact. ## CLI UX (MVP) - Command: tm autopilot [taskId] - Flags: --dry-run, --no-push, --no-pr, --no-confirm, --force, --max-attempts <n>, --runner <auto|custom>, --commit-scope <scope> - Output: compact header (project, tag, branch), current phase, subtask line, last test summary, next actions. - Resume: If interrupted, tm autopilot --resume picks up from last checkpoint in run state. ### TUI with tmux (Linear Execution) - Left pane: Tag selector, task list (status/priority), start/expand shortcuts; "Start" triggers the next task or a selected task. - Right pane: Executor terminal (tmux split) that runs the coding agent (claude-code/codex). Autopilot can hand over to the right pane during green. - MCP integration: use MCP tools for task queries/updates and for shell/test invocations where available. ## TUI Layout (tmux-based) ### Pane Structure ``` ┌─────────────────────────────────────┬──────────────────────────────────┐ │ Task Navigator (left) │ Executor Terminal (right) │ │ │ │ │ Project: my-app │ $ tm autopilot --executor-mode │ │ Branch: analytics/task-42 │ > Running subtask 42.2 GREEN... │ │ Tag: analytics │ > Implementing endpoint... │ │ │ > Tests: 3 passed, 0 failed │ │ Tasks: │ > Ready to commit │ │ → 42 [in-progress] User metrics │ │ │ → 42.1 [done] Schema │ [Live output from Claude Code] │ │ → 42.2 [active] Endpoint ◀ │ │ │ → 42.3 [pending] Dashboard │ │ │ │ │ │ [s] start [p] pause [q] quit │ │ └─────────────────────────────────────┴──────────────────────────────────┘ ``` ### Implementation Notes - **Left pane**: `apps/cli/src/ui/tui/navigator.ts` (new, uses `blessed` or `ink`) - **Right pane**: spawned via `tmux split-window -h` running `tm autopilot --executor-mode` - **Communication**: shared state file `.taskmaster/state/current-run.json` + file watching or event stream - **Keybindings**: - `s` - Start selected task - `p` - Pause/resume current run - `q` - Quit (with confirmation if run active) - `↑/↓` - Navigate task list - `Enter` - Expand/collapse subtasks ## Prompt Composition (Detailed) ### System Prompt Assembly Prompts are composed in three layers: 1. **Base rules** (loaded in order from `.cursor/rules/` and `.claude/agents/`): - `git_workflow.mdc` → git commit conventions, branch policy, PR guidelines - `test_workflow.mdc` → TDD loop requirements, coverage thresholds, test structure - `surgical-test-generator.md` → test generation methodology, project-specific test patterns 2. **Task context injection**: ``` You are implementing: Task #42 [analytics]: User metrics tracking Subtask 42.2: Add collection endpoint Description: Implement POST /api/metrics endpoint to collect user metrics events Acceptance criteria: - POST /api/metrics accepts { userId, eventType, timestamp } - Validates input schema (reject missing/invalid fields) - Persists to database - Returns 201 on success with created record - Returns 400 on validation errors Dependencies: - Subtask 42.1 (metrics schema) is complete Current phase: RED (generate failing tests) Test command: npm test Test file convention: src/**/*.test.js (vitest framework detected) Branch: analytics/task-42-user-metrics Project language: JavaScript (Node.js) ``` 3. **Phase-specific instructions**: - **RED phase**: "Generate minimal failing tests for this subtask. Do NOT implement any production code. Only create test files. Confirm tests fail with clear error messages indicating missing implementation." - **GREEN phase**: "Implement minimal code to pass the failing tests. Follow existing project patterns in `src/`. Only modify files necessary for this subtask. Keep changes focused and reviewable." ### Example Full Prompt (RED Phase) ```markdown <SYSTEM PROMPT> [Contents of .cursor/rules/git_workflow.mdc] [Contents of .cursor/rules/test_workflow.mdc] [Contents of .claude/agents/surgical-test-generator.md] <TASK CONTEXT> You are implementing: Task #42.2: Add collection endpoint Description: Implement POST /api/metrics endpoint to collect user metrics events Acceptance criteria: - POST /api/metrics accepts { userId, eventType, timestamp } - Validates input schema (reject missing/invalid fields) - Persists to database using MetricsSchema from subtask 42.1 - Returns 201 on success with created record - Returns 400 on validation errors with details Dependencies: Subtask 42.1 (metrics schema) is complete <INSTRUCTION> Generate failing tests for this subtask. Follow project conventions: - Test file: src/api/__tests__/metrics.test.js - Framework: vitest (detected from package.json) - Test cases to cover: * POST /api/metrics with valid payload → should return 201 (will fail: endpoint not implemented) * POST /api/metrics with missing userId → should return 400 (will fail: validation not implemented) * POST /api/metrics with invalid timestamp → should return 400 (will fail: validation not implemented) * POST /api/metrics should persist to database → should save record (will fail: persistence not implemented) Do NOT implement the endpoint code yet. Only create test file(s). Confirm tests fail with messages like "Cannot POST /api/metrics" or "endpoint not defined". Output format: 1. File path to create: src/api/__tests__/metrics.test.js 2. Complete test code 3. Command to run: npm test src/api/__tests__/metrics.test.js ``` ### Example Full Prompt (GREEN Phase) ```markdown <SYSTEM PROMPT> [Contents of .cursor/rules/git_workflow.mdc] [Contents of .cursor/rules/test_workflow.mdc] <TASK CONTEXT> Task #42.2: Add collection endpoint [same context as RED phase] <CURRENT STATE> Tests created in RED phase: - src/api/__tests__/metrics.test.js - 5 tests written, all failing as expected Test output: ``` FAIL src/api/__tests__/metrics.test.js POST /api/metrics ✗ should return 201 with valid payload (endpoint not found) ✗ should return 400 with missing userId (endpoint not found) ✗ should return 400 with invalid timestamp (endpoint not found) ✗ should persist to database (endpoint not found) ``` <INSTRUCTION> Implement minimal code to make all tests pass. Guidelines: - Create/modify file: src/api/metrics.js - Use existing patterns from src/api/ (e.g., src/api/users.js for reference) - Import MetricsSchema from subtask 42.1 (src/models/schema.js) - Implement validation, persistence, and response handling - Follow project error handling conventions - Keep implementation focused on this subtask only After implementation: 1. Run tests: npm test src/api/__tests__/metrics.test.js 2. Confirm all 5 tests pass 3. Report results Output format: 1. File(s) created/modified 2. Implementation code 3. Test command and results ``` ### Prompt Loading Configuration See `.taskmaster/config.json` → `prompts` section for paths and load order. ## Configuration Schema ### .taskmaster/config.json ```json { "autopilot": { "enabled": true, "requireCleanWorkingTree": true, "commitTemplate": "{type}({scope}): {msg}", "defaultCommitType": "feat", "maxGreenAttempts": 3, "testTimeout": 300000 }, "test": { "runner": "auto", "coverageThresholds": { "lines": 80, "branches": 80, "functions": 80, "statements": 80 }, "targetedRunPattern": "**/*.test.js" }, "git": { "branchPattern": "{tag}/task-{id}-{slug}", "pr": { "enabled": true, "base": "default", "bodyTemplate": ".taskmaster/templates/pr-body.md" } }, "prompts": { "rulesPath": ".cursor/rules", "testGeneratorPath": ".claude/agents/surgical-test-generator.md", "loadOrder": ["git_workflow.mdc", "test_workflow.mdc"] } } ``` ### Configuration Fields #### autopilot - `enabled` (boolean): Enable/disable autopilot functionality - `requireCleanWorkingTree` (boolean): Require clean git state before starting - `commitTemplate` (string): Template for commit messages (tokens: `{type}`, `{scope}`, `{msg}`) - `defaultCommitType` (string): Default commit type (feat, fix, chore, etc.) - `maxGreenAttempts` (number): Maximum retry attempts to achieve green tests (default: 3) - `testTimeout` (number): Timeout in milliseconds per test run (default: 300000 = 5min) #### test - `runner` (string): Test runner detection mode (`"auto"` or explicit command like `"npm test"`) - `coverageThresholds` (object): Minimum coverage percentages required - `lines`, `branches`, `functions`, `statements` (number): Threshold percentages (0-100) - `targetedRunPattern` (string): Glob pattern for targeted subtask test runs #### git - `branchPattern` (string): Branch naming pattern (tokens: `{tag}`, `{id}`, `{slug}`) - `pr.enabled` (boolean): Enable automatic PR creation - `pr.base` (string): Target branch for PRs (`"default"` uses repo default, or specify like `"main"`) - `pr.bodyTemplate` (string): Path to PR body template file (optional) #### prompts - `rulesPath` (string): Directory containing rule files (e.g., `.cursor/rules`) - `testGeneratorPath` (string): Path to test generator prompt file - `loadOrder` (array): Order to load rule files from `rulesPath` ### Environment Variables ```bash # Required for executor ANTHROPIC_API_KEY=sk-ant-... # Claude API key # Optional: for PR creation GITHUB_TOKEN=ghp_... # GitHub personal access token # Optional: for other executors (future) OPENAI_API_KEY=sk-... GOOGLE_API_KEY=... ``` ## Run Artifacts & Observability ### Per-Run Artifact Structure Each autopilot run creates a timestamped directory with complete traceability: ``` .taskmaster/reports/runs/2025-01-15-142033/ ├── manifest.json # run metadata (task id, start/end time, status) ├── log.jsonl # timestamped event stream ├── commits.txt # list of commit SHAs made during run ├── test-results/ │ ├── subtask-42.1-red.json │ ├── subtask-42.1-green.json │ ├── subtask-42.2-red.json │ ├── subtask-42.2-green-attempt1.json │ ├── subtask-42.2-green-attempt2.json │ ├── subtask-42.2-green-attempt3.json │ └── final-suite.json └── pr.md # generated PR body ``` ### manifest.json Format ```json { "runId": "2025-01-15-142033", "taskId": "42", "tag": "analytics", "branch": "analytics/task-42-user-metrics", "startTime": "2025-01-15T14:20:33Z", "endTime": "2025-01-15T14:45:12Z", "status": "completed", "subtasksCompleted": ["42.1", "42.2", "42.3"], "subtasksFailed": [], "totalCommits": 3, "prUrl": "https://github.com/org/repo/pull/123", "finalCoverage": { "lines": 85.3, "branches": 82.1, "functions": 88.9, "statements": 85.0 } } ``` ### log.jsonl Format Event stream in JSON Lines format for easy parsing and debugging: ```jsonl {"ts":"2025-01-15T14:20:33Z","phase":"preflight","status":"ok","details":{"testCmd":"npm test","gitClean":true}} {"ts":"2025-01-15T14:20:45Z","phase":"branch","status":"ok","branch":"analytics/task-42-user-metrics"} {"ts":"2025-01-15T14:21:00Z","phase":"red","subtask":"42.1","status":"ok","tests":{"failed":3,"passed":0}} {"ts":"2025-01-15T14:22:15Z","phase":"green","subtask":"42.1","status":"ok","tests":{"passed":3,"failed":0},"attempts":2} {"ts":"2025-01-15T14:22:20Z","phase":"commit","subtask":"42.1","status":"ok","sha":"a1b2c3d","message":"feat(metrics): add metrics schema (task 42.1)"} {"ts":"2025-01-15T14:23:00Z","phase":"red","subtask":"42.2","status":"ok","tests":{"failed":5,"passed":0}} {"ts":"2025-01-15T14:25:30Z","phase":"green","subtask":"42.2","status":"error","tests":{"passed":3,"failed":2},"attempts":3,"error":"Max attempts reached"} {"ts":"2025-01-15T14:25:35Z","phase":"pause","reason":"max_attempts","nextAction":"manual_review"} ``` ### Test Results Format Each test run stores detailed results: ```json { "subtask": "42.2", "phase": "green", "attempt": 3, "timestamp": "2025-01-15T14:25:30Z", "command": "npm test src/api/__tests__/metrics.test.js", "exitCode": 1, "duration": 2340, "summary": { "total": 5, "passed": 3, "failed": 2, "skipped": 0 }, "failures": [ { "test": "POST /api/metrics should return 201 with valid payload", "error": "Expected status 201, got 500", "stack": "..." } ], "coverage": { "lines": 78.5, "branches": 75.0, "functions": 80.0, "statements": 78.5 } } ``` ## Execution Model ### Orchestration vs Direct Execution The autopilot system uses an **orchestration model** rather than direct code execution: **Orchestrator Role** (tm-core WorkflowOrchestrator): - Maintains state machine tracking current phase (RED/GREEN/COMMIT) per subtask - Validates preconditions (tests pass, git state clean, etc.) - Returns "work units" describing what needs to be done next - Records completion and advances to next phase - Persists state for resumability **Executor Role** (Claude Code/AI session via MCP): - Queries orchestrator for next work unit - Executes the work (generates tests, writes code, runs tests, makes commits) - Reports results back to orchestrator - Handles file operations and tool invocations **Why This Approach?** - Leverages existing AI capabilities (Claude Code) rather than duplicating them - MCP protocol provides clean separation between state management and execution - Allows human oversight and intervention at each phase - Simpler to implement: orchestrator is pure state logic, no code generation needed - Enables multiple executor types (Claude Code, other AI tools, human developers) **Example Flow**: ```typescript // Claude Code (via MCP) queries orchestrator const workUnit = await orchestrator.getNextWorkUnit('42'); // => { // phase: 'RED', // subtask: '42.1', // action: 'Generate failing tests for metrics schema', // context: { title, description, dependencies, testFile: 'src/__tests__/schema.test.js' } // } // Claude Code executes the work (writes test file, runs tests) // Then reports back await orchestrator.completeWorkUnit('42', '42.1', 'RED', { success: true, testsCreated: ['src/__tests__/schema.test.js'], testsFailed: 3 }); // Query again for next phase const nextWorkUnit = await orchestrator.getNextWorkUnit('42'); // => { phase: 'GREEN', subtask: '42.1', action: 'Implement code to pass tests', ... } ``` ## Design Decisions ### Why commit per subtask instead of per task? **Decision**: Commit after each subtask's green state, not after the entire task. **Rationale**: - Atomic commits make code review easier (reviewers can see logical progression) - Easier to revert a single subtask if it causes issues downstream - Matches the TDD loop's natural checkpoint and cognitive boundary - Provides resumability points if the run is interrupted **Trade-off**: More commits per task (can use squash-merge in PRs if desired) ### Why not support parallel subtask execution? **Decision**: Sequential subtask execution in Phase 1; parallel execution deferred to Phase 3. **Rationale**: - Subtasks often have implicit dependencies (e.g., schema before endpoint, endpoint before UI) - Simpler orchestrator state machine (less complexity = faster to ship) - Parallel execution requires explicit dependency DAG and conflict resolution - Can be added in Phase 3 once core workflow is proven stable **Trade-off**: Slower for truly independent subtasks (mitigated by keeping subtasks small and focused) ### Why require 80% coverage by default? **Decision**: Enforce 80% coverage threshold (lines/branches/functions/statements) before allowing commits. **Rationale**: - Industry standard baseline for production code quality - Forces test generation to be comprehensive, not superficial - Configurable per project via `.taskmaster/config.json` if too strict - Prevents "green tests" that only test happy paths **Trade-off**: May require more test generation iterations; can be lowered per project ### Why use tmux instead of a rich GUI? **Decision**: MVP uses tmux split panes for TUI, not Electron/web-based GUI. **Rationale**: - Tmux is universally available on dev machines; no installation burden - Terminal-first workflows match developer mental model (no context switching) - Simpler to implement and maintain; can add GUI later via extensions - State stored in files allows IDE/extension integration without coupling **Trade-off**: Less visual polish than GUI; requires tmux familiarity ### Why not support multiple executors (codex/gemini/claude) in Phase 1? **Decision**: Start with Claude executor only; add others in Phase 2+. **Rationale**: - Reduces scope and complexity for initial delivery - Claude Code already integrated with existing executor service - Executor abstraction already exists; adding more is straightforward later - Different executors may need different prompt strategies (requires experimentation) **Trade-off**: Users locked to Claude initially; can work around with manual executor selection ## Risks and Mitigations - Model hallucination/large diffs: restrict prompt scope; enforce minimal changes; show diff previews (optional) before commit. - Flaky tests: allow retries, isolate targeted runs for speed, then full suite before commit. - Environment variability: detect runners/tools; provide fallbacks and actionable errors. - PR creation fails: still push and print manual commands; persist PR body to reuse. ## Open Questions 1) Slugging rules for branch names; any length limits or normalization beyond {slug} token sanitize? 2) PR body standard sections beyond run report (e.g., checklist, coverage table)? 3) Default executor prompt fine-tuning once codex/gemini integration is available. 4) Where to store persistent TUI state (pane layout, last selection) in .taskmaster/state.json? ## Branch Naming - Include both the tag and the task id in the branch name to make lineage explicit. - Default pattern: <tag>/task-<id>[-slug] (e.g., master/task-12, tag-analytics/task-4-user-auth). - Configurable via .taskmaster/config.json: git.branchPattern supports tokens {tag}, {id}, {slug}. ## PR Base Branch - Use the repository’s default branch (detected via git) unless overridden. - Title format: Task #<id> [<tag>]: <title>. ## RPG Mapping (Repository Planning Graph) Functional nodes (capabilities): - Autopilot Orchestration → drives TDD loop and lifecycle - Test Generation (Surgical) → produces failing tests from subtask context - Test Execution + Coverage → runs suite, enforces thresholds - Git/Branch/PR Management → safe operations and PR creation - TUI/Terminal Integration → interactive control and visibility via tmux - MCP Integration → structured task/status/context operations Structural nodes (code organization): - packages/tm-core: - services/workflow-orchestrator.ts (new) - services/test-runner-adapter.ts (new) - services/git-adapter.ts (new) - existing: task-service.ts, task-execution-service.ts, executors/* - apps/cli: - src/commands/autopilot.command.ts (new) - src/ui/tui/ (new tmux/TUI helpers) - scripts/modules: - reuse utils/git-utils.js, task-manager/tag-management.js - .claude/agents/: - surgical-test-generator.md Edges (data/control flow): - Autopilot → Test Generation → Test Execution → Git Commit → loop - Autopilot → Git Adapter (branch, tag, PR) - Autopilot → TUI (event stream) → tmux pane control - Autopilot → MCP tools for task/status updates - Test Execution → Coverage gate → Autopilot decision Topological traversal (implementation order): 1) Git/Test adapters (foundations) 2) Orchestrator skeleton + events 3) CLI autopilot command and dry-run 4) Surgical test-gen integration and execution gate 5) PR creation, run reports, resumability ## Phased Roadmap - Phase 0: Spike - Implement CLI skeleton tm autopilot with dry-run showing planned steps from a real task + subtasks. - Detect test runner (package.json) and git state; render a preflight report. - Phase 1: Core Rails (State Machine & Orchestration) - Implement WorkflowOrchestrator in tm-core as a **state machine** that tracks TDD phases per subtask. - Orchestrator **guides** the current AI session (Claude Code/MCP client) rather than executing code itself. - Add Git/Test adapters for status checks and validation (not direct execution). - WorkflowOrchestrator API: - `getNextWorkUnit(taskId)` → returns next phase to execute (RED/GREEN/COMMIT) with context - `completeWorkUnit(taskId, subtaskId, phase, result)` → records completion and advances state - `getRunState(taskId)` → returns current progress and resumability data - MCP integration: expose work unit endpoints so Claude Code can query "what to do next" and report back. - Branch/tag mapping via existing tag-management APIs. - Run report persisted under .taskmaster/reports/runs/ with state checkpoints for resumability. - Phase 2: PR + Resumability - Add gh PR creation with well-formed body using the run report. - Introduce resumable checkpoints and --resume flag. - Add coverage enforcement and optional lint/format step. - Phase 3: Extensibility + Guardrails - Add support for basic pytest/go test adapters. - Add safeguards: diff preview mode, manual confirm gates, aggressive minimal-change prompts. - Optional: small TUI panel and extension panel leveraging the same run state file. ## References (Repo) - Test Workflow: .cursor/rules/test_workflow.mdc - Git Workflow: .cursor/rules/git_workflow.mdc - CLI: apps/cli/src/commands/start.command.ts, apps/cli/src/ui/components/*.ts - Core Services: packages/tm-core/src/services/task-service.ts, task-execution-service.ts - Executors: packages/tm-core/src/executors/* - Git Utilities: scripts/modules/utils/git-utils.js - Tag Management: scripts/modules/task-manager/tag-management.js - Surgical Test Generator: .claude/agents/surgical-test-generator.md