chore: create plan for task execution

2025-10-06 18:07:50 +02:00
parent 5cb7ed557a
commit 27b2348a9a
5 changed files with 877 additions and 5 deletions
--- a/.taskmaster/config.json
+++ b/.taskmaster/config.json
@@ -1,9 +1,9 @@
 {
 	"models": {
 		"main": {
-			"provider": "anthropic",
-			"modelId": "claude-sonnet-4-20250514",
-			"maxTokens": 64000,
+			"provider": "claude-code",
+			"modelId": "opus",
+			"maxTokens": 32000,
 			"temperature": 0.2
 		},
 		"research": {
--- a/.taskmaster/docs/autonomous-tdd-git-workflow.md
+++ b/.taskmaster/docs/autonomous-tdd-git-workflow.md
@@ -0,0 +1,360 @@
+## Summary
+
+- Put the existing git and test workflows on rails: a repeatable, automated process that can run autonomously, with guardrails and a compact TUI for visibility.
+
+- Flow: for a selected task, create a branch named with the tag + task id → generate tests for the first subtask (red) using the Surgical Test Generator → implement code (green) → verify tests → commit → repeat per subtask → final verify → push → open PR against the default branch.
+
+- Build on existing rules: .cursor/rules/git_workflow.mdc, .cursor/rules/test_workflow.mdc, .claude/agents/surgical-test-generator.md, and existing CLI/core services.
+
+## Goals
+
+- Deterministic, resumable automation to execute the TDD loop per subtask with minimal human intervention.
+
+- Strong guardrails: never commit to the default branch; only commit when tests pass; enforce status transitions; persist logs/state for debuggability.
+
+- Visibility: a compact terminal UI (like lazygit) to pick a tag, view tasks, and start work; a right-side pane opens an executor terminal (via tmux) for agent coding.
+
+- Extensible: framework-agnostic test generation via the Surgical Test Generator; detect and use the repo’s test command for execution with coverage thresholds.
+
+## Non‑Goals (initial)
+
+- Full multi-language runner parity beyond detecting and executing the project’s test command.
+
+- Complex GUI; start with CLI/TUI + tmux pane. IDE/extension can hook into the same state later.
+
+- Rich executor selection UX (codex/gemini/claude) — we’ll prompt per run; defaults can come later.
+
+## Success Criteria
+
+- One command can autonomously complete a task’s subtasks via TDD and open a PR when done.
+
+- All commits are made on a branch that includes the tag and task id (see Branch Naming); nothing is committed directly to the default branch.
+
+- Every subtask iteration: failing tests added first (red), then code added to pass them (green), commit only after green.
+
+- End-to-end logs + artifacts stored in .taskmaster/reports/runs/<timestamp-or-id>/.
+
+## User Stories
+
+- As a developer, I can run tm autopilot <taskId> and watch a structured, safe workflow execute.
+
+- As a reviewer, I can inspect per-subtask commits and, once the task completes, a PR summarizing the work.
+
+- As an operator, I can see current step, active subtask, tests status, and logs in a compact CLI view and read a final run report.
+
+## High‑Level Workflow
+
+1) Pre‑flight
+
+   - Verify a clean working tree, or confirm the staging/commit policy (configurable).
+
+   - Detect repo type and the project’s test command (e.g., npm test, pnpm test, pytest, go test).
+
+   - Validate tools: git, gh (optional for PR), node/npm, and (if used) claude CLI.
+
+   - Load TaskMaster state and selected task; if no subtasks exist, automatically run “expand” before working.
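As a rough illustration of the test-command detection step, the TypeScript sketch below works only from the repository's top-level file names and the `scripts` field of package.json. The `RepoSnapshot` type, the `detectTestCommand` name, the lockfile precedence, and the Python/Go marker files are assumptions for illustration, not existing tm-core APIs.

```typescript
// Hypothetical view of what pre-flight can observe at the repository root.
interface RepoSnapshot {
  files: string[]; // top-level file names, e.g. ["package.json", "pnpm-lock.yaml"]
  packageScripts?: Record<string, string>; // the "scripts" field of package.json, if any
}

// Returns the test command as an argument array, or null when nothing was detected.
function detectTestCommand(repo: RepoSnapshot): string[] | null {
  const has = (name: string): boolean => repo.files.includes(name);
  const testScript = repo.packageScripts?.test;
  // `npm init` writes a placeholder "test" script that always fails; treat it as "no tests".
  const hasRealNodeTests = testScript !== undefined && !testScript.includes("no test specified");

  if (has("package.json") && hasRealNodeTests) {
    // Pick the package manager from whichever lockfile is present.
    if (has("pnpm-lock.yaml")) return ["pnpm", "test"];
    if (has("yarn.lock")) return ["yarn", "test"];
    return ["npm", "test"];
  }
  // Assumed marker files for Python and Go projects.
  if (has("pyproject.toml") || has("pytest.ini")) return ["pytest"];
  if (has("go.mod")) return ["go", "test", "./..."];
  return null;
}
```

Returning an argument array rather than a shell string keeps the result safe to hand directly to a process spawner without shell quoting.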
+
+2) Branch & Tag Setup
+
+   - Check out the default branch and (optionally) update it, then create a branch following Branch Naming (below).
+
+   - Map branch ↔ tag via existing tag management; explicitly set active tag to the branch’s tag.
+
+3) Subtask Loop (for each pending/in-progress subtask in dependency order)
+
+   - Select next eligible subtask using tm-core TaskService getNextTask() and subtask eligibility logic.
+
+   - Red: generate or update failing tests for the subtask
+
+     - Use the Surgical Test Generator system prompt (.claude/agents/surgical-test-generator.md) to produce high-signal tests following project conventions.
+
+     - Run tests to confirm red; record results. If not red (already passing), skip to next subtask or escalate.
+
+   - Green: implement code to pass tests
+
+     - Use executor to implement changes (initial: claude CLI prompt with focused context).
+
+     - Re-run tests until they pass (green) or the timeout/backoff policy triggers.
+
+   - Commit: when green
+
+     - Commit tests + code with conventional commit message. Optionally update subtask status to done.
+
+     - Persist run step metadata/logs.
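One iteration of the subtask loop can be written as a small function over injected adapters, which keeps the red, green, commit ordering and the attempt cap testable without git or a model. This is a sketch under assumptions: `SubtaskPorts`, `runSubtask`, the outcome names, and the commit message format are hypothetical stand-ins for the Test Runner, Prompt/Exec, and Git adapters described under Proposed Components.

```typescript
// Hypothetical ports injected by the orchestrator; real adapters would wrap the test runner, executor, and git.
interface SubtaskPorts {
  generateTests(subtaskId: string): Promise<void>; // red: write failing tests
  runTests(): Promise<{ passed: boolean; summary: string }>;
  implement(subtaskId: string, failureSummary: string): Promise<void>; // green: ask the executor for code
  commit(message: string): Promise<void>;
}

type SubtaskOutcome = "committed" | "already-green" | "paused";

async function runSubtask(
  subtaskId: string,
  title: string,
  ports: SubtaskPorts,
  maxAttempts = 3,
): Promise<SubtaskOutcome> {
  // Red: the freshly generated tests must fail; if they already pass, skip or escalate.
  await ports.generateTests(subtaskId);
  let result = await ports.runTests();
  if (result.passed) return "already-green";

  // Green: give the executor the failure output and re-run until green or out of attempts.
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await ports.implement(subtaskId, result.summary);
    result = await ports.runTests();
    if (result.passed) {
      // Commit only after green (this message format is an assumption).
      await ports.commit(`feat(${subtaskId}): ${title}`);
      return "committed";
    }
  }
  // Not green within the budget: the caller persists state so the run can be resumed.
  return "paused";
}
```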
+
+4) Finalization
+
+   - Run full test suite and coverage (if configured); optionally lint/format.
+
+   - Commit any final adjustments.
+
+   - Push branch (ask user to confirm); create PR (via gh pr create) targeting the default branch. Title format: Task #<id> [<tag>]: <title>.
+
+5) Post‑Run
+
+   - Update task status if desired (e.g., review).
+
+   - Persist run report (JSON + markdown summary) to .taskmaster/reports/runs/<run-id>/.
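The JSONL half of the run report can be sketched as follows, assuming one JSON object per line and a hypothetical `RunStep` record shape. Because each step is a complete line, a crash mid-write can damage at most the final line, which the reader can drop when resuming; `toJsonlLine`, `parseJsonl`, and `lastCommittedSubtask` are illustrative names, and file I/O is left out.

```typescript
// Hypothetical record for one orchestrator step; the field names are illustrative.
interface RunStep {
  ts: string; // ISO-8601 timestamp
  phase: "preflight" | "branch" | "red" | "green" | "commit" | "finalize" | "pr";
  subtaskId?: string;
  ok: boolean;
  detail?: string;
}

// One step per line; JSON.stringify without indentation never emits raw newlines.
function toJsonlLine(step: RunStep): string {
  return JSON.stringify(step) + "\n";
}

// Parse a log, tolerating a truncated final line but not damage earlier in the file.
function parseJsonl(text: string): RunStep[] {
  const steps: RunStep[] = [];
  const lines = text.split("\n").filter((l) => l.trim() !== "");
  lines.forEach((line, i) => {
    try {
      steps.push(JSON.parse(line) as RunStep);
    } catch (err) {
      if (i !== lines.length - 1) throw err; // corruption before the last line is a real error
    }
  });
  return steps;
}

// The most recent successful commit marks where a resumed run should continue.
function lastCommittedSubtask(steps: RunStep[]): string | undefined {
  return [...steps].reverse().find((s) => s.phase === "commit" && s.ok)?.subtaskId;
}
```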
+
+## Guardrails
+
+- Never commit to the default branch.
+
+- Commit only if all tests (targeted and suite) pass; allow override flags.
+
+- Enforce 80% coverage thresholds (lines/branches/functions/statements) by default; configurable.
+
+- Timebox model operations and cap retries; if not green within N attempts, pause with an actionable state for resume.
+
+- Always log actions, commands, and outcomes; provide a dry-run mode.
+
+- Ask before branch creation, pushing, and opening a PR unless --no-confirm is set.
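Taken together, these guardrails amount to a gate evaluated before every commit. The hypothetical `checkCommitGate` below assumes coverage arrives as percentages per metric and that `force` stands in for the override flags; the default-branch rule is checked first and has no override.

```typescript
type CoverageMetric = "lines" | "branches" | "functions" | "statements";
type Coverage = Record<CoverageMetric, number>; // percentages, 0-100

interface GateInput {
  branch: string;
  defaultBranch: string;
  testsPassed: boolean;
  coverage: Coverage;
  thresholds?: Coverage; // defaults to 80% for every metric
  force?: boolean; // hypothetical override for the test/coverage checks only
}

const DEFAULT_THRESHOLDS: Coverage = { lines: 80, branches: 80, functions: 80, statements: 80 };

function checkCommitGate(input: GateInput): { allowed: boolean; reasons: string[] } {
  // Never commit to the default branch, even with an override.
  if (input.branch === input.defaultBranch) {
    return { allowed: false, reasons: [`refusing to commit on default branch "${input.branch}"`] };
  }
  const reasons: string[] = [];
  if (!input.testsPassed) reasons.push("tests are failing");
  const thresholds = input.thresholds ?? DEFAULT_THRESHOLDS;
  for (const metric of Object.keys(thresholds) as CoverageMetric[]) {
    if (input.coverage[metric] < thresholds[metric]) {
      reasons.push(`${metric} coverage ${input.coverage[metric]}% < ${thresholds[metric]}%`);
    }
  }
  // An override allows the commit but still reports what was bypassed.
  return { allowed: reasons.length === 0 || input.force === true, reasons };
}
```

Returning the reasons even when an override allows the commit lets the run log record exactly which checks were skipped.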
+
+## Integration Points (Current Repo)
+
+- CLI: apps/cli provides command structure and UI components.
+
+  - New command: tm autopilot (alias: task-master autopilot).
+
+  - Reuse UI components under apps/cli/src/ui/components/ for headers/task details/next-task.
+
+- Core services: packages/tm-core
+
+  - TaskService for selection, status, tags.
+
+  - TaskExecutionService for prompt formatting and executor prep.
+
+  - Executors: claude executor and ExecutorFactory to run external tools.
+
+  - Proposed new: WorkflowOrchestrator to drive the autonomous loop and emit progress events.
+
+- Tag/Git utilities: scripts/modules/utils/git-utils.js and scripts/modules/task-manager/tag-management.js for branch→tag mapping and explicit tag switching.
+
+- Rules: .cursor/rules/git_workflow.mdc and .cursor/rules/test_workflow.mdc to steer behavior and ensure consistency.
+
+- Test generation prompt: .claude/agents/surgical-test-generator.md.
+
+## Proposed Components
+
+- Orchestrator (tm-core): WorkflowOrchestrator (new)
+
+  - State machine driving phases: Preflight → Branch/Tag → SubtaskIter (Red/Green/Commit) → Finalize → PR.
+
+  - Exposes an evented API (progress events) that the CLI can render.
+
+  - Stores run state artifacts.
+
+- Test Runner Adapter
+
+  - Detects and runs tests via the project’s test command (e.g., npm test), with targeted runs where feasible.
+
+  - API: runTargeted(files/pattern), runAll(), report summary (failures, duration, coverage), enforce 80% threshold by default.
+
+- Git/PR Adapter
+
+  - Encapsulates git ops: branch create/checkout, add/commit, push.
+
+  - Optional gh integration to open PR; fallback to instructions if gh unavailable.
+
+  - Confirmation gates for branch creation and pushes.
+
+- Prompt/Exec Adapter
+
+  - Uses existing executor service to call the selected coding assistant (initially claude) with tight prompts: task/subtask context, surgical tests first, then minimal code to green.
+
+- Run State + Reporting
+
+  - JSONL log of steps, timestamps, commands, test results.
+
+  - Markdown summary for PR description and post-run artifact.
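The orchestrator's progress events need only a small typed publish/subscribe surface for the CLI (and later a TUI or IDE panel) to render. The task list plans to use Node's EventEmitter; the dependency-free `TypedEmitter` below is an alternative sketch, and the `WorkflowEvents` names and payloads are assumptions.

```typescript
// Hypothetical event map: each key is an event name, each value its payload.
interface WorkflowEvents {
  "phase:enter": { phase: string };
  "subtask:start": { subtaskId: string; title: string };
  "tests:result": { subtaskId: string; passed: boolean; failures: number };
  "run:complete": { runId: string; prUrl?: string };
}

class TypedEmitter<Events> {
  // Untyped storage internally; the public on/emit signatures keep callers type-safe.
  private listeners = new Map<keyof Events, Array<(payload: unknown) => void>>();

  // Subscribe; returns an unsubscribe function so UI components can clean up.
  on<K extends keyof Events>(event: K, fn: (payload: Events[K]) => void): () => void {
    const wrapped = fn as (payload: unknown) => void;
    const list = this.listeners.get(event) ?? [];
    list.push(wrapped);
    this.listeners.set(event, list);
    return () => {
      const current = this.listeners.get(event) ?? [];
      this.listeners.set(event, current.filter((l) => l !== wrapped));
    };
  }

  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    // Iterate over a copy so listeners that unsubscribe mid-dispatch don't disturb it.
    for (const fn of [...(this.listeners.get(event) ?? [])]) fn(payload);
  }
}
```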
+
+## CLI UX (MVP)
+
+- Command: tm autopilot [taskId]
+
+  - Flags: --dry-run, --no-push, --no-pr, --no-confirm, --force, --max-attempts <n>, --runner <auto|custom>, --commit-scope <scope>
+
+  - Output: compact header (project, tag, branch), current phase, subtask line, last test summary, next actions.
+
+- Resume: if interrupted, tm autopilot --resume picks up from the last checkpoint in the run state.
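The MVP flags map onto a plain options object that the command can hand to the orchestrator. The parser below is a deliberately tiny, dependency-free sketch of that mapping; the real command is planned on Commander.js, whose `--no-<name>` options likewise default `<name>` to true and set it to false. `AutopilotOptions`, its field names, and the default of 3 attempts are assumptions.

```typescript
interface AutopilotOptions {
  taskId?: string;
  dryRun: boolean;
  push: boolean; // --no-push
  pr: boolean; // --no-pr
  confirm: boolean; // --no-confirm
  force: boolean;
  resume: boolean;
  maxAttempts: number;
  runner: "auto" | "custom";
  commitScope?: string;
}

function parseAutopilotArgs(argv: string[]): AutopilotOptions {
  const opts: AutopilotOptions = {
    dryRun: false, push: true, pr: true, confirm: true,
    force: false, resume: false, maxAttempts: 3, runner: "auto",
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--dry-run": opts.dryRun = true; break;
      case "--no-push": opts.push = false; break;
      case "--no-pr": opts.pr = false; break;
      case "--no-confirm": opts.confirm = false; break;
      case "--force": opts.force = true; break;
      case "--resume": opts.resume = true; break;
      case "--max-attempts": {
        const n = Number(argv[++i]); // a missing value becomes NaN and is rejected below
        if (!Number.isInteger(n) || n < 1) throw new Error("--max-attempts expects a positive integer");
        opts.maxAttempts = n;
        break;
      }
      case "--runner": {
        const value = argv[++i];
        if (value !== "auto" && value !== "custom") throw new Error("--runner expects auto or custom");
        opts.runner = value;
        break;
      }
      case "--commit-scope": opts.commitScope = argv[++i]; break;
      default:
        if (arg.startsWith("--")) throw new Error(`unknown flag ${arg}`);
        opts.taskId = arg; // the optional positional [taskId]
    }
  }
  return opts;
}
```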
+
+### TUI with tmux (Linear Execution)
+
+- Left pane: Tag selector, task list (status/priority), start/expand shortcuts; “Start” triggers the next task or a selected task.
+
+- Right pane: Executor terminal (tmux split) that runs the coding agent (claude-code/codex). Autopilot can hand over to the right pane during green.
+
+- MCP integration: use MCP tools for task queries/updates and for shell/test invocations where available.
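The tmux side needs little more than argument lists for two real tmux commands: `split-window` to open the executor pane and `send-keys` to type into it. The helper names below are hypothetical, and the sketch assumes autopilot is already running inside a tmux session (tmux sets the `TMUX` environment variable there) and that the executor CLI, for example `claude`, is on PATH.

```typescript
// tmux exports TMUX inside a session; outside one, there is no pane to split.
function canSplitPane(env: Record<string, string | undefined>): boolean {
  return Boolean(env.TMUX);
}

// `tmux split-window`: -h splits side by side, -P -F prints the new pane id,
// -c sets the pane's working directory, and the last argument is the command it runs.
function splitExecutorPaneArgs(cwd: string, command: string): string[] {
  return ["split-window", "-h", "-P", "-F", "#{pane_id}", "-c", cwd, command];
}

// `tmux send-keys`: type text into an existing pane and press Enter.
function sendToPaneArgs(paneId: string, text: string): string[] {
  return ["send-keys", "-t", paneId, text, "Enter"];
}
```

The arrays can be passed to Node's `execFile("tmux", args)`; with `-P -F "#{pane_id}"`, stdout carries the new pane's id (such as `%3`) for later `send-keys` calls. When `canSplitPane` is false, the fallback is to run the executor in the current terminal.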
+
+## Prompts (Initial Direction)
+
+- Red phase prompt skeleton (tests):
+
+  - Use .claude/agents/surgical-test-generator.md as the system prompt to generate high-signal failing tests tailored to the project’s language and conventions. Keep scope minimal and deterministic; no code changes yet.
+
+- Green phase prompt skeleton (code):
+
+  - “Make the tests pass by changing the smallest amount of code, following project patterns. Only modify necessary files. Keep commits focused to this subtask.”
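A green-phase prompt builder could combine the skeleton above with the concrete failures from the last test run. In this sketch, the `GreenPromptInput` fields and the 4,000-character truncation limit are assumptions; only the instruction sentences come from the skeleton.

```typescript
interface GreenPromptInput {
  taskId: string;
  subtaskId: string;
  subtaskTitle: string;
  subtaskDetails: string;
  testFiles: string[]; // tests written in the red phase
  failureOutput: string; // raw test-runner output from the last run
}

// Bound the failure excerpt so long stack traces don't crowd out the instructions.
const MAX_FAILURE_CHARS = 4000; // hypothetical limit

function buildGreenPrompt(input: GreenPromptInput): string {
  const failures =
    input.failureOutput.length > MAX_FAILURE_CHARS
      ? input.failureOutput.slice(0, MAX_FAILURE_CHARS) + "\n[... output truncated ...]"
      : input.failureOutput;
  return [
    `Task ${input.taskId}, subtask ${input.subtaskId}: ${input.subtaskTitle}`,
    "",
    input.subtaskDetails,
    "",
    `Failing tests (${input.testFiles.join(", ")}):`,
    failures,
    "",
    "Make the tests pass by changing the smallest amount of code, following project patterns.",
    "Only modify necessary files. Keep commits focused to this subtask.",
  ].join("\n");
}
```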
+
+## Configuration
+
+- .taskmaster/config.json additions
+
+  - autopilot: { enabled: true, requireCleanWorkingTree: true, commitTemplate: "{type}({scope}): {msg}", defaultCommitType: "feat" }
+
+  - test: { runner: "auto", coverageThresholds: { lines: 80, branches: 80, functions: 80, statements: 80 } }
+
+  - git: { branchPattern: "{tag}/task-{id}-{slug}", pr: { enabled: true, base: "default" } }
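Written as TypeScript types with the defaults listed above, the configuration additions might look like this. The field names come from the bullets; the `AutopilotConfig` name and the `renderCommitMessage` helper for `commitTemplate` are assumptions for illustration.

```typescript
// Shapes of the proposed config.json sections; field names follow the Configuration bullets.
interface AutopilotConfig {
  autopilot: {
    enabled: boolean;
    requireCleanWorkingTree: boolean;
    commitTemplate: string; // tokens: {type}, {scope}, {msg}
    defaultCommitType: string;
  };
  test: {
    runner: "auto" | "custom";
    coverageThresholds: { lines: number; branches: number; functions: number; statements: number };
  };
  git: {
    branchPattern: string; // tokens: {tag}, {id}, {slug}
    pr: { enabled: boolean; base: string }; // "default" = the repository's default branch
  };
}

const DEFAULT_CONFIG: AutopilotConfig = {
  autopilot: { enabled: true, requireCleanWorkingTree: true, commitTemplate: "{type}({scope}): {msg}", defaultCommitType: "feat" },
  test: { runner: "auto", coverageThresholds: { lines: 80, branches: 80, functions: 80, statements: 80 } },
  git: { branchPattern: "{tag}/task-{id}-{slug}", pr: { enabled: true, base: "default" } },
};

// Fill {token} placeholders from `values`; tokens without a value are left as-is.
function renderCommitMessage(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (token: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : token,
  );
}
```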
+
+## Risks and Mitigations
+
+- Model hallucination/large diffs: restrict prompt scope; enforce minimal changes; show diff previews (optional) before commit.
+
+- Flaky tests: allow retries, isolate targeted runs for speed, then full suite before commit.
+
+- Environment variability: detect runners/tools; provide fallbacks and actionable errors.
+
+- PR creation fails: still push and print manual commands; persist PR body to reuse.
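Retries for flaky tests and slow executors can share one capped exponential backoff policy. A sketch, assuming a base delay, a cap, and optional "full jitter" (a uniform random delay between 0 and the capped value); the names and the injectable random source, which makes the schedule testable, are illustrative.

```typescript
interface BackoffPolicy {
  baseMs: number; // delay before the first retry
  maxMs: number; // upper bound for any single delay
  jitter: boolean; // full jitter: pick uniformly in [0, delay)
}

// Delay before retry `attempt` (1-based): baseMs * 2^(attempt - 1), capped at maxMs.
function backoffDelay(attempt: number, policy: BackoffPolicy, random: () => number = Math.random): number {
  const capped = Math.min(policy.maxMs, policy.baseMs * 2 ** (attempt - 1));
  return policy.jitter ? Math.floor(random() * capped) : capped;
}

// Run `op` up to maxAttempts times, sleeping between failures; rethrows the last error.
async function withRetry<T>(op: () => Promise<T>, maxAttempts: number, policy: BackoffPolicy): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, policy)));
      }
    }
  }
  throw lastError;
}
```

With `baseMs: 100` and `maxMs: 1000` and no jitter, the first five delays are 100, 200, 400, 800, and 1000 ms.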
+
+## Open Questions
+
+1) Slugging rules for branch names: any length limits or normalization beyond sanitizing the {slug} token?
+
+2) PR body standard sections beyond run report (e.g., checklist, coverage table)?
+
+3) Default executor prompt fine-tuning once codex/gemini integration is available.
+
+4) Should persistent TUI state (pane layout, last selection) be stored in .taskmaster/state.json?
+
+## Branch Naming
+
+- Include both the tag and the task id in the branch name to make lineage explicit.
+
+- Default pattern: <tag>/task-<id>[-slug] (e.g., master/task-12, tag-analytics/task-4-user-auth).
+
+- Configurable via .taskmaster/config.json: git.branchPattern supports tokens {tag}, {id}, {slug}.
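A possible rendering of git.branchPattern, assuming the slug is derived from the task title with ASCII-only sanitization and a hypothetical 40-character cap (Open Question 1 leaves the real limits undecided). `slugify` and `renderBranchName` are illustrative names.

```typescript
// Lowercase, collapse runs of characters outside [a-z0-9] into "-", trim dashes, cap length.
function slugify(title: string, maxLength = 40): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, maxLength)
    .replace(/-+$/, ""); // slicing can expose a trailing dash
}

// Fill {tag}, {id}, {slug}; an empty slug also removes the "-" in front of it.
function renderBranchName(pattern: string, tag: string, id: string | number, title = ""): string {
  const slug = slugify(title);
  return pattern
    .replace("{tag}", () => tag) // function replacers keep "$" in names literal
    .replace("{id}", () => String(id))
    .replace(/(-?)\{slug\}/, (_match: string, dash: string) => (slug ? dash + slug : ""));
}
```

Dropping the separator together with an empty `{slug}` lets the single default pattern produce both `master/task-12` and `tag-analytics/task-4-user-auth` from the examples above.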
+
+## PR Base Branch
+
+- Use the repository’s default branch (detected via git) unless overridden.
+
+- Title format: Task #<id> [<tag>]: <title>.
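Default-branch detection and the title format can both be kept as small pure functions. This sketch assumes the default branch is read from `git symbolic-ref refs/remotes/origin/HEAD`, which prints a ref such as `refs/remotes/origin/main` but fails when origin/HEAD was never set (`git remote set-head origin --auto` repairs that); the function names are hypothetical.

```typescript
// Parse `git symbolic-ref refs/remotes/<remote>/HEAD` output into a branch name.
function parseDefaultBranch(symbolicRefOutput: string, remote = "origin"): string | null {
  const prefix = `refs/remotes/${remote}/`;
  const ref = symbolicRefOutput.trim();
  return ref.startsWith(prefix) ? ref.slice(prefix.length) : null;
}

// Title format from the spec: Task #<id> [<tag>]: <title>
function formatPrTitle(id: string | number, tag: string, title: string): string {
  return `Task #${id} [${tag}]: ${title.trim()}`;
}
```

The results would then feed `gh pr create --base <branch> --title <title> --body-file <report.md>`.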
+
+## RPG Mapping (Repository Planning Graph)
+
+Functional nodes (capabilities):
+
+- Autopilot Orchestration → drives TDD loop and lifecycle
+
+- Test Generation (Surgical) → produces failing tests from subtask context
+
+- Test Execution + Coverage → runs suite, enforces thresholds
+
+- Git/Branch/PR Management → safe operations and PR creation
+
+- TUI/Terminal Integration → interactive control and visibility via tmux
+
+- MCP Integration → structured task/status/context operations
+
+Structural nodes (code organization):
+
+- packages/tm-core:
+
+  - services/workflow-orchestrator.ts (new)
+
+  - services/test-runner-adapter.ts (new)
+
+  - services/git-adapter.ts (new)
+
+  - existing: task-service.ts, task-execution-service.ts, executors/*
+
+- apps/cli:
+
+  - src/commands/autopilot.command.ts (new)
+
+  - src/ui/tui/ (new tmux/TUI helpers)
+
+- scripts/modules:
+
+  - reuse utils/git-utils.js, task-manager/tag-management.js
+
+- .claude/agents/:
+
+  - surgical-test-generator.md
+
+Edges (data/control flow):
+
+- Autopilot → Test Generation → Test Execution → Git Commit → loop
+
+- Autopilot → Git Adapter (branch, tag, PR)
+
+- Autopilot → TUI (event stream) → tmux pane control
+
+- Autopilot → MCP tools for task/status updates
+
+- Test Execution → Coverage gate → Autopilot decision
+
+Topological traversal (implementation order):
+
+1) Git/Test adapters (foundations)
+
+2) Orchestrator skeleton + events
+
+3) CLI autopilot command and dry-run
+
+4) Surgical test-gen integration and execution gate
+
+5) PR creation, run reports, resumability
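The same topological idea applies to the subtask loop's "dependency order". As an illustration, not a replacement for TaskService's getNextTask(), the sketch below applies Kahn's algorithm to the `dependencies` arrays used in tasks.json, breaking ties by lowest id; the `TaskNode` shape and `topoOrder` name are assumptions.

```typescript
interface TaskNode {
  id: number;
  dependencies: number[];
}

// Kahn's algorithm: repeatedly emit a task whose dependencies have all been emitted.
function topoOrder(tasks: TaskNode[]): number[] {
  const ids = new Set(tasks.map((t) => t.id));
  const unmet = new Map<number, number>(); // task id -> count of unmet dependencies
  const dependents = new Map<number, number[]>(); // task id -> tasks that depend on it
  for (const t of tasks) {
    unmet.set(t.id, t.dependencies.length);
    for (const dep of t.dependencies) {
      if (!ids.has(dep)) throw new Error(`task ${t.id} depends on unknown task ${dep}`);
      dependents.set(dep, [...(dependents.get(dep) ?? []), t.id]);
    }
  }
  const ready = tasks.filter((t) => t.dependencies.length === 0).map((t) => t.id);
  const order: number[] = [];
  while (ready.length > 0) {
    ready.sort((a, b) => a - b); // deterministic: lowest ready id first
    const id = ready.shift() as number;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = (unmet.get(next) ?? 0) - 1;
      unmet.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  // Any task never emitted is part of (or blocked by) a cycle.
  if (order.length !== tasks.length) throw new Error("dependency cycle detected");
  return order;
}
```

For tasks 11 to 16 in this commit's tasks.json, that yields 11, 12, 13, 14, 15, 16, since 14 needs 11, 15 needs 11 and 12, and 16 needs 11 and 15.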
+
+## Phased Roadmap
+
+- Phase 0: Spike
+
+  - Implement a CLI skeleton for tm autopilot with a dry-run that shows planned steps from a real task and its subtasks.
+
+  - Detect test runner (package.json) and git state; render a preflight report.
+
+- Phase 1: Core Rails
+
+  - Implement WorkflowOrchestrator in tm-core with event stream; add Git/Test adapters.
+
+  - Support subtask loop (red/green/commit) with framework-agnostic test generation and detected test command; commit gating on passing tests and coverage.
+
+  - Branch/tag mapping via existing tag-management APIs.
+
+  - Run report persisted under .taskmaster/reports/runs/.
+
+- Phase 2: PR + Resumability
+
+  - Add gh PR creation with well-formed body using the run report.
+
+  - Introduce resumable checkpoints and --resume flag.
+
+  - Add coverage enforcement and optional lint/format step.
+
+- Phase 3: Extensibility + Guardrails
+
+  - Add support for basic pytest/go test adapters.
+
+  - Add safeguards: diff preview mode, manual confirm gates, aggressive minimal-change prompts.
+
+  - Optional: small TUI panel and extension panel leveraging the same run state file.
+
+## References (Repo)
+
+- Test Workflow: .cursor/rules/test_workflow.mdc
+
+- Git Workflow: .cursor/rules/git_workflow.mdc
+
+- CLI: apps/cli/src/commands/start.command.ts, apps/cli/src/ui/components/*.ts
+
+- Core Services: packages/tm-core/src/services/task-service.ts, task-execution-service.ts
+
+- Executors: packages/tm-core/src/executors/*
+
+- Git Utilities: scripts/modules/utils/git-utils.js
+
+- Tag Management: scripts/modules/task-manager/tag-management.js
+
+- Surgical Test Generator: .claude/agents/surgical-test-generator.md
+
--- a/.taskmaster/reports/task-complexity-report_autonomous-tdd-git-workflow.json
+++ b/.taskmaster/reports/task-complexity-report_autonomous-tdd-git-workflow.json
@@ -0,0 +1,173 @@
+{
+	"meta": {
+		"generatedAt": "2025-10-03T09:04:22.505Z",
+		"tasksAnalyzed": 20,
+		"totalTasks": 20,
+		"analysisCount": 20,
+		"thresholdScore": 5,
+		"projectName": "Taskmaster",
+		"usedResearch": false
+	},
+	"complexityAnalysis": [
+		{
+			"taskId": 11,
+			"taskTitle": "Create WorkflowOrchestrator Core Service",
+			"complexityScore": 8,
+			"recommendedSubtasks": 6,
+			"expansionPrompt": "Break down the WorkflowOrchestrator implementation into: 1) Core state machine with phase transitions and event emission, 2) Workflow state persistence and checkpoint system, 3) Resume/pause functionality with state restoration, 4) Integration points for adapters (test runner, git, executors), 5) Progress event system with EventEmitter, 6) Error handling and recovery mechanisms. Each subtask should focus on a specific aspect of the orchestrator.",
+			"reasoning": "High complexity due to state machine implementation, event-driven architecture, checkpoint persistence, and multiple integration points. Requires EventEmitter setup (not currently in codebase), state persistence to JSON files, and complex phase transition logic."
+		},
+		{
+			"taskId": 12,
+			"taskTitle": "Implement Test Runner Adapter Service",
+			"complexityScore": 7,
+			"recommendedSubtasks": 5,
+			"expansionPrompt": "Divide test runner adapter into: 1) Test runner detection from package.json scripts, 2) Command execution wrapper with output capture, 3) Test output parser for various formats (Jest, Vitest, etc.), 4) Coverage metrics extraction and reporting, 5) Threshold enforcement and validation logic. Focus on framework-agnostic design with extensible parsers.",
+			"reasoning": "Requires parsing different test output formats, detecting test runners from package.json, implementing coverage threshold logic, and creating a framework-agnostic interface. Vitest is used in tm-core, need to support multiple runners."
+		},
+		{
+			"taskId": 13,
+			"taskTitle": "Build Git Operations Adapter",
+			"complexityScore": 6,
+			"recommendedSubtasks": 4,
+			"expansionPrompt": "Split git adapter into: 1) Core git command wrapper using child_process, 2) Branch naming pattern system with template support, 3) Confirmation gates and default branch protection, 4) Push and commit operations with safety checks. Ensure proper error handling for git command failures.",
+			"reasoning": "Moderate complexity for git operations wrapper. No git library currently in use, will need child_process implementation. Includes branch naming patterns, confirmation prompts, and safety checks for default branch protection."
+		},
+		{
+			"taskId": 14,
+			"taskTitle": "Create Autopilot CLI Command",
+			"complexityScore": 5,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Implement autopilot command in three parts: 1) Command setup with Commander.js and flag parsing following existing patterns in apps/cli/src/commands/, 2) WorkflowOrchestrator initialization and event subscription, 3) Progress UI rendering using existing dashboard components and graceful shutdown handling. Follow patterns from list.command.ts and start.command.ts.",
+			"reasoning": "Straightforward CLI command implementation following existing patterns. Commander.js is already used, UI components exist in apps/cli/src/ui/. Main complexity is in orchestrator integration and event handling."
+		},
+		{
+			"taskId": 15,
+			"taskTitle": "Integrate Surgical Test Generator",
+			"complexityScore": 6,
+			"recommendedSubtasks": 4,
+			"expansionPrompt": "Break down test generator integration: 1) Agent prompt loader from .claude/agents/surgical-test-generator.md, 2) Context formatter for subtask details and existing code, 3) Executor service integration using existing ExecutorFactory, 4) Test code parser and file writer with project convention detection. Leverage existing executor infrastructure.",
+			"reasoning": "Requires loading agent prompts, formatting context, integrating with existing ExecutorFactory and executor-service.ts, and parsing/writing generated test code. Builds on existing executor infrastructure."
+		},
+		{
+			"taskId": 16,
+			"taskTitle": "Implement Code Generation Executor",
+			"complexityScore": 5,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Implement code generation in phases: 1) Extend task-execution-service.ts with autopilot-specific prompt generation for making tests pass, 2) Integration with ExecutorFactory for multiple executor support (claude/codex/gemini), 3) Code change parser and conflict resolution handler. Build on existing executor patterns.",
+			"reasoning": "Extends existing task-execution-service.ts and uses ExecutorFactory. Main work is prompt generation for test-driven implementation and handling code application with conflict resolution."
+		},
+		{
+			"taskId": 17,
+			"taskTitle": "Add Branch and Tag Management Integration",
+			"complexityScore": 4,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Integrate tag management: 1) Branch-to-tag mapping registration in tag-management.js, 2) Active tag switching when creating branches, 3) Tag-filtered task loading and branch naming with tag prefixes. Use existing tag management infrastructure.",
+			"reasoning": "Relatively simple integration with existing tag-management.js. Mainly involves calling existing functions for tag registration, switching, and filtering. Infrastructure already exists."
+		},
+		{
+			"taskId": 18,
+			"taskTitle": "Build Run State Persistence System",
+			"complexityScore": 6,
+			"recommendedSubtasks": 4,
+			"expansionPrompt": "Implement state persistence: 1) Checkpoint serialization to JSON after each phase, 2) JSONL logger for operation history, 3) State restoration logic for workflow resume, 4) Graceful handling of corrupted or partial state files. Use FileStorage from tm-core for consistency.",
+			"reasoning": "Requires implementing checkpoint system, JSONL logging, state restoration, and error recovery. Builds on existing FileStorage patterns in packages/tm-core/src/storage/."
+		},
+		{
+			"taskId": 19,
+			"taskTitle": "Implement Preflight Validation Service",
+			"complexityScore": 5,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Create preflight validation: 1) Environment checks for git state, test runner, and CLI tools availability, 2) Task validation with auto-expansion trigger when no subtasks exist, 3) Structured validation report with errors/warnings and --force override support. Integrate with existing services.",
+			"reasoning": "Moderate complexity for various validation checks. Integrates with existing services for task expansion and test runner detection. Main work is aggregating checks and reporting."
+		},
+		{
+			"taskId": 20,
+			"taskTitle": "Create PR Generation Service",
+			"complexityScore": 4,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Implement PR creation: 1) Extend git-adapter.ts with gh CLI wrapper for PR operations, 2) PR body formatter using run reports and task completion data, 3) Fallback instructions when gh is unavailable and PR URL persistence. Build on git adapter foundation.",
+			"reasoning": "Straightforward gh CLI integration extending git-adapter. Main work is formatting PR body from run reports. Relatively simple with clear requirements."
+		},
+		{
+			"taskId": 21,
+			"taskTitle": "Add Subtask Selection Logic",
+			"complexityScore": 5,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Implement subtask selection: 1) Integration with existing find-next-task.js for dependency-aware selection, 2) Status filtering and update logic for in-progress/done transitions, 3) Blocked subtask handling and skip logic. Leverage existing task service methods.",
+			"reasoning": "Builds on existing find-next-task.js logic. Main complexity is in dependency resolution and status management. Most infrastructure exists in task-service.ts."
+		},
+		{
+			"taskId": 22,
+			"taskTitle": "Implement Test-Driven Commit Gating",
+			"complexityScore": 5,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Create commit gating: 1) Test result and coverage validation against thresholds, 2) Retry logic with exponential backoff for flaky tests, 3) Commit creation only when tests pass with --force-commit override. Integrate with test runner adapter.",
+			"reasoning": "Moderate complexity for test validation, retry logic, and threshold enforcement. Builds on test runner adapter output. Main work is retry mechanism and gating logic."
+		},
+		{
+			"taskId": 23,
+			"taskTitle": "Build Progress Event System",
+			"complexityScore": 4,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Implement event system: 1) EventEmitter setup with typed events for workflow phases, 2) Event aggregator for statistics collection, 3) Event filtering and buffering mechanisms. Create clean event API for UI consumption.",
+			"reasoning": "EventEmitter not currently in codebase but straightforward to add. Main work is defining event types, implementing aggregation, and creating clean API for consumers."
+		},
+		{
+			"taskId": 24,
+			"taskTitle": "Create Autopilot Configuration Schema",
+			"complexityScore": 3,
+			"recommendedSubtasks": 2,
+			"expansionPrompt": "Add autopilot config: 1) Extend existing config.json schema with autopilot section using Zod validation, 2) Config migration logic and environment variable overrides. Follow existing config patterns in config-manager.ts.",
+			"reasoning": "Simple schema extension to existing config.json. Config infrastructure exists in packages/tm-core/src/config/. Main work is schema definition and migration."
+		},
+		{
+			"taskId": 25,
+			"taskTitle": "Implement Dry Run Mode",
+			"complexityScore": 3,
+			"recommendedSubtasks": 2,
+			"expansionPrompt": "Add dry-run support: 1) Flag propagation through all adapter methods with simulation output, 2) Clear formatting to distinguish simulated vs actual operations. Ensure validation phases still execute normally.",
+			"reasoning": "Simple flag propagation and output formatting. Most complexity handled by individual adapters. Main work is consistent implementation across all operations."
+		},
+		{
+			"taskId": 26,
+			"taskTitle": "Add tmux Integration Support",
+			"complexityScore": 4,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Implement tmux support: 1) Tmux availability detection and pane management commands, 2) Split-window layout with command execution in executor pane, 3) Graceful fallback when tmux unavailable. Handle cleanup and debugging scenarios.",
+			"reasoning": "Moderate complexity for tmux integration. Requires command wrapping, pane management, and fallback handling. Optional enhancement with clear boundaries."
+		},
+		{
+			"taskId": 27,
+			"taskTitle": "Build Run Report Generator",
+			"complexityScore": 5,
+			"recommendedSubtasks": 5,
+			"expansionPrompt": "Already has 5 subtasks defined. Focus on implementing each component: report generator service core, markdown formatter, JSONL logger, metrics collector, and archival system. Each subtask is well-scoped.",
+			"reasoning": "Already expanded with 5 subtasks. Moderate complexity for report generation, formatting, and archival. Clear separation of concerns across subtasks."
+		},
+		{
+			"taskId": 28,
+			"taskTitle": "Add MCP Tools Integration",
+			"complexityScore": 3,
+			"recommendedSubtasks": 2,
+			"expansionPrompt": "Integrate MCP tools: 1) MCP tool availability detection and wrapper functions, 2) Fallback to direct service calls when MCP unavailable. Use existing MCP infrastructure in mcp-server/src/.",
+			"reasoning": "Simple integration with existing MCP infrastructure. Main work is detection and fallback logic. MCP server already implemented with all needed tools."
+		},
+		{
+			"taskId": 29,
+			"taskTitle": "Implement Retry and Backoff Logic",
+			"complexityScore": 4,
+			"recommendedSubtasks": 3,
+			"expansionPrompt": "Add retry mechanisms: 1) Exponential backoff calculator with configurable limits, 2) Retry wrapper for test execution, executor calls, and git operations, 3) Circuit breaker pattern for repeated failures. Track attempts in run state.",
+			"reasoning": "Moderate complexity for retry patterns, backoff calculation, and circuit breaker. Generic retry wrapper can be reused across different operations."
+		},
+		{
+			"taskId": 30,
+			"taskTitle": "Create End-to-End Integration Tests",
+			"complexityScore": 7,
+			"recommendedSubtasks": 5,
+			"expansionPrompt": "Build comprehensive test suite: 1) Test fixtures with mock git repo and task data setup, 2) Happy path scenario with all tests passing, 3) Retry and failure scenarios with flaky tests, 4) Resume from interruption testing, 5) Flag combination testing and artifact verification. Use Vitest for consistency.",
+			"reasoning": "High complexity for comprehensive integration testing. Requires extensive mocking, multiple scenarios, and artifact verification. Critical for validating entire workflow."
+		}
+	]
+}
--- a/.taskmaster/state.json
+++ b/.taskmaster/state.json
@@ -1,6 +1,6 @@
 {
-	"currentTag": "master",
-	"lastSwitched": "2025-09-12T22:25:27.535Z",
+	"currentTag": "autonomous-tdd-git-workflow",
+	"lastSwitched": "2025-09-30T13:32:48.187Z",
 	"branchTagMapping": {
 		"v017-adds": "v017-adds",
 		"next": "next"
--- a/.taskmaster/tasks/tasks.json
+++ b/.taskmaster/tasks/tasks.json
@@ -7901,5 +7901,344 @@
      "updated": "2025-09-12T04:02:07.346Z",
      "description": "Tasks for tm-start context"
    }
+  },
+  "autonomous-tdd-git-workflow": {
+    "tasks": [
+      {
+        "id": 11,
+        "title": "Create WorkflowOrchestrator Core Service",
+        "description": "Implement the core orchestration service that drives the autonomous TDD workflow with state machine phases",
+        "details": "Create packages/tm-core/src/services/workflow-orchestrator.ts implementing a state machine with phases: Preflight → Branch/Tag → SubtaskIter (Red/Green/Commit) → Finalize → PR. Use EventEmitter for progress events. Include methods: startWorkflow(taskId, options), resumeWorkflow(runId), pauseWorkflow(), getWorkflowState(). Store state in memory with persistence to .taskmaster/reports/runs/<run-id>/state.json. Implement checkpoint saving after each phase transition.",
+        "testStrategy": "Unit tests for state transitions, event emission, checkpoint persistence. Integration tests for full workflow lifecycle with mock adapters. Test resume capability from various checkpoints.",
+        "priority": "high",
+        "dependencies": [],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 12,
+        "title": "Implement Test Runner Adapter Service",
+        "description": "Create framework-agnostic test runner adapter that detects and executes project test commands",
+        "details": "Create packages/tm-core/src/services/test-runner-adapter.ts with methods: detectRunner() (checks package.json for test scripts), runTargeted(files/pattern), runAll(), getCoverageReport(), enforceCoverageThresholds(thresholds). Support npm/pnpm/yarn test detection. Parse test output for pass/fail counts and coverage metrics. Return structured TestResult interface with failures, duration, coverage data. Default 80% coverage thresholds.",
+        "testStrategy": "Mock different package.json configurations for runner detection. Test parsing of various test output formats. Verify coverage threshold enforcement logic. Integration test with actual npm test execution.",
+        "priority": "high",
+        "dependencies": [],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 13,
+        "title": "Build Git Operations Adapter",
+        "description": "Encapsulate all git operations with confirmation gates and branch naming patterns",
+        "details": "Create packages/tm-core/src/services/git-adapter.ts wrapping git commands: createBranch(pattern, tag, taskId), checkout(branch), add(files), commit(message, scope), push(options), getCurrentBranch(), getDefaultBranch(). Implement branch naming with configurable pattern support ({tag}/task-{id}[-slug]). Add confirmation prompts for destructive operations unless --no-confirm. Never allow commits to default branch. Use simple-git library or child_process for git commands.",
+        "testStrategy": "Mock git commands and verify correct invocations. Test branch naming pattern generation. Verify default branch protection. Test confirmation gate behavior with different flags.",
+        "priority": "high",
+        "dependencies": [],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 14,
+        "title": "Create Autopilot CLI Command",
+        "description": "Implement the main autopilot command with argument parsing and orchestrator invocation",
+        "details": "Create apps/cli/src/commands/autopilot.command.ts using Commander.js. Accept taskId argument and flags: --dry-run, --no-push, --no-pr, --no-confirm, --force, --max-attempts <n>, --resume. Initialize WorkflowOrchestrator with options. Subscribe to orchestrator events and render progress using existing UI components from apps/cli/src/ui/components/. Handle interrupt signals gracefully for resumability.",
+        "testStrategy": "Test command parsing with various flag combinations. Mock orchestrator and verify correct initialization. Test event subscription and UI rendering. Verify graceful shutdown on SIGINT.",
+        "priority": "high",
+        "dependencies": [
+          11
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 15,
+        "title": "Integrate Surgical Test Generator",
+        "description": "Connect the existing surgical test generator agent to the autopilot workflow for red phase",
+        "details": "Create test generation prompt adapter in packages/tm-core/src/services/test-generator.ts. Load .claude/agents/surgical-test-generator.md as system prompt. Format subtask context into user prompt with file paths, existing code, and requirements. Use existing executor service to invoke claude with the prompt. Parse generated test code and write to appropriate test files following project conventions. Validate tests compile/parse before proceeding.",
+        "testStrategy": "Mock executor responses with sample test generation. Verify prompt formatting includes all context. Test file writing to correct locations. Validate test syntax checking logic.",
+        "priority": "high",
+        "dependencies": [
+          11,
+          12
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 16,
+        "title": "Implement Code Generation Executor",
+        "description": "Create green phase code implementation using focused prompts to make tests pass",
+        "details": "Extend packages/tm-core/src/services/task-execution-service.ts with autopilot-specific prompt generation. Create minimal implementation prompt: 'Make these failing tests pass with the smallest code changes following project patterns. Only modify necessary files.' Include test failures, subtask context, and existing code. Use ExecutorFactory to invoke selected executor (claude/codex/gemini). Parse and apply code changes, handling conflicts gracefully.",
+        "testStrategy": "Test prompt generation with various failure scenarios. Mock executor responses and verify code application. Test conflict resolution strategies. Verify minimal change enforcement.",
+        "priority": "high",
+        "dependencies": [
+          11,
+          15
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 17,
+        "title": "Add Branch and Tag Management Integration",
+        "description": "Connect autopilot to existing tag management for branch-tag mapping",
+        "details": "Integrate with scripts/modules/task-manager/tag-management.js for branch→tag mapping. When creating branch, register mapping in tag system. Explicitly switch active tag to match branch tag. Load task data filtered by active tag. Ensure branch name includes both tag and task ID per spec. Handle tag switching when resuming workflows. Persist tag-branch associations.",
+        "testStrategy": "Test branch-tag registration and retrieval. Verify active tag switching. Test filtered task loading by tag. Validate branch naming includes tag and task ID.",
+        "priority": "medium",
+        "dependencies": [
+          13
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 18,
+        "title": "Build Run State Persistence System",
+        "description": "Implement checkpoint saving and workflow resumability with detailed logging",
+        "details": "Create run state management in WorkflowOrchestrator. Save checkpoints to .taskmaster/reports/runs/<timestamp>/state.json after each phase. Include: current phase, subtask progress, test results, git state, timestamps. Implement JSONL logging for all operations to .taskmaster/reports/runs/<timestamp>/log.jsonl. Add resume() method to restore from checkpoint. Handle partial state recovery gracefully.",
+        "testStrategy": "Test checkpoint creation at each phase. Verify JSONL log format and completeness. Test resume from various interruption points. Validate state recovery with corrupted files.",
+        "priority": "medium",
+        "dependencies": [
+          11
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 19,
+        "title": "Implement Preflight Validation Service",
+        "description": "Create comprehensive pre-execution validation checking git state, tools, and configuration",
+        "details": "Add preflight checks in WorkflowOrchestrator: verify clean working tree (configurable), detect test runner availability, validate git/gh CLI installation, check for required API keys/executors, verify task has subtasks (auto-expand if not), ensure not on default branch. Return structured validation report with errors/warnings. Allow --force to bypass non-critical checks.",
+        "testStrategy": "Mock various environment states for validation. Test clean/dirty working tree detection. Verify tool availability checks. Test auto-expansion trigger when no subtasks.",
+        "priority": "medium",
+        "dependencies": [
+          11,
+          12,
+          13
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 20,
+        "title": "Create PR Generation Service",
+        "description": "Implement GitHub PR creation with formatted body from run reports",
+        "details": "Extend git-adapter.ts with PR operations using gh CLI. Generate PR title: 'Task #<id> [<tag>]: <title>'. Format PR body with: summary of changes, subtask completion list, test coverage report, run statistics. Include link to full run report. Handle gh unavailability with fallback instructions. Support --no-pr flag to skip. Store PR URL in run state.",
+        "testStrategy": "Mock gh CLI responses for PR creation. Test PR title and body formatting. Verify fallback behavior without gh. Test PR URL persistence in run state.",
+        "priority": "medium",
+        "dependencies": [
+          13,
+          18
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 21,
+        "title": "Add Subtask Selection Logic",
+        "description": "Implement intelligent subtask selection respecting dependencies and status",
+        "details": "Enhance WorkflowOrchestrator with subtask selection using TaskService.getNextTask(). Filter subtasks by: pending/in-progress status, satisfied dependencies, task ownership. Process in dependency order. Skip already-done subtasks. Handle blocked subtasks gracefully. Update subtask status to in-progress when starting and to done once its tests pass and the changes are committed.",
+        "testStrategy": "Test selection with various dependency graphs. Verify status filtering logic. Test dependency satisfaction checking. Validate status transitions during workflow.",
+        "priority": "high",
+        "dependencies": [
+          11
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 22,
+        "title": "Implement Test-Driven Commit Gating",
+        "description": "Enforce commit-only-on-green policy with configurable coverage thresholds",
+        "details": "Add commit gating logic in WorkflowOrchestrator. After code generation, run tests and check: all tests pass, coverage meets thresholds (default 80% for lines/branches/functions/statements). Only commit if both conditions met. Support --force-commit override. Implement retry logic with backoff for flaky tests. Log all attempts and results.",
+        "testStrategy": "Test gating with various test results and coverage levels. Verify threshold enforcement. Test override flag behavior. Validate retry logic with intermittent failures.",
+        "priority": "high",
+        "dependencies": [
+          11,
+          12,
+          16
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 23,
+        "title": "Build Progress Event System",
+        "description": "Create event-driven progress reporting for CLI rendering and future integrations",
+        "details": "Implement EventEmitter-based progress system in WorkflowOrchestrator. Emit events: workflow:start, phase:change, subtask:start/complete, test:run/pass/fail, commit:created, pr:created, workflow:complete/error. Include detailed payloads with timestamps, durations, results. Create event aggregator for summary statistics. Support event filtering and buffering.",
+        "testStrategy": "Test event emission at each workflow step. Verify event payload completeness. Test event aggregation logic. Validate buffering and filtering mechanisms.",
+        "priority": "medium",
+        "dependencies": [
+          11
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 24,
+        "title": "Create Autopilot Configuration Schema",
+        "description": "Extend taskmaster config with autopilot-specific settings and validation",
+        "details": "Add autopilot section to .taskmaster/config.json schema: autopilot: { enabled, requireCleanWorkingTree, commitTemplate, defaultCommitType }, test: { runner, coverageThresholds }, git: { branchPattern, pr: { enabled, base } }. Create validation with Zod schema. Add config migration for existing projects. Provide sensible defaults. Support environment variable overrides.",
+        "testStrategy": "Test schema validation with various configurations. Verify migration from old configs. Test default value application. Validate environment override behavior.",
+        "priority": "medium",
+        "dependencies": [],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 25,
+        "title": "Implement Dry Run Mode",
+        "description": "Add simulation mode showing planned operations without execution",
+        "details": "Add --dry-run support throughout the workflow. In dry-run mode: show planned git operations, display test commands without running them, preview commit messages, and show the PR body without creating the PR. Format output to clearly distinguish simulated from actual operations. Still perform validation and planning phases. Useful for debugging and verification.",
+        "testStrategy": "Test dry-run flag propagation to all adapters. Verify no side effects occur. Test output formatting for clarity. Validate planning phases still execute.",
+        "priority": "low",
+        "dependencies": [
+          14,
+          19
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 26,
+        "title": "Add tmux Integration Support",
+        "description": "Create tmux pane management for split-view executor terminal",
+        "details": "Create apps/cli/src/ui/tui/tmux-manager.ts for pane control. Detect tmux availability. Support: split-window for executor pane, send-keys for command execution, capture-pane for output, kill-pane for cleanup. Left pane shows autopilot progress, right pane runs executor. Handle non-tmux fallback gracefully. Preserve pane on interrupt for debugging.",
+        "testStrategy": "Mock tmux commands and verify invocations. Test pane creation and command sending. Verify fallback behavior without tmux. Test cleanup on exit.",
+        "priority": "low",
+        "dependencies": [
+          14
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 27,
+        "title": "Build Run Report Generator",
+        "description": "Create comprehensive markdown and JSON reports for completed workflows",
+        "details": "Generate reports in .taskmaster/reports/runs/<run-id>/: summary.md with task details, subtask results, test coverage, commit list, and duration stats; log.jsonl with the full log of all operations; coverage.json with detailed metrics; and state.json for resumability. Include charts/tables for readability. Generate a PR-ready summary section. Archive old runs automatically.",
+        "testStrategy": "Test report generation with various workflow outcomes. Verify markdown formatting and readability. Test JSON structure validity. Validate archival logic for old runs.",
+        "priority": "medium",
+        "dependencies": [
+          18
+        ],
+        "status": "pending",
+        "subtasks": [
+          {
+            "id": 1,
+            "title": "Create Report Generator Service Core",
+            "description": "Implement the core WorkflowReportGenerator service that orchestrates report generation for completed workflow runs",
+            "dependencies": [],
+            "details": "Create packages/tm-core/src/services/report-generator.service.ts with WorkflowReportGenerator class. Implement methods: generateRunReport(runId, workflowState), generateSummaryMarkdown(state), generateJSONLogs(operations), generateCoverageMetrics(testResults), archiveOldRuns(threshold). Use EventEmitter for progress updates. Store reports in .taskmaster/reports/runs/<run-id>/ directory structure. Integrate with existing ConfigManager for paths and FileStorage for persistence.",
+            "status": "pending",
+            "testStrategy": "Unit tests for report generation methods, markdown formatting validation, JSON structure tests, archive logic tests"
+          },
+          {
+            "id": 2,
+            "title": "Build Markdown Summary Generator",
+            "description": "Create comprehensive markdown report generation with tables, charts, and PR-ready sections",
+            "dependencies": [
+              "27.1"
+            ],
+            "details": "Implement markdown generation in packages/tm-core/src/services/report-generators/markdown-generator.ts. Create formatted sections: Executive Summary (task completion stats, duration, test coverage), Task Details Table (ID, title, status, duration), Subtask Results (grouped by parent, with test outcomes), Test Coverage Charts (using ASCII art or markdown badges), Commit History (list with links), Performance Metrics (timings per phase). Include generatePRBody() method for GitHub-ready summaries. Use markdown tables and proper formatting for readability.",
+            "status": "pending",
+            "testStrategy": "Test markdown output formatting, table generation, special character escaping, PR body validation"
+          },
+          {
+            "id": 3,
+            "title": "Implement JSONL Operation Logger",
+            "description": "Build detailed operation logging system that captures all workflow operations in JSONL format",
+            "dependencies": [
+              "27.1"
+            ],
+            "details": "Create packages/tm-core/src/services/report-generators/jsonl-logger.ts with JSONLOperationLogger class. Implement streaming JSONL writer for log.jsonl file. Capture operations: task starts/completions, test executions, git operations, phase transitions, errors/retries. Each line contains: timestamp, operation type, phase, task/subtask ID, duration, result, metadata. Implement buffered writing for performance. Include log rotation when file exceeds size limit.",
+            "status": "pending",
+            "testStrategy": "Test JSONL format validity, streaming performance, log rotation, operation capture completeness"
+          },
+          {
+            "id": 4,
+            "title": "Create Coverage and Metrics Collectors",
+            "description": "Build test coverage collection and performance metrics aggregation components",
+            "dependencies": [
+              "27.1"
+            ],
+            "details": "Create packages/tm-core/src/services/report-generators/metrics-collector.ts. Implement CoverageCollector to parse test runner outputs (Jest, Vitest, etc.), aggregate line/branch/function coverage, generate coverage.json with detailed metrics per file/module. Implement PerformanceCollector to track phase durations, operation timings, resource usage. Create state.json generator for workflow resumability with checkpoints, completed operations, pending tasks.",
+            "status": "pending",
+            "testStrategy": "Mock various test runner outputs, verify coverage parsing accuracy, test metric aggregation logic"
+          },
+          {
+            "id": 5,
+            "title": "Build Report Archival and Management System",
+            "description": "Implement automatic archival of old run reports and report lifecycle management",
+            "dependencies": [
+              "27.1",
+              "27.2",
+              "27.3",
+              "27.4"
+            ],
+            "details": "Create packages/tm-core/src/services/report-generators/archive-manager.ts. Implement automatic archival: move reports older than 30 days to .taskmaster/reports/archived/, compress old reports to .tar.gz, maintain index of archived reports. Add report management CLI commands in apps/cli/src/commands/reports.command.ts: list-reports, view-report <run-id>, archive-reports, clean-reports. Integrate with WorkflowOrchestrator to trigger report generation on workflow completion.",
+            "status": "pending",
+            "testStrategy": "Test archival thresholds, compression functionality, index maintenance, CLI command integration"
+          }
+        ]
+      },
+      {
+        "id": 28,
+        "title": "Add MCP Tools Integration",
+        "description": "Integrate with MCP server for structured task operations during autopilot",
+        "details": "Use MCP tools where available: get_tasks for task loading, set_task_status for status updates, update_subtask for progress notes, expand_task if subtasks needed. Fallback to direct service calls if MCP unavailable. Improve context passing to executors via MCP. Support MCP-based shell/test execution where available.",
+        "testStrategy": "Mock MCP tool availability and responses. Test fallback to direct service calls. Verify status updates through MCP. Test context enhancement via MCP.",
+        "priority": "low",
+        "dependencies": [
+          14,
+          21
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 29,
+        "title": "Implement Retry and Backoff Logic",
+        "description": "Add intelligent retry mechanisms for flaky tests and transient failures",
+        "details": "Implement exponential backoff in WorkflowOrchestrator for: test execution (max 3 retries), executor calls (max 2 retries), git operations (max 2 retries). Detect flaky test patterns. Add --max-attempts flag (default 3). Track retry attempts in run state. Implement circuit breaker for repeated failures. Provide clear failure reasons.",
+        "testStrategy": "Test retry logic with simulated failures. Verify exponential backoff timing. Test max attempts enforcement. Validate circuit breaker activation.",
+        "priority": "medium",
+        "dependencies": [
+          11,
+          22
+        ],
+        "status": "pending",
+        "subtasks": []
+      },
+      {
+        "id": 30,
+        "title": "Create End-to-End Integration Tests",
+        "description": "Build comprehensive test suite validating full autopilot workflow",
+        "details": "Create test/integration/autopilot.test.ts with scenarios: happy path (all tests pass first try), retry scenarios (flaky tests), resume from interruption, various flag combinations, multi-subtask workflows. Use test fixtures with mock tasks/subtasks. Verify all outputs: commits, branches, reports, PR body. Test with different executors and test runners.",
+        "testStrategy": "Integration tests with mock git repo and task data. Test complete workflow execution. Verify all artifacts created correctly. Validate resume functionality. Performance benchmarks for workflow duration.",
+        "priority": "low",
+        "dependencies": [
+          11,
+          12,
+          13,
+          14,
+          15,
+          16,
+          17,
+          18,
+          19,
+          20,
+          21,
+          22
+        ],
+        "status": "pending",
+        "subtasks": []
+      }
+    ],
+    "metadata": {
+      "created": "2025-09-30T13:32:28.649Z",
+      "updated": "2025-09-30T15:13:53.999Z",
+      "description": "Tasks for autonomous-tdd-git-workflow context"
+    }
  }
 }