13 KiB
13 KiB
PRD: Autonomous TDD + Git Workflow (On Rails)
Summary
- Put the existing git and test workflows on rails: a repeatable, automated process that can run autonomously, with guardrails and a compact TUI for visibility.
- Flow: for a selected task, create a branch named with the tag + task id → generate tests for the first subtask (red) using the Surgical Test Generator → implement code (green) → verify tests → commit → repeat per subtask → final verify → push → open PR against the default branch.
- Build on existing rules:
.cursor/rules/git_workflow.mdc,.cursor/rules/test_workflow.mdc,.claude/agents/surgical-test-generator.md, and existing CLI/core services.
Goals
- Deterministic, resumable automation to execute the TDD loop per subtask with minimal human intervention.
- Strong guardrails: never commit to the default branch; only commit when tests pass; enforce status transitions; persist logs/state for debuggability.
- Visibility: a compact terminal UI (like lazygit) to pick tag, view tasks, and start work; right-side pane opens an executor terminal (via tmux) for agent coding.
- Extensible: framework-agnostic test generation via the Surgical Test Generator; detect and use the repo’s test command for execution with coverage thresholds.
Non‑Goals (initial)
- Full multi-language runner parity beyond detection and executing the project’s test command.
- Complex GUI; start with CLI/TUI + tmux pane. IDE/extension can hook into the same state later.
- Rich executor selection UX (codex/gemini/claude) — we’ll prompt per run; defaults can come later.
Success Criteria
- One command can autonomously complete a task’s subtasks via TDD and open a PR when done.
- All commits made on a branch that includes the tag and task id (see Branch Naming); no commits to the default branch directly.
- Every subtask iteration: failing tests added first (red), then code added to pass them (green), commit only after green.
- End-to-end logs + artifacts stored in
.taskmaster/reports/runs/<timestamp-or-id>/.
User Stories
- As a developer, I can run
tm autopilot <taskId>and watch a structured, safe workflow execute. - As a reviewer, I can inspect commits per subtask, and a PR summarizing the work when the task completes.
- As an operator, I can see current step, active subtask, tests status, and logs in a compact CLI view and read a final run report.
High‑Level Workflow
-
Pre‑flight
- Verify clean working tree or confirm staging/commit policy (configurable).
- Detect repo type and the project’s test command (e.g.,
npm test,pnpm test,pytest,go test). - Validate tools:
git,gh(optional for PR),node/npm, and (if used)claudeCLI. - Load TaskMaster state and selected task; if no subtasks exist, automatically run “expand” before working.
-
Branch & Tag Setup
- Checkout default branch and update (optional), then create a branch using Branch Naming (below).
- Map branch ↔ tag via existing tag management; explicitly set active tag to the branch’s tag.
-
Subtask Loop (for each pending/in-progress subtask in dependency order)
- Select next eligible subtask using
tm-coreTaskServicegetNextTask()and subtask eligibility logic. - Red: generate or update failing tests for the subtask
- Use the Surgical Test Generator system prompt (
.claude/agents/surgical-test-generator.md) to produce high-signal tests following project conventions. - Run tests to confirm red; record results. If not red (already passing), skip to next subtask or escalate.
- Use the Surgical Test Generator system prompt (
- Green: implement code to pass tests
- Use executor to implement changes (initial:
claudeCLI prompt with focused context). - Re-run tests until green or timeout/backoff policy triggers.
- Use executor to implement changes (initial:
- Commit: when green
- Commit tests + code with conventional commit message. Optionally update subtask status to
done. - Persist run step metadata/logs.
- Commit tests + code with conventional commit message. Optionally update subtask status to
- Select next eligible subtask using
-
Finalization
- Run full test suite and coverage (if configured); optionally lint/format.
- Commit any final adjustments.
- Push branch (ask user to confirm); create PR (via
gh pr create) targeting the default branch. Title format:Task #<id> [<tag>]: <title>.
-
Post‑Run
- Update task status if desired (e.g.,
review). - Persist run report (JSON + markdown summary) to
.taskmaster/reports/runs/<run-id>/.
- Update task status if desired (e.g.,
Guardrails
- Never commit to the default branch.
- Commit only if all tests (targeted and suite) pass; allow override flags.
- Enforce 80% coverage thresholds (lines/branches/functions/statements) by default; configurable.
- Timebox/model ops and retries; if not green within N attempts, pause with actionable state for resume.
- Always log actions, commands, and outcomes; include dry-run mode.
- Ask before branch creation, pushing, and opening a PR unless
--no-confirmis set.
Integration Points (Current Repo)
- CLI:
apps/cliprovides command structure and UI components.- New command:
tm autopilot(alias:task-master autopilot). - Reuse UI components under
apps/cli/src/ui/components/for headers/task details/next-task.
- New command:
- Core services:
packages/tm-coreTaskServicefor selection, status, tags.TaskExecutionServicefor prompt formatting and executor prep.- Executors:
claudeexecutor andExecutorFactoryto run external tools. - Proposed new:
WorkflowOrchestratorto drive the autonomous loop and emit progress events.
- Tag/Git utilities:
scripts/modules/utils/git-utils.jsandscripts/modules/task-manager/tag-management.jsfor branch→tag mapping and explicit tag switching. - Rules:
.cursor/rules/git_workflow.mdcand.cursor/rules/test_workflow.mdcto steer behavior and ensure consistency. - Test generation prompt:
.claude/agents/surgical-test-generator.md.
Proposed Components
-
Orchestrator (tm-core):
WorkflowOrchestrator(new)- State machine driving phases: Preflight → Branch/Tag → SubtaskIter (Red/Green/Commit) → Finalize → PR.
- Exposes an evented API (progress events) that the CLI can render.
- Stores run state artifacts.
-
Test Runner Adapter
- Detects and runs tests via the project’s test command (e.g.,
npm test), with targeted runs where feasible. - API: runTargeted(files/pattern), runAll(), report summary (failures, duration, coverage), enforce 80% threshold by default.
- Detects and runs tests via the project’s test command (e.g.,
-
Git/PR Adapter
- Encapsulates
gitops: branch create/checkout, add/commit, push. - Optional
ghintegration to open PR; fallback to instructions ifghunavailable. - Confirmation gates for branch creation and pushes.
- Adds commit footers and a unified trailer (
Refs: TM-<tag>-<id>[.<sub>]) for robust mapping to tasks/subtasks.
- Encapsulates
-
Prompt/Exec Adapter
- Uses existing executor service to call the selected coding assistant (initially
claude) with tight prompts: task/subtask context, surgical tests first, then minimal code to green.
- Uses existing executor service to call the selected coding assistant (initially
-
Run State + Reporting
- JSONL log of steps, timestamps, commands, test results.
- Markdown summary for PR description and post-run artifact.
CLI UX (MVP)
- Command:
tm autopilot [taskId]- Flags:
--dry-run,--no-push,--no-pr,--no-confirm,--force,--max-attempts <n>,--runner <auto|custom>,--commit-scope <scope> - Output: compact header (project, tag, branch), current phase, subtask line, last test summary, next actions.
- Flags:
- Resume: If interrupted,
tm autopilot --resumepicks up from last checkpoint in run state.
TUI with tmux (Linear Execution)
- Left pane: Tag selector, task list (status/priority), start/expand shortcuts; “Start” triggers the next task or a selected task.
- Right pane: Executor terminal (tmux split) that runs the coding agent (claude-code/codex). Autopilot can hand over to the right pane during green.
- MCP integration: use MCP tools for task queries/updates and for shell/test invocations where available.
Prompts (Initial Direction)
- Red phase prompt skeleton (tests):
- Use
.claude/agents/surgical-test-generator.mdas the system prompt to generate high-signal failing tests tailored to the project’s language and conventions. Keep scope minimal and deterministic; no code changes yet.
- Use
- Green phase prompt skeleton (code):
- “Make the tests pass by changing the smallest amount of code, following project patterns. Only modify necessary files. Keep commits focused to this subtask.”
Configuration
.taskmaster/config.jsonadditionsautopilot:{ enabled: true, requireCleanWorkingTree: true, commitTemplate: "{type}({scope}): {msg}", defaultCommitType: "feat" }test:{ runner: "auto", coverageThresholds: { lines: 80, branches: 80, functions: 80, statements: 80 } }git:{ branchPattern: "{tag}/task-{id}-{slug}", pr: { enabled: true, base: "default" }, commitFooters: { task: "Task-Id", subtask: "Subtask-Id", tag: "Tag" }, commitTrailer: "Refs: TM-{tag}-{id}{.sub?}" }
Decisions
- Use conventional commits plus footers and a unified trailer
Refs: TM-<tag>-<id>[.<sub>]for all commits; Git/PR adapter is responsible for injecting these.
Risks and Mitigations
- Model hallucination/large diffs: restrict prompt scope; enforce minimal changes; show diff previews (optional) before commit.
- Flaky tests: allow retries, isolate targeted runs for speed, then full suite before commit.
- Environment variability: detect runners/tools; provide fallbacks and actionable errors.
- PR creation fails: still push and print manual commands; persist PR body to reuse.
Open Questions
- Slugging rules for branch names; any length limits or normalization beyond
{slug}token sanitize? - PR body standard sections beyond run report (e.g., checklist, coverage table)?
- Default executor prompt fine-tuning once codex/gemini integration is available.
- Where to store persistent TUI state (pane layout, last selection) in
.taskmaster/state.json?
Branch Naming
- Include both the tag and the task id in the branch name to make lineage explicit.
- Default pattern:
<tag>/task-<id>[-slug](e.g.,master/task-12,tag-analytics/task-4-user-auth). - Configurable via
.taskmaster/config.json:git.branchPatternsupports tokens{tag},{id},{slug}.
PR Base Branch
- Use the repository’s default branch (detected via git) unless overridden.
- Title format:
Task #<id> [<tag>]: <title>.
RPG Mapping (Repository Planning Graph)
Functional nodes (capabilities):
- Autopilot Orchestration → drives TDD loop and lifecycle
- Test Generation (Surgical) → produces failing tests from subtask context
- Test Execution + Coverage → runs suite, enforces thresholds
- Git/Branch/PR Management → safe operations and PR creation
- TUI/Terminal Integration → interactive control and visibility via tmux
- MCP Integration → structured task/status/context operations
Structural nodes (code organization):
packages/tm-core:services/workflow-orchestrator.ts(new)services/test-runner-adapter.ts(new)services/git-adapter.ts(new)- existing:
task-service.ts,task-execution-service.ts,executors/*
apps/cli:src/commands/autopilot.command.ts(new)src/ui/tui/(new tmux/TUI helpers)
scripts/modules:- reuse
utils/git-utils.js,task-manager/tag-management.js
- reuse
.claude/agents/:surgical-test-generator.md
Edges (data/control flow):
- Autopilot → Test Generation → Test Execution → Git Commit → loop
- Autopilot → Git Adapter (branch, tag, PR)
- Autopilot → TUI (event stream) → tmux pane control
- Autopilot → MCP tools for task/status updates
- Test Execution → Coverage gate → Autopilot decision
Topological traversal (implementation order):
- Git/Test adapters (foundations)
- Orchestrator skeleton + events
- CLI
autopilotcommand and dry-run - Surgical test-gen integration and execution gate
- PR creation, run reports, resumability
Phased Roadmap
-
Phase 0: Spike
- Implement CLI skeleton
tm autopilotwith dry-run showing planned steps from a real task + subtasks. - Detect test runner (package.json) and git state; render a preflight report.
- Implement CLI skeleton
-
Phase 1: Core Rails
- Implement
WorkflowOrchestratorintm-corewith event stream; add Git/Test adapters. - Support subtask loop (red/green/commit) with framework-agnostic test generation and detected test command; commit gating on passing tests and coverage.
- Branch/tag mapping via existing tag-management APIs.
- Run report persisted under
.taskmaster/reports/runs/.
- Implement
-
Phase 2: PR + Resumability
- Add
ghPR creation with well-formed body using the run report. - Introduce resumable checkpoints and
--resumeflag. - Add coverage enforcement and optional lint/format step.
- Add
-
Phase 3: Extensibility + Guardrails
- Add support for basic pytest/go test adapters.
- Add safeguards: diff preview mode, manual confirm gates, aggressive minimal-change prompts.
- Optional: small TUI panel and extension panel leveraging the same run state file.
References (Repo)
- Test Workflow:
.cursor/rules/test_workflow.mdc - Git Workflow:
.cursor/rules/git_workflow.mdc - CLI:
apps/cli/src/commands/start.command.ts,apps/cli/src/ui/components/*.ts - Core Services:
packages/tm-core/src/services/task-service.ts,task-execution-service.ts - Executors:
packages/tm-core/src/executors/* - Git Utilities:
scripts/modules/utils/git-utils.js - Tag Management:
scripts/modules/task-manager/tag-management.js - Surgical Test Generator:
.claude/agents/surgical-test-generator.md