chore: prepare branch

Author: Ralph Khreish
Date: 2025-10-07 16:31:54 +02:00
parent dfbd5974da
commit ae65a7e6c2
10 changed files with 2761 additions and 8 deletions

# Phase 0: Spike - Autonomous TDD Workflow
## Objective
Validate feasibility and build foundational understanding before full implementation.
## Scope
- Implement CLI skeleton `tm autopilot` with dry-run mode
- Show planned steps from a real task with subtasks
- Detect test runner from package.json
- Detect git state and render preflight report
## Deliverables
### 1. CLI Command Skeleton
- Create `apps/cli/src/commands/autopilot.command.ts`
- Support `tm autopilot <taskId>` command
- Implement `--dry-run` flag
- Basic help text and usage information
### 2. Preflight Detection System
- Detect test runner from package.json (npm test, pnpm test, etc.)
- Check git working tree state (clean/dirty)
- Validate required tools are available (git, gh, node/npm)
- Detect default branch
### 3. Dry-Run Execution Plan Display
Display planned execution for a task including:
- Preflight checks status
- Branch name that would be created
- Tag that would be set
- List of subtasks in execution order
- For each subtask:
- RED phase: test file that would be created
- GREEN phase: implementation files that would be modified
- COMMIT: commit message that would be used
- Finalization steps: test suite run, coverage check, push, PR creation
### 4. Task Loading & Validation
- Load task from TaskMaster state
- Validate task exists and has subtasks
- If the task has no subtasks, show a message explaining it must be expanded first
- Show dependency order for subtasks
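The dependency ordering in step 4 amounts to a topological sort over subtask dependencies. A minimal sketch (a Kahn-style sort; the `Subtask` shape and function name are illustrative, not the real TaskMaster types):

```typescript
// Illustrative subtask shape; the real TaskMaster schema may differ.
interface Subtask {
  id: string
  dependencies: string[]
}

// Returns subtask IDs in an order that satisfies all dependencies,
// or throws if the dependencies form a cycle.
function executionOrder(subtasks: Subtask[]): string[] {
  const order: string[] = []
  const done = new Set<string>()
  const pending = [...subtasks]
  while (pending.length > 0) {
    // Pick any subtask whose dependencies are all completed.
    const idx = pending.findIndex((s) => s.dependencies.every((d) => done.has(d)))
    if (idx === -1) throw new Error('Dependency cycle among subtasks')
    const next = pending.splice(idx, 1)[0]
    done.add(next.id)
    order.push(next.id)
  }
  return order
}
```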
## Example Output
```bash
$ tm autopilot 42 --dry-run
Autopilot Plan for Task #42 [analytics]: User metrics tracking
─────────────────────────────────────────────────────────────
Preflight Checks:
✓ Working tree is clean
✓ Test command detected: npm test
✓ Tools available: git, gh, node, npm
✓ Current branch: main (will create new branch)
✓ Task has 3 subtasks ready to execute
Branch & Tag:
→ Will create branch: analytics/task-42-user-metrics
→ Will set active tag: analytics
Execution Plan (3 subtasks):
1. Subtask 42.1: Add metrics schema
RED: Generate tests → src/__tests__/schema.test.js
GREEN: Implement code → src/schema.js
COMMIT: "feat(metrics): add metrics schema (task 42.1)"
2. Subtask 42.2: Add collection endpoint [depends on 42.1]
RED: Generate tests → src/api/__tests__/metrics.test.js
GREEN: Implement code → src/api/metrics.js
COMMIT: "feat(metrics): add collection endpoint (task 42.2)"
3. Subtask 42.3: Add dashboard widget [depends on 42.2]
RED: Generate tests → src/components/__tests__/MetricsWidget.test.jsx
GREEN: Implement code → src/components/MetricsWidget.jsx
COMMIT: "feat(metrics): add dashboard widget (task 42.3)"
Finalization:
→ Run full test suite with coverage (threshold: 80%)
→ Push branch to origin (will confirm)
→ Create PR targeting main
Estimated commits: 3
Estimated duration: ~20-30 minutes (depends on implementation complexity)
Run without --dry-run to execute.
```
## Success Criteria
- Dry-run output is clear and matches expected workflow
- Preflight detection works correctly on the project repo
- Task loading integrates with existing TaskMaster state
- No actual git operations or file modifications occur in dry-run mode
## Out of Scope
- Actual test generation
- Actual code implementation
- Git operations (branch creation, commits, push)
- PR creation
- Test execution
## Implementation Notes
- Reuse existing `TaskService` from `packages/tm-core`
- Use existing git utilities from `scripts/modules/utils/git-utils.js`
- Load task/subtask data from `.taskmaster/tasks/tasks.json`
- Detect test command via package.json → scripts.test field
## Dependencies
- Existing TaskMaster CLI structure
- Existing task storage format
- Git utilities
## Estimated Effort
2-3 days
## Validation
Test dry-run mode with:
- Task with 1 subtask
- Task with multiple subtasks
- Task with dependencies between subtasks
- Task without subtasks (should show warning)
- Dirty git working tree (should warn)
- Missing tools (should error with helpful message)

# Phase 1: Core Rails - Autonomous TDD Workflow
## Objective
Implement the core autonomous TDD workflow with safe git operations, test generation/execution, and commit gating.
## Scope
- WorkflowOrchestrator with event stream
- Git and Test adapters
- Subtask loop (RED → GREEN → COMMIT)
- Framework-agnostic test generation using Surgical Test Generator
- Test execution with detected test command
- Commit gating on passing tests and coverage
- Branch/tag mapping
- Run report persistence
## Deliverables
### 1. WorkflowOrchestrator (`packages/tm-core/src/services/workflow-orchestrator.ts`)
**Responsibilities:**
- State machine driving phases: Preflight → Branch/Tag → SubtaskIter → Finalize
- Event emission for progress tracking
- Coordination of Git, Test, and Executor adapters
- Run state persistence
**API:**
```typescript
class WorkflowOrchestrator {
async executeTask(taskId: string, options: AutopilotOptions): Promise<RunResult>
async resume(runId: string): Promise<RunResult>
on(event: string, handler: (data: any) => void): void
// Events emitted:
// - 'phase:start' { phase, timestamp }
// - 'phase:complete' { phase, status, timestamp }
// - 'subtask:start' { subtaskId, phase }
// - 'subtask:complete' { subtaskId, phase, status }
// - 'test:run' { subtaskId, phase, results }
// - 'commit:created' { subtaskId, sha, message }
// - 'error' { phase, error, recoverable }
}
```
**State Machine Phases:**
1. Preflight - validate environment
2. BranchSetup - create branch, set tag
3. SubtaskLoop - for each subtask: RED → GREEN → COMMIT
4. Finalize - full test suite, coverage check
5. Complete - run report, cleanup
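The happy path through these phases is linear, which can be sketched as a simple transition table (error and pause transitions omitted; phase names here are assumptions matching the list above):

```typescript
type Phase = 'preflight' | 'branch-setup' | 'subtask-loop' | 'finalize' | 'complete'

// Linear happy-path transitions; a real orchestrator would also
// branch to paused/error states from any phase.
const NEXT: Record<Phase, Phase | null> = {
  'preflight': 'branch-setup',
  'branch-setup': 'subtask-loop',
  'subtask-loop': 'finalize',
  'finalize': 'complete',
  'complete': null, // terminal
}

function advance(phase: Phase): Phase | null {
  return NEXT[phase]
}
```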
### 2. GitAdapter (`packages/tm-core/src/services/git-adapter.ts`)
**Responsibilities:**
- All git operations with safety checks
- Branch name generation from tag/task
- Confirmation gates for destructive operations
**API:**
```typescript
class GitAdapter {
async isWorkingTreeClean(): Promise<boolean>
async getCurrentBranch(): Promise<string>
async getDefaultBranch(): Promise<string>
async createBranch(name: string): Promise<void>
async checkoutBranch(name: string): Promise<void>
async commit(message: string, files?: string[]): Promise<string>
async push(branch: string, remote?: string): Promise<void>
// Safety checks
async assertNotOnDefaultBranch(): Promise<void>
async assertCleanOrConfirm(): Promise<void>
// Branch naming
generateBranchName(tag: string, taskId: string, slug: string): string
}
```
**Guardrails:**
- Never allow commits on default branch
- Always check working tree before branch creation
- Confirm destructive operations unless `--no-confirm` flag
### 3. TestRunnerAdapter (`packages/tm-core/src/services/test-runner-adapter.ts`)
**Responsibilities:**
- Detect test command from package.json
- Execute tests (targeted and full suite)
- Parse test results and coverage
- Enforce coverage thresholds
**API:**
```typescript
class TestRunnerAdapter {
async detectTestCommand(): Promise<string>
async runTargeted(pattern: string): Promise<TestResults>
async runAll(): Promise<TestResults>
async getCoverage(): Promise<CoverageReport>
async meetsThresholds(coverage: CoverageReport): Promise<boolean>
}
interface TestResults {
exitCode: number
duration: number
summary: {
total: number
passed: number
failed: number
skipped: number
}
failures: Array<{
test: string
error: string
stack?: string
}>
}
interface CoverageReport {
lines: number
branches: number
functions: number
statements: number
}
```
**Detection Logic:**
- Check package.json → scripts.test
- Support: npm test, pnpm test, yarn test, bun test
- Fall back to explicit command from config
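The detection logic above might look like the following sketch. The lockfile-to-package-manager mapping and the config fallback parameter are assumptions, not a confirmed tm-core API:

```typescript
interface PackageJson {
  scripts?: Record<string, string>
}

// Detect the test command: require a scripts.test entry, then pick the
// package manager from whichever lockfile is present in the repo root.
function detectTestCommand(
  pkg: PackageJson,
  lockfiles: string[],
  configuredCommand?: string
): string {
  if (!pkg.scripts?.test) {
    // Fall back to an explicit command from config, if provided.
    if (configuredCommand) return configuredCommand
    throw new Error('No scripts.test in package.json and no configured test command')
  }
  if (lockfiles.includes('pnpm-lock.yaml')) return 'pnpm test'
  if (lockfiles.includes('yarn.lock')) return 'yarn test'
  if (lockfiles.includes('bun.lockb')) return 'bun test'
  return 'npm test'
}
```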
### 4. Test Generation Integration
**Use Surgical Test Generator:**
- Load prompt from `.claude/agents/surgical-test-generator.md`
- Compose with task/subtask context
- Generate tests via executor (Claude)
- Write test files to detected locations
**Prompt Composition:**
```typescript
async function composeRedPrompt(subtask: Subtask, context: ProjectContext): Promise<string> {
const systemPrompts = [
loadFile('.cursor/rules/git_workflow.mdc'),
loadFile('.cursor/rules/test_workflow.mdc'),
loadFile('.claude/agents/surgical-test-generator.md')
]
const taskContext = formatTaskContext(subtask)
const instruction = formatRedInstruction(subtask, context)
return [
...systemPrompts,
'<TASK CONTEXT>',
taskContext,
'<INSTRUCTION>',
instruction
].join('\n\n')
}
```
### 5. Subtask Loop Implementation
**RED Phase:**
1. Compose test generation prompt with subtask context
2. Execute via Claude executor
3. Parse generated test file paths and code
4. Write test files to filesystem
5. Run tests to confirm they fail (red state)
6. Store test results in run artifacts
7. If tests pass unexpectedly, warn and skip to next subtask
**GREEN Phase:**
1. Compose implementation prompt with test failures
2. Execute via Claude executor with max attempts (default: 3)
3. Parse implementation changes
4. Apply changes to filesystem
5. Run tests to verify passing (green state)
6. If tests still fail after max attempts:
- Save current state
- Emit pause event
- Return resumable checkpoint
7. If tests pass, proceed to COMMIT
**COMMIT Phase:**
1. Verify all tests pass
2. Check coverage meets thresholds (if enabled)
3. Generate conventional commit message
4. Stage test files + implementation files
5. Commit with message
6. Update subtask status to 'done'
7. Emit commit event with SHA
8. Continue to next subtask
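The three phases above can be condensed into a control-flow sketch. The `PhaseRunner` interface is illustrative (the real adapters have richer APIs), but it captures the ordering and gating rules: tests must fail first, implementation retries up to `maxAttempts`, and a commit only happens on green:

```typescript
interface PhaseRunner {
  generateTests(subtaskId: string): Promise<void>
  runTests(subtaskId: string): Promise<{ failed: number }>
  implement(subtaskId: string): Promise<void>
  commit(subtaskId: string): Promise<string> // returns commit SHA
}

async function runSubtask(
  runner: PhaseRunner,
  subtaskId: string,
  maxAttempts = 3
): Promise<{ status: 'committed' | 'paused' | 'skipped'; sha?: string }> {
  // RED: generated tests must fail before any implementation.
  await runner.generateTests(subtaskId)
  if ((await runner.runTests(subtaskId)).failed === 0) {
    return { status: 'skipped' } // tests passed unexpectedly → warn and skip
  }
  // GREEN: retry implementation until tests pass or attempts run out.
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await runner.implement(subtaskId)
    if ((await runner.runTests(subtaskId)).failed === 0) {
      // COMMIT: only reached with a green test run.
      return { status: 'committed', sha: await runner.commit(subtaskId) }
    }
  }
  return { status: 'paused' } // resumable checkpoint (see Phase 2)
}
```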
### 6. Branch & Tag Management
**Integration with existing tag system:**
- Use `scripts/modules/task-manager/tag-management.js`
- Explicit tag switching when branch created
- Store branch ↔ tag mapping in run state
**Branch Naming:**
- Pattern from config: `{tag}/task-{id}-{slug}`
- Default: `analytics/task-42-user-metrics`
- Sanitize: lowercase, replace spaces with hyphens
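A minimal sketch of the naming rule above; the exact sanitization details (character whitelist, slug length cap) are assumptions:

```typescript
// Build "{tag}/task-{id}-{slug}" from a tag, task ID, and task title.
function generateBranchName(tag: string, taskId: string, title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // spaces and punctuation → hyphens
    .replace(/^-+|-+$/g, '')     // trim stray leading/trailing hyphens
    .slice(0, 40)                // assumed cap to keep branch names short
  return `${tag}/task-${taskId}-${slug}`
}
```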
### 7. Run Artifacts & State Persistence
**Directory structure:**
```
.taskmaster/reports/runs/<run-id>/
├── manifest.json # run metadata
├── log.jsonl # event stream
├── commits.txt # commit SHAs
├── test-results/
│ ├── subtask-42.1-red.json
│ ├── subtask-42.1-green.json
│ ├── subtask-42.2-red.json
│ ├── subtask-42.2-green-attempt1.json
│ ├── subtask-42.2-green-attempt2.json
│ └── final-suite.json
└── state.json # resumable checkpoint
```
**manifest.json:**
```json
{
"runId": "2025-01-15-142033",
"taskId": "42",
"tag": "analytics",
"branch": "analytics/task-42-user-metrics",
"startTime": "2025-01-15T14:20:33Z",
"endTime": null,
"status": "in-progress",
"currentPhase": "subtask-loop",
"currentSubtask": "42.2",
"subtasksCompleted": ["42.1"],
"subtasksFailed": [],
"totalCommits": 1
}
```
**log.jsonl** (append-only event log):
```jsonl
{"ts":"2025-01-15T14:20:33Z","event":"phase:start","phase":"preflight","status":"ok"}
{"ts":"2025-01-15T14:21:00Z","event":"subtask:start","subtask":"42.1","phase":"red"}
{"ts":"2025-01-15T14:22:00Z","event":"test:run","subtask":"42.1","phase":"red","results":{"passed":0,"failed":3}}
{"ts":"2025-01-15T14:23:00Z","event":"subtask:start","subtask":"42.1","phase":"green"}
{"ts":"2025-01-15T14:24:30Z","event":"test:run","subtask":"42.1","phase":"green","attempt":1,"results":{"passed":3,"failed":0}}
{"ts":"2025-01-15T14:24:35Z","event":"commit:created","subtask":"42.1","sha":"a1b2c3d","message":"feat(metrics): add metrics schema (task 42.1)"}
```
### 8. CLI Command Implementation
**Update `tm autopilot` command:**
- Replace the dry-run-only behavior from Phase 0
- Execute the actual workflow when `--dry-run` is not passed
- Add progress reporting via orchestrator events
- Support `--no-confirm` for CI/automation
- Support `--max-attempts` to override default
**Real-time output:**
```bash
$ tm autopilot 42
🚀 Starting autopilot for Task #42 [analytics]: User metrics tracking
✓ Preflight checks passed
✓ Created branch: analytics/task-42-user-metrics
✓ Set active tag: analytics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[1/3] Subtask 42.1: Add metrics schema
RED Generating tests... ⏳
RED ✓ Tests created: src/__tests__/schema.test.js
RED ✓ Tests failing: 3 failed, 0 passed
GREEN Implementing code... ⏳
GREEN ✓ Tests passing: 3 passed, 0 failed (attempt 1)
COMMIT ✓ Committed: a1b2c3d
"feat(metrics): add metrics schema (task 42.1)"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[2/3] Subtask 42.2: Add collection endpoint
...
```
## Success Criteria
- Can execute a simple task end-to-end without manual intervention
- All commits made on feature branch, never on default branch
- Tests are generated before implementation (RED → GREEN order enforced)
- Only commits when tests pass and coverage meets threshold
- Run state is persisted and can be inspected post-run
- Clear error messages when things go wrong
- Orchestrator events allow CLI to show live progress
## Configuration
**Add to `.taskmaster/config.json`:**
```json
{
"autopilot": {
"enabled": true,
"requireCleanWorkingTree": true,
"commitTemplate": "{type}({scope}): {msg}",
"defaultCommitType": "feat",
"maxGreenAttempts": 3,
"testTimeout": 300000
},
"test": {
"runner": "auto",
"coverageThresholds": {
"lines": 80,
"branches": 80,
"functions": 80,
"statements": 80
},
"targetedRunPattern": "**/*.test.js"
},
"git": {
"branchPattern": "{tag}/task-{id}-{slug}",
"defaultRemote": "origin"
}
}
```
## Out of Scope (defer to Phase 2)
- PR creation (gh integration)
- Resume functionality (`--resume` flag)
- Lint/format step
- Multiple executor support (only Claude)
## Implementation Order
1. GitAdapter with safety checks
2. TestRunnerAdapter with detection logic
3. WorkflowOrchestrator state machine skeleton
4. RED phase: test generation integration
5. GREEN phase: implementation with retry logic
6. COMMIT phase: gating and persistence
7. CLI command wiring with event handling
8. Run artifacts and logging
## Testing Strategy
- Unit tests for each adapter (mock git/test commands)
- Integration tests with real git repo (temporary directory)
- End-to-end test with sample task in test project
- Verify no commits on default branch (security test)
- Verify commit gating works (force test failure, ensure no commit)
## Dependencies
- Phase 0 completed (CLI skeleton, preflight checks)
- Existing TaskService and executor infrastructure
- Surgical Test Generator prompt file exists
## Estimated Effort
2-3 weeks
## Risks & Mitigations
- **Risk:** Test generation produces invalid/wrong tests
- **Mitigation:** Use Surgical Test Generator prompt, add manual review step in early iterations
- **Risk:** Implementation attempts timeout/fail repeatedly
- **Mitigation:** Max attempts with pause/resume; store state for manual intervention
- **Risk:** Coverage parsing fails on different test frameworks
- **Mitigation:** Start with one framework (vitest), add parsers incrementally
- **Risk:** Git operations fail (conflicts, permissions)
- **Mitigation:** Detailed error messages, save state before destructive ops
## Validation
Test with:
- Simple task (1 subtask, clear requirements)
- Medium task (3 subtasks with dependencies)
- Task requiring multiple GREEN attempts
- Task with dirty working tree (should error)
- Task on default branch (should error)
- Project without test command (should error with helpful message)

# Phase 2: PR + Resumability - Autonomous TDD Workflow
## Objective
Add PR creation with GitHub CLI integration, resumable checkpoints for interrupted runs, and enhanced guardrails with coverage enforcement.
## Scope
- GitHub PR creation via `gh` CLI
- Well-formed PR body using run report
- Resumable checkpoints and `--resume` flag
- Coverage enforcement before finalization
- Optional lint/format step
- Enhanced error recovery
## Deliverables
### 1. PR Creation Integration
**PRAdapter** (`packages/tm-core/src/services/pr-adapter.ts`):
```typescript
class PRAdapter {
async isGHAvailable(): Promise<boolean>
async createPR(options: PROptions): Promise<PRResult>
async getPRTemplate(runReport: RunReport): Promise<string>
// Fallback for missing gh CLI
async getManualPRInstructions(options: PROptions): Promise<string>
}
interface PROptions {
branch: string
base: string
title: string
body: string
draft?: boolean
}
interface PRResult {
url: string
number: number
}
```
**PR Title Format:**
```
Task #<id> [<tag>]: <title>
```
Example: `Task #42 [analytics]: User metrics tracking`
**PR Body Template:**
Located at `.taskmaster/templates/pr-body.md`:
```markdown
## Summary
Implements Task #42 from TaskMaster autonomous workflow.
**Branch:** {branch}
**Tag:** {tag}
**Subtasks completed:** {subtaskCount}
{taskDescription}
## Subtasks
{subtasksList}
## Test Coverage
| Metric | Coverage |
|--------|----------|
| Lines | {lines}% |
| Branches | {branches}% |
| Functions | {functions}% |
| Statements | {statements}% |
**All subtasks passed with {totalTests} tests.**
## Commits
{commitsList}
## Run Report
Full execution report: `.taskmaster/reports/runs/{runId}/`
---
🤖 Generated with [Task Master](https://github.com/cline/task-master) autonomous TDD workflow
```
**Token replacement:**
- `{branch}` → branch name
- `{tag}` → active tag
- `{subtaskCount}` → number of completed subtasks
- `{taskDescription}` → task description from TaskMaster
- `{subtasksList}` → markdown list of subtask titles
- `{lines}`, `{branches}`, `{functions}`, `{statements}` → coverage percentages
- `{totalTests}` → total test count
- `{commitsList}` → markdown list of commit SHAs and messages
- `{runId}` → run ID timestamp
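The token substitution above is a straightforward string replacement. A minimal sketch (the token set comes from the list; the renderer itself is illustrative):

```typescript
// Replace {token} placeholders with values; unknown tokens are left
// intact so missing data is visible in the rendered PR body.
function renderPRBody(
  template: string,
  tokens: Record<string, string | number>
): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    key in tokens ? String(tokens[key]) : match
  )
}
```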
### 2. GitHub CLI Integration
**Detection:**
```bash
which gh
```
If not found, show fallback instructions:
```bash
✓ Branch pushed: analytics/task-42-user-metrics
✗ gh CLI not found - cannot create PR automatically
To create PR manually:
gh pr create \
--base main \
--head analytics/task-42-user-metrics \
--title "Task #42 [analytics]: User metrics tracking" \
--body-file .taskmaster/reports/runs/2025-01-15-142033/pr.md
Or visit:
https://github.com/org/repo/compare/main...analytics/task-42-user-metrics
```
**Confirmation gate:**
```bash
Ready to create PR:
Title: Task #42 [analytics]: User metrics tracking
Base: main
Head: analytics/task-42-user-metrics
Create PR? [Y/n]
```
Unless `--no-confirm` flag is set.
### 3. Resumable Workflow
**State Checkpoint** (`state.json`):
```json
{
"runId": "2025-01-15-142033",
"taskId": "42",
"phase": "subtask-loop",
"currentSubtask": "42.2",
"currentPhase": "green",
"attempts": 2,
"completedSubtasks": ["42.1"],
"commits": ["a1b2c3d"],
"branch": "analytics/task-42-user-metrics",
"tag": "analytics",
"canResume": true,
"pausedAt": "2025-01-15T14:25:35Z",
"pausedReason": "max_attempts_reached",
"nextAction": "manual_review_required"
}
```
**Resume Command:**
```bash
$ tm autopilot --resume
Resuming run: 2025-01-15-142033
Task: #42 [analytics] User metrics tracking
Branch: analytics/task-42-user-metrics
Last subtask: 42.2 (GREEN phase, attempt 3/3 failed)
Paused: 5 minutes ago
Reason: Could not achieve green state after 3 attempts
Last error: POST /api/metrics returns 500 instead of 201
Resume from subtask 42.2 GREEN phase? [Y/n]
```
**Resume logic:**
1. Load state from `.taskmaster/reports/runs/<runId>/state.json`
2. Verify branch still exists and is checked out
3. Verify no uncommitted changes (unless `--force`)
4. Continue from last checkpoint phase
5. Update state file as execution progresses
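Steps 1–3 reduce to validating the loaded checkpoint against the current repo state before continuing. A sketch (field names follow the `state.json` example above; the problem strings are illustrative):

```typescript
interface Checkpoint {
  runId: string
  branch: string
  canResume: boolean
}

// Return a list of problems blocking resume; empty means safe to continue.
function validateCheckpoint(
  state: Checkpoint,
  currentBranch: string,
  workingTreeClean: boolean,
  force = false
): string[] {
  const problems: string[] = []
  if (!state.canResume) problems.push('run is not resumable')
  if (state.branch !== currentBranch)
    problems.push(`expected branch ${state.branch}, currently on ${currentBranch}`)
  if (!workingTreeClean && !force)
    problems.push('uncommitted changes present (use --force to override)')
  return problems
}
```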
**Multiple interrupted runs:**
```bash
$ tm autopilot --resume
Found 2 resumable runs:
1. 2025-01-15-142033 - Task #42 (paused 5 min ago at subtask 42.2 GREEN)
2. 2025-01-14-103022 - Task #38 (paused 2 hours ago at subtask 38.3 RED)
Select run to resume [1-2]:
```
### 4. Coverage Enforcement
**Coverage Check Phase** (before finalization):
```typescript
async function enforceCoverage(runId: string): Promise<void> {
const testResults = await testRunner.runAll()
const coverage = await testRunner.getCoverage()
const thresholds = config.test.coverageThresholds
const failures: string[] = []
for (const metric of ['lines', 'branches', 'functions', 'statements'] as const) {
if (coverage[metric] < thresholds[metric]) {
const label = metric[0].toUpperCase() + metric.slice(1)
failures.push(`${label}: ${coverage[metric]}% < ${thresholds[metric]}%`)
}
}
if (failures.length > 0) {
throw new CoverageError(
`Coverage thresholds not met:\n${failures.join('\n')}`
)
}
// Store coverage in run report
await storeRunArtifact(runId, 'coverage.json', coverage)
}
```
**Handling coverage failures:**
```bash
⚠️ Coverage check failed:
Lines: 78.5% < 80%
Branches: 75.0% < 80%
Options:
1. Add more tests and resume
2. Lower thresholds in .taskmaster/config.json
3. Skip coverage check: tm autopilot --resume --skip-coverage
Run paused. Fix coverage and resume with:
tm autopilot --resume
```
### 5. Optional Lint/Format Step
**Configuration:**
```json
{
"autopilot": {
"finalization": {
"lint": {
"enabled": true,
"command": "npm run lint",
"fix": true,
"failOnError": false
},
"format": {
"enabled": true,
"command": "npm run format",
"commitChanges": true
}
}
}
}
```
**Execution:**
```bash
Finalization Steps:
✓ All tests passing (12 tests, 0 failures)
✓ Coverage thresholds met (85% lines, 82% branches)
LINT Running linter... ⏳
LINT ✓ No lint errors
FORMAT Running formatter... ⏳
FORMAT ✓ Formatted 3 files
FORMAT ✓ Committed formatting changes: "chore: auto-format code"
PUSH Pushing to origin... ⏳
PUSH ✓ Pushed analytics/task-42-user-metrics
PR Creating pull request... ⏳
PR ✓ Created PR #123
https://github.com/org/repo/pull/123
```
### 6. Enhanced Error Recovery
**Pause Points:**
- Max GREEN attempts reached (current)
- Coverage check failed (new)
- Lint errors (if `failOnError: true`)
- Git push failed (new)
- PR creation failed (new)
**Each pause saves:**
- Full state checkpoint
- Last command output
- Suggested next actions
- Resume instructions
**Automatic recovery attempts:**
- Git push: retry up to 3 times with backoff
- PR creation: fall back to manual instructions
- Lint: auto-fix if enabled, otherwise pause
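The push retry described above is a generic retry-with-backoff. A sketch (the injectable `sleep` parameter is an assumption added so the delay can be stubbed in tests; delays follow the `pushRetries`/`pushRetryDelay` config):

```typescript
// Retry an async operation up to `attempts` times with exponential
// backoff; rethrows the last error if all attempts fail.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 5000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await op()
    } catch (err) {
      lastError = err
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i) // 5s, 10s, ...
    }
  }
  throw lastError
}
```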
### 7. Finalization Phase Enhancement
**Updated workflow:**
1. Run full test suite
2. Check coverage thresholds → pause if failed
3. Run lint (if enabled) → pause if failed and `failOnError: true`
4. Run format (if enabled) → auto-commit changes
5. Confirm push (unless `--no-confirm`)
6. Push branch → retry on failure
7. Generate PR body from template
8. Create PR via gh → fall back to manual instructions
9. Update task status to 'review' (configurable)
10. Save final run report
**Final output:**
```bash
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Task #42 [analytics]: User metrics tracking - COMPLETE
Branch: analytics/task-42-user-metrics
Subtasks completed: 3/3
Commits: 3
Total tests: 12 (12 passed, 0 failed)
Coverage: 85% lines, 82% branches, 88% functions, 85% statements
PR #123: https://github.com/org/repo/pull/123
Run report: .taskmaster/reports/runs/2025-01-15-142033/
Next steps:
- Review PR and request changes if needed
- Merge when ready
- Task status updated to 'review'
Completed in 24 minutes
```
## CLI Updates
**New flags:**
- `--resume` → Resume from last checkpoint
- `--skip-coverage` → Skip coverage checks
- `--skip-lint` → Skip lint step
- `--skip-format` → Skip format step
- `--skip-pr` → Push branch but don't create PR
- `--draft-pr` → Create draft PR instead of ready-for-review
## Configuration Updates
**Add to `.taskmaster/config.json`:**
```json
{
"autopilot": {
"finalization": {
"lint": {
"enabled": false,
"command": "npm run lint",
"fix": true,
"failOnError": false
},
"format": {
"enabled": false,
"command": "npm run format",
"commitChanges": true
},
"updateTaskStatus": "review"
}
},
"git": {
"pr": {
"enabled": true,
"base": "default",
"bodyTemplate": ".taskmaster/templates/pr-body.md",
"draft": false
},
"pushRetries": 3,
"pushRetryDelay": 5000
}
}
```
## Success Criteria
- Can create PR automatically with well-formed body
- Can resume interrupted runs from any checkpoint
- Coverage checks prevent low-quality code from being merged
- Clear error messages and recovery paths for all failure modes
- Run reports include full PR context for review
## Out of Scope (defer to Phase 3)
- Multiple test framework support (pytest, go test)
- Diff preview before commits
- TUI panel implementation
- Extension/IDE integration
## Testing Strategy
- Mock `gh` CLI for PR creation tests
- Test resume from each possible pause point
- Test coverage failure scenarios
- Test lint/format integration with mock commands
- End-to-end test with PR creation on test repo
## Dependencies
- Phase 1 completed (core workflow)
- GitHub CLI (`gh`) installed (optional, fallback provided)
- Test framework supports coverage output
## Estimated Effort
1-2 weeks
## Risks & Mitigations
- **Risk:** GitHub CLI auth issues
- **Mitigation:** Clear auth setup docs, fallback to manual instructions
- **Risk:** PR body template doesn't match all project needs
- **Mitigation:** Make template customizable via config path
- **Risk:** Resume state gets corrupted
- **Mitigation:** Validate state on load, provide --force-reset option
- **Risk:** Coverage calculation differs between runs
- **Mitigation:** Store coverage with each test run for comparison
## Validation
Test with:
- Successful PR creation end-to-end
- Resume from GREEN attempt failure
- Resume from coverage failure
- Resume from lint failure
- Missing `gh` CLI (fallback to manual)
- Lint/format integration enabled
- Multiple interrupted runs (selection UI)

# Phase 3: Extensibility + Guardrails - Autonomous TDD Workflow
## Objective
Add multi-language/framework support, enhanced safety guardrails, TUI interface, and extensibility for IDE/editor integration.
## Scope
- Multi-language test runner support (pytest, go test, etc.)
- Enhanced safety: diff preview, confirmation gates, minimal-change prompts
- Optional TUI panel with tmux integration
- State-based extension API for IDE integration
- Parallel subtask execution (experimental)
## Deliverables
### 1. Multi-Language Test Runner Support
**Extend TestRunnerAdapter:**
```typescript
class TestRunnerAdapter {
// Existing methods...
async detectLanguage(): Promise<Language>
async detectFramework(language: Language): Promise<Framework>
async getFrameworkAdapter(framework: Framework): Promise<FrameworkAdapter>
}
enum Language {
JavaScript = 'javascript',
TypeScript = 'typescript',
Python = 'python',
Go = 'go',
Rust = 'rust'
}
enum Framework {
Vitest = 'vitest',
Jest = 'jest',
Pytest = 'pytest',
GoTest = 'gotest',
CargoTest = 'cargotest'
}
interface FrameworkAdapter {
runTargeted(pattern: string): Promise<TestResults>
runAll(): Promise<TestResults>
parseCoverage(output: string): Promise<CoverageReport>
getTestFilePattern(): string
getTestFileExtension(): string
}
```
**Framework-specific adapters:**
**PytestAdapter** (`packages/tm-core/src/services/test-adapters/pytest-adapter.ts`):
```typescript
class PytestAdapter implements FrameworkAdapter {
async runTargeted(pattern: string): Promise<TestResults> {
const output = await exec(`pytest ${pattern} --json-report`)
return this.parseResults(output)
}
async runAll(): Promise<TestResults> {
const output = await exec('pytest --cov --json-report')
return this.parseResults(output)
}
parseCoverage(output: string): Promise<CoverageReport> {
// Parse pytest-cov XML output
}
getTestFilePattern(): string {
return '**/test_*.py'
}
getTestFileExtension(): string {
return '.py'
}
}
```
**GoTestAdapter** (`packages/tm-core/src/services/test-adapters/gotest-adapter.ts`):
```typescript
class GoTestAdapter implements FrameworkAdapter {
async runTargeted(pattern: string): Promise<TestResults> {
const output = await exec(`go test ${pattern} -json`)
return this.parseResults(output)
}
async runAll(): Promise<TestResults> {
const output = await exec('go test ./... -coverprofile=coverage.out -json')
return this.parseResults(output)
}
parseCoverage(output: string): Promise<CoverageReport> {
// Parse go test coverage output
}
getTestFilePattern(): string {
return '**/*_test.go'
}
getTestFileExtension(): string {
return '_test.go'
}
}
```
**Detection Logic:**
```typescript
async function detectFramework(): Promise<Framework> {
// Check for package.json
if (await exists('package.json')) {
const pkg = await readJSON('package.json')
if (pkg.devDependencies?.vitest) return Framework.Vitest
if (pkg.devDependencies?.jest) return Framework.Jest
}
// Check for Python files
if (await exists('pytest.ini') || await exists('setup.py')) {
return Framework.Pytest
}
// Check for Go files
if (await exists('go.mod')) {
return Framework.GoTest
}
// Check for Rust files
if (await exists('Cargo.toml')) {
return Framework.CargoTest
}
throw new Error('Could not detect test framework')
}
```
### 2. Enhanced Safety Guardrails
**Diff Preview Mode:**
```bash
$ tm autopilot 42 --preview-diffs
[2/3] Subtask 42.2: Add collection endpoint
RED ✓ Tests created: src/api/__tests__/metrics.test.js
GREEN Implementing code...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Proposed changes (src/api/metrics.js):
+ import { MetricsSchema } from '../models/schema.js'
+
+ export async function createMetric(data) {
+ const validated = MetricsSchema.parse(data)
+ const result = await db.metrics.create(validated)
+ return result
+ }
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Apply these changes? [Y/n/e(dit)/s(kip)]
Y - Apply and continue
n - Reject and retry GREEN phase
e - Open in editor for manual changes
s - Skip this subtask
```
**Minimal Change Enforcement:**
Add to system prompt:
```markdown
CRITICAL: Make MINIMAL changes to pass the failing tests.
- Only modify files directly related to the subtask
- Do not refactor existing code unless absolutely necessary
- Do not add features beyond the acceptance criteria
- Keep changes under 50 lines per file when possible
- Prefer composition over modification
```
**Change Size Warnings:**
```bash
⚠️ Large change detected:
Files modified: 5
Lines changed: +234, -12
This subtask was expected to be small (~50 lines).
Consider:
- Breaking into smaller subtasks
- Reviewing acceptance criteria
- Checking for unintended changes
Continue anyway? [y/N]
```
### 3. TUI Interface with tmux
**Layout:**
```
┌──────────────────────────────────┬─────────────────────────────────┐
│ Task Navigator (left) │ Executor Terminal (right) │
│ │ │
│ Project: my-app │ $ tm autopilot --executor-mode │
│ Branch: analytics/task-42 │ > Running subtask 42.2 GREEN... │
│ Tag: analytics │ > Implementing endpoint... │
│ │ > Tests: 3 passed, 0 failed │
│ Tasks: │ > Ready to commit │
│ → 42 [in-progress] User metrics │ │
│ → 42.1 [done] Schema │ [Live output from executor] │
│ → 42.2 [active] Endpoint ◀ │ │
│ → 42.3 [pending] Dashboard │ │
│ │ │
│ [s] start [p] pause [q] quit │ │
└──────────────────────────────────┴─────────────────────────────────┘
```
**Implementation:**
**TUI Navigator** (`apps/cli/src/ui/tui/navigator.ts`):
```typescript
import blessed from 'blessed'
class AutopilotTUI {
private screen: blessed.Widgets.Screen
private taskList: blessed.Widgets.ListElement
private statusBox: blessed.Widgets.BoxElement
private executorPane: string // tmux pane ID
async start(taskId?: string) {
// Create blessed screen
this.screen = blessed.screen()
// Create task list widget
this.taskList = blessed.list({
label: 'Tasks',
keys: true,
vi: true,
style: { selected: { bg: 'blue' } }
})
// Spawn tmux pane for executor
this.executorPane = await this.spawnExecutorPane()
// Watch state file for updates
this.watchStateFile()
// Handle keybindings
this.setupKeybindings()
}
private async spawnExecutorPane(): Promise<string> {
const paneId = (await exec('tmux split-window -h -P -F "#{pane_id}"')).trim()
await exec(`tmux send-keys -t ${paneId} "tm autopilot --executor-mode" Enter`)
return paneId
}
private watchStateFile() {
watch('.taskmaster/state/current-run.json', (event, filename) => {
this.updateDisplay()
})
}
private setupKeybindings() {
this.screen.key(['s'], () => this.startTask())
this.screen.key(['p'], () => this.pauseTask())
this.screen.key(['q'], () => this.quit())
this.screen.key(['up', 'down'], () => this.navigateTasks())
}
}
```
**Executor Mode:**
```bash
$ tm autopilot 42 --executor-mode
# Runs in executor pane, writes state to shared file
# Left pane reads state file and updates display
```
**State File** (`.taskmaster/state/current-run.json`):
```json
{
"runId": "2025-01-15-142033",
"taskId": "42",
"status": "running",
"currentPhase": "green",
"currentSubtask": "42.2",
"lastOutput": "Implementing endpoint...",
"testsStatus": {
"passed": 3,
"failed": 0
}
}
```
### 4. Extension API for IDE Integration
**State-based API:**
Expose run state via JSON files that IDEs can read:
- `.taskmaster/state/current-run.json` - live run state
- `.taskmaster/reports/runs/<runId>/manifest.json` - run metadata
- `.taskmaster/reports/runs/<runId>/log.jsonl` - event stream
**WebSocket API (optional):**
```typescript
// packages/tm-core/src/services/autopilot-server.ts
import { WebSocketServer } from 'ws'

class AutopilotServer {
  private wss: WebSocketServer
  private orchestrator: any // injected run orchestrator (event emitter)

  start(port: number = 7890) {
    this.wss = new WebSocketServer({ port })
    this.wss.on('connection', (ws) => {
      // Send current state on connect
      ws.send(JSON.stringify(this.getCurrentState()))
      // Stream events; wildcard listeners need an emitter that supports
      // them (e.g. EventEmitter2) — plain Node EventEmitter does not
      this.orchestrator.on('*', (event) => {
        ws.send(JSON.stringify(event))
      })
    })
  }
}
```
**Usage from IDE extension:**
```typescript
// VS Code extension example
const ws = new WebSocket('ws://localhost:7890')
ws.on('message', (data) => {
  const event = JSON.parse(data.toString()) // payload arrives as a Buffer
const event = JSON.parse(data)
if (event.type === 'subtask:complete') {
vscode.window.showInformationMessage(
`Subtask ${event.subtaskId} completed`
)
}
})
```
### 5. Parallel Subtask Execution (Experimental)
**Dependency Analysis:**
```typescript
class SubtaskScheduler {
  buildDependencyGraph(subtasks: Subtask[]): DAG {
    const graph = new DAG()
    for (const subtask of subtasks) {
      graph.addNode(subtask.id)
      for (const depId of subtask.dependencies) {
        graph.addEdge(depId, subtask.id)
      }
    }
    return graph
  }

  getParallelBatches(graph: DAG): string[][] {
    const batches: string[][] = []
    const completed = new Set<string>()
    while (completed.size < graph.size()) {
      const ready = graph.nodes.filter(node =>
        !completed.has(node.id) &&
        node.dependencies.every(dep => completed.has(dep))
      )
      if (ready.length === 0) {
        // Remaining nodes form a cycle; bail out rather than loop forever
        throw new Error('Dependency cycle detected among subtasks')
      }
      batches.push(ready.map(node => node.id))
      ready.forEach(node => completed.add(node.id))
    }
    return batches
  }
}
```
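The `DAG` type above is assumed rather than defined; a minimal sketch matching the scheduler's usage (`addNode`, `addEdge`, `nodes`, `size`) might look like:

```typescript
// Node in the subtask dependency graph.
interface DagNode {
  id: string
  dependencies: string[]
}

// Minimal directed acyclic graph keyed by subtask ID.
class DAG {
  private nodeMap = new Map<string, DagNode>()

  addNode(id: string) {
    if (!this.nodeMap.has(id)) this.nodeMap.set(id, { id, dependencies: [] })
  }

  // Edge from -> to means "to depends on from".
  addEdge(from: string, to: string) {
    this.addNode(from)
    this.addNode(to)
    this.nodeMap.get(to)!.dependencies.push(from)
  }

  get nodes(): DagNode[] {
    return [...this.nodeMap.values()]
  }

  size(): number {
    return this.nodeMap.size
  }
}
```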
**Parallel Execution:**
```bash
$ tm autopilot 42 --parallel
[Batch 1] Running 2 subtasks in parallel:
→ 42.1: Add metrics schema
→ 42.4: Add API documentation
42.1 RED ✓ Tests created
42.4 RED ✓ Tests created
42.1 GREEN ✓ Implementation complete
42.4 GREEN ✓ Implementation complete
42.1 COMMIT ✓ Committed: a1b2c3d
  42.4 COMMIT ✓ Committed: e5f6a7b
[Batch 2] Running 2 subtasks in parallel (depend on 42.1):
→ 42.2: Add collection endpoint
→ 42.3: Add dashboard widget
...
```
**Conflict Detection:**
```typescript
async function detectConflicts(subtasks: Subtask[]): Promise<Conflict[]> {
  const conflicts: Conflict[] = []
  // Predict each subtask's affected files once, not once per pair
  const affected = await Promise.all(subtasks.map(s => predictAffectedFiles(s)))
  for (let i = 0; i < subtasks.length; i++) {
    for (let j = i + 1; j < subtasks.length; j++) {
      const overlap = affected[i].filter(f => affected[j].includes(f))
      if (overlap.length > 0) {
        conflicts.push({
          subtasks: [subtasks[i].id, subtasks[j].id],
          files: overlap
        })
      }
    }
  }
  return conflicts
}
```
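When conflicts are found, the conservative fallback is to serialize only the conflicting subtasks while keeping the rest of the batch parallel. A sketch of that batch-splitting step (the `splitBatchOnConflicts` helper is hypothetical):

```typescript
interface Conflict {
  subtasks: [string, string]
  files: string[]
}

// Split one parallel batch into sub-batches so that no two subtasks
// with a recorded conflict land in the same sub-batch; sub-batches
// then run one after another.
function splitBatchOnConflicts(batch: string[], conflicts: Conflict[]): string[][] {
  const conflictsWith = (a: string, b: string) =>
    conflicts.some(c =>
      (c.subtasks[0] === a && c.subtasks[1] === b) ||
      (c.subtasks[1] === a && c.subtasks[0] === b)
    )
  const groups: string[][] = []
  for (const id of batch) {
    // Greedy placement: first group with no conflicting member
    const target = groups.find(g => g.every(member => !conflictsWith(id, member)))
    if (target) target.push(id)
    else groups.push([id])
  }
  return groups
}
```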
### 6. Advanced Configuration
**Add to `.taskmaster/config.json`:**
```json
{
"autopilot": {
"safety": {
"previewDiffs": false,
"maxChangeLinesPerFile": 100,
"warnOnLargeChanges": true,
"requireConfirmOnLargeChanges": true
},
"parallel": {
"enabled": false,
"maxConcurrent": 3,
"detectConflicts": true
},
"tui": {
"enabled": false,
"tmuxSession": "taskmaster-autopilot"
},
"api": {
"enabled": false,
"port": 7890,
"allowRemote": false
}
},
"test": {
"frameworks": {
"python": {
"runner": "pytest",
"coverageCommand": "pytest --cov",
"testPattern": "**/test_*.py"
},
"go": {
"runner": "go test",
"coverageCommand": "go test ./... -coverprofile=coverage.out",
"testPattern": "**/*_test.go"
}
}
}
}
```
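A config loader would typically merge user-supplied values over built-in defaults. A minimal sketch for the `safety` block (field names mirror the config above; the loader itself is an assumption):

```typescript
interface AutopilotSafetyConfig {
  previewDiffs: boolean
  maxChangeLinesPerFile: number
  warnOnLargeChanges: boolean
  requireConfirmOnLargeChanges: boolean
}

// Defaults match the sample config above.
const SAFETY_DEFAULTS: AutopilotSafetyConfig = {
  previewDiffs: false,
  maxChangeLinesPerFile: 100,
  warnOnLargeChanges: true,
  requireConfirmOnLargeChanges: true
}

// User config may be partial or absent; spread fills the gaps.
function resolveSafetyConfig(user: Partial<AutopilotSafetyConfig> = {}): AutopilotSafetyConfig {
  return { ...SAFETY_DEFAULTS, ...user }
}
```

The same default-then-override pattern extends to the `parallel`, `tui`, and `api` blocks.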
## CLI Updates
**New commands:**
```bash
tm autopilot <taskId> --tui # Launch TUI interface
tm autopilot <taskId> --parallel # Enable parallel execution
tm autopilot <taskId> --preview-diffs # Show diffs before applying
tm autopilot <taskId> --executor-mode # Run as executor pane
tm autopilot-server start # Start WebSocket API
```
## Success Criteria
- Supports Python projects with pytest
- Supports Go projects with go test
- Diff preview prevents unwanted changes
- TUI provides better visibility for long-running tasks
- IDE extensions can integrate via state files or WebSocket
- Parallel execution reduces total time for independent subtasks
## Out of Scope
- Full Electron/web GUI
- AI executor selection UI (defer to Phase 4)
- Multi-repository support
- Remote execution on cloud runners
## Testing Strategy
- Test with Python project (pytest)
- Test with Go project (go test)
- Test diff preview UI with mock changes
- Test parallel execution with independent subtasks
- Test conflict detection with overlapping file changes
- Test TUI with mock tmux environment
## Dependencies
- Phase 2 completed (PR + resumability)
- tmux installed (for TUI)
- blessed or ink library (for TUI rendering)
## Estimated Effort
3-4 weeks
## Risks & Mitigations
- **Risk:** Parallel execution causes git conflicts
  - **Mitigation:** Conservative conflict detection, sequential fallback
- **Risk:** TUI adds complexity and maintenance burden
  - **Mitigation:** Keep TUI optional; state-based design allows alternatives
- **Risk:** Framework adapters hard to maintain across versions
  - **Mitigation:** Abstract common parsing logic, document adapter interface
- **Risk:** Diff preview slows down workflow
  - **Mitigation:** Make it optional; enable via `--preview-diffs` only when needed
## Validation
Test with:
- Python project with pytest and pytest-cov
- Go project with go test
- Large changes requiring confirmation
- Parallel execution with 3+ independent subtasks
- TUI with task selection and live status updates
- VS Code extension reading state files