chore: prepare branch

2025-10-07 16:31:54 +02:00
parent 959c6151fa
commit ec3972ff10
10 changed files with 2761 additions and 8 deletions
--- a/.taskmaster/docs/tdd-workflow-phase-1-core-rails.md
+++ b/.taskmaster/docs/tdd-workflow-phase-1-core-rails.md
@@ -0,0 +1,380 @@
+# Phase 1: Core Rails - Autonomous TDD Workflow
+
+## Objective
+Implement the core autonomous TDD workflow with safe git operations, test generation/execution, and commit gating.
+
+## Scope
+- WorkflowOrchestrator with event stream
+- Git and Test adapters
+- Subtask loop (RED → GREEN → COMMIT)
+- Framework-agnostic test generation using Surgical Test Generator
+- Test execution with detected test command
+- Commit gating on passing tests and coverage
+- Branch/tag mapping
+- Run report persistence
+
+## Deliverables
+
+### 1. WorkflowOrchestrator (`packages/tm-core/src/services/workflow-orchestrator.ts`)
+
+**Responsibilities:**
+- State machine driving phases: Preflight → Branch/Tag → SubtaskIter → Finalize
+- Event emission for progress tracking
+- Coordination of Git, Test, and Executor adapters
+- Run state persistence
+
+**API:**
+```typescript
+class WorkflowOrchestrator {
+  async executeTask(taskId: string, options: AutopilotOptions): Promise<RunResult>
+  async resume(runId: string): Promise<RunResult>
+  on(event: string, handler: (data: any) => void): void
+
+  // Events emitted:
+  // - 'phase:start' { phase, timestamp }
+  // - 'phase:complete' { phase, status, timestamp }
+  // - 'subtask:start' { subtaskId, phase }
+  // - 'subtask:complete' { subtaskId, phase, status }
+  // - 'test:run' { subtaskId, phase, results }
+  // - 'commit:created' { subtaskId, sha, message }
+  // - 'error' { phase, error, recoverable }
+}
+```
+
+**State Machine Phases:**
+1. Preflight - validate environment
+2. BranchSetup - create branch, set tag
+3. SubtaskLoop - for each subtask: RED → GREEN → COMMIT
+4. Finalize - full test suite, coverage check
+5. Complete - run report, cleanup
+
+### 2. GitAdapter (`packages/tm-core/src/services/git-adapter.ts`)
+
+**Responsibilities:**
+- All git operations with safety checks
+- Branch name generation from tag/task
+- Confirmation gates for destructive operations
+
+**API:**
+```typescript
+class GitAdapter {
+  async isWorkingTreeClean(): Promise<boolean>
+  async getCurrentBranch(): Promise<string>
+  async getDefaultBranch(): Promise<string>
+  async createBranch(name: string): Promise<void>
+  async checkoutBranch(name: string): Promise<void>
+  async commit(message: string, files?: string[]): Promise<string>
+  async push(branch: string, remote?: string): Promise<void>
+
+  // Safety checks
+  async assertNotOnDefaultBranch(): Promise<void>
+  async assertCleanOrConfirm(): Promise<void>
+
+  // Branch naming
+  generateBranchName(tag: string, taskId: string, slug: string): string
+}
+```
+
+**Guardrails:**
+- Never allow commits on default branch
+- Always check working tree before branch creation
+- Confirm destructive operations unless `--no-confirm` flag
+
+### 3. TestRunnerAdapter (`packages/tm-core/src/services/test-runner-adapter.ts`)
+
+**Responsibilities:**
+- Detect test command from package.json
+- Execute tests (targeted and full suite)
+- Parse test results and coverage
+- Enforce coverage thresholds
+
+**API:**
+```typescript
+class TestRunnerAdapter {
+  async detectTestCommand(): Promise<string>
+  async runTargeted(pattern: string): Promise<TestResults>
+  async runAll(): Promise<TestResults>
+  async getCoverage(): Promise<CoverageReport>
+  async meetsThresholds(coverage: CoverageReport): Promise<boolean>
+}
+
+interface TestResults {
+  exitCode: number
+  duration: number
+  summary: {
+    total: number
+    passed: number
+    failed: number
+    skipped: number
+  }
+  failures: Array<{
+    test: string
+    error: string
+    stack?: string
+  }>
+}
+
+interface CoverageReport {
+  lines: number
+  branches: number
+  functions: number
+  statements: number
+}
+```
+
+**Detection Logic:**
+- Check package.json → scripts.test
+- Support: npm test, pnpm test, yarn test, bun test
+- Fall back to explicit command from config
+
+### 4. Test Generation Integration
+
+**Use Surgical Test Generator:**
+- Load prompt from `.claude/agents/surgical-test-generator.md`
+- Compose with task/subtask context
+- Generate tests via executor (Claude)
+- Write test files to detected locations
+
+**Prompt Composition:**
+```typescript
+async function composeRedPrompt(subtask: Subtask, context: ProjectContext): Promise<string> {
+  const systemPrompts = [
+    loadFile('.cursor/rules/git_workflow.mdc'),
+    loadFile('.cursor/rules/test_workflow.mdc'),
+    loadFile('.claude/agents/surgical-test-generator.md')
+  ]
+
+  const taskContext = formatTaskContext(subtask)
+  const instruction = formatRedInstruction(subtask, context)
+
+  return [
+    ...systemPrompts,
+    '<TASK CONTEXT>',
+    taskContext,
+    '<INSTRUCTION>',
+    instruction
+  ].join('\n\n')
+}
+```
+
+### 5. Subtask Loop Implementation
+
+**RED Phase:**
+1. Compose test generation prompt with subtask context
+2. Execute via Claude executor
+3. Parse generated test file paths and code
+4. Write test files to filesystem
+5. Run tests to confirm they fail (red state)
+6. Store test results in run artifacts
+7. If tests pass unexpectedly, warn and skip to next subtask
+
+**GREEN Phase:**
+1. Compose implementation prompt with test failures
+2. Execute via Claude executor with max attempts (default: 3)
+3. Parse implementation changes
+4. Apply changes to filesystem
+5. Run tests to verify passing (green state)
+6. If tests still fail after max attempts:
+   - Save current state
+   - Emit pause event
+   - Return resumable checkpoint
+7. If tests pass, proceed to COMMIT
+
+**COMMIT Phase:**
+1. Verify all tests pass
+2. Check coverage meets thresholds (if enabled)
+3. Generate conventional commit message
+4. Stage test files + implementation files
+5. Commit with message
+6. Update subtask status to 'done'
+7. Emit commit event with SHA
+8. Continue to next subtask
+
+### 6. Branch & Tag Management
+
+**Integration with existing tag system:**
+- Use `scripts/modules/task-manager/tag-management.js`
+- Explicit tag switching when branch created
+- Store branch ↔ tag mapping in run state
+
+**Branch Naming:**
+- Pattern from config: `{tag}/task-{id}-{slug}`
+- Default: `analytics/task-42-user-metrics`
+- Sanitize: lowercase, replace spaces with hyphens
+
+### 7. Run Artifacts & State Persistence
+
+**Directory structure:**
+```
+.taskmaster/reports/runs/<run-id>/
+├── manifest.json          # run metadata
+├── log.jsonl              # event stream
+├── commits.txt            # commit SHAs
+├── test-results/
+│   ├── subtask-42.1-red.json
+│   ├── subtask-42.1-green.json
+│   ├── subtask-42.2-red.json
+│   ├── subtask-42.2-green-attempt1.json
+│   ├── subtask-42.2-green-attempt2.json
+│   └── final-suite.json
+└── state.json             # resumable checkpoint
+```
+
+**manifest.json:**
+```json
+{
+  "runId": "2025-01-15-142033",
+  "taskId": "42",
+  "tag": "analytics",
+  "branch": "analytics/task-42-user-metrics",
+  "startTime": "2025-01-15T14:20:33Z",
+  "endTime": null,
+  "status": "in-progress",
+  "currentPhase": "subtask-loop",
+  "currentSubtask": "42.2",
+  "subtasksCompleted": ["42.1"],
+  "subtasksFailed": [],
+  "totalCommits": 1
+}
+```
+
+**log.jsonl** (append-only event log):
+```jsonl
+{"ts":"2025-01-15T14:20:33Z","event":"phase:start","phase":"preflight","status":"ok"}
+{"ts":"2025-01-15T14:21:00Z","event":"subtask:start","subtask":"42.1","phase":"red"}
+{"ts":"2025-01-15T14:22:00Z","event":"test:run","subtask":"42.1","phase":"red","results":{"passed":0,"failed":3}}
+{"ts":"2025-01-15T14:23:00Z","event":"subtask:start","subtask":"42.1","phase":"green"}
+{"ts":"2025-01-15T14:24:30Z","event":"test:run","subtask":"42.1","phase":"green","attempt":1,"results":{"passed":3,"failed":0}}
+{"ts":"2025-01-15T14:24:35Z","event":"commit:created","subtask":"42.1","sha":"a1b2c3d","message":"feat(metrics): add metrics schema (task 42.1)"}
+```
+
+### 8. CLI Command Implementation
+
+**Update `tm autopilot` command:**
+- Remove `--dry-run` only behavior
+- Execute actual workflow when flag not present
+- Add progress reporting via orchestrator events
+- Support `--no-confirm` for CI/automation
+- Support `--max-attempts` to override default
+
+**Real-time output:**
+```bash
+$ tm autopilot 42
+
+🚀 Starting autopilot for Task #42 [analytics]: User metrics tracking
+
+✓ Preflight checks passed
+✓ Created branch: analytics/task-42-user-metrics
+✓ Set active tag: analytics
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+[1/3] Subtask 42.1: Add metrics schema
+
+  RED   Generating tests... ⏳
+  RED   ✓ Tests created: src/__tests__/schema.test.js
+  RED   ✓ Tests failing: 3 failed, 0 passed
+
+  GREEN Implementing code... ⏳
+  GREEN ✓ Tests passing: 3 passed, 0 failed (attempt 1)
+
+  COMMIT ✓ Committed: a1b2c3d
+         "feat(metrics): add metrics schema (task 42.1)"
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+[2/3] Subtask 42.2: Add collection endpoint
+  ...
+```
+
+## Success Criteria
+- Can execute a simple task end-to-end without manual intervention
+- All commits made on feature branch, never on default branch
+- Tests are generated before implementation (RED → GREEN order enforced)
+- Only commits when tests pass and coverage meets threshold
+- Run state is persisted and can be inspected post-run
+- Clear error messages when things go wrong
+- Orchestrator events allow CLI to show live progress
+
+## Configuration
+
+**Add to `.taskmaster/config.json`:**
+```json
+{
+  "autopilot": {
+    "enabled": true,
+    "requireCleanWorkingTree": true,
+    "commitTemplate": "{type}({scope}): {msg}",
+    "defaultCommitType": "feat",
+    "maxGreenAttempts": 3,
+    "testTimeout": 300000
+  },
+  "test": {
+    "runner": "auto",
+    "coverageThresholds": {
+      "lines": 80,
+      "branches": 80,
+      "functions": 80,
+      "statements": 80
+    },
+    "targetedRunPattern": "**/*.test.js"
+  },
+  "git": {
+    "branchPattern": "{tag}/task-{id}-{slug}",
+    "defaultRemote": "origin"
+  }
+}
+```
+
+## Out of Scope (defer to Phase 2)
+- PR creation (gh integration)
+- Resume functionality (`--resume` flag)
+- Lint/format step
+- Multiple executor support (only Claude)
+
+## Implementation Order
+1. GitAdapter with safety checks
+2. TestRunnerAdapter with detection logic
+3. WorkflowOrchestrator state machine skeleton
+4. RED phase: test generation integration
+5. GREEN phase: implementation with retry logic
+6. COMMIT phase: gating and persistence
+7. CLI command wiring with event handling
+8. Run artifacts and logging
+
+## Testing Strategy
+- Unit tests for each adapter (mock git/test commands)
+- Integration tests with real git repo (temporary directory)
+- End-to-end test with sample task in test project
+- Verify no commits on default branch (security test)
+- Verify commit gating works (force test failure, ensure no commit)
+
+## Dependencies
+- Phase 0 completed (CLI skeleton, preflight checks)
+- Existing TaskService and executor infrastructure
+- Surgical Test Generator prompt file exists
+
+## Estimated Effort
+2-3 weeks
+
+## Risks & Mitigations
+- **Risk:** Test generation produces invalid/wrong tests
+  - **Mitigation:** Use Surgical Test Generator prompt, add manual review step in early iterations
+
+- **Risk:** Implementation attempts timeout/fail repeatedly
+  - **Mitigation:** Max attempts with pause/resume; store state for manual intervention
+
+- **Risk:** Coverage parsing fails on different test frameworks
+  - **Mitigation:** Start with one framework (vitest), add parsers incrementally
+
+- **Risk:** Git operations fail (conflicts, permissions)
+  - **Mitigation:** Detailed error messages, save state before destructive ops
+
+## Validation
+Test with:
+- Simple task (1 subtask, clear requirements)
+- Medium task (3 subtasks with dependencies)
+- Task requiring multiple GREEN attempts
+- Task with dirty working tree (should error)
+- Task on default branch (should error)
+- Project without test command (should error with helpful message)