claude-task-master/.taskmaster/docs/tdd-workflow-phase-3-extensibility-guardrails.md
Ralph Khreish ccb87a516a feat: implement tdd workflow (#1309)
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-18 16:29:03 +02:00

Phase 3: Extensibility + Guardrails - Autonomous TDD Workflow

Objective

Add multi-language/framework support, enhanced safety guardrails, an optional TUI, and an extension API for IDE/editor integration.

Scope

  • Multi-language test runner support (pytest, go test, etc.)
  • Enhanced safety: diff preview, confirmation gates, minimal-change prompts
  • Optional TUI panel with tmux integration
  • State-based extension API for IDE integration
  • Parallel subtask execution (experimental)

Deliverables

1. Multi-Language Test Runner Support

Extend TestRunnerAdapter:

class TestRunnerAdapter {
  // Existing methods...

  async detectLanguage(): Promise<Language>
  async detectFramework(language: Language): Promise<Framework>
  async getFrameworkAdapter(framework: Framework): Promise<FrameworkAdapter>
}

enum Language {
  JavaScript = 'javascript',
  TypeScript = 'typescript',
  Python = 'python',
  Go = 'go',
  Rust = 'rust'
}

enum Framework {
  Vitest = 'vitest',
  Jest = 'jest',
  Pytest = 'pytest',
  GoTest = 'gotest',
  CargoTest = 'cargotest'
}

interface FrameworkAdapter {
  runTargeted(pattern: string): Promise<TestResults>
  runAll(): Promise<TestResults>
  parseCoverage(output: string): Promise<CoverageReport>
  getTestFilePattern(): string
  getTestFileExtension(): string
}

Framework-specific adapters:

PytestAdapter (packages/tm-core/src/services/test-adapters/pytest-adapter.ts):

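class PytestAdapter implements FrameworkAdapter {
  async runTargeted(pattern: string): Promise<TestResults> {
    // Requires the pytest-json-report plugin, which writes its report to
    // .report.json by default rather than to stdout
    const output = await exec(`pytest ${pattern} --json-report`)
    return this.parseResults(output)
  }

  async runAll(): Promise<TestResults> {
    // --cov requires the pytest-cov plugin
    const output = await exec('pytest --cov --json-report')
    return this.parseResults(output)
  }

  async parseCoverage(output: string): Promise<CoverageReport> {
    // TODO: parse pytest-cov XML output (generated with --cov-report=xml)
    throw new Error('PytestAdapter.parseCoverage is not implemented yet')
  }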

  getTestFilePattern(): string {
    return '**/test_*.py'
  }

  getTestFileExtension(): string {
    return '.py'
  }
}
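The adapter above leaves parseResults unspecified. A minimal sketch of one way to map a pytest-json-report payload onto a result object follows. The TestResults shape and the summarizePytestReport name are hypothetical (the real type is not shown in this document), and the sketch assumes the report's summary object holds per-outcome counts such as passed, failed, error, and skipped plus a total, with zero-count outcomes omitted.

```typescript
// Hypothetical result shape; the real TestResults type is not shown in this document.
interface TestResults {
  passed: number
  failed: number
  skipped: number
  total: number
  success: boolean
}

// Assumed subset of the JSON written by the pytest-json-report plugin.
interface PytestJsonReport {
  summary?: { [outcome: string]: number | undefined }
}

function summarizePytestReport(reportJson: string): TestResults {
  const report = JSON.parse(reportJson) as PytestJsonReport
  const summary = report.summary ?? {}
  // Outcomes with no tests are assumed to be absent, so default to 0.
  const count = (key: string): number => summary[key] ?? 0

  const passed = count('passed')
  // Setup/collection errors are counted as failures so they cannot pass silently.
  const failed = count('failed') + count('error')
  const skipped = count('skipped')
  const total = count('total')

  // An empty run is not treated as a success.
  return { passed, failed, skipped, total, success: failed === 0 && total > 0 }
}

const exampleReport = JSON.stringify({ summary: { passed: 3, failed: 1, total: 4 } })
console.log(summarizePytestReport(exampleReport))
```

Counting an empty run as unsuccessful is a design choice for the RED phase: a pattern that matches no tests should not look like a green run.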

GoTestAdapter (packages/tm-core/src/services/test-adapters/gotest-adapter.ts):

class GoTestAdapter implements FrameworkAdapter {
  async runTargeted(pattern: string): Promise<TestResults> {
    const output = await exec(`go test ${pattern} -json`)
    return this.parseResults(output)
  }

  async runAll(): Promise<TestResults> {
    const output = await exec('go test ./... -coverprofile=coverage.out -json')
    return this.parseResults(output)
  }

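  async parseCoverage(output: string): Promise<CoverageReport> {
    // TODO: parse the coverage profile written to coverage.out by -coverprofile
    throw new Error('GoTestAdapter.parseCoverage is not implemented yet')
  }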

  getTestFilePattern(): string {
    return '**/*_test.go'
  }

  getTestFileExtension(): string {
    return '_test.go'
  }
}
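For the Go adapter, parseResults has to consume the newline-delimited JSON events that go test -json emits. The sketch below shows one possible approach: it counts terminal pass, fail, and skip events that carry a Test field and ignores package-level events. GoTestSummary and parseGoTestJson are hypothetical names, not part of the planned API.

```typescript
// Hypothetical summary shape; the real TestResults type is not shown in this document.
interface GoTestSummary {
  passed: number
  failed: number
  skipped: number
  failedTests: string[]
}

// Subset of the fields emitted per line by `go test -json`.
interface GoTestEvent {
  Action: string
  Package?: string
  Test?: string
  Output?: string
}

function parseGoTestJson(output: string): GoTestSummary {
  const results: GoTestSummary = { passed: 0, failed: 0, skipped: 0, failedTests: [] }

  for (const line of output.split('\n')) {
    const trimmed = line.trim()
    if (trimmed === '') continue

    let parsed: GoTestEvent
    try {
      parsed = JSON.parse(trimmed) as GoTestEvent
    } catch {
      continue // Non-JSON lines (for example build errors) can be interleaved.
    }

    // Events without a Test field describe the whole package, not a single test.
    if (!parsed.Test) continue

    // Subtests report their own events, so they are counted individually.
    if (parsed.Action === 'pass') results.passed++
    else if (parsed.Action === 'skip') results.skipped++
    else if (parsed.Action === 'fail') {
      results.failed++
      results.failedTests.push(`${parsed.Package ?? ''}.${parsed.Test}`)
    }
  }

  return results
}

const sampleOutput = [
  { Action: 'run', Package: 'example.com/metrics', Test: 'TestCreate' },
  { Action: 'pass', Package: 'example.com/metrics', Test: 'TestCreate' },
  { Action: 'fail', Package: 'example.com/metrics', Test: 'TestDelete' },
  { Action: 'fail', Package: 'example.com/metrics' }
].map(e => JSON.stringify(e)).join('\n')

console.log(parseGoTestJson(sampleOutput))
```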

Detection Logic:

// Checks run in order, so the first match wins in polyglot repositories
async function detectFramework(): Promise<Framework> {
  // JavaScript/TypeScript: inspect package.json dependencies
  if (await exists('package.json')) {
    const pkg = await readJSON('package.json')
    const deps = { ...pkg.dependencies, ...pkg.devDependencies }
    if (deps.vitest) return Framework.Vitest
    if (deps.jest) return Framework.Jest
  }

  // Python: look for project marker files (setup.py alone is a weak signal for pytest)
  if (await exists('pytest.ini') || await exists('setup.py')) {
    return Framework.Pytest
  }

  // Go: look for the module file
  if (await exists('go.mod')) {
    return Framework.GoTest
  }

  // Rust: look for the Cargo manifest
  if (await exists('Cargo.toml')) {
    return Framework.CargoTest
  }

  throw new Error('Could not detect test framework')
}
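Because the detection order decides the result in mixed-language repositories, it can help to keep the decision logic free of file system calls so it is easy to unit test. The following is a sketch under that assumption: ProjectSnapshot, MARKERS, and detectFrameworkFrom are hypothetical names, and the string values mirror the Framework enum defined earlier.

```typescript
type FrameworkName = 'vitest' | 'jest' | 'pytest' | 'gotest' | 'cargotest'

// Hypothetical input: what a caller has already read from the project root.
interface ProjectSnapshot {
  files: string[]
  packageJson?: {
    dependencies?: Record<string, string>
    devDependencies?: Record<string, string>
  }
}

// Marker files checked in order after package.json; the first match wins.
const MARKERS: Array<{ framework: FrameworkName; files: string[] }> = [
  { framework: 'pytest', files: ['pytest.ini', 'setup.py'] },
  { framework: 'gotest', files: ['go.mod'] },
  { framework: 'cargotest', files: ['Cargo.toml'] }
]

function detectFrameworkFrom(snapshot: ProjectSnapshot): FrameworkName | undefined {
  const pkg = snapshot.packageJson
  if (pkg) {
    const deps = { ...pkg.dependencies, ...pkg.devDependencies }
    if ('vitest' in deps) return 'vitest'
    if ('jest' in deps) return 'jest'
    // A package.json without a known runner falls through to the marker files.
  }

  for (const marker of MARKERS) {
    if (marker.files.some(f => snapshot.files.indexOf(f) !== -1)) {
      return marker.framework
    }
  }

  return undefined // Caller decides whether this is an error.
}

// A Go service that ships a package.json only for tooling scripts.
console.log(detectFrameworkFrom({
  files: ['package.json', 'go.mod'],
  packageJson: { devDependencies: { prettier: '^3.0.0' } }
}))
```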

2. Enhanced Safety Guardrails

Diff Preview Mode:

$ tm autopilot 42 --preview-diffs

[2/3] Subtask 42.2: Add collection endpoint

  RED   ✓ Tests created: src/api/__tests__/metrics.test.js

  GREEN Implementing code...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Proposed changes (src/api/metrics.js):

  + import { MetricsSchema } from '../models/schema.js'
  +
  + export async function createMetric(data) {
  +   const validated = MetricsSchema.parse(data)
  +   const result = await db.metrics.create(validated)
  +   return result
  + }

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Apply these changes? [Y/n/e(dit)/s(kip)]
  Y - Apply and continue
  n - Reject and retry GREEN phase
  e - Open in editor for manual changes
  s - Skip this subtask

Minimal Change Enforcement:

Add to system prompt:

CRITICAL: Make MINIMAL changes to pass the failing tests.
- Only modify files directly related to the subtask
- Do not refactor existing code unless absolutely necessary
- Do not add features beyond the acceptance criteria
- Keep changes under 50 lines per file when possible
- Prefer composition over modification

Change Size Warnings:

⚠️  Large change detected:
  Files modified: 5
  Lines changed: +234, -12

This subtask was expected to be small (~50 lines).
Consider:
  - Breaking into smaller subtasks
  - Reviewing acceptance criteria
  - Checking for unintended changes

Continue anyway? [y/N]
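One way to obtain the numbers shown in this warning is to parse the output of git diff --numstat, which prints one tab-separated line per file (added, deleted, path). The sketch below assumes that input; ChangeSize, parseNumstat, and exceedsPerFileLimit are hypothetical names, and the limit is meant to correspond to a per-file threshold such as the one in the configuration.

```typescript
interface ChangeSize {
  filesModified: number
  added: number
  deleted: number
  largestFileChange: number // added + deleted for the most-changed file
}

function parseNumstat(numstat: string): ChangeSize {
  const size: ChangeSize = { filesModified: 0, added: 0, deleted: 0, largestFileChange: 0 }

  for (const line of numstat.split('\n')) {
    const fields = line.split('\t')
    if (fields.length < 3) continue // blank or unexpected line

    // Binary files report "-" for both counts; parseInt yields NaN, mapped to 0 here.
    const added = parseInt(fields[0], 10) || 0
    const deleted = parseInt(fields[1], 10) || 0

    size.filesModified++
    size.added += added
    size.deleted += deleted
    size.largestFileChange = Math.max(size.largestFileChange, added + deleted)
  }

  return size
}

function exceedsPerFileLimit(size: ChangeSize, maxChangeLinesPerFile: number): boolean {
  return size.largestFileChange > maxChangeLinesPerFile
}

const numstatSample = '120\t4\tsrc/api/metrics.js\n-\t-\tassets/logo.png\n3\t0\tREADME.md\n'
const sampleSize = parseNumstat(numstatSample)
console.log(sampleSize, exceedsPerFileLimit(sampleSize, 100))
```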

3. TUI Interface with tmux

Layout:

┌──────────────────────────────────┬─────────────────────────────────┐
│ Task Navigator (left)            │ Executor Terminal (right)       │
│                                  │                                 │
│ Project: my-app                  │ $ tm autopilot --executor-mode  │
│ Branch: analytics/task-42        │ > Running subtask 42.2 GREEN... │
│ Tag: analytics                   │ > Implementing endpoint...      │
│                                  │ > Tests: 3 passed, 0 failed     │
│ Tasks:                           │ > Ready to commit               │
│ → 42 [in-progress] User metrics  │                                 │
│   → 42.1 [done] Schema           │ [Live output from executor]     │
│   → 42.2 [active] Endpoint ◀     │                                 │
│   → 42.3 [pending] Dashboard     │                                 │
│                                  │                                 │
│ [s] start  [p] pause  [q] quit   │                                 │
└──────────────────────────────────┴─────────────────────────────────┘

Implementation:

TUI Navigator (apps/cli/src/ui/tui/navigator.ts):

import blessed from 'blessed'
import { watch } from 'node:fs'

class AutopilotTUI {
  private screen: blessed.Widgets.Screen
  private taskList: blessed.Widgets.ListElement
  private statusBox: blessed.Widgets.BoxElement
  private executorPane: string  // tmux pane ID

  async start(taskId?: string) {
    // Create blessed screen
    this.screen = blessed.screen()

    // Create task list widget and attach it to the screen
    this.taskList = blessed.list({
      parent: this.screen,
      label: 'Tasks',
      keys: true,
      vi: true,
      style: { selected: { bg: 'blue' } }
    })

    // Spawn tmux pane for executor
    this.executorPane = await this.spawnExecutorPane()

    // Watch state file for updates
    this.watchStateFile()

    // Handle keybindings
    this.setupKeybindings()

    // Draw the initial frame
    this.screen.render()
  }

  private async spawnExecutorPane(): Promise<string> {
    // Trim the trailing newline from tmux output before using the ID as a target
    const paneId = (await exec('tmux split-window -h -P -F "#{pane_id}"')).trim()
    await exec(`tmux send-keys -t ${paneId} "tm autopilot --executor-mode" Enter`)
    return paneId
  }

  private watchStateFile() {
    watch('.taskmaster/state/current-run.json', (event, filename) => {
      this.updateDisplay()
    })
  }

  private setupKeybindings() {
    this.screen.key(['s'], () => this.startTask())
    this.screen.key(['p'], () => this.pauseTask())
    this.screen.key(['q'], () => this.quit())
    this.screen.key(['up', 'down'], () => this.navigateTasks())
  }
}

Executor Mode:

$ tm autopilot 42 --executor-mode

# Runs in executor pane, writes state to shared file
# Left pane reads state file and updates display

State File (.taskmaster/state/current-run.json):

{
  "runId": "2025-01-15-142033",
  "taskId": "42",
  "status": "running",
  "currentPhase": "green",
  "currentSubtask": "42.2",
  "lastOutput": "Implementing endpoint...",
  "testsStatus": {
    "passed": 3,
    "failed": 0
  }
}
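Readers of this file (the TUI pane or an IDE extension) may see it while the executor is still writing, so a tolerant parser is useful. Below is a minimal sketch of a typed reader; the RunState interface mirrors the fields in the example above, status and currentPhase are kept as plain strings because their full value sets are not specified here, and parseRunState is a hypothetical helper name.

```typescript
interface RunState {
  runId: string
  taskId: string
  status: string
  currentPhase: string
  currentSubtask: string
  lastOutput: string
  testsStatus: { passed: number; failed: number }
}

const STRING_FIELDS = ['runId', 'taskId', 'status', 'currentPhase', 'currentSubtask', 'lastOutput']

// Returns undefined instead of throwing so a watcher can simply wait for the next change.
function parseRunState(raw: string): RunState | undefined {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return undefined // The file may have been read in the middle of a write.
  }

  if (typeof data !== 'object' || data === null) return undefined
  const obj = data as Record<string, unknown>

  for (const key of STRING_FIELDS) {
    if (typeof obj[key] !== 'string') return undefined
  }

  const tests = obj['testsStatus']
  if (typeof tests !== 'object' || tests === null) return undefined
  const counts = tests as Record<string, unknown>
  if (typeof counts['passed'] !== 'number' || typeof counts['failed'] !== 'number') {
    return undefined
  }

  return data as RunState
}

const stateJson = JSON.stringify({
  runId: '2025-01-15-142033',
  taskId: '42',
  status: 'running',
  currentPhase: 'green',
  currentSubtask: '42.2',
  lastOutput: 'Implementing endpoint...',
  testsStatus: { passed: 3, failed: 0 }
})

console.log(parseRunState(stateJson)?.currentSubtask)
```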

4. Extension API for IDE Integration

State-based API:

Expose run state via JSON files that IDEs can read:

  • .taskmaster/state/current-run.json - live run state
  • .taskmaster/reports/runs/<runId>/manifest.json - run metadata
  • .taskmaster/reports/runs/<runId>/log.jsonl - event stream
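An IDE that tails log.jsonl will usually receive the file in arbitrary chunks, so a line may arrive split across two reads. The sketch below buffers the incomplete tail and returns only complete events. JsonlBuffer is a hypothetical helper, and the event fields used in the example (type, subtaskId) are assumptions borrowed from the WebSocket example in this document, since the log schema is not defined here.

```typescript
class JsonlBuffer {
  private pending = ''

  // Feed a chunk of text; returns the events completed by this chunk.
  push(chunk: string): unknown[] {
    this.pending += chunk
    const lines = this.pending.split('\n')
    // The last element is an incomplete line, or '' when the chunk ended with a newline.
    this.pending = lines.pop() ?? ''

    const events: unknown[] = []
    for (const line of lines) {
      if (line.trim() === '') continue
      try {
        events.push(JSON.parse(line))
      } catch {
        // Skip malformed lines rather than aborting the stream.
      }
    }
    return events
  }
}

const reader = new JsonlBuffer()
const firstBatch = reader.push('{"type":"subtask:complete","subtaskId":"42.1"}\n{"type":"subta')
const secondBatch = reader.push('sk:complete","subtaskId":"42.2"}\n')
console.log(firstBatch.length, secondBatch.length)
```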

WebSocket API (optional):

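// packages/tm-core/src/services/autopilot-server.ts
import { WebSocketServer } from 'ws'

class AutopilotServer {
  private wss: WebSocketServer

  start(port: number = 7890) {
    // Bind to loopback only, matching the "allowRemote": false default
    this.wss = new WebSocketServer({ port, host: '127.0.0.1' })

    this.wss.on('connection', (ws) => {
      // Send current state
      ws.send(JSON.stringify(this.getCurrentState()))

      // Stream events. Node's built-in EventEmitter has no '*' wildcard,
      // so this assumes the orchestrator emitter supports one.
      const forward = (event: unknown) => ws.send(JSON.stringify(event))
      this.orchestrator.on('*', forward)

      // Detach on disconnect so closed sockets do not leak listeners
      ws.on('close', () => this.orchestrator.off('*', forward))
    })
  }
}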

Usage from IDE extension:

// VS Code extension example
const ws = new WebSocket('ws://localhost:7890')

ws.on('message', (data) => {
  // The ws library delivers payloads as Buffers; convert before parsing
  const event = JSON.parse(data.toString())

  if (event.type === 'subtask:complete') {
    vscode.window.showInformationMessage(
      `Subtask ${event.subtaskId} completed`
    )
  }
})

5. Parallel Subtask Execution (Experimental)

Dependency Analysis:

class SubtaskScheduler {
  async buildDependencyGraph(subtasks: Subtask[]): Promise<DAG> {
    const graph = new DAG()

    for (const subtask of subtasks) {
      graph.addNode(subtask.id)

      for (const depId of subtask.dependencies) {
        graph.addEdge(depId, subtask.id)
      }
    }

    return graph
  }

  async getParallelBatches(graph: DAG): Promise<Subtask[][]> {
    const batches: Subtask[][] = []
    const completed = new Set<string>()

    while (completed.size < graph.size()) {
      const ready = graph.nodes.filter(node =>
        !completed.has(node.id) &&
        node.dependencies.every(dep => completed.has(dep))
      )

      // Without this guard, a cycle or a dependency on a missing subtask loops forever
      if (ready.length === 0) {
        throw new Error('Unresolvable dependencies: cycle or missing subtask')
      }

      batches.push(ready)
      ready.forEach(node => completed.add(node.id))
    }

    return batches
  }
}
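The batching idea can be tried in isolation without the DAG class. The following sketch layers subtasks into batches where every dependency of a batch member was completed in an earlier batch, and fails fast when no progress is possible. SubtaskNode and planBatches are hypothetical names, and the sample data mirrors the subtasks of task 42 used in the example output in this section.

```typescript
interface SubtaskNode {
  id: string
  dependencies: string[]
}

function planBatches(subtasks: SubtaskNode[]): string[][] {
  const done: Record<string, boolean> = {}
  const batches: string[][] = []
  let remaining = subtasks.slice()

  while (remaining.length > 0) {
    // A subtask is ready once all of its dependencies finished in earlier batches.
    const ready = remaining.filter(s => s.dependencies.every(d => done[d] === true))

    if (ready.length === 0) {
      // Either a cycle or a dependency that is not part of the input.
      const stuck = remaining.map(s => s.id).join(', ')
      throw new Error(`Unresolvable dependencies among: ${stuck}`)
    }

    batches.push(ready.map(s => s.id))
    for (const s of ready) done[s.id] = true
    remaining = remaining.filter(s => done[s.id] !== true)
  }

  return batches
}

const task42: SubtaskNode[] = [
  { id: '42.1', dependencies: [] },
  { id: '42.2', dependencies: ['42.1'] },
  { id: '42.3', dependencies: ['42.1'] },
  { id: '42.4', dependencies: [] }
]

console.log(planBatches(task42))
// → [ [ '42.1', '42.4' ], [ '42.2', '42.3' ] ]
```

A cap such as maxConcurrent can be applied afterwards by slicing each batch into chunks; that step is omitted here.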

Parallel Execution:

$ tm autopilot 42 --parallel

[Batch 1] Running 2 subtasks in parallel:
  → 42.1: Add metrics schema
  → 42.4: Add API documentation

  42.1 RED   ✓ Tests created
  42.4 RED   ✓ Tests created

  42.1 GREEN ✓ Implementation complete
  42.4 GREEN ✓ Implementation complete

  42.1 COMMIT ✓ Committed: a1b2c3d
  42.4 COMMIT ✓ Committed: e5f6a7b

[Batch 2] Running 2 subtasks in parallel (depend on 42.1):
  → 42.2: Add collection endpoint
  → 42.3: Add dashboard widget
  ...

Conflict Detection:

async function detectConflicts(subtasks: Subtask[]): Promise<Conflict[]> {
  const conflicts: Conflict[] = []

  // Predict once per subtask instead of once per pair, so every comparison
  // uses the same prediction for a given subtask
  const predicted = await Promise.all(subtasks.map(s => predictAffectedFiles(s)))

  for (let i = 0; i < subtasks.length; i++) {
    for (let j = i + 1; j < subtasks.length; j++) {
      const overlap = predicted[i].filter(f => predicted[j].includes(f))

      if (overlap.length > 0) {
        conflicts.push({
          subtasks: [subtasks[i].id, subtasks[j].id],
          files: overlap
        })
      }
    }
  }

  return conflicts
}
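Once conflicts are known, the scheduler still has to decide what to run together. One conservative option, sketched below, is to split a batch greedily into groups whose predicted files do not overlap, then run the groups one after another while the members of each group run in parallel. PredictedChange and splitIntoConflictFreeGroups are hypothetical names, and the file predictions are assumed to come from something like predictAffectedFiles.

```typescript
interface PredictedChange {
  subtaskId: string
  files: string[]
}

interface Group {
  subtaskIds: string[]
  files: string[] // union of files claimed by the group's members
}

function splitIntoConflictFreeGroups(batch: PredictedChange[]): string[][] {
  const groups: Group[] = []

  for (const change of batch) {
    // Find the first existing group that shares no file with this subtask.
    let target: Group | undefined
    for (const group of groups) {
      const overlaps = change.files.some(f => group.files.indexOf(f) !== -1)
      if (!overlaps) {
        target = group
        break
      }
    }

    // No compatible group: start a new one, which will run after the others.
    if (!target) {
      target = { subtaskIds: [], files: [] }
      groups.push(target)
    }

    target.subtaskIds.push(change.subtaskId)
    target.files.push(...change.files)
  }

  return groups.map(g => g.subtaskIds)
}

console.log(splitIntoConflictFreeGroups([
  { subtaskId: '42.2', files: ['src/api/metrics.js'] },
  { subtaskId: '42.3', files: ['src/api/metrics.js', 'src/ui/dashboard.js'] },
  { subtaskId: '42.4', files: ['docs/api.md'] }
]))
// → [ [ '42.2', '42.4' ], [ '42.3' ] ]
```

The greedy pass does not minimize the number of groups in every case, but it guarantees that no two subtasks in the same group are predicted to touch the same file.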

6. Advanced Configuration

Add to .taskmaster/config.json:

{
  "autopilot": {
    "safety": {
      "previewDiffs": false,
      "maxChangeLinesPerFile": 100,
      "warnOnLargeChanges": true,
      "requireConfirmOnLargeChanges": true
    },
    "parallel": {
      "enabled": false,
      "maxConcurrent": 3,
      "detectConflicts": true
    },
    "tui": {
      "enabled": false,
      "tmuxSession": "taskmaster-autopilot"
    },
    "api": {
      "enabled": false,
      "port": 7890,
      "allowRemote": false
    }
  },
  "test": {
    "frameworks": {
      "python": {
        "runner": "pytest",
        "coverageCommand": "pytest --cov",
        "testPattern": "**/test_*.py"
      },
      "go": {
        "runner": "go test",
        "coverageCommand": "go test ./... -coverprofile=coverage.out",
        "testPattern": "**/*_test.go"
      }
    }
  }
}
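Since every option above is optional in practice, the loader needs defaults and basic validation. The sketch below merges a partial user config over defaults taken from the example above and rejects obviously invalid values. The type names and resolveAutopilotConfig are hypothetical, the tui section is omitted for brevity, and the validation rules are assumptions rather than specified behavior.

```typescript
interface AutopilotConfig {
  safety: {
    previewDiffs: boolean
    maxChangeLinesPerFile: number
    warnOnLargeChanges: boolean
    requireConfirmOnLargeChanges: boolean
  }
  parallel: { enabled: boolean; maxConcurrent: number; detectConflicts: boolean }
  api: { enabled: boolean; port: number; allowRemote: boolean }
}

// Each section and each field inside it may be omitted by the user.
type PartialAutopilotConfig = { [K in keyof AutopilotConfig]?: Partial<AutopilotConfig[K]> }

const DEFAULT_CONFIG: AutopilotConfig = {
  safety: {
    previewDiffs: false,
    maxChangeLinesPerFile: 100,
    warnOnLargeChanges: true,
    requireConfirmOnLargeChanges: true
  },
  parallel: { enabled: false, maxConcurrent: 3, detectConflicts: true },
  api: { enabled: false, port: 7890, allowRemote: false }
}

function resolveAutopilotConfig(user: PartialAutopilotConfig): AutopilotConfig {
  const config: AutopilotConfig = {
    safety: { ...DEFAULT_CONFIG.safety, ...user.safety },
    parallel: { ...DEFAULT_CONFIG.parallel, ...user.parallel },
    api: { ...DEFAULT_CONFIG.api, ...user.api }
  }

  // Assumed validation rules; adjust to the real requirements.
  if (!Number.isInteger(config.parallel.maxConcurrent) || config.parallel.maxConcurrent < 1) {
    throw new Error('autopilot.parallel.maxConcurrent must be a positive integer')
  }
  if (!Number.isInteger(config.api.port) || config.api.port < 1 || config.api.port > 65535) {
    throw new Error('autopilot.api.port must be between 1 and 65535')
  }
  if (config.safety.maxChangeLinesPerFile < 1) {
    throw new Error('autopilot.safety.maxChangeLinesPerFile must be at least 1')
  }

  return config
}

console.log(resolveAutopilotConfig({ parallel: { enabled: true } }).parallel)
```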

CLI Updates

New flags and commands:

tm autopilot <taskId> --tui              # Launch TUI interface
tm autopilot <taskId> --parallel         # Enable parallel execution
tm autopilot <taskId> --preview-diffs    # Show diffs before applying
tm autopilot <taskId> --executor-mode    # Run as executor pane
tm autopilot-server start                # Start WebSocket API

Success Criteria

  • Supports Python projects with pytest
  • Supports Go projects with go test
  • Diff preview lets the user reject unwanted changes before they are applied
  • TUI provides better visibility for long-running tasks
  • IDE extensions can integrate via state files or WebSocket
  • Parallel execution reduces total time for independent subtasks

Out of Scope

  • Full Electron/web GUI
  • AI executor selection UI (defer to Phase 4)
  • Multi-repository support
  • Remote execution on cloud runners

Testing Strategy

  • Test with Python project (pytest)
  • Test with Go project (go test)
  • Test diff preview UI with mock changes
  • Test parallel execution with independent subtasks
  • Test conflict detection with overlapping file changes
  • Test TUI with mock tmux environment

Dependencies

  • Phase 2 completed (PR + resumability)
  • tmux installed (for TUI)
  • blessed or ink library (for TUI rendering)

Estimated Effort

3-4 weeks

Risks & Mitigations

  • Risk: Parallel execution causes git conflicts
    • Mitigation: Conservative conflict detection, sequential fallback
  • Risk: TUI adds complexity and maintenance burden
    • Mitigation: Keep TUI optional, state-based design allows alternatives
  • Risk: Framework adapters hard to maintain across versions
    • Mitigation: Abstract common parsing logic, document adapter interface
  • Risk: Diff preview slows down workflow
    • Mitigation: Make optional, use --preview-diffs flag only when needed

Validation

Test with:

  • Python project with pytest and pytest-cov
  • Go project with go test
  • Large changes requiring confirmation
  • Parallel execution with 3+ independent subtasks
  • TUI with task selection and live status updates
  • VS Code extension reading state files