feat: transform QA agent into Test Architect with advanced quality capabilities
- Add 6 specialized quality assessment commands
- Implement risk-based testing with scoring
- Create quality gate system with deterministic decisions
- Add comprehensive test design and NFR validation
- Update documentation with stage-based workflow integration
bmad-core/tasks/test-design.md (new file, +528 lines)
# test-design

Create comprehensive test scenarios with appropriate test level recommendations for story implementation.

## Inputs

```yaml
required:
  - story_id: "{epic}.{story}" # e.g., "1.3"
  - story_path: "docs/stories/{epic}.{story}.*.md"
  - story_title: "{title}" # If missing, derive from story file H1
  - story_slug: "{slug}" # If missing, derive from title (lowercase, hyphenated)
```

## Purpose

Design a complete test strategy that identifies what to test, at which level (unit/integration/e2e), and why. This ensures efficient test coverage without redundancy while maintaining appropriate test boundaries.
## Test Level Decision Framework

### Unit Tests

**When to use:**

- Testing pure functions and business logic
- Algorithm correctness
- Input validation and data transformation
- Error handling in isolated components
- Complex calculations or state machines

**Characteristics:**

- Fast execution (immediate feedback)
- No external dependencies (DB, API, file system)
- Highly maintainable and stable
- Easy to debug failures

**Example scenarios:**

```yaml
unit_test:
  component: "PriceCalculator"
  scenario: "Calculate discount with multiple rules"
  justification: "Complex business logic with multiple branches"
  mock_requirements: "None - pure function"
```
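As a concrete sketch of this scenario in a Jest-style runner (the `PriceCalculator` API below is hypothetical, invented for illustration):

```typescript
import { describe, expect, it } from '@jest/globals';

// Hypothetical implementation under test - names are illustrative only.
type DiscountRule = { threshold?: number; discount: number };

class PriceCalculator {
  constructor(private rules: DiscountRule[]) {}

  total(order: { unitPrice: number; quantity: number }): number {
    const subtotal = order.unitPrice * order.quantity;
    // Apply the best discount whose threshold (if any) is met.
    const best = Math.max(
      0,
      ...this.rules
        .filter((r) => r.threshold === undefined || order.quantity >= r.threshold)
        .map((r) => r.discount)
    );
    return subtotal * (1 - Math.min(best, 1)); // clamp so totals never go negative
  }
}

describe('PriceCalculator', () => {
  // Pure function: no mocks, no I/O - fast and parallel-safe by construction.
  it('applies the highest matching discount when multiple rules apply', () => {
    const calc = new PriceCalculator([
      { threshold: 10, discount: 0.1 },
      { discount: 0.15 },
    ]);
    expect(calc.total({ unitPrice: 100, quantity: 12 })).toBe(1020);
  });

  it('never produces a negative total', () => {
    const calc = new PriceCalculator([{ discount: 1.5 }]);
    expect(calc.total({ unitPrice: 100, quantity: 1 })).toBe(0);
  });
});
```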
### Integration Tests

**When to use:**

- Testing component interactions
- Database operations and queries
- API endpoint behavior
- Service layer orchestration
- External service integration (with test doubles)

**Characteristics:**

- Moderate execution time
- May use test databases or containers
- Tests multiple components together
- Validates contracts between components

**Example scenarios:**

```yaml
integration_test:
  components: ["UserService", "UserRepository", "Database"]
  scenario: "Create user with duplicate email check"
  justification: "Tests transaction boundaries and constraint handling"
  test_doubles: "Mock email service, real test database"
```
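A minimal sketch of that test with Jest-style hooks; `createTestDatabase`, `UserRepository`, and `UserService` are assumed project helpers, not prescribed APIs:

```typescript
import { afterAll, beforeAll, expect, it, jest } from '@jest/globals';
// Hypothetical project helpers - a real codebase supplies its own.
import { createTestDatabase, type TestDb } from './test-db';
import { UserRepository } from './user-repository';
import { UserService } from './user-service';

let db: TestDb;
let service: UserService;

beforeAll(async () => {
  db = await createTestDatabase(); // isolated, disposable database per suite
  // Mock only the external boundary (email); repository and DB stay real
  // so the unique-email constraint and transaction are actually exercised.
  service = new UserService(new UserRepository(db), { send: jest.fn() });
});

afterAll(async () => {
  await db.destroy(); // self-cleaning: tear down everything the suite created
});

it('rejects a second user with a duplicate email', async () => {
  await service.createUser({ email: 'dup@example.com', name: 'First' });
  await expect(
    service.createUser({ email: 'dup@example.com', name: 'Second' })
  ).rejects.toThrow(/duplicate/i);
});
```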
### End-to-End Tests

**When to use:**

- Critical user journeys
- Cross-system workflows
- UI interaction flows
- Full stack validation
- Production-like scenario testing

**Characteristics:**

- Keep under 90 seconds per test
- Tests complete user scenarios
- Uses real or production-like environment
- Higher maintenance cost
- More prone to flakiness

**Example scenarios:**

```yaml
e2e_test:
  flow: "Complete purchase flow"
  scenario: "User browses, adds to cart, and completes checkout"
  justification: "Critical business flow requiring full stack validation"
  environment: "Staging with test payment gateway"
```
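Sketched with Playwright (the routes, labels, and sandbox card number are illustrative assumptions, not prescribed values):

```typescript
import { expect, test } from '@playwright/test';

test('user browses, adds to cart, and completes checkout', async ({ page }) => {
  await page.goto('/products'); // staging base URL comes from Playwright config

  await page.getByRole('link', { name: 'Deluxe Widget' }).click();
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.getByRole('link', { name: 'Checkout' }).click();

  // Test payment gateway sandbox card - never real payment details.
  await page.getByLabel('Card number').fill('4242 4242 4242 4242');
  await page.getByRole('button', { name: 'Place order' }).click();

  // Dynamic wait on the success state; no hard sleeps anywhere.
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```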
## Test Design Process

### 1. Analyze Story Requirements

Break down each acceptance criterion into testable scenarios:

```yaml
acceptance_criterion: "User can reset password via email"
test_scenarios:
  - level: unit
    what: "Password validation rules"
    why: "Complex regex and business rules"

  - level: integration
    what: "Password reset token generation and storage"
    why: "Database interaction with expiry logic"

  - level: integration
    what: "Email service integration"
    why: "External service with retry logic"

  - level: e2e
    what: "Complete password reset flow"
    why: "Critical security flow needing full validation"
```
### 2. Apply Test Level Heuristics

Use these rules to determine appropriate test levels:

```markdown
## Test Level Selection Rules

### Favor Unit Tests When:

- Logic can be isolated
- No side effects involved
- Fast feedback needed
- High cyclomatic complexity

### Favor Integration Tests When:

- Testing persistence layer
- Validating service contracts
- Testing middleware/interceptors
- Component boundaries critical

### Favor E2E Tests When:

- User-facing critical paths
- Multi-system interactions
- Regulatory compliance scenarios
- Visual regression important

### Anti-patterns to Avoid:

- E2E testing for business logic validation
- Unit testing framework behavior
- Integration testing third-party libraries
- Duplicate coverage across levels

### Duplicate Coverage Guard

**Before adding any test, check:**

1. Is this already tested at a lower level?
2. Can a unit test cover this instead of integration?
3. Can an integration test cover this instead of E2E?

**Coverage overlap is only acceptable when:**

- Testing different aspects (unit: logic, integration: interaction, e2e: user experience)
- Critical paths requiring defense in depth
- Regression prevention for previously broken functionality
```
### 3. Design Test Scenarios

**Test ID Format:** `{EPIC}.{STORY}-{LEVEL}-{SEQ}`

- Example: `1.3-UNIT-001`, `1.3-INT-002`, `1.3-E2E-001`
- Ensures traceability across all artifacts
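As a trivial illustration, IDs in this format can be generated and checked mechanically (this helper is not part of the task, just a sketch):

```typescript
// Sketch: generate and validate {EPIC}.{STORY}-{LEVEL}-{SEQ} test IDs.
type TestLevel = 'UNIT' | 'INT' | 'E2E';

function testId(epic: number, story: number, level: TestLevel, seq: number): string {
  return `${epic}.${story}-${level}-${String(seq).padStart(3, '0')}`;
}

const TEST_ID = /^\d+\.\d+-(UNIT|INT|E2E)-\d{3}$/;

console.log(testId(1, 3, 'INT', 2)); // "1.3-INT-002"
console.log(TEST_ID.test('1.3-E2E-001')); // true
```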
**Naming Convention:**

- Unit: `test_{component}_{scenario}`
- Integration: `test_{flow}_{interaction}`
- E2E: `test_{journey}_{outcome}`

**Risk Linkage:**

- Tag tests with the risk IDs they mitigate
- Prioritize tests for high-risk areas (P0)
- Link to the risk profile when available
For each identified test need:

```yaml
test_scenario:
  id: "1.3-INT-002"
  requirement: "AC2: Rate limiting on login attempts"
  mitigates_risks: ["SEC-001", "PERF-003"] # Links to risk profile
  priority: P0 # Based on risk score

unit_tests:
  - name: "RateLimiter calculates window correctly"
    input: "Timestamp array"
    expected: "Correct window calculation"

integration_tests:
  - name: "Login endpoint enforces rate limit"
    setup: "5 failed attempts"
    action: "6th attempt"
    expected: "429 response with retry-after header"

e2e_tests:
  - name: "User sees rate limit message"
    setup: "Trigger rate limit"
    validation: "Error message displayed, retry timer shown"
```
## Deterministic Test Level Minimums

**Per Acceptance Criterion:**

- At least 1 unit test for business logic
- At least 1 integration test if multiple components interact
- At least 1 E2E test if it's a user-facing feature

**Exceptions:**

- Pure UI changes: May skip unit tests
- Pure logic changes: May skip E2E tests
- Infrastructure changes: May focus on integration tests

**When in doubt:** Start with unit tests, add integration for interactions, E2E for critical paths only.
## Test Quality Standards

### Core Testing Principles

**No Flaky Tests:** Ensure reliability through proper async handling, explicit waits, and atomic test design.

**No Hard Waits/Sleeps:** Use dynamic waiting strategies (e.g., polling, event-based triggers).
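For example, a generic polling helper (a sketch, not a prescribed utility) gives each check a deadline instead of an arbitrary sleep:

```typescript
// Sketch of a dynamic wait: poll a condition instead of sleeping blindly.
async function waitFor(
  condition: () => Promise<boolean> | boolean,
  { timeoutMs = 5000, intervalMs = 100 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// Usage: await waitFor(() => queue.isEmpty());
// Fails fast with a clear timeout instead of a sleep that is either
// too short (flaky) or too long (slow suite).
```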
**Stateless & Parallel-Safe:** Tests run independently; use cron jobs or semaphores only if unavoidable.

**No Order Dependency:** Every `it`/`describe`/`context` block works in isolation (supports `.only` execution).

**Self-Cleaning Tests:** Each test sets up its own data and automatically deletes/deactivates entities created during testing.
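One common shape for this, sketched with Jest-style hooks (the `api` client and its methods are assumed names):

```typescript
import { afterEach, expect, it } from '@jest/globals';
import { api } from './test-client'; // hypothetical API client

const createdUserIds: string[] = [];

afterEach(async () => {
  // Runs even when assertions fail: delete everything this test created.
  await Promise.all(createdUserIds.map((id) => api.deleteUser(id)));
  createdUserIds.length = 0;
});

it('activates a newly created user', async () => {
  const user = await api.createUser({ email: `u-${Date.now()}@example.com` });
  createdUserIds.push(user.id); // register for cleanup immediately after creation
  await expect(api.activateUser(user.id)).resolves.toMatchObject({ active: true });
});
```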
**Tests Live Near Source Code:** Co-locate test files with the code they validate (e.g., `*.spec.js` alongside components).

### Execution Strategy

**Shifted Left:**

- Start with local environments or ephemeral stacks
- Validate functionality across all deployment stages (local → dev → stage)

**Low Maintenance:** Minimize manual upkeep (avoid brittle selectors, do not repeat UI actions, leverage APIs).

**CI Execution Evidence:** Integrate into pipelines with clear logs/artifacts.

**Visibility:** Generate test reports (e.g., JUnit XML, HTML) for failures and trends.
### Coverage Requirements

**Release Confidence:**

- Happy Path: Core user journeys are prioritized
- Edge Cases: Critical error/validation scenarios are covered
- Feature Flags: Test both enabled and disabled states where applicable

### Test Design Rules

**Assertions:** Keep them explicit in tests; avoid abstracting them into helpers. Use parametrized tests for soft assertions.
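In Jest, for instance, `test.each` keeps every assertion visible in the test file while covering a table of cases:

```typescript
import { describe, expect, test } from '@jest/globals';
import { isValidSlug } from './slug'; // hypothetical validator under test

describe('isValidSlug', () => {
  // Each row reports independently, so one bad case doesn't hide the rest.
  test.each([
    ['simple-slug', true],
    ['UPPERCASE', false],
    ['trailing-', false],
    ['', false],
  ] as const)('isValidSlug(%p) === %p', (input, expected) => {
    expect(isValidSlug(input)).toBe(expected);
  });
});
```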
**Naming:** Follow conventions (e.g., `describe('Component')`, `it('should do X when Y')`).

**Size:** Aim for files ≤200 lines; split/chunk large tests logically.

**Speed:** Target individual tests ≤90 seconds; optimize slow setups (e.g., shared fixtures).

**Careful Abstractions:** Favor readability over DRY when balancing helper reuse (page objects are okay, abstracted assertion logic is not).

**Test Cleanup:** Ensure tests clean up resources they create (e.g., closing browsers, deleting test data).

**Deterministic Flow:** Tests should refrain from using conditionals (e.g., if/else) to control flow, and should avoid try/catch blocks where possible.
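For instance, rather than branching inside a single test, split each branch into its own deterministic test (a sketch; `getDiscount` is an assumed module under test):

```typescript
import { expect, it } from '@jest/globals';
import { getDiscount } from './discount'; // hypothetical module under test

// Anti-pattern would be one test with if/else picking a branch at runtime.
// Deterministic alternative: one test per branch, each always executed.
it('applies a discount for premium users', () => {
  expect(getDiscount({ tier: 'premium' })).toBe(0.2);
});

it('applies no discount for standard users', () => {
  expect(getDiscount({ tier: 'standard' })).toBe(0);
});
```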
### API Testing Standards

- Tests must not depend on hardcoded data → use factories and per-test setup
- Always test both the happy path and negative/error cases
- API tests should run safely in parallel (no shared global state)
- Test idempotency where applicable (e.g., duplicate requests)
- Tests should clean up their data
- Response logs should be printed only on failure
- Auth tests must validate token expiration and renewal
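A compact sketch combining several of these rules (factory data, per-test cleanup, happy plus negative path, an idempotency check); `buildUser` and the `api` client are assumed names:

```typescript
import { afterEach, expect, it } from '@jest/globals';
// Hypothetical factory and HTTP test client - real projects supply their own.
import { buildUser } from './factories';
import { api } from './test-client';

let createdId: string | undefined;

afterEach(async () => {
  if (createdId) await api.delete(`/users/${createdId}`); // clean up own data
  createdId = undefined;
});

it('creates a user (happy path)', async () => {
  const payload = buildUser(); // unique, factory-generated data - parallel-safe
  const res = await api.post('/users', payload);
  createdId = res.body.id;
  expect(res.status).toBe(201);
  // Idempotency check: repeating the request must not create a duplicate.
  expect((await api.post('/users', payload)).status).toBe(409);
});

it('rejects a malformed email (negative case)', async () => {
  const res = await api.post('/users', buildUser({ email: 'not-an-email' }));
  expect(res.status).toBe(400);
});
```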
## Outputs

### Output 1: Test Design Document

**Save to:** `docs/qa/assessments/{epic}.{story}-test-design-{YYYYMMDD}.md`

Generate a comprehensive test design document:
```markdown
# Test Design: Story {epic}.{story}

Date: {date}
Reviewer: Quinn (Test Architect)

## Test Strategy Overview

- Total test scenarios: X
- Unit tests: Y (A%)
- Integration tests: Z (B%)
- E2E tests: W (C%)

## Test Level Rationale

[Explain why this distribution was chosen]

## Detailed Test Scenarios

### Requirement: AC1 - {description}

#### Unit Tests (3 scenarios)

1. **ID**: 1.3-UNIT-001
   **Test**: Validate input format
   - **Why Unit**: Pure validation logic
   - **Coverage**: Input edge cases
   - **Mocks**: None needed
   - **Mitigates**: DATA-001 (if applicable)

#### Integration Tests (2 scenarios)

1. **ID**: 1.3-INT-001
   **Test**: Service processes valid request
   - **Why Integration**: Multiple components involved
   - **Coverage**: Happy path + error handling
   - **Test Doubles**: Mock external API
   - **Mitigates**: TECH-002

#### E2E Tests (1 scenario)

1. **ID**: 1.3-E2E-001
   **Test**: Complete user workflow
   - **Why E2E**: Critical user journey
   - **Coverage**: Full stack validation
   - **Environment**: Staging
   - **Max Duration**: 90 seconds
   - **Mitigates**: BUS-001

[Continue for all requirements...]

## Test Data Requirements

### Unit Test Data

- Static fixtures for calculations
- Edge case value arrays

### Integration Test Data

- Test database seeds
- API response fixtures

### E2E Test Data

- Test user accounts
- Sandbox environment data

## Mock/Stub Strategy

### What to Mock

- External services (payment, email)
- Time-dependent functions
- Random number generators

### What NOT to Mock

- Core business logic
- Database in integration tests
- Critical security functions

## Test Execution Implementation

### Parallel Execution

- All unit tests: Fully parallel (stateless requirement)
- Integration tests: Parallel with isolated databases
- E2E tests: Sequential or limited parallelism

### Execution Order

1. Unit tests first (fail fast)
2. Integration tests second
3. E2E tests last (expensive, max 90 seconds each)

## Risk-Based Test Priority

### P0 - Must Have (Linked to Critical/High Risks)

- Security-related tests (SEC-* risks)
- Data integrity tests (DATA-* risks)
- Critical business flow tests (BUS-* risks)
- Tests for risks scored ≥6 in the risk profile

### P1 - Should Have (Medium Risks)

- Edge case coverage
- Performance tests (PERF-* risks)
- Error recovery tests
- Tests for risks scored 4-5

### P2 - Nice to Have (Low Risks)

- UI polish tests
- Minor validation tests
- Tests for risks scored ≤3

## Test Maintenance Considerations

### High Maintenance Tests

[List tests that may need frequent updates]

### Stability Measures

- No retry strategies (tests must be deterministic)
- Dynamic waits only (no hard sleeps)
- Environment isolation
- Self-cleaning test data

## Coverage Goals

### Unit Test Coverage

- Target: 80% line coverage
- Focus: Business logic, calculations

### Integration Coverage

- Target: All API endpoints
- Focus: Contract validation

### E2E Coverage

- Target: Critical paths only
- Focus: User value delivery
```
## Test Level Smells to Flag

### Over-testing Smells

- Same logic tested at multiple levels
- E2E tests for calculations
- Integration tests for framework features

### Under-testing Smells

- No unit tests for complex logic
- Missing integration tests for data operations
- No E2E tests for critical user paths

### Wrong Level Smells

- Unit tests with a real database
- E2E tests checking calculation results
- Integration tests mocking everything

## Quality Indicators

Good test design shows:

- Clear level separation
- No redundant coverage
- Fast feedback from unit tests
- Reliable integration tests
- Focused E2E tests
## Key Principles

- Test at the lowest appropriate level
- One clear owner per test
- Fast tests run first
- Mock at boundaries, not internals
- E2E for user value, not implementation
- Maintain test/production parity where critical
- Tests must be atomic and self-contained
- No shared state between tests
- Explicit assertions in test files (not helpers)

### Output 2: Story Hook Line

**Print this line for the review task to quote:**

```text
Test design: docs/qa/assessments/{epic}.{story}-test-design-{YYYYMMDD}.md
```

**For traceability:** This planning document will be referenced by the trace-requirements task.
### Output 3: Test Count Summary

**Print summary for quick reference:**

```yaml
test_summary:
  total: { total_count }
  by_level:
    unit: { unit_count }
    integration: { int_count }
    e2e: { e2e_count }
  by_priority:
    P0: { p0_count }
    P1: { p1_count }
    P2: { p2_count }
  coverage_gaps: [] # List any ACs without tests
```