test-design

Create comprehensive test scenarios with appropriate test level recommendations for story implementation.

Inputs

required:
  - story_id: "{epic}.{story}" # e.g., "1.3"
  - story_path: "docs/stories/{epic}.{story}.*.md"
  - story_title: "{title}" # If missing, derive from story file H1
  - story_slug: "{slug}" # If missing, derive from title (lowercase, hyphenated)

Purpose

Design a complete test strategy that identifies what to test, at which level (unit/integration/e2e), and why. This ensures efficient test coverage without redundancy while maintaining appropriate test boundaries.

Test Level Decision Framework

Unit Tests

When to use:

  • Testing pure functions and business logic
  • Algorithm correctness
  • Input validation and data transformation
  • Error handling in isolated components
  • Complex calculations or state machines

Characteristics:

  • Fast execution (immediate feedback)
  • No external dependencies (DB, API, file system)
  • Highly maintainable and stable
  • Easy to debug failures

Example scenarios:

unit_test:
  component: "PriceCalculator"
  scenario: "Calculate discount with multiple rules"
  justification: "Complex business logic with multiple branches"
  mock_requirements: "None - pure function"
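
A minimal sketch of how such a unit test might look, in Jest-style TypeScript; the applyDiscount function and its discount rules are illustrative assumptions, not part of any specific codebase:

import { describe, it, expect } from '@jest/globals';

// Assumed pure function under test: applies the highest matching discount rule
function applyDiscount(price: number, rules: { threshold: number; percent: number }[]): number {
  const match = rules
    .filter((rule) => price >= rule.threshold)
    .sort((a, b) => b.threshold - a.threshold)[0];
  return match ? price * (1 - match.percent / 100) : price;
}

describe('PriceCalculator', () => {
  it('applies the highest matching discount rule', () => {
    const rules = [
      { threshold: 100, percent: 5 },
      { threshold: 500, percent: 10 },
    ];
    expect(applyDiscount(600, rules)).toBe(540);          // 10% rule wins over 5%
    expect(applyDiscount(150, rules)).toBeCloseTo(142.5); // only the 5% rule applies
    expect(applyDiscount(50, rules)).toBe(50);            // no rule matches; no mocks needed
  });
});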

Integration Tests

When to use:

  • Testing component interactions
  • Database operations and queries
  • API endpoint behavior
  • Service layer orchestration
  • External service integration (with test doubles)

Characteristics:

  • Moderate execution time
  • May use test databases or containers
  • Tests multiple components together
  • Validates contracts between components

Example scenarios:

integration_test:
  components: ["UserService", "UserRepository", "Database"]
  scenario: "Create user with duplicate email check"
  justification: "Tests transaction boundaries and constraint handling"
  test_doubles: "Mock email service, real test database"
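
A hedged sketch of how this scenario might be implemented in Jest-style TypeScript, with a real per-test database and a mocked email service; UserService, UserRepository, and createTestDb are hypothetical names used only for illustration:

import { describe, it, expect, beforeEach, afterEach, jest } from '@jest/globals';
import { createTestDb, TestDb } from './test-helpers';   // assumed helper for an isolated test database
import { UserRepository } from '../src/user-repository'; // assumed application module
import { UserService } from '../src/user-service';       // assumed application module

describe('UserService.createUser', () => {
  let db: TestDb;
  let service: UserService;
  const emailService = { sendWelcome: jest.fn() }; // test double at the external boundary

  beforeEach(async () => {
    db = await createTestDb(); // isolated database per test, parallel-safe
    service = new UserService(new UserRepository(db), emailService);
  });

  afterEach(async () => {
    await db.destroy(); // self-cleaning
  });

  it('rejects a second user with a duplicate email', async () => {
    await service.createUser({ email: 'a@example.com', name: 'First' });

    await expect(
      service.createUser({ email: 'a@example.com', name: 'Second' })
    ).rejects.toThrow(/duplicate/i);

    expect(emailService.sendWelcome).toHaveBeenCalledTimes(1); // only the first user got an email
  });
});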

End-to-End Tests

When to use:

  • Critical user journeys
  • Cross-system workflows
  • UI interaction flows
  • Full stack validation
  • Production-like scenario testing

Characteristics:

  • Keep under 90 seconds per test
  • Tests complete user scenarios
  • Uses real or production-like environment
  • Higher maintenance cost
  • More prone to flakiness

Example scenarios:

e2e_test:
  flow: "Complete purchase flow"
  scenario: "User browses, adds to cart, and completes checkout"
  justification: "Critical business flow requiring full stack validation"
  environment: "Staging with test payment gateway"
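
A short Playwright-style sketch of such a journey; the routes, selectors, and payment details are assumptions about a hypothetical storefront, not a prescribed implementation:

import { test, expect } from '@playwright/test';

test('user browses, adds to cart, and completes checkout', async ({ page }) => {
  test.setTimeout(90_000); // stay within the 90-second budget

  await page.goto('/products');
  await page.getByRole('link', { name: 'Blue Widget' }).click();
  await page.getByRole('button', { name: 'Add to cart' }).click();

  await page.getByRole('link', { name: 'Checkout' }).click();
  await page.getByLabel('Card number').fill('4242 4242 4242 4242'); // test payment gateway card
  await page.getByRole('button', { name: 'Pay now' }).click();

  // Dynamic wait on an explicit outcome, no hard sleeps
  await expect(page.getByText('Order confirmed')).toBeVisible();
});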

Test Design Process

1. Analyze Story Requirements

Break down each acceptance criterion into testable scenarios:

acceptance_criterion: "User can reset password via email"
test_scenarios:
  - level: unit
    what: "Password validation rules"
    why: "Complex regex and business rules"

  - level: integration
    what: "Password reset token generation and storage"
    why: "Database interaction with expiry logic"

  - level: integration
    what: "Email service integration"
    why: "External service with retry logic"

  - level: e2e
    what: "Complete password reset flow"
    why: "Critical security flow needing full validation"

2. Apply Test Level Heuristics

Use these rules to determine appropriate test levels:

## Test Level Selection Rules

### Favor Unit Tests When:

- Logic can be isolated
- No side effects involved
- Fast feedback needed
- High cyclomatic complexity

### Favor Integration Tests When:

- Testing persistence layer
- Validating service contracts
- Testing middleware/interceptors
- Component boundaries critical

### Favor E2E Tests When:

- User-facing critical paths
- Multi-system interactions
- Regulatory compliance scenarios
- Visual regression important

### Anti-patterns to Avoid:

- E2E testing for business logic validation
- Unit testing framework behavior
- Integration testing third-party libraries
- Duplicate coverage across levels

### Duplicate Coverage Guard

**Before adding any test, check:**

1. Is this already tested at a lower level?
2. Can a unit test cover this instead of integration?
3. Can an integration test cover this instead of E2E?

**Coverage overlap is only acceptable when:**

- Testing different aspects (unit: logic, integration: interaction, e2e: user experience)
- Critical paths requiring defense in depth
- Regression prevention for previously broken functionality

3. Design Test Scenarios

Test ID Format: {EPIC}.{STORY}-{LEVEL}-{SEQ}

  • Example: 1.3-UNIT-001, 1.3-INT-002, 1.3-E2E-001
  • Ensures traceability across all artifacts

Naming Convention:

  • Unit: test_{component}_{scenario}
  • Integration: test_{flow}_{interaction}
  • E2E: test_{journey}_{outcome}

Risk Linkage:

  • Tag tests with risk IDs they mitigate
  • Prioritize tests for high-risk areas (P0)
  • Link to risk profile when available

For each identified test need:

test_scenario:
  id: "1.3-INT-002"
  requirement: "AC2: Rate limiting on login attempts"
  mitigates_risks: ["SEC-001", "PERF-003"] # Links to risk profile
  priority: P0 # Based on risk score

  unit_tests:
    - name: "RateLimiter calculates window correctly"
      input: "Timestamp array"
      expected: "Correct window calculation"

  integration_tests:
    - name: "Login endpoint enforces rate limit"
      setup: "5 failed attempts"
      action: "6th attempt"
      expected: "429 response with retry-after header"

  e2e_tests:
    - name: "User sees rate limit message"
      setup: "Trigger rate limit"
      validation: "Error message displayed, retry timer shown"
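
As a hedged illustration, the integration scenario above might be written with supertest against an assumed Express-style app export; the /login route, response shape, and credentials are illustrative assumptions:

import { describe, it, expect } from '@jest/globals';
import request from 'supertest';
import { app } from '../src/app'; // assumed application entry point

describe('1.3-INT-002: login rate limiting [SEC-001, PERF-003]', () => {
  it('returns 429 with a Retry-After header once the limit is exceeded', async () => {
    const credentials = { email: 'rate@example.com', password: 'wrong-password' };

    // Setup: 5 failed attempts
    for (let attempt = 0; attempt < 5; attempt++) {
      await request(app).post('/login').send(credentials).expect(401);
    }

    // Action: the 6th attempt is rejected by the rate limiter
    const response = await request(app).post('/login').send(credentials).expect(429);
    expect(response.headers['retry-after']).toBeDefined();
  });
});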

Deterministic Test Level Minimums

Per Acceptance Criterion:

  • At least 1 unit test for business logic
  • At least 1 integration test if multiple components interact
  • At least 1 E2E test if it's a user-facing feature

Exceptions:

  • Pure UI changes: May skip unit tests
  • Pure logic changes: May skip E2E tests
  • Infrastructure changes: May focus on integration tests

When in doubt: Start with unit tests, add integration for interactions, E2E for critical paths only.

Test Quality Standards

Core Testing Principles

No Flaky Tests: Ensure reliability through proper async handling, explicit waits, and atomic test design.

No Hard Waits/Sleeps: Use dynamic waiting strategies (e.g., polling, event-based triggers).
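
For example, a minimal Playwright-style contrast (the page and selectors are illustrative assumptions):

import { test, expect } from '@playwright/test';

test('shows the saved banner after submitting settings', async ({ page }) => {
  await page.goto('/settings');
  await page.getByRole('button', { name: 'Save' }).click();

  // Avoid: await page.waitForTimeout(3000); // hard sleep, slow and flaky
  // Prefer: poll the condition until it holds or the test times out
  await expect(page.getByText('Settings saved')).toBeVisible();
});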

Stateless & Parallel-Safe: Tests run independently; use cron jobs or semaphores only if unavoidable.

No Order Dependency: Every it/describe/context block works in isolation (supports .only execution).

Self-Cleaning Tests: Each test sets up its own data and automatically deletes or deactivates the entities it creates.

Tests Live Near Source Code: Co-locate test files with the code they validate (e.g., *.spec.js alongside components).

Execution Strategy

Shifted Left:

  • Start with local environments or ephemeral stacks
  • Validate functionality across all deployment stages (local → dev → stage)

Low Maintenance: Minimize manual upkeep (avoid brittle selectors, don't repeat UI actions for setup, lean on APIs instead).

CI Execution Evidence: Integrate into pipelines with clear logs/artifacts.

Visibility: Generate test reports (e.g., JUnit XML, HTML) for failures and trends.

Coverage Requirements

Release Confidence:

  • Happy Path: Core user journeys are prioritized
  • Edge Cases: Critical error/validation scenarios are covered
  • Feature Flags: Test both enabled and disabled states where applicable

Test Design Rules

Assertions: Keep them explicit in tests; avoid abstraction into helpers. Use parametrized tests for soft assertions.

Naming: Follow conventions (e.g., describe('Component'), it('should do X when Y')).

Size: Aim for files ≤200 lines; split/chunk large tests logically.

Speed: Target individual tests ≤90 seconds; optimize slow setups (e.g., shared fixtures).

Careful Abstractions: Favor readability over DRY when balancing helper reuse (page objects are okay, assertion logic is not).

Test Cleanup: Ensure tests clean up resources they create (e.g., closing browser, deleting test data).

Deterministic Flow: Tests should avoid conditionals (if/else) for flow control and, where possible, try/catch blocks.
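
A small sketch of how a parametrized test keeps assertions explicit and avoids conditional flow inside the test body; validateEmail is a hypothetical function used only for illustration:

import { describe, it, expect } from '@jest/globals';
import { validateEmail } from '../src/validate-email'; // assumed module

describe('validateEmail', () => {
  const cases: Array<[string, boolean]> = [
    ['user@example.com', true],
    ['no-at-sign.example.com', false],
    ['', false],
  ];

  // One explicit assertion per case instead of if/else branching in the test
  it.each(cases)('validateEmail(%p) returns %p', (input, expected) => {
    expect(validateEmail(input)).toBe(expected);
  });
});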

API Testing Standards

  • Tests must not depend on hardcoded data → use factories and per-test setup
  • Always test both happy path and negative/error cases
  • API tests should run safely in parallel (no shared global state)
  • Test idempotency where applicable (e.g., duplicate requests)
  • Tests should clean up their data
  • Response logs should only be printed in case of failure
  • Auth tests must validate token expiration and renewal
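
A hedged sketch that applies several of these standards at once (factory data, happy path plus error case, per-test cleanup), using supertest; the buildUser factory, routes, and app export are assumptions for illustration:

import { describe, it, expect, afterEach } from '@jest/globals';
import request from 'supertest';
import { app } from '../src/app';        // assumed application entry point
import { buildUser } from './factories'; // assumed factory producing unique user data

describe('POST /users', () => {
  const createdIds: string[] = [];

  afterEach(async () => {
    // Tests clean up the data they create
    for (const id of createdIds) {
      await request(app).delete(`/users/${id}`);
    }
    createdIds.length = 0;
  });

  it('creates a user from factory data (happy path)', async () => {
    const payload = buildUser(); // no hardcoded data, safe to run in parallel
    const response = await request(app).post('/users').send(payload).expect(201);
    createdIds.push(response.body.id);
    expect(response.body.email).toBe(payload.email);
  });

  it('rejects a payload with a missing email (error case)', async () => {
    await request(app).post('/users').send({ name: 'No Email' }).expect(400);
  });
});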

Outputs

Output 1: Test Design Document

Save to: docs/qa/assessments/{epic}.{story}-test-design-{YYYYMMDD}.md

Generate a comprehensive test design document:

# Test Design: Story {epic}.{story}

Date: {date}
Reviewer: Quinn (Test Architect)

## Test Strategy Overview

- Total test scenarios: X
- Unit tests: Y (A%)
- Integration tests: Z (B%)
- E2E tests: W (C%)

## Test Level Rationale

[Explain why this distribution was chosen]

## Detailed Test Scenarios

### Requirement: AC1 - {description}

#### Unit Tests (3 scenarios)

1. **ID**: 1.3-UNIT-001
   **Test**: Validate input format
   - **Why Unit**: Pure validation logic
   - **Coverage**: Input edge cases
   - **Mocks**: None needed
   - **Mitigates**: DATA-001 (if applicable)

#### Integration Tests (2 scenarios)

1. **ID**: 1.3-INT-001
   **Test**: Service processes valid request
   - **Why Integration**: Multiple components involved
   - **Coverage**: Happy path + error handling
   - **Test Doubles**: Mock external API
   - **Mitigates**: TECH-002

#### E2E Tests (1 scenario)

1. **ID**: 1.3-E2E-001
   **Test**: Complete user workflow
   - **Why E2E**: Critical user journey
   - **Coverage**: Full stack validation
   - **Environment**: Staging
   - **Max Duration**: 90 seconds
   - **Mitigates**: BUS-001

[Continue for all requirements...]

## Test Data Requirements

### Unit Test Data

- Static fixtures for calculations
- Edge case values arrays

### Integration Test Data

- Test database seeds
- API response fixtures

### E2E Test Data

- Test user accounts
- Sandbox environment data

## Mock/Stub Strategy

### What to Mock

- External services (payment, email)
- Time-dependent functions
- Random number generators

### What NOT to Mock

- Core business logic
- Database in integration tests
- Critical security functions

## Test Execution Implementation

### Parallel Execution

- All unit tests: Fully parallel (stateless requirement)
- Integration tests: Parallel with isolated databases
- E2E tests: Sequential or limited parallelism

### Execution Order

1. Unit tests first (fail fast)
2. Integration tests second
3. E2E tests last (expensive, max 90 seconds each)

## Risk-Based Test Priority

### P0 - Must Have (Linked to Critical/High Risks)

- Security-related tests (SEC-* risks)
- Data integrity tests (DATA-* risks)
- Critical business flow tests (BUS-* risks)
- Tests for risks scored ≥6 in risk profile

### P1 - Should Have (Medium Risks)

- Edge case coverage
- Performance tests (PERF-* risks)
- Error recovery tests
- Tests for risks scored 4-5

### P2 - Nice to Have (Low Risks)

- UI polish tests
- Minor validation tests
- Tests for risks scored ≤3

## Test Maintenance Considerations

### High Maintenance Tests

[List tests that may need frequent updates]

### Stability Measures

- No retry strategies (tests must be deterministic)
- Dynamic waits only (no hard sleeps)
- Environment isolation
- Self-cleaning test data

## Coverage Goals

### Unit Test Coverage

- Target: 80% line coverage
- Focus: Business logic, calculations

### Integration Coverage

- Target: All API endpoints
- Focus: Contract validation

### E2E Coverage

- Target: Critical paths only
- Focus: User value delivery

Test Level Smells to Flag

Over-testing Smells

  • Same logic tested at multiple levels
  • E2E tests for calculations
  • Integration tests for framework features

Under-testing Smells

  • No unit tests for complex logic
  • Missing integration tests for data operations
  • No E2E tests for critical user paths

Wrong Level Smells

  • Unit tests with real database
  • E2E tests checking calculation results
  • Integration tests mocking everything

Quality Indicators

Good test design shows:

  • Clear level separation
  • No redundant coverage
  • Fast feedback from unit tests
  • Reliable integration tests
  • Focused e2e tests

Key Principles

  • Test at the lowest appropriate level
  • One clear owner per test
  • Fast tests run first
  • Mock at boundaries, not internals
  • E2E for user value, not implementation
  • Maintain test/production parity where critical
  • Tests must be atomic and self-contained
  • No shared state between tests
  • Explicit assertions in test files (not helpers)

Output 2: Story Hook Line

Print this line for the review task to quote:

Test design: docs/qa/assessments/{epic}.{story}-test-design-{YYYYMMDD}.md

For traceability: This planning document will be referenced by the trace-requirements task.

Output 3: Test Count Summary

Print a summary for quick reference:

test_summary:
  total: { total_count }
  by_level:
    unit: { unit_count }
    integration: { int_count }
    e2e: { e2e_count }
  by_priority:
    P0: { p0_count }
    P1: { p1_count }
    P2: { p2_count }
  coverage_gaps: [] # List any ACs without tests