Test Design and Risk Assessment - Validation Checklist

Prerequisites (Mode-Dependent)

System-Level Mode (Phase 3):

  • PRD exists with functional and non-functional requirements
  • ADR (Architecture Decision Record) exists
  • Architecture document available (architecture.md or tech-spec)
  • Requirements are testable and unambiguous

Epic-Level Mode (Phase 4):

  • Story markdown with clear acceptance criteria exists
  • PRD or epic documentation available
  • Architecture documents available (test-design-architecture.md and test-design-qa.md from Phase 3, if they exist)
  • Requirements are testable and unambiguous

Process Steps

Step 1: Context Loading

  • PRD.md read and requirements extracted
  • Epics.md or specific epic documentation loaded
  • Story markdown with acceptance criteria analyzed
  • Architecture documents reviewed (if available)
  • Existing test coverage analyzed
  • Knowledge base fragments loaded (risk-governance, probability-impact, test-levels, test-priorities)

Step 2: Risk Assessment

  • Genuine risks identified (not just features)
  • Risks classified by category (TECH/SEC/PERF/DATA/BUS/OPS)
  • Probability scored (1-3 for each risk)
  • Impact scored (1-3 for each risk)
  • Risk scores calculated (probability × impact; see the scoring sketch after this list)
  • High-priority risks (score ≥6) flagged
  • Mitigation plans defined for high-priority risks
  • Owners assigned for each mitigation
  • Timelines set for mitigations
  • Residual risk documented
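
A minimal sketch of the scoring arithmetic above, assuming a simple risk record (field names are illustrative; the scoring rule and the ≥6 threshold come from this checklist):

```typescript
// Illustrative risk scoring: score = probability × impact, flag at >= 6.
type RiskCategory = 'TECH' | 'SEC' | 'PERF' | 'DATA' | 'BUS' | 'OPS';

interface Risk {
  id: string;              // e.g. "R-001"
  category: RiskCategory;
  probability: 1 | 2 | 3;  // 1 = low, 3 = high
  impact: 1 | 2 | 3;       // 1 = low, 3 = high
}

function scoreRisk(risk: Risk): { score: number; highPriority: boolean } {
  const score = risk.probability * risk.impact;   // ranges 1..9
  return { score, highPriority: score >= 6 };     // >= 6 requires mitigation, owner, timeline
}

// Example: probability 3 × impact 2 = 6 → flagged as high priority
scoreRisk({ id: 'R-001', category: 'PERF', probability: 3, impact: 2 });
```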

Step 3: Coverage Design

  • Acceptance criteria broken into atomic scenarios (see the sketch after this list)
  • Test levels selected (E2E/API/Component/Unit)
  • No duplicate coverage across levels
  • Priority levels assigned (P0/P1/P2/P3)
  • P0 scenarios meet strict criteria (blocks core + high risk + no workaround)
  • Data prerequisites identified
  • Tooling requirements documented
  • Execution order defined (smoke → P0 → P1 → P2/P3)
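
A hypothetical coverage-matrix entry, sketched as data to show how each atomic scenario carries exactly one level, one priority, and an optional risk link (field names and sample rows are illustrative only):

```typescript
// Hypothetical coverage-matrix entries; levels and priorities follow this checklist.
type TestLevel = 'E2E' | 'API' | 'Component' | 'Unit';
type Priority = 'P0' | 'P1' | 'P2' | 'P3';

interface Scenario {
  testId: string;       // e.g. "T-001"
  requirement: string;  // the acceptance criterion it covers
  level: TestLevel;     // one level per behavior, no duplicate coverage
  priority: Priority;
  riskLink?: string;    // e.g. "R-001"
}

const coveragePlan: Scenario[] = [
  { testId: 'T-001', requirement: 'Login succeeds with valid credentials', level: 'E2E', priority: 'P0', riskLink: 'R-001' },
  { testId: 'T-002', requirement: 'Account locks after repeated failures', level: 'API', priority: 'P1' },
  { testId: 'T-003', requirement: 'Strength meter updates as the user types', level: 'Component', priority: 'P2' },
];
```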

Step 4: Deliverables Generation

  • Risk assessment matrix created
  • Coverage matrix created
  • Execution order documented
  • Resource estimates calculated
  • Quality gate criteria defined
  • Output file written to correct location
  • Output file uses template structure

Output Validation

Risk Assessment Matrix

  • All risks have unique IDs (R-001, R-002, etc.)
  • Each risk has category assigned
  • Probability values are 1, 2, or 3
  • Impact values are 1, 2, or 3
  • Scores calculated correctly (P × I)
  • High-priority risks (≥6) clearly marked
  • Mitigation strategies specific and actionable

Coverage Matrix

  • All requirements mapped to test levels
  • Priorities assigned to all scenarios
  • Risk linkage documented
  • Test counts realistic
  • Owners assigned where applicable
  • No duplicate coverage (same behavior at multiple levels)

Execution Strategy

CRITICAL: Keep execution strategy simple, avoid redundancy

  • Simple structure: PR / Nightly / Weekly (NOT complex smoke/P0/P1/P2 tiers)
  • PR execution: All functional tests unless significant infrastructure overhead
  • Nightly/Weekly: Only performance, chaos, long-running, manual tests
  • No redundancy: Don't re-list all tests (already in coverage plan)
  • Philosophy stated: "Run everything in PRs if <15 min, defer only if expensive/long"
  • Playwright parallelization noted: hundreds of tests run in ~10-15 minutes (see the tagging sketch below)
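
One common way to keep PR runs broad while still allowing priority subsets, assuming Playwright's title-based tag filtering (the tag names and CI commands are illustrative, not prescribed by this checklist):

```typescript
import { test, expect } from '@playwright/test';

// Tag tests by priority so CI can select subsets with --grep:
//   PR pipeline (all functional tests):  npx playwright test
//   Optional P0-only run:                npx playwright test --grep "@p0"
test('checkout completes with a saved card @p0', async ({ page }) => {
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Pay now' }).click();
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```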

Resource Estimates

CRITICAL: Use intervals/ranges, NOT exact numbers

  • P0 effort provided as interval range (e.g., "~25-40 hours" NOT "36 hours")
  • P1 effort provided as interval range (e.g., "~20-35 hours" NOT "27 hours")
  • P2 effort provided as interval range (e.g., "~10-30 hours" NOT "15.5 hours")
  • P3 effort provided as interval range (e.g., "~2-5 hours" NOT "2.5 hours")
  • Total effort provided as interval range (e.g., "~55-110 hours" NOT "81 hours")
  • Timeline provided as week range (e.g., "~1.5-3 weeks" NOT "11 days")
  • Estimates include setup time and account for complexity variations
  • No false precision: Avoid exact calculations like "18 tests × 2 hours = 36 hours"

Quality Gate Criteria

  • P0 pass rate threshold defined (should be 100%)
  • P1 pass rate threshold defined (typically ≥95%)
  • High-risk mitigation completion required
  • Coverage targets specified (≥80% recommended; see the sketch after this list)
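
A minimal sketch of how those thresholds could be evaluated (the threshold values mirror the criteria above; the function and its inputs are illustrative):

```typescript
// Illustrative quality-gate check using the thresholds listed above.
interface GateInput {
  p0: { passed: number; total: number };
  p1: { passed: number; total: number };
  highRiskMitigationsComplete: boolean; // all score >= 6 risks mitigated
  coverage: number;                     // 0..1
}

function gatePasses(g: GateInput): boolean {
  const p0Rate = g.p0.total === 0 ? 1 : g.p0.passed / g.p0.total;
  const p1Rate = g.p1.total === 0 ? 1 : g.p1.passed / g.p1.total;
  return p0Rate === 1                    // P0 pass rate must be 100%
    && p1Rate >= 0.95                    // P1 typically >= 95%
    && g.highRiskMitigationsComplete
    && g.coverage >= 0.8;                // >= 80% recommended
}
```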

Quality Checks

Evidence-Based Assessment

  • Risk assessment based on documented evidence
  • No speculation on business impact
  • Assumptions clearly documented
  • Clarifications requested where needed
  • Historical data referenced where available

Risk Classification Accuracy

  • TECH risks are architecture/integration issues
  • SEC risks are security vulnerabilities
  • PERF risks are performance/scalability concerns
  • DATA risks are data integrity issues
  • BUS risks are business/revenue impacts
  • OPS risks are deployment/operational issues

Priority Assignment Accuracy

CRITICAL: Priority classification is separate from execution timing

  • Priority sections (P0/P1/P2/P3) do NOT include execution context (e.g., no "Run on every commit" in headers)
  • Priority sections have only "Criteria" and "Purpose" (no "Execution:" field)
  • Execution Strategy section is separate and handles timing based on infrastructure overhead
  • P0: Truly blocks core functionality + High-risk (≥6) + No workaround
  • P1: Important features + Medium-risk (3-4) + Common workflows
  • P2: Secondary features + Low-risk (1-2) + Edge cases
  • P3: Nice-to-have + Exploratory + Benchmarks
  • Note at top of Test Coverage Plan: Clarifies P0/P1/P2/P3 = priority/risk, NOT execution timing

Test Level Selection

  • E2E used only for critical paths
  • API tests cover complex business logic
  • Component tests for UI interactions
  • Unit tests for edge cases and algorithms
  • No redundant coverage

Integration Points

Knowledge Base Integration

  • risk-governance.md consulted
  • probability-impact.md applied
  • test-levels-framework.md referenced
  • test-priorities-matrix.md used
  • Additional fragments loaded as needed

Status File Integration

  • Test design logged in Quality & Testing Progress
  • Epic number and scope documented
  • Completion timestamp recorded

Workflow Dependencies

  • Can proceed to *atdd workflow with P0 scenarios
  • *atdd is a separate workflow and must be run explicitly (not auto-run)
  • Can proceed to automate workflow with full coverage plan
  • Risk assessment informs gate workflow criteria
  • Integrates with ci workflow execution order

System-Level Mode: Two-Document Validation

When in system-level mode (PRD + ADR input), validate BOTH documents:

test-design-architecture.md

  • Purpose statement at top (serves as contract with Architecture team)
  • Executive Summary with scope, business context, architecture decisions, risk summary
  • Quick Guide section with three tiers:
    • 🚨 BLOCKERS - Team Must Decide (Sprint 0 critical path items)
    • ⚠️ HIGH PRIORITY - Team Should Validate (recommendations for approval)
    • 📋 INFO ONLY - Solutions Provided (no decisions needed)
  • Risk Assessment section - ACTIONABLE
    • Total risks identified count
    • High-priority risks table (score ≥6) with all columns: Risk ID, Category, Description, Probability, Impact, Score, Mitigation, Owner, Timeline
    • Medium and low-priority risks tables
    • Risk category legend included
  • Testability Concerns and Architectural Gaps section - ACTIONABLE
    • Sub-section: 🚨 ACTIONABLE CONCERNS at TOP
      • Blockers to Fast Feedback table (WHAT architecture must provide)
      • Architectural Improvements Needed (WHAT must be changed)
      • Each concern has: Owner, Timeline, Impact
    • Sub-section: Testability Assessment Summary at BOTTOM (FYI)
      • What Works Well (passing items)
      • Accepted Trade-offs (no action required)
      • This section only included if worth mentioning; otherwise omitted
  • Risk Mitigation Plans for all high-priority risks (≥6)
    • Each plan has: Strategy (numbered steps), Owner, Timeline, Status, Verification
    • Only Backend/DevOps/Arch/Security mitigations (production code changes)
    • QA-owned mitigations belong in QA doc instead
  • Assumptions and Dependencies section
    • Architectural assumptions only (SLO targets, replication lag, system design)
    • Assumptions list (numbered)
    • Dependencies list with required dates
    • Risks to plan with impact and contingency
    • QA execution assumptions belong in QA doc instead
  • NO test implementation code (long examples belong in QA doc)
  • NO test scripts (no Playwright test(...) blocks, no assertions, no test setup code)
  • NO NFR test examples (NFR sections describe WHAT to test, not HOW to test)
  • NO test scenario checklists (belong in QA doc)
  • NO bloat or repetition (consolidate repeated notes, avoid over-explanation)
  • Cross-references to QA doc where appropriate (instead of duplication)
  • RECIPE SECTIONS NOT IN ARCHITECTURE DOC:
    • NO "Test Levels Strategy" section (unit/integration/E2E split belongs in QA doc only)
    • NO "NFR Testing Approach" section with detailed test procedures (belongs in QA doc only)
    • NO "Test Environment Requirements" section (belongs in QA doc only)
    • NO "Recommendations for Sprint 0" section with test framework setup (belongs in QA doc only)
    • NO "Quality Gate Criteria" section (pass rates, coverage targets belong in QA doc only)
    • NO "Tool Selection" section (Playwright, k6, etc. belongs in QA doc only)

test-design-qa.md

REQUIRED SECTIONS:

  • Purpose statement at top (test execution recipe)
  • Executive Summary with risk summary and coverage summary
  • Dependencies & Test Blockers section in POSITION 2 (right after Executive Summary)
    • Backend/Architecture dependencies listed (what QA needs from other teams)
    • QA infrastructure setup listed (factories, fixtures, environments)
    • Code example uses playwright-utils if config.tea_use_playwright_utils is true (see the sketch after this list)
    • test imported from '@seontechnologies/playwright-utils/api-request/fixtures'
    • expect imported from '@playwright/test' (playwright-utils does not re-export expect)
    • Code examples include assertions (no unused imports)
  • Risk Assessment section (brief, references Architecture doc)
    • High-priority risks table
    • Medium/low-priority risks table
    • Each risk shows "QA Test Coverage" column (how QA validates)
  • Test Coverage Plan with P0/P1/P2/P3 sections
    • Priority sections have ONLY "Criteria" (no execution context)
    • Note at top: "P0/P1/P2/P3 = priority, NOT execution timing"
    • Test tables with columns: Test ID | Requirement | Test Level | Risk Link | Notes
  • Execution Strategy section (organized by TOOL TYPE)
    • Every PR: Playwright tests (~10-15 min)
    • Nightly: k6 performance tests (~30-60 min)
    • Weekly: Chaos & long-running (~hours)
    • Philosophy: "Run everything in PRs unless expensive/long-running"
  • QA Effort Estimate section (QA effort ONLY)
    • Interval-based estimates (e.g., "~1-2 weeks" NOT "36 hours")
    • NO DevOps, Backend, Data Eng, Finance effort
    • NO Sprint breakdowns (too prescriptive)
  • Appendix A: Code Examples & Tagging
  • Appendix B: Knowledge Base References
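
A minimal sketch of the import pattern the Dependencies & Test Blockers bullets describe, assuming config.tea_use_playwright_utils is true (the apiRequest fixture name and its call shape are assumptions for illustration; the import paths come from this checklist):

```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test'; // playwright-utils does not re-export expect

// The apiRequest fixture name and its signature are hypothetical placeholders.
test('health endpoint responds @p0', async ({ apiRequest }) => {
  const res = await apiRequest({ method: 'GET', path: '/health' });
  expect(res.status).toBe(200); // assertion included so no import is unused
});
```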

DON'T INCLUDE (bloat):

  • NO Quick Reference section
  • NO System Architecture Summary
  • NO Test Environment Requirements as separate section (integrate into Dependencies)
  • NO Testability Assessment section (covered in Dependencies)
  • NO Test Levels Strategy section (obvious from test scenarios)
  • NO NFR Readiness Summary
  • NO Quality Gate Criteria section (teams decide for themselves)
  • NO Follow-on Workflows section (BMAD commands self-explanatory)
  • NO Approval section
  • NO Infrastructure/DevOps/Finance effort tables (out of scope)
  • NO Sprint 0/1/2/3 breakdown tables
  • NO Next Steps section

Cross-Document Consistency

  • Both documents reference same risks by ID (R-001, R-002, etc.)
  • Both documents use consistent priority levels (P0, P1, P2, P3)
  • Both documents reference same Sprint 0 blockers
  • No duplicate content (cross-reference instead)
  • Dates and authors match across documents
  • ADR and PRD references consistent

Document Quality (Anti-Bloat Check)

CRITICAL: Check for bloat and repetition across BOTH documents

  • No single note repeated 10+ times (e.g., "Timing is pessimistic until R-005 is fixed" on every section)
  • Repeated information consolidated (write once at top, reference briefly if needed)
  • No excessive detail that doesn't add value (obvious concepts, redundant examples)
  • Focus on unique/critical info (only document what's different from standard practice)
  • Architecture doc: Concerns-focused, NOT implementation-focused
  • QA doc: Implementation-focused, NOT theory-focused
  • Clear separation: Architecture = WHAT and WHY, QA = HOW
  • Professional tone: No AI slop markers
    • Avoid excessive emphasis and emojis (use sparingly, only when they add clarity)
    • Avoid "absolutely", "excellent", "fantastic", overly enthusiastic language
    • Write professionally and directly
  • Architecture doc length: Target ~150-200 lines max (focus on actionable concerns only)
  • QA doc length: Keep concise, remove bloat sections

Architecture Doc Structure (Actionable-First Principle)

CRITICAL: Validate structure follows actionable-first, FYI-last principle

  • Actionable sections at TOP:
    • Quick Guide (🚨 BLOCKERS first, then ⚠️ HIGH PRIORITY, then 📋 INFO ONLY last)
    • Risk Assessment (high-priority risks ≥6 at top)
    • Testability Concerns (concerns/blockers at top, passing items at bottom)
    • Risk Mitigation Plans (for high-priority risks ≥6)
  • FYI sections at BOTTOM:
    • Testability Assessment Summary (what works well - only if worth mentioning)
    • Assumptions and Dependencies
  • ASRs categorized correctly:
    • Actionable ASRs included in 🚨 or ⚠️ sections
    • FYI ASRs included in 📋 section or omitted if obvious

Completion Criteria

All must be true:

  • All prerequisites met
  • All process steps completed
  • All output validations passed
  • All quality checks passed
  • All integration points verified
  • Output file(s) complete and well-formatted
  • System-level mode: Both documents validated (if applicable)
  • Epic-level mode: Single document validated (if applicable)
  • Team review scheduled (if required)

Post-Workflow Actions

User must complete:

  1. Review risk assessment with team
  2. Prioritize mitigation for high-priority risks (score ≥6)
  3. Allocate resources per estimates
  4. Run *atdd workflow to generate P0 tests (separate workflow; not auto-run)
  5. Set up test data factories and fixtures (see the factory sketch after this list)
  6. Schedule team review of test design document
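
A minimal sketch of the kind of test-data factory step 5 refers to (the User shape and createUser helper are hypothetical):

```typescript
// Hypothetical factory: centralizes test-data creation so per-test setup stays cheap.
interface User {
  email: string;
  role: 'admin' | 'member';
  active: boolean;
}

export function createUser(overrides: Partial<User> = {}): User {
  return {
    email: `user-${Date.now()}@example.test`,
    role: 'member',
    active: true,
    ...overrides, // tests override only the fields that matter to the scenario
  };
}

// Usage: const admin = createUser({ role: 'admin' });
```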

Recommended next workflows:

  1. Run *atdd workflow for P0 test generation
  2. Run framework workflow if not already done
  3. Run ci workflow to configure pipeline stages

Rollback Procedure

If workflow fails:

  1. Delete output file
  2. Review error logs
  3. Fix missing context (PRD, architecture docs)
  4. Clarify ambiguous requirements
  5. Retry workflow

Notes

Common Issues

Issue: Too many P0 tests

  • Solution: Apply strict P0 criteria - the scenario must block core functionality AND carry high risk (score ≥6) AND have no workaround

Issue: Risk scores all high

  • Solution: Differentiate between critical (3) and degraded (2) impact ratings rather than scoring every risk as high

Issue: Duplicate coverage across levels

  • Solution: Use test pyramid - E2E for critical paths only

Issue: Resource estimates too high or too precise

  • Solution:
    • Invest in fixtures/factories to reduce per-test setup time
    • Use interval ranges (e.g., "~55-110 hours") instead of exact numbers (e.g., "81 hours")
    • Widen intervals if high uncertainty exists

Issue: Execution order section too complex or redundant

  • Solution:
    • Default: Run everything in PRs (<15 min with Playwright parallelization)
    • Only defer to nightly/weekly if expensive (k6, chaos, 4+ hour tests)
    • Don't create smoke/P0/P1/P2/P3 tier structure
    • Don't re-list all tests (already in coverage plan)

Best Practices

  • Base risk assessment on evidence, not assumptions
  • High-priority risks (≥6) require immediate mitigation
  • P0 tests should cover <10% of total scenarios
  • Avoid testing same behavior at multiple levels
  • Use interval-based estimates (e.g., "~25-40 hours") instead of exact numbers to avoid false precision and provide flexibility
  • Keep execution strategy simple: Default to "run everything in PRs" (<15 min with Playwright), only defer if expensive/long-running
  • Avoid execution order redundancy: Don't create complex tier structures or re-list tests

Checklist Complete: Sign off when all items are validated.

Completed by: {name}
Date: {date}
Epic: {epic title}
Notes: {additional notes}