doc: test design refinements (#1382)

This commit is contained in:
Murat K Ozcan
2026-01-23 13:00:48 -06:00
committed by GitHub
parent efbe839a0a
commit 48881f86a6
4 changed files with 687 additions and 367 deletions

View File

@@ -80,23 +80,29 @@
- [ ] Owners assigned where applicable
- [ ] No duplicate coverage (same behavior at multiple levels)

### Execution Strategy

**CRITICAL: Keep execution strategy simple, avoid redundancy**

- [ ] **Simple structure**: PR / Nightly / Weekly (NOT complex smoke/P0/P1/P2 tiers)
- [ ] **PR execution**: All functional tests unless significant infrastructure overhead
- [ ] **Nightly/Weekly**: Only performance, chaos, long-running, manual tests
- [ ] **No redundancy**: Don't re-list all tests (already in coverage plan)
- [ ] **Philosophy stated**: "Run everything in PRs if <15 min, defer only if expensive/long"
- [ ] **Playwright parallelization noted**: 100s of tests in 10-15 min
### Resource Estimates

**CRITICAL: Use intervals/ranges, NOT exact numbers**

- [ ] P0 effort provided as interval range (e.g., "~25-40 hours" NOT "36 hours")
- [ ] P1 effort provided as interval range (e.g., "~20-35 hours" NOT "27 hours")
- [ ] P2 effort provided as interval range (e.g., "~10-30 hours" NOT "15.5 hours")
- [ ] P3 effort provided as interval range (e.g., "~2-5 hours" NOT "2.5 hours")
- [ ] Total effort provided as interval range (e.g., "~55-110 hours" NOT "81 hours")
- [ ] Timeline provided as week range (e.g., "~1.5-3 weeks" NOT "11 days")
- [ ] Estimates include setup time and account for complexity variations
- [ ] **No false precision**: Avoid exact calculations like "18 tests × 2 hours = 36 hours"
### Quality Gate Criteria
@@ -126,11 +132,16 @@
### Priority Assignment Accuracy

**CRITICAL: Priority classification is separate from execution timing**

- [ ] **Priority sections (P0/P1/P2/P3) do NOT include execution context** (e.g., no "Run on every commit" in headers)
- [ ] **Priority sections have only "Criteria" and "Purpose"** (no "Execution:" field)
- [ ] **Execution Strategy section** is separate and handles timing based on infrastructure overhead
- [ ] P0: Truly blocks core functionality + High-risk (≥6) + No workaround
- [ ] P1: Important features + Medium-risk (3-4) + Common workflows
- [ ] P2: Secondary features + Low-risk (1-2) + Edge cases
- [ ] P3: Nice-to-have + Exploratory + Benchmarks
- [ ] **Note at top of Test Coverage Plan**: Clarifies P0/P1/P2/P3 = priority/risk, NOT execution timing
### Test Level Selection
@@ -176,58 +187,90 @@
- [ ] 🚨 BLOCKERS - Team Must Decide (Sprint 0 critical path items)
- [ ] HIGH PRIORITY - Team Should Validate (recommendations for approval)
- [ ] 📋 INFO ONLY - Solutions Provided (no decisions needed)
- [ ] **Risk Assessment** section - **ACTIONABLE**
- [ ] Total risks identified count
- [ ] High-priority risks table (score ≥6) with all columns: Risk ID, Category, Description, Probability, Impact, Score, Mitigation, Owner, Timeline
- [ ] Medium and low-priority risks tables
- [ ] Risk category legend included
- [ ] **Testability Concerns and Architectural Gaps** section - **ACTIONABLE**
- [ ] **Sub-section: 🚨 ACTIONABLE CONCERNS** at TOP
- [ ] Blockers to Fast Feedback table (WHAT architecture must provide)
- [ ] Architectural Improvements Needed (WHAT must be changed)
- [ ] Each concern has: Owner, Timeline, Impact
- [ ] **Sub-section: Testability Assessment Summary** at BOTTOM (FYI)
- [ ] What Works Well (passing items)
- [ ] Accepted Trade-offs (no action required)
- [ ] This section only included if worth mentioning; otherwise omitted
- [ ] **Risk Mitigation Plans** for all high-priority risks (≥6)
- [ ] Each plan has: Strategy (numbered steps), Owner, Timeline, Status, Verification
- [ ] **Only Backend/DevOps/Arch/Security mitigations** (production code changes)
- [ ] QA-owned mitigations belong in QA doc instead
- [ ] **Assumptions and Dependencies** section
- [ ] **Architectural assumptions only** (SLO targets, replication lag, system design)
- [ ] Assumptions list (numbered)
- [ ] Dependencies list with required dates
- [ ] Risks to plan with impact and contingency
- [ ] QA execution assumptions belong in QA doc instead
- [ ] **NO test implementation code** (long examples belong in QA doc)
- [ ] **NO test scripts** (no Playwright test(...) blocks, no assertions, no test setup code)
- [ ] **NO NFR test examples** (NFR sections describe WHAT to test, not HOW to test)
- [ ] **NO test scenario checklists** (belong in QA doc)
- [ ] **NO bloat or repetition** (consolidate repeated notes, avoid over-explanation)
- [ ] **Cross-references to QA doc** where appropriate (instead of duplication)
- [ ] **RECIPE SECTIONS NOT IN ARCHITECTURE DOC:**
- [ ] NO "Test Levels Strategy" section (unit/integration/E2E split belongs in QA doc only)
- [ ] NO "NFR Testing Approach" section with detailed test procedures (belongs in QA doc only)
- [ ] NO "Test Environment Requirements" section (belongs in QA doc only)
- [ ] NO "Recommendations for Sprint 0" section with test framework setup (belongs in QA doc only)
- [ ] NO "Quality Gate Criteria" section (pass rates, coverage targets belong in QA doc only)
- [ ] NO "Tool Selection" section (Playwright, k6, etc. belongs in QA doc only)
### test-design-qa.md

**NEW STRUCTURE (streamlined from 375 to ~287 lines):**

- [ ] **Purpose statement** at top (test execution recipe)
- [ ] **Executive Summary** with risk summary and coverage summary
- [ ] **Dependencies & Test Blockers** section in POSITION 2 (right after Executive Summary)
- [ ] Backend/Architecture dependencies listed (what QA needs from other teams)
- [ ] QA infrastructure setup listed (factories, fixtures, environments)
- [ ] Code example with playwright-utils if config.tea_use_playwright_utils is true
- [ ] Test from '@seontechnologies/playwright-utils/api-request/fixtures'
- [ ] Expect from '@playwright/test' (playwright-utils does not re-export expect)
- [ ] Code examples include assertions (no unused imports)
- [ ] **Risk Assessment** section (brief, references Architecture doc)
- [ ] High-priority risks table
- [ ] Medium/low-priority risks table
- [ ] Each risk shows "QA Test Coverage" column (how QA validates)
- [ ] **Test Coverage Plan** with P0/P1/P2/P3 sections
- [ ] Priority sections have ONLY "Criteria" (no execution context)
- [ ] Note at top: "P0/P1/P2/P3 = priority, NOT execution timing"
- [ ] Test tables with columns: Test ID | Requirement | Test Level | Risk Link | Notes
- [ ] **Execution Strategy** section (organized by TOOL TYPE)
- [ ] Every PR: Playwright tests (~10-15 min)
- [ ] Nightly: k6 performance tests (~30-60 min)
- [ ] Weekly: Chaos & long-running (~hours)
- [ ] Philosophy: "Run everything in PRs unless expensive/long-running"
- [ ] **QA Effort Estimate** section (QA effort ONLY)
- [ ] Interval-based estimates (e.g., "~1-2 weeks" NOT "36 hours")
- [ ] NO DevOps, Backend, Data Eng, Finance effort
- [ ] NO Sprint breakdowns (too prescriptive)
- [ ] **Appendix A: Code Examples & Tagging**
- [ ] **Appendix B: Knowledge Base References**
**REMOVED SECTIONS (bloat):**
- [ ] NO Quick Reference section (bloat)
- [ ] NO System Architecture Summary (bloat)
- [ ] NO Test Environment Requirements as separate section (integrated into Dependencies)
- [ ] NO Testability Assessment section (bloat - covered in Dependencies)
- [ ] NO Test Levels Strategy section (bloat - obvious from test scenarios)
- [ ] NO NFR Readiness Summary (bloat)
- [ ] NO Quality Gate Criteria section (teams decide for themselves)
- [ ] NO Follow-on Workflows section (bloat - BMAD commands self-explanatory)
- [ ] NO Approval section (unnecessary formality)
- [ ] NO Infrastructure/DevOps/Finance effort tables (out of scope)
- [ ] NO Sprint 0/1/2/3 breakdown tables (too prescriptive)
- [ ] NO Next Steps section (bloat)
### Cross-Document Consistency
@@ -238,6 +281,40 @@
- [ ] Dates and authors match across documents
- [ ] ADR and PRD references consistent
### Document Quality (Anti-Bloat Check)
**CRITICAL: Check for bloat and repetition across BOTH documents**
- [ ] **No repeated notes 10+ times** (e.g., "Timing is pessimistic until R-005 fixed" on every section)
- [ ] **Repeated information consolidated** (write once at top, reference briefly if needed)
- [ ] **No excessive detail** that doesn't add value (obvious concepts, redundant examples)
- [ ] **Focus on unique/critical info** (only document what's different from standard practice)
- [ ] **Architecture doc**: Concerns-focused, NOT implementation-focused
- [ ] **QA doc**: Implementation-focused, NOT theory-focused
- [ ] **Clear separation**: Architecture = WHAT and WHY, QA = HOW
- [ ] **Professional tone**: No AI slop markers
- [ ] Avoid excessive ✅/❌ emojis (use sparingly, only when adding clarity)
- [ ] Avoid "absolutely", "excellent", "fantastic", overly enthusiastic language
- [ ] Write professionally and directly
- [ ] **Architecture doc length**: Target ~150-200 lines max (focus on actionable concerns only)
- [ ] **QA doc length**: Keep concise, remove bloat sections
### Architecture Doc Structure (Actionable-First Principle)
**CRITICAL: Validate structure follows actionable-first, FYI-last principle**
- [ ] **Actionable sections at TOP:**
- [ ] Quick Guide (🚨 BLOCKERS first, then HIGH PRIORITY, then 📋 INFO ONLY last)
- [ ] Risk Assessment (high-priority risks ≥6 at top)
- [ ] Testability Concerns (concerns/blockers at top, passing items at bottom)
- [ ] Risk Mitigation Plans (for high-priority risks ≥6)
- [ ] **FYI sections at BOTTOM:**
- [ ] Testability Assessment Summary (what works well - only if worth mentioning)
- [ ] Assumptions and Dependencies
- [ ] **ASRs categorized correctly:**
- [ ] Actionable ASRs included in 🚨 or ⚠️ sections
- [ ] FYI ASRs included in 📋 section or omitted if obvious
## Completion Criteria

**All must be true:**
@@ -295,9 +372,20 @@ If workflow fails:
- **Solution**: Use test pyramid - E2E for critical paths only

**Issue**: Resource estimates too high or too precise
- **Solution**:
  - Invest in fixtures/factories to reduce per-test setup time
  - Use interval ranges (e.g., "~55-110 hours") instead of exact numbers (e.g., "81 hours")
  - Widen intervals if high uncertainty exists

**Issue**: Execution order section too complex or redundant
- **Solution**:
  - Default: Run everything in PRs (<15 min with Playwright parallelization)
  - Only defer to nightly/weekly if expensive (k6, chaos, 4+ hour tests)
  - Don't create smoke/P0/P1/P2/P3 tier structure
  - Don't re-list all tests (already in coverage plan)
### Best Practices
@@ -305,7 +393,9 @@ If workflow fails:
- High-priority risks (≥6) require immediate mitigation
- P0 tests should cover <10% of total scenarios
- Avoid testing same behavior at multiple levels
- **Use interval-based estimates** (e.g., "~25-40 hours") instead of exact numbers to avoid false precision and provide flexibility
- **Keep execution strategy simple**: Default to "run everything in PRs" (<15 min with Playwright), only defer if expensive/long-running
- **Avoid execution order redundancy**: Don't create complex tier structures or re-list tests
---

View File

@@ -157,7 +157,13 @@ TEA test-design workflow supports TWO modes, detected automatically:
1. **Review Architecture for Testability**

**STRUCTURE PRINCIPLE: CONCERNS FIRST, PASSING ITEMS LAST**
Evaluate architecture against these criteria and structure output as:
1. **Testability Concerns** (ACTIONABLE - what's broken/missing)
2. **Testability Assessment Summary** (FYI - what works well)
**Testability Criteria:**
**Controllability:**
- Can we control system state for testing? (API seeding, factories, database reset)
@@ -174,8 +180,18 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Can we reproduce failures? (deterministic waits, HAR capture, seed data)
- Are components loosely coupled? (mockable, testable boundaries)
**In Architecture Doc Output:**
- **Section A: Testability Concerns** (TOP) - List what's BROKEN or MISSING
- Example: "No API for test data seeding → Cannot parallelize tests"
- Example: "Hardcoded DB connection → Cannot test in CI"
- **Section B: Testability Assessment Summary** (BOTTOM) - List what PASSES
- Example: "✅ API-first design supports test isolation"
- Only include if worth mentioning; otherwise omit this section entirely
2. **Identify Architecturally Significant Requirements (ASRs)**
**CRITICAL: ASRs must indicate if ACTIONABLE or FYI**
From PRD NFRs and architecture decisions, identify quality requirements that:
- Drive architecture decisions (e.g., "Must handle 10K concurrent users" → caching architecture)
- Pose testability challenges (e.g., "Sub-second response time" → performance test infrastructure)
@@ -183,21 +199,60 @@ TEA test-design workflow supports TWO modes, detected automatically:
Score each ASR using risk matrix (probability × impact).
**In Architecture Doc, categorize ASRs:**
- **ACTIONABLE ASRs** (require architecture changes): Include in "Quick Guide" 🚨 or ⚠️ sections
- **FYI ASRs** (already satisfied by architecture): Include in "Quick Guide" 📋 section OR omit if obvious
**Example:**
- ASR-001 (Score 9): "Multi-region deployment requires region-specific test infrastructure" → **ACTIONABLE** (goes in 🚨 BLOCKERS)
- ASR-002 (Score 4): "OAuth 2.1 authentication already implemented in ADR-5" → **FYI** (goes in 📋 INFO ONLY or omit)
**Structure Principle:** Actionable ASRs at TOP, FYI ASRs at BOTTOM (or omit)
3. **Define Test Levels Strategy**
**IMPORTANT: This section goes in QA doc ONLY, NOT in Architecture doc**
Based on architecture (mobile, web, API, microservices, monolith):
- Recommend unit/integration/E2E split (e.g., 70/20/10 for API-heavy, 40/30/30 for UI-heavy)
- Identify test environment needs (local, staging, ephemeral, production-like)
- Define testing approach per technology (Playwright for web, Maestro for mobile, k6 for performance)

**In Architecture doc:** Only mention test level split if it's an ACTIONABLE concern
- Example: "API response time <100ms requires load testing infrastructure" (concern)
- DO NOT include full test level strategy table in Architecture doc
4. **Assess NFR Requirements (MINIMAL in Architecture Doc)**

**CRITICAL: NFR testing approach is a RECIPE - belongs in QA doc ONLY**
**In Architecture Doc:**
- Only mention NFRs if they create testability CONCERNS
- Focus on WHAT architecture must provide, not HOW to test
- Keep it brief - 1-2 sentences per NFR category at most
**Example - Security NFR in Architecture doc (if there's a concern):**
CORRECT (concern-focused, brief, WHAT/WHY only):
- "System must prevent cross-customer data access (GDPR requirement). Requires test infrastructure for multi-tenant isolation in Sprint 0."
- "OAuth tokens must expire after 1 hour (ADR-5). Requires test harness for token expiration validation."
INCORRECT (too detailed, belongs in QA doc):
- Full table of security test scenarios
- Test scripts with code examples
- Detailed test procedures
- Tool selection (e.g., "use Playwright E2E + OWASP ZAP")
- Specific test approaches (e.g., "Test approach: Playwright E2E for auth/authz")
**In QA Doc (full NFR testing approach):**
- **Security**: Full test scenarios, tooling (Playwright + OWASP ZAP), test procedures
- **Performance**: Load/stress/spike test scenarios, k6 scripts, SLO thresholds
- **Reliability**: Error handling tests, retry logic validation, circuit breaker tests
- **Maintainability**: Coverage targets, code quality gates, observability validation
**Rule of Thumb:**
- Architecture doc: "What NFRs exist and what concerns they create" (1-2 sentences)
- QA doc: "How to test those NFRs" (full sections with tables, code, procedures)
5. **Flag Testability Concerns**

Identify architecture decisions that harm testability:
@@ -228,22 +283,54 @@ TEA test-design workflow supports TWO modes, detected automatically:
**Standard Structures (REQUIRED):**

**test-design-architecture.md sections (in this order):**

**STRUCTURE PRINCIPLE: Actionable items FIRST, FYI items LAST**

1. Executive Summary (scope, business context, architecture, risk summary)
2. Quick Guide (🚨 BLOCKERS / HIGH PRIORITY / 📋 INFO ONLY)
3. Risk Assessment (high/medium/low-priority risks with scoring) - **ACTIONABLE**
4. Testability Concerns and Architectural Gaps - **ACTIONABLE** (what arch team must do)
   - Sub-section: Blockers to Fast Feedback (ACTIONABLE - concerns FIRST)
   - Sub-section: Architectural Improvements Needed (ACTIONABLE)
   - Sub-section: Testability Assessment Summary (FYI - passing items LAST, only if worth mentioning)
5. Risk Mitigation Plans (detailed for high-priority risks ≥6) - **ACTIONABLE**
6. Assumptions and Dependencies - **FYI**
**SECTIONS THAT DO NOT BELONG IN ARCHITECTURE DOC:**
- Test Levels Strategy (unit/integration/E2E split) - This is a RECIPE, belongs in QA doc ONLY
- NFR Testing Approach with test examples - This is a RECIPE, belongs in QA doc ONLY
- Test Environment Requirements - This is a RECIPE, belongs in QA doc ONLY
- Recommendations for Sprint 0 (test framework setup, factories) - This is a RECIPE, belongs in QA doc ONLY
- Quality Gate Criteria (pass rates, coverage targets) - This is a RECIPE, belongs in QA doc ONLY
- Tool Selection (Playwright, k6, etc.) - This is a RECIPE, belongs in QA doc ONLY
**WHAT BELONGS IN ARCHITECTURE DOC:**
- Testability CONCERNS (what makes it hard to test)
- Architecture GAPS (what's missing for testability)
- What architecture team must DO (blockers, improvements)
- Risks and mitigation plans
- ASRs (Architecturally Significant Requirements) - but clarify if FYI or actionable
**test-design-qa.md sections (in this order):**

1. Executive Summary (risk summary, coverage summary)
2. **Dependencies & Test Blockers** (CRITICAL: RIGHT AFTER SUMMARY - what QA needs from other teams)
3. Risk Assessment (scored risks with categories - reference Arch doc, don't duplicate)
4. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
5. **Execution Strategy** (SIMPLE: Organized by TOOL TYPE: PR (Playwright) / Nightly (k6) / Weekly (chaos/manual))
6. QA Effort Estimate (QA effort ONLY - no DevOps, Data Eng, Finance, Backend)
7. Appendices (code examples with playwright-utils, tagging strategy, knowledge base refs)
**SECTIONS TO EXCLUDE FROM QA DOC:**
- Quality Gate Criteria (pass/fail thresholds - teams decide for themselves)
- Follow-on Workflows (bloat - BMAD commands are self-explanatory)
- Approval section (unnecessary formality)
- Test Environment Requirements (remove as separate section - integrate into Dependencies if needed)
- NFR Readiness Summary (bloat - covered in Risk Assessment)
- Testability Assessment (bloat - covered in Dependencies)
- Test Levels Strategy (bloat - obvious from test scenarios)
- Sprint breakdowns (too prescriptive)
- Infrastructure/DevOps/Data Eng effort tables (out of scope)
- Mitigation plans for non-QA work (belongs in Arch doc)
**Content Guidelines:**
@@ -252,26 +339,46 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Clear ownership (each blocker/ASR has owner + timeline)
- Testability requirements (what architecture must support)
- Mitigation plans (for each high-risk item ≥6)
- Brief conceptual examples ONLY if needed to clarify architecture concerns (5-10 lines max)
- **Target length**: ~150-200 lines max (focus on actionable concerns only)
- **Professional tone**: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language)
**Architecture doc (DON'T) - CRITICAL:**
- ❌ NO test scripts or test implementation code AT ALL - this is a communication doc for architects, not a testing guide
- ❌ NO Playwright test examples (e.g., test('...', async ({ request }) => ...))
- ❌ NO assertion logic (e.g., expect(...).toBe(...))
- ❌ NO test scenario checklists with checkboxes (belongs in QA doc)
- ❌ NO implementation details about HOW QA will test
- ❌ Focus on CONCERNS, not IMPLEMENTATION
**QA doc (DO):**
- ✅ Test scenario recipes (clear P0/P1/P2/P3 with checkboxes)
- ✅ Full test implementation code samples when helpful (see the sketch below)
- ✅ **IMPORTANT: If config.tea_use_playwright_utils is true, ALL code samples MUST use @seontechnologies/playwright-utils fixtures and utilities**
- ✅ Import test fixtures from '@seontechnologies/playwright-utils/api-request/fixtures'
- ✅ Import expect from '@playwright/test' (playwright-utils does not re-export expect)
- ✅ Use apiRequest fixture with schema validation, retry logic, and structured responses
- ✅ Dependencies & Test Blockers section RIGHT AFTER Executive Summary (what QA needs from other teams)
- ✅ **QA effort estimates ONLY** (no DevOps, Data Eng, Finance, Backend effort - out of scope)
- ✅ Cross-references to Architecture doc (not duplication)
- ✅ **Professional tone**: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language)
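A minimal illustrative sketch of such a sample, assuming config.tea_use_playwright_utils is true (the endpoint, payload fields, and tags are placeholders, not prescribed by the library):

```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test'; // playwright-utils does not re-export expect
import { faker } from '@faker-js/faker';

// Hypothetical endpoint and payload, shown only to illustrate the required pattern.
test('@P1 @API creates a resource with randomized data', async ({ apiRequest }) => {
  const payload = { id: `test-${faker.string.uuid()}`, name: faker.company.name() };

  const { status, body } = await apiRequest({
    method: 'POST',
    path: '/api/resource',
    body: payload,
  });

  // Assertions are included so no import is left unused.
  expect(status).toBe(201);
  expect(body).toHaveProperty('id', payload.id);
});
```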
**QA doc (DON'T):**
- ❌ NO architectural theory (just reference Architecture doc)
- ❌ NO ASR explanations (link to Architecture doc instead)
- ❌ NO duplicate risk assessments (reference Architecture doc)
- ❌ NO Quality Gate Criteria section (teams decide pass/fail thresholds for themselves)
- ❌ NO Follow-on Workflows section (bloat - BMAD commands are self-explanatory)
- ❌ NO Approval section (unnecessary formality)
- ❌ NO effort estimates for other teams (DevOps, Backend, Data Eng, Finance - out of scope, QA effort only)
- ❌ NO Sprint breakdowns (too prescriptive - e.g., "Sprint 0: 40 hours, Sprint 1: 48 hours")
- ❌ NO mitigation plans for Backend/Arch/DevOps work (those belong in Architecture doc)
- ❌ NO architectural assumptions or debates (those belong in Architecture doc)
**Anti-Patterns to Avoid (Cross-Document Redundancy):**
**CRITICAL: NO BLOAT, NO REPETITION, NO OVERINFO**
**DON'T duplicate OAuth requirements:**
- Architecture doc: Explain OAuth 2.1 flow in detail
- QA doc: Re-explain why OAuth 2.1 is required
@@ -280,6 +387,24 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)" - Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)" - QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
**DON'T repeat the same note 10+ times:**
- Example: "Timing is pessimistic until R-005 is fixed" repeated on every P0, P1, P2 section
- This creates bloat and makes docs hard to read
**DO consolidate repeated information:**
- Write once at the top: "**Note**: All timing estimates are pessimistic pending R-005 resolution"
- Reference briefly if needed: "(pessimistic timing)"
**DON'T include excessive detail that doesn't add value:**
- Long explanations of obvious concepts
- Redundant examples showing the same pattern
- Over-documentation of standard practices
**DO focus on what's unique or critical:**
- Document only what's different from standard practice
- Highlight critical decisions and risks
- Keep explanations concise and actionable
**Markdown Cross-Reference Syntax Examples:**

```markdown
@@ -330,6 +455,24 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Cross-reference between docs (no duplication)
- Validate against checklist.md (System-Level Mode section)
**Common Over-Engineering to Avoid:**
**In QA Doc:**
1. ❌ Quality gate thresholds ("P0 must be 100%, P1 ≥95%") - Let teams decide for themselves
2. ❌ Effort estimates for other teams - QA doc should only estimate QA effort
3. ❌ Sprint breakdowns ("Sprint 0: 40 hours, Sprint 1: 48 hours") - Too prescriptive
4. ❌ Approval sections - Unnecessary formality
5. ❌ Assumptions about architecture (SLO targets, replication lag) - These are architectural concerns, belong in Arch doc
6. ❌ Mitigation plans for Backend/Arch/DevOps - Those belong in Arch doc
7. ❌ Follow-on workflows section - Bloat, BMAD commands are self-explanatory
8. ❌ NFR Readiness Summary - Bloat, covered in Risk Assessment
**Test Coverage Numbers Reality Check:**
- With Playwright parallelization, running ALL Playwright tests is as fast as running just P0
- Don't split Playwright tests by priority into different CI gates - it adds no value
- Tool type matters, not priority labels
- Defer based on infrastructure cost, not importance
**After System-Level Mode:** Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode.

---
@@ -540,12 +683,51 @@ TEA test-design workflow supports TWO modes, detected automatically:
8. **Plan Mitigations**
**CRITICAL: Mitigation placement depends on WHO does the work**
For each high-priority risk:
- Define mitigation strategy
- Assign owner (dev, QA, ops)
- Set timeline
- Update residual risk expectation
**Mitigation Plan Placement:**
**Architecture Doc:**
- Mitigations owned by Backend, DevOps, Architecture, Security, Data Eng
- Example: "Add authorization layer for customer-scoped access" (Backend work)
- Example: "Configure AWS Fault Injection Simulator" (DevOps work)
- Example: "Define CloudWatch log schema for backfill events" (Architecture work)
**QA Doc:**
- Mitigations owned by QA (test development work)
- Example: "Create factories for test data with randomization" (QA work)
- Example: "Implement polling with retry for async validation" (QA test code)
- Brief reference to Architecture doc mitigations (don't duplicate)
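A minimal sketch of what the polling-with-retry mitigation above could look like as QA test code (endpoint, record ID, and expected state are placeholders):

```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test';

// Hypothetical async workflow: poll until the backend reports the record as processed.
test('@P1 async validation with polling and retry', async ({ apiRequest }) => {
  await expect
    .poll(
      async () => {
        const { body } = await apiRequest({ method: 'GET', path: '/api/resource/123' });
        return body.status;
      },
      { timeout: 30_000, intervals: [1_000, 2_000, 5_000] },
    )
    .toBe('processed');
});
```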
**Rule of Thumb:**
- If mitigation requires production code changes → Architecture doc
- If mitigation is test infrastructure/code → QA doc
- If mitigation involves multiple teams → Architecture doc with QA validation approach
**Assumptions Placement:**
**Architecture Doc:**
- Architectural assumptions (SLO targets, replication lag, system design assumptions)
- Example: "P95 <500ms inferred from <2s timeout (requires Product approval)"
- Example: "Multi-region replication lag <1s assumed (ADR doesn't specify SLA)"
- Example: "Recent Cache hit ratio >80% assumed (not in PRD/ADR)"
**QA Doc:**
- Test execution assumptions (test infrastructure readiness, test data availability)
- Example: "Assumes test factories already created"
- Example: "Assumes CI/CD pipeline configured"
- Brief reference to Architecture doc for architectural assumptions
**Rule of Thumb:**
- If assumption is about system architecture/design → Architecture doc
- If assumption is about test infrastructure/execution → QA doc
---

## Step 3: Design Test Coverage
@@ -594,6 +776,8 @@ TEA test-design workflow supports TWO modes, detected automatically:
3. **Assign Priority Levels**
**CRITICAL: P0/P1/P2/P3 indicates priority and risk level, NOT execution timing**
**Knowledge Base Reference**: `test-priorities-matrix.md`

**P0 (Critical)**:
@@ -601,25 +785,28 @@ TEA test-design workflow supports TWO modes, detected automatically:
- High-risk areas (score ≥6)
- Revenue-impacting
- Security-critical
- No workaround exists
- Affects majority of users

**P1 (High)**:
- Important user features
- Medium-risk areas (score 3-4)
- Common workflows
- Workaround exists but difficult

**P2 (Medium)**:
- Secondary features
- Low-risk areas (score 1-2)
- Edge cases
- Regression prevention

**P3 (Low)**:
- Nice-to-have
- Exploratory
- Performance benchmarks
- Documentation validation
**NOTE:** Priority classification is separate from execution timing. A P1 test might run in PRs if it's fast, or nightly if it requires expensive infrastructure (e.g., k6 performance test). See "Execution Strategy" section for timing guidance.
4. **Outline Data and Tooling Prerequisites**
@@ -629,13 +816,55 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Environment setup
- Tools and dependencies
5. **Define Execution Strategy** (Keep It Simple)

**IMPORTANT: Avoid over-engineering execution order**

**Default Philosophy:**
- Run **everything** in PRs if total duration <15 minutes
- Playwright is fast with parallelization (100s of tests in ~10-15 min)
- Only defer to nightly/weekly if there's significant overhead:
  - Performance tests (k6, load testing) - expensive infrastructure
  - Chaos engineering - requires special setup (AWS FIS)
  - Long-running tests - endurance (4+ hours), disaster recovery
  - Manual tests - require human intervention
**Simple Execution Strategy (Organized by TOOL TYPE):**
```markdown
## Execution Strategy
**Philosophy**: Run everything in PRs unless significant infrastructure overhead.
Playwright with parallelization is extremely fast (100s of tests in ~10-15 min).
**Organized by TOOL TYPE:**
### Every PR: Playwright Tests (~10-15 min)
All functional tests (from any priority level):
- All E2E, API, integration, unit tests using Playwright
- Parallelized across {N} shards
- Total: ~{N} tests (includes P0, P1, P2, P3)
### Nightly: k6 Performance Tests (~30-60 min)
All performance tests (from any priority level):
- Load, stress, spike, endurance
- Reason: Expensive infrastructure, long-running (10-40 min per test)
### Weekly: Chaos & Long-Running (~hours)
Special infrastructure tests (from any priority level):
- Multi-region failover, disaster recovery, endurance
- Reason: Very expensive, very long (4+ hours)
```
**KEY INSIGHT: Organize by TOOL TYPE, not priority**
- Playwright (fast, cheap) → PR
- k6 (expensive, long) → Nightly
- Chaos/Manual (very expensive, very long) → Weekly
**Avoid:**
- ❌ Don't organize by priority (smoke → P0 → P1 → P2 → P3)
- ❌ Don't say "P1 runs on PR to main" (some P1 are Playwright/PR, some are k6/Nightly)
- ❌ Don't create artificial tiers - organize by tool type and infrastructure overhead
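To make the "run everything in PRs" default concrete, a minimal Playwright configuration sketch (worker and shard counts are illustrative assumptions, not recommendations):

```typescript
// playwright.config.ts - illustrative sketch only; tune workers/shards to your CI runners.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,                       // run tests within files in parallel
  workers: process.env.CI ? 4 : undefined,   // per-machine parallelism
  retries: process.env.CI ? 1 : 0,
  reporter: [['list'], ['html', { open: 'never' }]],
});

// In CI, split the suite across machines, e.g.:
//   npx playwright test --shard=1/4
//   npx playwright test --shard=2/4  (etc.)
```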
---
@@ -661,34 +890,66 @@ TEA test-design workflow supports TWO modes, detected automatically:
| Login flow | E2E | P0 | R-001 | 3 | QA |
```

3. **Document Execution Strategy** (Simple, Not Redundant)

**IMPORTANT: Keep execution strategy simple and avoid redundancy**

```markdown
## Execution Strategy

**Default: Run all functional tests in PRs (~10-15 min)**
- All Playwright tests (parallelized across 4 shards)
- Includes E2E, API, integration, unit tests
- Total: ~{N} tests

**Nightly: Performance & Infrastructure tests**
- k6 load/stress/spike tests (~30-60 min)
- Reason: Expensive infrastructure, long-running

**Weekly: Chaos & Disaster Recovery**
- Endurance tests (4+ hours)
- Multi-region failover (requires AWS FIS)
- Backup restore validation
- Reason: Special infrastructure, very long-running
```
**DO NOT:**
- ❌ Create redundant smoke/P0/P1/P2/P3 tier structure
- ❌ List all tests again in execution order (already in coverage plan)
- ❌ Split tests by priority unless there's infrastructure overhead
4. **Include Resource Estimates**
**IMPORTANT: Use intervals/ranges, not exact numbers**
Provide rough estimates with intervals to avoid false precision:
```markdown
### Test Effort Estimates
- P0 scenarios: 15 tests (~1.5-2.5 hours each) = **~25-40 hours**
- P1 scenarios: 25 tests (~0.75-1.5 hours each) = **~20-35 hours**
- P2 scenarios: 40 tests (~0.25-0.75 hours each) = **~10-30 hours**
- **Total:** **~55-105 hours** (~1.5-3 weeks with 1 QA engineer)
```
**Why intervals:**
- Avoids false precision (estimates are never exact)
- Provides flexibility for complexity variations
- Accounts for unknowns and dependencies
- More realistic and less prescriptive
**Guidelines:**
- P0 tests: 1.5-2.5 hours each (complex setup, security, performance)
- P1 tests: 0.75-1.5 hours each (standard integration, API tests)
- P2 tests: 0.25-0.75 hours each (edge cases, simple validation)
- P3 tests: 0.1-0.5 hours each (exploratory, documentation)
**Express totals as:**
- Hour ranges: "~55-105 hours"
- Week ranges: "~1.5-3 weeks"
- Avoid: Exact numbers like "75 hours" or "11 days"
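As a sanity check, interval totals can be derived mechanically from the per-test ranges above; a small illustrative sketch (numbers mirror the example, and rounding or widening remains a judgment call):

```typescript
// Illustrative only: derive an interval-based total from per-test effort ranges.
type Range = { min: number; max: number }; // hours per test

const perTest: Record<string, Range> = {
  P0: { min: 1.5, max: 2.5 },
  P1: { min: 0.75, max: 1.5 },
  P2: { min: 0.25, max: 0.75 },
};
const counts: Record<string, number> = { P0: 15, P1: 25, P2: 40 };

const total = Object.keys(counts).reduce(
  (acc, p) => ({
    min: acc.min + counts[p] * perTest[p].min,
    max: acc.max + counts[p] * perTest[p].max,
  }),
  { min: 0, max: 0 },
);

// Prints "~51-105 hours"; round and widen further in the doc (e.g., "~55-105 hours").
console.log(`~${Math.round(total.min)}-${Math.round(total.max)} hours`);
```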
5. **Add Gate Criteria**

```markdown

View File

@@ -108,54 +108,51 @@
### Testability Concerns and Architectural Gaps

**🚨 ACTIONABLE CONCERNS - Architecture Team Must Address**

{If system has critical testability concerns, list them here. If architecture supports testing well, state "No critical testability concerns identified" and skip to Testability Assessment Summary}

#### 1. Blockers to Fast Feedback (WHAT WE NEED FROM ARCHITECTURE)

| Concern | Impact | What Architecture Must Provide | Owner | Timeline |
|---------|--------|--------------------------------|-------|----------|
| **{Concern name}** | {Impact on testing} | {Specific architectural change needed} | {Team} | {Sprint} |

**Example:**
- **No API for test data seeding** → Cannot parallelize tests → Provide POST /test/seed endpoint (Backend, Sprint 0)

#### 2. Architectural Improvements Needed (WHAT SHOULD BE CHANGED)

{List specific improvements that would make the system more testable}

1. **{Improvement name}**
   - **Current problem**: {What's wrong}
   - **Required change**: {What architecture must do}
   - **Impact if not fixed**: {Consequences}
   - **Owner**: {Team}
   - **Timeline**: {Sprint}

---

### Testability Assessment Summary

**📊 CURRENT STATE - FYI**
{Only include this section if there are passing items worth mentioning. Otherwise omit.}
#### What Works Well
- ✅ {Passing item 1} (e.g., "API-first design supports parallel test execution")
- ✅ {Passing item 2} (e.g., "Feature flags enable test isolation")
- ✅ {Passing item 3}
#### Accepted Trade-offs (No Action Required)
For {Feature} Phase 1, the following trade-offs are acceptable:
- **{Trade-off 1}** - {Why acceptable for now}
- **{Trade-off 2}** - {Why acceptable for now}
{This is technical debt OR acceptable for Phase 1} that {should be revisited post-GA OR maintained as-is}
---

View File

@@ -1,314 +1,286 @@
# Test Design for QA: {Feature Name}

**Purpose:** Test execution recipe for QA team. Defines what to test, how to test it, and what QA needs from other teams.

**Date:** {date}
**Author:** {author}
**Status:** Draft
**Project:** {project_name}

**Related:** See Architecture doc (test-design-architecture.md) for testability concerns and architectural blockers.

---

## Executive Summary

**Scope:** {Brief description of testing scope}

**Risk Summary:**
- Total Risks: {N} ({X} high-priority score ≥6, {Y} medium, {Z} low)
- Critical Categories: {Categories with most high-priority risks}

**Coverage Summary:**
- P0 tests: ~{N} (critical paths, security)
- P1 tests: ~{N} (important features, integration)
- P2 tests: ~{N} (edge cases, regression)
- P3 tests: ~{N} (exploratory, benchmarks)
- **Total**: ~{N} tests (~{X}-{Y} weeks with 1 QA)
---

## Dependencies & Test Blockers

**CRITICAL:** QA cannot proceed without these items from other teams.

### Backend/Architecture Dependencies (Sprint 0)

**Source:** See Architecture doc "Quick Guide" for detailed mitigation plans

1. **{Dependency 1}** - {Team} - {Timeline}
   - {What QA needs}
   - {Why it blocks testing}
2. **{Dependency 2}** - {Team} - {Timeline}
   - {What QA needs}
   - {Why it blocks testing}

### QA Infrastructure Setup (Sprint 0)

1. **Test Data Factories** - QA
   - {Entity} factory with faker-based randomization
   - Auto-cleanup fixtures for parallel safety (see the sketches below)
2. **Test Environments** - QA
   - Local: {Setup details}
   - CI/CD: {Setup details}
   - Staging: {Setup details}

**Example factory pattern:**

```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

test('example test @p0', async ({ apiRequest }) => {
  const testData = {
    id: `test-${faker.string.uuid()}`,
    email: faker.internet.email(),
  };

  const { status } = await apiRequest({
    method: 'POST',
    path: '/api/resource',
    body: testData,
  });

  expect(status).toBe(201);
});
```
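**Example auto-cleanup fixture** (illustrative sketch only; assumes the exported `test` object supports Playwright's standard `extend`, and the endpoint paths are placeholders):

```typescript
import { test as base } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { faker } from '@faker-js/faker';

// Illustrative sketch: each test gets a unique record and deletes it afterwards,
// so parallel runs against a shared database never collide.
export const test = base.extend<{ seededResource: { id: string } }>({
  seededResource: async ({ apiRequest }, use) => {
    const resource = { id: `test-${faker.string.uuid()}` };
    await apiRequest({ method: 'POST', path: '/api/resource', body: resource });

    await use(resource);

    // Cleanup runs after the test, even on failure.
    await apiRequest({ method: 'DELETE', path: `/api/resource/${resource.id}` });
  },
});
```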
---

## Risk Assessment

**Note:** Full risk details in Architecture doc. This section summarizes risks relevant to QA test planning.

### High-Priority Risks (Score ≥6)

| Risk ID | Category | Description | Score | QA Test Coverage |
|---------|----------|-------------|-------|------------------|
| **{R-ID}** | {CAT} | {Brief description} | **{Score}** | {How QA validates this risk} |

### Medium/Low-Priority Risks

| Risk ID | Category | Description | Score | QA Test Coverage |
|---------|----------|-------------|-------|------------------|
| {R-ID} | {CAT} | {Brief description} | {Score} | {How QA validates this risk} |
---

## Test Coverage Plan

**IMPORTANT:** P0/P1/P2/P3 = **priority and risk level** (what to focus on if time-constrained), NOT execution timing. See "Execution Strategy" for when tests run.

### P0 (Critical)

**Criteria:** Blocks core functionality + High risk (≥6) + No workaround + Affects majority of users

| Test ID | Requirement | Test Level | Risk Link | Notes |
|---------|-------------|------------|-----------|-------|
| **P0-001** | {Requirement} | {Level} | {R-ID} | {Notes} |
| **P0-002** | {Requirement} | {Level} | {R-ID} | {Notes} |

**Total P0:** ~{N} tests

---

### P1 (High)

**Criteria:** Important features + Medium risk (3-4) + Common workflows + Workaround exists but difficult

| Test ID | Requirement | Test Level | Risk Link | Notes |
|---------|-------------|------------|-----------|-------|
| **P1-001** | {Requirement} | {Level} | {R-ID} | {Notes} |
| **P1-002** | {Requirement} | {Level} | {R-ID} | {Notes} |

**Total P1:** ~{N} tests

---

### P2 (Medium)

**Criteria:** Secondary features + Low risk (1-2) + Edge cases + Regression prevention

| Test ID | Requirement | Test Level | Risk Link | Notes |
|---------|-------------|------------|-----------|-------|
| **P2-001** | {Requirement} | {Level} | {R-ID} | {Notes} |

**Total P2:** ~{N} tests

---

### P3 (Low)

**Criteria:** Nice-to-have + Exploratory + Performance benchmarks + Documentation validation

| Test ID | Requirement | Test Level | Notes |
|---------|-------------|------------|-------|
| **P3-001** | {Requirement} | {Level} | {Notes} |

**Total P3:** ~{N} tests

---

## Execution Strategy

**Philosophy:** Run everything in PRs unless there's significant infrastructure overhead. Playwright with parallelization is extremely fast (100s of tests in ~10-15 min).

**Organized by TOOL TYPE:**
### Every PR: Playwright Tests (~10-15 min)
**All functional tests** (from any priority level):
- All E2E, API, integration, unit tests using Playwright
- Parallelized across {N} shards
- Total: ~{N} Playwright tests (includes P0, P1, P2, P3)
**Why run in PRs:** Fast feedback, no expensive infrastructure
### Nightly: k6 Performance Tests (~30-60 min)
**All performance tests** (from any priority level):
- Load, stress, spike, endurance tests
- Total: ~{N} k6 tests (may include P0, P1, P2)
**Why defer to nightly:** Expensive infrastructure (k6 Cloud), long-running (10-40 min per test)
### Weekly: Chaos & Long-Running (~hours)
**Special infrastructure tests** (from any priority level):
- Multi-region failover (requires AWS Fault Injection Simulator)
- Disaster recovery (backup restore, 4+ hours)
- Endurance tests (4+ hours runtime)
**Why defer to weekly:** Very expensive infrastructure, very long-running, infrequent validation sufficient
**Manual tests** (excluded from automation):
- DevOps validation (deployment, monitoring)
- Finance validation (cost alerts)
- Documentation validation
---

## QA Effort Estimate

**QA test development effort only** (excludes DevOps, Backend, Data Eng, Finance work):

| Priority | Count | Effort Range | Notes |
|----------|-------|--------------|-------|
| P0 | ~{N} | ~{X}-{Y} weeks | Complex setup (security, performance, multi-step) |
| P1 | ~{N} | ~{X}-{Y} weeks | Standard coverage (integration, API tests) |
| P2 | ~{N} | ~{X}-{Y} days | Edge cases, simple validation |
| P3 | ~{N} | ~{X}-{Y} days | Exploratory, benchmarks |
| **Total** | ~{N} | **~{X}-{Y} weeks** | **1 QA engineer, full-time** |

**Assumptions:**
- Includes test design, implementation, debugging, CI integration
- Excludes ongoing maintenance (~10% effort)
- Assumes test infrastructure (factories, fixtures) ready

**Dependencies from other teams:**
- See "Dependencies & Test Blockers" section for what QA needs from Backend, DevOps, Data Eng

---
## Appendix A: Code Examples & Tagging

**Playwright Tags for Selective Execution:**

```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test';

// P0 critical test
test('@P0 @API @Security unauthenticated request returns 401', async ({ apiRequest }) => {
  const { status, body } = await apiRequest({
    method: 'POST',
    path: '/api/endpoint',
    body: { data: 'test' },
    skipAuth: true,
  });

  expect(status).toBe(401);
  expect(body.error).toContain('unauthorized');
});

// P1 integration test
test('@P1 @Integration data syncs correctly', async ({ apiRequest }) => {
  // Seed data
  await apiRequest({
    method: 'POST',
    path: '/api/seed',
    body: { /* test data */ },
  });

  // Validate
  const { status, body } = await apiRequest({
    method: 'GET',
    path: '/api/resource',
  });

  expect(status).toBe(200);
  expect(body).toHaveProperty('data');
});
```
**Run specific tags:**
```bash
# Run only P0 tests
npx playwright test --grep @P0
# Run P0 + P1 tests
npx playwright test --grep "@P0|@P1"
# Run only security tests
npx playwright test --grep @Security
# Run all Playwright tests in PR (default)
npx playwright test
```
---
## Appendix B: Knowledge Base References

- **Risk Governance**: `risk-governance.md` - Risk scoring methodology
- **Test Priorities Matrix**: `test-priorities-matrix.md` - P0-P3 criteria
- **Test Levels Framework**: `test-levels-framework.md` - E2E vs API vs Unit selection
- **Test Quality**: `test-quality.md` - Definition of Done (no hard waits, <300 lines, <1.5 min)

---

**Generated by:** BMad TEA Agent
**Workflow:** `_bmad/bmm/testarch/test-design`
**Version:** 4.0 (BMad v6)