Compare commits

Comparing `feat/migra` ... `feat/migra` (4 commits):

- f99a9ba5e8
- ec486af453
- 633abf0c3b
- 93e290bf73
@@ -62,7 +62,7 @@ Extension modules that add specialized capabilities to BMM.

### 🏗️ `/testarch`

-Test architecture and quality assurance components.
+Test architecture and quality assurance components. The **[Test Architect (TEA) Guide](./testarch/README.md)** provides comprehensive testing strategy across 9 workflows: framework setup, CI/CD, test design, ATDD, automation, traceability, NFR assessment, quality gates, and test review.

## Quick Start
@@ -119,6 +119,7 @@ BMM integrates seamlessly with the BMad Core framework, leveraging:

## Related Documentation

- [BMM Workflows Guide](./workflows/README.md) - **Start here!**
+- [Test Architect (TEA) Guide](./testarch/README.md) - Quality assurance and testing strategy
- [Agent Documentation](./agents/README.md) - Individual agent capabilities
- [Team Configurations](./teams/README.md) - Pre-built team setups
- [Task Library](./tasks/README.md) - Reusable task components
@@ -11,7 +11,7 @@ agent:
  persona:
    role: Master Test Architect
    identity: Test architect specializing in CI/CD, automated frameworks, and scalable quality gates.
-   communication_style: Data-driven advisor. Strong opinions, weakly held. Pragmatic. Makes random bird noises.
+   communication_style: Data-driven advisor. Strong opinions, weakly held. Pragmatic.
    principles:
      - Risk-based testing: depth scales with impact. Quality gates backed by data. Tests mirror usage. Cost = creation + execution + maintenance.
      - Testing is feature work. Prioritize unit/integration over E2E. Flakiness is critical debt. ATDD: tests first, AI implements, suite validates.

@@ -44,7 +44,7 @@ agent:

  - trigger: trace
    workflow: "{project-root}/bmad/bmm/workflows/testarch/trace/workflow.yaml"
-   description: Map requirements to tests Given-When-Then BDD format
+   description: Map requirements to tests (Phase 1) and make quality gate decision (Phase 2)

  - trigger: nfr-assess
    workflow: "{project-root}/bmad/bmm/workflows/testarch/nfr-assess/workflow.yaml"

@@ -54,6 +54,6 @@ agent:
    workflow: "{project-root}/bmad/bmm/workflows/testarch/ci/workflow.yaml"
    description: Scaffold CI/CD quality pipeline

- - trigger: gate
-   workflow: "{project-root}/bmad/bmm/workflows/testarch/gate/workflow.yaml"
-   description: Write/update quality gate decision assessment
+ - trigger: test-review
+   workflow: "{project-root}/bmad/bmm/workflows/testarch/test-review/workflow.yaml"
+   description: Review test quality using comprehensive knowledge base and best practices
src/modules/bmm/config.yaml (new file, 7 lines)

@@ -0,0 +1,7 @@
# Powered by BMAD™ Core
name: bmm
short-title: BMad Method Module
author: Brian (BMad) Madison

# TEA Agent Configuration
tea_use_mcp_enhancements: true # Enable Playwright MCP capabilities (healing, exploratory, verification)
@@ -1,5 +1,5 @@
---
-last-redoc-date: 2025-09-30
+last-redoc-date: 2025-10-14
---

# Test Architect (TEA) Agent Guide
@@ -10,6 +10,97 @@ last-redoc-date: 2025-09-30

- **Mission:** Deliver actionable quality strategies, automation coverage, and gate decisions that scale with project level and compliance demands.
- **Use When:** Project level ≥2, integration risk is non-trivial, brownfield regression risk exists, or compliance/NFR evidence is required.

## TEA Workflow Lifecycle

TEA integrates across the entire BMad development lifecycle, providing quality assurance at every phase:
```
┌──────────────────────────────────────────────────────────┐
│ BMM Phase 2: PLANNING                                    │
│                                                          │
│ PM: *plan-project                                        │
│   ↓                                                      │
│ TEA: *framework ──→ *ci ──→ *test-design                 │
│         └─────────┬─────────────┘                        │
│                   │ (Setup once per project)             │
└───────────────────┼──────────────────────────────────────┘
                    ↓
┌──────────────────────────────────────────────────────────┐
│ BMM Phase 4: IMPLEMENTATION                              │
│ (Per Story Cycle)                                        │
│                                                          │
│ ┌─→ SM: *create-story                                    │
│ │     ↓                                                  │
│ │   TEA: *atdd (optional, before dev)                    │
│ │     ↓                                                  │
│ │   DEV: implements story                                │
│ │     ↓                                                  │
│ │   TEA: *automate ──→ *test-review (optional)           │
│ │     ↓                                                  │
│ │   TEA: *trace (refresh coverage)                       │
│ │     ↓                                                  │
│ └───[next story]                                         │
└───────────────────┼──────────────────────────────────────┘
                    ↓
┌──────────────────────────────────────────────────────────┐
│ EPIC/RELEASE GATE                                        │
│                                                          │
│ TEA: *nfr-assess (if not done earlier)                   │
│   ↓                                                      │
│ TEA: *test-review (final audit, optional)                │
│   ↓                                                      │
│ TEA: *trace (Phase 2: Gate) ──→ PASS | CONCERNS | FAIL | WAIVED │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
|
||||
|
||||
### TEA Integration with BMad v6 Workflow
|
||||
|
||||
TEA operates **across all four BMad phases**, unlike other agents that are phase-specific:
|
||||
|
||||
<details>
|
||||
<summary><strong>Cross-Phase Integration & Workflow Complexity</strong></summary>
|
||||
|
||||
### Phase-Specific Agents (Standard Pattern)
|
||||
|
||||
- **Phase 1 (Analysis)**: Analyst agent
|
||||
- **Phase 2 (Planning)**: PM agent
|
||||
- **Phase 3 (Solutioning)**: Architect agent
|
||||
- **Phase 4 (Implementation)**: SM, DEV agents
|
||||
|
||||
### TEA: Cross-Phase Quality Agent (Unique Pattern)
|
||||
|
||||
TEA is **the only agent that spans all phases**:
|
||||
|
||||
```
Phase 1 (Analysis)       → [TEA not typically used]
        ↓
Phase 2 (Planning)       → TEA: *framework, *ci, *test-design (setup)
        ↓
Phase 3 (Solutioning)    → [TEA validates architecture testability]
        ↓
Phase 4 (Implementation) → TEA: *atdd, *automate, *test-review, *trace (per story)
        ↓
Epic/Release Gate        → TEA: *nfr-assess, *trace Phase 2 (release decision)
```
|
||||
|
||||
### Why TEA Needs 8 Workflows
|
||||
|
||||
**Standard agents**: 1-3 workflows per phase
|
||||
**TEA**: 8 workflows across 3+ phases
|
||||
|
||||
| Phase       | TEA Workflows                                  | Frequency        | Purpose                          |
| ----------- | ---------------------------------------------- | ---------------- | -------------------------------- |
| **Phase 2** | `*framework`, `*ci`, `*test-design`            | Once per project | Establish quality infrastructure |
| **Phase 4** | `*atdd`, `*automate`, `*test-review`, `*trace` | Per story/sprint | Continuous quality validation    |
| **Release** | `*nfr-assess`, `*trace` (Phase 2: gate)        | Per epic/release | Go/no-go decision                |
||||
|
||||
**Note**: `*trace` is a two-phase workflow: Phase 1 (traceability) + Phase 2 (gate decision). This reduces cognitive load while maintaining natural workflow.
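
For illustration only, a gate decision produced by `*trace` Phase 2 might look roughly like the sketch below; the field names here are hypothetical, not the exact schema emitted by the workflow:

```yaml
# Hypothetical gate decision sketch (illustrative field names, not the real schema)
story: "1.3 - checkout payment capture"
decision: CONCERNS # one of PASS | CONCERNS | FAIL | WAIVED
rationale: "All P0 scenarios covered; two P1 gaps tracked as follow-ups"
coverage:
  requirements_traced: 18
  requirements_total: 20
follow_ups:
  - owner: dev-team
    action: "Add regression spec for the expired-card path"
waivers: []
```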
|
||||
|
||||
This complexity **requires specialized documentation** (this guide), **extensive knowledge base** (19+ fragments), and **unique architecture** (`testarch/` directory).
|
||||
|
||||
</details>
|
||||
|
||||
## Prerequisites and Setup
|
||||
|
||||
1. Run the core planning workflows first:
|
||||
@@ -31,8 +122,8 @@ last-redoc-date: 2025-09-30
|
||||
| Pre-Implementation | Run `*framework` (if harness missing), `*ci`, and `*test-design` | Review risk/design/CI guidance, align backlog | Test scaffold, CI pipeline, risk and coverage strategy |
| Story Prep | - | Scrum Master `*create-story`, `*story-context` | Story markdown + context XML |
| Implementation | (Optional) Trigger `*atdd` before dev to supply failing tests + checklist | Implement story guided by ATDD checklist | Failing acceptance tests + implementation checklist |
-| Post-Dev | Execute `*automate`, re-run `*trace` | Address recommendations, update code/tests | Regression specs, refreshed coverage matrix |
-| Release | Run `*gate` | Confirm Definition of Done, share release notes | Gate YAML + release summary (owners, waivers) |
+| Post-Dev | Execute `*automate`, (Optional) `*test-review`, re-run `*trace` | Address recommendations, update code/tests | Regression specs, quality report, refreshed coverage matrix |
+| Release | (Optional) `*test-review` for final audit, Run `*trace` (Phase 2) | Confirm Definition of Done, share release notes | Quality audit, Gate YAML + release summary (owners, waivers) |
|
||||
|
||||
<details>
|
||||
<summary>Execution Notes</summary>
|
||||
@@ -40,7 +131,8 @@ last-redoc-date: 2025-09-30
|
||||
- Run `*framework` only once per repo or when modern harness support is missing.
|
||||
- `*framework` followed by `*ci` establishes install + pipeline; `*test-design` then handles risk scoring, mitigations, and scenario planning in one pass.
|
||||
- Use `*atdd` before coding when the team can adopt ATDD; share its checklist with the dev agent.
|
||||
-- Post-implementation, keep `*trace` current, expand coverage with `*automate`, and finish with `*gate`.
+- Post-implementation, keep `*trace` current, expand coverage with `*automate`, and optionally review test quality with `*test-review`. For the release gate, run `*trace` with Phase 2 enabled to get the deployment decision.
+- Use `*test-review` after `*atdd` to validate generated tests, after `*automate` to ensure regression quality, or before gate for final audit.
|
||||
|
||||
</details>
|
||||
|
||||
@@ -51,21 +143,21 @@ last-redoc-date: 2025-09-30
|
||||
2. **Setup:** TEA checks harness via `*framework`, configures `*ci`, and runs `*test-design` to capture risk/coverage plans.
|
||||
3. **Story Prep:** Scrum Master generates the story via `*create-story`; PO validates using `*assess-project-ready`.
|
||||
4. **Implementation:** TEA optionally runs `*atdd`; Dev implements with guidance from failing tests and the plan.
|
||||
-5. **Post-Dev and Release:** TEA runs `*automate`, re-runs `*trace`, and finishes with `*gate` to document the decision.
+5. **Post-Dev and Release:** TEA runs `*automate`, optionally `*test-review` to audit test quality, and re-runs `*trace` with Phase 2 enabled to generate both the traceability matrix and the gate decision.
|
||||
|
||||
</details>
|
||||
|
||||
### Brownfield Feature Enhancement (Level 3–4)
|
||||
|
||||
| Phase | Test Architect | Dev / Team | Outputs |
|
||||
| ----------------- | ------------------------------------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------- |
|
||||
| Refresh Context | - | Analyst/PM/Architect rerun planning workflows | Updated planning artifacts in `{output_folder}` |
|
||||
| Baseline Coverage | Run `*trace` to inventory existing tests | Review matrix, flag hotspots | Coverage matrix + initial gate snippet |
|
||||
| Risk Targeting | Run `*test-design` | Align remediation/backlog priorities | Brownfield risk memo + scenario matrix |
|
||||
| Story Prep | - | Scrum Master `*create-story` | Updated story markdown |
|
||||
| Implementation | (Optional) Run `*atdd` before dev | Implement story, referencing checklist/tests | Failing acceptance tests + implementation checklist |
|
||||
| Post-Dev | Apply `*automate`, re-run `*trace`, trigger `*nfr-assess` if needed | Resolve gaps, update docs/tests | Regression specs, refreshed coverage matrix, NFR report |
|
||||
| Release | Run `*gate` | Product Owner `*assess-project-ready`, share release notes | Gate YAML + release summary |
|
||||
| Phase | Test Architect | Dev / Team | Outputs |
|
||||
| ----------------- | -------------------------------------------------------------------------------------- | ---------------------------------------------------------- | ----------------------------------------------------------------------- |
|
||||
| Refresh Context | - | Analyst/PM/Architect rerun planning workflows | Updated planning artifacts in `{output_folder}` |
|
||||
| Baseline Coverage | Run `*trace` to inventory existing tests | Review matrix, flag hotspots | Coverage matrix + initial gate snippet |
|
||||
| Risk Targeting | Run `*test-design` | Align remediation/backlog priorities | Brownfield risk memo + scenario matrix |
|
||||
| Story Prep | - | Scrum Master `*create-story` | Updated story markdown |
|
||||
| Implementation | (Optional) Run `*atdd` before dev | Implement story, referencing checklist/tests | Failing acceptance tests + implementation checklist |
|
||||
| Post-Dev | Apply `*automate`, (Optional) `*test-review`, re-run `*trace`, `*nfr-assess` if needed | Resolve gaps, update docs/tests | Regression specs, quality report, refreshed coverage matrix, NFR report |
|
||||
| Release | (Optional) `*test-review` for final audit, Run `*trace` (Phase 2) | Product Owner `*assess-project-ready`, share release notes | Quality audit, Gate YAML + release summary |
|
||||
|
||||
<details>
|
||||
<summary>Execution Notes</summary>
|
||||
@@ -73,7 +165,8 @@ last-redoc-date: 2025-09-30
|
||||
- Lead with `*trace` so remediation plans target true coverage gaps. Ensure `*framework` and `*ci` are in place early in the engagement; if the brownfield lacks them, run those setup steps immediately after refreshing context.
|
||||
- `*test-design` should highlight regression hotspots, mitigations, and P0 scenarios.
|
||||
- Use `*atdd` when stories benefit from ATDD; otherwise proceed to implementation and rely on post-dev automation.
|
||||
-- After development, expand coverage with `*automate`, re-run `*trace`, and close with `*gate`. Run `*nfr-assess` now if non-functional risks weren't addressed earlier.
+- After development, expand coverage with `*automate`, optionally review test quality with `*test-review`, and re-run `*trace` (Phase 2 for the gate decision). Run `*nfr-assess` now if non-functional risks weren't addressed earlier.
+- Use `*test-review` to validate existing brownfield tests or audit new tests before gate.
|
||||
- Product Owner `*assess-project-ready` confirms the team has artifacts before handoff or release.
|
||||
|
||||
</details>
|
||||
@@ -87,26 +180,27 @@ last-redoc-date: 2025-09-30
|
||||
4. **Story Prep:** Scrum Master generates `stories/story-1.1.md` via `*create-story`, automatically pulling updated context.
|
||||
5. **ATDD First:** TEA runs `*atdd`, producing failing Playwright specs under `tests/e2e/payments/` plus an implementation checklist.
|
||||
6. **Implementation:** Dev pairs with the checklist/tests to deliver the story.
|
||||
-7. **Post-Implementation:** TEA applies `*automate`, re-runs `*trace`, performs `*nfr-assess` to validate SLAs, and closes with `*gate` marking PASS with follow-ups.
+7. **Post-Implementation:** TEA applies `*automate`, optionally `*test-review` to audit test quality, re-runs `*trace` with Phase 2 enabled, and performs `*nfr-assess` to validate SLAs. The `*trace` Phase 2 output marks PASS with follow-ups.
|
||||
|
||||
</details>
|
||||
|
||||
### Enterprise / Compliance Program (Level 4)
|
||||
|
||||
| Phase | Test Architect | Dev / Team | Outputs |
|
||||
| ------------------- | ------------------------------------------------ | ---------------------------------------------- | --------------------------------------------------------- |
|
||||
| Strategic Planning | - | Analyst/PM/Architect standard workflows | Enterprise-grade PRD, epics, architecture |
|
||||
| Quality Planning | Run `*framework`, `*test-design`, `*nfr-assess` | Review guidance, align compliance requirements | Harness scaffold, risk + coverage plan, NFR documentation |
|
||||
| Pipeline Enablement | Configure `*ci` | Coordinate secrets, pipeline approvals | `.github/workflows/test.yml`, helper scripts |
|
||||
| Execution | Enforce `*atdd`, `*automate`, `*trace` per story | Implement stories, resolve TEA findings | Tests, fixtures, coverage matrices |
|
||||
| Release | Run `*gate` | Capture sign-offs, archive artifacts | Updated assessments, gate YAML, audit trail |
|
||||
| Phase | Test Architect | Dev / Team | Outputs |
|
||||
| ------------------- | ----------------------------------------------------------------- | ---------------------------------------------- | ---------------------------------------------------------- |
|
||||
| Strategic Planning | - | Analyst/PM/Architect standard workflows | Enterprise-grade PRD, epics, architecture |
|
||||
| Quality Planning | Run `*framework`, `*test-design`, `*nfr-assess` | Review guidance, align compliance requirements | Harness scaffold, risk + coverage plan, NFR documentation |
|
||||
| Pipeline Enablement | Configure `*ci` | Coordinate secrets, pipeline approvals | `.github/workflows/test.yml`, helper scripts |
|
||||
| Execution | Enforce `*atdd`, `*automate`, `*test-review`, `*trace` per story | Implement stories, resolve TEA findings | Tests, fixtures, quality reports, coverage matrices |
|
||||
| Release | (Optional) `*test-review` for final audit, Run `*trace` (Phase 2) | Capture sign-offs, archive artifacts | Quality audit, updated assessments, gate YAML, audit trail |
|
||||
|
||||
<details>
|
||||
<summary>Execution Notes</summary>
|
||||
|
||||
- Use `*atdd` for every story when feasible so acceptance tests lead implementation in regulated environments.
|
||||
- `*ci` scaffolds selective testing scripts, burn-in jobs, caching, and notifications for long-running suites.
|
||||
-- Prior to release, rerun coverage (`*trace`, `*automate`) and formalize the decision in `*gate`; store everything for audits. Call `*nfr-assess` here if compliance/performance requirements weren't captured during planning.
+- Enforce `*test-review` per story or sprint to maintain quality standards and ensure compliance with testing best practices.
+- Prior to release, rerun coverage (`*trace`, `*automate`), perform final quality audit with `*test-review`, and formalize the decision with `*trace` Phase 2 (gate decision); store everything for audits. Call `*nfr-assess` here if compliance/performance requirements weren't captured during planning.
|
||||
|
||||
</details>
|
||||
|
||||
@@ -116,47 +210,102 @@ last-redoc-date: 2025-09-30
|
||||
1. **Strategic Planning:** Analyst/PM/Architect complete PRD, epics, and architecture using the standard workflows.
|
||||
2. **Quality Planning:** TEA runs `*framework`, `*test-design`, and `*nfr-assess` to establish mitigations, coverage, and NFR targets.
|
||||
3. **Pipeline Setup:** TEA configures CI via `*ci` with selective execution scripts.
|
||||
-4. **Execution:** For each story, TEA enforces `*atdd`, `*automate`, and `*trace`; Dev teams iterate on the findings.
-5. **Release:** TEA re-checks coverage and logs the final gate decision via `*gate`, archiving artifacts for compliance.
+4. **Execution:** For each story, TEA enforces `*atdd`, `*automate`, `*test-review`, and `*trace`; Dev teams iterate on the findings.
+5. **Release:** TEA re-checks coverage, performs final quality audit with `*test-review`, and logs the final gate decision via `*trace` Phase 2, archiving artifacts for compliance.
|
||||
|
||||
</details>
|
||||
|
||||
## Command Catalog
|
||||
|
||||
| Command | Task File | Primary Outputs | Notes |
|
||||
| -------------- | ------------------------------------------------ | ------------------------------------------------------------------- | ------------------------------------------------ |
|
||||
| `*framework` | `workflows/testarch/framework/instructions.md` | Playwright/Cypress scaffold, `.env.example`, `.nvmrc`, sample specs | Use when no production-ready harness exists |
|
||||
| `*atdd` | `workflows/testarch/atdd/instructions.md` | Failing acceptance tests + implementation checklist | Requires approved story + harness |
|
||||
| `*automate` | `workflows/testarch/automate/instructions.md` | Prioritized specs, fixtures, README/script updates, DoD summary | Avoid duplicate coverage (see priority matrix) |
|
||||
| `*ci` | `workflows/testarch/ci/instructions.md` | CI workflow, selective test scripts, secrets checklist | Platform-aware (GitHub Actions default) |
|
||||
| `*test-design` | `workflows/testarch/test-design/instructions.md` | Combined risk assessment, mitigation plan, and coverage strategy | Handles risk scoring and test design in one pass |
|
||||
| `*trace` | `workflows/testarch/trace/instructions.md` | Coverage matrix, recommendations, gate snippet | Requires access to story/tests repositories |
|
||||
| `*nfr-assess` | `workflows/testarch/nfr-assess/instructions.md` | NFR assessment report with actions | Focus on security/performance/reliability |
|
||||
| `*gate` | `workflows/testarch/gate/instructions.md` | Gate YAML + summary (PASS/CONCERNS/FAIL/WAIVED) | Deterministic decision rules + rationale |
|
||||
|
||||
<details>
-<summary>Command Guidance and Context Loading</summary>
+<summary><strong>Optional Playwright MCP Enhancements</strong></summary>

-- Each task now carries its own preflight/flow/deliverable guidance inline.
-- `tea-index.csv` maps workflow needs to knowledge fragments; keep tags accurate as you add guidance.
-- Consider future modularization into orchestrated workflows if additional automation is needed.
-- Update the fragment markdown files alongside workflow edits so guidance and outputs stay in sync.

**Two Playwright MCP servers** (actively maintained, continuously updated):
|
||||
|
||||
- `playwright` - Browser automation (`npx @playwright/mcp@latest`)
|
||||
- `playwright-test` - Test runner with failure analysis (`npx playwright run-test-mcp-server`)
|
||||
|
||||
**How MCP Enhances TEA Workflows**:
|
||||
|
||||
MCP provides additional capabilities on top of TEA's default AI-based approach:
|
||||
|
||||
1. `*test-design`:
|
||||
- Default: Analysis + documentation
|
||||
- **+ MCP**: Interactive UI discovery with `browser_navigate`, `browser_click`, `browser_snapshot`, behavior observation
|
||||
|
||||
Benefit: Discover actual functionality, edge cases, undocumented features
|
||||
|
||||
2. `*atdd`, `*automate`:
|
||||
- Default: Infers selectors and interactions from requirements and knowledge fragments
|
||||
- **+ MCP**: Generates tests **then** verifies with `generator_setup_page`, `browser_*` tools, validates against live app
|
||||
|
||||
Benefit: Accurate selectors from real DOM, verified behavior, refined test code
|
||||
|
||||
3. `*automate`:
|
||||
- Default: Pattern-based fixes from error messages + knowledge fragments
|
||||
- **+ MCP**: Pattern fixes **enhanced with** `browser_snapshot`, `browser_console_messages`, `browser_network_requests`, `browser_generate_locator`
|
||||
|
||||
Benefit: Visual failure context, live DOM inspection, root cause discovery
|
||||
|
||||
**Config example**:
|
||||
|
||||
```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "playwright-test": {
      "command": "npx",
      "args": ["playwright", "run-test-mcp-server"]
    }
  }
}
```
|
||||
|
||||
**To disable**: Set `tea_use_mcp_enhancements: false` in `bmad/bmm/config.yaml` OR remove MCPs from IDE config.
|
||||
|
||||
</details>
|
||||
|
||||
## Workflow Placement
|
||||
|
||||
|
||||
The TEA stack has three tightly-linked layers:
|
||||
| Command | Workflow README | Primary Outputs | Notes | With Playwright MCP Enhancements |
| -------------- | ------------------------------------------------- | --------------------------------------------------------------------------------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| `*framework` | [📖](../workflows/testarch/framework/README.md) | Playwright/Cypress scaffold, `.env.example`, `.nvmrc`, sample specs | Use when no production-ready harness exists | - |
| `*ci` | [📖](../workflows/testarch/ci/README.md) | CI workflow, selective test scripts, secrets checklist | Platform-aware (GitHub Actions default) | - |
| `*test-design` | [📖](../workflows/testarch/test-design/README.md) | Combined risk assessment, mitigation plan, and coverage strategy | Risk scoring + optional exploratory mode | **+ Exploratory**: Interactive UI discovery with browser automation (uncover actual functionality) |
| `*atdd` | [📖](../workflows/testarch/atdd/README.md) | Failing acceptance tests + implementation checklist | TDD red phase + optional recording mode | **+ Recording**: AI generation verified with live browser (accurate selectors from real DOM) |
| `*automate` | [📖](../workflows/testarch/automate/README.md) | Prioritized specs, fixtures, README/script updates, DoD summary | Optional healing/recording, avoid duplicate coverage | **+ Healing**: Pattern fixes enhanced with visual debugging + **+ Recording**: AI verified with live browser |
| `*test-review` | [📖](../workflows/testarch/test-review/README.md) | Test quality review report with 0-100 score, violations, fixes | Reviews tests against knowledge base patterns | - |
| `*nfr-assess` | [📖](../workflows/testarch/nfr-assess/README.md) | NFR assessment report with actions | Focus on security/performance/reliability | - |
| `*trace` | [📖](../workflows/testarch/trace/README.md) | Phase 1: Coverage matrix, recommendations. Phase 2: Gate decision (PASS/CONCERNS/FAIL/WAIVED) | Two-phase workflow: traceability + gate decision | - |
|
||||
|
||||
1. **Agent spec (`agents/tea.md`)** – declares the persona, critical actions, and the `run-workflow` entries for every TEA command. Critical actions instruct the agent to load `tea-index.csv` and then fetch only the fragments it needs from `knowledge/` before giving guidance.
|
||||
2. **Knowledge index (`tea-index.csv`)** – catalogues each fragment with tags and file paths. Workflows call out the IDs they need (e.g., `risk-governance`, `fixture-architecture`) so the agent loads targeted guidance instead of a monolithic brief (see the sample rows after this list).
|
||||
3. **Workflows (`workflows/testarch/*`)** – contain the task flows and reference `tea-index.csv` in their `<flow>`/`<notes>` sections to request specific fragments. Keeping all workflows in this directory ensures consistent discovery during planning (`*framework`), implementation (`*atdd`, `*automate`, `*trace`), and release (`*nfr-assess`, `*gate`).
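
For a concrete picture, a row in `tea-index.csv` might look roughly like the sketch below; the column names and values are illustrative guesses, so check the actual file for the real schema:

```csv
id,tags,path
fixture-architecture,"fixtures,isolation,playwright",knowledge/fixture-architecture.md
risk-governance,"risk,priorities,gates",knowledge/risk-governance.md
```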
|
||||
**📖** = Click to view detailed workflow documentation
|
||||
|
||||
This separation lets us expand the knowledge base without touching agent wiring and keeps every command remote-controllable via the standard BMAD workflow runner. As navigation improves, we can add lightweight entrypoints or tags in the index without changing where workflows live.
|
||||
## Why TEA is Architecturally Different

TEA is the only BMM agent with its own top-level module directory (`bmm/testarch/`). This intentional design pattern reflects TEA's unique requirements:

- **Supporting Knowledge:**
  - `tea-index.csv` – Catalog of knowledge fragments with tags and file paths under `knowledge/` for task-specific loading.
  - `knowledge/*.md` – Focused summaries (fixtures, network, CI, levels, priorities, etc.) distilled from Murat’s external resources.
  - `test-resources-for-ai-flat.txt` – Raw 347 KB archive retained for manual deep dives when a fragment needs source validation.
|
||||
<details>
|
||||
<summary><strong>Unique Architecture Pattern & Rationale</strong></summary>
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
src/modules/bmm/
├── agents/
│   └── tea.agent.yaml    # Agent definition (standard location)
├── workflows/
│   └── testarch/         # TEA workflows (standard location)
└── testarch/             # Knowledge base (UNIQUE!)
    ├── knowledge/        # 21 production-ready test pattern fragments
    ├── tea-index.csv     # Centralized knowledge lookup (21 fragments indexed)
    └── README.md         # This guide
```
|
||||
|
||||
### Why TEA Gets Special Treatment
|
||||
|
||||
TEA uniquely requires **extensive domain knowledge** (21 fragments, 12,821 lines: test patterns, CI/CD, fixtures, quality practices, healing strategies), a **centralized reference system** (`tea-index.csv` for on-demand fragment loading), **cross-cutting concerns** (domain-specific patterns vs project-specific artifacts like PRDs/stories), and **optional MCP integration** (healing, exploratory, verification modes). Other BMM agents don't require this architecture.
|
||||
|
||||
</details>
|
||||
|
||||
@@ -1,9 +1,675 @@
# CI Pipeline and Burn-In Strategy

-- Stage jobs: install/caching once, run `test-changed` for quick feedback, then shard full suites with `fail-fast: false` so evidence isn’t lost.
-- Re-run changed specs 5–10x (burn-in) before merging to flush flakes; fail the pipeline on the first inconsistent run.
-- Upload artifacts on failure (videos, traces, HAR) and keep retry counts explicit—hidden retries hide instability.
-- Use `wait-on` for app startup, enforce time budgets (<10 min per job), and document required secrets alongside workflows.
-- Mirror CI scripts locally (`npm run test:ci`, `scripts/burn-in-changed.sh`) so devs reproduce pipeline behaviour exactly.
-
-_Source: Murat CI/CD strategy blog, Playwright/Cypress workflow examples._

## Principle

CI pipelines must execute tests reliably and quickly, and provide clear feedback. Burn-in testing (running changed tests multiple times) flushes out flakiness before merge. Stage jobs strategically: install/cache once, run changed specs first for fast feedback, then shard full suites with fail-fast disabled to preserve evidence.
|
||||
|
||||
## Rationale
|
||||
|
||||
CI is the quality gate for production. A poorly configured pipeline either wastes developer time (slow feedback, false positives) or ships broken code (false negatives, insufficient coverage). Burn-in testing ensures reliability by stress-testing changed code, while parallel execution and intelligent test selection optimize speed without sacrificing thoroughness.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: GitHub Actions Workflow with Parallel Execution
|
||||
|
||||
**Context**: Production-ready CI/CD pipeline for E2E tests with caching, parallelization, and burn-in testing.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/e2e-tests.yml
|
||||
name: E2E Tests
|
||||
on:
|
||||
pull_request:
|
||||
push:
|
||||
branches: [main, develop]
|
||||
|
||||
env:
|
||||
NODE_VERSION_FILE: '.nvmrc'
|
||||
CACHE_KEY: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
|
||||
|
||||
jobs:
|
||||
install-dependencies:
|
||||
name: Install & Cache Dependencies
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 10
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ${{ env.NODE_VERSION_FILE }}
|
||||
cache: 'npm'
|
||||
|
||||
- name: Cache node modules
|
||||
uses: actions/cache@v4
|
||||
id: npm-cache
|
||||
with:
|
||||
path: |
|
||||
~/.npm
|
||||
node_modules
|
||||
~/.cache/Cypress
|
||||
~/.cache/ms-playwright
|
||||
key: ${{ env.CACHE_KEY }}
|
||||
restore-keys: |
|
||||
${{ runner.os }}-node-
|
||||
|
||||
- name: Install dependencies
|
||||
if: steps.npm-cache.outputs.cache-hit != 'true'
|
||||
run: npm ci --prefer-offline --no-audit
|
||||
|
||||
- name: Install Playwright browsers
|
||||
if: steps.npm-cache.outputs.cache-hit != 'true'
|
||||
run: npx playwright install --with-deps chromium
|
||||
|
||||
test-changed-specs:
|
||||
name: Test Changed Specs First (Burn-In)
|
||||
needs: install-dependencies
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 15
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0 # Full history for accurate diff
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ${{ env.NODE_VERSION_FILE }}
|
||||
cache: 'npm'
|
||||
|
||||
- name: Restore dependencies
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: |
|
||||
~/.npm
|
||||
node_modules
|
||||
~/.cache/ms-playwright
|
||||
key: ${{ env.CACHE_KEY }}
|
||||
|
||||
- name: Detect changed test files
|
||||
id: changed-tests
|
||||
run: |
|
||||
# Join matches onto one line so the value is safe to write to GITHUB_OUTPUT
CHANGED_SPECS=$(git diff --name-only origin/main...HEAD | grep -E '\.(spec|test)\.(ts|js|tsx|jsx)$' | tr '\n' ' ' || echo "")
|
||||
echo "changed_specs=${CHANGED_SPECS}" >> $GITHUB_OUTPUT
|
||||
echo "Changed specs: ${CHANGED_SPECS}"
|
||||
|
||||
- name: Run burn-in on changed specs (10 iterations)
|
||||
if: steps.changed-tests.outputs.changed_specs != ''
|
||||
run: |
|
||||
SPECS="${{ steps.changed-tests.outputs.changed_specs }}"
|
||||
echo "Running burn-in: 10 iterations on changed specs"
|
||||
for i in {1..10}; do
|
||||
echo "Burn-in iteration $i/10"
|
||||
npm run test -- $SPECS || {
|
||||
echo "❌ Burn-in failed on iteration $i"
|
||||
exit 1
|
||||
}
|
||||
done
|
||||
echo "✅ Burn-in passed - 10/10 successful runs"
|
||||
|
||||
- name: Upload artifacts on failure
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: burn-in-failure-artifacts
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
screenshots/
|
||||
retention-days: 7
|
||||
|
||||
test-e2e-sharded:
|
||||
name: E2E Tests (Shard ${{ matrix.shard }}/${{ strategy.job-total }})
|
||||
needs: [install-dependencies, test-changed-specs]
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 30
|
||||
strategy:
|
||||
fail-fast: false # Run all shards even if one fails
|
||||
matrix:
|
||||
shard: [1, 2, 3, 4]
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ${{ env.NODE_VERSION_FILE }}
|
||||
cache: 'npm'
|
||||
|
||||
- name: Restore dependencies
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: |
|
||||
~/.npm
|
||||
node_modules
|
||||
~/.cache/ms-playwright
|
||||
key: ${{ env.CACHE_KEY }}
|
||||
|
||||
- name: Run E2E tests (shard ${{ matrix.shard }})
|
||||
run: npm run test:e2e -- --shard=${{ matrix.shard }}/4
|
||||
env:
|
||||
TEST_ENV: staging
|
||||
CI: true
|
||||
|
||||
- name: Upload test results
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-results-shard-${{ matrix.shard }}
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
|
||||
- name: Upload JUnit report
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: junit-results-shard-${{ matrix.shard }}
|
||||
path: test-results/junit.xml
|
||||
retention-days: 30
|
||||
|
||||
merge-test-results:
|
||||
name: Merge Test Results & Generate Report
|
||||
needs: test-e2e-sharded
|
||||
runs-on: ubuntu-latest
|
||||
if: always()
|
||||
steps:
|
||||
- name: Download all shard results
|
||||
uses: actions/download-artifact@v4
|
||||
with:
|
||||
pattern: test-results-shard-*
|
||||
path: all-results/
|
||||
|
||||
- name: Merge HTML reports
|
||||
run: |
|
||||
npx playwright merge-reports --reporter=html all-results/
|
||||
echo "Merged report available in playwright-report/"
|
||||
|
||||
- name: Upload merged report
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: merged-playwright-report
|
||||
path: playwright-report/
|
||||
retention-days: 30
|
||||
|
||||
- name: Comment PR with results
|
||||
if: github.event_name == 'pull_request'
|
||||
uses: daun/playwright-report-comment@v3
|
||||
with:
|
||||
report-path: playwright-report/
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Install once, reuse everywhere**: Dependencies cached across all jobs
|
||||
- **Burn-in first**: Changed specs run 10x before full suite
|
||||
- **Fail-fast disabled**: All shards run to completion for full evidence
|
||||
- **Parallel execution**: 4 shards cut execution time by ~75%
|
||||
- **Artifact retention**: 30 days for reports, 7 days for failure debugging
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Burn-In Loop Pattern (Standalone Script)
|
||||
|
||||
**Context**: Reusable bash script for burn-in testing changed specs locally or in CI.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# scripts/burn-in-changed.sh
|
||||
# Usage: ./scripts/burn-in-changed.sh [iterations] [base-branch]
|
||||
|
||||
set -e # Exit on error
|
||||
|
||||
# Configuration
|
||||
ITERATIONS=${1:-10}
|
||||
BASE_BRANCH=${2:-main}
|
||||
SPEC_PATTERN='\.(spec|test)\.(ts|js|tsx|jsx)$'
|
||||
|
||||
echo "🔥 Burn-In Test Runner"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "Iterations: $ITERATIONS"
|
||||
echo "Base branch: $BASE_BRANCH"
|
||||
echo ""
|
||||
|
||||
# Detect changed test files
|
||||
echo "📋 Detecting changed test files..."
|
||||
CHANGED_SPECS=$(git diff --name-only $BASE_BRANCH...HEAD | grep -E "$SPEC_PATTERN" || echo "")
|
||||
|
||||
if [ -z "$CHANGED_SPECS" ]; then
|
||||
echo "✅ No test files changed. Skipping burn-in."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
echo "Changed test files:"
|
||||
echo "$CHANGED_SPECS" | sed 's/^/ - /'
|
||||
echo ""
|
||||
|
||||
# Count specs
|
||||
SPEC_COUNT=$(echo "$CHANGED_SPECS" | wc -l | xargs)
|
||||
echo "Running burn-in on $SPEC_COUNT test file(s)..."
|
||||
echo ""
|
||||
|
||||
# Burn-in loop
|
||||
FAILURES=()
|
||||
for i in $(seq 1 $ITERATIONS); do
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🔄 Iteration $i/$ITERATIONS"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
|
||||
# Run tests with explicit file list
|
||||
if npm run test -- $CHANGED_SPECS 2>&1 | tee "burn-in-log-$i.txt"; then
|
||||
echo "✅ Iteration $i passed"
|
||||
else
|
||||
echo "❌ Iteration $i failed"
|
||||
FAILURES+=($i)
|
||||
|
||||
# Save failure artifacts
|
||||
mkdir -p burn-in-failures/iteration-$i
|
||||
cp -r test-results/ burn-in-failures/iteration-$i/ 2>/dev/null || true
|
||||
cp -r screenshots/ burn-in-failures/iteration-$i/ 2>/dev/null || true
|
||||
|
||||
echo ""
|
||||
echo "🛑 BURN-IN FAILED on iteration $i"
|
||||
echo "Failure artifacts saved to: burn-in-failures/iteration-$i/"
|
||||
echo "Logs saved to: burn-in-log-$i.txt"
|
||||
echo ""
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
done
|
||||
|
||||
# Success summary
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🎉 BURN-IN PASSED"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "All $ITERATIONS iterations passed for $SPEC_COUNT test file(s)"
|
||||
echo "Changed specs are stable and ready to merge."
|
||||
echo ""
|
||||
|
||||
# Cleanup logs
|
||||
rm -f burn-in-log-*.txt
|
||||
|
||||
exit 0
|
||||
```
|
||||
|
||||
**Usage**:
|
||||
|
||||
```bash
|
||||
# Run locally with default settings (10 iterations, compare to main)
|
||||
./scripts/burn-in-changed.sh
|
||||
|
||||
# Custom iterations and base branch
|
||||
./scripts/burn-in-changed.sh 20 develop
|
||||
|
||||
# Add to package.json
|
||||
{
|
||||
"scripts": {
|
||||
"test:burn-in": "bash scripts/burn-in-changed.sh",
|
||||
"test:burn-in:strict": "bash scripts/burn-in-changed.sh 20"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Exit on first failure**: Flaky tests caught immediately
|
||||
- **Failure artifacts**: Saved per-iteration for debugging
|
||||
- **Flexible configuration**: Iterations and base branch customizable
|
||||
- **CI/local parity**: Same script runs in both environments
|
||||
- **Clear output**: Visual feedback on progress and results
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Shard Orchestration with Result Aggregation
|
||||
|
||||
**Context**: Advanced sharding strategy for large test suites with intelligent result merging.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```javascript
|
||||
// scripts/run-sharded-tests.js
|
||||
const { spawn } = require('child_process');
|
||||
const fs = require('fs');
|
||||
const path = require('path');
|
||||
|
||||
/**
|
||||
* Run tests across multiple shards and aggregate results
|
||||
* Usage: node scripts/run-sharded-tests.js --shards=4 --env=staging
|
||||
*/
|
||||
|
||||
const SHARD_COUNT = parseInt(process.env.SHARD_COUNT || '4');
|
||||
const TEST_ENV = process.env.TEST_ENV || 'local';
|
||||
const RESULTS_DIR = path.join(__dirname, '../test-results');
|
||||
|
||||
console.log(`🚀 Running tests across ${SHARD_COUNT} shards`);
|
||||
console.log(`Environment: ${TEST_ENV}`);
|
||||
console.log('━'.repeat(50));
|
||||
|
||||
// Ensure results directory exists
|
||||
if (!fs.existsSync(RESULTS_DIR)) {
|
||||
fs.mkdirSync(RESULTS_DIR, { recursive: true });
|
||||
}
|
||||
|
||||
/**
|
||||
* Run a single shard
|
||||
*/
|
||||
function runShard(shardIndex) {
|
||||
return new Promise((resolve, reject) => {
|
||||
const shardId = `${shardIndex}/${SHARD_COUNT}`;
|
||||
console.log(`\n📦 Starting shard ${shardId}...`);
|
||||
|
||||
const child = spawn('npx', ['playwright', 'test', `--shard=${shardId}`, '--reporter=json'], {
|
||||
env: { ...process.env, TEST_ENV, SHARD_INDEX: shardIndex },
|
||||
stdio: 'pipe',
|
||||
});
|
||||
|
||||
let stdout = '';
|
||||
let stderr = '';
|
||||
|
||||
child.stdout.on('data', (data) => {
|
||||
stdout += data.toString();
|
||||
process.stdout.write(data);
|
||||
});
|
||||
|
||||
child.stderr.on('data', (data) => {
|
||||
stderr += data.toString();
|
||||
process.stderr.write(data);
|
||||
});
|
||||
|
||||
child.on('close', (code) => {
|
||||
// Save shard results
|
||||
const resultFile = path.join(RESULTS_DIR, `shard-${shardIndex}.json`);
|
||||
try {
|
||||
const result = JSON.parse(stdout);
|
||||
fs.writeFileSync(resultFile, JSON.stringify(result, null, 2));
|
||||
console.log(`✅ Shard ${shardId} completed (exit code: ${code})`);
|
||||
resolve({ shardIndex, code, result });
|
||||
} catch (error) {
|
||||
console.error(`❌ Shard ${shardId} failed to parse results:`, error.message);
|
||||
reject({ shardIndex, code, error });
|
||||
}
|
||||
});
|
||||
|
||||
child.on('error', (error) => {
|
||||
console.error(`❌ Shard ${shardId} process error:`, error.message);
|
||||
reject({ shardIndex, error });
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Aggregate results from all shards
|
||||
*/
|
||||
function aggregateResults() {
|
||||
console.log('\n📊 Aggregating results from all shards...');
|
||||
|
||||
const shardResults = [];
|
||||
let totalTests = 0;
|
||||
let totalPassed = 0;
|
||||
let totalFailed = 0;
|
||||
let totalSkipped = 0;
|
||||
let totalFlaky = 0;
|
||||
|
||||
for (let i = 1; i <= SHARD_COUNT; i++) {
|
||||
const resultFile = path.join(RESULTS_DIR, `shard-${i}.json`);
|
||||
if (fs.existsSync(resultFile)) {
|
||||
const result = JSON.parse(fs.readFileSync(resultFile, 'utf8'));
|
||||
shardResults.push(result);
|
||||
|
||||
// Aggregate stats
|
||||
totalTests += (result.stats?.expected || 0) + (result.stats?.unexpected || 0) + (result.stats?.skipped || 0);
totalPassed += result.stats?.expected || 0;
|
||||
totalFailed += result.stats?.unexpected || 0;
|
||||
totalSkipped += result.stats?.skipped || 0;
|
||||
totalFlaky += result.stats?.flaky || 0;
|
||||
}
|
||||
}
|
||||
|
||||
const summary = {
|
||||
totalShards: SHARD_COUNT,
|
||||
environment: TEST_ENV,
|
||||
totalTests,
|
||||
passed: totalPassed,
|
||||
failed: totalFailed,
|
||||
skipped: totalSkipped,
|
||||
flaky: totalFlaky,
|
||||
duration: shardResults.reduce((acc, r) => acc + (r.duration || 0), 0),
|
||||
timestamp: new Date().toISOString(),
|
||||
};
|
||||
|
||||
// Save aggregated summary
|
||||
fs.writeFileSync(path.join(RESULTS_DIR, 'summary.json'), JSON.stringify(summary, null, 2));
|
||||
|
||||
console.log('\n━'.repeat(50));
|
||||
console.log('📈 Test Results Summary');
|
||||
console.log('━'.repeat(50));
|
||||
console.log(`Total tests: ${totalTests}`);
|
||||
console.log(`✅ Passed: ${totalPassed}`);
|
||||
console.log(`❌ Failed: ${totalFailed}`);
|
||||
console.log(`⏭️ Skipped: ${totalSkipped}`);
|
||||
console.log(`⚠️ Flaky: ${totalFlaky}`);
|
||||
console.log(`⏱️ Duration: ${(summary.duration / 1000).toFixed(2)}s`);
|
||||
console.log('━'.repeat(50));
|
||||
|
||||
return summary;
|
||||
}
|
||||
|
||||
/**
|
||||
* Main execution
|
||||
*/
|
||||
async function main() {
|
||||
const startTime = Date.now();
|
||||
const shardPromises = [];
|
||||
|
||||
// Run all shards in parallel
|
||||
for (let i = 1; i <= SHARD_COUNT; i++) {
|
||||
shardPromises.push(runShard(i));
|
||||
}
|
||||
|
||||
try {
|
||||
await Promise.allSettled(shardPromises);
|
||||
} catch (error) {
|
||||
console.error('❌ One or more shards failed:', error);
|
||||
}
|
||||
|
||||
// Aggregate results
|
||||
const summary = aggregateResults();
|
||||
|
||||
const totalTime = ((Date.now() - startTime) / 1000).toFixed(2);
|
||||
console.log(`\n⏱️ Total execution time: ${totalTime}s`);
|
||||
|
||||
// Exit with failure if any tests failed
|
||||
if (summary.failed > 0) {
|
||||
console.error('\n❌ Test suite failed');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
console.log('\n✅ All tests passed');
|
||||
process.exit(0);
|
||||
}
|
||||
|
||||
main().catch((error) => {
|
||||
console.error('Fatal error:', error);
|
||||
process.exit(1);
|
||||
});
|
||||
```
|
||||
|
||||
**package.json integration**:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"test:sharded": "node scripts/run-sharded-tests.js",
|
||||
"test:sharded:ci": "SHARD_COUNT=8 TEST_ENV=staging node scripts/run-sharded-tests.js"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Parallel shard execution**: All shards run simultaneously
|
||||
- **Result aggregation**: Unified summary across shards
|
||||
- **Failure detection**: Exit code reflects overall test status
|
||||
- **Artifact preservation**: Individual shard results saved for debugging
|
||||
- **CI/local compatibility**: Same script works in both environments
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Selective Test Execution (Changed Files + Tags)
|
||||
|
||||
**Context**: Optimize CI by running only relevant tests based on file changes and tags.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# scripts/selective-test-runner.sh
|
||||
# Intelligent test selection based on changed files and test tags
|
||||
|
||||
set -e
|
||||
|
||||
BASE_BRANCH=${BASE_BRANCH:-main}
|
||||
TEST_ENV=${TEST_ENV:-local}
|
||||
|
||||
echo "🎯 Selective Test Runner"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "Base branch: $BASE_BRANCH"
|
||||
echo "Environment: $TEST_ENV"
|
||||
echo ""
|
||||
|
||||
# Detect changed files (all types, not just tests)
|
||||
CHANGED_FILES=$(git diff --name-only $BASE_BRANCH...HEAD)
|
||||
|
||||
if [ -z "$CHANGED_FILES" ]; then
|
||||
echo "✅ No files changed. Skipping tests."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
echo "Changed files:"
|
||||
echo "$CHANGED_FILES" | sed 's/^/ - /'
|
||||
echo ""
|
||||
|
||||
# Determine test strategy based on changes
|
||||
run_smoke_only=false
|
||||
run_all_tests=false
|
||||
affected_specs=""
|
||||
|
||||
# Critical files = run all tests
|
||||
if echo "$CHANGED_FILES" | grep -qE '(package\.json|package-lock\.json|playwright\.config|cypress\.config|\.github/workflows)'; then
|
||||
echo "⚠️ Critical configuration files changed. Running ALL tests."
|
||||
run_all_tests=true
|
||||
|
||||
# Auth/security changes = run all auth + smoke tests
|
||||
elif echo "$CHANGED_FILES" | grep -qE '(auth|login|signup|security)'; then
|
||||
echo "🔒 Auth/security files changed. Running auth + smoke tests."
|
||||
npm run test -- --grep "@auth|@smoke"
|
||||
exit $?
|
||||
|
||||
# API changes = run integration + smoke tests
|
||||
elif echo "$CHANGED_FILES" | grep -qE '(api|service|controller)'; then
|
||||
echo "🔌 API files changed. Running integration + smoke tests."
|
||||
npm run test -- --grep "@integration|@smoke"
|
||||
exit $?
|
||||
|
||||
# UI component changes = run related component tests
|
||||
elif echo "$CHANGED_FILES" | grep -qE '\.(tsx|jsx|vue)$'; then
|
||||
echo "🎨 UI components changed. Running component + smoke tests."
|
||||
|
||||
# Extract component names and find related tests
|
||||
components=$(echo "$CHANGED_FILES" | grep -E '\.(tsx|jsx|vue)$' | xargs -I {} basename {} | sed 's/\.[^.]*$//')
|
||||
for component in $components; do
|
||||
# Find tests matching component name
|
||||
# Append matches separated by spaces so multiple components accumulate correctly
affected_specs+=" $(find tests -name "*${component}*" -type f | tr '\n' ' ')" || true
|
||||
done
|
||||
|
||||
if [ -n "$affected_specs" ]; then
|
||||
echo "Running tests for: $affected_specs"
|
||||
npm run test -- $affected_specs --grep "@smoke"
|
||||
else
|
||||
echo "No specific tests found. Running smoke tests only."
|
||||
npm run test -- --grep "@smoke"
|
||||
fi
|
||||
exit $?
|
||||
|
||||
# Documentation/config only = run smoke tests
|
||||
elif echo "$CHANGED_FILES" | grep -qE '\.(md|txt|json|yml|yaml)$'; then
|
||||
echo "📝 Documentation/config files changed. Running smoke tests only."
|
||||
run_smoke_only=true
|
||||
else
|
||||
echo "⚙️ Other files changed. Running smoke tests."
|
||||
run_smoke_only=true
|
||||
fi
|
||||
|
||||
# Execute selected strategy
|
||||
if [ "$run_all_tests" = true ]; then
|
||||
echo ""
|
||||
echo "Running full test suite..."
|
||||
npm run test
|
||||
elif [ "$run_smoke_only" = true ]; then
|
||||
echo ""
|
||||
echo "Running smoke tests..."
|
||||
npm run test -- --grep "@smoke"
|
||||
fi
|
||||
```
|
||||
|
||||
**Usage in GitHub Actions**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/selective-tests.yml
|
||||
name: Selective Tests
|
||||
on: pull_request
|
||||
|
||||
jobs:
|
||||
selective-tests:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
|
||||
- name: Run selective tests
|
||||
run: bash scripts/selective-test-runner.sh
|
||||
env:
|
||||
BASE_BRANCH: ${{ github.base_ref }}
|
||||
TEST_ENV: staging
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Intelligent routing**: Tests selected based on changed file types
|
||||
- **Tag-based filtering**: Use @smoke, @auth, @integration tags (see the sketch after this list)
|
||||
- **Fast feedback**: Only relevant tests run on most PRs
|
||||
- **Safety net**: Critical changes trigger full suite
|
||||
- **Component mapping**: UI changes run related component tests
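
As a sketch of how such tags can be attached, assuming Playwright with `--grep` matching against test titles (the route, labels, and credentials below are purely illustrative):

```typescript
// Hypothetical spec: tags in the title let `--grep "@smoke|@auth"` select it
import { test, expect } from '@playwright/test';

test('user can log in with valid credentials @smoke @auth', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page).toHaveURL(/dashboard/);
});
```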
|
||||
|
||||
---
|
||||
|
||||
## CI Configuration Checklist
|
||||
|
||||
Before deploying your CI pipeline, verify:
|
||||
|
||||
- [ ] **Caching strategy**: node_modules, npm cache, browser binaries cached
|
||||
- [ ] **Timeout budgets**: Each job has reasonable timeout (10-30 min)
|
||||
- [ ] **Artifact retention**: 30 days for reports, 7 days for failure artifacts
|
||||
- [ ] **Parallelization**: Matrix strategy uses fail-fast: false
|
||||
- [ ] **Burn-in enabled**: Changed specs run 5-10x before merge
|
||||
- [ ] **wait-on app startup**: CI waits for the app (e.g. `wait-on: 'http://localhost:3000'`); see the snippet after this checklist
|
||||
- [ ] **Secrets documented**: README lists required secrets (API keys, tokens)
|
||||
- [ ] **Local parity**: CI scripts runnable locally (npm run test:ci)
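
A minimal sketch of the wait-on and local-parity items, assuming a Node app served on port 3000 and the `wait-on` npm package (adjust URL, port, and script names to your project):

```yaml
# Hypothetical CI steps: start the app, wait until it responds, then run the suite
- name: Start app
  run: npm run start & # serve the app in the background

- name: Wait for app to be ready
  run: npx wait-on http://localhost:3000 --timeout 120000

- name: Run E2E tests
  run: npm run test:ci # the same script developers run locally
```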
|
||||
|
||||
## Integration Points
|
||||
|
||||
- Used in workflows: `*ci` (CI/CD pipeline setup)
|
||||
- Related fragments: `selective-testing.md`, `playwright-config.md`, `test-quality.md`
|
||||
- CI tools: GitHub Actions, GitLab CI, CircleCI, Jenkins
|
||||
|
||||
_Source: Murat CI/CD strategy blog, Playwright/Cypress workflow examples, SEON production pipelines_
|
||||
|
||||
@@ -1,9 +1,486 @@
# Component Test-Driven Development Loop

-- Start every UI change with a failing component spec (`cy.mount` or RTL `render`); ship only after red → green → refactor passes.
-- Recreate providers/stores per spec to prevent state bleed and keep parallel runs deterministic.
-- Use factories to exercise prop/state permutations; cover accessibility by asserting against roles, labels, and keyboard flows.
-- Keep component specs under ~100 lines: split by intent (rendering, state transitions, error messaging) to preserve clarity.
-- Pair component tests with visual debugging (Cypress runner, Storybook, Playwright trace viewer) to accelerate diagnosis.
-
-_Source: CCTDD repository, Murat component testing talks._

## Principle

Start every UI change with a failing component test (`cy.mount`, Playwright component test, or RTL `render`). Follow the Red-Green-Refactor cycle: write a failing test (red), make it pass with minimal code (green), then improve the implementation (refactor). Ship only after the cycle completes. Keep component tests under 100 lines, isolated with fresh providers per test, and validate accessibility alongside functionality.
|
||||
|
||||
## Rationale
|
||||
|
||||
Component TDD provides immediate feedback during development. Failing tests (red) clarify requirements before writing code. Minimal implementations (green) prevent over-engineering. Refactoring with passing tests ensures changes don't break functionality. Isolated tests with fresh providers prevent state bleed in parallel runs. Accessibility assertions catch usability issues early. Visual debugging (Cypress runner, Storybook, Playwright trace viewer) accelerates diagnosis when tests fail.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Red-Green-Refactor Loop
|
||||
|
||||
**Context**: When building a new component, start with a failing test that describes the desired behavior. Implement just enough to pass, then refactor for quality.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// Step 1: RED - Write failing test
|
||||
// Button.cy.tsx (Cypress Component Test)
|
||||
import { Button } from './Button';
|
||||
|
||||
describe('Button Component', () => {
|
||||
it('should render with label', () => {
|
||||
cy.mount(<Button label="Click Me" />);
|
||||
cy.contains('Click Me').should('be.visible');
|
||||
});
|
||||
|
||||
it('should call onClick when clicked', () => {
|
||||
const onClickSpy = cy.stub().as('onClick');
|
||||
cy.mount(<Button label="Submit" onClick={onClickSpy} />);
|
||||
|
||||
cy.get('button').click();
|
||||
cy.get('@onClick').should('have.been.calledOnce');
|
||||
});
|
||||
});
|
||||
|
||||
// Run test: FAILS - Button component doesn't exist yet
|
||||
// Error: "Cannot find module './Button'"
|
||||
|
||||
// Step 2: GREEN - Minimal implementation
|
||||
// Button.tsx
|
||||
type ButtonProps = {
|
||||
label: string;
|
||||
onClick?: () => void;
|
||||
};
|
||||
|
||||
export const Button = ({ label, onClick }: ButtonProps) => {
|
||||
return <button onClick={onClick}>{label}</button>;
|
||||
};
|
||||
|
||||
// Run test: PASSES - Component renders and handles clicks
|
||||
|
||||
// Step 3: REFACTOR - Improve implementation
|
||||
// Add disabled state, loading state, variants
|
||||
type ButtonProps = {
|
||||
label: string;
|
||||
onClick?: () => void;
|
||||
disabled?: boolean;
|
||||
loading?: boolean;
|
||||
variant?: 'primary' | 'secondary' | 'danger';
|
||||
};
|
||||
|
||||
export const Button = ({
|
||||
label,
|
||||
onClick,
|
||||
disabled = false,
|
||||
loading = false,
|
||||
variant = 'primary'
|
||||
}: ButtonProps) => {
|
||||
return (
|
||||
<button
|
||||
onClick={onClick}
|
||||
disabled={disabled || loading}
|
||||
className={`btn btn-${variant}`}
|
||||
data-testid="button"
|
||||
>
|
||||
{loading ? <Spinner /> : label}
|
||||
</button>
|
||||
);
|
||||
};
|
||||
|
||||
// Step 4: Expand tests for new features
|
||||
describe('Button Component', () => {
|
||||
it('should render with label', () => {
|
||||
cy.mount(<Button label="Click Me" />);
|
||||
cy.contains('Click Me').should('be.visible');
|
||||
});
|
||||
|
||||
it('should call onClick when clicked', () => {
|
||||
const onClickSpy = cy.stub().as('onClick');
|
||||
cy.mount(<Button label="Submit" onClick={onClickSpy} />);
|
||||
|
||||
cy.get('button').click();
|
||||
cy.get('@onClick').should('have.been.calledOnce');
|
||||
});
|
||||
|
||||
it('should be disabled when disabled prop is true', () => {
|
||||
cy.mount(<Button label="Submit" disabled={true} />);
|
||||
cy.get('button').should('be.disabled');
|
||||
});
|
||||
|
||||
it('should show spinner when loading', () => {
|
||||
cy.mount(<Button label="Submit" loading={true} />);
|
||||
cy.get('[data-testid="spinner"]').should('be.visible');
|
||||
cy.get('button').should('be.disabled');
|
||||
});
|
||||
|
||||
it('should apply variant styles', () => {
|
||||
cy.mount(<Button label="Delete" variant="danger" />);
|
||||
cy.get('button').should('have.class', 'btn-danger');
|
||||
});
|
||||
});
|
||||
|
||||
// Run tests: ALL PASS - Refactored component still works
|
||||
|
||||
// Playwright Component Test equivalent
|
||||
import { test, expect } from '@playwright/experimental-ct-react';
|
||||
import { Button } from './Button';
|
||||
|
||||
test.describe('Button Component', () => {
|
||||
test('should call onClick when clicked', async ({ mount }) => {
|
||||
let clicked = false;
|
||||
const component = await mount(
|
||||
<Button label="Submit" onClick={() => { clicked = true; }} />
|
||||
);
|
||||
|
||||
await component.getByRole('button').click();
|
||||
expect(clicked).toBe(true);
|
||||
});
|
||||
|
||||
test('should be disabled when loading', async ({ mount }) => {
|
||||
const component = await mount(<Button label="Submit" loading={true} />);
|
||||
await expect(component.getByRole('button')).toBeDisabled();
|
||||
await expect(component.getByTestId('spinner')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Red: Write failing test first - clarifies requirements before coding
|
||||
- Green: Implement minimal code to pass - prevents over-engineering
|
||||
- Refactor: Improve code quality while keeping tests green
|
||||
- Expand: Add tests for new features after refactoring
|
||||
- Cycle repeats: Each new feature starts with a failing test
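
The refactored `Button` renders a `<Spinner />` while loading, and the expanded tests locate it via `data-testid="spinner"`. That component is assumed rather than shown; a minimal sketch that satisfies those assertions:

```typescript
// Spinner.tsx - minimal placeholder assumed by the loading-state tests above
export const Spinner = () => <div data-testid="spinner" role="status" aria-label="Loading" />;
```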
|
||||
|
||||
### Example 2: Provider Isolation Pattern
|
||||
|
||||
**Context**: When testing components that depend on context providers (React Query, Auth, Router), wrap them with required providers in each test to prevent state bleed between tests.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// test-utils/AllTheProviders.tsx
|
||||
import { FC, ReactNode } from 'react';
|
||||
import { QueryClient, QueryClientProvider } from '@tanstack/react-query';
|
||||
import { BrowserRouter } from 'react-router-dom';
|
||||
import { AuthProvider } from '../contexts/AuthContext';
|
||||
|
||||
type Props = {
|
||||
children: ReactNode;
|
||||
initialAuth?: { user: User | null; token: string | null };
|
||||
};
|
||||
|
||||
export const AllTheProviders: FC<Props> = ({ children, initialAuth }) => {
|
||||
// Create NEW QueryClient per test (prevent state bleed)
|
||||
const queryClient = new QueryClient({
|
||||
defaultOptions: {
|
||||
queries: { retry: false },
|
||||
mutations: { retry: false }
|
||||
}
|
||||
});
|
||||
|
||||
return (
|
||||
<QueryClientProvider client={queryClient}>
|
||||
<BrowserRouter>
|
||||
<AuthProvider initialAuth={initialAuth}>
|
||||
{children}
|
||||
</AuthProvider>
|
||||
</BrowserRouter>
|
||||
</QueryClientProvider>
|
||||
);
|
||||
};
|
||||
|
||||
// Cypress custom mount command
|
||||
// cypress/support/component.tsx
|
||||
import { mount } from 'cypress/react18';
|
||||
import { AllTheProviders } from '../../test-utils/AllTheProviders';
|
||||
|
||||
Cypress.Commands.add('wrappedMount', (component, options = {}) => {
|
||||
const { initialAuth, ...mountOptions } = options;
|
||||
|
||||
return mount(
|
||||
<AllTheProviders initialAuth={initialAuth}>
|
||||
{component}
|
||||
</AllTheProviders>,
|
||||
mountOptions
|
||||
);
|
||||
});
|
||||
|
||||
// Usage in tests
|
||||
// UserProfile.cy.tsx
|
||||
import { UserProfile } from './UserProfile';
|
||||
|
||||
describe('UserProfile Component', () => {
|
||||
it('should display user when authenticated', () => {
|
||||
const user = { id: 1, name: 'John Doe', email: 'john@example.com' };
|
||||
|
||||
cy.wrappedMount(<UserProfile />, {
|
||||
initialAuth: { user, token: 'fake-token' }
|
||||
});
|
||||
|
||||
cy.contains('John Doe').should('be.visible');
|
||||
cy.contains('john@example.com').should('be.visible');
|
||||
});
|
||||
|
||||
it('should show login prompt when not authenticated', () => {
|
||||
cy.wrappedMount(<UserProfile />, {
|
||||
initialAuth: { user: null, token: null }
|
||||
});
|
||||
|
||||
cy.contains('Please log in').should('be.visible');
|
||||
});
|
||||
});
|
||||
|
||||
// Playwright Component Test with providers
|
||||
import { test, expect } from '@playwright/experimental-ct-react';
|
||||
import { QueryClient, QueryClientProvider } from '@tanstack/react-query';
|
||||
import { UserProfile } from './UserProfile';
|
||||
import { AuthProvider } from '../contexts/AuthContext';
|
||||
|
||||
test.describe('UserProfile Component', () => {
|
||||
test('should display user when authenticated', async ({ mount }) => {
|
||||
const user = { id: 1, name: 'John Doe', email: 'john@example.com' };
|
||||
const queryClient = new QueryClient();
|
||||
|
||||
const component = await mount(
|
||||
<QueryClientProvider client={queryClient}>
|
||||
<AuthProvider initialAuth={{ user, token: 'fake-token' }}>
|
||||
<UserProfile />
|
||||
</AuthProvider>
|
||||
</QueryClientProvider>
|
||||
);
|
||||
|
||||
await expect(component.getByText('John Doe')).toBeVisible();
|
||||
await expect(component.getByText('john@example.com')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Create NEW providers per test (QueryClient, Router, Auth)
|
||||
- Prevents state pollution between tests
|
||||
- `initialAuth` prop allows testing different auth states
|
||||
- Custom mount command (`wrappedMount`) reduces boilerplate
|
||||
- Providers wrap component, not the entire test suite
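
If the project uses TypeScript, the custom `wrappedMount` command usually needs a type declaration so `cy.wrappedMount(...)` compiles; a minimal sketch, assuming Cypress 12+ and the `cypress/react18` mount types:

```typescript
// cypress/support/component.d.ts - hypothetical typing for the wrappedMount command
import { MountOptions, MountReturn } from 'cypress/react18';
import { ReactNode } from 'react';

declare global {
  namespace Cypress {
    interface Chainable {
      // Mounts a component wrapped in AllTheProviders; initialAuth mirrors the provider prop
      wrappedMount(
        component: ReactNode,
        options?: Partial<MountOptions> & { initialAuth?: { user: unknown; token: string | null } },
      ): Chainable<MountReturn>;
    }
  }
}

export {};
```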
|
||||
|
||||
### Example 3: Accessibility Assertions
|
||||
|
||||
**Context**: When testing components, validate accessibility alongside functionality using axe-core, ARIA roles, labels, and keyboard navigation.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// Cypress with axe-core
|
||||
// cypress/support/component.tsx
|
||||
import 'cypress-axe';
import 'cypress-real-events/support'; // needed for cy.realPress() used in the keyboard navigation test below
|
||||
|
||||
// Form.cy.tsx
|
||||
import { Form } from './Form';
|
||||
|
||||
describe('Form Component Accessibility', () => {
|
||||
beforeEach(() => {
|
||||
cy.wrappedMount(<Form />);
|
||||
cy.injectAxe(); // Inject axe-core
|
||||
});
|
||||
|
||||
it('should have no accessibility violations', () => {
|
||||
cy.checkA11y(); // Run axe scan
|
||||
});
|
||||
|
||||
it('should have proper ARIA labels', () => {
|
||||
cy.get('input[name="email"]').should('have.attr', 'aria-label', 'Email address');
|
||||
cy.get('input[name="password"]').should('have.attr', 'aria-label', 'Password');
|
||||
cy.get('button[type="submit"]').should('have.attr', 'aria-label', 'Submit form');
|
||||
});
|
||||
|
||||
it('should support keyboard navigation', () => {
|
||||
// Tab through form fields
|
||||
cy.get('input[name="email"]').focus().type('test@example.com');
|
||||
cy.realPress('Tab'); // cypress-real-events plugin
|
||||
cy.focused().should('have.attr', 'name', 'password');
|
||||
|
||||
cy.focused().type('password123');
|
||||
cy.realPress('Tab');
|
||||
cy.focused().should('have.attr', 'type', 'submit');
|
||||
|
||||
cy.realPress('Enter'); // Submit via keyboard
|
||||
cy.contains('Form submitted').should('be.visible');
|
||||
});
|
||||
|
||||
it('should announce errors to screen readers', () => {
|
||||
cy.get('button[type="submit"]').click(); // Submit without data
|
||||
|
||||
// Error has role="alert" and aria-live="polite"
|
||||
cy.get('[role="alert"]')
|
||||
.should('be.visible')
|
||||
.and('have.attr', 'aria-live', 'polite')
|
||||
.and('contain', 'Email is required');
|
||||
});
|
||||
|
||||
it('should have sufficient color contrast', () => {
|
||||
cy.checkA11y(null, {
|
||||
rules: {
|
||||
'color-contrast': { enabled: true }
|
||||
}
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
// Playwright with axe-playwright
|
||||
import { test, expect } from '@playwright/experimental-ct-react';
|
||||
import AxeBuilder from '@axe-core/playwright';
|
||||
import { Form } from './Form';
|
||||
|
||||
test.describe('Form Component Accessibility', () => {
|
||||
test('should have no accessibility violations', async ({ mount, page }) => {
|
||||
await mount(<Form />);
|
||||
|
||||
const accessibilityScanResults = await new AxeBuilder({ page })
|
||||
.analyze();
|
||||
|
||||
expect(accessibilityScanResults.violations).toEqual([]);
|
||||
});
|
||||
|
||||
test('should support keyboard navigation', async ({ mount, page }) => {
|
||||
const component = await mount(<Form />);
|
||||
|
||||
await component.getByLabel('Email address').fill('test@example.com');
|
||||
await page.keyboard.press('Tab');
|
||||
|
||||
await expect(component.getByLabel('Password')).toBeFocused();
|
||||
|
||||
await component.getByLabel('Password').fill('password123');
|
||||
await page.keyboard.press('Tab');
|
||||
|
||||
await expect(component.getByRole('button', { name: 'Submit form' })).toBeFocused();
|
||||
|
||||
await page.keyboard.press('Enter');
|
||||
await expect(component.getByText('Form submitted')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Use `cy.checkA11y()` (Cypress) or `AxeBuilder` (Playwright) for automated accessibility scanning
|
||||
- Validate ARIA roles, labels, and live regions
|
||||
- Test keyboard navigation (Tab, Enter, Escape)
|
||||
- Ensure errors are announced to screen readers (`role="alert"`, `aria-live`)
|
||||
- Check color contrast meets WCAG standards
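
The screen-reader test above asserts on `role="alert"` with `aria-live="polite"`; the Form is assumed to render its validation message in such a live region. A minimal sketch of that markup (component and prop names are hypothetical):

```typescript
// FormError.tsx - hypothetical error element matching the assertions above
type FormErrorProps = { message?: string };

export const FormError = ({ message }: FormErrorProps) =>
  message ? (
    <p role="alert" aria-live="polite">
      {message}
    </p>
  ) : null;
```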
|
||||
|
||||
### Example 4: Visual Regression Test
|
||||
|
||||
**Context**: When testing components, capture screenshots to detect unintended visual changes. Use Playwright visual comparison or Cypress snapshot plugins.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// Playwright visual regression
|
||||
import { test, expect } from '@playwright/experimental-ct-react';
|
||||
import { Button } from './Button';
|
||||
|
||||
test.describe('Button Visual Regression', () => {
|
||||
test('should match primary button snapshot', async ({ mount }) => {
|
||||
const component = await mount(<Button label="Primary" variant="primary" />);
|
||||
|
||||
// Capture and compare screenshot
|
||||
await expect(component).toHaveScreenshot('button-primary.png');
|
||||
});
|
||||
|
||||
test('should match secondary button snapshot', async ({ mount }) => {
|
||||
const component = await mount(<Button label="Secondary" variant="secondary" />);
|
||||
await expect(component).toHaveScreenshot('button-secondary.png');
|
||||
});
|
||||
|
||||
test('should match disabled button snapshot', async ({ mount }) => {
|
||||
const component = await mount(<Button label="Disabled" disabled={true} />);
|
||||
await expect(component).toHaveScreenshot('button-disabled.png');
|
||||
});
|
||||
|
||||
test('should match loading button snapshot', async ({ mount }) => {
|
||||
const component = await mount(<Button label="Loading" loading={true} />);
|
||||
await expect(component).toHaveScreenshot('button-loading.png');
|
||||
});
|
||||
});
|
||||
|
||||
// Cypress visual regression with percy or snapshot plugins
|
||||
import { Button } from './Button';
|
||||
|
||||
describe('Button Visual Regression', () => {
|
||||
it('should match primary button snapshot', () => {
|
||||
cy.wrappedMount(<Button label="Primary" variant="primary" />);
|
||||
|
||||
// Option 1: Percy (cloud-based visual testing)
|
||||
cy.percySnapshot('Button - Primary');
|
||||
|
||||
// Option 2: cypress-plugin-snapshots (local snapshots)
|
||||
cy.get('button').toMatchImageSnapshot({
|
||||
name: 'button-primary',
|
||||
threshold: 0.01 // 1% threshold for pixel differences
|
||||
});
|
||||
});
|
||||
|
||||
it('should match hover state', () => {
|
||||
cy.wrappedMount(<Button label="Hover Me" />);
|
||||
cy.get('button').realHover(); // cypress-real-events
|
||||
cy.percySnapshot('Button - Hover State');
|
||||
});
|
||||
|
||||
it('should match focus state', () => {
|
||||
cy.wrappedMount(<Button label="Focus Me" />);
|
||||
cy.get('button').focus();
|
||||
cy.percySnapshot('Button - Focus State');
|
||||
});
|
||||
});
|
||||
|
||||
// Playwright configuration for visual regression
|
||||
// playwright.config.ts
|
||||
import { defineConfig } from '@playwright/test'; // or from '@playwright/experimental-ct-react' for component test configs

export default defineConfig({
|
||||
expect: {
|
||||
toHaveScreenshot: {
|
||||
maxDiffPixels: 100, // Allow 100 pixels difference
|
||||
threshold: 0.2 // 20% threshold
|
||||
}
|
||||
},
|
||||
use: {
|
||||
screenshot: 'only-on-failure'
|
||||
}
|
||||
});
|
||||
|
||||
// Update snapshots when intentional changes are made
|
||||
// npx playwright test --update-snapshots
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Playwright: Use `toHaveScreenshot()` for built-in visual comparison
|
||||
- Cypress: Use Percy (cloud) or snapshot plugins (local) for visual testing
|
||||
- Capture different states: default, hover, focus, disabled, loading
|
||||
- Set threshold for acceptable pixel differences (avoid false positives)
|
||||
- Update snapshots when visual changes are intentional
|
||||
- Visual tests catch unintended CSS/layout regressions
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*atdd` (component test generation), `*automate` (component test expansion), `*framework` (component testing setup)
|
||||
- **Related fragments**:
|
||||
- `test-quality.md` - Keep component tests <100 lines, isolated, focused
|
||||
- `fixture-architecture.md` - Provider wrapping patterns, custom mount commands
|
||||
- `data-factories.md` - Factory functions for component props
|
||||
- `test-levels-framework.md` - When to use component tests vs E2E tests
|
||||
|
||||
## TDD Workflow Summary
|
||||
|
||||
**Red-Green-Refactor Cycle**:
|
||||
|
||||
1. **Red**: Write failing test describing desired behavior
|
||||
2. **Green**: Implement minimal code to make test pass
|
||||
3. **Refactor**: Improve code quality, tests stay green
|
||||
4. **Repeat**: Each new feature starts with failing test
|
||||
|
||||
**Component Test Checklist**:
|
||||
|
||||
- [ ] Test renders with required props
|
||||
- [ ] Test user interactions (click, type, submit)
|
||||
- [ ] Test different states (loading, error, disabled)
|
||||
- [ ] Test accessibility (ARIA, keyboard navigation)
|
||||
- [ ] Test visual regression (snapshots)
|
||||
- [ ] Isolate with fresh providers (no state bleed)
|
||||
- [ ] Keep tests <100 lines (split by intent)
|
||||
|
||||
_Source: CCTDD repository, Murat component testing talks, Playwright/Cypress component testing docs._
|
||||
|
||||
@@ -1,9 +1,957 @@
|
||||
# Contract Testing Essentials (Pact)
|
||||
|
||||
- Store consumer contracts beside the integration specs that generate them; version contracts semantically and publish on every CI run.
|
||||
- Require provider verification before merge; failed verification blocks release and surfaces breaking changes immediately.
|
||||
- Capture fallback behaviour inside interactions (timeouts, retries, error payloads) so resilience guarantees remain explicit.
|
||||
- Automate broker housekeeping: tag releases, archive superseded contracts, and expire unused pacts to reduce noise.
|
||||
- Pair contract suites with API smoke or component tests to validate data mapping and UI rendering in tandem.
|
||||
_Source: Pact consumer/provider sample repos, Murat contract testing blog._

## Principle

Contract testing validates API contracts between consumer and provider services without requiring integrated end-to-end tests. Store consumer contracts alongside integration specs, version contracts semantically, and publish on every CI run. Provider verification before merge surfaces breaking changes immediately, while explicit fallback behavior (timeouts, retries, error payloads) captures resilience guarantees in contracts.
|
||||
|
||||
## Rationale
|
||||
|
||||
Traditional integration testing requires running both consumer and provider simultaneously, creating slow, flaky tests with complex setup. Contract testing decouples services: consumers define expectations (pact files), providers verify against those expectations independently. This enables parallel development, catches breaking changes early, and documents API behavior as executable specifications. Pair contract tests with API smoke tests to validate data mapping and UI rendering in tandem.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Pact Consumer Test (Frontend → Backend API)
|
||||
|
||||
**Context**: React application consuming a user management API, defining expected interactions.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/contract/user-api.pact.spec.ts
|
||||
import { PactV3, MatchersV3 } from '@pact-foundation/pact';
|
||||
import { getUserById, createUser, User } from '@/api/user-service';
|
||||
|
||||
const { like, eachLike, string, integer } = MatchersV3;
|
||||
|
||||
/**
|
||||
* Consumer-Driven Contract Test
|
||||
* - Consumer (React app) defines expected API behavior
|
||||
* - Generates pact file for provider to verify
|
||||
* - Runs in isolation (no real backend required)
|
||||
*/
|
||||
|
||||
const provider = new PactV3({
|
||||
consumer: 'user-management-web',
|
||||
provider: 'user-api-service',
|
||||
dir: './pacts', // Output directory for pact files
|
||||
logLevel: 'warn',
|
||||
});
|
||||
|
||||
describe('User API Contract', () => {
|
||||
describe('GET /users/:id', () => {
|
||||
it('should return user when user exists', async () => {
|
||||
// Arrange: Define expected interaction
|
||||
await provider
|
||||
.given('user with id 1 exists') // Provider state
|
||||
.uponReceiving('a request for user 1')
|
||||
.withRequest({
|
||||
method: 'GET',
|
||||
path: '/users/1',
|
||||
headers: {
|
||||
Accept: 'application/json',
|
||||
Authorization: like('Bearer token123'), // Matcher: any string
|
||||
},
|
||||
})
|
||||
.willRespondWith({
|
||||
status: 200,
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: like({
|
||||
id: integer(1),
|
||||
name: string('John Doe'),
|
||||
email: string('john@example.com'),
|
||||
role: string('user'),
|
||||
createdAt: string('2025-01-15T10:00:00Z'),
|
||||
}),
|
||||
})
|
||||
.executeTest(async (mockServer) => {
|
||||
// Act: Call consumer code against mock server
|
||||
const user = await getUserById(1, {
|
||||
baseURL: mockServer.url,
|
||||
headers: { Authorization: 'Bearer token123' },
|
||||
});
|
||||
|
||||
// Assert: Validate consumer behavior
|
||||
expect(user).toEqual(
|
||||
expect.objectContaining({
|
||||
id: 1,
|
||||
name: 'John Doe',
|
||||
email: 'john@example.com',
|
||||
role: 'user',
|
||||
}),
|
||||
);
|
||||
});
|
||||
});
|
||||
|
||||
it('should handle 404 when user does not exist', async () => {
|
||||
await provider
|
||||
.given('user with id 999 does not exist')
|
||||
.uponReceiving('a request for non-existent user')
|
||||
.withRequest({
|
||||
method: 'GET',
|
||||
path: '/users/999',
|
||||
headers: { Accept: 'application/json' },
|
||||
})
|
||||
.willRespondWith({
|
||||
status: 404,
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: {
|
||||
error: 'User not found',
|
||||
code: 'USER_NOT_FOUND',
|
||||
},
|
||||
})
|
||||
.executeTest(async (mockServer) => {
|
||||
// Act & Assert: Consumer handles 404 gracefully
|
||||
await expect(getUserById(999, { baseURL: mockServer.url })).rejects.toThrow('User not found');
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
describe('POST /users', () => {
|
||||
it('should create user and return 201', async () => {
|
||||
const newUser: Omit<User, 'id' | 'createdAt'> = {
|
||||
name: 'Jane Smith',
|
||||
email: 'jane@example.com',
|
||||
role: 'admin',
|
||||
};
|
||||
|
||||
await provider
|
||||
.given('no users exist')
|
||||
.uponReceiving('a request to create a user')
|
||||
.withRequest({
|
||||
method: 'POST',
|
||||
path: '/users',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
Accept: 'application/json',
|
||||
},
|
||||
body: like(newUser),
|
||||
})
|
||||
.willRespondWith({
|
||||
status: 201,
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: like({
|
||||
id: integer(2),
|
||||
name: string('Jane Smith'),
|
||||
email: string('jane@example.com'),
|
||||
role: string('admin'),
|
||||
createdAt: string('2025-01-15T11:00:00Z'),
|
||||
}),
|
||||
})
|
||||
.executeTest(async (mockServer) => {
|
||||
const createdUser = await createUser(newUser, {
|
||||
baseURL: mockServer.url,
|
||||
});
|
||||
|
||||
expect(createdUser).toEqual(
|
||||
expect.objectContaining({
|
||||
id: expect.any(Number),
|
||||
name: 'Jane Smith',
|
||||
email: 'jane@example.com',
|
||||
role: 'admin',
|
||||
}),
|
||||
);
|
||||
});
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**package.json scripts**:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"test:contract": "jest tests/contract --testTimeout=30000",
|
||||
"pact:publish": "pact-broker publish ./pacts --consumer-app-version=$GIT_SHA --broker-base-url=$PACT_BROKER_URL --broker-token=$PACT_BROKER_TOKEN"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Consumer-driven**: Frontend defines expectations, not backend
|
||||
- **Matchers**: `like`, `string`, `integer` for flexible matching
|
||||
- **Provider states**: given() sets up test preconditions
|
||||
- **Isolation**: No real backend needed, runs fast
|
||||
- **Pact generation**: Automatically creates JSON pact files
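
For orientation, the pact file generated from these interactions looks roughly like this (abridged; matching rules and metadata vary with the pact-js version):

```json
{
  "consumer": { "name": "user-management-web" },
  "provider": { "name": "user-api-service" },
  "interactions": [
    {
      "description": "a request for user 1",
      "providerStates": [{ "name": "user with id 1 exists" }],
      "request": {
        "method": "GET",
        "path": "/users/1",
        "headers": { "Accept": "application/json" }
      },
      "response": {
        "status": 200,
        "headers": { "Content-Type": "application/json" },
        "body": { "id": 1, "name": "John Doe", "email": "john@example.com", "role": "user" }
      }
    }
  ],
  "metadata": { "pactSpecification": { "version": "3.0.0" } }
}
```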
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Pact Provider Verification (Backend validates contracts)
|
||||
|
||||
**Context**: Node.js/Express API verifying pacts published by consumers.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/contract/user-api.provider.spec.ts
|
||||
import { Verifier, VerifierOptions } from '@pact-foundation/pact';
|
||||
import { server } from '../../src/server'; // Your Express/Fastify app
|
||||
import { seedDatabase, resetDatabase } from '../support/db-helpers';
|
||||
|
||||
/**
|
||||
* Provider Verification Test
|
||||
* - Provider (backend API) verifies against published pacts
|
||||
* - State handlers setup test data for each interaction
|
||||
* - Runs before merge to catch breaking changes
|
||||
*/
|
||||
|
||||
describe('Pact Provider Verification', () => {
|
||||
let serverInstance;
|
||||
const PORT = 3001;
|
||||
|
||||
beforeAll(async () => {
|
||||
// Start provider server
|
||||
serverInstance = server.listen(PORT);
|
||||
console.log(`Provider server running on port ${PORT}`);
|
||||
});
|
||||
|
||||
afterAll(async () => {
|
||||
// Cleanup
|
||||
await serverInstance.close();
|
||||
});
|
||||
|
||||
it('should verify pacts from all consumers', async () => {
|
||||
const opts: VerifierOptions = {
|
||||
// Provider details
|
||||
provider: 'user-api-service',
|
||||
providerBaseUrl: `http://localhost:${PORT}`,
|
||||
|
||||
// Pact Broker configuration
|
||||
pactBrokerUrl: process.env.PACT_BROKER_URL,
|
||||
pactBrokerToken: process.env.PACT_BROKER_TOKEN,
|
||||
publishVerificationResult: process.env.CI === 'true',
|
||||
providerVersion: process.env.GIT_SHA || 'dev',
|
||||
|
||||
// State handlers: Setup provider state for each interaction
|
||||
stateHandlers: {
|
||||
'user with id 1 exists': async () => {
|
||||
await seedDatabase({
|
||||
users: [
|
||||
{
|
||||
id: 1,
|
||||
name: 'John Doe',
|
||||
email: 'john@example.com',
|
||||
role: 'user',
|
||||
createdAt: '2025-01-15T10:00:00Z',
|
||||
},
|
||||
],
|
||||
});
|
||||
return 'User seeded successfully';
|
||||
},
|
||||
|
||||
'user with id 999 does not exist': async () => {
|
||||
// Ensure user doesn't exist
|
||||
await resetDatabase();
|
||||
return 'Database reset';
|
||||
},
|
||||
|
||||
'no users exist': async () => {
|
||||
await resetDatabase();
|
||||
return 'Database empty';
|
||||
},
|
||||
},
|
||||
|
||||
// Request filters: Add auth headers to all requests
|
||||
requestFilter: (req, res, next) => {
|
||||
// Mock authentication for verification
|
||||
req.headers['x-user-id'] = 'test-user';
|
||||
req.headers['authorization'] = 'Bearer valid-test-token';
|
||||
next();
|
||||
},
|
||||
|
||||
// Timeout for verification
|
||||
timeout: 30000,
|
||||
};
|
||||
|
||||
// Run verification
|
||||
await new Verifier(opts).verifyProvider();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**CI integration**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/pact-provider.yml
|
||||
name: Pact Provider Verification
|
||||
on:
|
||||
pull_request:
|
||||
push:
|
||||
branches: [main]
|
||||
|
||||
jobs:
|
||||
verify-contracts:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: '.nvmrc'
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Start database
|
||||
run: docker-compose up -d postgres
|
||||
|
||||
- name: Run migrations
|
||||
run: npm run db:migrate
|
||||
|
||||
- name: Verify pacts
|
||||
run: npm run test:contract:provider
|
||||
env:
|
||||
PACT_BROKER_URL: ${{ secrets.PACT_BROKER_URL }}
|
||||
PACT_BROKER_TOKEN: ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
GIT_SHA: ${{ github.sha }}
|
||||
CI: true
|
||||
|
||||
- name: Can I Deploy?
|
||||
run: |
|
||||
npx pact-broker can-i-deploy \
|
||||
--pacticipant user-api-service \
|
||||
--version ${{ github.sha }} \
|
||||
--to-environment production
|
||||
env:
|
||||
PACT_BROKER_BASE_URL: ${{ secrets.PACT_BROKER_URL }}
|
||||
PACT_BROKER_TOKEN: ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **State handlers**: Setup provider data for each given() state
|
||||
- **Request filters**: Add auth/headers for verification requests
|
||||
- **CI publishing**: Verification results sent to broker
|
||||
- **can-i-deploy**: Safety check before production deployment
|
||||
- **Database isolation**: Reset between state handlers
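
The `seedDatabase` / `resetDatabase` helpers imported at the top of the verification spec are project-specific; a minimal sketch of what they could look like against a Postgres test database (table and column names are assumptions):

```typescript
// tests/support/db-helpers.ts - hypothetical helpers backing the state handlers
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.TEST_DATABASE_URL });

export async function resetDatabase(): Promise<void> {
  // Clear only the tables the provider states touch
  await pool.query('TRUNCATE TABLE users RESTART IDENTITY CASCADE');
}

export async function seedDatabase(data: { users: Array<Record<string, unknown>> }): Promise<void> {
  await resetDatabase();
  for (const user of data.users) {
    await pool.query(
      'INSERT INTO users (id, name, email, role, created_at) VALUES ($1, $2, $3, $4, $5)',
      [user.id, user.name, user.email, user.role, user.createdAt],
    );
  }
}
```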
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Contract CI Integration (Consumer & Provider Workflow)
|
||||
|
||||
**Context**: Complete CI/CD workflow coordinating consumer pact publishing and provider verification.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/pact-consumer.yml (Consumer side)
|
||||
name: Pact Consumer Tests
|
||||
on:
|
||||
pull_request:
|
||||
push:
|
||||
branches: [main]
|
||||
|
||||
jobs:
|
||||
consumer-tests:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: '.nvmrc'
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run consumer contract tests
|
||||
run: npm run test:contract
|
||||
|
||||
- name: Publish pacts to broker
|
||||
if: github.ref == 'refs/heads/main' || github.event_name == 'pull_request'
|
||||
run: |
|
||||
npx pact-broker publish ./pacts \
|
||||
--consumer-app-version ${{ github.sha }} \
|
||||
--branch ${{ github.head_ref || github.ref_name }} \
|
||||
--broker-base-url ${{ secrets.PACT_BROKER_URL }} \
|
||||
--broker-token ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
|
||||
- name: Tag pact with environment (main branch only)
|
||||
if: github.ref == 'refs/heads/main'
|
||||
run: |
|
||||
npx pact-broker create-version-tag \
|
||||
--pacticipant user-management-web \
|
||||
--version ${{ github.sha }} \
|
||||
--tag production \
|
||||
--broker-base-url ${{ secrets.PACT_BROKER_URL }} \
|
||||
--broker-token ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
```
|
||||
|
||||
```yaml
|
||||
# .github/workflows/pact-provider.yml (Provider side)
|
||||
name: Pact Provider Verification
|
||||
on:
|
||||
pull_request:
|
||||
push:
|
||||
branches: [main]
|
||||
repository_dispatch:
|
||||
types: [pact_changed] # Webhook from Pact Broker
|
||||
|
||||
jobs:
|
||||
verify-contracts:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: '.nvmrc'
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Start dependencies
|
||||
run: docker-compose up -d
|
||||
|
||||
- name: Run provider verification
|
||||
run: npm run test:contract:provider
|
||||
env:
|
||||
PACT_BROKER_URL: ${{ secrets.PACT_BROKER_URL }}
|
||||
PACT_BROKER_TOKEN: ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
GIT_SHA: ${{ github.sha }}
|
||||
CI: true
|
||||
|
||||
- name: Publish verification results
|
||||
if: always()
|
||||
run: echo "Verification results published to broker"
|
||||
|
||||
- name: Can I Deploy to Production?
|
||||
if: github.ref == 'refs/heads/main'
|
||||
run: |
|
||||
npx pact-broker can-i-deploy \
|
||||
--pacticipant user-api-service \
|
||||
--version ${{ github.sha }} \
|
||||
--to-environment production \
|
||||
--broker-base-url ${{ secrets.PACT_BROKER_URL }} \
|
||||
--broker-token ${{ secrets.PACT_BROKER_TOKEN }} \
|
||||
--retry-while-unknown 6 \
|
||||
--retry-interval 10
|
||||
|
||||
- name: Record deployment (if can-i-deploy passed)
|
||||
if: success() && github.ref == 'refs/heads/main'
|
||||
run: |
|
||||
npx pact-broker record-deployment \
|
||||
--pacticipant user-api-service \
|
||||
--version ${{ github.sha }} \
|
||||
--environment production \
|
||||
--broker-base-url ${{ secrets.PACT_BROKER_URL }} \
|
||||
--broker-token ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
```
|
||||
|
||||
**Pact Broker Webhook Configuration**:
|
||||
|
||||
```json
|
||||
{
|
||||
"events": [
|
||||
{
|
||||
"name": "contract_content_changed"
|
||||
}
|
||||
],
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"url": "https://api.github.com/repos/your-org/user-api/dispatches",
|
||||
"headers": {
|
||||
"Authorization": "Bearer ${user.githubToken}",
|
||||
"Content-Type": "application/json",
|
||||
"Accept": "application/vnd.github.v3+json"
|
||||
},
|
||||
"body": {
|
||||
"event_type": "pact_changed",
|
||||
"client_payload": {
|
||||
"pact_url": "${pactbroker.pactUrl}",
|
||||
"consumer": "${pactbroker.consumerName}",
|
||||
"provider": "${pactbroker.providerName}"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Automatic trigger**: Consumer pact changes trigger provider verification via webhook
|
||||
- **Branch tracking**: Pacts published per branch for feature testing
|
||||
- **can-i-deploy**: Safety gate before production deployment
|
||||
- **Record deployment**: Track which version is in each environment
|
||||
- **Parallel dev**: Consumer and provider teams work independently
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Resilience Coverage (Testing Fallback Behavior)
|
||||
|
||||
**Context**: Capture timeout, retry, and error handling behavior explicitly in contracts.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/contract/user-api-resilience.pact.spec.ts
|
||||
import { PactV3, MatchersV3 } from '@pact-foundation/pact';
|
||||
import { getUserById, ApiError } from '@/api/user-service';
|
||||
|
||||
const { like, string, integer } = MatchersV3;
|
||||
|
||||
const provider = new PactV3({
|
||||
consumer: 'user-management-web',
|
||||
provider: 'user-api-service',
|
||||
dir: './pacts',
|
||||
});
|
||||
|
||||
describe('User API Resilience Contract', () => {
|
||||
/**
|
||||
* Test 500 error handling
|
||||
* Verifies consumer handles server errors gracefully
|
||||
*/
|
||||
it('should handle 500 errors with retry logic', async () => {
|
||||
await provider
|
||||
.given('server is experiencing errors')
|
||||
.uponReceiving('a request that returns 500')
|
||||
.withRequest({
|
||||
method: 'GET',
|
||||
path: '/users/1',
|
||||
headers: { Accept: 'application/json' },
|
||||
})
|
||||
.willRespondWith({
|
||||
status: 500,
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: {
|
||||
error: 'Internal server error',
|
||||
code: 'INTERNAL_ERROR',
|
||||
retryable: true,
|
||||
},
|
||||
})
|
||||
.executeTest(async (mockServer) => {
|
||||
// Consumer should retry on 500
|
||||
try {
|
||||
await getUserById(1, {
|
||||
baseURL: mockServer.url,
|
||||
retries: 3,
|
||||
retryDelay: 100,
|
||||
});
|
||||
fail('Should have thrown error after retries');
|
||||
} catch (error) {
|
||||
expect(error).toBeInstanceOf(ApiError);
|
||||
expect((error as ApiError).code).toBe('INTERNAL_ERROR');
|
||||
expect((error as ApiError).retryable).toBe(true);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
/**
|
||||
* Test 429 rate limiting
|
||||
* Verifies consumer respects rate limits
|
||||
*/
|
||||
it('should handle 429 rate limit with backoff', async () => {
|
||||
await provider
|
||||
.given('rate limit exceeded for user')
|
||||
.uponReceiving('a request that is rate limited')
|
||||
.withRequest({
|
||||
method: 'GET',
|
||||
path: '/users/1',
|
||||
})
|
||||
.willRespondWith({
|
||||
status: 429,
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
'Retry-After': '60', // Retry after 60 seconds
|
||||
},
|
||||
body: {
|
||||
error: 'Too many requests',
|
||||
code: 'RATE_LIMIT_EXCEEDED',
|
||||
},
|
||||
})
|
||||
.executeTest(async (mockServer) => {
|
||||
try {
|
||||
await getUserById(1, {
|
||||
baseURL: mockServer.url,
|
||||
respectRateLimit: true,
|
||||
});
|
||||
fail('Should have thrown rate limit error');
|
||||
} catch (error) {
|
||||
expect(error).toBeInstanceOf(ApiError);
|
||||
expect((error as ApiError).code).toBe('RATE_LIMIT_EXCEEDED');
|
||||
expect((error as ApiError).retryAfter).toBe(60);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
/**
|
||||
* Test timeout handling
|
||||
* Verifies consumer has appropriate timeout configuration
|
||||
*/
|
||||
it('should timeout after 10 seconds', async () => {
|
||||
await provider
|
||||
.given('server is slow to respond')
|
||||
.uponReceiving('a request that times out')
|
||||
.withRequest({
|
||||
method: 'GET',
|
||||
path: '/users/1',
|
||||
})
|
||||
.willRespondWith({
|
||||
status: 200,
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: like({ id: 1, name: 'John' }),
|
||||
})
|
||||
.withDelay(15000) // Simulate 15 second delay
|
||||
.executeTest(async (mockServer) => {
|
||||
try {
|
||||
await getUserById(1, {
|
||||
baseURL: mockServer.url,
|
||||
timeout: 10000, // 10 second timeout
|
||||
});
|
||||
fail('Should have timed out');
|
||||
} catch (error) {
|
||||
expect(error).toBeInstanceOf(ApiError);
|
||||
expect((error as ApiError).code).toBe('TIMEOUT');
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
/**
|
||||
* Test partial response (optional fields)
|
||||
* Verifies consumer handles missing optional data
|
||||
*/
|
||||
it('should handle response with missing optional fields', async () => {
|
||||
await provider
|
||||
.given('user exists with minimal data')
|
||||
.uponReceiving('a request for user with partial data')
|
||||
.withRequest({
|
||||
method: 'GET',
|
||||
path: '/users/1',
|
||||
})
|
||||
.willRespondWith({
|
||||
status: 200,
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: {
|
||||
id: integer(1),
|
||||
name: string('John Doe'),
|
||||
email: string('john@example.com'),
|
||||
// role, createdAt, etc. omitted (optional fields)
|
||||
},
|
||||
})
|
||||
.executeTest(async (mockServer) => {
|
||||
const user = await getUserById(1, { baseURL: mockServer.url });
|
||||
|
||||
// Consumer handles missing optional fields gracefully
|
||||
expect(user.id).toBe(1);
|
||||
expect(user.name).toBe('John Doe');
|
||||
expect(user.role).toBeUndefined(); // Optional field
|
||||
expect(user.createdAt).toBeUndefined(); // Optional field
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**API client with retry logic**:
|
||||
|
||||
```typescript
|
||||
// src/api/user-service.ts
|
||||
import axios, { AxiosInstance, AxiosRequestConfig } from 'axios';
|
||||
|
||||
export class ApiError extends Error {
|
||||
constructor(
|
||||
message: string,
|
||||
public code: string,
|
||||
public retryable: boolean = false,
|
||||
public retryAfter?: number,
|
||||
) {
|
||||
super(message);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* User API client with retry and error handling
|
||||
*/
|
||||
export async function getUserById(
|
||||
id: number,
|
||||
config?: AxiosRequestConfig & { retries?: number; retryDelay?: number; respectRateLimit?: boolean },
|
||||
): Promise<User> {
|
||||
const { retries = 3, retryDelay = 1000, respectRateLimit = true, ...axiosConfig } = config || {};
|
||||
|
||||
let lastError: Error;
|
||||
|
||||
for (let attempt = 1; attempt <= retries; attempt++) {
|
||||
try {
|
||||
const response = await axios.get(`/users/${id}`, axiosConfig);
|
||||
return response.data;
|
||||
} catch (error: any) {
|
||||
lastError = error;
|
||||
|
||||
// Handle rate limiting
|
||||
if (error.response?.status === 429) {
|
||||
const retryAfter = parseInt(error.response.headers['retry-after'] || '60');
|
||||
throw new ApiError('Too many requests', 'RATE_LIMIT_EXCEEDED', false, retryAfter);
|
||||
}
|
||||
|
||||
// Retry on 500 errors
|
||||
if (error.response?.status === 500 && attempt < retries) {
|
||||
await new Promise((resolve) => setTimeout(resolve, retryDelay * attempt));
|
||||
continue;
|
||||
}
|
||||
|
||||
// Handle 404
|
||||
if (error.response?.status === 404) {
|
||||
throw new ApiError('User not found', 'USER_NOT_FOUND', false);
|
||||
}
|
||||
|
||||
// Handle timeout
|
||||
if (error.code === 'ECONNABORTED') {
|
||||
throw new ApiError('Request timeout', 'TIMEOUT', true);
|
||||
}
|
||||
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
throw new ApiError('Request failed after retries', 'INTERNAL_ERROR', true);
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Resilience contracts**: Timeouts, retries, errors explicitly tested
|
||||
- **State handlers**: Provider sets up each test scenario
|
||||
- **Error handling**: Consumer validates graceful degradation
|
||||
- **Retry logic**: Backoff between retries tested (the client above scales the delay linearly per attempt)
|
||||
- **Optional fields**: Consumer handles partial responses
|
||||
|
||||
---
|
||||
|
||||
### Example 5: Pact Broker Housekeeping & Lifecycle Management
|
||||
|
||||
**Context**: Automated broker maintenance to prevent contract sprawl and noise.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// scripts/pact-broker-housekeeping.ts
|
||||
/**
|
||||
* Pact Broker Housekeeping Script
|
||||
* - Archive superseded contracts
|
||||
* - Expire unused pacts
|
||||
* - Tag releases for environment tracking
|
||||
*/
|
||||
|
||||
import { execSync } from 'child_process';
|
||||
|
||||
const PACT_BROKER_URL = process.env.PACT_BROKER_URL!;
|
||||
const PACT_BROKER_TOKEN = process.env.PACT_BROKER_TOKEN!;
|
||||
const PACTICIPANT = 'user-api-service';
|
||||
|
||||
/**
|
||||
* Tag release with environment
|
||||
*/
|
||||
function tagRelease(version: string, environment: 'staging' | 'production') {
|
||||
console.log(`🏷️ Tagging ${PACTICIPANT} v${version} as ${environment}`);
|
||||
|
||||
execSync(
|
||||
`npx pact-broker create-version-tag \
|
||||
--pacticipant ${PACTICIPANT} \
|
||||
--version ${version} \
|
||||
--tag ${environment} \
|
||||
--broker-base-url ${PACT_BROKER_URL} \
|
||||
--broker-token ${PACT_BROKER_TOKEN}`,
|
||||
{ stdio: 'inherit' },
|
||||
);
|
||||
}
|
||||
|
||||
/**
|
||||
* Record deployment to environment
|
||||
*/
|
||||
function recordDeployment(version: string, environment: 'staging' | 'production') {
|
||||
console.log(`📝 Recording deployment of ${PACTICIPANT} v${version} to ${environment}`);
|
||||
|
||||
execSync(
|
||||
`npx pact-broker record-deployment \
|
||||
--pacticipant ${PACTICIPANT} \
|
||||
--version ${version} \
|
||||
--environment ${environment} \
|
||||
--broker-base-url ${PACT_BROKER_URL} \
|
||||
--broker-token ${PACT_BROKER_TOKEN}`,
|
||||
{ stdio: 'inherit' },
|
||||
);
|
||||
}
|
||||
|
||||
/**
|
||||
* Clean up old pact versions (retention policy)
|
||||
* Keep: last 30 days, all production tags, latest from each branch
|
||||
*/
|
||||
function cleanupOldPacts() {
|
||||
console.log(`🧹 Cleaning up old pacts for ${PACTICIPANT}`);
|
||||
|
||||
execSync(
|
||||
`npx pact-broker clean \
|
||||
--pacticipant ${PACTICIPANT} \
|
||||
--broker-base-url ${PACT_BROKER_URL} \
|
||||
--broker-token ${PACT_BROKER_TOKEN} \
|
||||
--keep-latest-for-branch 1 \
|
||||
--keep-min-age 30`,
|
||||
{ stdio: 'inherit' },
|
||||
);
|
||||
}
|
||||
|
||||
/**
|
||||
* Check deployment compatibility
|
||||
*/
|
||||
function canIDeploy(version: string, toEnvironment: string): boolean {
|
||||
console.log(`🔍 Checking if ${PACTICIPANT} v${version} can deploy to ${toEnvironment}`);
|
||||
|
||||
try {
|
||||
execSync(
|
||||
`npx pact-broker can-i-deploy \
|
||||
--pacticipant ${PACTICIPANT} \
|
||||
--version ${version} \
|
||||
--to-environment ${toEnvironment} \
|
||||
--broker-base-url ${PACT_BROKER_URL} \
|
||||
--broker-token ${PACT_BROKER_TOKEN} \
|
||||
--retry-while-unknown 6 \
|
||||
--retry-interval 10`,
|
||||
{ stdio: 'inherit' },
|
||||
);
|
||||
return true;
|
||||
} catch (error) {
|
||||
console.error(`❌ Cannot deploy to ${toEnvironment}`);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Main housekeeping workflow
|
||||
*/
|
||||
async function main() {
|
||||
const command = process.argv[2];
|
||||
const version = process.argv[3];
|
||||
const environment = process.argv[4] as 'staging' | 'production';
|
||||
|
||||
switch (command) {
|
||||
case 'tag-release':
|
||||
tagRelease(version, environment);
|
||||
break;
|
||||
|
||||
case 'record-deployment':
|
||||
recordDeployment(version, environment);
|
||||
break;
|
||||
|
||||
case 'can-i-deploy':
|
||||
const canDeploy = canIDeploy(version, environment);
|
||||
process.exit(canDeploy ? 0 : 1);
|
||||
|
||||
case 'cleanup':
|
||||
cleanupOldPacts();
|
||||
break;
|
||||
|
||||
default:
|
||||
console.error('Unknown command. Use: tag-release | record-deployment | can-i-deploy | cleanup');
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
main();
|
||||
```
|
||||
|
||||
**package.json scripts**:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"pact:tag": "ts-node scripts/pact-broker-housekeeping.ts tag-release",
|
||||
"pact:record": "ts-node scripts/pact-broker-housekeeping.ts record-deployment",
|
||||
"pact:can-deploy": "ts-node scripts/pact-broker-housekeeping.ts can-i-deploy",
|
||||
"pact:cleanup": "ts-node scripts/pact-broker-housekeeping.ts cleanup"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Deployment workflow integration**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/deploy-production.yml
|
||||
name: Deploy to Production
|
||||
on:
|
||||
push:
|
||||
tags:
|
||||
- 'v*'
|
||||
|
||||
jobs:
|
||||
verify-contracts:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Check pact compatibility
|
||||
run: npm run pact:can-deploy ${{ github.ref_name }} production
|
||||
env:
|
||||
PACT_BROKER_URL: ${{ secrets.PACT_BROKER_URL }}
|
||||
PACT_BROKER_TOKEN: ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
|
||||
deploy:
|
||||
needs: verify-contracts
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Deploy to production
|
||||
run: ./scripts/deploy.sh production
|
||||
|
||||
- name: Record deployment in Pact Broker
|
||||
run: npm run pact:record ${{ github.ref_name }} production
|
||||
env:
|
||||
PACT_BROKER_URL: ${{ secrets.PACT_BROKER_URL }}
|
||||
PACT_BROKER_TOKEN: ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
```
|
||||
|
||||
**Scheduled cleanup**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/pact-housekeeping.yml
|
||||
name: Pact Broker Housekeeping
|
||||
on:
|
||||
schedule:
|
||||
- cron: '0 2 * * 0' # Weekly on Sunday at 2 AM
|
||||
|
||||
jobs:
|
||||
cleanup:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Cleanup old pacts
|
||||
run: npm run pact:cleanup
|
||||
env:
|
||||
PACT_BROKER_URL: ${{ secrets.PACT_BROKER_URL }}
|
||||
PACT_BROKER_TOKEN: ${{ secrets.PACT_BROKER_TOKEN }}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Automated tagging**: Releases tagged with environment
|
||||
- **Deployment tracking**: Broker knows which version is where
|
||||
- **Safety gate**: can-i-deploy blocks incompatible deployments
|
||||
- **Retention policy**: Keep recent, production, and branch-latest pacts
|
||||
- **Webhook triggers**: Provider verification runs on consumer changes
|
||||
|
||||
---
|
||||
|
||||
## Contract Testing Checklist
|
||||
|
||||
Before implementing contract testing, verify:
|
||||
|
||||
- [ ] **Pact Broker setup**: Hosted (Pactflow) or self-hosted broker configured
|
||||
- [ ] **Consumer tests**: Generate pacts in CI, publish to broker on merge
|
||||
- [ ] **Provider verification**: Runs on PR, verifies all consumer pacts
|
||||
- [ ] **State handlers**: Provider implements all given() states
|
||||
- [ ] **can-i-deploy**: Blocks deployment if contracts incompatible
|
||||
- [ ] **Webhooks configured**: Consumer changes trigger provider verification
|
||||
- [ ] **Retention policy**: Old pacts archived (keep 30 days, all production tags)
|
||||
- [ ] **Resilience tested**: Timeouts, retries, error codes in contracts
|
||||
|
||||
## Integration Points
|
||||
|
||||
- Used in workflows: `*automate` (integration test generation), `*ci` (contract CI setup)
|
||||
- Related fragments: `test-levels-framework.md`, `ci-burn-in.md`
|
||||
- Tools: Pact.js, Pact Broker (Pactflow or self-hosted), Pact CLI
|
||||
|
||||
_Source: Pact consumer/provider sample repos, Murat contract testing blog, Pact official documentation_
|
||||
|
||||
@@ -1,9 +1,500 @@
|
||||
# Data Factories and API-First Setup
|
||||
|
||||
- Prefer factory functions that accept overrides and return complete objects (`createUser(overrides)`)—never rely on static fixtures.
|
||||
- Seed state through APIs, tasks, or direct DB helpers before visiting the UI; UI-based setup is for validation only.
|
||||
- Ensure factories generate parallel-safe identifiers (UUIDs, timestamps) and perform cleanup after each test.
|
||||
- Centralize factory exports to avoid duplication; version them alongside schema changes to catch drift in reviews.
|
||||
- When working with shared environments, layer feature toggles or targeted cleanup so factories do not clobber concurrent runs.
|
||||
_Source: Murat Testing Philosophy, blog posts on functional helpers and API-first testing._

## Principle

Prefer factory functions that accept overrides and return complete objects (`createUser(overrides)`). Seed test state through APIs, tasks, or direct DB helpers before visiting the UI—never via slow UI interactions. UI is for validation only, not setup.
|
||||
|
||||
## Rationale
|
||||
|
||||
Static fixtures (JSON files, hardcoded objects) create brittle tests that:
|
||||
|
||||
- Fail when schemas evolve (missing new required fields)
|
||||
- Cause collisions in parallel execution (same user IDs)
|
||||
- Hide test intent (what matters for _this_ test?)
|
||||
|
||||
Dynamic factories with overrides provide:
|
||||
|
||||
- **Parallel safety**: UUIDs and timestamps prevent collisions
|
||||
- **Schema evolution**: Defaults adapt to schema changes automatically
|
||||
- **Explicit intent**: Overrides show what matters for each test
|
||||
- **Speed**: API setup is 10-50x faster than UI
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Factory Function with Overrides
|
||||
|
||||
**Context**: When creating test data, build factory functions with sensible defaults and explicit overrides. Use `faker` for dynamic values that prevent collisions.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// test-utils/factories/user-factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export type User = {
|
||||
id: string;
|
||||
email: string;
|
||||
name: string;
|
||||
role: 'user' | 'admin' | 'moderator';
|
||||
createdAt: Date;
|
||||
isActive: boolean;
|
||||
};
|
||||
|
||||
export const createUser = (overrides: Partial<User> = {}): User => ({
|
||||
id: faker.string.uuid(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
role: 'user',
|
||||
createdAt: new Date(),
|
||||
isActive: true,
|
||||
...overrides,
|
||||
});
|
||||
|
||||
// test-utils/factories/product-factory.ts
|
||||
export type Product = {
|
||||
id: string;
|
||||
name: string;
|
||||
price: number;
|
||||
stock: number;
|
||||
category: string;
|
||||
};
|
||||
|
||||
export const createProduct = (overrides: Partial<Product> = {}): Product => ({
|
||||
id: faker.string.uuid(),
|
||||
name: faker.commerce.productName(),
|
||||
price: parseFloat(faker.commerce.price()),
|
||||
stock: faker.number.int({ min: 0, max: 100 }),
|
||||
category: faker.commerce.department(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
// Usage in tests:
|
||||
test('admin can delete users', async ({ page, apiRequest }) => {
|
||||
// Default user
|
||||
const user = createUser();
|
||||
|
||||
// Admin user (explicit override shows intent)
|
||||
const admin = createUser({ role: 'admin' });
|
||||
|
||||
// Seed via API (fast!)
|
||||
await apiRequest({ method: 'POST', url: '/api/users', data: user });
|
||||
await apiRequest({ method: 'POST', url: '/api/users', data: admin });
|
||||
|
||||
// Now test UI behavior
|
||||
await page.goto('/admin/users');
|
||||
await page.click(`[data-testid="delete-user-${user.id}"]`);
|
||||
await expect(page.getByText(`User ${user.name} deleted`)).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- `Partial<User>` allows overriding any field without breaking type safety
|
||||
- Faker generates unique values—no collisions in parallel tests
|
||||
- Override shows test intent: `createUser({ role: 'admin' })` is explicit
|
||||
- Factory lives in `test-utils/factories/` for easy reuse
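
The `apiRequest` fixture used in the test above is not built into Playwright; a minimal sketch of how such a fixture could be layered on Playwright's `request` context (the fixture name and argument shape are assumptions):

```typescript
// playwright/support/fixtures.ts - hypothetical apiRequest fixture
import { test as base } from '@playwright/test';

type ApiRequestArgs = { method: 'GET' | 'POST' | 'PUT' | 'DELETE'; url: string; data?: unknown };

export const test = base.extend<{ apiRequest: (args: ApiRequestArgs) => Promise<unknown> }>({
  apiRequest: async ({ request }, use) => {
    await use(async ({ method, url, data }) => {
      const response = await request.fetch(url, { method, data });
      if (!response.ok()) {
        throw new Error(`API request failed: ${method} ${url} -> ${response.status()}`);
      }
      return response.status() === 204 ? null : response.json();
    });
  },
});

export { expect } from '@playwright/test';
```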
|
||||
|
||||
### Example 2: Nested Factory Pattern
|
||||
|
||||
**Context**: When testing relationships (orders with users and products), nest factories to create complete object graphs. Control relationship data explicitly.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// test-utils/factories/order-factory.ts
import { faker } from '@faker-js/faker';
import { createUser, type User } from './user-factory';
import { createProduct, type Product } from './product-factory';
|
||||
|
||||
type OrderItem = {
|
||||
product: Product;
|
||||
quantity: number;
|
||||
price: number;
|
||||
};
|
||||
|
||||
type Order = {
|
||||
id: string;
|
||||
user: User;
|
||||
items: OrderItem[];
|
||||
total: number;
|
||||
status: 'pending' | 'paid' | 'shipped' | 'delivered';
|
||||
createdAt: Date;
|
||||
};
|
||||
|
||||
export const createOrderItem = (overrides: Partial<OrderItem> = {}): OrderItem => {
|
||||
const product = overrides.product || createProduct();
|
||||
const quantity = overrides.quantity || faker.number.int({ min: 1, max: 5 });
|
||||
|
||||
return {
|
||||
product,
|
||||
quantity,
|
||||
price: product.price * quantity,
|
||||
...overrides,
|
||||
};
|
||||
};
|
||||
|
||||
export const createOrder = (overrides: Partial<Order> = {}): Order => {
|
||||
const items = overrides.items || [createOrderItem(), createOrderItem()];
|
||||
const total = items.reduce((sum, item) => sum + item.price, 0);
|
||||
|
||||
return {
|
||||
id: faker.string.uuid(),
|
||||
user: overrides.user || createUser(),
|
||||
items,
|
||||
total,
|
||||
status: 'pending',
|
||||
createdAt: new Date(),
|
||||
...overrides,
|
||||
};
|
||||
};
|
||||
|
||||
// Usage in tests:
|
||||
test('user can view order details', async ({ page, apiRequest }) => {
|
||||
const user = createUser({ email: 'test@example.com' });
|
||||
const product1 = createProduct({ name: 'Widget A', price: 10.0 });
|
||||
const product2 = createProduct({ name: 'Widget B', price: 15.0 });
|
||||
|
||||
// Explicit relationships
|
||||
const order = createOrder({
|
||||
user,
|
||||
items: [
|
||||
createOrderItem({ product: product1, quantity: 2 }), // $20
|
||||
createOrderItem({ product: product2, quantity: 1 }), // $15
|
||||
],
|
||||
});
|
||||
|
||||
// Seed via API
|
||||
await apiRequest({ method: 'POST', url: '/api/users', data: user });
|
||||
await apiRequest({ method: 'POST', url: '/api/products', data: product1 });
|
||||
await apiRequest({ method: 'POST', url: '/api/products', data: product2 });
|
||||
await apiRequest({ method: 'POST', url: '/api/orders', data: order });
|
||||
|
||||
// Test UI
|
||||
await page.goto(`/orders/${order.id}`);
|
||||
await expect(page.getByText('Widget A x 2')).toBeVisible();
|
||||
await expect(page.getByText('Widget B x 1')).toBeVisible();
|
||||
await expect(page.getByText('Total: $35.00')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Nested factories handle relationships (order → user, order → products)
|
||||
- Overrides cascade: provide custom user/products or use defaults
|
||||
- Calculated fields (total) derived automatically from nested data
|
||||
- Explicit relationships make test data clear and maintainable
|
||||
|
||||
### Example 3: Factory with API Seeding
|
||||
|
||||
**Context**: When tests need data setup, always use API calls or database tasks—never UI navigation. Wrap factory usage with seeding utilities for clean test setup.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright/support/helpers/seed-helpers.ts
|
||||
import { APIRequestContext } from '@playwright/test';
|
||||
import { User, createUser } from '../../test-utils/factories/user-factory';
|
||||
import { Product, createProduct } from '../../test-utils/factories/product-factory';
|
||||
|
||||
export async function seedUser(request: APIRequestContext, overrides: Partial<User> = {}): Promise<User> {
|
||||
const user = createUser(overrides);
|
||||
|
||||
const response = await request.post('/api/users', {
|
||||
data: user,
|
||||
});
|
||||
|
||||
if (!response.ok()) {
|
||||
throw new Error(`Failed to seed user: ${response.status()}`);
|
||||
}
|
||||
|
||||
return user;
|
||||
}
|
||||
|
||||
export async function seedProduct(request: APIRequestContext, overrides: Partial<Product> = {}): Promise<Product> {
|
||||
const product = createProduct(overrides);
|
||||
|
||||
const response = await request.post('/api/products', {
|
||||
data: product,
|
||||
});
|
||||
|
||||
if (!response.ok()) {
|
||||
throw new Error(`Failed to seed product: ${response.status()}`);
|
||||
}
|
||||
|
||||
return product;
|
||||
}
|
||||
|
||||
// Playwright globalSetup for shared data
|
||||
// playwright/support/global-setup.ts
|
||||
import { chromium, FullConfig } from '@playwright/test';
|
||||
import { seedUser } from './helpers/seed-helpers';
|
||||
|
||||
async function globalSetup(config: FullConfig) {
|
||||
const browser = await chromium.launch();
|
||||
const page = await browser.newPage();
|
||||
const context = page.context();
|
||||
|
||||
// Seed admin user for all tests
|
||||
const admin = await seedUser(context.request, {
|
||||
email: 'admin@example.com',
|
||||
role: 'admin',
|
||||
});
|
||||
|
||||
// Save auth state for reuse (assumes a prior UI or API login as the seeded admin; without one the saved state is empty)
|
||||
await context.storageState({ path: 'playwright/.auth/admin.json' });
|
||||
|
||||
await browser.close();
|
||||
}
|
||||
|
||||
export default globalSetup;
|
||||
|
||||
// Cypress equivalent with cy.task
|
||||
// cypress/support/tasks.ts
|
||||
export const seedDatabase = async (entity: string, data: unknown) => {
|
||||
// Direct database insert or API call ('db' here stands for an assumed database client/ORM instance, e.g. Prisma or Knex)
|
||||
if (entity === 'users') {
|
||||
await db.users.create(data);
|
||||
}
|
||||
return null;
|
||||
};
|
||||
|
||||
// Usage in Cypress tests:
|
||||
beforeEach(() => {
|
||||
const user = createUser({ email: 'test@example.com' });
|
||||
cy.task('db:seed', { entity: 'users', data: user });
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- API seeding is 10-50x faster than UI-based setup
|
||||
- `globalSetup` seeds shared data once (e.g., admin user)
|
||||
- Per-test seeding uses `seedUser()` helpers for isolation
|
||||
- Cypress `cy.task` allows direct database access for speed
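
The `cy.task('db:seed', ...)` call only works once the task is registered in the Cypress config; a minimal sketch wiring it to the `seedDatabase` helper (the payload shape mirrors the usage above and is an assumption):

```typescript
// cypress.config.ts - registering the db:seed task (sketch)
import { defineConfig } from 'cypress';
import { seedDatabase } from './cypress/support/tasks';

export default defineConfig({
  e2e: {
    setupNodeEvents(on) {
      on('task', {
        'db:seed': async ({ entity, data }: { entity: string; data: unknown }) => {
          await seedDatabase(entity, data);
          return null; // Cypress tasks must return a value (or null)
        },
      });
    },
  },
});
```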
|
||||
|
||||
### Example 4: Anti-Pattern - Hardcoded Test Data
|
||||
|
||||
**Problem**:
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: Hardcoded test data
|
||||
test('user can login', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', 'test@test.com'); // Hardcoded
|
||||
await page.fill('[data-testid="password"]', 'password123'); // Hardcoded
|
||||
await page.click('[data-testid="submit"]');
|
||||
|
||||
// What if this user already exists? Test fails in parallel runs.
|
||||
// What if schema adds required fields? Test breaks.
|
||||
});
|
||||
|
||||
// ❌ BAD: Static JSON fixtures
|
||||
// fixtures/users.json
|
||||
{
|
||||
"users": [
|
||||
{ "id": 1, "email": "user1@test.com", "name": "User 1" },
|
||||
{ "id": 2, "email": "user2@test.com", "name": "User 2" }
|
||||
]
|
||||
}
|
||||
|
||||
test('admin can delete user', async ({ page }) => {
|
||||
const users = require('../fixtures/users.json');
|
||||
// Brittle: IDs collide in parallel, schema drift breaks tests
|
||||
});
|
||||
```
|
||||
|
||||
**Why It Fails**:
|
||||
|
||||
- **Parallel collisions**: Hardcoded IDs (`id: 1`, `email: 'test@test.com'`) cause failures when tests run concurrently
|
||||
- **Schema drift**: Adding required fields (`phoneNumber`, `address`) breaks all tests using fixtures
|
||||
- **Hidden intent**: Does this test need `email: 'test@test.com'` specifically, or any email?
|
||||
- **Slow setup**: UI-based data creation is 10-50x slower than API
|
||||
|
||||
**Better Approach**: Use factories
|
||||
|
||||
```typescript
|
||||
// ✅ GOOD: Factory-based data
|
||||
test('user can login', async ({ page, apiRequest }) => {
|
||||
const user = createUser({ email: 'unique@example.com', password: 'secure123' });
|
||||
|
||||
// Seed via API (fast, parallel-safe)
|
||||
await apiRequest({ method: 'POST', url: '/api/users', data: user });
|
||||
|
||||
// Test UI
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', user.email);
|
||||
await page.fill('[data-testid="password"]', user.password);
|
||||
await page.click('[data-testid="submit"]');
|
||||
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
});
|
||||
|
||||
// ✅ GOOD: Factories adapt to schema changes automatically
|
||||
// When `phoneNumber` becomes required, update factory once:
|
||||
export const createUser = (overrides: Partial<User> = {}): User => ({
|
||||
id: faker.string.uuid(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
phoneNumber: faker.phone.number(), // NEW field, all tests get it automatically
|
||||
role: 'user',
|
||||
...overrides,
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Factories generate unique, parallel-safe data
|
||||
- Schema evolution handled in one place (factory), not every test
|
||||
- Test intent explicit via overrides
|
||||
- API seeding is fast and reliable
|
||||
|
||||
### Example 5: Factory Composition
|
||||
|
||||
**Context**: When building specialized factories, compose simpler factories instead of duplicating logic. Layer overrides for specific test scenarios.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// test-utils/factories/user-factory.ts (base)
|
||||
export const createUser = (overrides: Partial<User> = {}): User => ({
|
||||
id: faker.string.uuid(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
role: 'user',
|
||||
createdAt: new Date(),
|
||||
isActive: true,
|
||||
...overrides,
|
||||
});
|
||||
|
||||
// Compose specialized factories
|
||||
export const createAdminUser = (overrides: Partial<User> = {}): User => createUser({ role: 'admin', ...overrides });
|
||||
|
||||
export const createModeratorUser = (overrides: Partial<User> = {}): User => createUser({ role: 'moderator', ...overrides });
|
||||
|
||||
export const createInactiveUser = (overrides: Partial<User> = {}): User => createUser({ isActive: false, ...overrides });
|
||||
|
||||
// Account-level factories with feature flags
|
||||
type Account = {
|
||||
id: string;
|
||||
owner: User;
|
||||
plan: 'free' | 'pro' | 'enterprise';
|
||||
features: string[];
|
||||
maxUsers: number;
|
||||
};
|
||||
|
||||
export const createAccount = (overrides: Partial<Account> = {}): Account => ({
|
||||
id: faker.string.uuid(),
|
||||
owner: overrides.owner || createUser(),
|
||||
plan: 'free',
|
||||
features: [],
|
||||
maxUsers: 1,
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createProAccount = (overrides: Partial<Account> = {}): Account =>
|
||||
createAccount({
|
||||
plan: 'pro',
|
||||
features: ['advanced-analytics', 'priority-support'],
|
||||
maxUsers: 10,
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createEnterpriseAccount = (overrides: Partial<Account> = {}): Account =>
|
||||
createAccount({
|
||||
plan: 'enterprise',
|
||||
features: ['advanced-analytics', 'priority-support', 'sso', 'audit-logs'],
|
||||
maxUsers: 100,
|
||||
...overrides,
|
||||
});
|
||||
|
||||
// Usage in tests:
|
||||
test('pro accounts can access analytics', async ({ page, apiRequest }) => {
|
||||
const admin = createAdminUser({ email: 'admin@company.com' });
|
||||
const account = createProAccount({ owner: admin });
|
||||
|
||||
await apiRequest({ method: 'POST', url: '/api/users', data: admin });
|
||||
await apiRequest({ method: 'POST', url: '/api/accounts', data: account });
|
||||
|
||||
await page.goto('/analytics');
|
||||
await expect(page.getByText('Advanced Analytics')).toBeVisible();
|
||||
});
|
||||
|
||||
test('free accounts cannot access analytics', async ({ page, apiRequest }) => {
|
||||
const user = createUser({ email: 'user@company.com' });
|
||||
const account = createAccount({ owner: user }); // Defaults to free plan
|
||||
|
||||
await apiRequest({ method: 'POST', url: '/api/users', data: user });
|
||||
await apiRequest({ method: 'POST', url: '/api/accounts', data: account });
|
||||
|
||||
await page.goto('/analytics');
|
||||
await expect(page.getByText('Upgrade to Pro')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Compose specialized factories from base factories (`createAdminUser` → `createUser`)
|
||||
- Defaults cascade: `createProAccount` sets plan + features automatically
|
||||
- Still allow overrides: `createProAccount({ maxUsers: 50 })` works
|
||||
- Test intent clear: `createProAccount()` vs `createAccount({ plan: 'pro', features: [...] })`
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*atdd` (test generation), `*automate` (test expansion), `*framework` (factory setup)
|
||||
- **Related fragments**:
|
||||
- `fixture-architecture.md` - Pure functions and fixtures for factory integration
|
||||
- `network-first.md` - API-first setup patterns
|
||||
- `test-quality.md` - Parallel-safe, deterministic test design
|
||||
|
||||
## Cleanup Strategy
|
||||
|
||||
Ensure factories work with cleanup patterns:
|
||||
|
||||
```typescript
|
||||
// Track created IDs for cleanup
|
||||
const createdUsers: string[] = [];
|
||||
|
||||
afterEach(async ({ apiRequest }) => {
|
||||
// Clean up all users created during test
|
||||
for (const userId of createdUsers) {
|
||||
await apiRequest({ method: 'DELETE', url: `/api/users/${userId}` });
|
||||
}
|
||||
createdUsers.length = 0;
|
||||
});
|
||||
|
||||
test('user registration flow', async ({ page, apiRequest }) => {
|
||||
const user = createUser();
|
||||
createdUsers.push(user.id);
|
||||
|
||||
await apiRequest({ method: 'POST', url: '/api/users', data: user });
|
||||
// ... test logic
|
||||
});
|
||||
```
|
||||
|
||||
## Feature Flag Integration
|
||||
|
||||
When working with feature flags, layer them into factories:
|
||||
|
||||
```typescript
|
||||
export const createUserWithFlags = (
|
||||
overrides: Partial<User> = {},
|
||||
flags: Record<string, boolean> = {},
|
||||
): User & { flags: Record<string, boolean> } => ({
|
||||
...createUser(overrides),
|
||||
flags: {
|
||||
'new-dashboard': false,
|
||||
'beta-features': false,
|
||||
...flags,
|
||||
},
|
||||
});
|
||||
|
||||
// Usage:
|
||||
const user = createUserWithFlags(
|
||||
{ email: 'test@example.com' },
|
||||
{
|
||||
'new-dashboard': true,
|
||||
'beta-features': true,
|
||||
},
|
||||
);
|
||||
```
|
||||
|
||||
_Source: Murat Testing Philosophy (lines 94-120), API-first testing patterns, faker.js documentation._
|
||||
|
||||
@@ -1,9 +1,721 @@
|
||||
# Email-Based Authentication Testing
|
||||
|
||||
- Use services like Mailosaur or in-house SMTP capture; extract magic links via regex or HTML parsing helpers.
|
||||
- Preserve browser storage (local/session) when processing links—restore state before visiting the authenticated page.
|
||||
- Cache email payloads with `cypress-data-session` or equivalent so retries don’t exhaust inbox quotas.
|
||||
- Cover negative cases: expired links, reused links, and multiple requests in rapid succession.
|
||||
- Ensure the workflow logs the email ID and link for troubleshooting, but scrub PII before committing artifacts.
|
||||
_Source: Email authentication blog, Murat testing toolkit._

## Principle

Email-based authentication (magic links, one-time codes, passwordless login) requires specialized testing with email capture services like Mailosaur or Ethereal. Extract magic links via HTML parsing or use built-in link extraction, preserve browser storage (local/session/cookies) when processing links, cache email payloads to avoid exhausting inbox quotas, and cover negative cases (expired links, reused links, multiple rapid requests). Log email IDs and links for troubleshooting, but scrub PII before committing artifacts.
|
||||
|
||||
## Rationale
|
||||
|
||||
Email authentication introduces unique challenges: asynchronous email delivery, quota limits (AWS Cognito: 50/day), cost per email, and complex state management (session preservation across link clicks). Without proper patterns, tests become slow (wait for email each time), expensive (quota exhaustion), and brittle (timing issues, missing state). Using email capture services + session caching + state preservation patterns makes email auth tests fast, reliable, and cost-effective.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Magic Link Extraction with Mailosaur
|
||||
|
||||
**Context**: Passwordless login flow where user receives magic link via email, clicks it, and is authenticated.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/magic-link-auth.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
/**
|
||||
* Magic Link Authentication Flow
|
||||
* 1. User enters email
|
||||
* 2. Backend sends magic link
|
||||
* 3. Test retrieves email via Mailosaur
|
||||
* 4. Extract and visit magic link
|
||||
* 5. Verify user is authenticated
|
||||
*/
|
||||
|
||||
// Mailosaur configuration
|
||||
const MAILOSAUR_API_KEY = process.env.MAILOSAUR_API_KEY!;
|
||||
const MAILOSAUR_SERVER_ID = process.env.MAILOSAUR_SERVER_ID!;
|
||||
|
||||
/**
|
||||
* Extract href from HTML email body
|
||||
* jsdom provides DOM parsing for HTML email bodies in Node.js
|
||||
*/
|
||||
function extractMagicLink(htmlString: string): string | null {
|
||||
const { JSDOM } = require('jsdom');
|
||||
const dom = new JSDOM(htmlString);
|
||||
const link = dom.window.document.querySelector('#magic-link-button');
|
||||
return link ? (link as HTMLAnchorElement).href : null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Alternative: Use Mailosaur's built-in link extraction
|
||||
* Mailosaur automatically parses links - no regex needed!
|
||||
*/
|
||||
async function getMagicLinkFromEmail(email: string): Promise<string> {
|
||||
const MailosaurClient = require('mailosaur');
|
||||
const mailosaur = new MailosaurClient(MAILOSAUR_API_KEY);
|
||||
|
||||
// Wait for email (timeout: 30 seconds)
|
||||
const message = await mailosaur.messages.get(
|
||||
MAILOSAUR_SERVER_ID,
|
||||
{
|
||||
sentTo: email,
|
||||
},
|
||||
{
|
||||
timeout: 30000, // 30 seconds
|
||||
},
|
||||
);
|
||||
|
||||
// Mailosaur extracts links automatically - no parsing needed!
|
||||
const magicLink = message.html?.links?.[0]?.href;
|
||||
|
||||
if (!magicLink) {
|
||||
throw new Error(`Magic link not found in email to ${email}`);
|
||||
}
|
||||
|
||||
console.log(`📧 Email received. Magic link extracted: ${magicLink}`);
|
||||
return magicLink;
|
||||
}
|
||||
|
||||
test.describe('Magic Link Authentication', () => {
|
||||
test('should authenticate user via magic link', async ({ page, context }) => {
|
||||
// Arrange: Generate unique test email
|
||||
const randomId = Math.floor(Math.random() * 1000000);
|
||||
const testEmail = `user-${randomId}@${MAILOSAUR_SERVER_ID}.mailosaur.net`;
|
||||
|
||||
// Act: Request magic link
|
||||
await page.goto('/login');
|
||||
await page.getByTestId('email-input').fill(testEmail);
|
||||
await page.getByTestId('send-magic-link').click();
|
||||
|
||||
// Assert: Success message
|
||||
await expect(page.getByTestId('check-email-message')).toBeVisible();
|
||||
await expect(page.getByTestId('check-email-message')).toContainText('Check your email');
|
||||
|
||||
// Retrieve magic link from email
|
||||
const magicLink = await getMagicLinkFromEmail(testEmail);
|
||||
|
||||
// Visit magic link
|
||||
await page.goto(magicLink);
|
||||
|
||||
// Assert: User is authenticated
|
||||
await expect(page.getByTestId('user-menu')).toBeVisible();
|
||||
await expect(page.getByTestId('user-email')).toContainText(testEmail);
|
||||
|
||||
// Verify session storage preserved
|
||||
const localStorage = await page.evaluate(() => JSON.stringify(window.localStorage));
|
||||
expect(localStorage).toContain('authToken');
|
||||
});
|
||||
|
||||
test('should handle expired magic link', async ({ page }) => {
|
||||
// Use pre-expired link (older than 15 minutes)
|
||||
const expiredLink = 'http://localhost:3000/auth/verify?token=expired-token-123';
|
||||
|
||||
await page.goto(expiredLink);
|
||||
|
||||
// Assert: Error message displayed
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
await expect(page.getByTestId('error-message')).toContainText('link has expired');
|
||||
|
||||
// Assert: User NOT authenticated
|
||||
await expect(page.getByTestId('user-menu')).not.toBeVisible();
|
||||
});
|
||||
|
||||
test('should prevent reusing magic link', async ({ page }) => {
|
||||
const randomId = Math.floor(Math.random() * 1000000);
|
||||
const testEmail = `user-${randomId}@${MAILOSAUR_SERVER_ID}.mailosaur.net`;
|
||||
|
||||
// Request magic link
|
||||
await page.goto('/login');
|
||||
await page.getByTestId('email-input').fill(testEmail);
|
||||
await page.getByTestId('send-magic-link').click();
|
||||
|
||||
const magicLink = await getMagicLinkFromEmail(testEmail);
|
||||
|
||||
// Visit link first time (success)
|
||||
await page.goto(magicLink);
|
||||
await expect(page.getByTestId('user-menu')).toBeVisible();
|
||||
|
||||
// Sign out
|
||||
await page.getByTestId('sign-out').click();
|
||||
|
||||
// Try to reuse same link (should fail)
|
||||
await page.goto(magicLink);
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
await expect(page.getByTestId('error-message')).toContainText('link has already been used');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Cypress equivalent with Mailosaur plugin**:
|
||||
|
||||
```javascript
|
||||
// cypress/e2e/magic-link-auth.cy.ts
|
||||
describe('Magic Link Authentication', () => {
|
||||
it('should authenticate user via magic link', () => {
|
||||
const serverId = Cypress.env('MAILOSAUR_SERVERID');
|
||||
const randomId = Cypress._.random(1e6);
|
||||
const testEmail = `user-${randomId}@${serverId}.mailosaur.net`;
|
||||
|
||||
// Request magic link
|
||||
cy.visit('/login');
|
||||
cy.get('[data-cy="email-input"]').type(testEmail);
|
||||
cy.get('[data-cy="send-magic-link"]').click();
|
||||
cy.get('[data-cy="check-email-message"]').should('be.visible');
|
||||
|
||||
// Retrieve and visit magic link
|
||||
cy.mailosaurGetMessage(serverId, { sentTo: testEmail })
|
||||
.its('html.links.0.href') // Mailosaur extracts links automatically!
|
||||
.should('exist')
|
||||
.then((magicLink) => {
|
||||
cy.log(`Magic link: ${magicLink}`);
|
||||
cy.visit(magicLink);
|
||||
});
|
||||
|
||||
// Verify authenticated
|
||||
cy.get('[data-cy="user-menu"]').should('be.visible');
|
||||
cy.get('[data-cy="user-email"]').should('contain', testEmail);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Mailosaur auto-extraction**: `html.links[0].href` or `html.codes[0].value`
|
||||
- **Unique emails**: Random ID prevents collisions
|
||||
- **Negative testing**: Expired and reused links tested
|
||||
- **State verification**: localStorage/session checked
|
||||
- **Fast email retrieval**: 30 second timeout typical
|
||||
|
||||
---
|
||||
|
||||
### Example 2: State Preservation Pattern with cy.session / Playwright storageState
|
||||
|
||||
**Context**: Cache authenticated session to avoid requesting magic link on every test.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright/fixtures/email-auth-fixture.ts
|
||||
import { test as base, expect } from '@playwright/test';
|
||||
import { getMagicLinkFromEmail } from '../support/mailosaur-helpers';
|
||||
|
||||
type EmailAuthFixture = {
|
||||
authenticatedUser: { email: string; token: string };
|
||||
};
|
||||
|
||||
export const test = base.extend<EmailAuthFixture>({
|
||||
authenticatedUser: async ({ page, context }, use) => {
|
||||
const randomId = Math.floor(Math.random() * 1000000);
|
||||
const testEmail = `user-${randomId}@${process.env.MAILOSAUR_SERVER_ID}.mailosaur.net`;
|
||||
|
||||
// Check if we have cached auth state for this email
|
||||
const storageStatePath = `./test-results/auth-state-${testEmail}.json`;
|
||||
|
||||
try {
// Try to restore a session saved by a previous run.
// Note: context.storageState() WRITES the current state to disk; to restore a
// saved state we read the file and re-apply its cookies to this context.
const fs = require('fs');
if (!fs.existsSync(storageStatePath)) {
throw new Error('No cached session file');
}
const saved = JSON.parse(fs.readFileSync(storageStatePath, 'utf-8'));
await context.addCookies(saved.cookies || []);
await page.goto('/dashboard');

// Validate session is still valid
const isAuthenticated = await page.getByTestId('user-menu').isVisible({ timeout: 2000 });

if (isAuthenticated) {
console.log(`✅ Reusing cached session for ${testEmail}`);
await use({ email: testEmail, token: 'cached' });
return;
}
} catch (error) {
console.log(`📧 No cached session, requesting magic link for ${testEmail}`);
}
|
||||
|
||||
// Request new magic link
|
||||
await page.goto('/login');
|
||||
await page.getByTestId('email-input').fill(testEmail);
|
||||
await page.getByTestId('send-magic-link').click();
|
||||
|
||||
// Get magic link from email
|
||||
const magicLink = await getMagicLinkFromEmail(testEmail);
|
||||
|
||||
// Visit link and authenticate
|
||||
await page.goto(magicLink);
|
||||
await expect(page.getByTestId('user-menu')).toBeVisible();
|
||||
|
||||
// Extract auth token from localStorage
|
||||
const authToken = await page.evaluate(() => localStorage.getItem('authToken'));
|
||||
|
||||
// Save session state for reuse
|
||||
await context.storageState({ path: storageStatePath });
|
||||
|
||||
console.log(`💾 Cached session for ${testEmail}`);
|
||||
|
||||
await use({ email: testEmail, token: authToken || '' });
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Cypress equivalent with cy.session + data-session**:
|
||||
|
||||
```javascript
|
||||
// cypress/support/commands/email-auth.js
|
||||
import { dataSession } from 'cypress-data-session';
|
||||
|
||||
/**
|
||||
* Authenticate via magic link with session caching
|
||||
* - First run: Requests email, extracts link, authenticates
|
||||
* - Subsequent runs: Reuses cached session (no email)
|
||||
*/
|
||||
Cypress.Commands.add('authViaMagicLink', (email) => {
|
||||
return dataSession({
|
||||
name: `magic-link-${email}`,
|
||||
|
||||
// First-time setup: Request and process magic link
|
||||
setup: () => {
|
||||
cy.visit('/login');
|
||||
cy.get('[data-cy="email-input"]').type(email);
|
||||
cy.get('[data-cy="send-magic-link"]').click();
|
||||
|
||||
// Get magic link from Mailosaur
|
||||
cy.mailosaurGetMessage(Cypress.env('MAILOSAUR_SERVERID'), {
|
||||
sentTo: email,
|
||||
})
|
||||
.its('html.links.0.href')
|
||||
.should('exist')
|
||||
.then((magicLink) => {
|
||||
cy.visit(magicLink);
|
||||
});
|
||||
|
||||
// Wait for authentication
|
||||
cy.get('[data-cy="user-menu"]', { timeout: 10000 }).should('be.visible');
|
||||
|
||||
// Preserve authentication state
|
||||
return cy.getAllLocalStorage().then((storage) => {
|
||||
return { storage, email };
|
||||
});
|
||||
},
|
||||
|
||||
// Validate cached session is still valid
|
||||
validate: (cached) => {
|
||||
return cy.wrap(Boolean(cached?.storage));
|
||||
},
|
||||
|
||||
// Recreate session from cache (no email needed)
recreate: (cached) => {
// Restore the localStorage snapshot captured by cy.getAllLocalStorage() before the app loads
cy.visit('/dashboard', {
onBeforeLoad(win) {
Object.values(cached.storage).forEach((kv) => {
Object.entries(kv).forEach(([key, value]) => win.localStorage.setItem(key, String(value)));
});
},
});
cy.get('[data-cy="user-menu"]', { timeout: 5000 }).should('be.visible');
},
|
||||
|
||||
shareAcrossSpecs: true, // Share session across all tests
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Usage in tests**:
|
||||
|
||||
```javascript
|
||||
// cypress/e2e/dashboard.cy.ts
|
||||
describe('Dashboard', () => {
|
||||
const serverId = Cypress.env('MAILOSAUR_SERVERID');
|
||||
const testEmail = `test-user@${serverId}.mailosaur.net`;
|
||||
|
||||
beforeEach(() => {
|
||||
// First test: Requests magic link
|
||||
// Subsequent tests: Reuses cached session (no email!)
|
||||
cy.authViaMagicLink(testEmail);
|
||||
});
|
||||
|
||||
it('should display user dashboard', () => {
|
||||
cy.get('[data-cy="dashboard-content"]').should('be.visible');
|
||||
});
|
||||
|
||||
it('should show user profile', () => {
|
||||
cy.get('[data-cy="user-email"]').should('contain', testEmail);
|
||||
});
|
||||
|
||||
// Both tests share same session - only 1 email consumed!
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Session caching**: First test requests email, rest reuse session
|
||||
- **State preservation**: localStorage/cookies saved and restored
|
||||
- **Validation**: Check cached session is still valid
|
||||
- **Quota optimization**: Massive reduction in email consumption
|
||||
- **Fast tests**: Cached auth takes seconds vs. minutes
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Negative Flow Tests (Expired, Invalid, Reused Links)
|
||||
|
||||
**Context**: Comprehensive negative testing for email authentication edge cases.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/email-auth-negative.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
import { getMagicLinkFromEmail } from '../support/mailosaur-helpers';
|
||||
|
||||
const MAILOSAUR_SERVER_ID = process.env.MAILOSAUR_SERVER_ID!;
|
||||
|
||||
test.describe('Email Auth Negative Flows', () => {
|
||||
test('should reject expired magic link', async ({ page }) => {
|
||||
// Generate expired link (simulate 24 hours ago)
|
||||
const expiredToken = Buffer.from(
|
||||
JSON.stringify({
|
||||
email: 'test@example.com',
|
||||
exp: Date.now() - 24 * 60 * 60 * 1000, // 24 hours ago
|
||||
}),
|
||||
).toString('base64');
|
||||
|
||||
const expiredLink = `http://localhost:3000/auth/verify?token=${expiredToken}`;
|
||||
|
||||
// Visit expired link
|
||||
await page.goto(expiredLink);
|
||||
|
||||
// Assert: Error displayed
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
await expect(page.getByTestId('error-message')).toContainText(/link.*expired|expired.*link/i);
|
||||
|
||||
// Assert: Link to request new one
|
||||
await expect(page.getByTestId('request-new-link')).toBeVisible();
|
||||
|
||||
// Assert: User NOT authenticated
|
||||
await expect(page.getByTestId('user-menu')).not.toBeVisible();
|
||||
});
|
||||
|
||||
test('should reject invalid magic link token', async ({ page }) => {
|
||||
const invalidLink = 'http://localhost:3000/auth/verify?token=invalid-garbage';
|
||||
|
||||
await page.goto(invalidLink);
|
||||
|
||||
// Assert: Error displayed
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
await expect(page.getByTestId('error-message')).toContainText(/invalid.*link|link.*invalid/i);
|
||||
|
||||
// Assert: User not authenticated
|
||||
await expect(page.getByTestId('user-menu')).not.toBeVisible();
|
||||
});
|
||||
|
||||
test('should reject already-used magic link', async ({ page, context }) => {
|
||||
const randomId = Math.floor(Math.random() * 1000000);
|
||||
const testEmail = `user-${randomId}@${MAILOSAUR_SERVER_ID}.mailosaur.net`;
|
||||
|
||||
// Request magic link
|
||||
await page.goto('/login');
|
||||
await page.getByTestId('email-input').fill(testEmail);
|
||||
await page.getByTestId('send-magic-link').click();
|
||||
|
||||
const magicLink = await getMagicLinkFromEmail(testEmail);
|
||||
|
||||
// Visit link FIRST time (success)
|
||||
await page.goto(magicLink);
|
||||
await expect(page.getByTestId('user-menu')).toBeVisible();
|
||||
|
||||
// Sign out
|
||||
await page.getByTestId('user-menu').click();
|
||||
await page.getByTestId('sign-out').click();
|
||||
await expect(page.getByTestId('user-menu')).not.toBeVisible();
|
||||
|
||||
// Try to reuse SAME link (should fail)
|
||||
await page.goto(magicLink);
|
||||
|
||||
// Assert: Link already used error
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
await expect(page.getByTestId('error-message')).toContainText(/already.*used|link.*used/i);
|
||||
|
||||
// Assert: User not authenticated
|
||||
await expect(page.getByTestId('user-menu')).not.toBeVisible();
|
||||
});
|
||||
|
||||
test('should handle rapid successive link requests', async ({ page }) => {
|
||||
const randomId = Math.floor(Math.random() * 1000000);
|
||||
const testEmail = `user-${randomId}@${MAILOSAUR_SERVER_ID}.mailosaur.net`;
|
||||
|
||||
// Request magic link 3 times rapidly
|
||||
for (let i = 0; i < 3; i++) {
|
||||
await page.goto('/login');
|
||||
await page.getByTestId('email-input').fill(testEmail);
|
||||
await page.getByTestId('send-magic-link').click();
|
||||
await expect(page.getByTestId('check-email-message')).toBeVisible();
|
||||
}
|
||||
|
||||
// Only the LATEST link should work
|
||||
const MailosaurClient = require('mailosaur');
|
||||
const mailosaur = new MailosaurClient(process.env.MAILOSAUR_API_KEY);
|
||||
|
||||
const messages = await mailosaur.messages.search(MAILOSAUR_SERVER_ID, {
|
||||
sentTo: testEmail,
|
||||
});
|
||||
|
||||
// Should receive 3 emails
|
||||
expect(messages.items.length).toBeGreaterThanOrEqual(3);
|
||||
|
||||
// Get the LATEST magic link
|
||||
const latestMessage = messages.items[0]; // Most recent first
|
||||
const latestLink = latestMessage.html.links[0].href;
|
||||
|
||||
// Latest link works
|
||||
await page.goto(latestLink);
|
||||
await expect(page.getByTestId('user-menu')).toBeVisible();
|
||||
|
||||
// Older links should NOT work (if backend invalidates previous)
|
||||
await page.getByTestId('sign-out').click();
|
||||
const olderLink = messages.items[1].html.links[0].href;
|
||||
|
||||
await page.goto(olderLink);
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
});
|
||||
|
||||
test('should rate-limit excessive magic link requests', async ({ page }) => {
|
||||
const randomId = Math.floor(Math.random() * 1000000);
|
||||
const testEmail = `user-${randomId}@${MAILOSAUR_SERVER_ID}.mailosaur.net`;
|
||||
|
||||
// Request magic link 10 times rapidly (should hit rate limit)
|
||||
for (let i = 0; i < 10; i++) {
|
||||
await page.goto('/login');
|
||||
await page.getByTestId('email-input').fill(testEmail);
|
||||
await page.getByTestId('send-magic-link').click();
|
||||
|
||||
// After N requests, should show rate limit error
|
||||
const errorVisible = await page
|
||||
.getByTestId('rate-limit-error')
|
||||
.isVisible({ timeout: 1000 })
|
||||
.catch(() => false);
|
||||
|
||||
if (errorVisible) {
|
||||
console.log(`Rate limit hit after ${i + 1} requests`);
|
||||
await expect(page.getByTestId('rate-limit-error')).toContainText(/too many.*requests|rate.*limit/i);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
// If no rate limit after 10 requests, log warning
|
||||
console.warn('⚠️ No rate limit detected after 10 requests');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Expired links**: Test 24+ hour old tokens
|
||||
- **Invalid tokens**: Malformed or garbage tokens rejected
|
||||
- **Reuse prevention**: Same link can't be used twice
|
||||
- **Rapid requests**: Multiple requests handled gracefully
|
||||
- **Rate limiting**: Excessive requests blocked
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Caching Strategy with cypress-data-session / Playwright Projects
|
||||
|
||||
**Context**: Minimize email consumption by sharing authentication state across tests and specs.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```javascript
|
||||
// cypress/support/commands/register-and-sign-in.js
|
||||
import { dataSession } from 'cypress-data-session';
|
||||
|
||||
/**
|
||||
* Email Authentication Caching Strategy
|
||||
* - One email per test run (not per spec, not per test)
|
||||
* - First spec: Full registration flow (form → email → code → sign in)
|
||||
* - Subsequent specs: Only sign in (reuse user)
|
||||
* - Subsequent tests in same spec: Session already active (no sign in)
|
||||
*/
|
||||
|
||||
// Helper: Fill registration form
|
||||
function fillRegistrationForm({ fullName, userName, email, password }) {
const [firstName, ...rest] = fullName.split(' ');
const lastName = rest.join(' ');
|
||||
cy.intercept('POST', 'https://cognito-idp*').as('cognito');
|
||||
cy.contains('Register').click();
|
||||
cy.get('#reg-dialog-form').should('be.visible');
|
||||
cy.get('#first-name').type(firstName, { delay: 0 });
|
||||
cy.get('#last-name').type(lastName, { delay: 0 });
|
||||
cy.get('#email').type(email, { delay: 0 });
|
||||
cy.get('#username').type(userName, { delay: 0 });
|
||||
cy.get('#password').type(password, { delay: 0 });
|
||||
cy.contains('button', 'Create an account').click();
|
||||
cy.wait('@cognito').its('response.statusCode').should('equal', 200);
|
||||
}
|
||||
|
||||
// Helper: Confirm registration with email code
|
||||
function confirmRegistration(email) {
|
||||
return cy
|
||||
.mailosaurGetMessage(Cypress.env('MAILOSAUR_SERVERID'), { sentTo: email })
|
||||
.its('html.codes.0.value') // Mailosaur auto-extracts codes!
|
||||
.then((code) => {
|
||||
cy.intercept('POST', 'https://cognito-idp*').as('cognito');
|
||||
cy.get('#verification-code').type(code, { delay: 0 });
|
||||
cy.contains('button', 'Confirm registration').click();
|
||||
cy.wait('@cognito');
|
||||
cy.contains('You are now registered!').should('be.visible');
|
||||
cy.contains('button', /ok/i).click();
|
||||
return cy.wrap(code); // Return code for reference
|
||||
});
|
||||
}
|
||||
|
||||
// Helper: Full registration (form + email)
|
||||
function register({ fullName, userName, email, password }) {
|
||||
fillRegistrationForm({ fullName, userName, email, password });
|
||||
return confirmRegistration(email);
|
||||
}
|
||||
|
||||
// Helper: Sign in
|
||||
function signIn({ userName, password }) {
|
||||
cy.intercept('POST', 'https://cognito-idp*').as('cognito');
|
||||
cy.contains('Sign in').click();
|
||||
cy.get('#sign-in-username').type(userName, { delay: 0 });
|
||||
cy.get('#sign-in-password').type(password, { delay: 0 });
|
||||
cy.contains('button', 'Sign in').click();
|
||||
cy.wait('@cognito');
|
||||
cy.contains('Sign out').should('be.visible');
|
||||
}
|
||||
|
||||
/**
|
||||
* Register and sign in with email caching
|
||||
* ONE EMAIL PER MACHINE (cypress run or cypress open)
|
||||
*/
|
||||
Cypress.Commands.add('registerAndSignIn', ({ fullName, userName, email, password }) => {
|
||||
return dataSession({
|
||||
name: email, // Unique session per email
|
||||
|
||||
// First time: Full registration (form → email → code)
|
||||
init: () => register({ fullName, userName, email, password }),
|
||||
|
||||
// Subsequent specs: Just check email exists (code already used)
|
||||
setup: () => confirmRegistration(email),
|
||||
|
||||
// Always runs after init/setup: Sign in
|
||||
recreate: () => signIn({ userName, password }),
|
||||
|
||||
// Share across ALL specs (one email for entire test run)
|
||||
shareAcrossSpecs: true,
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Usage across multiple specs**:
|
||||
|
||||
```javascript
|
||||
// cypress/e2e/place-order.cy.ts
|
||||
describe('Place Order', () => {
|
||||
beforeEach(() => {
|
||||
cy.visit('/');
|
||||
cy.registerAndSignIn({
|
||||
fullName: Cypress.env('fullName'), // From cypress.config
|
||||
userName: Cypress.env('userName'),
|
||||
email: Cypress.env('email'), // SAME email across all specs
|
||||
password: Cypress.env('password'),
|
||||
});
|
||||
});
|
||||
|
||||
it('should place order', () => {
|
||||
/* ... */
|
||||
});
|
||||
it('should view order history', () => {
|
||||
/* ... */
|
||||
});
|
||||
});
|
||||
|
||||
// cypress/e2e/profile.cy.ts
|
||||
describe('User Profile', () => {
|
||||
beforeEach(() => {
|
||||
cy.visit('/');
|
||||
cy.registerAndSignIn({
|
||||
fullName: Cypress.env('fullName'),
|
||||
userName: Cypress.env('userName'),
|
||||
email: Cypress.env('email'), // SAME email - no new email sent!
|
||||
password: Cypress.env('password'),
|
||||
});
|
||||
});
|
||||
|
||||
it('should update profile', () => {
|
||||
/* ... */
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Playwright equivalent with storageState**:
|
||||
|
||||
```typescript
|
||||
// playwright.config.ts
|
||||
import { defineConfig } from '@playwright/test';
|
||||
|
||||
export default defineConfig({
|
||||
projects: [
|
||||
{
|
||||
name: 'setup',
|
||||
testMatch: /global-setup\.ts/,
|
||||
},
|
||||
{
|
||||
name: 'authenticated',
|
||||
testMatch: /.*\.spec\.ts/,
|
||||
dependencies: ['setup'],
|
||||
use: {
|
||||
storageState: '.auth/user-session.json', // Reuse auth state
|
||||
},
|
||||
},
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
```typescript
|
||||
// tests/global-setup.ts (runs once)
|
||||
import { test as setup, expect } from '@playwright/test';
|
||||
import { getMagicLinkFromEmail } from './support/mailosaur-helpers';
|
||||
|
||||
const authFile = '.auth/user-session.json';
|
||||
|
||||
setup('authenticate via magic link', async ({ page }) => {
|
||||
const testEmail = process.env.TEST_USER_EMAIL!;
|
||||
|
||||
// Request magic link
|
||||
await page.goto('/login');
|
||||
await page.getByTestId('email-input').fill(testEmail);
|
||||
await page.getByTestId('send-magic-link').click();
|
||||
|
||||
// Get and visit magic link
|
||||
const magicLink = await getMagicLinkFromEmail(testEmail);
|
||||
await page.goto(magicLink);
|
||||
|
||||
// Verify authenticated
|
||||
await expect(page.getByTestId('user-menu')).toBeVisible();
|
||||
|
||||
// Save authenticated state (ONE TIME for all tests)
|
||||
await page.context().storageState({ path: authFile });
|
||||
|
||||
console.log('✅ Authentication state saved to', authFile);
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **One email per run**: Global setup authenticates once
|
||||
- **State reuse**: All tests use cached storageState
|
||||
- **cypress-data-session**: Intelligently manages cache lifecycle
|
||||
- **shareAcrossSpecs**: Session shared across all spec files
|
||||
- **Massive savings**: 500 tests = 1 email (not 500!)
|
||||
|
||||
---
|
||||
|
||||
## Email Authentication Testing Checklist
|
||||
|
||||
Before implementing email auth tests, verify:
|
||||
|
||||
- [ ] **Email service**: Mailosaur/Ethereal/MailHog configured with API keys
|
||||
- [ ] **Link extraction**: Use built-in parsing (html.links[0].href) over regex
|
||||
- [ ] **State preservation**: localStorage/session/cookies saved and restored
|
||||
- [ ] **Session caching**: cypress-data-session or storageState prevents redundant emails
|
||||
- [ ] **Negative flows**: Expired, invalid, reused, rapid requests tested
|
||||
- [ ] **Quota awareness**: One email per run (not per test)
|
||||
- [ ] **PII scrubbing**: Email IDs logged for debug, but scrubbed from artifacts (see the scrubber sketch after this checklist)
|
||||
- [ ] **Timeout handling**: 30 second email retrieval timeout configured
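
A minimal sketch of the kind of artifact scrubber the PII item above refers to (the helper name and regex patterns are illustrative assumptions, not a fixed API):

```typescript
// Hypothetical scrubber for test artifacts: masks email addresses and magic-link tokens
export function scrubEmailArtifacts(text: string): string {
  return (
    text
      // Mask mailbox addresses captured from email payloads
      .replace(/[\w.+-]+@[\w.-]+\.[\w-]+/g, '<redacted-email>')
      // Strip token/code query parameters from captured magic links
      .replace(/([?&](token|code)=)[^&\s"']+/gi, '$1<redacted>')
  );
}

// Usage before writing logs or attachments:
// fs.writeFileSync('artifact.log', scrubEmailArtifacts(rawLog));
```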
|
||||
|
||||
## Integration Points
|
||||
|
||||
- Used in workflows: `*framework` (email auth setup), `*automate` (email auth test generation)
|
||||
- Related fragments: `fixture-architecture.md`, `test-quality.md`
|
||||
- Email services: Mailosaur (recommended), Ethereal (free), MailHog (self-hosted)
|
||||
- Plugins: cypress-mailosaur, cypress-data-session
|
||||
|
||||
_Source: Email authentication blog, Murat testing toolkit, Mailosaur documentation_
|
||||
|
||||
@@ -1,9 +1,725 @@
|
||||
# Error Handling and Resilience Checks
|
||||
|
||||
- Treat expected failures explicitly: intercept network errors and assert UI fallbacks (`error-message` visible, retries triggered).
|
||||
- In Cypress, use scoped `Cypress.on('uncaught:exception')` to ignore known errors; rethrow anything else so regressions fail.
|
||||
- In Playwright, hook `page.on('pageerror')` and only swallow the specific, documented error messages.
|
||||
- Test retry/backoff logic by forcing sequential failures (e.g., 500, timeout, success) and asserting telemetry gets recorded.
|
||||
- Log captured errors with context (request payload, user/session) but redact secrets to keep artifacts safe for sharing.
|
||||
_Source: Murat error-handling patterns, Pact resilience guidance._

## Principle

Treat expected failures explicitly: intercept network errors, assert UI fallbacks (error messages visible, retries triggered), and use scoped exception handling to ignore known errors while catching regressions. Test retry/backoff logic by forcing sequential failures (500 → timeout → success) and validate telemetry logging. Log captured errors with context (request payload, user/session) but redact secrets to keep artifacts safe for sharing.
|
||||
|
||||
## Rationale
|
||||
|
||||
Tests fail for two reasons: genuine bugs or poor error handling in the test itself. Without explicit error handling patterns, tests become noisy (uncaught exceptions cause false failures) or silent (swallowing all errors hides real bugs). Scoped exception handling (Cypress.on('uncaught:exception'), page.on('pageerror')) allows tests to ignore documented, expected errors while surfacing unexpected ones. Resilience testing (retry logic, graceful degradation) ensures applications handle failures gracefully in production.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Scoped Exception Handling (Expected Errors Only)
|
||||
|
||||
**Context**: Handle known errors (Network failures, expected 500s) without masking unexpected bugs.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/error-handling.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
/**
|
||||
* Scoped Error Handling Pattern
|
||||
* - Only ignore specific, documented errors
|
||||
* - Rethrow everything else to catch regressions
|
||||
* - Validate error UI and user experience
|
||||
*/
|
||||
|
||||
test.describe('API Error Handling', () => {
|
||||
test('should display error message when API returns 500', async ({ page }) => {
|
||||
// Scope error handling to THIS test only
|
||||
const consoleErrors: string[] = [];
|
||||
page.on('pageerror', (error) => {
|
||||
// Only swallow documented NetworkError
|
||||
if (error.message.includes('NetworkError: Failed to fetch')) {
|
||||
consoleErrors.push(error.message);
|
||||
return; // Swallow this specific error
|
||||
}
|
||||
// Rethrow all other errors (catch regressions!)
|
||||
throw error;
|
||||
});
|
||||
|
||||
// Arrange: Mock 500 error response
|
||||
await page.route('**/api/users', (route) =>
|
||||
route.fulfill({
|
||||
status: 500,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({
|
||||
error: 'Internal server error',
|
||||
code: 'INTERNAL_ERROR',
|
||||
}),
|
||||
}),
|
||||
);
|
||||
|
||||
// Act: Navigate to page that fetches users
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Assert: Error UI displayed
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
await expect(page.getByTestId('error-message')).toContainText(/error.*loading|failed.*load/i);
|
||||
|
||||
// Assert: Retry button visible
|
||||
await expect(page.getByTestId('retry-button')).toBeVisible();
|
||||
|
||||
// Assert: NetworkError was thrown and caught
|
||||
expect(consoleErrors).toContainEqual(expect.stringContaining('NetworkError'));
|
||||
});
|
||||
|
||||
test('should NOT swallow unexpected errors', async ({ page }) => {
|
||||
let unexpectedError: Error | null = null;
|
||||
|
||||
page.on('pageerror', (error) => {
// Capture the unexpected error so the assertions below can verify it surfaced
unexpectedError = error;
});
|
||||
|
||||
// Arrange: App has JavaScript error (bug)
|
||||
await page.addInitScript(() => {
|
||||
// Simulate bug in app code
|
||||
(window as any).buggyFunction = () => {
|
||||
throw new Error('UNEXPECTED BUG: undefined is not a function');
|
||||
};
|
||||
});
|
||||
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Trigger the buggy function asynchronously so it surfaces as an uncaught page error
// (an error thrown synchronously inside page.evaluate rejects the evaluate call instead)
await page.evaluate(() => setTimeout(() => (window as any).buggyFunction(), 0));
await page.waitForTimeout(100);

// Assert: The unexpected error was surfaced (not swallowed)
|
||||
expect(unexpectedError).not.toBeNull();
|
||||
expect(unexpectedError?.message).toContain('UNEXPECTED BUG');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Cypress equivalent**:
|
||||
|
||||
```javascript
|
||||
// cypress/e2e/error-handling.cy.ts
|
||||
describe('API Error Handling', () => {
|
||||
it('should display error message when API returns 500', () => {
|
||||
// Scoped to this test only
|
||||
cy.on('uncaught:exception', (err) => {
|
||||
// Only swallow documented NetworkError
|
||||
if (err.message.includes('NetworkError')) {
|
||||
return false; // Prevent test failure
|
||||
}
|
||||
// All other errors fail the test
|
||||
return true;
|
||||
});
|
||||
|
||||
// Arrange: Mock 500 error
|
||||
cy.intercept('GET', '**/api/users', {
|
||||
statusCode: 500,
|
||||
body: {
|
||||
error: 'Internal server error',
|
||||
code: 'INTERNAL_ERROR',
|
||||
},
|
||||
}).as('getUsers');
|
||||
|
||||
// Act
|
||||
cy.visit('/dashboard');
|
||||
cy.wait('@getUsers');
|
||||
|
||||
// Assert: Error UI
|
||||
cy.get('[data-cy="error-message"]').should('be.visible');
|
||||
cy.get('[data-cy="error-message"]').should('contain', 'error loading');
|
||||
cy.get('[data-cy="retry-button"]').should('be.visible');
|
||||
});
|
||||
|
||||
it('should NOT swallow unexpected errors', () => {
|
||||
// No exception handler - test should fail on unexpected errors
|
||||
|
||||
cy.visit('/dashboard');
|
||||
|
||||
// Trigger unexpected error
|
||||
cy.window().then((win) => {
|
||||
// This should fail the test
|
||||
win.eval('throw new Error("UNEXPECTED BUG")');
|
||||
});
|
||||
|
||||
// Test fails (as expected) - validates error detection works
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Scoped handling**: page.on() / cy.on() scoped to specific tests
|
||||
- **Explicit allow-list**: Only ignore documented errors
|
||||
- **Rethrow unexpected**: Catch regressions by failing on unknown errors
|
||||
- **Error UI validation**: Assert user sees error message
|
||||
- **Logging**: Capture errors for debugging, don't swallow silently
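
If this scoping pattern repeats across many specs, it can be lifted into a shared fixture. A minimal sketch, assuming an `allowErrors` option configured per test (the fixture and option names are illustrative, not an established API):

```typescript
// playwright/fixtures/error-guard-fixture.ts (illustrative sketch)
import { test as base, expect } from '@playwright/test';

type ErrorGuard = {
  allowErrors: string[]; // substrings of documented, expected errors
  pageErrors: string[]; // every page error seen during the test
};

export const test = base.extend<ErrorGuard>({
  allowErrors: [[], { option: true }],
  pageErrors: [
    async ({ page, allowErrors }, use) => {
      const seen: string[] = [];
      const unexpected: string[] = [];

      page.on('pageerror', (error) => {
        seen.push(error.message);
        if (!allowErrors.some((allowed) => error.message.includes(allowed))) {
          unexpected.push(error.message);
        }
      });

      await use(seen);

      // After the test body runs, fail if any undocumented error surfaced
      expect(unexpected, `Unexpected page errors: ${unexpected.join(', ')}`).toHaveLength(0);
    },
    { auto: true }, // guard every test without opting in explicitly
  ],
});

// Usage in a spec that expects a documented error:
// test.use({ allowErrors: ['NetworkError: Failed to fetch'] });
```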
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Retry Validation Pattern (Network Resilience)
|
||||
|
||||
**Context**: Test that retry/backoff logic works correctly for transient failures.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/retry-resilience.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
/**
|
||||
* Retry Validation Pattern
|
||||
* - Force sequential failures (500 → 500 → 200)
|
||||
* - Validate retry attempts and backoff timing
|
||||
* - Assert telemetry captures retry events
|
||||
*/
|
||||
|
||||
test.describe('Network Retry Logic', () => {
|
||||
test('should retry on 500 error and succeed', async ({ page }) => {
|
||||
let attemptCount = 0;
|
||||
const attemptTimestamps: number[] = [];
|
||||
|
||||
// Mock API: Fail twice, succeed on third attempt
|
||||
await page.route('**/api/products', (route) => {
|
||||
attemptCount++;
|
||||
attemptTimestamps.push(Date.now());
|
||||
|
||||
if (attemptCount <= 2) {
|
||||
// First 2 attempts: 500 error
|
||||
route.fulfill({
|
||||
status: 500,
|
||||
body: JSON.stringify({ error: 'Server error' }),
|
||||
});
|
||||
} else {
|
||||
// 3rd attempt: Success
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({ products: [{ id: 1, name: 'Product 1' }] }),
|
||||
});
|
||||
}
|
||||
});
|
||||
|
||||
// Act: Navigate (should retry automatically)
|
||||
await page.goto('/products');
|
||||
|
||||
// Assert: Data eventually loads after retries
|
||||
await expect(page.getByTestId('product-list')).toBeVisible();
|
||||
await expect(page.getByTestId('product-item')).toHaveCount(1);
|
||||
|
||||
// Assert: Exactly 3 attempts made
|
||||
expect(attemptCount).toBe(3);
|
||||
|
||||
// Assert: Exponential backoff timing (1s → 2s between attempts)
|
||||
if (attemptTimestamps.length === 3) {
|
||||
const delay1 = attemptTimestamps[1] - attemptTimestamps[0];
|
||||
const delay2 = attemptTimestamps[2] - attemptTimestamps[1];
|
||||
|
||||
expect(delay1).toBeGreaterThanOrEqual(900); // ~1 second
|
||||
expect(delay1).toBeLessThan(1200);
|
||||
expect(delay2).toBeGreaterThanOrEqual(1900); // ~2 seconds
|
||||
expect(delay2).toBeLessThan(2200);
|
||||
}
|
||||
|
||||
// Assert: Telemetry logged retry events
|
||||
const telemetryEvents = await page.evaluate(() => (window as any).__TELEMETRY_EVENTS__ || []);
|
||||
expect(telemetryEvents).toContainEqual(
|
||||
expect.objectContaining({
|
||||
event: 'api_retry',
|
||||
attempt: 1,
|
||||
endpoint: '/api/products',
|
||||
}),
|
||||
);
|
||||
expect(telemetryEvents).toContainEqual(
|
||||
expect.objectContaining({
|
||||
event: 'api_retry',
|
||||
attempt: 2,
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
test('should give up after max retries and show error', async ({ page }) => {
|
||||
let attemptCount = 0;
|
||||
|
||||
// Mock API: Always fail (test retry limit)
|
||||
await page.route('**/api/products', (route) => {
|
||||
attemptCount++;
|
||||
route.fulfill({
|
||||
status: 500,
|
||||
body: JSON.stringify({ error: 'Persistent server error' }),
|
||||
});
|
||||
});
|
||||
|
||||
// Act
|
||||
await page.goto('/products');
|
||||
|
||||
// Assert: Max retries reached (3 attempts typical)
|
||||
expect(attemptCount).toBe(3);
|
||||
|
||||
// Assert: Error UI displayed after exhausting retries
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
await expect(page.getByTestId('error-message')).toContainText(/unable.*load|failed.*after.*retries/i);
|
||||
|
||||
// Assert: Data not displayed
|
||||
await expect(page.getByTestId('product-list')).not.toBeVisible();
|
||||
});
|
||||
|
||||
test('should NOT retry on 404 (non-retryable error)', async ({ page }) => {
|
||||
let attemptCount = 0;
|
||||
|
||||
// Mock API: 404 error (should NOT retry)
|
||||
await page.route('**/api/products/999', (route) => {
|
||||
attemptCount++;
|
||||
route.fulfill({
|
||||
status: 404,
|
||||
body: JSON.stringify({ error: 'Product not found' }),
|
||||
});
|
||||
});
|
||||
|
||||
await page.goto('/products/999');
|
||||
|
||||
// Assert: Only 1 attempt (no retries on 404)
|
||||
expect(attemptCount).toBe(1);
|
||||
|
||||
// Assert: 404 error displayed immediately
|
||||
await expect(page.getByTestId('not-found-message')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Cypress with retry interception**:
|
||||
|
||||
```javascript
|
||||
// cypress/e2e/retry-resilience.cy.ts
|
||||
describe('Network Retry Logic', () => {
|
||||
it('should retry on 500 and succeed on 3rd attempt', () => {
|
||||
let attemptCount = 0;
|
||||
|
||||
cy.intercept('GET', '**/api/products', (req) => {
|
||||
attemptCount++;
|
||||
|
||||
if (attemptCount <= 2) {
|
||||
req.reply({ statusCode: 500, body: { error: 'Server error' } });
|
||||
} else {
|
||||
req.reply({ statusCode: 200, body: { products: [{ id: 1, name: 'Product 1' }] } });
|
||||
}
|
||||
}).as('getProducts');
|
||||
|
||||
cy.visit('/products');
|
||||
|
||||
// Wait for final successful request
|
||||
cy.wait('@getProducts').its('response.statusCode').should('eq', 200);
|
||||
|
||||
// Assert: Data loaded
|
||||
cy.get('[data-cy="product-list"]').should('be.visible');
|
||||
cy.get('[data-cy="product-item"]').should('have.length', 1);
|
||||
|
||||
// Validate retry count
|
||||
cy.wrap(attemptCount).should('eq', 3);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Sequential failures**: Test retry logic with 500 → 500 → 200
|
||||
- **Backoff timing**: Validate exponential backoff delays
|
||||
- **Retry limits**: Max attempts enforced (typically 3)
|
||||
- **Non-retryable errors**: 404s don't trigger retries
|
||||
- **Telemetry**: Log retry attempts for monitoring
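
For reference, a minimal sketch of the kind of client-side retry helper these tests exercise, assuming a hypothetical `fetchWithRetry` wrapper and the `__TELEMETRY_EVENTS__` hook used above (both illustrative, not part of any specific app):

```typescript
// Illustrative retry-with-exponential-backoff wrapper (sketch only; adapt to your HTTP client)
type RetryOptions = { retries?: number; baseDelayMs?: number };

export async function fetchWithRetry(url: string, init: RequestInit = {}, opts: RetryOptions = {}): Promise<Response> {
  const { retries = 3, baseDelayMs = 1000 } = opts;

  for (let attempt = 1; attempt <= retries; attempt++) {
    const response = await fetch(url, init);

    // Success, or a non-retryable client error (4xx): return immediately
    if (response.ok || (response.status >= 400 && response.status < 500)) {
      return response;
    }

    // Retryable server error (5xx): record telemetry, then back off 1s, 2s, 4s, ...
    (window as any).__TELEMETRY_EVENTS__?.push({ event: 'api_retry', attempt, endpoint: url });
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }

  throw new Error(`Request to ${url} failed after ${retries} attempts`);
}
```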
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Telemetry Logging with Context (Sentry Integration)
|
||||
|
||||
**Context**: Capture errors with full context for production debugging without exposing secrets.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/telemetry-logging.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
/**
|
||||
* Telemetry Logging Pattern
|
||||
* - Log errors with request context
|
||||
* - Redact sensitive data (tokens, passwords, PII)
|
||||
* - Integrate with monitoring (Sentry, Datadog)
|
||||
* - Validate error logging without exposing secrets
|
||||
*/
|
||||
|
||||
type ErrorLog = {
|
||||
level: 'error' | 'warn' | 'info';
|
||||
message: string;
|
||||
context?: {
|
||||
endpoint?: string;
|
||||
method?: string;
|
||||
statusCode?: number;
|
||||
userId?: string;
|
||||
sessionId?: string;
|
||||
};
|
||||
timestamp: string;
|
||||
};
|
||||
|
||||
test.describe('Error Telemetry', () => {
|
||||
test('should log API errors with context', async ({ page }) => {
|
||||
const errorLogs: ErrorLog[] = [];
|
||||
|
||||
// Capture console errors
|
||||
page.on('console', (msg) => {
|
||||
if (msg.type() === 'error') {
|
||||
try {
|
||||
const log = JSON.parse(msg.text());
|
||||
errorLogs.push(log);
|
||||
} catch {
|
||||
// Not a structured log, ignore
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
// Mock failing API
|
||||
await page.route('**/api/orders', (route) =>
|
||||
route.fulfill({
|
||||
status: 500,
|
||||
body: JSON.stringify({ error: 'Payment processor unavailable' }),
|
||||
}),
|
||||
);
|
||||
|
||||
// Act: Trigger error
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('place-order').click();
|
||||
|
||||
// Wait for error UI
|
||||
await expect(page.getByTestId('error-message')).toBeVisible();
|
||||
|
||||
// Assert: Error logged with context
|
||||
expect(errorLogs).toContainEqual(
|
||||
expect.objectContaining({
|
||||
level: 'error',
|
||||
message: expect.stringContaining('API request failed'),
|
||||
context: expect.objectContaining({
|
||||
endpoint: '/api/orders',
|
||||
method: 'POST',
|
||||
statusCode: 500,
|
||||
userId: expect.any(String),
|
||||
}),
|
||||
}),
|
||||
);
|
||||
|
||||
// Assert: Sensitive data NOT logged
|
||||
const logString = JSON.stringify(errorLogs);
|
||||
expect(logString).not.toContain('password');
|
||||
expect(logString).not.toContain('token');
|
||||
expect(logString).not.toContain('creditCard');
|
||||
});
|
||||
|
||||
test('should send errors to Sentry with breadcrumbs', async ({ page }) => {
|
||||
|
||||
|
||||
// Mock Sentry SDK
|
||||
await page.addInitScript(() => {
|
||||
(window as any).Sentry = {
|
||||
captureException: (error: Error, context?: any) => {
|
||||
(window as any).__SENTRY_EVENTS__ = (window as any).__SENTRY_EVENTS__ || [];
|
||||
(window as any).__SENTRY_EVENTS__.push({
|
||||
error: error.message,
|
||||
context,
|
||||
timestamp: Date.now(),
|
||||
});
|
||||
},
|
||||
addBreadcrumb: (breadcrumb: any) => {
|
||||
(window as any).__SENTRY_BREADCRUMBS__ = (window as any).__SENTRY_BREADCRUMBS__ || [];
|
||||
(window as any).__SENTRY_BREADCRUMBS__.push(breadcrumb);
|
||||
},
|
||||
};
|
||||
});
|
||||
|
||||
// Mock failing API
|
||||
await page.route('**/api/users', (route) => route.fulfill({ status: 403, body: { error: 'Forbidden' } }));
|
||||
|
||||
// Act
|
||||
await page.goto('/users');
|
||||
|
||||
// Assert: Sentry captured error
|
||||
const events = await page.evaluate(() => (window as any).__SENTRY_EVENTS__);
|
||||
expect(events).toHaveLength(1);
|
||||
expect(events[0]).toMatchObject({
|
||||
error: expect.stringContaining('403'),
|
||||
context: expect.objectContaining({
|
||||
endpoint: '/api/users',
|
||||
statusCode: 403,
|
||||
}),
|
||||
});
|
||||
|
||||
// Assert: Breadcrumbs include user actions
|
||||
const breadcrumbs = await page.evaluate(() => (window as any).__SENTRY_BREADCRUMBS__);
|
||||
expect(breadcrumbs).toContainEqual(
|
||||
expect.objectContaining({
|
||||
category: 'navigation',
|
||||
message: '/users',
|
||||
}),
|
||||
);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Cypress with Sentry**:
|
||||
|
||||
```javascript
|
||||
// cypress/e2e/telemetry-logging.cy.ts
|
||||
describe('Error Telemetry', () => {
|
||||
it('should log API errors with redacted sensitive data', () => {
|
||||
const errorLogs = [];
|
||||
|
||||
// Capture console errors
|
||||
cy.on('window:before:load', (win) => {
|
||||
cy.stub(win.console, 'error').callsFake((msg) => {
|
||||
errorLogs.push(msg);
|
||||
});
|
||||
});
|
||||
|
||||
// Mock failing API
|
||||
cy.intercept('POST', '**/api/orders', {
|
||||
statusCode: 500,
|
||||
body: { error: 'Payment failed' },
|
||||
});
|
||||
|
||||
// Act
|
||||
cy.visit('/checkout');
|
||||
cy.get('[data-cy="place-order"]').click();
|
||||
|
||||
// Assert: Error logged
|
||||
cy.wrap(errorLogs).should('have.length.greaterThan', 0);
|
||||
|
||||
// Assert: Context included (read the captured logs at run time, not enqueue time)
cy.wrap(errorLogs).its('0').should('include', '/api/orders');

// Assert: Secrets redacted
cy.wrap(errorLogs).then((logs) => {
const logString = JSON.stringify(logs);
expect(logString).not.to.contain('password');
expect(logString).not.to.contain('creditCard');
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Error logger utility with redaction**:
|
||||
|
||||
```typescript
|
||||
// src/utils/error-logger.ts
|
||||
type ErrorContext = {
|
||||
endpoint?: string;
|
||||
method?: string;
|
||||
statusCode?: number;
|
||||
userId?: string;
|
||||
sessionId?: string;
|
||||
requestPayload?: any;
|
||||
};
|
||||
|
||||
const SENSITIVE_KEYS = ['password', 'token', 'creditCard', 'ssn', 'apiKey'];
|
||||
|
||||
/**
|
||||
* Redact sensitive data from objects
|
||||
*/
|
||||
function redactSensitiveData(obj: any): any {
if (typeof obj !== 'object' || obj === null) return obj;
if (Array.isArray(obj)) return obj.map(redactSensitiveData);

const redacted = { ...obj };
|
||||
|
||||
for (const key of Object.keys(redacted)) {
|
||||
if (SENSITIVE_KEYS.some((sensitive) => key.toLowerCase().includes(sensitive))) {
|
||||
redacted[key] = '[REDACTED]';
|
||||
} else if (typeof redacted[key] === 'object') {
|
||||
redacted[key] = redactSensitiveData(redacted[key]);
|
||||
}
|
||||
}
|
||||
|
||||
return redacted;
|
||||
}
|
||||
|
||||
/**
|
||||
* Log error with context (Sentry integration)
|
||||
*/
|
||||
export function logError(error: Error, context?: ErrorContext) {
|
||||
const safeContext = context ? redactSensitiveData(context) : {};
|
||||
|
||||
const errorLog = {
|
||||
level: 'error' as const,
|
||||
message: error.message,
|
||||
stack: error.stack,
|
||||
context: safeContext,
|
||||
timestamp: new Date().toISOString(),
|
||||
};
|
||||
|
||||
// Console (development)
|
||||
console.error(JSON.stringify(errorLog));
|
||||
|
||||
// Sentry (production)
|
||||
if (typeof window !== 'undefined' && (window as any).Sentry) {
|
||||
(window as any).Sentry.captureException(error, {
|
||||
contexts: { custom: safeContext },
|
||||
});
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Context-rich logging**: Endpoint, method, status, user ID
|
||||
- **Secret redaction**: Passwords, tokens, PII removed before logging
|
||||
- **Sentry integration**: Production monitoring with breadcrumbs
|
||||
- **Structured logs**: JSON format for easy parsing
|
||||
- **Test validation**: Assert logs contain context but not secrets
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Graceful Degradation Tests (Fallback Behavior)
|
||||
|
||||
**Context**: Validate application continues functioning when services are unavailable.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/graceful-degradation.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
/**
|
||||
* Graceful Degradation Pattern
|
||||
* - Simulate service unavailability
|
||||
* - Validate fallback behavior
|
||||
* - Ensure user experience degrades gracefully
|
||||
* - Verify telemetry captures degradation events
|
||||
*/
|
||||
|
||||
test.describe('Service Unavailability', () => {
|
||||
test('should display cached data when API is down', async ({ page }) => {
|
||||
// Arrange: Seed localStorage with cached data
|
||||
await page.addInitScript(() => {
|
||||
localStorage.setItem(
|
||||
'products_cache',
|
||||
JSON.stringify({
|
||||
data: [
|
||||
{ id: 1, name: 'Cached Product 1' },
|
||||
{ id: 2, name: 'Cached Product 2' },
|
||||
],
|
||||
timestamp: Date.now(),
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
// Mock API unavailable
|
||||
await page.route(
|
||||
'**/api/products',
|
||||
(route) => route.abort('connectionrefused'), // Simulate server down
|
||||
);
|
||||
|
||||
// Act
|
||||
await page.goto('/products');
|
||||
|
||||
// Assert: Cached data displayed
|
||||
await expect(page.getByTestId('product-list')).toBeVisible();
|
||||
await expect(page.getByText('Cached Product 1')).toBeVisible();
|
||||
|
||||
// Assert: Stale data warning shown
|
||||
await expect(page.getByTestId('cache-warning')).toBeVisible();
|
||||
await expect(page.getByTestId('cache-warning')).toContainText(/showing.*cached|offline.*mode/i);
|
||||
|
||||
// Assert: Retry button available
|
||||
await expect(page.getByTestId('refresh-button')).toBeVisible();
|
||||
});
|
||||
|
||||
test('should show fallback UI when analytics service fails', async ({ page }) => {
|
||||
// Mock analytics service down (non-critical)
|
||||
await page.route('**/analytics/track', (route) => route.fulfill({ status: 503, body: 'Service unavailable' }));
|
||||
|
||||
// Act: Navigate normally
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Assert: Page loads successfully (analytics failure doesn't block)
|
||||
await expect(page.getByTestId('dashboard-content')).toBeVisible();
|
||||
|
||||
// Assert: Analytics error logged but not shown to user
|
||||
    const consoleErrors: string[] = [];
|
||||
page.on('console', (msg) => {
|
||||
if (msg.type() === 'error') consoleErrors.push(msg.text());
|
||||
});
|
||||
|
||||
// Trigger analytics event
|
||||
await page.getByTestId('track-action-button').click();
|
||||
|
||||
    // Analytics error logged (poll: console events arrive asynchronously)
    await expect.poll(() => consoleErrors.some((e) => e.includes('Analytics service unavailable'))).toBe(true);
|
||||
|
||||
// But user doesn't see error
|
||||
await expect(page.getByTestId('error-message')).not.toBeVisible();
|
||||
});
|
||||
|
||||
test('should fallback to local validation when API is slow', async ({ page }) => {
|
||||
// Mock slow API (> 5 seconds)
|
||||
await page.route('**/api/validate-email', async (route) => {
|
||||
await new Promise((resolve) => setTimeout(resolve, 6000)); // 6 second delay
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
body: JSON.stringify({ valid: true }),
|
||||
});
|
||||
});
|
||||
|
||||
// Act: Fill form
|
||||
await page.goto('/signup');
|
||||
await page.getByTestId('email-input').fill('test@example.com');
|
||||
await page.getByTestId('email-input').blur();
|
||||
|
||||
// Assert: Client-side validation triggers immediately (doesn't wait for API)
|
||||
await expect(page.getByTestId('email-valid-icon')).toBeVisible({ timeout: 1000 });
|
||||
|
||||
// Assert: Eventually API validates too (but doesn't block UX)
|
||||
await expect(page.getByTestId('email-validated-badge')).toBeVisible({ timeout: 7000 });
|
||||
});
|
||||
|
||||
test('should maintain functionality with third-party script failure', async ({ page }) => {
|
||||
// Block third-party scripts (Google Analytics, Intercom, etc.)
|
||||
await page.route('**/*.google-analytics.com/**', (route) => route.abort());
|
||||
await page.route('**/*.intercom.io/**', (route) => route.abort());
|
||||
|
||||
// Act
|
||||
await page.goto('/');
|
||||
|
||||
// Assert: App works without third-party scripts
|
||||
await expect(page.getByTestId('main-content')).toBeVisible();
|
||||
await expect(page.getByTestId('nav-menu')).toBeVisible();
|
||||
|
||||
// Assert: Core functionality intact
|
||||
await page.getByTestId('nav-products').click();
|
||||
await expect(page).toHaveURL(/.*\/products/);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Cached fallbacks**: Display stale data when API unavailable
|
||||
- **Non-critical degradation**: Analytics failures don't block app
|
||||
- **Client-side fallbacks**: Local validation when API slow
|
||||
- **Third-party resilience**: App works without external scripts
|
||||
- **User transparency**: Stale data warnings displayed
|
||||
|
||||
---
|
||||
|
||||
## Error Handling Testing Checklist

Before shipping error handling code, verify:

- [ ] **Scoped exception handling**: Only ignore documented errors (NetworkError, specific codes); see the sketch below
- [ ] **Rethrow unexpected**: Unknown errors fail tests (catch regressions)
- [ ] **Error UI tested**: User sees error messages for all error states
- [ ] **Retry logic validated**: Sequential failures test backoff and max attempts
- [ ] **Telemetry verified**: Errors logged with context (endpoint, status, user)
- [ ] **Secret redaction**: Logs don't contain passwords, tokens, PII
- [ ] **Graceful degradation**: Critical services down, app shows fallback UI
- [ ] **Non-critical failures**: Analytics/tracking failures don't block app
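
A minimal sketch of scoped exception handling (Cypress shown; the ignorable error fragments and file path are assumptions):

```typescript
// cypress/support/e2e.ts (hypothetical location)
Cypress.on('uncaught:exception', (err) => {
  // Documented, expected failure modes only
  const ignorable = ['NetworkError', 'ResizeObserver loop limit exceeded'];
  if (ignorable.some((fragment) => err.message.includes(fragment))) {
    return false; // returning false tells Cypress not to fail the test
  }
  // Returning nothing lets unexpected errors fail the test (catches regressions)
});
```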
## Integration Points

- Used in workflows: `*automate` (error handling test generation), `*test-review` (error pattern detection)
- Related fragments: `network-first.md`, `test-quality.md`, `contract-testing.md`
- Monitoring tools: Sentry, Datadog, LogRocket

_Source: Murat error-handling patterns, Pact resilience guidance, SEON production error handling_
|
||||
|
||||
@@ -1,9 +1,750 @@
|
||||
# Feature Flag Governance

- Centralize flag definitions in a frozen enum; expose helpers to set, clear, and target specific audiences.
- Test both enabled and disabled states in CI; clean up targeting after each spec to keep shared environments stable.
- For LaunchDarkly-style systems, script API helpers to seed variations instead of mutating via UI.
- Maintain a checklist for new flags: default state, owners, expiry date, telemetry, rollback plan.
- Document flag dependencies in story/PR templates so QA and release reviews know which toggles must flip before launch.

_Source: LaunchDarkly strategy blog, Murat test architecture notes._

## Principle

Feature flags enable controlled rollouts and A/B testing, but require disciplined testing governance. Centralize flag definitions in a frozen enum, test both enabled and disabled states, clean up targeting after each spec, and maintain a comprehensive flag lifecycle checklist. For LaunchDarkly-style systems, script API helpers to seed variations programmatically rather than manual UI mutations.

## Rationale

Poorly managed feature flags become technical debt: untested variations ship broken code, forgotten flags clutter the codebase, and shared environments become unstable from leftover targeting rules. Structured governance ensures flags are testable, traceable, temporary, and safe. Testing both states prevents surprises when flags flip in production.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Feature Flag Enum Pattern with Type Safety
|
||||
|
||||
**Context**: Centralized flag management with TypeScript type safety and runtime validation.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// src/utils/feature-flags.ts
|
||||
/**
|
||||
* Centralized feature flag definitions
|
||||
* - Object.freeze prevents runtime modifications
|
||||
* - TypeScript ensures compile-time type safety
|
||||
* - Single source of truth for all flag keys
|
||||
*/
|
||||
export const FLAGS = Object.freeze({
|
||||
// User-facing features
|
||||
NEW_CHECKOUT_FLOW: 'new-checkout-flow',
|
||||
DARK_MODE: 'dark-mode',
|
||||
ENHANCED_SEARCH: 'enhanced-search',
|
||||
|
||||
// Experiments
|
||||
PRICING_EXPERIMENT_A: 'pricing-experiment-a',
|
||||
HOMEPAGE_VARIANT_B: 'homepage-variant-b',
|
||||
|
||||
// Infrastructure
|
||||
USE_NEW_API_ENDPOINT: 'use-new-api-endpoint',
|
||||
ENABLE_ANALYTICS_V2: 'enable-analytics-v2',
|
||||
|
||||
// Killswitches (emergency disables)
|
||||
DISABLE_PAYMENT_PROCESSING: 'disable-payment-processing',
|
||||
DISABLE_EMAIL_NOTIFICATIONS: 'disable-email-notifications',
|
||||
} as const);
|
||||
|
||||
/**
|
||||
* Type-safe flag keys
|
||||
* Prevents typos and ensures autocomplete in IDEs
|
||||
*/
|
||||
export type FlagKey = (typeof FLAGS)[keyof typeof FLAGS];
|
||||
|
||||
/**
|
||||
* Flag metadata for governance
|
||||
*/
|
||||
type FlagMetadata = {
|
||||
key: FlagKey;
|
||||
name: string;
|
||||
owner: string;
|
||||
createdDate: string;
|
||||
expiryDate?: string;
|
||||
defaultState: boolean;
|
||||
requiresCleanup: boolean;
|
||||
dependencies?: FlagKey[];
|
||||
telemetryEvents?: string[];
|
||||
};
|
||||
|
||||
/**
|
||||
* Flag registry with governance metadata
|
||||
* Used for flag lifecycle tracking and cleanup alerts
|
||||
*/
|
||||
export const FLAG_REGISTRY: Record<FlagKey, FlagMetadata> = {
|
||||
[FLAGS.NEW_CHECKOUT_FLOW]: {
|
||||
key: FLAGS.NEW_CHECKOUT_FLOW,
|
||||
name: 'New Checkout Flow',
|
||||
owner: 'payments-team',
|
||||
createdDate: '2025-01-15',
|
||||
expiryDate: '2025-03-15',
|
||||
defaultState: false,
|
||||
requiresCleanup: true,
|
||||
dependencies: [FLAGS.USE_NEW_API_ENDPOINT],
|
||||
telemetryEvents: ['checkout_started', 'checkout_completed'],
|
||||
},
|
||||
[FLAGS.DARK_MODE]: {
|
||||
key: FLAGS.DARK_MODE,
|
||||
name: 'Dark Mode UI',
|
||||
owner: 'frontend-team',
|
||||
createdDate: '2025-01-10',
|
||||
defaultState: false,
|
||||
requiresCleanup: false, // Permanent feature toggle
|
||||
},
|
||||
// ... rest of registry
|
||||
};
|
||||
|
||||
/**
|
||||
* Validate flag exists in registry
|
||||
* Throws at runtime if flag is unregistered
|
||||
*/
|
||||
export function validateFlag(flag: string): asserts flag is FlagKey {
|
||||
if (!Object.values(FLAGS).includes(flag as FlagKey)) {
|
||||
throw new Error(`Unregistered feature flag: ${flag}`);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Check if flag is expired (needs removal)
|
||||
*/
|
||||
export function isFlagExpired(flag: FlagKey): boolean {
|
||||
const metadata = FLAG_REGISTRY[flag];
|
||||
if (!metadata.expiryDate) return false;
|
||||
|
||||
const expiry = new Date(metadata.expiryDate);
|
||||
return Date.now() > expiry.getTime();
|
||||
}
|
||||
|
||||
/**
|
||||
* Get all expired flags requiring cleanup
|
||||
*/
|
||||
export function getExpiredFlags(): FlagMetadata[] {
|
||||
return Object.values(FLAG_REGISTRY).filter((meta) => isFlagExpired(meta.key));
|
||||
}
|
||||
```
|
||||
|
||||
**Usage in application code**:
|
||||
|
||||
```typescript
|
||||
// components/Checkout.tsx
|
||||
import { FLAGS } from '@/utils/feature-flags';
|
||||
import { useFeatureFlag } from '@/hooks/useFeatureFlag';
|
||||
|
||||
export function Checkout() {
|
||||
const isNewFlow = useFeatureFlag(FLAGS.NEW_CHECKOUT_FLOW);
|
||||
|
||||
return isNewFlow ? <NewCheckoutFlow /> : <LegacyCheckoutFlow />;
|
||||
}
|
||||
```
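
The `useFeatureFlag` hook itself is not shown above. A minimal sketch, assuming a React app, the `/api/feature-flags/evaluate` endpoint exercised in Example 2, and the `__STUBBED_FLAGS__` escape hatch from the targeting helpers in Example 3:

```typescript
// src/hooks/useFeatureFlag.ts (hypothetical implementation)
import { useEffect, useState } from 'react';
import { validateFlag, type FlagKey } from '@/utils/feature-flags';

export function useFeatureFlag(flag: FlagKey): boolean {
  const [enabled, setEnabled] = useState(false);

  useEffect(() => {
    validateFlag(flag); // fail fast on unregistered flags

    // Local/test override takes precedence (see stubFeatureFlags in Example 3)
    const stubbed = (window as any).__STUBBED_FLAGS__;
    if (stubbed && flag in stubbed) {
      setEnabled(Boolean(stubbed[flag]));
      return;
    }

    // Otherwise ask the flag service; fall back to the safe default (off) on failure
    fetch(`/api/feature-flags/evaluate?flag=${flag}`)
      .then((res) => res.json())
      .then((body) => setEnabled(Boolean(body.enabled)))
      .catch((error) => {
        console.error('Feature flag evaluation failed', error);
        setEnabled(false);
      });
  }, [flag]);

  return enabled;
}
```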
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Type safety**: TypeScript catches typos at compile time
|
||||
- **Runtime validation**: validateFlag ensures only registered flags used
|
||||
- **Metadata tracking**: Owner, dates, dependencies documented
|
||||
- **Expiry alerts**: Automated detection of stale flags
|
||||
- **Single source of truth**: All flags defined in one place
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Feature Flag Testing Pattern (Both States)
|
||||
|
||||
**Context**: Comprehensive testing of feature flag variations with proper cleanup.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/checkout-feature-flag.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
import { FLAGS } from '@/utils/feature-flags';
|
||||
|
||||
/**
|
||||
* Feature Flag Testing Strategy:
|
||||
* 1. Test BOTH enabled and disabled states
|
||||
* 2. Clean up targeting after each test
|
||||
* 3. Use dedicated test users (not production data)
|
||||
* 4. Verify telemetry events fire correctly
|
||||
*/
|
||||
|
||||
test.describe('Checkout Flow - Feature Flag Variations', () => {
|
||||
let testUserId: string;
|
||||
|
||||
test.beforeEach(async () => {
|
||||
// Generate unique test user ID
|
||||
testUserId = `test-user-${Date.now()}`;
|
||||
});
|
||||
|
||||
test.afterEach(async ({ request }) => {
|
||||
// CRITICAL: Clean up flag targeting to prevent shared env pollution
|
||||
await request.post('/api/feature-flags/cleanup', {
|
||||
data: {
|
||||
flagKey: FLAGS.NEW_CHECKOUT_FLOW,
|
||||
userId: testUserId,
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
test('should use NEW checkout flow when flag is ENABLED', async ({ page, request }) => {
|
||||
// Arrange: Enable flag for test user
|
||||
await request.post('/api/feature-flags/target', {
|
||||
data: {
|
||||
flagKey: FLAGS.NEW_CHECKOUT_FLOW,
|
||||
userId: testUserId,
|
||||
variation: true, // ENABLED
|
||||
},
|
||||
});
|
||||
|
||||
// Act: Navigate as targeted user
|
||||
    await page.setExtraHTTPHeaders({ 'X-Test-User-ID': testUserId });
    await page.goto('/checkout');
|
||||
|
||||
// Assert: New flow UI elements visible
|
||||
await expect(page.getByTestId('checkout-v2-container')).toBeVisible();
|
||||
await expect(page.getByTestId('express-payment-options')).toBeVisible();
|
||||
await expect(page.getByTestId('saved-addresses-dropdown')).toBeVisible();
|
||||
|
||||
// Assert: Legacy flow NOT visible
|
||||
await expect(page.getByTestId('checkout-v1-container')).not.toBeVisible();
|
||||
|
||||
// Assert: Telemetry event fired
|
||||
const analyticsEvents = await page.evaluate(() => (window as any).__ANALYTICS_EVENTS__ || []);
|
||||
expect(analyticsEvents).toContainEqual(
|
||||
expect.objectContaining({
|
||||
event: 'checkout_started',
|
||||
properties: expect.objectContaining({
|
||||
variant: 'new_flow',
|
||||
}),
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
test('should use LEGACY checkout flow when flag is DISABLED', async ({ page, request }) => {
|
||||
// Arrange: Disable flag for test user (or don't target at all)
|
||||
await request.post('/api/feature-flags/target', {
|
||||
data: {
|
||||
flagKey: FLAGS.NEW_CHECKOUT_FLOW,
|
||||
userId: testUserId,
|
||||
variation: false, // DISABLED
|
||||
},
|
||||
});
|
||||
|
||||
// Act: Navigate as targeted user
|
||||
    await page.setExtraHTTPHeaders({ 'X-Test-User-ID': testUserId });
    await page.goto('/checkout');
|
||||
|
||||
// Assert: Legacy flow UI elements visible
|
||||
await expect(page.getByTestId('checkout-v1-container')).toBeVisible();
|
||||
await expect(page.getByTestId('legacy-payment-form')).toBeVisible();
|
||||
|
||||
// Assert: New flow NOT visible
|
||||
await expect(page.getByTestId('checkout-v2-container')).not.toBeVisible();
|
||||
await expect(page.getByTestId('express-payment-options')).not.toBeVisible();
|
||||
|
||||
// Assert: Telemetry event fired with correct variant
|
||||
const analyticsEvents = await page.evaluate(() => (window as any).__ANALYTICS_EVENTS__ || []);
|
||||
expect(analyticsEvents).toContainEqual(
|
||||
expect.objectContaining({
|
||||
event: 'checkout_started',
|
||||
properties: expect.objectContaining({
|
||||
variant: 'legacy_flow',
|
||||
}),
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
  test('should handle flag evaluation errors gracefully', async ({ page }) => {
    // Arrange: Simulate flag service unavailable
    await page.route('**/api/feature-flags/evaluate', (route) => route.fulfill({ status: 500, body: 'Service Unavailable' }));

    // Capture console errors BEFORE navigation so load-time failures are recorded
    const consoleErrors: string[] = [];
    page.on('console', (msg) => {
      if (msg.type() === 'error') consoleErrors.push(msg.text());
    });

    // Act: Navigate (should fallback to default state)
    await page.setExtraHTTPHeaders({ 'X-Test-User-ID': testUserId });
    await page.goto('/checkout');

    // Assert: Fallback to safe default (legacy flow)
    await expect(page.getByTestId('checkout-v1-container')).toBeVisible();

    // Assert: Error logged but no user-facing error
    await expect.poll(() => consoleErrors.some((e) => e.includes('Feature flag evaluation failed'))).toBe(true);
  });
|
||||
});
|
||||
```
|
||||
|
||||
**Cypress equivalent**:
|
||||
|
||||
```javascript
|
||||
// cypress/e2e/checkout-feature-flag.cy.ts
|
||||
import { FLAGS } from '@/utils/feature-flags';
|
||||
|
||||
describe('Checkout Flow - Feature Flag Variations', () => {
|
||||
let testUserId;
|
||||
|
||||
beforeEach(() => {
|
||||
testUserId = `test-user-${Date.now()}`;
|
||||
});
|
||||
|
||||
afterEach(() => {
|
||||
// Clean up targeting
|
||||
cy.task('removeFeatureFlagTarget', {
|
||||
flagKey: FLAGS.NEW_CHECKOUT_FLOW,
|
||||
userId: testUserId,
|
||||
});
|
||||
});
|
||||
|
||||
it('should use NEW checkout flow when flag is ENABLED', () => {
|
||||
// Arrange: Enable flag via Cypress task
|
||||
cy.task('setFeatureFlagVariation', {
|
||||
flagKey: FLAGS.NEW_CHECKOUT_FLOW,
|
||||
userId: testUserId,
|
||||
variation: true,
|
||||
});
|
||||
|
||||
// Act
|
||||
cy.visit('/checkout', {
|
||||
headers: { 'X-Test-User-ID': testUserId },
|
||||
});
|
||||
|
||||
// Assert
|
||||
cy.get('[data-testid="checkout-v2-container"]').should('be.visible');
|
||||
cy.get('[data-testid="checkout-v1-container"]').should('not.exist');
|
||||
});
|
||||
|
||||
it('should use LEGACY checkout flow when flag is DISABLED', () => {
|
||||
// Arrange: Disable flag
|
||||
cy.task('setFeatureFlagVariation', {
|
||||
flagKey: FLAGS.NEW_CHECKOUT_FLOW,
|
||||
userId: testUserId,
|
||||
variation: false,
|
||||
});
|
||||
|
||||
// Act
|
||||
cy.visit('/checkout', {
|
||||
headers: { 'X-Test-User-ID': testUserId },
|
||||
});
|
||||
|
||||
// Assert
|
||||
cy.get('[data-testid="checkout-v1-container"]').should('be.visible');
|
||||
cy.get('[data-testid="checkout-v2-container"]').should('not.exist');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Test both states**: Enabled AND disabled variations
|
||||
- **Automatic cleanup**: afterEach removes targeting (prevent pollution)
|
||||
- **Unique test users**: Avoid conflicts with real user data
|
||||
- **Telemetry validation**: Verify analytics events fire correctly
|
||||
- **Graceful degradation**: Test fallback behavior on errors
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Feature Flag Targeting Helper Pattern
|
||||
|
||||
**Context**: Reusable helpers for programmatic flag control via LaunchDarkly/Split.io API.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/support/feature-flag-helpers.ts
|
||||
import { request as playwrightRequest } from '@playwright/test';
|
||||
import { FLAGS, FlagKey } from '@/utils/feature-flags';
|
||||
|
||||
/**
|
||||
* LaunchDarkly API client configuration
|
||||
* Use test project SDK key (NOT production)
|
||||
*/
|
||||
const LD_SDK_KEY = process.env.LD_SDK_KEY_TEST;
|
||||
const LD_API_BASE = 'https://app.launchdarkly.com/api/v2';
|
||||
|
||||
type FlagVariation = boolean | string | number | object;
|
||||
|
||||
/**
|
||||
* Set flag variation for specific user
|
||||
* Uses LaunchDarkly API to create user target
|
||||
*/
|
||||
export async function setFlagForUser(flagKey: FlagKey, userId: string, variation: FlagVariation): Promise<void> {
|
||||
const response = await playwrightRequest.newContext().then((ctx) =>
|
||||
ctx.post(`${LD_API_BASE}/flags/${flagKey}/targeting`, {
|
||||
headers: {
|
||||
Authorization: LD_SDK_KEY!,
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
data: {
|
||||
targets: [
|
||||
{
|
||||
values: [userId],
|
||||
variation: variation ? 1 : 0, // 0 = off, 1 = on
|
||||
},
|
||||
],
|
||||
},
|
||||
}),
|
||||
);
|
||||
|
||||
if (!response.ok()) {
|
||||
throw new Error(`Failed to set flag ${flagKey} for user ${userId}: ${response.status()}`);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Remove user from flag targeting
|
||||
* CRITICAL for test cleanup
|
||||
*/
|
||||
export async function removeFlagTarget(flagKey: FlagKey, userId: string): Promise<void> {
|
||||
const response = await playwrightRequest.newContext().then((ctx) =>
|
||||
ctx.delete(`${LD_API_BASE}/flags/${flagKey}/targeting/users/${userId}`, {
|
||||
headers: {
|
||||
Authorization: LD_SDK_KEY!,
|
||||
},
|
||||
}),
|
||||
);
|
||||
|
||||
if (!response.ok() && response.status() !== 404) {
|
||||
// 404 is acceptable (user wasn't targeted)
|
||||
throw new Error(`Failed to remove flag ${flagKey} target for user ${userId}: ${response.status()}`);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Percentage rollout helper
|
||||
* Enable flag for N% of users
|
||||
*/
|
||||
export async function setFlagRolloutPercentage(flagKey: FlagKey, percentage: number): Promise<void> {
|
||||
if (percentage < 0 || percentage > 100) {
|
||||
throw new Error('Percentage must be between 0 and 100');
|
||||
}
|
||||
|
||||
const response = await playwrightRequest.newContext().then((ctx) =>
|
||||
ctx.patch(`${LD_API_BASE}/flags/${flagKey}`, {
|
||||
headers: {
|
||||
Authorization: LD_SDK_KEY!,
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
data: {
|
||||
rollout: {
|
||||
variations: [
|
||||
{ variation: 0, weight: 100 - percentage }, // off
|
||||
{ variation: 1, weight: percentage }, // on
|
||||
],
|
||||
},
|
||||
},
|
||||
}),
|
||||
);
|
||||
|
||||
if (!response.ok()) {
|
||||
throw new Error(`Failed to set rollout for flag ${flagKey}: ${response.status()}`);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Enable flag globally (100% rollout)
|
||||
*/
|
||||
export async function enableFlagGlobally(flagKey: FlagKey): Promise<void> {
|
||||
await setFlagRolloutPercentage(flagKey, 100);
|
||||
}
|
||||
|
||||
/**
|
||||
* Disable flag globally (0% rollout)
|
||||
*/
|
||||
export async function disableFlagGlobally(flagKey: FlagKey): Promise<void> {
|
||||
await setFlagRolloutPercentage(flagKey, 0);
|
||||
}
|
||||
|
||||
/**
|
||||
* Stub feature flags in local/test environments
|
||||
* Bypasses LaunchDarkly entirely
|
||||
*/
|
||||
export function stubFeatureFlags(flags: Record<FlagKey, FlagVariation>): void {
|
||||
// Set flags in localStorage or inject into window
|
||||
if (typeof window !== 'undefined') {
|
||||
(window as any).__STUBBED_FLAGS__ = flags;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Usage in Playwright fixture**:
|
||||
|
||||
```typescript
|
||||
// playwright/fixtures/feature-flag-fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import { setFlagForUser, removeFlagTarget } from '../support/feature-flag-helpers';
|
||||
import { FlagKey } from '@/utils/feature-flags';
|
||||
|
||||
type FeatureFlagFixture = {
|
||||
featureFlags: {
|
||||
enable: (flag: FlagKey, userId: string) => Promise<void>;
|
||||
disable: (flag: FlagKey, userId: string) => Promise<void>;
|
||||
cleanup: (flag: FlagKey, userId: string) => Promise<void>;
|
||||
};
|
||||
};
|
||||
|
||||
export const test = base.extend<FeatureFlagFixture>({
|
||||
featureFlags: async ({}, use) => {
|
||||
const cleanupQueue: Array<{ flag: FlagKey; userId: string }> = [];
|
||||
|
||||
await use({
|
||||
enable: async (flag, userId) => {
|
||||
await setFlagForUser(flag, userId, true);
|
||||
cleanupQueue.push({ flag, userId });
|
||||
},
|
||||
disable: async (flag, userId) => {
|
||||
await setFlagForUser(flag, userId, false);
|
||||
cleanupQueue.push({ flag, userId });
|
||||
},
|
||||
cleanup: async (flag, userId) => {
|
||||
await removeFlagTarget(flag, userId);
|
||||
},
|
||||
});
|
||||
|
||||
// Auto-cleanup after test
|
||||
for (const { flag, userId } of cleanupQueue) {
|
||||
await removeFlagTarget(flag, userId);
|
||||
}
|
||||
},
|
||||
});
|
||||
```
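
A usage sketch with this fixture; the import paths, flag, and test id are assumptions:

```typescript
// tests/e2e/new-checkout.spec.ts (usage sketch)
import { expect } from '@playwright/test';
import { test } from '../../playwright/fixtures/feature-flag-fixture';
import { FLAGS } from '@/utils/feature-flags';

test('new checkout renders for a targeted user', async ({ page, featureFlags }) => {
  const userId = `test-user-${Date.now()}`;
  await featureFlags.enable(FLAGS.NEW_CHECKOUT_FLOW, userId);

  await page.setExtraHTTPHeaders({ 'X-Test-User-ID': userId });
  await page.goto('/checkout');

  await expect(page.getByTestId('checkout-v2-container')).toBeVisible();
  // Targeting is removed automatically by the fixture teardown
});
```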
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **API-driven control**: No manual UI clicks required
|
||||
- **Auto-cleanup**: Fixture tracks and removes targeting
|
||||
- **Percentage rollouts**: Test gradual feature releases
|
||||
- **Stubbing option**: Local development without LaunchDarkly
|
||||
- **Type-safe**: FlagKey prevents typos
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Feature Flag Lifecycle Checklist & Cleanup Strategy
|
||||
|
||||
**Context**: Governance checklist and automated cleanup detection for stale flags.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// scripts/feature-flag-audit.ts
|
||||
/**
|
||||
* Feature Flag Lifecycle Audit Script
|
||||
* Run weekly to detect stale flags requiring cleanup
|
||||
*/
|
||||
|
||||
import { FLAG_REGISTRY, FLAGS, getExpiredFlags, FlagKey } from '../src/utils/feature-flags';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
|
||||
type AuditResult = {
|
||||
totalFlags: number;
|
||||
expiredFlags: FlagKey[];
|
||||
missingOwners: FlagKey[];
|
||||
missingDates: FlagKey[];
|
||||
permanentFlags: FlagKey[];
|
||||
flagsNearingExpiry: FlagKey[];
|
||||
};
|
||||
|
||||
/**
|
||||
* Audit all feature flags for governance compliance
|
||||
*/
|
||||
function auditFeatureFlags(): AuditResult {
|
||||
const allFlags = Object.keys(FLAG_REGISTRY) as FlagKey[];
|
||||
const expiredFlags = getExpiredFlags().map((meta) => meta.key);
|
||||
|
||||
// Flags expiring in next 30 days
|
||||
const thirtyDaysFromNow = Date.now() + 30 * 24 * 60 * 60 * 1000;
|
||||
const flagsNearingExpiry = allFlags.filter((flag) => {
|
||||
const meta = FLAG_REGISTRY[flag];
|
||||
if (!meta.expiryDate) return false;
|
||||
const expiry = new Date(meta.expiryDate).getTime();
|
||||
return expiry > Date.now() && expiry < thirtyDaysFromNow;
|
||||
});
|
||||
|
||||
// Missing metadata
|
||||
const missingOwners = allFlags.filter((flag) => !FLAG_REGISTRY[flag].owner);
|
||||
const missingDates = allFlags.filter((flag) => !FLAG_REGISTRY[flag].createdDate);
|
||||
|
||||
// Permanent flags (no expiry, requiresCleanup = false)
|
||||
const permanentFlags = allFlags.filter((flag) => {
|
||||
const meta = FLAG_REGISTRY[flag];
|
||||
return !meta.expiryDate && !meta.requiresCleanup;
|
||||
});
|
||||
|
||||
return {
|
||||
totalFlags: allFlags.length,
|
||||
expiredFlags,
|
||||
missingOwners,
|
||||
missingDates,
|
||||
permanentFlags,
|
||||
flagsNearingExpiry,
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate markdown report
|
||||
*/
|
||||
function generateReport(audit: AuditResult): string {
|
||||
let report = `# Feature Flag Audit Report\n\n`;
|
||||
report += `**Date**: ${new Date().toISOString()}\n`;
|
||||
report += `**Total Flags**: ${audit.totalFlags}\n\n`;
|
||||
|
||||
if (audit.expiredFlags.length > 0) {
|
||||
report += `## ⚠️ EXPIRED FLAGS - IMMEDIATE CLEANUP REQUIRED\n\n`;
|
||||
audit.expiredFlags.forEach((flag) => {
|
||||
const meta = FLAG_REGISTRY[flag];
|
||||
report += `- **${meta.name}** (\`${flag}\`)\n`;
|
||||
report += ` - Owner: ${meta.owner}\n`;
|
||||
report += ` - Expired: ${meta.expiryDate}\n`;
|
||||
report += ` - Action: Remove flag code, update tests, deploy\n\n`;
|
||||
});
|
||||
}
|
||||
|
||||
if (audit.flagsNearingExpiry.length > 0) {
|
||||
report += `## ⏰ FLAGS EXPIRING SOON (Next 30 Days)\n\n`;
|
||||
audit.flagsNearingExpiry.forEach((flag) => {
|
||||
const meta = FLAG_REGISTRY[flag];
|
||||
report += `- **${meta.name}** (\`${flag}\`)\n`;
|
||||
report += ` - Owner: ${meta.owner}\n`;
|
||||
report += ` - Expires: ${meta.expiryDate}\n`;
|
||||
report += ` - Action: Plan cleanup or extend expiry\n\n`;
|
||||
});
|
||||
}
|
||||
|
||||
if (audit.permanentFlags.length > 0) {
|
||||
report += `## 🔄 PERMANENT FLAGS (No Expiry)\n\n`;
|
||||
audit.permanentFlags.forEach((flag) => {
|
||||
const meta = FLAG_REGISTRY[flag];
|
||||
report += `- **${meta.name}** (\`${flag}\`) - Owner: ${meta.owner}\n`;
|
||||
});
|
||||
report += `\n`;
|
||||
}
|
||||
|
||||
if (audit.missingOwners.length > 0 || audit.missingDates.length > 0) {
|
||||
report += `## ❌ GOVERNANCE ISSUES\n\n`;
|
||||
if (audit.missingOwners.length > 0) {
|
||||
report += `**Missing Owners**: ${audit.missingOwners.join(', ')}\n`;
|
||||
}
|
||||
if (audit.missingDates.length > 0) {
|
||||
report += `**Missing Created Dates**: ${audit.missingDates.join(', ')}\n`;
|
||||
}
|
||||
report += `\n`;
|
||||
}
|
||||
|
||||
return report;
|
||||
}
|
||||
|
||||
/**
|
||||
* Feature Flag Lifecycle Checklist
|
||||
*/
|
||||
const FLAG_LIFECYCLE_CHECKLIST = `
|
||||
# Feature Flag Lifecycle Checklist
|
||||
|
||||
## Before Creating a New Flag
|
||||
|
||||
- [ ] **Name**: Follow naming convention (kebab-case, descriptive)
|
||||
- [ ] **Owner**: Assign team/individual responsible
|
||||
- [ ] **Default State**: Determine safe default (usually false)
|
||||
- [ ] **Expiry Date**: Set removal date (30-90 days typical)
|
||||
- [ ] **Dependencies**: Document related flags
|
||||
- [ ] **Telemetry**: Plan analytics events to track
|
||||
- [ ] **Rollback Plan**: Define how to disable quickly
|
||||
|
||||
## During Development
|
||||
|
||||
- [ ] **Code Paths**: Both enabled/disabled states implemented
|
||||
- [ ] **Tests**: Both variations tested in CI
|
||||
- [ ] **Documentation**: Flag purpose documented in code/PR
|
||||
- [ ] **Telemetry**: Analytics events instrumented
|
||||
- [ ] **Error Handling**: Graceful degradation on flag service failure
|
||||
|
||||
## Before Launch
|
||||
|
||||
- [ ] **QA**: Both states tested in staging
|
||||
- [ ] **Rollout Plan**: Gradual rollout percentage defined
|
||||
- [ ] **Monitoring**: Dashboards/alerts for flag-related metrics
|
||||
- [ ] **Stakeholder Communication**: Product/design aligned
|
||||
|
||||
## After Launch (Monitoring)
|
||||
|
||||
- [ ] **Metrics**: Success criteria tracked
|
||||
- [ ] **Error Rates**: No increase in errors
|
||||
- [ ] **Performance**: No degradation
|
||||
- [ ] **User Feedback**: Qualitative data collected
|
||||
|
||||
## Cleanup (Post-Launch)
|
||||
|
||||
- [ ] **Remove Flag Code**: Delete if/else branches
|
||||
- [ ] **Update Tests**: Remove flag-specific tests
|
||||
- [ ] **Remove Targeting**: Clear all user targets
|
||||
- [ ] **Delete Flag Config**: Remove from LaunchDarkly/registry
|
||||
- [ ] **Update Documentation**: Remove references
|
||||
- [ ] **Deploy**: Ship cleanup changes
|
||||
`;
|
||||
|
||||
// Run audit
|
||||
const audit = auditFeatureFlags();
|
||||
const report = generateReport(audit);
|
||||
|
||||
// Save report
|
||||
const outputPath = path.join(__dirname, '../feature-flag-audit-report.md');
|
||||
fs.writeFileSync(outputPath, report);
|
||||
fs.writeFileSync(path.join(__dirname, '../FEATURE-FLAG-CHECKLIST.md'), FLAG_LIFECYCLE_CHECKLIST);
|
||||
|
||||
console.log(`✅ Audit complete. Report saved to: ${outputPath}`);
|
||||
console.log(`Total flags: ${audit.totalFlags}`);
|
||||
console.log(`Expired flags: ${audit.expiredFlags.length}`);
|
||||
console.log(`Flags expiring soon: ${audit.flagsNearingExpiry.length}`);
|
||||
|
||||
// Exit with error if expired flags exist
|
||||
if (audit.expiredFlags.length > 0) {
|
||||
console.error(`\n❌ EXPIRED FLAGS DETECTED - CLEANUP REQUIRED`);
|
||||
process.exit(1);
|
||||
}
|
||||
```
|
||||
|
||||
**package.json scripts**:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"feature-flags:audit": "ts-node scripts/feature-flag-audit.ts",
|
||||
"feature-flags:audit:ci": "npm run feature-flags:audit || true"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Automated detection**: Weekly audit catches stale flags
|
||||
- **Lifecycle checklist**: Comprehensive governance guide
|
||||
- **Expiry tracking**: Flags auto-expire after defined date
|
||||
- **CI integration**: Audit runs in pipeline, warns on expiry
|
||||
- **Ownership clarity**: Every flag has assigned owner
|
||||
|
||||
---
|
||||
|
||||
## Feature Flag Testing Checklist

Before merging flag-related code, verify:

- [ ] **Both states tested**: Enabled AND disabled variations covered
- [ ] **Cleanup automated**: afterEach removes targeting (no manual cleanup)
- [ ] **Unique test data**: Test users don't collide with production
- [ ] **Telemetry validated**: Analytics events fire for both variations
- [ ] **Error handling**: Graceful fallback when flag service unavailable
- [ ] **Flag metadata**: Owner, dates, dependencies documented in registry
- [ ] **Rollback plan**: Clear steps to disable flag in production
- [ ] **Expiry date set**: Removal date defined (or marked permanent)

## Integration Points

- Used in workflows: `*automate` (test generation), `*framework` (flag setup)
- Related fragments: `test-quality.md`, `selective-testing.md`
- Flag services: LaunchDarkly, Split.io, Unleash, custom implementations

_Source: LaunchDarkly strategy blog, Murat test architecture notes, SEON feature flag governance_
|
||||
|
||||
@@ -1,9 +1,401 @@
|
||||
# Fixture Architecture Playbook

- Build helpers as pure functions first, then expose them via Playwright `extend` or Cypress commands so logic stays testable in isolation.
- Compose capabilities with `mergeTests` (Playwright) or layered Cypress commands instead of inheritance; each fixture should solve one concern (auth, api, logs, network).
- Keep HTTP helpers framework agnostic—accept all required params explicitly and return results so unit tests and runtime fixtures can share them.
- Export fixtures through package subpaths (`"./api-request"`, `"./api-request/fixtures"`) to make reuse trivial across suites and projects.
- Treat fixture files as infrastructure: document dependencies, enforce deterministic timeouts, and ban hidden retries that mask flakiness.

_Source: Murat Testing Philosophy, cy-vs-pw comparison, SEON production patterns._

## Principle

Build test helpers as pure functions first, then wrap them in framework-specific fixtures. Compose capabilities using `mergeTests` (Playwright) or layered commands (Cypress) instead of inheritance. Each fixture should solve one isolated concern (auth, API, logs, network).

## Rationale

Traditional Page Object Models create tight coupling through inheritance chains (`BasePage → LoginPage → AdminPage`). When base classes change, all descendants break. Pure functions with fixture wrappers provide:

- **Testability**: Pure functions run in unit tests without framework overhead
- **Composability**: Mix capabilities freely via `mergeTests`, no inheritance constraints
- **Reusability**: Export fixtures via package subpaths for cross-project sharing
- **Maintainability**: One concern per fixture = clear responsibility boundaries
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Pure Function → Fixture Pattern
|
||||
|
||||
**Context**: When building any test helper, always start with a pure function that accepts all dependencies explicitly. Then wrap it in a Playwright fixture or Cypress command.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright/support/helpers/api-request.ts
import type { APIRequestContext } from '@playwright/test';

// Step 1: Pure function (ALWAYS FIRST!)
type ApiRequestParams = {
|
||||
request: APIRequestContext;
|
||||
method: 'GET' | 'POST' | 'PUT' | 'DELETE';
|
||||
url: string;
|
||||
data?: unknown;
|
||||
headers?: Record<string, string>;
|
||||
};
|
||||
|
||||
export async function apiRequest({
|
||||
request,
|
||||
method,
|
||||
url,
|
||||
data,
|
||||
headers = {}
|
||||
}: ApiRequestParams) {
|
||||
const response = await request.fetch(url, {
|
||||
method,
|
||||
data,
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
...headers
|
||||
}
|
||||
});
|
||||
|
||||
if (!response.ok()) {
|
||||
throw new Error(`API request failed: ${response.status()} ${await response.text()}`);
|
||||
}
|
||||
|
||||
return response.json();
|
||||
}
|
||||
|
||||
// Step 2: Fixture wrapper
|
||||
// playwright/support/fixtures/api-request-fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import { apiRequest } from '../helpers/api-request';
|
||||
|
||||
export const test = base.extend<{ apiRequest: typeof apiRequest }>({
|
||||
apiRequest: async ({ request }, use) => {
|
||||
// Inject framework dependency, expose pure function
|
||||
await use((params) => apiRequest({ request, ...params }));
|
||||
}
|
||||
});
|
||||
|
||||
// Step 3: Package exports for reusability
|
||||
// package.json
|
||||
{
|
||||
"exports": {
|
||||
"./api-request": "./playwright/support/helpers/api-request.ts",
|
||||
"./api-request/fixtures": "./playwright/support/fixtures/api-request-fixture.ts"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Pure function is unit-testable without Playwright running
|
||||
- Framework dependency (`request`) injected at fixture boundary
|
||||
- Fixture exposes the pure function to test context
|
||||
- Package subpath exports enable `import { apiRequest } from 'my-fixtures/api-request'`
|
||||
|
||||
### Example 2: Composable Fixture System with mergeTests
|
||||
|
||||
**Context**: When building comprehensive test capabilities, compose multiple focused fixtures instead of creating monolithic helper classes. Each fixture provides one capability.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright/support/fixtures/merged-fixtures.ts
|
||||
import { test as base, mergeTests } from '@playwright/test';
|
||||
import { test as apiRequestFixture } from './api-request-fixture';
|
||||
import { test as networkFixture } from './network-fixture';
|
||||
import { test as authFixture } from './auth-fixture';
|
||||
import { test as logFixture } from './log-fixture';
|
||||
|
||||
// Compose all fixtures for comprehensive capabilities
|
||||
export const test = mergeTests(base, apiRequestFixture, networkFixture, authFixture, logFixture);
|
||||
|
||||
export { expect } from '@playwright/test';
|
||||
|
||||
// Example usage in tests:
|
||||
// import { test, expect } from './support/fixtures/merged-fixtures';
|
||||
//
|
||||
// test('user can create order', async ({ page, apiRequest, auth, network }) => {
|
||||
// await auth.loginAs('customer@example.com');
|
||||
// await network.interceptRoute('POST', '**/api/orders', { id: 123 });
|
||||
// await page.goto('/checkout');
|
||||
// await page.click('[data-testid="submit-order"]');
|
||||
// await expect(page.getByText('Order #123')).toBeVisible();
|
||||
// });
|
||||
```
|
||||
|
||||
**Individual Fixture Examples**:
|
||||
|
||||
```typescript
|
||||
// network-fixture.ts
import { test as base } from '@playwright/test';

export const test = base.extend({
|
||||
network: async ({ page }, use) => {
|
||||
const interceptedRoutes = new Map();
|
||||
|
||||
const interceptRoute = async (method: string, url: string, response: unknown) => {
|
||||
await page.route(url, (route) => {
|
||||
if (route.request().method() === method) {
|
||||
route.fulfill({ body: JSON.stringify(response) });
|
||||
}
|
||||
});
|
||||
interceptedRoutes.set(`${method}:${url}`, response);
|
||||
};
|
||||
|
||||
await use({ interceptRoute });
|
||||
|
||||
// Cleanup
|
||||
interceptedRoutes.clear();
|
||||
},
|
||||
});
|
||||
|
||||
// auth-fixture.ts
import { test as base } from '@playwright/test';

export const test = base.extend({
|
||||
auth: async ({ page, context }, use) => {
|
||||
const loginAs = async (email: string) => {
|
||||
// Use API to setup auth (fast!)
|
||||
const token = await getAuthToken(email);
|
||||
await context.addCookies([
|
||||
{
|
||||
name: 'auth_token',
|
||||
value: token,
|
||||
domain: 'localhost',
|
||||
path: '/',
|
||||
},
|
||||
]);
|
||||
};
|
||||
|
||||
await use({ loginAs });
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- `mergeTests` combines fixtures without inheritance
|
||||
- Each fixture has single responsibility (network, auth, logs)
|
||||
- Tests import merged fixture and access all capabilities
|
||||
- No coupling between fixtures—add/remove freely
|
||||
|
||||
### Example 3: Framework-Agnostic HTTP Helper
|
||||
|
||||
**Context**: When building HTTP helpers, keep them framework-agnostic. Accept all params explicitly so they work in unit tests, Playwright, Cypress, or any context.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// shared/helpers/http-helper.ts
|
||||
// Pure, framework-agnostic function
|
||||
type HttpHelperParams = {
|
||||
baseUrl: string;
|
||||
endpoint: string;
|
||||
method: 'GET' | 'POST' | 'PUT' | 'DELETE';
|
||||
body?: unknown;
|
||||
headers?: Record<string, string>;
|
||||
token?: string;
|
||||
};
|
||||
|
||||
export async function makeHttpRequest({ baseUrl, endpoint, method, body, headers = {}, token }: HttpHelperParams): Promise<unknown> {
|
||||
const url = `${baseUrl}${endpoint}`;
|
||||
const requestHeaders = {
|
||||
'Content-Type': 'application/json',
|
||||
...(token && { Authorization: `Bearer ${token}` }),
|
||||
...headers,
|
||||
};
|
||||
|
||||
const response = await fetch(url, {
|
||||
method,
|
||||
headers: requestHeaders,
|
||||
body: body ? JSON.stringify(body) : undefined,
|
||||
});
|
||||
|
||||
if (!response.ok) {
|
||||
const errorText = await response.text();
|
||||
throw new Error(`HTTP ${method} ${url} failed: ${response.status} ${errorText}`);
|
||||
}
|
||||
|
||||
return response.json();
|
||||
}
|
||||
|
||||
// Playwright fixture wrapper
|
||||
// playwright/support/fixtures/http-fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import { makeHttpRequest } from '../../shared/helpers/http-helper';
|
||||
|
||||
export const test = base.extend({
|
||||
httpHelper: async ({}, use) => {
|
||||
const baseUrl = process.env.API_BASE_URL || 'http://localhost:3000';
|
||||
|
||||
await use((params) => makeHttpRequest({ baseUrl, ...params }));
|
||||
},
|
||||
});
|
||||
|
||||
// Cypress command wrapper
|
||||
// cypress/support/commands.ts
|
||||
import { makeHttpRequest } from '../../shared/helpers/http-helper';
|
||||
|
||||
Cypress.Commands.add('apiRequest', (params) => {
|
||||
const baseUrl = Cypress.env('API_BASE_URL') || 'http://localhost:3000';
|
||||
return cy.wrap(makeHttpRequest({ baseUrl, ...params }));
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Pure function uses only standard `fetch`, no framework dependencies
|
||||
- Unit tests call `makeHttpRequest` directly with all params
|
||||
- Playwright and Cypress wrappers inject framework-specific config
|
||||
- Same logic runs everywhere—zero duplication
|
||||
|
||||
### Example 4: Fixture Cleanup Pattern
|
||||
|
||||
**Context**: When fixtures create resources (data, files, connections), ensure automatic cleanup in fixture teardown. Tests must not leak state.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright/support/fixtures/database-fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import { seedDatabase, deleteRecord } from '../helpers/db-helpers';
|
||||
|
||||
type DatabaseFixture = {
|
||||
seedUser: (userData: Partial<User>) => Promise<User>;
|
||||
seedOrder: (orderData: Partial<Order>) => Promise<Order>;
|
||||
};
|
||||
|
||||
export const test = base.extend<DatabaseFixture>({
|
||||
seedUser: async ({}, use) => {
|
||||
const createdUsers: string[] = [];
|
||||
|
||||
const seedUser = async (userData: Partial<User>) => {
|
||||
const user = await seedDatabase('users', userData);
|
||||
createdUsers.push(user.id);
|
||||
return user;
|
||||
};
|
||||
|
||||
await use(seedUser);
|
||||
|
||||
// Auto-cleanup: Delete all users created during test
|
||||
for (const userId of createdUsers) {
|
||||
await deleteRecord('users', userId);
|
||||
}
|
||||
createdUsers.length = 0;
|
||||
},
|
||||
|
||||
seedOrder: async ({}, use) => {
|
||||
const createdOrders: string[] = [];
|
||||
|
||||
const seedOrder = async (orderData: Partial<Order>) => {
|
||||
const order = await seedDatabase('orders', orderData);
|
||||
createdOrders.push(order.id);
|
||||
return order;
|
||||
};
|
||||
|
||||
await use(seedOrder);
|
||||
|
||||
// Auto-cleanup: Delete all orders
|
||||
for (const orderId of createdOrders) {
|
||||
await deleteRecord('orders', orderId);
|
||||
}
|
||||
createdOrders.length = 0;
|
||||
},
|
||||
});
|
||||
|
||||
// Example usage:
|
||||
// test('user can place order', async ({ seedUser, seedOrder, page }) => {
|
||||
// const user = await seedUser({ email: 'test@example.com' });
|
||||
// const order = await seedOrder({ userId: user.id, total: 100 });
|
||||
//
|
||||
// await page.goto(`/orders/${order.id}`);
|
||||
// await expect(page.getByText('Order Total: $100')).toBeVisible();
|
||||
//
|
||||
// // No manual cleanup needed—fixture handles it automatically
|
||||
// });
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Track all created resources in array during test execution
|
||||
- Teardown (after `use()`) deletes all tracked resources
|
||||
- Tests don't manually clean up—happens automatically
|
||||
- Prevents test pollution and flakiness from shared state
|
||||
|
||||
### Anti-Pattern: Inheritance-Based Page Objects
|
||||
|
||||
**Problem**:
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: Page Object Model with inheritance
|
||||
class BasePage {
|
||||
constructor(public page: Page) {}
|
||||
|
||||
async navigate(url: string) {
|
||||
await this.page.goto(url);
|
||||
}
|
||||
|
||||
async clickButton(selector: string) {
|
||||
await this.page.click(selector);
|
||||
}
|
||||
}
|
||||
|
||||
class LoginPage extends BasePage {
|
||||
async login(email: string, password: string) {
|
||||
await this.navigate('/login');
|
||||
await this.page.fill('#email', email);
|
||||
await this.page.fill('#password', password);
|
||||
await this.clickButton('#submit');
|
||||
}
|
||||
}
|
||||
|
||||
class AdminPage extends LoginPage {
|
||||
async accessAdminPanel() {
|
||||
await this.login('admin@example.com', 'admin123');
|
||||
await this.navigate('/admin');
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Why It Fails**:
|
||||
|
||||
- Changes to `BasePage` break all descendants (`LoginPage`, `AdminPage`)
|
||||
- `AdminPage` inherits unnecessary `login` details—tight coupling
|
||||
- Cannot compose capabilities (e.g., admin + reporting features require multiple inheritance)
|
||||
- Hard to test `BasePage` methods in isolation
|
||||
- Hidden state in class instances leads to unpredictable behavior
|
||||
|
||||
**Better Approach**: Use pure functions + fixtures
|
||||
|
||||
```typescript
|
||||
// ✅ GOOD: Pure functions with fixture composition
|
||||
// helpers/navigation.ts
|
||||
export async function navigate(page: Page, url: string) {
|
||||
await page.goto(url);
|
||||
}
|
||||
|
||||
// helpers/auth.ts
|
||||
export async function login(page: Page, email: string, password: string) {
|
||||
await page.fill('[data-testid="email"]', email);
|
||||
await page.fill('[data-testid="password"]', password);
|
||||
await page.click('[data-testid="submit"]');
|
||||
}
|
||||
|
||||
// fixtures/admin-fixture.ts
|
||||
export const test = base.extend({
|
||||
adminPage: async ({ page }, use) => {
|
||||
await login(page, 'admin@example.com', 'admin123');
|
||||
await navigate(page, '/admin');
|
||||
await use(page);
|
||||
},
|
||||
});
|
||||
|
||||
// Tests import exactly what they need—no inheritance
|
||||
```
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*atdd` (test generation), `*automate` (test expansion), `*framework` (initial setup)
|
||||
- **Related fragments**:
|
||||
- `data-factories.md` - Factory functions for test data
|
||||
- `network-first.md` - Network interception patterns
|
||||
- `test-quality.md` - Deterministic test design principles
|
||||
|
||||
## Helper Function Reuse Guidelines

When deciding whether to create a fixture, follow these rules:

- **3+ uses** → Create fixture with subpath export (shared across tests/projects)
- **2-3 uses** → Create utility module (shared within project)
- **1 use** → Keep inline (avoid premature abstraction)
- **Complex logic** → Factory function pattern (dynamic data generation; see the sketch below)
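
A minimal sketch of the factory function pattern, assuming `@faker-js/faker` is available; the `User` shape and file path are illustrative:

```typescript
// shared/factories/user-factory.ts (hypothetical path)
import { faker } from '@faker-js/faker';

type User = { id: string; email: string; role: 'customer' | 'admin' };

export function createUser(overrides: Partial<User> = {}): User {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    role: 'customer',
    ...overrides, // tests override only the fields they care about
  };
}

// Usage: const admin = createUser({ role: 'admin' });
```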
_Source: Murat Testing Philosophy (lines 74-122), SEON production patterns, Playwright fixture docs._
|
||||
|
||||
@@ -1,9 +1,486 @@
|
||||
# Network-First Safeguards

- Register interceptions before any navigation or user action; store the promise and await it immediately after the triggering step.
- Assert on structured responses (status, body schema, headers) instead of generic waits so failures surface with actionable context.
- Capture HAR files or Playwright traces on successful runs—reuse them for deterministic CI playback when upstream services flake.
- Prefer edge mocking: stub at service boundaries, never deep within the stack unless risk analysis demands it.
- Replace implicit waits with deterministic signals like `waitForResponse`, disappearance of spinners, or event hooks.

_Source: Murat Testing Philosophy, Playwright patterns book, blog on network interception._

## Principle

Register network interceptions **before** any navigation or user action. Store the interception promise and await it immediately after the triggering step. Replace implicit waits with deterministic signals based on network responses, spinner disappearance, or event hooks.

## Rationale

The most common source of flaky E2E tests is **race conditions** between navigation and network interception:

- Navigate then intercept = missed requests (too late)
- No explicit wait = assertion runs before response arrives
- Hard waits (`waitForTimeout(3000)`) = slow, unreliable, brittle

Network-first patterns provide:

- **Zero race conditions**: Intercept is active before triggering action
- **Deterministic waits**: Wait for actual response, not arbitrary timeouts
- **Actionable failures**: Assert on response status/body, not generic "element not found"
- **Speed**: No padding with extra wait time
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Intercept Before Navigate Pattern
|
||||
|
||||
**Context**: The foundational pattern for all E2E tests. Always register route interception **before** the action that triggers the request (navigation, click, form submit).
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Intercept BEFORE navigate
|
||||
test('user can view dashboard data', async ({ page }) => {
|
||||
// Step 1: Register interception FIRST
|
||||
const usersPromise = page.waitForResponse((resp) => resp.url().includes('/api/users') && resp.status() === 200);
|
||||
|
||||
// Step 2: THEN trigger the request
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Step 3: THEN await the response
|
||||
const usersResponse = await usersPromise;
|
||||
const users = await usersResponse.json();
|
||||
|
||||
// Step 4: Assert on structured data
|
||||
expect(users).toHaveLength(10);
|
||||
await expect(page.getByText(users[0].name)).toBeVisible();
|
||||
});
|
||||
|
||||
// Cypress equivalent
|
||||
describe('Dashboard', () => {
|
||||
it('should display users', () => {
|
||||
// Step 1: Register interception FIRST
|
||||
cy.intercept('GET', '**/api/users').as('getUsers');
|
||||
|
||||
// Step 2: THEN trigger
|
||||
cy.visit('/dashboard');
|
||||
|
||||
// Step 3: THEN await
|
||||
cy.wait('@getUsers').then((interception) => {
|
||||
// Step 4: Assert on structured data
|
||||
expect(interception.response.statusCode).to.equal(200);
|
||||
expect(interception.response.body).to.have.length(10);
|
||||
cy.contains(interception.response.body[0].name).should('be.visible');
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
// ❌ WRONG: Navigate BEFORE intercept (race condition!)
|
||||
test('flaky test example', async ({ page }) => {
|
||||
await page.goto('/dashboard'); // Request fires immediately
|
||||
|
||||
const usersPromise = page.waitForResponse('/api/users'); // TOO LATE - might miss it
|
||||
const response = await usersPromise; // May timeout randomly
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Playwright: Use `page.waitForResponse()` with URL pattern or predicate **before** `page.goto()` or `page.click()`
|
||||
- Cypress: Use `cy.intercept().as()` **before** `cy.visit()` or `cy.click()`
|
||||
- Store promise/alias, trigger action, **then** await response
|
||||
- This prevents 95% of race-condition flakiness in E2E tests
|
||||
|
||||
### Example 2: HAR Capture for Debugging
|
||||
|
||||
**Context**: When debugging flaky tests or building deterministic mocks, capture real network traffic with HAR files. Replay them in tests for consistent, offline-capable test runs.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright.config.ts - Enable HAR recording
|
||||
export default defineConfig({
|
||||
use: {
|
||||
// Record HAR on first run
|
||||
recordHar: { path: './hars/', mode: 'minimal' },
|
||||
// Or replay HAR in tests
|
||||
// serviceWorkers: 'block',
|
||||
},
|
||||
});
|
||||
|
||||
// Capture HAR for specific test
|
||||
test('capture network for order flow', async ({ page, context }) => {
|
||||
// Start recording
|
||||
await context.routeFromHAR('./hars/order-flow.har', {
|
||||
url: '**/api/**',
|
||||
update: true, // Update HAR with new requests
|
||||
});
|
||||
|
||||
await page.goto('/checkout');
|
||||
await page.fill('[data-testid="credit-card"]', '4111111111111111');
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
await expect(page.getByText('Order Confirmed')).toBeVisible();
|
||||
|
||||
// HAR saved to ./hars/order-flow.har
|
||||
});
|
||||
|
||||
// Replay HAR for deterministic tests (no real API needed)
|
||||
test('replay order flow from HAR', async ({ page, context }) => {
|
||||
// Replay captured HAR
|
||||
await context.routeFromHAR('./hars/order-flow.har', {
|
||||
url: '**/api/**',
|
||||
update: false, // Read-only mode
|
||||
});
|
||||
|
||||
// Test runs with exact recorded responses - fully deterministic
|
||||
await page.goto('/checkout');
|
||||
await page.fill('[data-testid="credit-card"]', '4111111111111111');
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
await expect(page.getByText('Order Confirmed')).toBeVisible();
|
||||
});
|
||||
|
||||
// Custom mock based on HAR insights
|
||||
test('mock order response based on HAR', async ({ page }) => {
|
||||
// After analyzing HAR, create focused mock
|
||||
await page.route('**/api/orders', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({
|
||||
orderId: '12345',
|
||||
status: 'confirmed',
|
||||
total: 99.99,
|
||||
}),
|
||||
}),
|
||||
);
|
||||
|
||||
await page.goto('/checkout');
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
await expect(page.getByText('Order #12345')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- HAR files capture real request/response pairs for analysis
|
||||
- `update: true` records new traffic; `update: false` replays existing
|
||||
- Replay mode makes tests fully deterministic (no upstream API needed)
|
||||
- Use HAR to understand API contracts, then create focused mocks
|
||||
|
||||
### Example 3: Network Stub with Edge Cases
|
||||
|
||||
**Context**: When testing error handling, timeouts, and edge cases, stub network responses to simulate failures. Test both happy path and error scenarios.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// Test happy path
|
||||
test('order succeeds with valid data', async ({ page }) => {
|
||||
await page.route('**/api/orders', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({ orderId: '123', status: 'confirmed' }),
|
||||
}),
|
||||
);
|
||||
|
||||
await page.goto('/checkout');
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
await expect(page.getByText('Order Confirmed')).toBeVisible();
|
||||
});
|
||||
|
||||
// Test 500 error
|
||||
test('order fails with server error', async ({ page }) => {
|
||||
// Listen for console errors (app should log gracefully)
|
||||
const consoleErrors: string[] = [];
|
||||
page.on('console', (msg) => {
|
||||
if (msg.type() === 'error') consoleErrors.push(msg.text());
|
||||
});
|
||||
|
||||
// Stub 500 error
|
||||
await page.route('**/api/orders', (route) =>
|
||||
route.fulfill({
|
||||
status: 500,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({ error: 'Internal Server Error' }),
|
||||
}),
|
||||
);
|
||||
|
||||
await page.goto('/checkout');
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
|
||||
// Assert UI shows error gracefully
|
||||
await expect(page.getByText('Something went wrong')).toBeVisible();
|
||||
await expect(page.getByText('Please try again')).toBeVisible();
|
||||
|
||||
// Verify error logged (not thrown)
|
||||
expect(consoleErrors.some((e) => e.includes('Order failed'))).toBeTruthy();
|
||||
});
|
||||
|
||||
// Test network timeout
|
||||
test('order times out after 10 seconds', async ({ page }) => {
|
||||
// Stub delayed response (never resolves within timeout)
|
||||
await page.route(
|
||||
'**/api/orders',
|
||||
(route) => new Promise(() => {}), // Never resolves - simulates timeout
|
||||
);
|
||||
|
||||
await page.goto('/checkout');
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
|
||||
// App should show timeout message after configured timeout
|
||||
await expect(page.getByText('Request timed out')).toBeVisible({ timeout: 15000 });
|
||||
});
|
||||
|
||||
// Test partial data response
|
||||
test('order handles missing optional fields', async ({ page }) => {
|
||||
await page.route('**/api/orders', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
// Missing optional fields like 'trackingNumber', 'estimatedDelivery'
|
||||
body: JSON.stringify({ orderId: '123', status: 'confirmed' }),
|
||||
}),
|
||||
);
|
||||
|
||||
await page.goto('/checkout');
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
|
||||
// App should handle gracefully - no crash, shows what's available
|
||||
await expect(page.getByText('Order Confirmed')).toBeVisible();
|
||||
await expect(page.getByText('Tracking information pending')).toBeVisible();
|
||||
});
|
||||
|
||||
// Cypress equivalents
|
||||
describe('Order Edge Cases', () => {
|
||||
it('should handle 500 error', () => {
|
||||
cy.intercept('POST', '**/api/orders', {
|
||||
statusCode: 500,
|
||||
body: { error: 'Internal Server Error' },
|
||||
}).as('orderFailed');
|
||||
|
||||
cy.visit('/checkout');
|
||||
cy.get('[data-testid="submit-order"]').click();
|
||||
cy.wait('@orderFailed');
|
||||
cy.contains('Something went wrong').should('be.visible');
|
||||
});
|
||||
|
||||
it('should handle timeout', () => {
|
||||
cy.intercept('POST', '**/api/orders', (req) => {
|
||||
req.reply({ delay: 20000 }); // Delay beyond app timeout
|
||||
}).as('orderTimeout');
|
||||
|
||||
cy.visit('/checkout');
|
||||
cy.get('[data-testid="submit-order"]').click();
|
||||
cy.contains('Request timed out', { timeout: 15000 }).should('be.visible');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Stub different HTTP status codes (200, 400, 500, 503); a 400 validation sketch follows below
|
||||
- Simulate timeouts with `delay` or non-resolving promises
|
||||
- Test partial/incomplete data responses
|
||||
- Verify app handles errors gracefully (no crashes, user-friendly messages)
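
The examples above exercise 500s, timeouts, and partial data; a 400 validation error follows the same stub pattern. A minimal sketch, assuming the checkout form renders the field-level message returned by the API (the `errors` payload shape is illustrative):

```typescript
// Hedged sketch: stub a 400 validation error and assert field-level feedback
import { test, expect } from '@playwright/test';

test('order rejects invalid card number', async ({ page }) => {
  await page.route('**/api/orders', (route) =>
    route.fulfill({
      status: 400,
      contentType: 'application/json',
      body: JSON.stringify({ errors: { creditCard: 'Invalid card number' } }),
    }),
  );

  await page.goto('/checkout');
  await page.fill('[data-testid="credit-card"]', '1234');
  await page.click('[data-testid="submit-order"]');

  await expect(page.getByText('Invalid card number')).toBeVisible();
});
```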
|
||||
|
||||
### Example 4: Deterministic Waiting
|
||||
|
||||
**Context**: Never use hard waits (`waitForTimeout(3000)`). Always wait for explicit signals: network responses, element state changes, or custom events.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// ✅ GOOD: Wait for response with predicate
|
||||
test('wait for specific response', async ({ page }) => {
|
||||
const responsePromise = page.waitForResponse((resp) => resp.url().includes('/api/users') && resp.status() === 200);
|
||||
|
||||
await page.goto('/dashboard');
|
||||
const response = await responsePromise;
|
||||
|
||||
expect(response.status()).toBe(200);
|
||||
await expect(page.getByText('Dashboard')).toBeVisible();
|
||||
});
|
||||
|
||||
// ✅ GOOD: Wait for multiple responses
|
||||
test('wait for all required data', async ({ page }) => {
|
||||
const usersPromise = page.waitForResponse('**/api/users');
|
||||
const productsPromise = page.waitForResponse('**/api/products');
|
||||
const ordersPromise = page.waitForResponse('**/api/orders');
|
||||
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Wait for all in parallel
|
||||
const [users, products, orders] = await Promise.all([usersPromise, productsPromise, ordersPromise]);
|
||||
|
||||
expect(users.status()).toBe(200);
|
||||
expect(products.status()).toBe(200);
|
||||
expect(orders.status()).toBe(200);
|
||||
});
|
||||
|
||||
// ✅ GOOD: Wait for spinner to disappear
|
||||
test('wait for loading indicator', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Wait for spinner to disappear (signals data loaded)
|
||||
await expect(page.getByTestId('loading-spinner')).not.toBeVisible();
|
||||
await expect(page.getByText('Dashboard')).toBeVisible();
|
||||
});
|
||||
|
||||
// ✅ GOOD: Wait for a custom ready signal (advanced)
test('wait for custom ready event', async ({ page }) => {
  // Wait for the app's own ready signal (a console message the app emits when initialized)
  const readyPromise = page.waitForEvent('console', {
    predicate: (msg) => msg.text() === 'App ready',
    timeout: 10000,
  });

  await page.goto('/dashboard');
  await readyPromise;

  await expect(page.getByText('Dashboard')).toBeVisible();
});
|
||||
|
||||
// ❌ BAD: Hard wait (arbitrary timeout)
|
||||
test('flaky hard wait example', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
await page.waitForTimeout(3000); // WHY 3 seconds? What if slower? What if faster?
|
||||
await expect(page.getByText('Dashboard')).toBeVisible(); // May fail if >3s
|
||||
});
|
||||
|
||||
// Cypress equivalents
|
||||
describe('Deterministic Waiting', () => {
|
||||
it('should wait for response', () => {
|
||||
cy.intercept('GET', '**/api/users').as('getUsers');
|
||||
cy.visit('/dashboard');
|
||||
cy.wait('@getUsers').its('response.statusCode').should('eq', 200);
|
||||
cy.contains('Dashboard').should('be.visible');
|
||||
});
|
||||
|
||||
it('should wait for spinner to disappear', () => {
|
||||
cy.visit('/dashboard');
|
||||
cy.get('[data-testid="loading-spinner"]').should('not.exist');
|
||||
cy.contains('Dashboard').should('be.visible');
|
||||
});
|
||||
|
||||
// ❌ BAD: Hard wait
|
||||
it('flaky hard wait', () => {
|
||||
cy.visit('/dashboard');
|
||||
cy.wait(3000); // NEVER DO THIS
|
||||
cy.contains('Dashboard').should('be.visible');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- `waitForResponse()` with URL pattern or predicate = deterministic
|
||||
- `waitForLoadState('networkidle')` = waits until no requests have been in flight for ~500 ms (use sparingly; see the sketch below)
|
||||
- Wait for element state changes (spinner disappears, button enabled)
|
||||
- **NEVER** use `waitForTimeout()` or `cy.wait(ms)` - always non-deterministic
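
A minimal sketch of the `waitForLoadState('networkidle')` fallback mentioned above, assuming a `/reports` page whose data arrives over several requests with URLs that are not known in advance (the page path and test id are illustrative):

```typescript
// Hedged sketch: 'networkidle' resolves once no network requests have been in flight
// for ~500ms. Prefer waitForResponse() when the endpoint is known - this is a fallback.
import { test, expect } from '@playwright/test';

test('reports page settles before assertions', async ({ page }) => {
  await page.goto('/reports');
  await page.waitForLoadState('networkidle');
  await expect(page.getByTestId('report-table')).toBeVisible();
});
```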
|
||||
|
||||
### Example 5: Anti-Pattern - Navigate Then Mock
|
||||
|
||||
**Problem**:
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: Race condition - mock registered AFTER navigation starts
|
||||
test('flaky test - navigate then mock', async ({ page }) => {
|
||||
// Navigation starts immediately
|
||||
await page.goto('/dashboard'); // Request to /api/users fires NOW
|
||||
|
||||
// Mock registered too late - request already sent
|
||||
await page.route('**/api/users', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
body: JSON.stringify([{ id: 1, name: 'Test User' }]),
|
||||
}),
|
||||
);
|
||||
|
||||
// Test randomly passes/fails depending on timing
|
||||
await expect(page.getByText('Test User')).toBeVisible(); // Flaky!
|
||||
});
|
||||
|
||||
// ❌ BAD: No wait for response
|
||||
test('flaky test - no explicit wait', async ({ page }) => {
|
||||
await page.route('**/api/users', (route) => route.fulfill({ status: 200, body: JSON.stringify([]) }));
|
||||
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Assertion runs immediately - may fail if response slow
|
||||
await expect(page.getByText('No users found')).toBeVisible(); // Flaky!
|
||||
});
|
||||
|
||||
// ❌ BAD: Generic timeout
|
||||
test('flaky test - hard wait', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
await page.waitForTimeout(2000); // Arbitrary wait - brittle
|
||||
|
||||
await expect(page.getByText('Dashboard')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Why It Fails**:
|
||||
|
||||
- **Mock after navigate**: Request fires during navigation, mock isn't active yet (race condition)
|
||||
- **No explicit wait**: Assertion runs before response arrives (timing-dependent)
|
||||
- **Hard waits**: Slow and brittle (fail when the app is slower than the wait, waste time when it is faster)
|
||||
- **Non-deterministic**: Passes locally, fails in CI (different speeds)
|
||||
|
||||
**Better Approach**: Always intercept → trigger → await
|
||||
|
||||
```typescript
|
||||
// ✅ GOOD: Intercept BEFORE navigate
|
||||
test('deterministic test', async ({ page }) => {
|
||||
// Step 1: Register mock FIRST
|
||||
await page.route('**/api/users', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify([{ id: 1, name: 'Test User' }]),
|
||||
}),
|
||||
);
|
||||
|
||||
// Step 2: Store response promise BEFORE trigger
|
||||
const responsePromise = page.waitForResponse('**/api/users');
|
||||
|
||||
// Step 3: THEN trigger
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Step 4: THEN await response
|
||||
await responsePromise;
|
||||
|
||||
// Step 5: THEN assert (data is guaranteed loaded)
|
||||
await expect(page.getByText('Test User')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Order matters: Mock → Promise → Trigger → Await → Assert
|
||||
- No race conditions: Mock is active before request fires
|
||||
- Explicit wait: Response promise ensures data loaded
|
||||
- Deterministic: Always passes if app works correctly
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*atdd` (test generation), `*automate` (test expansion), `*framework` (network setup)
|
||||
- **Related fragments**:
|
||||
- `fixture-architecture.md` - Network fixture patterns
|
||||
- `data-factories.md` - API-first setup with network
|
||||
- `test-quality.md` - Deterministic test principles
|
||||
|
||||
## Debugging Network Issues
|
||||
|
||||
When network tests fail, check:
|
||||
|
||||
1. **Timing**: Is interception registered **before** action?
|
||||
2. **URL pattern**: Does pattern match actual request URL?
|
||||
3. **Response format**: Is mocked response valid JSON/format?
|
||||
4. **Status code**: Is app checking for 200 vs 201 vs 204?
|
||||
5. **HAR file**: Capture real traffic to understand actual API contract
|
||||
|
||||
```typescript
|
||||
// Debug network issues with logging
|
||||
test('debug network', async ({ page }) => {
|
||||
// Log all requests
|
||||
page.on('request', (req) => console.log('→', req.method(), req.url()));
|
||||
|
||||
// Log all responses
|
||||
page.on('response', (resp) => console.log('←', resp.status(), resp.url()));
|
||||
|
||||
await page.goto('/dashboard');
|
||||
});
|
||||
```
|
||||
|
||||
_Source: Murat Testing Philosophy (lines 94-137), Playwright network patterns, Cypress intercept best practices._
|
||||
|
||||
@@ -1,21 +1,670 @@
|
||||
# Non-Functional Review Criteria
|
||||
# Non-Functional Requirements (NFR) Criteria
|
||||
|
||||
- **Security**
|
||||
- PASS: auth/authz, secret handling, and threat mitigations in place.
|
||||
- CONCERNS: minor gaps with clear owners.
|
||||
- FAIL: critical exposure or missing controls.
|
||||
- **Performance**
|
||||
- PASS: metrics meet targets with profiling evidence.
|
||||
- CONCERNS: trending toward limits or missing baselines.
|
||||
- FAIL: breaches SLO/SLA or introduces resource leaks.
|
||||
- **Reliability**
|
||||
- PASS: error handling, retries, health checks verified.
|
||||
- CONCERNS: partial coverage or missing telemetry.
|
||||
- FAIL: no recovery path or crash scenarios unresolved.
|
||||
- **Maintainability**
|
||||
- PASS: clean code, tests, and documentation shipped together.
|
||||
- CONCERNS: duplication, low coverage, or unclear ownership.
|
||||
- FAIL: absent tests, tangled implementations, or no observability.
|
||||
- Default to CONCERNS when targets or evidence are undefined—force the team to clarify before sign-off.
|
||||
## Principle
|
||||
|
||||
_Source: Murat NFR assessment guidance._
|
||||
Non-functional requirements (security, performance, reliability, maintainability) are **validated through automated tests**, not checklists. NFR assessment uses objective pass/fail criteria tied to measurable thresholds. Ambiguous requirements default to CONCERNS until clarified.
|
||||
|
||||
## Rationale
|
||||
|
||||
**The Problem**: Teams ship features that "work" functionally but fail under load, expose security vulnerabilities, or lack error recovery. NFRs are treated as optional "nice-to-haves" instead of release blockers.
|
||||
|
||||
**The Solution**: Define explicit NFR criteria with automated validation. Security tests verify auth/authz and secret handling. Performance tests enforce SLO/SLA thresholds with profiling evidence. Reliability tests validate error handling, retries, and health checks. Maintainability is measured by test coverage, code duplication, and observability.
|
||||
|
||||
**Why This Matters**:
|
||||
|
||||
- Prevents production incidents (security breaches, performance degradation, cascading failures)
|
||||
- Provides objective release criteria (no subjective "feels fast enough")
|
||||
- Automates compliance validation (audit trail for regulated environments)
|
||||
- Forces clarity on ambiguous requirements (default to CONCERNS)
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Security NFR Validation (Auth, Secrets, OWASP)
|
||||
|
||||
**Context**: Automated security tests enforcing authentication, authorization, and secret handling
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/nfr/security.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Security NFR: Authentication & Authorization', () => {
|
||||
test('unauthenticated users cannot access protected routes', async ({ page }) => {
|
||||
// Attempt to access dashboard without auth
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Should redirect to login (not expose data)
|
||||
await expect(page).toHaveURL(/\/login/);
|
||||
await expect(page.getByText('Please sign in')).toBeVisible();
|
||||
|
||||
// Verify no sensitive data leaked in response
|
||||
const pageContent = await page.content();
|
||||
expect(pageContent).not.toContain('user_id');
|
||||
expect(pageContent).not.toContain('api_key');
|
||||
});
|
||||
|
||||
test('JWT tokens expire after 15 minutes', async ({ page, request }) => {
|
||||
// Login and capture token
|
||||
await page.goto('/login');
|
||||
await page.getByLabel('Email').fill('test@example.com');
|
||||
await page.getByLabel('Password').fill('ValidPass123!');
|
||||
await page.getByRole('button', { name: 'Sign In' }).click();
|
||||
|
||||
const token = await page.evaluate(() => localStorage.getItem('auth_token'));
|
||||
expect(token).toBeTruthy();
|
||||
|
||||
    // Fast-forward 16 minutes with Playwright's mock clock (install() takes over timers and Date);
    // note this fakes only the browser clock - server-side token expiry still depends on server time
    await page.clock.install();
    await page.clock.fastForward('00:16:00');
|
||||
|
||||
// Token should be expired, API call should fail
|
||||
const response = await request.get('/api/user/profile', {
|
||||
headers: { Authorization: `Bearer ${token}` },
|
||||
});
|
||||
|
||||
expect(response.status()).toBe(401);
|
||||
const body = await response.json();
|
||||
expect(body.error).toContain('expired');
|
||||
});
|
||||
|
||||
test('passwords are never logged or exposed in errors', async ({ page }) => {
|
||||
// Trigger login error
|
||||
await page.goto('/login');
|
||||
await page.getByLabel('Email').fill('test@example.com');
|
||||
await page.getByLabel('Password').fill('WrongPassword123!');
|
||||
|
||||
// Monitor console for password leaks
|
||||
const consoleLogs: string[] = [];
|
||||
page.on('console', (msg) => consoleLogs.push(msg.text()));
|
||||
|
||||
await page.getByRole('button', { name: 'Sign In' }).click();
|
||||
|
||||
// Error shown to user (generic message)
|
||||
await expect(page.getByText('Invalid credentials')).toBeVisible();
|
||||
|
||||
// Verify password NEVER appears in console, DOM, or network
|
||||
const pageContent = await page.content();
|
||||
expect(pageContent).not.toContain('WrongPassword123!');
|
||||
expect(consoleLogs.join('\n')).not.toContain('WrongPassword123!');
|
||||
});
|
||||
|
||||
test('RBAC: users can only access resources they own', async ({ page, request }) => {
|
||||
// Login as User A
|
||||
const userAToken = await login(request, 'userA@example.com', 'password');
|
||||
|
||||
// Try to access User B's order
|
||||
const response = await request.get('/api/orders/user-b-order-id', {
|
||||
headers: { Authorization: `Bearer ${userAToken}` },
|
||||
});
|
||||
|
||||
expect(response.status()).toBe(403); // Forbidden
|
||||
const body = await response.json();
|
||||
expect(body.error).toContain('insufficient permissions');
|
||||
});
|
||||
|
||||
test('SQL injection attempts are blocked', async ({ page }) => {
|
||||
await page.goto('/search');
|
||||
|
||||
// Attempt SQL injection
|
||||
await page.getByPlaceholder('Search products').fill("'; DROP TABLE users; --");
|
||||
await page.getByRole('button', { name: 'Search' }).click();
|
||||
|
||||
// Should return empty results, NOT crash or expose error
|
||||
await expect(page.getByText('No results found')).toBeVisible();
|
||||
|
||||
// Verify app still works (table not dropped)
|
||||
await page.goto('/dashboard');
|
||||
await expect(page.getByText('Welcome')).toBeVisible();
|
||||
});
|
||||
|
||||
test('XSS attempts are sanitized', async ({ page }) => {
|
||||
await page.goto('/profile/edit');
|
||||
|
||||
// Attempt XSS injection
|
||||
const xssPayload = '<script>alert("XSS")</script>';
|
||||
await page.getByLabel('Bio').fill(xssPayload);
|
||||
await page.getByRole('button', { name: 'Save' }).click();
|
||||
|
||||
// Reload and verify XSS is escaped (not executed)
|
||||
await page.reload();
|
||||
const bio = await page.getByTestId('user-bio').textContent();
|
||||
|
||||
    // Payload should be rendered as inert text (escaped), not executed as a script
    expect(bio).toContain('<script>alert("XSS")</script>');
    // And no injected <script> element should exist inside the bio
    expect(await page.getByTestId('user-bio').locator('script').count()).toBe(0);
|
||||
});
|
||||
});
|
||||
|
||||
// Helper
|
||||
async function login(request: any, email: string, password: string): Promise<string> {
|
||||
const response = await request.post('/api/auth/login', {
|
||||
data: { email, password },
|
||||
});
|
||||
const body = await response.json();
|
||||
return body.token;
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Authentication: Unauthenticated access redirected (not exposed)
|
||||
- Authorization: RBAC enforced (403 for insufficient permissions)
|
||||
- Token expiry: JWT expires after 15 minutes (automated validation)
|
||||
- Secret handling: Passwords never logged or exposed in errors
|
||||
- OWASP Top 10: SQL injection and XSS blocked (input sanitization)
|
||||
|
||||
**Security NFR Criteria**:
|
||||
|
||||
- ✅ PASS: All 6 tests green (auth, authz, token expiry, secret handling, SQL injection, XSS)
|
||||
- ⚠️ CONCERNS: 1-2 tests failing with mitigation plan and owner assigned
|
||||
- ❌ FAIL: Critical exposure (unauthenticated access, password leak, SQL injection succeeds)
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Performance NFR Validation (k6 Load Testing for SLO/SLA)
|
||||
|
||||
**Context**: Use k6 for load testing, stress testing, and SLO/SLA enforcement (NOT Playwright)
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```javascript
|
||||
// tests/nfr/performance.k6.js
|
||||
import http from 'k6/http';
|
||||
import { check, sleep } from 'k6';
|
||||
import { Rate, Trend } from 'k6/metrics';
|
||||
|
||||
// Custom metrics
|
||||
const errorRate = new Rate('errors');
|
||||
const apiDuration = new Trend('api_duration');
|
||||
|
||||
// Performance thresholds (SLO/SLA)
|
||||
export const options = {
|
||||
stages: [
|
||||
{ duration: '1m', target: 50 }, // Ramp up to 50 users
|
||||
{ duration: '3m', target: 50 }, // Stay at 50 users for 3 minutes
|
||||
{ duration: '1m', target: 100 }, // Spike to 100 users
|
||||
{ duration: '3m', target: 100 }, // Stay at 100 users
|
||||
{ duration: '1m', target: 0 }, // Ramp down
|
||||
],
|
||||
thresholds: {
|
||||
// SLO: 95% of requests must complete in <500ms
|
||||
http_req_duration: ['p(95)<500'],
|
||||
// SLO: Error rate must be <1%
|
||||
errors: ['rate<0.01'],
|
||||
// SLA: API endpoints must respond in <1s (99th percentile)
|
||||
api_duration: ['p(99)<1000'],
|
||||
},
|
||||
};
|
||||
|
||||
export default function () {
|
||||
// Test 1: Homepage load performance
|
||||
const homepageResponse = http.get(`${__ENV.BASE_URL}/`);
|
||||
check(homepageResponse, {
|
||||
'homepage status is 200': (r) => r.status === 200,
|
||||
'homepage loads in <2s': (r) => r.timings.duration < 2000,
|
||||
});
|
||||
errorRate.add(homepageResponse.status !== 200);
|
||||
|
||||
// Test 2: API endpoint performance
|
||||
const apiResponse = http.get(`${__ENV.BASE_URL}/api/products?limit=10`, {
|
||||
headers: { Authorization: `Bearer ${__ENV.API_TOKEN}` },
|
||||
});
|
||||
check(apiResponse, {
|
||||
'API status is 200': (r) => r.status === 200,
|
||||
'API responds in <500ms': (r) => r.timings.duration < 500,
|
||||
});
|
||||
apiDuration.add(apiResponse.timings.duration);
|
||||
errorRate.add(apiResponse.status !== 200);
|
||||
|
||||
// Test 3: Search endpoint under load
|
||||
const searchResponse = http.get(`${__ENV.BASE_URL}/api/search?q=laptop&limit=100`);
|
||||
check(searchResponse, {
|
||||
'search status is 200': (r) => r.status === 200,
|
||||
'search responds in <1s': (r) => r.timings.duration < 1000,
|
||||
'search returns results': (r) => JSON.parse(r.body).results.length > 0,
|
||||
});
|
||||
errorRate.add(searchResponse.status !== 200);
|
||||
|
||||
sleep(1); // Realistic user think time
|
||||
}
|
||||
|
||||
// Threshold validation (run after test)
|
||||
export function handleSummary(data) {
|
||||
const p95Duration = data.metrics.http_req_duration.values['p(95)'];
|
||||
const p99ApiDuration = data.metrics.api_duration.values['p(99)'];
|
||||
const errorRateValue = data.metrics.errors.values.rate;
|
||||
|
||||
console.log(`P95 request duration: ${p95Duration.toFixed(2)}ms`);
|
||||
console.log(`P99 API duration: ${p99ApiDuration.toFixed(2)}ms`);
|
||||
console.log(`Error rate: ${(errorRateValue * 100).toFixed(2)}%`);
|
||||
|
||||
return {
|
||||
'summary.json': JSON.stringify(data),
|
||||
stdout: `
|
||||
Performance NFR Results:
|
||||
- P95 request duration: ${p95Duration < 500 ? '✅ PASS' : '❌ FAIL'} (${p95Duration.toFixed(2)}ms / 500ms threshold)
|
||||
- P99 API duration: ${p99ApiDuration < 1000 ? '✅ PASS' : '❌ FAIL'} (${p99ApiDuration.toFixed(2)}ms / 1000ms threshold)
|
||||
- Error rate: ${errorRateValue < 0.01 ? '✅ PASS' : '❌ FAIL'} (${(errorRateValue * 100).toFixed(2)}% / 1% threshold)
|
||||
`,
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
**Run k6 tests:**
|
||||
|
||||
```bash
|
||||
# Local smoke test (10 VUs, 30s)
|
||||
k6 run --vus 10 --duration 30s tests/nfr/performance.k6.js
|
||||
|
||||
# Full load test (stages defined in script)
|
||||
k6 run tests/nfr/performance.k6.js
|
||||
|
||||
# CI integration with thresholds
|
||||
k6 run --out json=performance-results.json tests/nfr/performance.k6.js
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **k6 is the right tool** for load testing (NOT Playwright)
|
||||
- SLO/SLA thresholds enforced automatically (`p(95)<500`, `rate<0.01`)
|
||||
- Realistic load simulation (ramp up, sustained load, spike testing)
|
||||
- Comprehensive metrics (p50, p95, p99, error rate, throughput)
|
||||
- CI-friendly (JSON output, exit codes based on thresholds)
|
||||
|
||||
**Performance NFR Criteria**:
|
||||
|
||||
- ✅ PASS: All SLO/SLA targets met with k6 profiling evidence (p95 < 500ms, error rate < 1%)
|
||||
- ⚠️ CONCERNS: Trending toward limits (e.g., p95 = 480ms approaching 500ms) or missing baselines
|
||||
- ❌ FAIL: SLO/SLA breached (e.g., p95 > 500ms) or error rate > 1%
|
||||
|
||||
**Performance Testing Levels (from Test Architect course):**
|
||||
|
||||
- **Load testing**: System behavior under expected load
|
||||
- **Stress testing**: System behavior under extreme load (breaking point)
|
||||
- **Spike testing**: Sudden load increases (traffic spikes)
|
||||
- **Endurance/Soak testing**: System behavior under sustained load (memory leaks, resource exhaustion); see the soak profile sketch after this list
|
||||
- **Benchmarking**: Baseline measurements for comparison
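
A soak/endurance profile differs from the load profile above mainly in its stages; a sketch with illustrative durations and targets:

```javascript
// Hedged sketch: soak profile - moderate load held for a long period to surface
// memory leaks and resource exhaustion (durations/targets are illustrative)
export const soakOptions = {
  stages: [
    { duration: '2m', target: 30 }, // ramp up to steady load
    { duration: '2h', target: 30 }, // hold for two hours
    { duration: '2m', target: 0 }, // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // latency must not degrade over time
    http_req_failed: ['rate<0.01'], // built-in k6 failure-rate metric
  },
};
```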
|
||||
|
||||
**Note**: Playwright can validate **perceived performance** (Core Web Vitals via Lighthouse), but k6 validates **system performance** (throughput, latency, resource limits under load)
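
As a sketch of that perceived-performance angle (assuming a Chromium run and a page that emits a largest-contentful-paint entry), LCP can be read straight from the browser's performance timeline:

```typescript
// Hedged sketch: read Largest Contentful Paint in-page and assert a budget
import { test, expect } from '@playwright/test';

test('homepage LCP stays under 2.5s', async ({ page }) => {
  await page.goto('/');

  const lcp = await page.evaluate(
    () =>
      new Promise<number>((resolve) => {
        new PerformanceObserver((list) => {
          const entries = list.getEntries();
          resolve(entries[entries.length - 1].startTime);
        }).observe({ type: 'largest-contentful-paint', buffered: true });
      }),
  );

  expect(lcp).toBeLessThan(2500); // Core Web Vitals "good" threshold
});
```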
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Reliability NFR Validation (Playwright for UI Resilience)
|
||||
|
||||
**Context**: Automated reliability tests validating graceful degradation and recovery paths
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/nfr/reliability.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Reliability NFR: Error Handling & Recovery', () => {
|
||||
test('app remains functional when API returns 500 error', async ({ page, context }) => {
|
||||
// Mock API failure
|
||||
await context.route('**/api/products', (route) => {
|
||||
route.fulfill({ status: 500, body: JSON.stringify({ error: 'Internal Server Error' }) });
|
||||
});
|
||||
|
||||
await page.goto('/products');
|
||||
|
||||
// User sees error message (not blank page or crash)
|
||||
await expect(page.getByText('Unable to load products. Please try again.')).toBeVisible();
|
||||
await expect(page.getByRole('button', { name: 'Retry' })).toBeVisible();
|
||||
|
||||
// App navigation still works (graceful degradation)
|
||||
await page.getByRole('link', { name: 'Home' }).click();
|
||||
await expect(page).toHaveURL('/');
|
||||
});
|
||||
|
||||
test('API client retries on transient failures (3 attempts)', async ({ page, context }) => {
|
||||
let attemptCount = 0;
|
||||
|
||||
await context.route('**/api/checkout', (route) => {
|
||||
attemptCount++;
|
||||
|
||||
// Fail first 2 attempts, succeed on 3rd
|
||||
if (attemptCount < 3) {
|
||||
route.fulfill({ status: 503, body: JSON.stringify({ error: 'Service Unavailable' }) });
|
||||
} else {
|
||||
route.fulfill({ status: 200, body: JSON.stringify({ orderId: '12345' }) });
|
||||
}
|
||||
});
|
||||
|
||||
await page.goto('/checkout');
|
||||
await page.getByRole('button', { name: 'Place Order' }).click();
|
||||
|
||||
// Should succeed after 3 attempts
|
||||
await expect(page.getByText('Order placed successfully')).toBeVisible();
|
||||
expect(attemptCount).toBe(3);
|
||||
});
|
||||
|
||||
test('app handles network disconnection gracefully', async ({ page, context }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Simulate offline mode
|
||||
await context.setOffline(true);
|
||||
|
||||
// Trigger action requiring network
|
||||
await page.getByRole('button', { name: 'Refresh Data' }).click();
|
||||
|
||||
// User sees offline indicator (not crash)
|
||||
await expect(page.getByText('You are offline. Changes will sync when reconnected.')).toBeVisible();
|
||||
|
||||
// Reconnect
|
||||
await context.setOffline(false);
|
||||
await page.getByRole('button', { name: 'Refresh Data' }).click();
|
||||
|
||||
// Data loads successfully
|
||||
await expect(page.getByText('Data updated')).toBeVisible();
|
||||
});
|
||||
|
||||
test('health check endpoint returns service status', async ({ request }) => {
|
||||
const response = await request.get('/api/health');
|
||||
|
||||
expect(response.status()).toBe(200);
|
||||
|
||||
const health = await response.json();
|
||||
expect(health).toHaveProperty('status', 'healthy');
|
||||
expect(health).toHaveProperty('timestamp');
|
||||
expect(health).toHaveProperty('services');
|
||||
|
||||
// Verify critical services are monitored
|
||||
expect(health.services).toHaveProperty('database');
|
||||
expect(health.services).toHaveProperty('cache');
|
||||
expect(health.services).toHaveProperty('queue');
|
||||
|
||||
// All services should be UP
|
||||
expect(health.services.database.status).toBe('UP');
|
||||
expect(health.services.cache.status).toBe('UP');
|
||||
expect(health.services.queue.status).toBe('UP');
|
||||
});
|
||||
|
||||
test('circuit breaker opens after 5 consecutive failures', async ({ page, context }) => {
|
||||
let failureCount = 0;
|
||||
|
||||
await context.route('**/api/recommendations', (route) => {
|
||||
failureCount++;
|
||||
route.fulfill({ status: 500, body: JSON.stringify({ error: 'Service Error' }) });
|
||||
});
|
||||
|
||||
await page.goto('/product/123');
|
||||
|
||||
// Wait for circuit breaker to open (fallback UI appears)
|
||||
await expect(page.getByText('Recommendations temporarily unavailable')).toBeVisible({ timeout: 10000 });
|
||||
|
||||
// Verify circuit breaker stopped making requests after threshold (should be ≤5)
|
||||
expect(failureCount).toBeLessThanOrEqual(5);
|
||||
});
|
||||
|
||||
test('rate limiting gracefully handles 429 responses', async ({ page, context }) => {
|
||||
let requestCount = 0;
|
||||
|
||||
await context.route('**/api/search', (route) => {
|
||||
requestCount++;
|
||||
|
||||
if (requestCount > 10) {
|
||||
// Rate limit exceeded
|
||||
route.fulfill({
|
||||
status: 429,
|
||||
headers: { 'Retry-After': '5' },
|
||||
body: JSON.stringify({ error: 'Rate limit exceeded' }),
|
||||
});
|
||||
} else {
|
||||
route.fulfill({ status: 200, body: JSON.stringify({ results: [] }) });
|
||||
}
|
||||
});
|
||||
|
||||
await page.goto('/search');
|
||||
|
||||
// Make 15 search requests rapidly
|
||||
for (let i = 0; i < 15; i++) {
|
||||
await page.getByPlaceholder('Search').fill(`query-${i}`);
|
||||
await page.getByRole('button', { name: 'Search' }).click();
|
||||
}
|
||||
|
||||
// User sees rate limit message (not crash)
|
||||
await expect(page.getByText('Too many requests. Please wait a moment.')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Error handling: Graceful degradation (500 error → user-friendly message + retry button)
|
||||
- Retries: 3 attempts on transient failures (503 → eventual success)
|
||||
- Offline handling: Network disconnection detected (sync when reconnected)
|
||||
- Health checks: `/api/health` monitors database, cache, queue
|
||||
- Circuit breaker: Opens after 5 failures (fallback UI, stop retries)
|
||||
- Rate limiting: 429 response handled (Retry-After header respected)
|
||||
|
||||
**Reliability NFR Criteria**:
|
||||
|
||||
- ✅ PASS: Error handling, retries, health checks verified (all 6 tests green)
|
||||
- ⚠️ CONCERNS: Partial coverage (e.g., missing circuit breaker) or no telemetry
|
||||
- ❌ FAIL: No recovery path (500 error crashes app) or unresolved crash scenarios
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Maintainability NFR Validation (CI Tools, Not Playwright)
|
||||
|
||||
**Context**: Use proper CI tools for code quality validation (coverage, duplication, vulnerabilities)
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/nfr-maintainability.yml
|
||||
name: NFR - Maintainability
|
||||
|
||||
on: [push, pull_request]
|
||||
|
||||
jobs:
|
||||
test-coverage:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-node@v4
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run tests with coverage
|
||||
run: npm run test:coverage
|
||||
|
||||
- name: Check coverage threshold (80% minimum)
|
||||
run: |
|
||||
COVERAGE=$(jq '.total.lines.pct' coverage/coverage-summary.json)
|
||||
echo "Coverage: $COVERAGE%"
|
||||
if (( $(echo "$COVERAGE < 80" | bc -l) )); then
|
||||
echo "❌ FAIL: Coverage $COVERAGE% below 80% threshold"
|
||||
exit 1
|
||||
else
|
||||
echo "✅ PASS: Coverage $COVERAGE% meets 80% threshold"
|
||||
fi
|
||||
|
||||
code-duplication:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-node@v4
|
||||
|
||||
- name: Check code duplication (<5% allowed)
|
||||
run: |
|
||||
npx jscpd src/ --threshold 5 --format json --output duplication.json
|
||||
DUPLICATION=$(jq '.statistics.total.percentage' duplication.json)
|
||||
echo "Duplication: $DUPLICATION%"
|
||||
if (( $(echo "$DUPLICATION >= 5" | bc -l) )); then
|
||||
echo "❌ FAIL: Duplication $DUPLICATION% exceeds 5% threshold"
|
||||
exit 1
|
||||
else
|
||||
echo "✅ PASS: Duplication $DUPLICATION% below 5% threshold"
|
||||
fi
|
||||
|
||||
vulnerability-scan:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-node@v4
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run npm audit (no critical/high vulnerabilities)
|
||||
run: |
|
||||
npm audit --json > audit.json || true
|
||||
CRITICAL=$(jq '.metadata.vulnerabilities.critical' audit.json)
|
||||
HIGH=$(jq '.metadata.vulnerabilities.high' audit.json)
|
||||
echo "Critical: $CRITICAL, High: $HIGH"
|
||||
if [ "$CRITICAL" -gt 0 ] || [ "$HIGH" -gt 0 ]; then
|
||||
echo "❌ FAIL: Found $CRITICAL critical and $HIGH high vulnerabilities"
|
||||
npm audit
|
||||
exit 1
|
||||
else
|
||||
echo "✅ PASS: No critical/high vulnerabilities"
|
||||
fi
|
||||
```
|
||||
|
||||
**Playwright Tests for Observability (E2E Validation):**
|
||||
|
||||
```typescript
|
||||
// tests/nfr/observability.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Maintainability NFR: Observability Validation', () => {
|
||||
test('critical errors are reported to monitoring service', async ({ page, context }) => {
|
||||
const sentryEvents: any[] = [];
|
||||
|
||||
// Mock Sentry SDK to verify error tracking
|
||||
await context.addInitScript(() => {
|
||||
(window as any).Sentry = {
|
||||
captureException: (error: Error) => {
|
||||
console.log('SENTRY_CAPTURE:', JSON.stringify({ message: error.message, stack: error.stack }));
|
||||
},
|
||||
};
|
||||
});
|
||||
|
||||
page.on('console', (msg) => {
|
||||
if (msg.text().includes('SENTRY_CAPTURE:')) {
|
||||
sentryEvents.push(JSON.parse(msg.text().replace('SENTRY_CAPTURE:', '')));
|
||||
}
|
||||
});
|
||||
|
||||
// Trigger error by mocking API failure
|
||||
await context.route('**/api/products', (route) => {
|
||||
route.fulfill({ status: 500, body: JSON.stringify({ error: 'Database Error' }) });
|
||||
});
|
||||
|
||||
await page.goto('/products');
|
||||
|
||||
// Wait for error UI and Sentry capture
|
||||
await expect(page.getByText('Unable to load products')).toBeVisible();
|
||||
|
||||
// Verify error was captured by monitoring
|
||||
expect(sentryEvents.length).toBeGreaterThan(0);
|
||||
expect(sentryEvents[0]).toHaveProperty('message');
|
||||
expect(sentryEvents[0]).toHaveProperty('stack');
|
||||
});
|
||||
|
||||
test('API response times are tracked in telemetry', async ({ request }) => {
|
||||
const response = await request.get('/api/products?limit=10');
|
||||
|
||||
expect(response.ok()).toBeTruthy();
|
||||
|
||||
// Verify Server-Timing header for APM (Application Performance Monitoring)
|
||||
const serverTiming = response.headers()['server-timing'];
|
||||
|
||||
expect(serverTiming).toBeTruthy();
|
||||
expect(serverTiming).toContain('db'); // Database query time
|
||||
expect(serverTiming).toContain('total'); // Total processing time
|
||||
});
|
||||
|
||||
test('structured logging present in application', async ({ request }) => {
|
||||
// Make API call that generates logs
|
||||
const response = await request.post('/api/orders', {
|
||||
data: { productId: '123', quantity: 2 },
|
||||
});
|
||||
|
||||
expect(response.ok()).toBeTruthy();
|
||||
|
||||
// Note: In real scenarios, validate logs in monitoring system (Datadog, CloudWatch)
|
||||
// This test validates the logging contract exists (Server-Timing, trace IDs in headers)
|
||||
const traceId = response.headers()['x-trace-id'];
|
||||
expect(traceId).toBeTruthy(); // Confirms structured logging with correlation IDs
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Coverage/duplication**: CI jobs (GitHub Actions), not Playwright tests
|
||||
- **Vulnerability scanning**: npm audit in CI, not Playwright tests
|
||||
- **Observability**: Playwright validates error tracking (Sentry) and telemetry headers
|
||||
- **Structured logging**: Validate logging contract (trace IDs, Server-Timing headers)
|
||||
- **Separation of concerns**: Build-time checks (coverage, audit) vs runtime checks (error tracking, telemetry)
|
||||
|
||||
**Maintainability NFR Criteria**:
|
||||
|
||||
- ✅ PASS: Clean code (80%+ coverage from CI, <5% duplication from CI), observability validated in E2E, no critical vulnerabilities from npm audit
|
||||
- ⚠️ CONCERNS: Duplication >5%, coverage 60-79%, or unclear ownership
|
||||
- ❌ FAIL: Absent tests (<60%), tangled implementations (>10% duplication), or no observability
|
||||
|
||||
---
|
||||
|
||||
## NFR Assessment Checklist
|
||||
|
||||
Before release gate:
|
||||
|
||||
- [ ] **Security** (Playwright E2E + Security Tools):
|
||||
- [ ] Auth/authz tests green (unauthenticated redirect, RBAC enforced)
|
||||
- [ ] Secrets never logged or exposed in errors
|
||||
- [ ] OWASP Top 10 validated (SQL injection blocked, XSS sanitized)
|
||||
- [ ] Security audit completed (vulnerability scan, penetration test if applicable)
|
||||
|
||||
- [ ] **Performance** (k6 Load Testing):
|
||||
- [ ] SLO/SLA targets met with k6 evidence (p95 <500ms, error rate <1%)
|
||||
- [ ] Load testing completed (expected load)
|
||||
- [ ] Stress testing completed (breaking point identified)
|
||||
- [ ] Spike testing completed (handles traffic spikes)
|
||||
- [ ] Endurance testing completed (no memory leaks under sustained load)
|
||||
|
||||
- [ ] **Reliability** (Playwright E2E + API Tests):
|
||||
- [ ] Error handling graceful (500 → user-friendly message + retry)
|
||||
- [ ] Retries implemented (3 attempts on transient failures)
|
||||
- [ ] Health checks monitored (/api/health endpoint)
|
||||
- [ ] Circuit breaker tested (opens after failure threshold)
|
||||
- [ ] Offline handling validated (network disconnection graceful)
|
||||
|
||||
- [ ] **Maintainability** (CI Tools):
|
||||
- [ ] Test coverage ≥80% (from CI coverage report)
|
||||
- [ ] Code duplication <5% (from jscpd CI job)
|
||||
- [ ] No critical/high vulnerabilities (from npm audit CI job)
|
||||
- [ ] Structured logging validated (Playwright validates telemetry headers)
|
||||
- [ ] Error tracking configured (Sentry/monitoring integration validated)
|
||||
|
||||
- [ ] **Ambiguous requirements**: Default to CONCERNS (force team to clarify thresholds and evidence)
|
||||
- [ ] **NFR criteria documented**: Measurable thresholds defined (not subjective "fast enough")
|
||||
- [ ] **Automated validation**: NFR tests run in CI pipeline (not manual checklists)
|
||||
- [ ] **Tool selection**: Right tool for each NFR (k6 for performance, Playwright for security/reliability E2E, CI tools for maintainability)
|
||||
|
||||
## NFR Gate Decision Matrix
|
||||
|
||||
| Category | PASS Criteria | CONCERNS Criteria | FAIL Criteria |
|
||||
| ------------------- | -------------------------------------------- | -------------------------------------------- | ---------------------------------------------- |
|
||||
| **Security** | Auth/authz, secret handling, OWASP verified | Minor gaps with clear owners | Critical exposure or missing controls |
|
||||
| **Performance** | Metrics meet SLO/SLA with profiling evidence | Trending toward limits or missing baselines | SLO/SLA breached or resource leaks detected |
|
||||
| **Reliability** | Error handling, retries, health checks OK | Partial coverage or missing telemetry | No recovery path or unresolved crash scenarios |
|
||||
| **Maintainability** | Clean code, tests, docs shipped together | Duplication, low coverage, unclear ownership | Absent tests, tangled code, no observability |
|
||||
|
||||
**Default**: If targets or evidence are undefined → **CONCERNS** (force team to clarify before sign-off)
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*nfr-assess` (automated NFR validation), `*trace` (gate decision Phase 2), `*test-design` (NFR risk assessment via Utility Tree)
|
||||
- **Related fragments**: `risk-governance.md` (NFR risk scoring), `probability-impact.md` (NFR impact assessment), `test-quality.md` (maintainability standards), `test-levels-framework.md` (system-level testing for NFRs)
|
||||
- **Tools by NFR Category**:
|
||||
- **Security**: Playwright (E2E auth/authz), OWASP ZAP, Burp Suite, npm audit, Snyk
|
||||
- **Performance**: k6 (load/stress/spike/endurance), Lighthouse (Core Web Vitals), Artillery
|
||||
- **Reliability**: Playwright (E2E error handling), API tests (retries, health checks), Chaos Engineering tools
|
||||
- **Maintainability**: GitHub Actions (coverage, duplication, audit), jscpd, Playwright (observability validation)
|
||||
|
||||
_Source: Test Architect course (NFR testing approaches, Utility Tree, Quality Scenarios), ISO/IEC 25010 Software Quality Characteristics, OWASP Top 10, k6 documentation, SRE practices_
|
||||
|
||||
@@ -1,9 +1,730 @@
|
||||
# Playwright Configuration Guardrails
|
||||
|
||||
- Load environment configs via a central map (`envConfigMap`) and fail fast when `TEST_ENV` is missing or unsupported.
|
||||
- Standardize timeouts: action 15s, navigation 30s, expect 10s, test 60s; expose overrides through fixtures rather than inline literals.
|
||||
- Emit HTML + JUnit reporters, disable auto-open, and store artifacts under `test-results/` for CI upload.
|
||||
- Keep `.env.example`, `.nvmrc`, and browser dependencies versioned so local and CI runs stay aligned.
|
||||
- Use global setup for shared auth tokens or seeding, but prefer per-test fixtures for anything mutable to avoid cross-test leakage.
|
||||
## Principle
|
||||
|
||||
_Source: Playwright book repo, SEON configuration example._
|
||||
Load environment configs via a central map (`envConfigMap`), standardize timeouts (action 15s, navigation 30s, expect 10s, test 60s), emit HTML + JUnit reporters, and store artifacts under `test-results/` for CI upload. Keep `.env.example`, `.nvmrc`, and browser dependencies versioned so local and CI runs stay aligned.
|
||||
|
||||
## Rationale
|
||||
|
||||
Environment-specific configuration prevents hardcoded URLs, timeouts, and credentials from leaking into tests. A central config map with fail-fast validation catches missing environments early. Standardized timeouts reduce flakiness while remaining long enough for real-world network conditions. Consistent artifact storage (`test-results/`, `playwright-report/`) enables CI pipelines to upload failure evidence automatically. Versioned dependencies (`.nvmrc`, `package.json` browser versions) eliminate "works on my machine" issues between local and CI environments.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Environment-Based Configuration
|
||||
|
||||
**Context**: When testing against multiple environments (local, staging, production), use a central config map that loads environment-specific settings and fails fast if `TEST_ENV` is invalid.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright.config.ts - Central config loader
|
||||
import { config as dotenvConfig } from 'dotenv';
|
||||
import path from 'path';
|
||||
|
||||
// Load .env from project root
|
||||
dotenvConfig({
|
||||
path: path.resolve(__dirname, '../../.env'),
|
||||
});
|
||||
|
||||
// Central environment config map
|
||||
const envConfigMap = {
|
||||
local: require('./playwright/config/local.config').default,
|
||||
staging: require('./playwright/config/staging.config').default,
|
||||
production: require('./playwright/config/production.config').default,
|
||||
};
|
||||
|
||||
const environment = process.env.TEST_ENV || 'local';
|
||||
|
||||
// Fail fast if environment not supported
|
||||
if (!Object.keys(envConfigMap).includes(environment)) {
|
||||
console.error(`❌ No configuration found for environment: ${environment}`);
|
||||
console.error(` Available environments: ${Object.keys(envConfigMap).join(', ')}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
console.log(`✅ Running tests against: ${environment.toUpperCase()}`);
|
||||
|
||||
export default envConfigMap[environment as keyof typeof envConfigMap];
|
||||
```
|
||||
|
||||
```typescript
|
||||
// playwright/config/base.config.ts - Shared base configuration
|
||||
import { defineConfig } from '@playwright/test';
|
||||
import path from 'path';
|
||||
|
||||
export const baseConfig = defineConfig({
|
||||
testDir: path.resolve(__dirname, '../tests'),
|
||||
outputDir: path.resolve(__dirname, '../../test-results'),
|
||||
fullyParallel: true,
|
||||
forbidOnly: !!process.env.CI,
|
||||
retries: process.env.CI ? 2 : 0,
|
||||
workers: process.env.CI ? 1 : undefined,
|
||||
reporter: [
|
||||
['html', { outputFolder: 'playwright-report', open: 'never' }],
|
||||
['junit', { outputFile: 'test-results/results.xml' }],
|
||||
['list'],
|
||||
],
|
||||
use: {
|
||||
actionTimeout: 15000,
|
||||
navigationTimeout: 30000,
|
||||
trace: 'on-first-retry',
|
||||
screenshot: 'only-on-failure',
|
||||
video: 'retain-on-failure',
|
||||
},
|
||||
globalSetup: path.resolve(__dirname, '../support/global-setup.ts'),
|
||||
timeout: 60000,
|
||||
expect: { timeout: 10000 },
|
||||
});
|
||||
```
|
||||
|
||||
```typescript
|
||||
// playwright/config/local.config.ts - Local environment
|
||||
import { defineConfig } from '@playwright/test';
|
||||
import { baseConfig } from './base.config';
|
||||
|
||||
export default defineConfig({
|
||||
...baseConfig,
|
||||
use: {
|
||||
...baseConfig.use,
|
||||
baseURL: 'http://localhost:3000',
|
||||
video: 'off', // No video locally for speed
|
||||
},
|
||||
webServer: {
|
||||
command: 'npm run dev',
|
||||
url: 'http://localhost:3000',
|
||||
reuseExistingServer: !process.env.CI,
|
||||
timeout: 120000,
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
```typescript
|
||||
// playwright/config/staging.config.ts - Staging environment
|
||||
import { defineConfig } from '@playwright/test';
|
||||
import { baseConfig } from './base.config';
|
||||
|
||||
export default defineConfig({
|
||||
...baseConfig,
|
||||
use: {
|
||||
...baseConfig.use,
|
||||
baseURL: 'https://staging.example.com',
|
||||
ignoreHTTPSErrors: true, // Allow self-signed certs in staging
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
```typescript
|
||||
// playwright/config/production.config.ts - Production environment
|
||||
import { defineConfig } from '@playwright/test';
|
||||
import { baseConfig } from './base.config';
|
||||
|
||||
export default defineConfig({
|
||||
...baseConfig,
|
||||
retries: 3, // More retries in production
|
||||
use: {
|
||||
...baseConfig.use,
|
||||
baseURL: 'https://example.com',
|
||||
video: 'on', // Always record production failures
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
```bash
|
||||
# .env.example - Template for developers
|
||||
TEST_ENV=local
|
||||
API_KEY=your_api_key_here
|
||||
DATABASE_URL=postgresql://localhost:5432/test_db
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Central `envConfigMap` prevents environment misconfiguration
|
||||
- Fail-fast validation with clear error message (available envs listed)
|
||||
- Base config defines shared settings, environment configs override
|
||||
- `.env.example` provides template for required secrets
|
||||
- `TEST_ENV=local` as default for local development
|
||||
- Production config increases retries and enables video recording
|
||||
|
||||
### Example 2: Timeout Standards
|
||||
|
||||
**Context**: When tests fail due to inconsistent timeout settings, standardize timeouts across all tests: action 15s, navigation 30s, expect 10s, test 60s. Expose overrides through fixtures rather than inline literals.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright/config/base.config.ts - Standardized timeouts
|
||||
import { defineConfig } from '@playwright/test';
|
||||
|
||||
export default defineConfig({
|
||||
// Global test timeout: 60 seconds
|
||||
timeout: 60000,
|
||||
|
||||
use: {
|
||||
// Action timeout: 15 seconds (click, fill, etc.)
|
||||
actionTimeout: 15000,
|
||||
|
||||
// Navigation timeout: 30 seconds (page.goto, page.reload)
|
||||
navigationTimeout: 30000,
|
||||
},
|
||||
|
||||
// Expect timeout: 10 seconds (all assertions)
|
||||
expect: {
|
||||
timeout: 10000,
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
```typescript
|
||||
// playwright/support/fixtures/timeout-fixture.ts - Timeout override fixture
|
||||
import { test as base } from '@playwright/test';
|
||||
|
||||
type TimeoutOptions = {
|
||||
extendedTimeout: (timeoutMs: number) => Promise<void>;
|
||||
};
|
||||
|
||||
export const test = base.extend<TimeoutOptions>({
|
||||
extendedTimeout: async ({}, use, testInfo) => {
|
||||
const originalTimeout = testInfo.timeout;
|
||||
|
||||
await use(async (timeoutMs: number) => {
|
||||
testInfo.setTimeout(timeoutMs);
|
||||
});
|
||||
|
||||
// Restore original timeout after test
|
||||
testInfo.setTimeout(originalTimeout);
|
||||
},
|
||||
});
|
||||
|
||||
export { expect } from '@playwright/test';
|
||||
```
|
||||
|
||||
```typescript
|
||||
// Usage in tests - Standard timeouts (implicit)
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test('user can log in', async ({ page }) => {
|
||||
await page.goto('/login'); // Uses 30s navigation timeout
|
||||
await page.fill('[data-testid="email"]', 'test@example.com'); // Uses 15s action timeout
|
||||
await page.click('[data-testid="login-button"]'); // Uses 15s action timeout
|
||||
|
||||
await expect(page.getByText('Welcome')).toBeVisible(); // Uses 10s expect timeout
|
||||
});
|
||||
```
|
||||
|
||||
```typescript
|
||||
// Usage in tests - Per-test timeout override
|
||||
import { test, expect } from '../support/fixtures/timeout-fixture';
|
||||
|
||||
test('slow data processing operation', async ({ page, extendedTimeout }) => {
|
||||
// Override default 60s timeout for this slow test
|
||||
await extendedTimeout(180000); // 3 minutes
|
||||
|
||||
await page.goto('/data-processing');
|
||||
await page.click('[data-testid="process-large-file"]');
|
||||
|
||||
// Wait for long-running operation
|
||||
await expect(page.getByText('Processing complete')).toBeVisible({
|
||||
timeout: 120000, // 2 minutes for assertion
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
```typescript
|
||||
// Per-assertion timeout override (inline)
|
||||
test('API returns quickly', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Override expect timeout for fast API (reduce flakiness detection)
|
||||
await expect(page.getByTestId('user-name')).toBeVisible({ timeout: 5000 }); // 5s instead of 10s
|
||||
|
||||
// Override expect timeout for slow external API
|
||||
await expect(page.getByTestId('weather-widget')).toBeVisible({ timeout: 20000 }); // 20s instead of 10s
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Standardized timeouts**: action 15s, navigation 30s, expect 10s, test 60s (global defaults)
|
||||
- Fixture-based override (`extendedTimeout`) for slow tests (preferred over inline)
|
||||
- Per-assertion timeout override via `{ timeout: X }` option (use sparingly)
|
||||
- Avoid hard waits (`page.waitForTimeout(3000)`) - use event-based waits instead
|
||||
- CI environments may need longer timeouts (handle in environment-specific config)
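
A sketch of that CI-specific handling, layered into an environment config rather than scattered through tests (the multipliers are illustrative):

```typescript
// Hedged sketch: loosen timeouts only when CI is set, keeping local runs fast
import { defineConfig } from '@playwright/test';
import { baseConfig } from './base.config';

export default defineConfig({
  ...baseConfig,
  timeout: process.env.CI ? 90000 : 60000, // whole-test budget
  expect: { timeout: process.env.CI ? 15000 : 10000 },
  use: {
    ...baseConfig.use,
    actionTimeout: process.env.CI ? 20000 : 15000,
    navigationTimeout: process.env.CI ? 45000 : 30000,
  },
});
```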
|
||||
|
||||
### Example 3: Artifact Output Configuration
|
||||
|
||||
**Context**: When debugging failures in CI, configure artifacts (screenshots, videos, traces, HTML reports) to be captured on failure and stored in consistent locations for upload.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright.config.ts - Artifact configuration
|
||||
import { defineConfig } from '@playwright/test';
|
||||
import path from 'path';
|
||||
|
||||
export default defineConfig({
|
||||
// Output directory for test artifacts
|
||||
outputDir: path.resolve(__dirname, './test-results'),
|
||||
|
||||
use: {
|
||||
// Screenshot on failure only (saves space)
|
||||
screenshot: 'only-on-failure',
|
||||
|
||||
// Video recording on failure + retry
|
||||
video: 'retain-on-failure',
|
||||
|
||||
// Trace recording on first retry (best debugging data)
|
||||
trace: 'on-first-retry',
|
||||
},
|
||||
|
||||
reporter: [
|
||||
// HTML report (visual, interactive)
|
||||
[
|
||||
'html',
|
||||
{
|
||||
outputFolder: 'playwright-report',
|
||||
open: 'never', // Don't auto-open in CI
|
||||
},
|
||||
],
|
||||
|
||||
// JUnit XML (CI integration)
|
||||
[
|
||||
'junit',
|
||||
{
|
||||
outputFile: 'test-results/results.xml',
|
||||
},
|
||||
],
|
||||
|
||||
// List reporter (console output)
|
||||
['list'],
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
```typescript
|
||||
// playwright/support/fixtures/artifact-fixture.ts - Custom artifact capture
|
||||
import { test as base } from '@playwright/test';
|
||||
import fs from 'fs';
|
||||
import path from 'path';
|
||||
|
||||
export const test = base.extend({
|
||||
// Auto-capture console logs on failure
|
||||
page: async ({ page }, use, testInfo) => {
|
||||
const logs: string[] = [];
|
||||
|
||||
page.on('console', (msg) => {
|
||||
logs.push(`[${msg.type()}] ${msg.text()}`);
|
||||
});
|
||||
|
||||
await use(page);
|
||||
|
||||
// Save logs on failure
|
||||
if (testInfo.status !== testInfo.expectedStatus) {
|
||||
const logsPath = path.join(testInfo.outputDir, 'console-logs.txt');
|
||||
fs.writeFileSync(logsPath, logs.join('\n'));
|
||||
      await testInfo.attach('console-logs', {
        contentType: 'text/plain',
        path: logsPath,
      });
|
||||
}
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
```yaml
|
||||
# .github/workflows/e2e.yml - CI artifact upload
|
||||
name: E2E Tests
|
||||
on: [push, pull_request]
|
||||
|
||||
jobs:
|
||||
test:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: '.nvmrc'
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Install Playwright browsers
|
||||
run: npx playwright install --with-deps
|
||||
|
||||
- name: Run tests
|
||||
run: npm run test
|
||||
env:
|
||||
TEST_ENV: staging
|
||||
|
||||
# Upload test artifacts on failure
|
||||
- name: Upload test results
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-results
|
||||
path: test-results/
|
||||
retention-days: 30
|
||||
|
||||
- name: Upload Playwright report
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: playwright-report
|
||||
path: playwright-report/
|
||||
retention-days: 30
|
||||
```
|
||||
|
||||
```typescript
|
||||
// Example: Custom screenshot on specific condition
|
||||
test('capture screenshot on specific error', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
try {
|
||||
await page.click('[data-testid="submit-payment"]');
|
||||
await expect(page.getByText('Order Confirmed')).toBeVisible();
|
||||
} catch (error) {
|
||||
// Capture custom screenshot with timestamp
|
||||
await page.screenshot({
|
||||
path: `test-results/payment-error-${Date.now()}.png`,
|
||||
fullPage: true,
|
||||
});
|
||||
throw error;
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:

- `screenshot: 'only-on-failure'` saves space (not every test)
- `video: 'retain-on-failure'` captures full flow on failures
- `trace: 'on-first-retry'` provides deep debugging data (network, DOM, console)
- HTML report at `playwright-report/` (visual debugging)
- JUnit XML at `test-results/results.xml` (CI integration)
- CI uploads artifacts on failure with 30-day retention
- Custom fixtures can capture console logs, network logs, etc. (a network-log sketch follows)

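The same fixture pattern extends beyond console output. A minimal sketch of a network-log variant (hypothetical file name; the recorded fields are illustrative):

```typescript
// playwright/support/fixtures/network-fixture.ts - hypothetical network-log capture
import { test as base } from '@playwright/test';
import fs from 'fs';
import path from 'path';

export const test = base.extend({
  page: async ({ page }, use, testInfo) => {
    const requests: string[] = [];

    // Record one line per completed response: status, method, URL
    page.on('response', (response) => {
      requests.push(`${response.status()} ${response.request().method()} ${response.url()}`);
    });

    await use(page);

    // Attach the network log only when the test did not end with its expected status
    if (testInfo.status !== testInfo.expectedStatus) {
      const logPath = path.join(testInfo.outputDir, 'network-logs.txt');
      fs.writeFileSync(logPath, requests.join('\n'));
      testInfo.attachments.push({ name: 'network-logs', contentType: 'text/plain', path: logPath });
    }
  },
});
```
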
### Example 4: Parallelization Configuration

**Context**: When tests run slowly in CI, configure parallelization with worker count, sharding, and fully parallel execution to maximize speed while maintaining stability.

**Implementation**:

```typescript
// playwright.config.ts - Parallelization settings
import { defineConfig } from '@playwright/test';
import os from 'os';

export default defineConfig({
  // Run tests in parallel within single file
  fullyParallel: true,

  // Worker configuration
  workers: process.env.CI
    ? 1 // Serial in CI for stability (or 2 for faster CI)
    : os.cpus().length - 1, // Parallel locally (leave 1 CPU for OS)

  // Prevent accidentally committed .only() from blocking CI
  forbidOnly: !!process.env.CI,

  // Retry failed tests in CI
  retries: process.env.CI ? 2 : 0,

  // Shard configuration (split tests across multiple machines)
  shard:
    process.env.SHARD_INDEX && process.env.SHARD_TOTAL
      ? {
          current: parseInt(process.env.SHARD_INDEX, 10),
          total: parseInt(process.env.SHARD_TOTAL, 10),
        }
      : undefined,
});
```

```yaml
# .github/workflows/e2e-parallel.yml - Sharded CI execution
name: E2E Tests (Parallel)
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4] # Split tests across 4 machines
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version-file: '.nvmrc'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps

      - name: Run tests (shard ${{ matrix.shard }})
        run: npm run test
        env:
          SHARD_INDEX: ${{ matrix.shard }}
          SHARD_TOTAL: 4
          TEST_ENV: staging

      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: test-results-shard-${{ matrix.shard }}
          path: test-results/
```

```typescript
// playwright/config/serial.config.ts - Serial execution for flaky tests
import { defineConfig } from '@playwright/test';
import { baseConfig } from './base.config';

export default defineConfig({
  ...baseConfig,

  // Disable parallel execution
  fullyParallel: false,
  workers: 1,

  // Used for: authentication flows, database-dependent tests, feature flag tests
});
```

```typescript
// Usage: Force serial execution for specific tests
import { test } from '@playwright/test';

// Serial execution for auth tests (shared session state)
test.describe.configure({ mode: 'serial' });

test.describe('Authentication Flow', () => {
  test('user can log in', async ({ page }) => {
    // First test in serial block
  });

  test('user can access dashboard', async ({ page }) => {
    // Depends on previous test (serial)
  });
});
```

```typescript
// Usage: Parallel execution for independent tests (default)
import { test } from '@playwright/test';

test.describe('Product Catalog', () => {
  test('can view product 1', async ({ page }) => {
    // Runs in parallel with other tests
  });

  test('can view product 2', async ({ page }) => {
    // Runs in parallel with other tests
  });
});
```

**Key Points**:

- `fullyParallel: true` enables parallel execution within a single test file
- Workers: 1 in CI (stability), N-1 CPUs locally (speed)
- Sharding splits tests across multiple CI machines (4x faster with 4 shards; see the shard-reproduction sketch below)
- `test.describe.configure({ mode: 'serial' })` for dependent tests
- `forbidOnly: true` in CI prevents `.only()` from blocking the pipeline
- Matrix strategy in CI runs shards concurrently

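Because the config above reads `SHARD_INDEX` and `SHARD_TOTAL` from the environment, a single CI shard can also be reproduced locally; a small sketch (assuming the `test` npm script wraps `playwright test`):

```bash
# Reproduce CI shard 2 of 4 locally via the env vars the config reads
SHARD_INDEX=2 SHARD_TOTAL=4 npm run test

# Equivalent using Playwright's built-in flag instead of the env-var plumbing
npx playwright test --shard=2/4
```
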
### Example 5: Project Configuration

**Context**: When testing across multiple browsers, devices, or configurations, use Playwright projects to run the same tests against different environments (chromium, firefox, webkit, mobile).

**Implementation**:

```typescript
// playwright.config.ts - Multiple browser projects
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    // Desktop browsers
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
    {
      name: 'firefox',
      use: { ...devices['Desktop Firefox'] },
    },
    {
      name: 'webkit',
      use: { ...devices['Desktop Safari'] },
    },

    // Mobile browsers
    {
      name: 'mobile-chrome',
      use: { ...devices['Pixel 5'] },
    },
    {
      name: 'mobile-safari',
      use: { ...devices['iPhone 13'] },
    },

    // Tablet
    {
      name: 'tablet',
      use: { ...devices['iPad Pro'] },
    },
  ],
});
```

```typescript
// playwright.config.ts - Authenticated vs. unauthenticated projects
import { defineConfig } from '@playwright/test';
import path from 'path';

export default defineConfig({
  projects: [
    // Setup project (runs first, creates auth state)
    {
      name: 'setup',
      testMatch: /global-setup\.ts/,
    },

    // Authenticated tests (reuse auth state)
    {
      name: 'authenticated',
      dependencies: ['setup'],
      use: {
        storageState: path.resolve(__dirname, './playwright/.auth/user.json'),
      },
      testMatch: /.*authenticated\.spec\.ts/,
    },

    // Unauthenticated tests (public pages)
    {
      name: 'unauthenticated',
      testMatch: /.*unauthenticated\.spec\.ts/,
    },
  ],
});
```

```typescript
// playwright/support/global-setup.ts - Setup project for auth
import { chromium, FullConfig } from '@playwright/test';
import path from 'path';

async function globalSetup(config: FullConfig) {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Perform authentication
  await page.goto('http://localhost:3000/login');
  await page.fill('[data-testid="email"]', 'test@example.com');
  await page.fill('[data-testid="password"]', 'password123');
  await page.click('[data-testid="login-button"]');

  // Wait for authentication to complete
  await page.waitForURL('**/dashboard');

  // Save authentication state
  await page.context().storageState({
    path: path.resolve(__dirname, '../.auth/user.json'),
  });

  await browser.close();
}

export default globalSetup;
```

```bash
# Run specific project
npx playwright test --project=chromium
npx playwright test --project=mobile-chrome
npx playwright test --project=authenticated

# Run multiple projects
npx playwright test --project=chromium --project=firefox

# Run all projects (default)
npx playwright test
```

```typescript
// Usage: Project-specific test
import { test, expect } from '@playwright/test';

test('mobile navigation works', async ({ page, isMobile }) => {
  await page.goto('/');

  if (isMobile) {
    // Open mobile menu
    await page.click('[data-testid="hamburger-menu"]');
  }

  await page.click('[data-testid="products-link"]');
  await expect(page).toHaveURL(/.*products/);
});
```

```yaml
# .github/workflows/e2e-cross-browser.yml - CI cross-browser testing
name: E2E Tests (Cross-Browser)
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        project: [chromium, firefox, webkit, mobile-chrome]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx playwright install --with-deps

      - name: Run tests (${{ matrix.project }})
        run: npx playwright test --project=${{ matrix.project }}
```

**Key Points**:

- Projects enable testing across browsers, devices, and configurations
- `devices` from `@playwright/test` provides preset configurations (Pixel 5, iPhone 13, etc.)
- `dependencies` ensures the setup project runs first (auth, data seeding)
- `storageState` shares authentication across tests, so no test spends time logging in (see the spec sketch below)
- `testMatch` filters which tests run in which project
- CI matrix strategy runs projects in parallel (4x faster with 4 projects)
- `isMobile` context property for conditional logic in tests

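Because the `authenticated` project loads `storageState`, its specs can assume an existing session; a minimal sketch (hypothetical spec file, the asserted text is a placeholder):

```typescript
// tests/e2e/dashboard.authenticated.spec.ts - hypothetical spec picked up by the authenticated project
import { test, expect } from '@playwright/test';

test('dashboard loads for a signed-in user', async ({ page }) => {
  // No login steps: the project-level storageState already carries the session
  await page.goto('/dashboard');
  await expect(page.getByText('Welcome back')).toBeVisible();
});
```
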
## Integration Points

- **Used in workflows**: `*framework` (config setup), `*ci` (parallelization, artifact upload)
- **Related fragments**:
  - `fixture-architecture.md` - Fixture-based timeout overrides
  - `ci-burn-in.md` - CI pipeline artifact upload
  - `test-quality.md` - Timeout standards (no hard waits)
  - `data-factories.md` - Per-test isolation (no shared global state)

## Configuration Checklist

**Before deploying tests, verify**:

- [ ] Environment config map with fail-fast validation (sketched below)
- [ ] Standardized timeouts (action 15s, navigation 30s, expect 10s, test 60s)
- [ ] Artifact storage at `test-results/` and `playwright-report/`
- [ ] HTML + JUnit reporters configured
- [ ] `.env.example`, `.nvmrc`, browser versions committed
- [ ] Parallelization configured (workers, sharding)
- [ ] Projects defined for cross-browser/device testing (if needed)
- [ ] CI uploads artifacts on failure with 30-day retention

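For the first checklist item, a minimal sketch of a fail-fast environment map (module path and variable names are assumptions):

```typescript
// playwright/config/env.ts - hypothetical fail-fast environment map
const ENVIRONMENTS = {
  local: { baseURL: 'http://localhost:3000' },
  staging: { baseURL: 'https://staging.example.com' },
} as const;

type EnvName = keyof typeof ENVIRONMENTS;
const envName = (process.env.TEST_ENV ?? 'local') as EnvName;

// Fail fast: abort config load on an unknown environment instead of silently testing the wrong target
if (!(envName in ENVIRONMENTS)) {
  throw new Error(`Unknown TEST_ENV "${process.env.TEST_ENV}". Expected one of: ${Object.keys(ENVIRONMENTS).join(', ')}`);
}

export const env = ENVIRONMENTS[envName];
```
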
_Source: Playwright book repo, SEON configuration example, Murat testing philosophy (lines 216-271)._

@@ -1,17 +1,601 @@

# Probability and Impact Scale

- **Probability**
  - 1 – Unlikely: standard implementation, low uncertainty.
  - 2 – Possible: edge cases or partial unknowns worth investigation.
  - 3 – Likely: known issues, new integrations, or high ambiguity.
- **Impact**
  - 1 – Minor: cosmetic issues or easy workarounds.
  - 2 – Degraded: partial feature loss or manual workaround required.
  - 3 – Critical: blockers, data/security/regulatory exposure.
- Multiply probability × impact to derive the risk score (worked example below).
  - 1–3: document for awareness.
  - 4–5: monitor closely, plan mitigations.
  - 6–8: CONCERNS at the gate until mitigations are implemented.
  - 9: automatic gate FAIL until resolved or formally waived.

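A quick worked example of the scale (values are illustrative):

```typescript
// Worked example: a new third-party integration on the checkout path
const probability = 2; // Possible - partial unknowns in the integration
const impact = 3; // Critical - revenue path and payment data exposure
const score = probability * impact; // 6 → CONCERNS at the gate until mitigated
```
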
## Principle

_Source: Murat risk model summary._

Risk scoring uses a **probability × impact** matrix (1-9 scale) to prioritize testing efforts. Higher scores (6-9) demand immediate action; lower scores (1-3) require documentation only. This systematic approach ensures testing resources focus on the highest-value risks.

## Rationale

**The Problem**: Without quantifiable risk assessment, teams over-test low-value scenarios while missing critical risks. Gut feeling leads to inconsistent prioritization and missed edge cases.

**The Solution**: Standardize risk evaluation with a 3×3 matrix (probability: 1-3, impact: 1-3). Multiply to derive the risk score (1-9). Automate classification (DOCUMENT, MONITOR, MITIGATE, BLOCK) based on thresholds. This approach surfaces hidden risks early and justifies testing decisions to stakeholders.

**Why This Matters**:

- Consistent risk language across product, engineering, and QA
- Objective prioritization of test scenarios (not politics)
- Automatic gate decisions (score=9 → FAIL until resolved)
- Audit trail for compliance and retrospectives

## Pattern Examples

### Example 1: Probability-Impact Matrix Implementation (Automated Classification)

**Context**: Implement a reusable risk scoring system with automatic threshold classification

**Implementation**:

```typescript
|
||||
// src/testing/risk-matrix.ts
|
||||
|
||||
/**
|
||||
* Probability levels:
|
||||
* 1 = Unlikely (standard implementation, low uncertainty)
|
||||
* 2 = Possible (edge cases or partial unknowns)
|
||||
* 3 = Likely (known issues, new integrations, high ambiguity)
|
||||
*/
|
||||
export type Probability = 1 | 2 | 3;
|
||||
|
||||
/**
|
||||
* Impact levels:
|
||||
* 1 = Minor (cosmetic issues or easy workarounds)
|
||||
* 2 = Degraded (partial feature loss or manual workaround)
|
||||
* 3 = Critical (blockers, data/security/regulatory exposure)
|
||||
*/
|
||||
export type Impact = 1 | 2 | 3;
|
||||
|
||||
/**
|
||||
* Risk score (probability × impact): 1-9
|
||||
*/
|
||||
export type RiskScore = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9;
|
||||
|
||||
/**
|
||||
* Action categories based on risk score thresholds
|
||||
*/
|
||||
export type RiskAction = 'DOCUMENT' | 'MONITOR' | 'MITIGATE' | 'BLOCK';
|
||||
|
||||
export type RiskAssessment = {
|
||||
probability: Probability;
|
||||
impact: Impact;
|
||||
score: RiskScore;
|
||||
action: RiskAction;
|
||||
reasoning: string;
|
||||
};
|
||||
|
||||
/**
|
||||
* Calculate risk score: probability × impact
|
||||
*/
|
||||
export function calculateRiskScore(probability: Probability, impact: Impact): RiskScore {
|
||||
return (probability * impact) as RiskScore;
|
||||
}
|
||||
|
||||
/**
|
||||
* Classify risk action based on score thresholds:
|
||||
* - 1-3: DOCUMENT (awareness only)
|
||||
* - 4-5: MONITOR (watch closely, plan mitigations)
|
||||
* - 6-8: MITIGATE (CONCERNS at gate until mitigated)
|
||||
* - 9: BLOCK (automatic FAIL until resolved or waived)
|
||||
*/
|
||||
export function classifyRiskAction(score: RiskScore): RiskAction {
|
||||
if (score >= 9) return 'BLOCK';
|
||||
if (score >= 6) return 'MITIGATE';
|
||||
if (score >= 4) return 'MONITOR';
|
||||
return 'DOCUMENT';
|
||||
}
|
||||
|
||||
/**
|
||||
* Full risk assessment with automatic classification
|
||||
*/
|
||||
export function assessRisk(params: { probability: Probability; impact: Impact; reasoning: string }): RiskAssessment {
|
||||
const { probability, impact, reasoning } = params;
|
||||
|
||||
const score = calculateRiskScore(probability, impact);
|
||||
const action = classifyRiskAction(score);
|
||||
|
||||
return { probability, impact, score, action, reasoning };
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate risk matrix visualization (3x3 grid)
|
||||
* Returns markdown table with color-coded scores
|
||||
*/
|
||||
export function generateRiskMatrix(): string {
|
||||
const matrix: string[][] = [];
|
||||
const header = ['Impact \\ Probability', 'Unlikely (1)', 'Possible (2)', 'Likely (3)'];
|
||||
matrix.push(header);
|
||||
|
||||
const impactLabels = ['Critical (3)', 'Degraded (2)', 'Minor (1)'];
|
||||
for (let impact = 3; impact >= 1; impact--) {
|
||||
const row = [impactLabels[3 - impact]];
|
||||
for (let probability = 1; probability <= 3; probability++) {
|
||||
const score = calculateRiskScore(probability as Probability, impact as Impact);
|
||||
const action = classifyRiskAction(score);
|
||||
const emoji = action === 'BLOCK' ? '🔴' : action === 'MITIGATE' ? '🟠' : action === 'MONITOR' ? '🟡' : '🟢';
|
||||
row.push(`${emoji} ${score}`);
|
||||
}
|
||||
matrix.push(row);
|
||||
}
|
||||
|
||||
return matrix.map((row) => `| ${row.join(' | ')} |`).join('\n');
|
||||
}
|
||||
```

**Key Points**:

- Type-safe probability/impact (1-3 enforced at compile time)
- Automatic action classification (DOCUMENT, MONITOR, MITIGATE, BLOCK)
- Visual matrix generation for documentation
- Risk score formula: `probability * impact` (max = 9)
- Threshold-based decision rules (6-8 = MITIGATE, 9 = BLOCK)

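A usage sketch of the helpers above (import path assumed relative to `src/testing/`):

```typescript
import { assessRisk, generateRiskMatrix } from './risk-matrix';

// Assess a single scenario and render the 3x3 matrix for the report
const risk = assessRisk({
  probability: 2,
  impact: 3,
  reasoning: 'New third-party integration on the checkout path',
});

console.log(risk.score, risk.action); // 6 'MITIGATE'
console.log(generateRiskMatrix()); // markdown table with color-coded scores
```
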
---

### Example 2: Risk Assessment Workflow (Test Planning Integration)

**Context**: Apply risk matrix during test design to prioritize scenarios

**Implementation**:

```typescript
|
||||
// tests/e2e/test-planning/risk-assessment.ts
|
||||
import { assessRisk, generateRiskMatrix, type RiskAssessment } from '../../../src/testing/risk-matrix';
|
||||
|
||||
export type TestScenario = {
|
||||
id: string;
|
||||
title: string;
|
||||
feature: string;
|
||||
risk: RiskAssessment;
|
||||
testLevel: 'E2E' | 'API' | 'Unit';
|
||||
priority: 'P0' | 'P1' | 'P2' | 'P3';
|
||||
owner: string;
|
||||
};
|
||||
|
||||
/**
|
||||
* Assess test scenarios and auto-assign priority based on risk score
|
||||
*/
|
||||
export function assessTestScenarios(scenarios: Omit<TestScenario, 'risk' | 'priority'>[]): TestScenario[] {
|
||||
return scenarios.map((scenario) => {
|
||||
// Auto-assign priority based on risk score
|
||||
const priority = mapRiskToPriority(scenario.risk.score);
|
||||
return { ...scenario, priority };
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Map risk score to test priority (P0-P3)
|
||||
* P0: Critical (score 9) - blocks release
|
||||
* P1: High (score 6-8) - must fix before release
|
||||
* P2: Medium (score 4-5) - fix if time permits
|
||||
* P3: Low (score 1-3) - document and defer
|
||||
*/
|
||||
function mapRiskToPriority(score: number): 'P0' | 'P1' | 'P2' | 'P3' {
|
||||
if (score === 9) return 'P0';
|
||||
if (score >= 6) return 'P1';
|
||||
if (score >= 4) return 'P2';
|
||||
return 'P3';
|
||||
}
|
||||
|
||||
/**
|
||||
* Example: Payment flow risk assessment
|
||||
*/
|
||||
export const paymentScenarios: Array<Omit<TestScenario, 'priority'>> = [
|
||||
{
|
||||
id: 'PAY-001',
|
||||
title: 'Valid credit card payment completes successfully',
|
||||
feature: 'Checkout',
|
||||
risk: assessRisk({
|
||||
probability: 2, // Possible (standard Stripe integration)
|
||||
impact: 3, // Critical (revenue loss if broken)
|
||||
reasoning: 'Core revenue flow, but Stripe is well-tested',
|
||||
}),
|
||||
testLevel: 'E2E',
|
||||
owner: 'qa-team',
|
||||
},
|
||||
{
|
||||
id: 'PAY-002',
|
||||
title: 'Expired credit card shows user-friendly error',
|
||||
feature: 'Checkout',
|
||||
risk: assessRisk({
|
||||
probability: 3, // Likely (edge case handling often buggy)
|
||||
impact: 2, // Degraded (users see error, but can retry)
|
||||
reasoning: 'Error handling logic is custom and complex',
|
||||
}),
|
||||
testLevel: 'E2E',
|
||||
owner: 'qa-team',
|
||||
},
|
||||
{
|
||||
id: 'PAY-003',
|
||||
title: 'Payment confirmation email formatting is correct',
|
||||
feature: 'Email',
|
||||
risk: assessRisk({
|
||||
probability: 2, // Possible (template changes occasionally break)
|
||||
impact: 1, // Minor (cosmetic issue, email still sent)
|
||||
reasoning: 'Non-blocking, users get email regardless',
|
||||
}),
|
||||
testLevel: 'Unit',
|
||||
owner: 'dev-team',
|
||||
},
|
||||
{
|
||||
id: 'PAY-004',
|
||||
title: 'Payment fails gracefully when Stripe is down',
|
||||
feature: 'Checkout',
|
||||
risk: assessRisk({
|
||||
probability: 1, // Unlikely (Stripe has 99.99% uptime)
|
||||
impact: 3, // Critical (complete checkout failure)
|
||||
reasoning: 'Rare but catastrophic, requires retry mechanism',
|
||||
}),
|
||||
testLevel: 'API',
|
||||
owner: 'qa-team',
|
||||
},
|
||||
];
|
||||
|
||||
/**
|
||||
* Generate risk assessment report with priority distribution
|
||||
*/
|
||||
export function generateRiskReport(scenarios: TestScenario[]): string {
|
||||
const priorityCounts = scenarios.reduce(
|
||||
(acc, s) => {
|
||||
acc[s.priority] = (acc[s.priority] || 0) + 1;
|
||||
return acc;
|
||||
},
|
||||
{} as Record<string, number>,
|
||||
);
|
||||
|
||||
const actionCounts = scenarios.reduce(
|
||||
(acc, s) => {
|
||||
acc[s.risk.action] = (acc[s.risk.action] || 0) + 1;
|
||||
return acc;
|
||||
},
|
||||
{} as Record<string, number>,
|
||||
);
|
||||
|
||||
return `
|
||||
# Risk Assessment Report
|
||||
|
||||
## Risk Matrix
|
||||
${generateRiskMatrix()}
|
||||
|
||||
## Priority Distribution
|
||||
- **P0 (Blocker)**: ${priorityCounts.P0 || 0} scenarios
|
||||
- **P1 (High)**: ${priorityCounts.P1 || 0} scenarios
|
||||
- **P2 (Medium)**: ${priorityCounts.P2 || 0} scenarios
|
||||
- **P3 (Low)**: ${priorityCounts.P3 || 0} scenarios
|
||||
|
||||
## Action Required
|
||||
- **BLOCK**: ${actionCounts.BLOCK || 0} scenarios (auto-fail gate)
|
||||
- **MITIGATE**: ${actionCounts.MITIGATE || 0} scenarios (concerns at gate)
|
||||
- **MONITOR**: ${actionCounts.MONITOR || 0} scenarios (watch closely)
|
||||
- **DOCUMENT**: ${actionCounts.DOCUMENT || 0} scenarios (awareness only)
|
||||
|
||||
## Scenarios by Risk Score (Highest First)
|
||||
${scenarios
|
||||
.sort((a, b) => b.risk.score - a.risk.score)
|
||||
.map((s) => `- **[${s.priority}]** ${s.id}: ${s.title} (Score: ${s.risk.score} - ${s.risk.action})`)
|
||||
.join('\n')}
|
||||
`.trim();
|
||||
}
|
||||
```

**Key Points**:

- Risk score → Priority mapping (P0-P3 automated)
- Report generation with priority/action distribution
- Scenarios sorted by risk score (highest first)
- Visual matrix included in reports
- Reusable across projects (extract to shared library)

---

### Example 3: Dynamic Risk Re-Assessment (Continuous Evaluation)

**Context**: Recalculate risk scores as the project evolves (requirements change, mitigations implemented)

**Implementation**:

```typescript
|
||||
// src/testing/risk-tracking.ts
|
||||
import { type RiskAssessment, assessRisk, type Probability, type Impact } from './risk-matrix';
|
||||
|
||||
export type RiskHistory = {
|
||||
timestamp: Date;
|
||||
assessment: RiskAssessment;
|
||||
changedBy: string;
|
||||
reason: string;
|
||||
};
|
||||
|
||||
export type TrackedRisk = {
|
||||
id: string;
|
||||
title: string;
|
||||
feature: string;
|
||||
currentRisk: RiskAssessment;
|
||||
history: RiskHistory[];
|
||||
mitigations: string[];
|
||||
status: 'OPEN' | 'MITIGATED' | 'WAIVED' | 'RESOLVED';
|
||||
};
|
||||
|
||||
export class RiskTracker {
|
||||
private risks: Map<string, TrackedRisk> = new Map();
|
||||
|
||||
/**
|
||||
* Add new risk to tracker
|
||||
*/
|
||||
addRisk(params: {
|
||||
id: string;
|
||||
title: string;
|
||||
feature: string;
|
||||
probability: Probability;
|
||||
impact: Impact;
|
||||
reasoning: string;
|
||||
changedBy: string;
|
||||
}): TrackedRisk {
|
||||
const { id, title, feature, probability, impact, reasoning, changedBy } = params;
|
||||
|
||||
const assessment = assessRisk({ probability, impact, reasoning });
|
||||
|
||||
const risk: TrackedRisk = {
|
||||
id,
|
||||
title,
|
||||
feature,
|
||||
currentRisk: assessment,
|
||||
history: [
|
||||
{
|
||||
timestamp: new Date(),
|
||||
assessment,
|
||||
changedBy,
|
||||
reason: 'Initial assessment',
|
||||
},
|
||||
],
|
||||
mitigations: [],
|
||||
status: 'OPEN',
|
||||
};
|
||||
|
||||
this.risks.set(id, risk);
|
||||
return risk;
|
||||
}
|
||||
|
||||
/**
|
||||
* Reassess risk (probability or impact changed)
|
||||
*/
|
||||
reassessRisk(params: {
|
||||
id: string;
|
||||
probability?: Probability;
|
||||
impact?: Impact;
|
||||
reasoning: string;
|
||||
changedBy: string;
|
||||
}): TrackedRisk | null {
|
||||
const { id, probability, impact, reasoning, changedBy } = params;
|
||||
const risk = this.risks.get(id);
|
||||
if (!risk) return null;
|
||||
|
||||
// Use existing values if not provided
|
||||
const newProbability = probability ?? risk.currentRisk.probability;
|
||||
const newImpact = impact ?? risk.currentRisk.impact;
|
||||
|
||||
const newAssessment = assessRisk({
|
||||
probability: newProbability,
|
||||
impact: newImpact,
|
||||
reasoning,
|
||||
});
|
||||
|
||||
risk.currentRisk = newAssessment;
|
||||
risk.history.push({
|
||||
timestamp: new Date(),
|
||||
assessment: newAssessment,
|
||||
changedBy,
|
||||
reason: reasoning,
|
||||
});
|
||||
|
||||
this.risks.set(id, risk);
|
||||
return risk;
|
||||
}
|
||||
|
||||
/**
|
||||
* Mark risk as mitigated (probability reduced)
|
||||
*/
|
||||
mitigateRisk(params: { id: string; newProbability: Probability; mitigation: string; changedBy: string }): TrackedRisk | null {
|
||||
const { id, newProbability, mitigation, changedBy } = params;
|
||||
const risk = this.reassessRisk({
|
||||
id,
|
||||
probability: newProbability,
|
||||
reasoning: `Mitigation implemented: ${mitigation}`,
|
||||
changedBy,
|
||||
});
|
||||
|
||||
if (risk) {
|
||||
risk.mitigations.push(mitigation);
|
||||
if (risk.currentRisk.action === 'DOCUMENT' || risk.currentRisk.action === 'MONITOR') {
|
||||
risk.status = 'MITIGATED';
|
||||
}
|
||||
}
|
||||
|
||||
return risk;
|
||||
}
|
||||
|
||||
/**
|
||||
* Get risks requiring action (MITIGATE or BLOCK)
|
||||
*/
|
||||
getRisksRequiringAction(): TrackedRisk[] {
|
||||
return Array.from(this.risks.values()).filter(
|
||||
(r) => r.status === 'OPEN' && (r.currentRisk.action === 'MITIGATE' || r.currentRisk.action === 'BLOCK'),
|
||||
);
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate risk trend report (show changes over time)
|
||||
*/
|
||||
generateTrendReport(riskId: string): string | null {
|
||||
const risk = this.risks.get(riskId);
|
||||
if (!risk) return null;
|
||||
|
||||
return `
|
||||
# Risk Trend Report: ${risk.id}
|
||||
|
||||
**Title**: ${risk.title}
|
||||
**Feature**: ${risk.feature}
|
||||
**Status**: ${risk.status}
|
||||
|
||||
## Current Assessment
|
||||
- **Probability**: ${risk.currentRisk.probability}
|
||||
- **Impact**: ${risk.currentRisk.impact}
|
||||
- **Score**: ${risk.currentRisk.score}
|
||||
- **Action**: ${risk.currentRisk.action}
|
||||
- **Reasoning**: ${risk.currentRisk.reasoning}
|
||||
|
||||
## Mitigations Applied
|
||||
${risk.mitigations.length > 0 ? risk.mitigations.map((m) => `- ${m}`).join('\n') : '- None'}
|
||||
|
||||
## History (${risk.history.length} changes)
|
||||
${risk.history
|
||||
.reverse()
|
||||
.map((h) => `- **${h.timestamp.toISOString()}** by ${h.changedBy}: Score ${h.assessment.score} (${h.assessment.action}) - ${h.reason}`)
|
||||
.join('\n')}
|
||||
`.trim();
|
||||
}
|
||||
}
|
||||
```

**Key Points**:

- Historical tracking (audit trail for risk changes)
- Mitigation impact tracking (probability reduction)
- Status lifecycle (OPEN → MITIGATED → RESOLVED)
- Trend reports (show risk evolution over time)
- Re-assessment triggers (requirements change, new info)

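A usage sketch of `RiskTracker` (identifiers and values are illustrative; import path assumed):

```typescript
import { RiskTracker } from './risk-tracking';

const tracker = new RiskTracker();

// Register a risk, record a mitigation, then review its history
tracker.addRisk({
  id: 'RISK-042',
  title: 'Checkout retries can double-charge on timeout',
  feature: 'Checkout',
  probability: 3,
  impact: 3,
  reasoning: 'No idempotency key on the payment call',
  changedBy: 'qa-team',
});

tracker.mitigateRisk({
  id: 'RISK-042',
  newProbability: 1,
  mitigation: 'Idempotency keys added to payment requests',
  changedBy: 'dev-team',
});

console.log(tracker.generateTrendReport('RISK-042')); // markdown trend report with full history
```
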
---

### Example 4: Risk Matrix in Gate Decision (Integration with Trace Workflow)

**Context**: Use probability-impact scores to drive gate decisions (PASS/CONCERNS/FAIL/WAIVED)

**Implementation**:

```typescript
|
||||
// src/testing/gate-decision.ts
|
||||
import { type RiskScore, classifyRiskAction, type RiskAction } from './risk-matrix';
|
||||
import { type TrackedRisk } from './risk-tracking';
|
||||
|
||||
export type GateDecision = 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';
|
||||
|
||||
export type GateResult = {
|
||||
decision: GateDecision;
|
||||
blockers: TrackedRisk[]; // Score=9, action=BLOCK
|
||||
concerns: TrackedRisk[]; // Score 6-8, action=MITIGATE
|
||||
monitored: TrackedRisk[]; // Score 4-5, action=MONITOR
|
||||
documented: TrackedRisk[]; // Score 1-3, action=DOCUMENT
|
||||
summary: string;
|
||||
};
|
||||
|
||||
/**
|
||||
* Evaluate gate based on risk assessments
|
||||
*/
|
||||
export function evaluateGateFromRisks(risks: TrackedRisk[]): GateResult {
|
||||
const blockers = risks.filter((r) => r.currentRisk.action === 'BLOCK' && r.status === 'OPEN');
|
||||
const concerns = risks.filter((r) => r.currentRisk.action === 'MITIGATE' && r.status === 'OPEN');
|
||||
const monitored = risks.filter((r) => r.currentRisk.action === 'MONITOR');
|
||||
const documented = risks.filter((r) => r.currentRisk.action === 'DOCUMENT');
|
||||
|
||||
let decision: GateDecision;
|
||||
|
||||
if (blockers.length > 0) {
|
||||
decision = 'FAIL';
|
||||
} else if (concerns.length > 0) {
|
||||
decision = 'CONCERNS';
|
||||
} else {
|
||||
decision = 'PASS';
|
||||
}
|
||||
|
||||
const summary = generateGateSummary({ decision, blockers, concerns, monitored, documented });
|
||||
|
||||
return { decision, blockers, concerns, monitored, documented, summary };
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate gate decision summary
|
||||
*/
|
||||
function generateGateSummary(result: Omit<GateResult, 'summary'>): string {
|
||||
const { decision, blockers, concerns, monitored, documented } = result;
|
||||
|
||||
const lines: string[] = [`## Gate Decision: ${decision}`];
|
||||
|
||||
if (decision === 'FAIL') {
|
||||
lines.push(`\n**Blockers** (${blockers.length}): Automatic FAIL until resolved or waived`);
|
||||
blockers.forEach((r) => {
|
||||
lines.push(`- **${r.id}**: ${r.title} (Score: ${r.currentRisk.score})`);
|
||||
lines.push(` - Probability: ${r.currentRisk.probability}, Impact: ${r.currentRisk.impact}`);
|
||||
lines.push(` - Reasoning: ${r.currentRisk.reasoning}`);
|
||||
});
|
||||
}
|
||||
|
||||
if (concerns.length > 0) {
|
||||
lines.push(`\n**Concerns** (${concerns.length}): Address before release`);
|
||||
concerns.forEach((r) => {
|
||||
lines.push(`- **${r.id}**: ${r.title} (Score: ${r.currentRisk.score})`);
|
||||
lines.push(` - Mitigations: ${r.mitigations.join(', ') || 'None'}`);
|
||||
});
|
||||
}
|
||||
|
||||
if (monitored.length > 0) {
|
||||
lines.push(`\n**Monitored** (${monitored.length}): Watch closely`);
|
||||
monitored.forEach((r) => lines.push(`- **${r.id}**: ${r.title} (Score: ${r.currentRisk.score})`));
|
||||
}
|
||||
|
||||
if (documented.length > 0) {
|
||||
lines.push(`\n**Documented** (${documented.length}): Awareness only`);
|
||||
}
|
||||
|
||||
lines.push(`\n---\n`);
|
||||
lines.push(`**Next Steps**:`);
|
||||
if (decision === 'FAIL') {
|
||||
lines.push(`- Resolve blockers or request formal waiver`);
|
||||
} else if (decision === 'CONCERNS') {
|
||||
lines.push(`- Implement mitigations for high-risk scenarios (score 6-8)`);
|
||||
lines.push(`- Re-run gate after mitigations`);
|
||||
} else {
|
||||
lines.push(`- Proceed with release`);
|
||||
}
|
||||
|
||||
return lines.join('\n');
|
||||
}
|
||||
```

**Key Points**:

- Gate decision driven by risk scores (not gut feeling)
- Automatic FAIL for score=9 (blockers)
- CONCERNS for score 6-8 (requires mitigation)
- PASS only when no blockers/concerns
- Actionable summary with next steps
- Integration with trace workflow (Phase 2)

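A usage sketch, assuming the tracked risks from Example 3 have been collected into an array:

```typescript
import { evaluateGateFromRisks } from './gate-decision';
import { type TrackedRisk } from './risk-tracking';

// trackedRisks is assumed to be gathered during test design / trace
declare const trackedRisks: TrackedRisk[];

const gate = evaluateGateFromRisks(trackedRisks);
console.log(gate.decision); // 'PASS' | 'CONCERNS' | 'FAIL'
console.log(gate.summary); // markdown summary with blockers, concerns, and next steps
```
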
---

## Probability-Impact Threshold Summary

| Score | Action   | Gate Impact          | Typical Use Case                       |
| ----- | -------- | -------------------- | -------------------------------------- |
| 1-3   | DOCUMENT | None                 | Cosmetic issues, low-priority bugs     |
| 4-5   | MONITOR  | None (watch closely) | Edge cases, partial unknowns           |
| 6-8   | MITIGATE | CONCERNS at gate     | High-impact scenarios needing coverage |
| 9     | BLOCK    | Automatic FAIL       | Critical blockers, must resolve        |

## Risk Assessment Checklist

Before deploying the risk matrix:

- [ ] **Probability scale defined**: 1 (unlikely), 2 (possible), 3 (likely) with clear examples
- [ ] **Impact scale defined**: 1 (minor), 2 (degraded), 3 (critical) with concrete criteria
- [ ] **Threshold rules documented**: Score → Action mapping (1-3 = DOCUMENT, 4-5 = MONITOR, 6-8 = MITIGATE, 9 = BLOCK)
- [ ] **Gate integration**: Risk scores drive gate decisions (PASS/CONCERNS/FAIL/WAIVED)
- [ ] **Re-assessment process**: Risks re-evaluated as the project evolves (requirements change, mitigations applied)
- [ ] **Audit trail**: Historical tracking for risk changes (who, when, why)
- [ ] **Mitigation tracking**: Link mitigations to probability reduction (quantify impact)
- [ ] **Reporting**: Risk matrix visualization, trend reports, gate summaries

## Integration Points

- **Used in workflows**: `*test-design` (initial risk assessment), `*trace` (gate decision Phase 2), `*nfr-assess` (security/performance risks)
- **Related fragments**: `risk-governance.md` (risk scoring matrix, gate decision engine), `test-priorities-matrix.md` (P0-P3 mapping), `nfr-criteria.md` (impact assessment for NFRs)
- **Tools**: TypeScript for type safety, markdown for reports, version control for audit trail

_Source: Murat risk model summary, gate decision patterns from production systems, probability-impact matrix from risk governance practices_

@@ -1,14 +1,615 @@

# Risk Governance and Gatekeeping

- Score risk as probability (1–3) × impact (1–3); totals ≥6 demand mitigation before approval, and 9 mandates a gate failure.
- Classify risks across TECH, SEC, PERF, DATA, BUS, OPS. Document owners, mitigation plans, and deadlines for any score above 4.
- Trace every acceptance criterion to implemented tests; missing coverage must be resolved or explicitly waived before release.
- Gate decisions:
  - **PASS** – no critical issues remain and evidence is current.
  - **CONCERNS** – residual risk exists but has owners, actions, and timelines.
  - **FAIL** – critical issues unresolved or evidence missing.
  - **WAIVED** – risk accepted with documented approver, rationale, and expiry.
- Maintain a gate history log capturing updates so auditors can follow the decision trail (a minimal record shape is sketched below).
- Use the probability/impact scale fragment for shared definitions when scoring teams run the matrix.

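The gate history log can be as simple as an append-only list of records; a minimal sketch (field names are assumptions, not a fixed schema):

```typescript
// Hypothetical shape for one gate-history entry (append-only audit log)
type GateHistoryEntry = {
  story: string;
  decision: 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';
  decidedBy: string;
  decidedAt: string; // ISO timestamp
  topRisks: string[]; // e.g. ['SEC-3 (score 6)']
  rationale: string;
};

export const gateHistory: GateHistoryEntry[] = [
  {
    story: 'US-123',
    decision: 'CONCERNS',
    decidedBy: 'tea-agent',
    decidedAt: '2025-10-14T10:30:00Z',
    topRisks: ['SEC-3 (score 6)'],
    rationale: 'Mitigation owned by security-team, due before release.',
  },
];
```
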
## Principle

_Source: Murat risk governance notes, gate schema guidance._

Risk governance transforms subjective "should we ship?" debates into objective, data-driven decisions. By scoring risk (probability × impact), classifying by category (TECH, SEC, PERF, etc.), and tracking mitigation ownership, teams create transparent quality gates that balance speed with safety.

## Rationale

**The Problem**: Without formal risk governance, releases become political: loud voices win, quiet risks hide, and teams discover critical issues in production. "We thought it was fine" isn't a release strategy.

**The Solution**: Risk scoring (1-3 scale for probability and impact, total 1-9) creates a shared language. Scores ≥6 demand documented mitigation. A score of 9 mandates gate failure. Every acceptance criterion maps to a test, and gaps require explicit waivers with owners and expiry dates.

**Why This Matters**:

- Removes ambiguity from release decisions (objective scores vs subjective opinions)
- Creates an audit trail for compliance (FDA, SOC2, ISO require documented risk management)
- Identifies true blockers early (prevents last-minute production fires)
- Distributes responsibility (owners, mitigation plans, deadlines for every risk >4)

## Pattern Examples

### Example 1: Risk Scoring Matrix with Automated Classification (TypeScript)

**Context**: Calculate risk scores automatically from test results and categorize by risk type

**Implementation**:

```typescript
|
||||
// risk-scoring.ts - Risk classification and scoring system
|
||||
export const RISK_CATEGORIES = {
|
||||
TECH: 'TECH', // Technical debt, architecture fragility
|
||||
SEC: 'SEC', // Security vulnerabilities
|
||||
PERF: 'PERF', // Performance degradation
|
||||
DATA: 'DATA', // Data integrity, corruption
|
||||
BUS: 'BUS', // Business logic errors
|
||||
OPS: 'OPS', // Operational issues (deployment, monitoring)
|
||||
} as const;
|
||||
|
||||
export type RiskCategory = keyof typeof RISK_CATEGORIES;
|
||||
|
||||
export type RiskScore = {
|
||||
id: string;
|
||||
category: RiskCategory;
|
||||
title: string;
|
||||
description: string;
|
||||
probability: 1 | 2 | 3; // 1=Low, 2=Medium, 3=High
|
||||
impact: 1 | 2 | 3; // 1=Low, 2=Medium, 3=High
|
||||
score: number; // probability × impact (1-9)
|
||||
owner: string;
|
||||
mitigationPlan?: string;
|
||||
deadline?: Date;
|
||||
status: 'OPEN' | 'MITIGATED' | 'WAIVED' | 'ACCEPTED';
|
||||
waiverReason?: string;
|
||||
waiverApprover?: string;
|
||||
waiverExpiry?: Date;
|
||||
};
|
||||
|
||||
// Risk scoring rules
|
||||
export function calculateRiskScore(probability: 1 | 2 | 3, impact: 1 | 2 | 3): number {
|
||||
return probability * impact;
|
||||
}
|
||||
|
||||
export function requiresMitigation(score: number): boolean {
|
||||
return score >= 6; // Scores 6-9 demand action
|
||||
}
|
||||
|
||||
export function isCriticalBlocker(score: number): boolean {
|
||||
return score === 9; // Probability=3 AND Impact=3 → FAIL gate
|
||||
}
|
||||
|
||||
export function classifyRiskLevel(score: number): 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL' {
|
||||
if (score === 9) return 'CRITICAL';
|
||||
if (score >= 6) return 'HIGH';
|
||||
if (score >= 4) return 'MEDIUM';
|
||||
return 'LOW';
|
||||
}
|
||||
|
||||
// Example: Risk assessment from test failures
|
||||
export function assessTestFailureRisk(failure: {
|
||||
test: string;
|
||||
category: RiskCategory;
|
||||
affectedUsers: number;
|
||||
revenueImpact: number;
|
||||
securityVulnerability: boolean;
|
||||
}): RiskScore {
|
||||
// Probability based on test failure frequency (simplified)
|
||||
const probability: 1 | 2 | 3 = 3; // Test failed = High probability
|
||||
|
||||
// Impact based on business context
|
||||
let impact: 1 | 2 | 3 = 1;
|
||||
if (failure.securityVulnerability) impact = 3;
|
||||
else if (failure.revenueImpact > 10000) impact = 3;
|
||||
else if (failure.affectedUsers > 1000) impact = 2;
|
||||
else impact = 1;
|
||||
|
||||
const score = calculateRiskScore(probability, impact);
|
||||
|
||||
return {
|
||||
id: `risk-${Date.now()}`,
|
||||
category: failure.category,
|
||||
title: `Test failure: ${failure.test}`,
|
||||
description: `Affects ${failure.affectedUsers} users, $${failure.revenueImpact} revenue`,
|
||||
probability,
|
||||
impact,
|
||||
score,
|
||||
owner: 'unassigned',
|
||||
status: score === 9 ? 'OPEN' : 'OPEN',
|
||||
};
|
||||
}
|
||||
```

**Key Points**:

- **Objective scoring**: Probability (1-3) × Impact (1-3) = Score (1-9)
- **Clear thresholds**: Score ≥6 requires mitigation, score = 9 blocks release
- **Business context**: Revenue, users, security drive impact calculation
- **Status tracking**: OPEN → MITIGATED → WAIVED → ACCEPTED lifecycle

---

### Example 2: Gate Decision Engine with Traceability Validation

**Context**: Automated gate decision based on risk scores and test coverage

**Implementation**:

```typescript
|
||||
// gate-decision-engine.ts
|
||||
export type GateDecision = 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';
|
||||
|
||||
export type CoverageGap = {
|
||||
acceptanceCriteria: string;
|
||||
testMissing: string;
|
||||
reason: string;
|
||||
};
|
||||
|
||||
export type GateResult = {
|
||||
decision: GateDecision;
|
||||
timestamp: Date;
|
||||
criticalRisks: RiskScore[];
|
||||
highRisks: RiskScore[];
|
||||
coverageGaps: CoverageGap[];
|
||||
summary: string;
|
||||
recommendations: string[];
|
||||
};
|
||||
|
||||
export function evaluateGate(params: { risks: RiskScore[]; coverageGaps: CoverageGap[]; waiverApprover?: string }): GateResult {
|
||||
const { risks, coverageGaps, waiverApprover } = params;
|
||||
|
||||
// Categorize risks
|
||||
const criticalRisks = risks.filter((r) => r.score === 9 && r.status === 'OPEN');
|
||||
const highRisks = risks.filter((r) => r.score >= 6 && r.score < 9 && r.status === 'OPEN');
|
||||
const unresolvedGaps = coverageGaps.filter((g) => !g.reason);
|
||||
|
||||
// Decision logic
|
||||
let decision: GateDecision;
|
||||
|
||||
// FAIL: Critical blockers (score=9) or missing coverage
|
||||
if (criticalRisks.length > 0 || unresolvedGaps.length > 0) {
|
||||
decision = 'FAIL';
|
||||
}
|
||||
// WAIVED: All risks waived by authorized approver
|
||||
else if (risks.every((r) => r.status === 'WAIVED') && waiverApprover) {
|
||||
decision = 'WAIVED';
|
||||
}
|
||||
// CONCERNS: High risks (score 6-8) with mitigation plans
|
||||
else if (highRisks.length > 0 && highRisks.every((r) => r.mitigationPlan && r.owner !== 'unassigned')) {
|
||||
decision = 'CONCERNS';
|
||||
}
|
||||
// PASS: No critical issues, all risks mitigated or low
|
||||
else {
|
||||
decision = 'PASS';
|
||||
}
|
||||
|
||||
// Generate recommendations
|
||||
const recommendations: string[] = [];
|
||||
if (criticalRisks.length > 0) {
|
||||
recommendations.push(`🚨 ${criticalRisks.length} CRITICAL risk(s) must be mitigated before release`);
|
||||
}
|
||||
if (unresolvedGaps.length > 0) {
|
||||
recommendations.push(`📋 ${unresolvedGaps.length} acceptance criteria lack test coverage`);
|
||||
}
|
||||
if (highRisks.some((r) => !r.mitigationPlan)) {
|
||||
recommendations.push(`⚠️ High risks without mitigation plans: assign owners and deadlines`);
|
||||
}
|
||||
if (decision === 'PASS') {
|
||||
recommendations.push(`✅ All risks mitigated or acceptable. Ready for release.`);
|
||||
}
|
||||
|
||||
return {
|
||||
decision,
|
||||
timestamp: new Date(),
|
||||
criticalRisks,
|
||||
highRisks,
|
||||
coverageGaps: unresolvedGaps,
|
||||
summary: generateSummary(decision, risks, unresolvedGaps),
|
||||
recommendations,
|
||||
};
|
||||
}
|
||||
|
||||
function generateSummary(decision: GateDecision, risks: RiskScore[], gaps: CoverageGap[]): string {
|
||||
const total = risks.length;
|
||||
const critical = risks.filter((r) => r.score === 9).length;
|
||||
const high = risks.filter((r) => r.score >= 6 && r.score < 9).length;
|
||||
|
||||
return `Gate Decision: ${decision}. Total Risks: ${total} (${critical} critical, ${high} high). Coverage Gaps: ${gaps.length}.`;
|
||||
}
|
||||
```

**Usage Example**:

```typescript
|
||||
// Example: Running gate check before deployment
|
||||
import { assessTestFailureRisk, evaluateGate } from './gate-decision-engine';
|
||||
|
||||
// Collect risks from test results
|
||||
const risks: RiskScore[] = [
|
||||
assessTestFailureRisk({
|
||||
test: 'Payment processing with expired card',
|
||||
category: 'BUS',
|
||||
affectedUsers: 5000,
|
||||
revenueImpact: 50000,
|
||||
securityVulnerability: false,
|
||||
}),
|
||||
assessTestFailureRisk({
|
||||
test: 'SQL injection in search endpoint',
|
||||
category: 'SEC',
|
||||
affectedUsers: 10000,
|
||||
revenueImpact: 0,
|
||||
securityVulnerability: true,
|
||||
}),
|
||||
];
|
||||
|
||||
// Identify coverage gaps
|
||||
const coverageGaps: CoverageGap[] = [
|
||||
{
|
||||
acceptanceCriteria: 'User can reset password via email',
|
||||
testMissing: 'e2e/auth/password-reset.spec.ts',
|
||||
reason: '', // Empty = unresolved
|
||||
},
|
||||
];
|
||||
|
||||
// Evaluate gate
|
||||
const gateResult = evaluateGate({ risks, coverageGaps });
|
||||
|
||||
console.log(gateResult.decision); // 'FAIL'
|
||||
console.log(gateResult.summary);
|
||||
// "Gate Decision: FAIL. Total Risks: 2 (1 critical, 1 high). Coverage Gaps: 1."
|
||||
|
||||
console.log(gateResult.recommendations);
|
||||
// [
|
||||
// "🚨 1 CRITICAL risk(s) must be mitigated before release",
|
||||
// "📋 1 acceptance criteria lack test coverage"
|
||||
// ]
|
||||
```

**Key Points**:

- **Automated decision**: No human interpretation required
- **Clear criteria**: FAIL = critical risks or gaps, CONCERNS = high risks with plans, PASS = low risks
- **Actionable output**: Recommendations drive next steps
- **Audit trail**: Timestamp, decision, and context for compliance

---

### Example 3: Risk Mitigation Workflow with Owner Tracking

**Context**: Track risk mitigation from identification to resolution

**Implementation**:

```typescript
|
||||
// risk-mitigation.ts
|
||||
export type MitigationAction = {
|
||||
riskId: string;
|
||||
action: string;
|
||||
owner: string;
|
||||
deadline: Date;
|
||||
status: 'PENDING' | 'IN_PROGRESS' | 'COMPLETED' | 'BLOCKED';
|
||||
completedAt?: Date;
|
||||
blockedReason?: string;
|
||||
};
|
||||
|
||||
export class RiskMitigationTracker {
|
||||
private risks: Map<string, RiskScore> = new Map();
|
||||
private actions: Map<string, MitigationAction[]> = new Map();
|
||||
private history: Array<{ riskId: string; event: string; timestamp: Date }> = [];
|
||||
|
||||
// Register a new risk
|
||||
addRisk(risk: RiskScore): void {
|
||||
this.risks.set(risk.id, risk);
|
||||
this.logHistory(risk.id, `Risk registered: ${risk.title} (Score: ${risk.score})`);
|
||||
|
||||
// Auto-assign mitigation requirements for score ≥6
|
||||
if (requiresMitigation(risk.score) && !risk.mitigationPlan) {
|
||||
this.logHistory(risk.id, `⚠️ Mitigation required (score ${risk.score}). Assign owner and plan.`);
|
||||
}
|
||||
}
|
||||
|
||||
// Add mitigation action
|
||||
addMitigationAction(action: MitigationAction): void {
|
||||
const risk = this.risks.get(action.riskId);
|
||||
if (!risk) throw new Error(`Risk ${action.riskId} not found`);
|
||||
|
||||
const existingActions = this.actions.get(action.riskId) || [];
|
||||
existingActions.push(action);
|
||||
this.actions.set(action.riskId, existingActions);
|
||||
|
||||
this.logHistory(action.riskId, `Mitigation action added: ${action.action} (Owner: ${action.owner})`);
|
||||
}
|
||||
|
||||
// Complete mitigation action
|
||||
completeMitigation(riskId: string, actionIndex: number): void {
|
||||
const actions = this.actions.get(riskId);
|
||||
if (!actions || !actions[actionIndex]) throw new Error('Action not found');
|
||||
|
||||
actions[actionIndex].status = 'COMPLETED';
|
||||
actions[actionIndex].completedAt = new Date();
|
||||
|
||||
this.logHistory(riskId, `Mitigation completed: ${actions[actionIndex].action}`);
|
||||
|
||||
// If all actions completed, mark risk as MITIGATED
|
||||
if (actions.every((a) => a.status === 'COMPLETED')) {
|
||||
const risk = this.risks.get(riskId)!;
|
||||
risk.status = 'MITIGATED';
|
||||
this.logHistory(riskId, `✅ Risk mitigated. All actions complete.`);
|
||||
}
|
||||
}
|
||||
|
||||
// Request waiver for a risk
|
||||
requestWaiver(riskId: string, reason: string, approver: string, expiryDays: number): void {
|
||||
const risk = this.risks.get(riskId);
|
||||
if (!risk) throw new Error(`Risk ${riskId} not found`);
|
||||
|
||||
risk.status = 'WAIVED';
|
||||
risk.waiverReason = reason;
|
||||
risk.waiverApprover = approver;
|
||||
risk.waiverExpiry = new Date(Date.now() + expiryDays * 24 * 60 * 60 * 1000);
|
||||
|
||||
this.logHistory(riskId, `⚠️ Waiver granted by ${approver}. Expires: ${risk.waiverExpiry}`);
|
||||
}
|
||||
|
||||
// Generate risk report
|
||||
generateReport(): string {
|
||||
const allRisks = Array.from(this.risks.values());
|
||||
const critical = allRisks.filter((r) => r.score === 9 && r.status === 'OPEN');
|
||||
const high = allRisks.filter((r) => r.score >= 6 && r.score < 9 && r.status === 'OPEN');
|
||||
const mitigated = allRisks.filter((r) => r.status === 'MITIGATED');
|
||||
const waived = allRisks.filter((r) => r.status === 'WAIVED');
|
||||
|
||||
let report = `# Risk Mitigation Report\n\n`;
|
||||
report += `**Generated**: ${new Date().toISOString()}\n\n`;
|
||||
report += `## Summary\n`;
|
||||
report += `- Total Risks: ${allRisks.length}\n`;
|
||||
report += `- Critical (Score=9, OPEN): ${critical.length}\n`;
|
||||
report += `- High (Score 6-8, OPEN): ${high.length}\n`;
|
||||
report += `- Mitigated: ${mitigated.length}\n`;
|
||||
report += `- Waived: ${waived.length}\n\n`;
|
||||
|
||||
if (critical.length > 0) {
|
||||
report += `## 🚨 Critical Risks (BLOCKERS)\n\n`;
|
||||
critical.forEach((r) => {
|
||||
report += `- **${r.title}** (${r.category})\n`;
|
||||
report += ` - Score: ${r.score} (Probability: ${r.probability}, Impact: ${r.impact})\n`;
|
||||
report += ` - Owner: ${r.owner}\n`;
|
||||
report += ` - Mitigation: ${r.mitigationPlan || 'NOT ASSIGNED'}\n\n`;
|
||||
});
|
||||
}
|
||||
|
||||
if (high.length > 0) {
|
||||
report += `## ⚠️ High Risks\n\n`;
|
||||
high.forEach((r) => {
|
||||
report += `- **${r.title}** (${r.category})\n`;
|
||||
report += ` - Score: ${r.score}\n`;
|
||||
report += ` - Owner: ${r.owner}\n`;
|
||||
report += ` - Deadline: ${r.deadline?.toISOString().split('T')[0] || 'NOT SET'}\n\n`;
|
||||
});
|
||||
}
|
||||
|
||||
return report;
|
||||
}
|
||||
|
||||
private logHistory(riskId: string, event: string): void {
|
||||
this.history.push({ riskId, event, timestamp: new Date() });
|
||||
}
|
||||
|
||||
getHistory(riskId: string): Array<{ event: string; timestamp: Date }> {
|
||||
return this.history.filter((h) => h.riskId === riskId).map((h) => ({ event: h.event, timestamp: h.timestamp }));
|
||||
}
|
||||
}
|
||||
```

**Usage Example**:

```typescript
|
||||
const tracker = new RiskMitigationTracker();
|
||||
|
||||
// Register critical security risk
|
||||
tracker.addRisk({
|
||||
id: 'risk-001',
|
||||
category: 'SEC',
|
||||
title: 'SQL injection vulnerability in user search',
|
||||
description: 'Unsanitized input allows arbitrary SQL execution',
|
||||
probability: 3,
|
||||
impact: 3,
|
||||
score: 9,
|
||||
owner: 'security-team',
|
||||
status: 'OPEN',
|
||||
});
|
||||
|
||||
// Add mitigation actions
|
||||
tracker.addMitigationAction({
|
||||
riskId: 'risk-001',
|
||||
action: 'Add parameterized queries to user-search endpoint',
|
||||
owner: 'alice@example.com',
|
||||
deadline: new Date('2025-10-20'),
|
||||
status: 'IN_PROGRESS',
|
||||
});
|
||||
|
||||
tracker.addMitigationAction({
|
||||
riskId: 'risk-001',
|
||||
action: 'Add WAF rule to block SQL injection patterns',
|
||||
owner: 'bob@example.com',
|
||||
deadline: new Date('2025-10-22'),
|
||||
status: 'PENDING',
|
||||
});
|
||||
|
||||
// Complete first action
|
||||
tracker.completeMitigation('risk-001', 0);
|
||||
|
||||
// Generate report
|
||||
console.log(tracker.generateReport());
|
||||
// Markdown report with critical risks, owners, deadlines
|
||||
|
||||
// View history
|
||||
console.log(tracker.getHistory('risk-001'));
|
||||
// [
|
||||
// { event: 'Risk registered: SQL injection...', timestamp: ... },
|
||||
// { event: 'Mitigation action added: Add parameterized queries...', timestamp: ... },
|
||||
// { event: 'Mitigation completed: Add parameterized queries...', timestamp: ... }
|
||||
// ]
|
||||
```

**Key Points**:

- **Ownership enforcement**: Every risk >4 requires owner assignment
- **Deadline tracking**: Mitigation actions have explicit deadlines
- **Audit trail**: Complete history of risk lifecycle (registered → mitigated)
- **Automated reports**: Markdown output for Confluence/GitHub wikis

---

### Example 4: Coverage Traceability Matrix (Test-to-Requirement Mapping)

**Context**: Validate that every acceptance criterion maps to at least one test

**Implementation**:

```typescript
|
||||
// coverage-traceability.ts
|
||||
export type AcceptanceCriterion = {
|
||||
id: string;
|
||||
story: string;
|
||||
criterion: string;
|
||||
priority: 'P0' | 'P1' | 'P2' | 'P3';
|
||||
};
|
||||
|
||||
export type TestCase = {
|
||||
file: string;
|
||||
name: string;
|
||||
criteriaIds: string[]; // Links to acceptance criteria
|
||||
};
|
||||
|
||||
export type CoverageMatrix = {
|
||||
criterion: AcceptanceCriterion;
|
||||
tests: TestCase[];
|
||||
covered: boolean;
|
||||
waiverReason?: string;
|
||||
};
|
||||
|
||||
export function buildCoverageMatrix(criteria: AcceptanceCriterion[], tests: TestCase[]): CoverageMatrix[] {
|
||||
return criteria.map((criterion) => {
|
||||
const matchingTests = tests.filter((t) => t.criteriaIds.includes(criterion.id));
|
||||
|
||||
return {
|
||||
criterion,
|
||||
tests: matchingTests,
|
||||
covered: matchingTests.length > 0,
|
||||
};
|
||||
});
|
||||
}
|
||||
|
||||
export function validateCoverage(matrix: CoverageMatrix[]): {
|
||||
gaps: CoverageMatrix[];
|
||||
passRate: number;
|
||||
} {
|
||||
const gaps = matrix.filter((m) => !m.covered && !m.waiverReason);
|
||||
const passRate = ((matrix.length - gaps.length) / matrix.length) * 100;
|
||||
|
||||
return { gaps, passRate };
|
||||
}
|
||||
|
||||
// Example: Extract criteria IDs from test names
|
||||
export function extractCriteriaFromTests(testFiles: string[]): TestCase[] {
|
||||
// Simplified: In real implementation, parse test files with AST
|
||||
// Here we simulate extraction from test names
|
||||
return [
|
||||
{
|
||||
file: 'tests/e2e/auth/login.spec.ts',
|
||||
name: 'should allow user to login with valid credentials',
|
||||
criteriaIds: ['AC-001', 'AC-002'], // Linked to acceptance criteria
|
||||
},
|
||||
{
|
||||
file: 'tests/e2e/auth/password-reset.spec.ts',
|
||||
name: 'should send password reset email',
|
||||
criteriaIds: ['AC-003'],
|
||||
},
|
||||
];
|
||||
}
|
||||
|
||||
// Generate Markdown traceability report
|
||||
export function generateTraceabilityReport(matrix: CoverageMatrix[]): string {
|
||||
let report = `# Requirements-to-Tests Traceability Matrix\n\n`;
|
||||
report += `**Generated**: ${new Date().toISOString()}\n\n`;
|
||||
|
||||
const { gaps, passRate } = validateCoverage(matrix);
|
||||
|
||||
report += `## Summary\n`;
|
||||
report += `- Total Criteria: ${matrix.length}\n`;
|
||||
report += `- Covered: ${matrix.filter((m) => m.covered).length}\n`;
|
||||
report += `- Gaps: ${gaps.length}\n`;
|
||||
report += `- Waived: ${matrix.filter((m) => m.waiverReason).length}\n`;
|
||||
report += `- Coverage Rate: ${passRate.toFixed(1)}%\n\n`;
|
||||
|
||||
if (gaps.length > 0) {
|
||||
report += `## ❌ Coverage Gaps (MUST RESOLVE)\n\n`;
|
||||
report += `| Story | Criterion | Priority | Tests |\n`;
|
||||
report += `|-------|-----------|----------|-------|\n`;
|
||||
gaps.forEach((m) => {
|
||||
report += `| ${m.criterion.story} | ${m.criterion.criterion} | ${m.criterion.priority} | None |\n`;
|
||||
});
|
||||
report += `\n`;
|
||||
}
|
||||
|
||||
report += `## ✅ Covered Criteria\n\n`;
|
||||
report += `| Story | Criterion | Tests |\n`;
|
||||
report += `|-------|-----------|-------|\n`;
|
||||
matrix
|
||||
.filter((m) => m.covered)
|
||||
.forEach((m) => {
|
||||
const testList = m.tests.map((t) => `\`${t.file}\``).join(', ');
|
||||
report += `| ${m.criterion.story} | ${m.criterion.criterion} | ${testList} |\n`;
|
||||
});
|
||||
|
||||
return report;
|
||||
}
|
||||
```

**Usage Example**:

```typescript
|
||||
// Define acceptance criteria
|
||||
const criteria: AcceptanceCriterion[] = [
|
||||
{ id: 'AC-001', story: 'US-123', criterion: 'User can login with email', priority: 'P0' },
|
||||
{ id: 'AC-002', story: 'US-123', criterion: 'User sees error on invalid password', priority: 'P0' },
|
||||
{ id: 'AC-003', story: 'US-124', criterion: 'User receives password reset email', priority: 'P1' },
|
||||
{ id: 'AC-004', story: 'US-125', criterion: 'User can update profile', priority: 'P2' }, // NO TEST
|
||||
];
|
||||
|
||||
// Extract tests
|
||||
const tests: TestCase[] = extractCriteriaFromTests(['tests/e2e/auth/login.spec.ts', 'tests/e2e/auth/password-reset.spec.ts']);
|
||||
|
||||
// Build matrix
|
||||
const matrix = buildCoverageMatrix(criteria, tests);
|
||||
|
||||
// Validate
|
||||
const { gaps, passRate } = validateCoverage(matrix);
|
||||
console.log(`Coverage: ${passRate.toFixed(1)}%`); // "Coverage: 75.0%"
|
||||
console.log(`Gaps: ${gaps.length}`); // "Gaps: 1" (AC-004 has no test)
|
||||
|
||||
// Generate report
|
||||
const report = generateTraceabilityReport(matrix);
|
||||
console.log(report);
|
||||
// Markdown table showing coverage gaps
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Bidirectional traceability**: Criteria → Tests and Tests → Criteria
|
||||
- **Gap detection**: Automatically identifies missing coverage
|
||||
- **Priority awareness**: P0 gaps are critical blockers
|
||||
- **Waiver support**: Allow explicit waivers for low-priority gaps (a minimal sketch follows this list)
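
To illustrate the waiver flow, here is a minimal, hypothetical helper. It assumes the `CoverageMatrix` shape used above; `applyWaiver` is not part of the workflow code, just a sketch of how a waiver could be recorded so `validateCoverage` stops counting that criterion as a gap.

```typescript
// Hypothetical sketch: assumes the CoverageMatrix shape defined earlier.
export function applyWaiver(matrix: CoverageMatrix[], criterionId: string, reason: string): CoverageMatrix[] {
  return matrix.map((entry) =>
    entry.criterion.id === criterionId && entry.criterion.priority !== 'P0'
      ? { ...entry, waiverReason: reason } // never waive P0 gaps
      : entry,
  );
}

// Usage: waive the untested P2 criterion from the usage example above
// const waived = applyWaiver(matrix, 'AC-004', 'Profile update covered by manual QA until Q3');
```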
|
||||
|
||||
---
|
||||
|
||||
## Risk Governance Checklist
|
||||
|
||||
Before deploying to production, ensure:
|
||||
|
||||
- [ ] **Risk scoring complete**: All identified risks scored (Probability × Impact)
|
||||
- [ ] **Ownership assigned**: Every risk >4 has owner, mitigation plan, deadline
|
||||
- [ ] **Coverage validated**: Every acceptance criterion maps to at least one test
|
||||
- [ ] **Gate decision documented**: PASS/CONCERNS/FAIL/WAIVED with rationale (see the sketch after this checklist)
|
||||
- [ ] **Waivers approved**: All waivers have approver, reason, expiry date
|
||||
- [ ] **Audit trail captured**: Risk history log available for compliance review
|
||||
- [ ] **Traceability matrix**: Requirements-to-tests mapping up to date
|
||||
- [ ] **Critical risks resolved**: No score=9 risks in OPEN status
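
To make the scoring and gate rules above concrete, the following is a minimal sketch. The type and function names (`RiskItem`, `decideGate`) and the 1-3 probability/impact scales are illustrative assumptions, not part of the TEA workflow definitions.

```typescript
export type GateDecision = 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';

export type RiskItem = {
  id: string;
  probability: 1 | 2 | 3; // low / medium / high
  impact: 1 | 2 | 3; // low / medium / high
  status: 'OPEN' | 'MITIGATED' | 'WAIVED';
  waiverReason?: string;
};

// Score = Probability x Impact (1..9), matching the checklist above
export function riskScore(risk: RiskItem): number {
  return risk.probability * risk.impact;
}

export function decideGate(risks: RiskItem[]): GateDecision {
  const open = risks.filter((r) => r.status === 'OPEN');

  // Any open critical risk (score = 9) blocks the gate
  if (open.some((r) => riskScore(r) === 9)) return 'FAIL';

  // Open risks above the action threshold (> 4) pass only with documented concerns
  if (open.some((r) => riskScore(r) > 4)) return 'CONCERNS';

  // Everything outstanding was explicitly waived
  if (risks.length > 0 && risks.every((r) => r.status === 'WAIVED')) return 'WAIVED';

  return 'PASS';
}
```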
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*trace` (Phase 2: gate decision), `*nfr-assess` (risk scoring), `*test-design` (risk identification)
|
||||
- **Related fragments**: `probability-impact.md` (scoring definitions), `test-priorities-matrix.md` (P0-P3 classification), `nfr-criteria.md` (non-functional risks)
|
||||
- **Tools**: Risk tracking dashboards (Jira, Linear), gate automation (CI/CD), traceability reports (Markdown, Confluence)
|
||||
|
||||
_Source: Murat risk governance notes, gate schema guidance, SEON production gate workflows, ISO 31000 risk management standards_
|
||||
|
||||
@@ -1,9 +1,732 @@
|
||||
# Selective and Targeted Test Execution
|
||||
|
||||
- Use tags/grep (`--grep "@smoke"`, `--grep "@critical"`) to slice suites by risk, not directory.
|
||||
- Filter by spec patterns (`--spec "**/*checkout*"`) or git diff (`npm run test:changed`) to focus on impacted areas.
|
||||
- Combine priority metadata (P0–P3) with change detection to decide which levels to run pre-commit vs. in CI.
|
||||
- Record burn-in history for newly added specs; promote to main suite only after consistent green runs.
|
||||
- Document the selection strategy in README/CI so the team understands when full regression is mandatory.
_Source: 32+ selective testing strategies blog, Murat testing philosophy._

## Principle

Run only the tests you need, when you need them. Use tags/grep to slice suites by risk priority (not directory structure), filter by spec patterns or git diff to focus on impacted areas, and combine priority metadata (P0-P3) with change detection to optimize pre-commit vs. CI execution. Document the selection strategy clearly so teams understand when full regression is mandatory.
|
||||
|
||||
## Rationale
|
||||
|
||||
Running the entire test suite on every commit wastes time and resources. Smart test selection provides fast feedback (smoke tests in minutes, full regression in hours) while maintaining confidence. The "32+ ways of selective testing" philosophy balances speed with coverage: quick loops for developers, comprehensive validation before deployment. Poorly documented selection leads to confusion about when tests run and why.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Tag-Based Execution with Priority Levels
|
||||
|
||||
**Context**: Organize tests by risk priority and execution stage using grep/tag patterns.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/checkout.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
/**
|
||||
* Tag-based test organization
|
||||
* - @smoke: Critical path tests (run on every commit, < 5 min)
|
||||
* - @regression: Full test suite (run pre-merge, < 30 min)
|
||||
* - @p0: Critical business functions (payment, auth, data integrity)
|
||||
* - @p1: Core features (primary user journeys)
|
||||
* - @p2: Secondary features (supporting functionality)
|
||||
* - @p3: Nice-to-have (cosmetic, non-critical)
|
||||
*/
|
||||
|
||||
test.describe('Checkout Flow', () => {
|
||||
// P0 + Smoke: Must run on every commit
|
||||
test('@smoke @p0 should complete purchase with valid payment', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('card-number').fill('4242424242424242');
|
||||
await page.getByTestId('submit-payment').click();
|
||||
|
||||
await expect(page.getByTestId('order-confirmation')).toBeVisible();
|
||||
});
|
||||
|
||||
// P0 but not smoke: Run pre-merge
|
||||
test('@regression @p0 should handle payment decline gracefully', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('card-number').fill('4000000000000002'); // Decline card
|
||||
await page.getByTestId('submit-payment').click();
|
||||
|
||||
await expect(page.getByTestId('payment-error')).toBeVisible();
|
||||
await expect(page.getByTestId('payment-error')).toContainText('declined');
|
||||
});
|
||||
|
||||
// P1 + Smoke: Important but not critical
|
||||
test('@smoke @p1 should apply discount code', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('promo-code').fill('SAVE10');
|
||||
await page.getByTestId('apply-promo').click();
|
||||
|
||||
await expect(page.getByTestId('discount-applied')).toBeVisible();
|
||||
});
|
||||
|
||||
// P2: Run in full regression only
|
||||
test('@regression @p2 should remember saved payment methods', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
await expect(page.getByTestId('saved-cards')).toBeVisible();
|
||||
});
|
||||
|
||||
// P3: Low priority, run nightly or weekly
|
||||
test('@nightly @p3 should display checkout page analytics', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
const analyticsEvents = await page.evaluate(() => (window as any).__ANALYTICS__);
|
||||
expect(analyticsEvents).toBeDefined();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**package.json scripts**:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"test": "playwright test",
|
||||
"test:smoke": "playwright test --grep '@smoke'",
|
||||
"test:p0": "playwright test --grep '@p0'",
|
||||
"test:p0-p1": "playwright test --grep '@p0|@p1'",
|
||||
"test:regression": "playwright test --grep '@regression'",
|
||||
"test:nightly": "playwright test --grep '@nightly'",
|
||||
"test:not-slow": "playwright test --grep-invert '@slow'",
|
||||
"test:critical-smoke": "playwright test --grep '@smoke.*@p0'"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Cypress equivalent**:
|
||||
|
||||
```javascript
|
||||
// cypress/e2e/checkout.cy.ts
|
||||
describe('Checkout Flow', { tags: ['@checkout'] }, () => {
|
||||
it('should complete purchase', { tags: ['@smoke', '@p0'] }, () => {
|
||||
cy.visit('/checkout');
|
||||
cy.get('[data-cy="card-number"]').type('4242424242424242');
|
||||
cy.get('[data-cy="submit-payment"]').click();
|
||||
cy.get('[data-cy="order-confirmation"]').should('be.visible');
|
||||
});
|
||||
|
||||
it('should handle decline', { tags: ['@regression', '@p0'] }, () => {
|
||||
cy.visit('/checkout');
|
||||
cy.get('[data-cy="card-number"]').type('4000000000000002');
|
||||
cy.get('[data-cy="submit-payment"]').click();
|
||||
cy.get('[data-cy="payment-error"]').should('be.visible');
|
||||
});
|
||||
});
|
||||
|
||||
// cypress.config.ts
|
||||
export default defineConfig({
|
||||
e2e: {
|
||||
env: {
|
||||
grepTags: process.env.GREP_TAGS || '',
|
||||
grepFilterSpecs: true,
|
||||
},
|
||||
setupNodeEvents(on, config) {
|
||||
require('@cypress/grep/src/plugin')(config);
|
||||
return config;
|
||||
},
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Usage**:
|
||||
|
||||
```bash
|
||||
# Playwright
|
||||
npm run test:smoke # Run all @smoke tests
|
||||
npm run test:p0 # Run all P0 tests
|
||||
npm run test -- --grep "@smoke.*@p0" # Run tests with BOTH tags
|
||||
|
||||
# Cypress (with @cypress/grep plugin)
|
||||
npx cypress run --env grepTags="@smoke"
|
||||
npx cypress run --env grepTags="@p0+@smoke" # AND logic
|
||||
npx cypress run --env grepTags="@p0 @p1" # OR logic
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Multiple tags per test**: Combine priority (@p0) with stage (@smoke)
|
||||
- **AND/OR logic**: Grep supports complex filtering
|
||||
- **Clear naming**: Tags document test importance
|
||||
- **Fast feedback**: @smoke runs < 5 min, full suite < 30 min
|
||||
- **CI integration**: Different jobs run different tag combinations
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Spec Filter Pattern (File-Based Selection)
|
||||
|
||||
**Context**: Run tests by file path pattern or directory for targeted execution.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# scripts/selective-spec-runner.sh
|
||||
# Run tests based on spec file patterns
|
||||
|
||||
set -e
|
||||
|
||||
PATTERN=${1:-"**/*.spec.ts"}
|
||||
TEST_ENV=${TEST_ENV:-local}
|
||||
|
||||
echo "🎯 Selective Spec Runner"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "Pattern: $PATTERN"
|
||||
echo "Environment: $TEST_ENV"
|
||||
echo ""
|
||||
|
||||
# Pattern examples and their use cases
|
||||
case "$PATTERN" in
|
||||
"**/checkout*")
|
||||
echo "📦 Running checkout-related tests"
|
||||
    npx playwright test checkout # positional args are matched against test file paths
|
||||
;;
|
||||
"**/auth*"|"**/login*"|"**/signup*")
|
||||
echo "🔐 Running authentication tests"
|
||||
    npx playwright test auth login signup # any of these path fragments matches
|
||||
;;
|
||||
"tests/e2e/**")
|
||||
echo "🌐 Running all E2E tests"
|
||||
npx playwright test tests/e2e/
|
||||
;;
|
||||
"tests/integration/**")
|
||||
echo "🔌 Running all integration tests"
|
||||
npx playwright test tests/integration/
|
||||
;;
|
||||
"tests/component/**")
|
||||
echo "🧩 Running all component tests"
|
||||
npx playwright test tests/component/
|
||||
;;
|
||||
*)
|
||||
echo "🔍 Running tests matching pattern: $PATTERN"
|
||||
npx playwright test "$PATTERN"
|
||||
;;
|
||||
esac
|
||||
```
|
||||
|
||||
**Playwright config for file filtering**:
|
||||
|
||||
```typescript
|
||||
// playwright.config.ts
|
||||
import { defineConfig, devices } from '@playwright/test';
|
||||
|
||||
export default defineConfig({
|
||||
// ... other config
|
||||
|
||||
// Project-based organization
|
||||
projects: [
|
||||
{
|
||||
name: 'smoke',
|
||||
testMatch: /.*smoke.*\.spec\.ts/,
|
||||
retries: 0,
|
||||
},
|
||||
{
|
||||
name: 'e2e',
|
||||
testMatch: /tests\/e2e\/.*\.spec\.ts/,
|
||||
retries: 2,
|
||||
},
|
||||
{
|
||||
name: 'integration',
|
||||
testMatch: /tests\/integration\/.*\.spec\.ts/,
|
||||
retries: 1,
|
||||
},
|
||||
{
|
||||
name: 'component',
|
||||
testMatch: /tests\/component\/.*\.spec\.ts/,
|
||||
use: { ...devices['Desktop Chrome'] },
|
||||
},
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
**Advanced pattern matching**:
|
||||
|
||||
```typescript
|
||||
// scripts/run-by-component.ts
|
||||
/**
|
||||
* Run tests related to specific component(s)
|
||||
* Usage: npm run test:component UserProfile,Settings
|
||||
*/
|
||||
|
||||
import { execSync } from 'child_process';
|
||||
|
||||
const components = process.argv[2]?.split(',') || [];
|
||||
|
||||
if (components.length === 0) {
|
||||
console.error('❌ No components specified');
|
||||
console.log('Usage: npm run test:component UserProfile,Settings');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Convert component names to glob patterns
|
||||
const patterns = components.map((comp) => `**/*${comp}*.spec.ts`).join(' ');
|
||||
|
||||
console.log(`🧩 Running tests for components: ${components.join(', ')}`);
|
||||
console.log(`Patterns: ${patterns}`);
|
||||
|
||||
try {
|
||||
execSync(`npx playwright test ${patterns}`, {
|
||||
stdio: 'inherit',
|
||||
env: { ...process.env, CI: 'false' },
|
||||
});
|
||||
} catch (error) {
|
||||
process.exit(1);
|
||||
}
|
||||
```
|
||||
|
||||
**package.json scripts**:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"test:checkout": "playwright test **/checkout*.spec.ts",
|
||||
"test:auth": "playwright test **/auth*.spec.ts **/login*.spec.ts",
|
||||
"test:e2e": "playwright test tests/e2e/",
|
||||
"test:integration": "playwright test tests/integration/",
|
||||
"test:component": "ts-node scripts/run-by-component.ts",
|
||||
"test:project": "playwright test --project",
|
||||
"test:smoke-project": "playwright test --project smoke"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Glob patterns**: Wildcards match file paths flexibly
|
||||
- **Project isolation**: Separate projects have different configs
|
||||
- **Component targeting**: Run tests for specific features
|
||||
- **Directory-based**: Organize tests by type (e2e, integration, component)
|
||||
- **CI optimization**: Run subsets in parallel CI jobs
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Diff-Based Test Selection (Changed Files Only)
|
||||
|
||||
**Context**: Run only tests affected by code changes for maximum speed.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# scripts/test-changed-files.sh
|
||||
# Intelligent test selection based on git diff
|
||||
|
||||
set -e
|
||||
|
||||
BASE_BRANCH=${BASE_BRANCH:-main}
|
||||
TEST_ENV=${TEST_ENV:-local}
|
||||
|
||||
echo "🔍 Changed File Test Selector"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "Base branch: $BASE_BRANCH"
|
||||
echo "Environment: $TEST_ENV"
|
||||
echo ""
|
||||
|
||||
# Get changed files
|
||||
CHANGED_FILES=$(git diff --name-only $BASE_BRANCH...HEAD)
|
||||
|
||||
if [ -z "$CHANGED_FILES" ]; then
|
||||
echo "✅ No files changed. Skipping tests."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
echo "Changed files:"
|
||||
echo "$CHANGED_FILES" | sed 's/^/ - /'
|
||||
echo ""
|
||||
|
||||
# Arrays to collect test specs
|
||||
DIRECT_TEST_FILES=()
|
||||
RELATED_TEST_FILES=()
|
||||
RUN_ALL_TESTS=false
|
||||
|
||||
# Process each changed file
|
||||
while IFS= read -r file; do
|
||||
case "$file" in
|
||||
# Changed test files: run them directly
|
||||
*.spec.ts|*.spec.js|*.test.ts|*.test.js|*.cy.ts|*.cy.js)
|
||||
DIRECT_TEST_FILES+=("$file")
|
||||
;;
|
||||
|
||||
# Critical config changes: run ALL tests
|
||||
package.json|package-lock.json|playwright.config.ts|cypress.config.ts|tsconfig.json|.github/workflows/*)
|
||||
echo "⚠️ Critical file changed: $file"
|
||||
RUN_ALL_TESTS=true
|
||||
break
|
||||
;;
|
||||
|
||||
# Component changes: find related tests
|
||||
src/components/*.tsx|src/components/*.jsx)
|
||||
COMPONENT_NAME=$(basename "$file" | sed 's/\.[^.]*$//')
|
||||
echo "🧩 Component changed: $COMPONENT_NAME"
|
||||
|
||||
# Find tests matching component name
|
||||
FOUND_TESTS=$(find tests -name "*${COMPONENT_NAME}*.spec.ts" -o -name "*${COMPONENT_NAME}*.cy.ts" 2>/dev/null || true)
|
||||
if [ -n "$FOUND_TESTS" ]; then
|
||||
while IFS= read -r test_file; do
|
||||
RELATED_TEST_FILES+=("$test_file")
|
||||
done <<< "$FOUND_TESTS"
|
||||
fi
|
||||
;;
|
||||
|
||||
# Utility/lib changes: run integration + unit tests
|
||||
src/utils/*|src/lib/*|src/helpers/*)
|
||||
echo "⚙️ Utility file changed: $file"
|
||||
RELATED_TEST_FILES+=($(find tests/unit tests/integration -name "*.spec.ts" 2>/dev/null || true))
|
||||
;;
|
||||
|
||||
# API changes: run integration + e2e tests
|
||||
src/api/*|src/services/*|src/controllers/*)
|
||||
echo "🔌 API file changed: $file"
|
||||
RELATED_TEST_FILES+=($(find tests/integration tests/e2e -name "*.spec.ts" 2>/dev/null || true))
|
||||
;;
|
||||
|
||||
# Type changes: run all TypeScript tests
|
||||
*.d.ts|src/types/*)
|
||||
echo "📝 Type definition changed: $file"
|
||||
RUN_ALL_TESTS=true
|
||||
break
|
||||
;;
|
||||
|
||||
# Documentation only: skip tests
|
||||
*.md|docs/*|README*)
|
||||
echo "📄 Documentation changed: $file (no tests needed)"
|
||||
;;
|
||||
|
||||
*)
|
||||
echo "❓ Unclassified change: $file (running smoke tests)"
|
||||
RELATED_TEST_FILES+=($(find tests -name "*smoke*.spec.ts" 2>/dev/null || true))
|
||||
;;
|
||||
esac
|
||||
done <<< "$CHANGED_FILES"
|
||||
|
||||
# Execute tests based on analysis
|
||||
if [ "$RUN_ALL_TESTS" = true ]; then
|
||||
echo ""
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🚨 Running FULL test suite (critical changes detected)"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
npm run test
|
||||
exit $?
|
||||
fi
|
||||
|
||||
# Combine and deduplicate test files
|
||||
ALL_TEST_FILES=(${DIRECT_TEST_FILES[@]} ${RELATED_TEST_FILES[@]})
|
||||
UNIQUE_TEST_FILES=($(echo "${ALL_TEST_FILES[@]}" | tr ' ' '\n' | sort -u))
|
||||
|
||||
if [ ${#UNIQUE_TEST_FILES[@]} -eq 0 ]; then
|
||||
echo ""
|
||||
echo "✅ No tests found for changed files. Running smoke tests."
|
||||
npm run test:smoke
|
||||
exit $?
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🎯 Running ${#UNIQUE_TEST_FILES[@]} test file(s)"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
|
||||
for test_file in "${UNIQUE_TEST_FILES[@]}"; do
|
||||
echo " - $test_file"
|
||||
done
|
||||
|
||||
echo ""
|
||||
npm run test -- "${UNIQUE_TEST_FILES[@]}"
|
||||
```
|
||||
|
||||
**GitHub Actions integration**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/test-changed.yml
|
||||
name: Test Changed Files
|
||||
on:
|
||||
pull_request:
|
||||
types: [opened, synchronize, reopened]
|
||||
|
||||
jobs:
|
||||
detect-and-test:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0 # Full history for accurate diff
|
||||
|
||||
- name: Get changed files
|
||||
id: changed-files
|
||||
uses: tj-actions/changed-files@v40
|
||||
with:
|
||||
files: |
|
||||
src/**
|
||||
tests/**
|
||||
*.config.ts
|
||||
files_ignore: |
|
||||
**/*.md
|
||||
docs/**
|
||||
|
||||
- name: Run tests for changed files
|
||||
if: steps.changed-files.outputs.any_changed == 'true'
|
||||
run: |
|
||||
echo "Changed files: ${{ steps.changed-files.outputs.all_changed_files }}"
|
||||
bash scripts/test-changed-files.sh
|
||||
env:
|
||||
BASE_BRANCH: ${{ github.base_ref }}
|
||||
TEST_ENV: staging
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **Intelligent mapping**: Code changes → related tests
|
||||
- **Critical file detection**: Config changes = full suite
|
||||
- **Component mapping**: UI changes → component + E2E tests
|
||||
- **Fast feedback**: Run only what's needed (< 2 min typical)
|
||||
- **Safety net**: Unrecognized changes run smoke tests
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Promotion Rules (Pre-Commit → CI → Staging → Production)
|
||||
|
||||
**Context**: Progressive test execution strategy across deployment stages.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// scripts/test-promotion-strategy.ts
|
||||
/**
|
||||
* Test Promotion Strategy
|
||||
* Defines which tests run at each stage of the development lifecycle
|
||||
*/
|
||||
|
||||
export type TestStage = 'pre-commit' | 'ci-pr' | 'ci-merge' | 'staging' | 'production';
|
||||
|
||||
export type TestPromotion = {
|
||||
stage: TestStage;
|
||||
description: string;
|
||||
testCommand: string;
|
||||
timebudget: string; // minutes
|
||||
required: boolean;
|
||||
failureAction: 'block' | 'warn' | 'alert';
|
||||
};
|
||||
|
||||
export const TEST_PROMOTION_RULES: Record<TestStage, TestPromotion> = {
|
||||
'pre-commit': {
|
||||
stage: 'pre-commit',
|
||||
description: 'Local developer checks before git commit',
|
||||
testCommand: 'npm run test:smoke',
|
||||
timebudget: '2',
|
||||
required: true,
|
||||
failureAction: 'block',
|
||||
},
|
||||
'ci-pr': {
|
||||
stage: 'ci-pr',
|
||||
description: 'CI checks on pull request creation/update',
|
||||
testCommand: 'npm run test:changed && npm run test:p0-p1',
|
||||
timebudget: '10',
|
||||
required: true,
|
||||
failureAction: 'block',
|
||||
},
|
||||
'ci-merge': {
|
||||
stage: 'ci-merge',
|
||||
description: 'Full regression before merge to main',
|
||||
testCommand: 'npm run test:regression',
|
||||
timebudget: '30',
|
||||
required: true,
|
||||
failureAction: 'block',
|
||||
},
|
||||
staging: {
|
||||
stage: 'staging',
|
||||
description: 'Post-deployment validation in staging environment',
|
||||
testCommand: 'npm run test:e2e -- --grep "@smoke"',
|
||||
timebudget: '15',
|
||||
required: true,
|
||||
failureAction: 'block',
|
||||
},
|
||||
production: {
|
||||
stage: 'production',
|
||||
description: 'Production smoke tests post-deployment',
|
||||
testCommand: 'npm run test:e2e:prod -- --grep "@smoke.*@p0"',
|
||||
timebudget: '5',
|
||||
required: false,
|
||||
failureAction: 'alert',
|
||||
},
|
||||
};
|
||||
|
||||
/**
|
||||
* Get tests to run for a specific stage
|
||||
*/
|
||||
export function getTestsForStage(stage: TestStage): TestPromotion {
|
||||
return TEST_PROMOTION_RULES[stage];
|
||||
}
|
||||
|
||||
/**
|
||||
* Validate if tests can be promoted to next stage
|
||||
*/
|
||||
export function canPromote(currentStage: TestStage, testsPassed: boolean): boolean {
|
||||
const promotion = TEST_PROMOTION_RULES[currentStage];
|
||||
|
||||
if (!promotion.required) {
|
||||
return true; // Non-required tests don't block promotion
|
||||
}
|
||||
|
||||
return testsPassed;
|
||||
}
|
||||
```
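
A short usage sketch is shown below. It assumes the module above lives at `scripts/test-promotion-strategy.ts`; the `run-stage.ts` wrapper and its CLI behavior are hypothetical, not an existing script.

```typescript
// scripts/run-stage.ts (hypothetical wrapper around the promotion rules above)
import { execSync } from 'child_process';
import { canPromote, getTestsForStage, TestStage } from './test-promotion-strategy';

const stage = (process.argv[2] || 'ci-pr') as TestStage;
const promotion = getTestsForStage(stage);

console.log(`Stage: ${promotion.stage} (${promotion.description})`);
console.log(`Time budget: ${promotion.timebudget} min`);

let testsPassed = true;
try {
  execSync(promotion.testCommand, { stdio: 'inherit' }); // run the stage's test command
} catch {
  testsPassed = false;
}

if (!canPromote(stage, testsPassed)) {
  console.error(`❌ Stage "${stage}" failed and is required; blocking promotion`);
  process.exit(1);
}

console.log(`✅ Stage "${stage}" complete; safe to promote`);
```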
|
||||
|
||||
**Husky pre-commit hook**:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# .husky/pre-commit
|
||||
# Run smoke tests before allowing commit
|
||||
|
||||
echo "🔍 Running pre-commit tests..."
|
||||
|
||||
npm run test:smoke
|
||||
|
||||
if [ $? -ne 0 ]; then
|
||||
echo ""
|
||||
echo "❌ Pre-commit tests failed!"
|
||||
echo "Please fix failures before committing."
|
||||
echo ""
|
||||
echo "To skip (NOT recommended): git commit --no-verify"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "✅ Pre-commit tests passed"
|
||||
```
|
||||
|
||||
**GitHub Actions workflow**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/test-promotion.yml
|
||||
name: Test Promotion Strategy
|
||||
on:
|
||||
pull_request:
|
||||
push:
|
||||
branches: [main]
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
# Stage 1: PR tests (changed + P0-P1)
|
||||
pr-tests:
|
||||
if: github.event_name == 'pull_request'
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 10
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: Run PR-level tests
|
||||
run: |
|
||||
npm run test:changed
|
||||
npm run test:p0-p1
|
||||
|
||||
  # Stage 2: Full regression (on merge to main)
|
||||
regression-tests:
|
||||
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 30
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: Run full regression
|
||||
run: npm run test:regression
|
||||
|
||||
# Stage 3: Staging validation (post-deploy)
|
||||
staging-smoke:
|
||||
if: github.event_name == 'workflow_dispatch'
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 15
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: Run staging smoke tests
|
||||
run: npm run test:e2e -- --grep "@smoke"
|
||||
env:
|
||||
TEST_ENV: staging
|
||||
|
||||
# Stage 4: Production smoke (post-deploy, non-blocking)
|
||||
production-smoke:
|
||||
if: github.event_name == 'workflow_dispatch'
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 5
|
||||
continue-on-error: true # Don't fail deployment if smoke tests fail
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: Run production smoke tests
|
||||
run: npm run test:e2e:prod -- --grep "@smoke.*@p0"
|
||||
env:
|
||||
TEST_ENV: production
|
||||
|
||||
- name: Alert on failure
|
||||
if: failure()
|
||||
uses: 8398a7/action-slack@v3
|
||||
with:
|
||||
status: ${{ job.status }}
|
||||
text: '🚨 Production smoke tests failed!'
|
||||
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
|
||||
```
|
||||
|
||||
**Selection strategy documentation**:
|
||||
|
||||
````markdown
|
||||
# Test Selection Strategy
|
||||
|
||||
## Test Promotion Stages
|
||||
|
||||
| Stage | Tests Run | Time Budget | Blocks Deploy | Failure Action |
|
||||
| ---------- | ------------------- | ----------- | ------------- | -------------- |
|
||||
| Pre-Commit | Smoke (@smoke) | 2 min | ✅ Yes | Block commit |
|
||||
| CI PR | Changed + P0-P1 | 10 min | ✅ Yes | Block merge |
|
||||
| CI Merge | Full regression | 30 min | ✅ Yes | Block deploy |
|
||||
| Staging | E2E smoke | 15 min | ✅ Yes | Rollback |
|
||||
| Production | Critical smoke only | 5 min | ❌ No | Alert team |
|
||||
|
||||
## When Full Regression Runs
|
||||
|
||||
Full regression suite (`npm run test:regression`) runs in these scenarios:
|
||||
|
||||
- ✅ Before merging to `main` (CI Merge stage)
|
||||
- ✅ Nightly builds (scheduled workflow)
|
||||
- ✅ Manual trigger (workflow_dispatch)
|
||||
- ✅ Release candidate testing
|
||||
|
||||
Full regression does NOT run on:
|
||||
|
||||
- ❌ Every PR commit (too slow)
|
||||
- ❌ Pre-commit hooks (too slow)
|
||||
- ❌ Production deployments (deploy-blocking)
|
||||
|
||||
## Override Scenarios
|
||||
|
||||
Skip tests (emergency only):
|
||||
|
||||
```bash
|
||||
git commit --no-verify # Skip pre-commit hook
|
||||
gh pr merge --admin # Force merge (requires admin)
|
||||
```
|
||||
````
|
||||
|
||||
|
||||
|
||||
**Key Points**:
|
||||
- **Progressive validation**: More tests at each stage
|
||||
- **Time budgets**: Clear expectations per stage
|
||||
- **Blocking vs. alerting**: Production tests don't block deploy
|
||||
- **Documentation**: Team knows when full regression runs
|
||||
- **Emergency overrides**: Documented but discouraged
|
||||
|
||||
---
|
||||
|
||||
## Test Selection Strategy Checklist
|
||||
|
||||
Before implementing selective testing, verify:
|
||||
|
||||
- [ ] **Tag strategy defined**: @smoke, @p0-p3, @regression documented
|
||||
- [ ] **Time budgets set**: Each stage has clear timeout (smoke < 5 min, full < 30 min)
|
||||
- [ ] **Changed file mapping**: Code changes → test selection logic implemented
|
||||
- [ ] **Promotion rules documented**: README explains when full regression runs
|
||||
- [ ] **CI integration**: GitHub Actions uses selective strategy
|
||||
- [ ] **Local parity**: Developers can run same selections locally
|
||||
- [ ] **Emergency overrides**: Skip mechanisms documented (--no-verify, admin merge)
|
||||
- [ ] **Metrics tracked**: Monitor test execution time and selection accuracy
|
||||
|
||||
## Integration Points
|
||||
|
||||
- Used in workflows: `*ci` (CI/CD setup), `*automate` (test generation with tags)
|
||||
- Related fragments: `ci-burn-in.md`, `test-priorities-matrix.md`, `test-quality.md`
|
||||
- Selection tools: Playwright --grep, Cypress @cypress/grep, git diff
|
||||
|
||||
_Source: 32+ selective testing strategies blog, Murat testing philosophy, SEON CI optimization_
|
||||
|
||||
|
||||
527
src/modules/bmm/testarch/knowledge/selector-resilience.md
Normal file
@@ -0,0 +1,527 @@
|
||||
# Selector Resilience
|
||||
|
||||
## Principle
|
||||
|
||||
Robust selectors follow a strict hierarchy: **data-testid > ARIA roles > text content > CSS/IDs** (last resort). Selectors must be resilient to UI changes (styling, layout, content updates) and remain human-readable for maintenance.
|
||||
|
||||
## Rationale
|
||||
|
||||
**The Problem**: Brittle selectors (CSS classes, nth-child, complex XPath) break when UI styling changes, elements are reordered, or design updates occur. This causes test maintenance burden and false negatives.
|
||||
|
||||
**The Solution**: Prioritize semantic selectors that reflect user intent (ARIA roles, accessible names, test IDs). Use dynamic filtering for lists instead of nth() indexes. Validate selectors during code review and refactor proactively.
|
||||
|
||||
**Why This Matters**:
|
||||
|
||||
- Prevents false test failures (UI refactoring doesn't break tests)
|
||||
- Improves accessibility (ARIA roles benefit both tests and screen readers)
|
||||
- Enhances readability (semantic selectors document user intent)
|
||||
- Reduces maintenance burden (robust selectors survive design changes)
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Selector Hierarchy (Priority Order with Examples)
|
||||
|
||||
**Context**: Choose the most resilient selector for each element type
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/selectors/hierarchy-examples.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Selector Hierarchy Best Practices', () => {
|
||||
test('Level 1: data-testid (BEST - most resilient)', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
|
||||
// ✅ Best: Dedicated test attribute (survives all UI changes)
|
||||
await page.getByTestId('email-input').fill('user@example.com');
|
||||
await page.getByTestId('password-input').fill('password123');
|
||||
await page.getByTestId('login-button').click();
|
||||
|
||||
await expect(page.getByTestId('welcome-message')).toBeVisible();
|
||||
|
||||
// Why it's best:
|
||||
// - Survives CSS refactoring (class name changes)
|
||||
// - Survives layout changes (element reordering)
|
||||
// - Survives content changes (button text updates)
|
||||
// - Explicit test contract (developer knows it's for testing)
|
||||
});
|
||||
|
||||
test('Level 2: ARIA roles and accessible names (GOOD - future-proof)', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
|
||||
// ✅ Good: Semantic HTML roles (benefits accessibility + tests)
|
||||
await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');
|
||||
await page.getByRole('textbox', { name: 'Password' }).fill('password123');
|
||||
await page.getByRole('button', { name: 'Sign In' }).click();
|
||||
|
||||
await expect(page.getByRole('heading', { name: 'Welcome' })).toBeVisible();
|
||||
|
||||
// Why it's good:
|
||||
// - Survives CSS refactoring
|
||||
// - Survives layout changes
|
||||
// - Enforces accessibility (screen reader compatible)
|
||||
// - Self-documenting (role + name = clear intent)
|
||||
});
|
||||
|
||||
test('Level 3: Text content (ACCEPTABLE - user-centric)', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// ✅ Acceptable: Text content (matches user perception)
|
||||
await page.getByText('Create New Order').click();
|
||||
await expect(page.getByText('Order Details')).toBeVisible();
|
||||
|
||||
// Why it's acceptable:
|
||||
// - User-centric (what user sees)
|
||||
// - Survives CSS/layout changes
|
||||
// - Breaks when copy changes (forces test update with content)
|
||||
|
||||
// ⚠️ Use with caution for dynamic/localized content:
|
||||
// - Avoid for content with variables: "User 123" (use regex instead)
|
||||
// - Avoid for i18n content (use data-testid or ARIA)
|
||||
});
|
||||
|
||||
test('Level 4: CSS classes/IDs (LAST RESORT - brittle)', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
|
||||
// ❌ Last resort: CSS class (breaks with styling updates)
|
||||
// await page.locator('.btn-primary').click()
|
||||
|
||||
// ❌ Last resort: ID (breaks if ID changes)
|
||||
// await page.locator('#login-form').fill(...)
|
||||
|
||||
// ✅ Better: Use data-testid or ARIA instead
|
||||
await page.getByTestId('login-button').click();
|
||||
|
||||
// Why CSS/ID is last resort:
|
||||
// - Breaks with CSS refactoring (class name changes)
|
||||
// - Breaks with HTML restructuring (ID changes)
|
||||
// - Not semantic (unclear what element does)
|
||||
// - Tight coupling between tests and styling
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Hierarchy: data-testid (best) > ARIA (good) > text (acceptable) > CSS/ID (last resort)
|
||||
- data-testid survives ALL UI changes (explicit test contract)
|
||||
- ARIA roles enforce accessibility (screen reader compatible)
|
||||
- Text content is user-centric (but breaks with copy changes)
|
||||
- CSS/ID are brittle (break with styling refactoring)
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Dynamic Selector Patterns (Lists, Filters, Regex)
|
||||
|
||||
**Context**: Handle dynamic content, lists, and variable data with resilient selectors
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/selectors/dynamic-selectors.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Dynamic Selector Patterns', () => {
|
||||
test('regex for variable content (user IDs, timestamps)', async ({ page }) => {
|
||||
await page.goto('/users');
|
||||
|
||||
// ✅ Good: Regex pattern for dynamic user IDs
|
||||
await expect(page.getByText(/User \d+/)).toBeVisible();
|
||||
|
||||
// ✅ Good: Regex for timestamps
|
||||
await expect(page.getByText(/Last login: \d{4}-\d{2}-\d{2}/)).toBeVisible();
|
||||
|
||||
// ✅ Good: Regex for dynamic counts
|
||||
await expect(page.getByText(/\d+ items in cart/)).toBeVisible();
|
||||
});
|
||||
|
||||
test('partial text matching (case-insensitive, substring)', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// ✅ Good: Partial match (survives minor text changes)
|
||||
await page.getByText('Product', { exact: false }).first().click();
|
||||
|
||||
// ✅ Good: Case-insensitive (survives capitalization changes)
|
||||
await expect(page.getByText(/sign in/i)).toBeVisible();
|
||||
});
|
||||
|
||||
test('filter locators for lists (avoid brittle nth)', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// ❌ Bad: Index-based (breaks when order changes)
|
||||
// await page.locator('.product-card').nth(2).click()
|
||||
|
||||
// ✅ Good: Filter by content (resilient to reordering)
|
||||
await page.locator('[data-testid="product-card"]').filter({ hasText: 'Premium Plan' }).click();
|
||||
|
||||
// ✅ Good: Filter by attribute
|
||||
await page
|
||||
.locator('[data-testid="product-card"]')
|
||||
.filter({ has: page.locator('[data-status="active"]') })
|
||||
.first()
|
||||
.click();
|
||||
});
|
||||
|
||||
test('nth() only when absolutely necessary', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// ⚠️ Acceptable: nth(0) for first item (common pattern)
|
||||
const firstNotification = page.getByTestId('notification').nth(0);
|
||||
await expect(firstNotification).toContainText('Welcome');
|
||||
|
||||
// ❌ Bad: nth(5) for arbitrary index (fragile)
|
||||
// await page.getByTestId('notification').nth(5).click()
|
||||
|
||||
// ✅ Better: Use filter() with specific criteria
|
||||
await page.getByTestId('notification').filter({ hasText: 'Critical Alert' }).click();
|
||||
});
|
||||
|
||||
test('combine multiple locators for specificity', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
// ✅ Good: Narrow scope with combined locators
|
||||
const shippingSection = page.getByTestId('shipping-section');
|
||||
await shippingSection.getByLabel('Address Line 1').fill('123 Main St');
|
||||
await shippingSection.getByLabel('City').fill('New York');
|
||||
|
||||
// Scoping prevents ambiguity (multiple "City" fields on page)
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Regex patterns handle variable content (IDs, timestamps, counts)
|
||||
- Partial matching survives minor text changes (`exact: false`)
|
||||
- `filter()` is more resilient than `nth()` (content-based vs index-based)
|
||||
- `nth(0)` acceptable for "first item", avoid arbitrary indexes
|
||||
- Combine locators to narrow scope (prevent ambiguity)
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Selector Anti-Patterns (What NOT to Do)
|
||||
|
||||
**Context**: Common selector mistakes that cause brittle tests
|
||||
|
||||
**Problem Examples**:
|
||||
|
||||
```typescript
|
||||
// tests/selectors/anti-patterns.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Selector Anti-Patterns to Avoid', () => {
|
||||
test('❌ Anti-Pattern 1: CSS classes (brittle)', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
|
||||
// ❌ Bad: CSS class (breaks with design system updates)
|
||||
// await page.locator('.btn-primary').click()
|
||||
// await page.locator('.form-input-lg').fill('test@example.com')
|
||||
|
||||
// ✅ Good: Use data-testid or ARIA role
|
||||
await page.getByTestId('login-button').click();
|
||||
await page.getByRole('textbox', { name: 'Email' }).fill('test@example.com');
|
||||
});
|
||||
|
||||
test('❌ Anti-Pattern 2: Index-based nth() (fragile)', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// ❌ Bad: Index-based (breaks when product order changes)
|
||||
// await page.locator('.product-card').nth(3).click()
|
||||
|
||||
// ✅ Good: Content-based filter
|
||||
await page.locator('[data-testid="product-card"]').filter({ hasText: 'Laptop' }).click();
|
||||
});
|
||||
|
||||
test('❌ Anti-Pattern 3: Complex XPath (hard to maintain)', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// ❌ Bad: Complex XPath (unreadable, breaks with structure changes)
|
||||
// await page.locator('xpath=//div[@class="container"]//section[2]//button[contains(@class, "primary")]').click()
|
||||
|
||||
// ✅ Good: Semantic selector
|
||||
await page.getByRole('button', { name: 'Create Order' }).click();
|
||||
});
|
||||
|
||||
test('❌ Anti-Pattern 4: ID selectors (coupled to implementation)', async ({ page }) => {
|
||||
await page.goto('/settings');
|
||||
|
||||
// ❌ Bad: HTML ID (breaks if ID changes for accessibility/SEO)
|
||||
// await page.locator('#user-settings-form').fill(...)
|
||||
|
||||
// ✅ Good: data-testid or ARIA landmark
|
||||
await page.getByTestId('user-settings-form').getByLabel('Display Name').fill('John Doe');
|
||||
});
|
||||
|
||||
test('✅ Refactoring: Bad → Good Selector', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
// Before (brittle):
|
||||
// await page.locator('.checkout-form > .payment-section > .btn-submit').click()
|
||||
|
||||
// After (resilient):
|
||||
await page.getByTestId('checkout-form').getByRole('button', { name: 'Complete Payment' }).click();
|
||||
|
||||
await expect(page.getByText('Payment successful')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Why These Fail**:
|
||||
|
||||
- **CSS classes**: Change frequently with design updates (Tailwind, CSS modules)
|
||||
- **nth() indexes**: Fragile to element reordering (new features, A/B tests)
|
||||
- **Complex XPath**: Unreadable, breaks with HTML structure changes
|
||||
- **HTML IDs**: Not stable (accessibility improvements change IDs)
|
||||
|
||||
**Better Approach**: Use selector hierarchy (testid > ARIA > text)
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Selector Debugging Techniques (Inspector, DevTools, MCP)
|
||||
|
||||
**Context**: Debug selector failures interactively to find better alternatives
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/selectors/debugging-techniques.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Selector Debugging Techniques', () => {
|
||||
test('use Playwright Inspector to test selectors', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Pause test to open Inspector
|
||||
await page.pause();
|
||||
|
||||
// In Inspector console, test selectors:
|
||||
// page.getByTestId('user-menu') ✅ Works
|
||||
// page.getByRole('button', { name: 'Profile' }) ✅ Works
|
||||
// page.locator('.btn-primary') ❌ Brittle
|
||||
|
||||
// Use "Pick Locator" feature to generate selectors
|
||||
// Use "Record" mode to capture user interactions
|
||||
|
||||
await page.getByTestId('user-menu').click();
|
||||
await expect(page.getByRole('menu')).toBeVisible();
|
||||
});
|
||||
|
||||
test('use locator.all() to debug lists', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// Debug: How many products are visible?
|
||||
const products = await page.getByTestId('product-card').all();
|
||||
console.log(`Found ${products.length} products`);
|
||||
|
||||
// Debug: What text is in each product?
|
||||
for (const product of products) {
|
||||
const text = await product.textContent();
|
||||
console.log(`Product text: ${text}`);
|
||||
}
|
||||
|
||||
// Use findings to build better selector
|
||||
await page.getByTestId('product-card').filter({ hasText: 'Laptop' }).click();
|
||||
});
|
||||
|
||||
test('use DevTools console to test selectors', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
// Open DevTools (manually or via page.pause())
|
||||
// Test selectors in console:
|
||||
// document.querySelectorAll('[data-testid="payment-method"]')
|
||||
// document.querySelector('#credit-card-input')
|
||||
|
||||
// Find robust selector through trial and error
|
||||
await page.getByTestId('payment-method').selectOption('credit-card');
|
||||
});
|
||||
|
||||
test('MCP browser_generate_locator (if available)', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// If Playwright MCP available, use browser_generate_locator:
|
||||
// 1. Click element in browser
|
||||
// 2. MCP generates optimal selector
|
||||
// 3. Copy into test
|
||||
|
||||
// Example output from MCP:
|
||||
// page.getByRole('link', { name: 'Product A' })
|
||||
|
||||
// Use generated selector
|
||||
await page.getByRole('link', { name: 'Product A' }).click();
|
||||
await expect(page).toHaveURL(/\/products\/\d+/);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Playwright Inspector: Interactive selector testing with "Pick Locator" feature
|
||||
- `locator.all()`: Debug lists to understand structure and content
|
||||
- DevTools console: Test CSS selectors before adding to tests
|
||||
- MCP browser_generate_locator: Auto-generate optimal selectors (if MCP available)
|
||||
- Always validate selectors work before committing
|
||||
|
||||
---
|
||||
|
||||
### Example 5: Selector Refactoring Guide (Before/After Patterns)
|
||||
|
||||
**Context**: Systematically improve brittle selectors to resilient alternatives
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/selectors/refactoring-guide.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Selector Refactoring Patterns', () => {
|
||||
test('refactor: CSS class → data-testid', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// ❌ Before: CSS class (breaks with Tailwind updates)
|
||||
// await page.locator('.bg-blue-500.px-4.py-2.rounded').click()
|
||||
|
||||
// ✅ After: data-testid
|
||||
await page.getByTestId('add-to-cart-button').click();
|
||||
|
||||
// Implementation: Add data-testid to button component
|
||||
// <button className="bg-blue-500 px-4 py-2 rounded" data-testid="add-to-cart-button">
|
||||
});
|
||||
|
||||
test('refactor: nth() index → filter()', async ({ page }) => {
|
||||
await page.goto('/users');
|
||||
|
||||
// ❌ Before: Index-based (breaks when users reorder)
|
||||
// await page.locator('.user-row').nth(2).click()
|
||||
|
||||
// ✅ After: Content-based filter
|
||||
await page.locator('[data-testid="user-row"]').filter({ hasText: 'john@example.com' }).click();
|
||||
});
|
||||
|
||||
test('refactor: Complex XPath → ARIA role', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
// ❌ Before: Complex XPath (unreadable, brittle)
|
||||
// await page.locator('xpath=//div[@id="payment"]//form//button[contains(@class, "submit")]').click()
|
||||
|
||||
// ✅ After: ARIA role
|
||||
await page.getByRole('button', { name: 'Complete Payment' }).click();
|
||||
});
|
||||
|
||||
test('refactor: ID selector → data-testid', async ({ page }) => {
|
||||
await page.goto('/settings');
|
||||
|
||||
// ❌ Before: HTML ID (changes with accessibility improvements)
|
||||
// await page.locator('#user-profile-section').getByLabel('Name').fill('John')
|
||||
|
||||
// ✅ After: data-testid + semantic label
|
||||
await page.getByTestId('user-profile-section').getByLabel('Display Name').fill('John Doe');
|
||||
});
|
||||
|
||||
test('refactor: Deeply nested CSS → scoped data-testid', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// ❌ Before: Deep nesting (breaks with structure changes)
|
||||
// await page.locator('.container .sidebar .menu .item:nth-child(3) a').click()
|
||||
|
||||
// ✅ After: Scoped data-testid
|
||||
const sidebar = page.getByTestId('sidebar');
|
||||
await sidebar.getByRole('link', { name: 'Settings' }).click();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- CSS class → data-testid (survives design system updates)
|
||||
- nth() → filter() (content-based vs index-based)
|
||||
- Complex XPath → ARIA role (readable, semantic)
|
||||
- ID → data-testid (decouples from HTML structure)
|
||||
- Deep nesting → scoped locators (modular, maintainable)
|
||||
|
||||
---
|
||||
|
||||
### Example 6: Selector Best Practices Checklist
|
||||
|
||||
```typescript
|
||||
// tests/selectors/validation-checklist.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
/**
|
||||
* Selector Validation Checklist
|
||||
*
|
||||
* Before committing test, verify selectors meet these criteria:
|
||||
*/
|
||||
test.describe('Selector Best Practices Validation', () => {
|
||||
test('✅ 1. Prefer data-testid for interactive elements', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
|
||||
// Interactive elements (buttons, inputs, links) should use data-testid
|
||||
await page.getByTestId('email-input').fill('test@example.com');
|
||||
await page.getByTestId('login-button').click();
|
||||
});
|
||||
|
||||
test('✅ 2. Use ARIA roles for semantic elements', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Semantic elements (headings, navigation, forms) use ARIA
|
||||
await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
|
||||
await page.getByRole('navigation').getByRole('link', { name: 'Settings' }).click();
|
||||
});
|
||||
|
||||
test('✅ 3. Avoid CSS classes (except when testing styles)', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// ❌ Never for interaction: page.locator('.btn-primary')
|
||||
// ✅ Only for visual regression: await expect(page.locator('.error-banner')).toHaveCSS('color', 'rgb(255, 0, 0)')
|
||||
});
|
||||
|
||||
test('✅ 4. Use filter() instead of nth() for lists', async ({ page }) => {
|
||||
await page.goto('/orders');
|
||||
|
||||
// List selection should be content-based
|
||||
await page.getByTestId('order-row').filter({ hasText: 'Order #12345' }).click();
|
||||
});
|
||||
|
||||
test('✅ 5. Selectors are human-readable', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
// ✅ Good: Clear intent
|
||||
await page.getByTestId('shipping-address-form').getByLabel('Street Address').fill('123 Main St');
|
||||
|
||||
// ❌ Bad: Cryptic
|
||||
// await page.locator('div > div:nth-child(2) > input[type="text"]').fill('123 Main St')
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Validation Rules**:
|
||||
|
||||
1. **Interactive elements** (buttons, inputs) → data-testid
|
||||
2. **Semantic elements** (headings, nav, forms) → ARIA roles
|
||||
3. **CSS classes** → Avoid (except visual regression tests)
|
||||
4. **Lists** → filter() over nth() (content-based selection)
|
||||
5. **Readability** → Selectors document user intent (clear, semantic)
|
||||
|
||||
---
|
||||
|
||||
## Selector Resilience Checklist
|
||||
|
||||
Before deploying selectors:
|
||||
|
||||
- [ ] **Hierarchy followed**: data-testid (1st choice) > ARIA (2nd) > text (3rd) > CSS/ID (last resort)
|
||||
- [ ] **Interactive elements use data-testid**: Buttons, inputs, links have dedicated test attributes
|
||||
- [ ] **Semantic elements use ARIA**: Headings, navigation, forms use roles and accessible names
|
||||
- [ ] **No brittle patterns**: No CSS classes (except visual tests), no arbitrary nth(), no complex XPath
|
||||
- [ ] **Dynamic content handled**: Regex for IDs/timestamps, filter() for lists, partial matching for text
|
||||
- [ ] **Selectors are scoped**: Use container locators to narrow scope (prevent ambiguity)
|
||||
- [ ] **Human-readable**: Selectors document user intent (clear, semantic, maintainable)
|
||||
- [ ] **Validated in Inspector**: Test selectors interactively before committing (page.pause())
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*atdd` (generate tests with robust selectors), `*automate` (healing selector failures), `*test-review` (validate selector quality)
|
||||
- **Related fragments**: `test-healing-patterns.md` (selector failure diagnosis), `fixture-architecture.md` (page object alternatives), `test-quality.md` (maintainability standards)
|
||||
- **Tools**: Playwright Inspector (Pick Locator), DevTools console, Playwright MCP browser_generate_locator (optional)
|
||||
|
||||
_Source: Playwright selector best practices, accessibility guidelines (ARIA), production test maintenance patterns_
|
||||
644
src/modules/bmm/testarch/knowledge/test-healing-patterns.md
Normal file
@@ -0,0 +1,644 @@
|
||||
# Test Healing Patterns
|
||||
|
||||
## Principle
|
||||
|
||||
Common test failures follow predictable patterns (stale selectors, race conditions, dynamic data assertions, network errors, hard waits). **Automated healing** identifies failure signatures and applies pattern-based fixes. Manual healing captures these patterns for future automation.
|
||||
|
||||
## Rationale
|
||||
|
||||
**The Problem**: Test failures waste developer time on repetitive debugging. Teams manually fix the same selector issues, timing bugs, and data mismatches repeatedly across test suites.
|
||||
|
||||
**The Solution**: Catalog common failure patterns with diagnostic signatures and automated fixes. When a test fails, match the error message/stack trace against known patterns and apply the corresponding fix. This transforms test maintenance from reactive debugging to proactive pattern application.
|
||||
|
||||
**Why This Matters**:
|
||||
|
||||
- Reduces test maintenance time by 60-80% (pattern-based fixes vs manual debugging)
|
||||
- Prevents flakiness regression (same bug fixed once, applied everywhere)
|
||||
- Builds institutional knowledge (failure catalog grows over time)
|
||||
- Enables self-healing test suites (automate workflow validates and heals)
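
One way to picture the mechanism is a small failure-pattern catalog that maps diagnostic signatures to suggested fixes. This is a minimal sketch with hypothetical names (`HealingPattern`, `matchFailure`); the concrete detectors it would delegate to are covered in the examples below.

```typescript
// Hypothetical failure-pattern catalog; names and entries are illustrative.
type HealingPattern = {
  name: string;
  signatures: RegExp[]; // matched against the failure's error message
  suggestedFix: string;
};

const CATALOG: HealingPattern[] = [
  {
    name: 'stale-selector',
    signatures: [/resolved to 0 elements/i, /element not found/i],
    suggestedFix: 'Replace brittle selector with data-testid or ARIA role (see Example 1)',
  },
  {
    name: 'race-condition',
    signatures: [/timeout.*waiting for/i, /element is not visible/i],
    suggestedFix: 'Replace hard waits with waitForResponse or element-state waits (see Example 2)',
  },
];

export function matchFailure(error: Error): { pattern: string; fix: string } | null {
  for (const entry of CATALOG) {
    if (entry.signatures.some((re) => re.test(error.message))) {
      return { pattern: entry.name, fix: entry.suggestedFix };
    }
  }
  return null; // unknown failure: debug manually, then add its signature to the catalog
}
```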
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Common Failure Pattern - Stale Selectors (Element Not Found)
|
||||
|
||||
**Context**: Test fails with "Element not found" or "Locator resolved to 0 elements" errors
|
||||
|
||||
**Diagnostic Signature**:
|
||||
|
||||
```typescript
|
||||
// src/testing/healing/selector-healing.ts
|
||||
|
||||
export type SelectorFailure = {
|
||||
errorMessage: string;
|
||||
stackTrace: string;
|
||||
selector: string;
|
||||
testFile: string;
|
||||
lineNumber: number;
|
||||
};
|
||||
|
||||
/**
|
||||
* Detect stale selector failures
|
||||
*/
|
||||
export function isSelectorFailure(error: Error): boolean {
|
||||
const patterns = [
|
||||
/locator.*resolved to 0 elements/i,
|
||||
/element not found/i,
|
||||
/waiting for locator.*to be visible/i,
|
||||
/selector.*did not match any elements/i,
|
||||
/unable to find element/i,
|
||||
];
|
||||
|
||||
return patterns.some((pattern) => pattern.test(error.message));
|
||||
}
|
||||
|
||||
/**
|
||||
* Extract selector from error message
|
||||
*/
|
||||
export function extractSelector(errorMessage: string): string | null {
|
||||
// Playwright: "locator('button[type=\"submit\"]') resolved to 0 elements"
|
||||
const playwrightMatch = errorMessage.match(/locator\('([^']+)'\)/);
|
||||
if (playwrightMatch) return playwrightMatch[1];
|
||||
|
||||
// Cypress: "Timed out retrying: Expected to find element: '.submit-button'"
|
||||
const cypressMatch = errorMessage.match(/Expected to find element: ['"]([^'"]+)['"]/i);
|
||||
if (cypressMatch) return cypressMatch[1];
|
||||
|
||||
return null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Suggest better selector based on hierarchy
|
||||
*/
|
||||
export function suggestBetterSelector(badSelector: string): string {
|
||||
// If using CSS class → suggest data-testid
|
||||
if (badSelector.startsWith('.') || badSelector.includes('class=')) {
|
||||
const elementName = badSelector.match(/class=["']([^"']+)["']/)?.[1] || badSelector.slice(1);
|
||||
return `page.getByTestId('${elementName}') // Prefer data-testid over CSS class`;
|
||||
}
|
||||
|
||||
// If using ID → suggest data-testid
|
||||
if (badSelector.startsWith('#')) {
|
||||
return `page.getByTestId('${badSelector.slice(1)}') // Prefer data-testid over ID`;
|
||||
}
|
||||
|
||||
// If using nth() → suggest filter() or more specific selector
|
||||
if (badSelector.includes('.nth(')) {
|
||||
return `page.locator('${badSelector.split('.nth(')[0]}').filter({ hasText: 'specific text' }) // Avoid brittle nth(), use filter()`;
|
||||
}
|
||||
|
||||
// If using complex CSS → suggest ARIA role
|
||||
if (badSelector.includes('>') || badSelector.includes('+')) {
|
||||
return `page.getByRole('button', { name: 'Submit' }) // Prefer ARIA roles over complex CSS`;
|
||||
}
|
||||
|
||||
return `page.getByTestId('...') // Add data-testid attribute to element`;
|
||||
}
|
||||
```
|
||||
|
||||
**Healing Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/healing/selector-healing.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
import { isSelectorFailure, extractSelector, suggestBetterSelector } from '../../src/testing/healing/selector-healing';
|
||||
|
||||
test('heal stale selector failures automatically', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
try {
|
||||
// Original test with brittle CSS selector
|
||||
await page.locator('.btn-primary').click();
|
||||
} catch (error: any) {
|
||||
if (isSelectorFailure(error)) {
|
||||
const badSelector = extractSelector(error.message);
|
||||
const suggestion = badSelector ? suggestBetterSelector(badSelector) : null;
|
||||
|
||||
console.log('HEALING SUGGESTION:', suggestion);
|
||||
|
||||
// Apply healed selector
|
||||
await page.getByTestId('submit-button').click(); // Fixed!
|
||||
} else {
|
||||
throw error; // Not a selector issue, rethrow
|
||||
}
|
||||
}
|
||||
|
||||
await expect(page.getByText('Success')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Diagnosis: Error message contains "locator resolved to 0 elements" or "element not found"
|
||||
- Fix: Replace brittle selector (CSS class, ID, nth) with robust alternative (data-testid, ARIA role)
|
||||
- Prevention: Follow selector hierarchy (data-testid > ARIA > text > CSS)
|
||||
- Automation: Pattern matching on error message + stack trace
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Common Failure Pattern - Race Conditions (Timing Errors)
|
||||
|
||||
**Context**: Test fails with "timeout waiting for element" or "element not visible" errors
|
||||
|
||||
**Diagnostic Signature**:
|
||||
|
||||
```typescript
|
||||
// src/testing/healing/timing-healing.ts
|
||||
|
||||
export type TimingFailure = {
|
||||
errorMessage: string;
|
||||
testFile: string;
|
||||
lineNumber: number;
|
||||
actionType: 'click' | 'fill' | 'waitFor' | 'expect';
|
||||
};
|
||||
|
||||
/**
|
||||
* Detect race condition failures
|
||||
*/
|
||||
export function isTimingFailure(error: Error): boolean {
|
||||
const patterns = [
|
||||
/timeout.*waiting for/i,
|
||||
/element is not visible/i,
|
||||
/element is not attached to the dom/i,
|
||||
/waiting for element to be visible.*exceeded/i,
|
||||
/timed out retrying/i,
|
||||
/waitForLoadState.*timeout/i,
|
||||
];
|
||||
|
||||
return patterns.some((pattern) => pattern.test(error.message));
|
||||
}
|
||||
|
||||
/**
|
||||
* Detect hard wait anti-pattern
|
||||
*/
|
||||
export function hasHardWait(testCode: string): boolean {
|
||||
const hardWaitPatterns = [/page\.waitForTimeout\(/, /cy\.wait\(\d+\)/, /await.*sleep\(/, /setTimeout\(/];
|
||||
|
||||
return hardWaitPatterns.some((pattern) => pattern.test(testCode));
|
||||
}
|
||||
|
||||
/**
|
||||
* Suggest deterministic wait replacement
|
||||
*/
|
||||
export function suggestDeterministicWait(testCode: string): string {
|
||||
if (testCode.includes('page.waitForTimeout')) {
|
||||
return `
|
||||
// ❌ Bad: Hard wait (flaky)
|
||||
// await page.waitForTimeout(3000)
|
||||
|
||||
// ✅ Good: Wait for network response
|
||||
await page.waitForResponse(resp => resp.url().includes('/api/data') && resp.status() === 200)
|
||||
|
||||
// OR wait for element state
|
||||
await page.getByTestId('loading-spinner').waitFor({ state: 'detached' })
|
||||
`.trim();
|
||||
}
|
||||
|
||||
if (testCode.includes('cy.wait(') && /cy\.wait\(\d+\)/.test(testCode)) {
|
||||
return `
|
||||
// ❌ Bad: Hard wait (flaky)
|
||||
// cy.wait(3000)
|
||||
|
||||
// ✅ Good: Wait for aliased network request
|
||||
cy.intercept('GET', '/api/data').as('getData')
|
||||
cy.visit('/page')
|
||||
cy.wait('@getData')
|
||||
`.trim();
|
||||
}
|
||||
|
||||
return `
|
||||
// Add network-first interception BEFORE navigation:
|
||||
await page.route('**/api/**', route => route.continue())
|
||||
const responsePromise = page.waitForResponse('**/api/data')
|
||||
await page.goto('/page')
|
||||
await responsePromise
|
||||
`.trim();
|
||||
}
|
||||
```
|
||||
|
||||
**Healing Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/healing/timing-healing.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
import { isTimingFailure, hasHardWait, suggestDeterministicWait } from '../../src/testing/healing/timing-healing';
|
||||
|
||||
test('heal race condition with network-first pattern', async ({ page, context }) => {
|
||||
// Setup interception BEFORE navigation (prevent race)
|
||||
await context.route('**/api/products', (route) => {
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
body: JSON.stringify({ products: [{ id: 1, name: 'Product A' }] }),
|
||||
});
|
||||
});
|
||||
|
||||
const responsePromise = page.waitForResponse('**/api/products');
|
||||
|
||||
await page.goto('/products');
|
||||
await responsePromise; // Deterministic wait
|
||||
|
||||
// Element now reliably visible (no race condition)
|
||||
await expect(page.getByText('Product A')).toBeVisible();
|
||||
});
|
||||
|
||||
test('heal hard wait with event-based wait', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// ❌ Original (flaky): await page.waitForTimeout(3000)
|
||||
|
||||
// ✅ Healed: Wait for spinner to disappear
|
||||
await page.getByTestId('loading-spinner').waitFor({ state: 'detached' });
|
||||
|
||||
// Element now reliably visible
|
||||
await expect(page.getByText('Dashboard loaded')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Diagnosis: Error contains "timeout" or "not visible", often after navigation
|
||||
- Fix: Replace hard waits with network-first pattern or element state waits
|
||||
- Prevention: ALWAYS intercept before navigate, use waitForResponse()
|
||||
- Automation: Detect `page.waitForTimeout()` or `cy.wait(number)` in test code
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Common Failure Pattern - Dynamic Data Assertions (Non-Deterministic IDs)
|
||||
|
||||
**Context**: Test fails with "Expected 'User 123' but received 'User 456'" or timestamp mismatches
|
||||
|
||||
**Diagnostic Signature**:
|
||||
|
||||
```typescript
|
||||
// src/testing/healing/data-healing.ts
|
||||
|
||||
export type DataFailure = {
|
||||
errorMessage: string;
|
||||
expectedValue: string;
|
||||
actualValue: string;
|
||||
testFile: string;
|
||||
lineNumber: number;
|
||||
};
|
||||
|
||||
/**
|
||||
* Detect dynamic data assertion failures
|
||||
*/
|
||||
export function isDynamicDataFailure(error: Error): boolean {
|
||||
const patterns = [
|
||||
/expected.*\d+.*received.*\d+/i, // ID mismatches
|
||||
/expected.*\d{4}-\d{2}-\d{2}.*received/i, // Date mismatches
|
||||
/expected.*user.*\d+/i, // Dynamic user IDs
|
||||
/expected.*order.*\d+/i, // Dynamic order IDs
|
||||
/expected.*to.*contain.*\d+/i, // Numeric assertions
|
||||
];
|
||||
|
||||
return patterns.some((pattern) => pattern.test(error.message));
|
||||
}
|
||||
|
||||
/**
|
||||
* Suggest flexible assertion pattern
|
||||
*/
|
||||
export function suggestFlexibleAssertion(errorMessage: string): string {
|
||||
if (/expected.*user.*\d+/i.test(errorMessage)) {
|
||||
return `
|
||||
// ❌ Bad: Hardcoded ID
|
||||
// await expect(page.getByText('User 123')).toBeVisible()
|
||||
|
||||
// ✅ Good: Regex pattern for any user ID
|
||||
await expect(page.getByText(/User \\d+/)).toBeVisible()
|
||||
|
||||
// OR use partial match
|
||||
await expect(page.locator('[data-testid="user-name"]')).toContainText('User')
|
||||
`.trim();
|
||||
}
|
||||
|
||||
if (/expected.*\d{4}-\d{2}-\d{2}/i.test(errorMessage)) {
|
||||
return `
|
||||
// ❌ Bad: Hardcoded date
|
||||
// await expect(page.getByText('2024-01-15')).toBeVisible()
|
||||
|
||||
// ✅ Good: Dynamic date validation
|
||||
const today = new Date().toISOString().split('T')[0]
|
||||
await expect(page.getByTestId('created-date')).toHaveText(today)
|
||||
|
||||
// OR use date format regex
|
||||
await expect(page.getByTestId('created-date')).toHaveText(/\\d{4}-\\d{2}-\\d{2}/)
|
||||
`.trim();
|
||||
}
|
||||
|
||||
if (/expected.*order.*\d+/i.test(errorMessage)) {
|
||||
return `
|
||||
// ❌ Bad: Hardcoded order ID
|
||||
// const orderId = '12345'
|
||||
|
||||
// ✅ Good: Capture dynamic order ID
|
||||
const orderText = await page.getByTestId('order-id').textContent()
|
||||
const orderId = orderText?.match(/Order #(\\d+)/)?.[1]
|
||||
expect(orderId).toBeTruthy()
|
||||
|
||||
// Use captured ID in later assertions
|
||||
await expect(page.getByText(\`Order #\${orderId} confirmed\`)).toBeVisible()
|
||||
`.trim();
|
||||
}
|
||||
|
||||
return `Use regex patterns, partial matching, or capture dynamic values instead of hardcoding`;
|
||||
}
|
||||
```
|
||||
|
||||
**Healing Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/healing/data-healing.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test('heal dynamic ID assertion with regex', async ({ page }) => {
|
||||
await page.goto('/users');
|
||||
|
||||
// ❌ Original (fails with random IDs): await expect(page.getByText('User 123')).toBeVisible()
|
||||
|
||||
// ✅ Healed: Regex pattern matches any user ID
|
||||
await expect(page.getByText(/User \d+/)).toBeVisible();
|
||||
});
|
||||
|
||||
test('heal timestamp assertion with dynamic generation', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// ❌ Original (fails daily): await expect(page.getByText('2024-01-15')).toBeVisible()
|
||||
|
||||
// ✅ Healed: Generate expected date dynamically
|
||||
const today = new Date().toISOString().split('T')[0];
|
||||
await expect(page.getByTestId('last-updated')).toContainText(today);
|
||||
});
|
||||
|
||||
test('heal order ID assertion with capture', async ({ page, request }) => {
|
||||
// Create order via API (dynamic ID)
|
||||
const response = await request.post('/api/orders', {
|
||||
data: { productId: '123', quantity: 1 },
|
||||
});
|
||||
const { orderId } = await response.json();
|
||||
|
||||
// ✅ Healed: Use captured dynamic ID
|
||||
await page.goto(`/orders/${orderId}`);
|
||||
await expect(page.getByText(`Order #${orderId}`)).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Diagnosis: Error message shows expected vs actual value mismatch with IDs/timestamps
|
||||
- Fix: Use regex patterns (`/User \d+/`), partial matching, or capture dynamic values
|
||||
- Prevention: Never hardcode IDs, timestamps, or random data in assertions
|
||||
- Automation: Parse error message for expected/actual values, suggest regex patterns
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Common Failure Pattern - Network Errors (Missing Route Interception)
|
||||
|
||||
**Context**: Test fails with "API call failed" or "500 error" during test execution
|
||||
|
||||
**Diagnostic Signature**:
|
||||
|
||||
```typescript
|
||||
// src/testing/healing/network-healing.ts
|
||||
|
||||
export type NetworkFailure = {
|
||||
errorMessage: string;
|
||||
url: string;
|
||||
statusCode: number;
|
||||
method: string;
|
||||
};
|
||||
|
||||
/**
|
||||
* Detect network failure
|
||||
*/
|
||||
export function isNetworkFailure(error: Error): boolean {
|
||||
const patterns = [
|
||||
/api.*call.*failed/i,
|
||||
/request.*failed/i,
|
||||
/network.*error/i,
|
||||
/500.*internal server error/i,
|
||||
/503.*service unavailable/i,
|
||||
/fetch.*failed/i,
|
||||
];
|
||||
|
||||
return patterns.some((pattern) => pattern.test(error.message));
|
||||
}
|
||||
|
||||
/**
|
||||
* Suggest route interception
|
||||
*/
|
||||
export function suggestRouteInterception(url: string, method: string): string {
|
||||
return `
|
||||
// ❌ Bad: Real API call (unreliable, slow, external dependency)
|
||||
|
||||
// ✅ Good: Mock API response with route interception
|
||||
await page.route('${url}', route => {
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({
|
||||
// Mock response data
|
||||
id: 1,
|
||||
name: 'Test User',
|
||||
email: 'test@example.com'
|
||||
})
|
||||
})
|
||||
})
|
||||
|
||||
// Then perform action
|
||||
await page.goto('/page')
|
||||
`.trim();
|
||||
}
|
||||
```
|
||||
|
||||
**Healing Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/healing/network-healing.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test('heal network failure with route mocking', async ({ page, context }) => {
|
||||
// ✅ Healed: Mock API to prevent real network calls
|
||||
await context.route('**/api/products', (route) => {
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({
|
||||
products: [
|
||||
{ id: 1, name: 'Product A', price: 29.99 },
|
||||
{ id: 2, name: 'Product B', price: 49.99 },
|
||||
],
|
||||
}),
|
||||
});
|
||||
});
|
||||
|
||||
await page.goto('/products');
|
||||
|
||||
// Test now reliable (no external API dependency)
|
||||
await expect(page.getByText('Product A')).toBeVisible();
|
||||
await expect(page.getByText('$29.99')).toBeVisible();
|
||||
});
|
||||
|
||||
test('heal 500 error with error state mocking', async ({ page, context }) => {
|
||||
// Mock API failure scenario
|
||||
await context.route('**/api/products', (route) => {
|
||||
route.fulfill({ status: 500, body: JSON.stringify({ error: 'Internal Server Error' }) });
|
||||
});
|
||||
|
||||
await page.goto('/products');
|
||||
|
||||
// Verify error handling (not crash)
|
||||
await expect(page.getByText('Unable to load products')).toBeVisible();
|
||||
await expect(page.getByRole('button', { name: 'Retry' })).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Diagnosis: Error message contains "API call failed", "500 error", or network-related failures
|
||||
- Fix: Add `page.route()` or `cy.intercept()` to mock API responses
|
||||
- Prevention: Mock ALL external dependencies (APIs, third-party services)
|
||||
- Automation: Extract URL from error message, generate route interception code
|
||||
|
||||
---
|
||||
|
||||
### Example 5: Common Failure Pattern - Hard Waits (Unreliable Timing)
|
||||
|
||||
**Context**: Test fails intermittently with "timeout exceeded" or passes/fails randomly
|
||||
|
||||
**Diagnostic Signature**:
|
||||
|
||||
```typescript
|
||||
// src/testing/healing/hard-wait-healing.ts
|
||||
|
||||
/**
|
||||
* Detect hard wait anti-pattern in test code
|
||||
*/
|
||||
export function detectHardWaits(testCode: string): Array<{ line: number; code: string }> {
|
||||
const lines = testCode.split('\n');
|
||||
const violations: Array<{ line: number; code: string }> = [];
|
||||
|
||||
lines.forEach((line, index) => {
|
||||
if (line.includes('page.waitForTimeout(') || /cy\.wait\(\d+\)/.test(line) || line.includes('sleep(') || line.includes('setTimeout(')) {
|
||||
violations.push({ line: index + 1, code: line.trim() });
|
||||
}
|
||||
});
|
||||
|
||||
return violations;
|
||||
}
|
||||
|
||||
/**
|
||||
* Suggest event-based wait replacement
|
||||
*/
|
||||
export function suggestEventBasedWait(hardWaitLine: string): string {
|
||||
if (hardWaitLine.includes('page.waitForTimeout')) {
|
||||
return `
|
||||
// ❌ Bad: Hard wait (flaky)
|
||||
${hardWaitLine}
|
||||
|
||||
// ✅ Good: Wait for network response
|
||||
await page.waitForResponse(resp => resp.url().includes('/api/') && resp.ok())
|
||||
|
||||
// OR wait for element state change
|
||||
await page.getByTestId('loading-spinner').waitFor({ state: 'detached' })
|
||||
await page.getByTestId('content').waitFor({ state: 'visible' })
|
||||
`.trim();
|
||||
}
|
||||
|
||||
if (/cy\.wait\(\d+\)/.test(hardWaitLine)) {
|
||||
return `
|
||||
// ❌ Bad: Hard wait (flaky)
|
||||
${hardWaitLine}
|
||||
|
||||
// ✅ Good: Wait for aliased request
|
||||
cy.intercept('GET', '/api/data').as('getData')
|
||||
cy.visit('/page')
|
||||
cy.wait('@getData') // Deterministic
|
||||
`.trim();
|
||||
}
|
||||
|
||||
return 'Replace hard waits with event-based waits (waitForResponse, waitFor state changes)';
|
||||
}
|
||||
```
|
||||
|
||||
**Healing Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/healing/hard-wait-healing.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test('heal hard wait with deterministic wait', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// ❌ Original (flaky): await page.waitForTimeout(3000)
|
||||
|
||||
// ✅ Healed: Wait for loading spinner to disappear
|
||||
await page.getByTestId('loading-spinner').waitFor({ state: 'detached' });
|
||||
|
||||
// OR wait for specific network response
|
||||
await page.waitForResponse((resp) => resp.url().includes('/api/dashboard') && resp.ok());
|
||||
|
||||
await expect(page.getByText('Dashboard ready')).toBeVisible();
|
||||
});
|
||||
|
||||
test('heal implicit wait with explicit network wait', async ({ page }) => {
|
||||
const responsePromise = page.waitForResponse('**/api/products');
|
||||
|
||||
await page.goto('/products');
|
||||
|
||||
// ❌ Original (race condition): await page.getByText('Product A').click()
|
||||
|
||||
// ✅ Healed: Wait for network first
|
||||
await responsePromise;
|
||||
await page.getByText('Product A').click();
|
||||
|
||||
await expect(page).toHaveURL(/\/products\/\d+/);
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Diagnosis: Test code contains `page.waitForTimeout()` or `cy.wait(number)`
|
||||
- Fix: Replace with `waitForResponse()`, `waitFor({ state })`, or aliased intercepts
|
||||
- Prevention: NEVER use hard waits, always use event-based/response-based waits
|
||||
- Automation: Scan test code for hard wait patterns, suggest deterministic replacements
|
||||
|
||||
---
|
||||
|
||||
## Healing Pattern Catalog
|
||||
|
||||
| Failure Type | Diagnostic Signature | Healing Strategy | Prevention Pattern |
|
||||
| -------------- | --------------------------------------------- | ------------------------------------- | ----------------------------------------- |
|
||||
| Stale Selector | "locator resolved to 0 elements" | Replace with data-testid or ARIA role | Selector hierarchy (testid > ARIA > text) |
|
||||
| Race Condition | "timeout waiting for element" | Add network-first interception | Intercept before navigate |
|
||||
| Dynamic Data | "Expected 'User 123' but got 'User 456'" | Use regex or capture dynamic values | Never hardcode IDs/timestamps |
|
||||
| Network Error | "API call failed", "500 error" | Add route mocking | Mock all external dependencies |
|
||||
| Hard Wait | Test contains `waitForTimeout()` or `wait(n)` | Replace with event-based waits | Always use deterministic waits |
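
To tie the catalog together, a small dispatcher can route a captured failure to the matching strategy. This is a minimal sketch that reuses the detector and suggestion helpers from the examples above; the file name and the combined return shape are assumptions, not part of the existing healing modules.

```typescript
// src/testing/healing/heal-dispatcher.ts (illustrative sketch)
import { isSelectorFailure, extractSelector, suggestBetterSelector } from './selector-healing';
import { isTimingFailure, suggestDeterministicWait } from './timing-healing';
import { isDynamicDataFailure, suggestFlexibleAssertion } from './data-healing';
import { isNetworkFailure } from './network-healing';

export type HealingSuggestion = {
  failureType: 'selector' | 'timing' | 'dynamic-data' | 'network' | 'unknown';
  suggestion: string;
};

/**
 * Match a captured failure against the diagnostic signatures in the catalog
 * and return the corresponding healing strategy.
 */
export function suggestHealing(error: Error, testCode: string): HealingSuggestion {
  if (isSelectorFailure(error)) {
    const badSelector = extractSelector(error.message);
    return {
      failureType: 'selector',
      suggestion: badSelector ? suggestBetterSelector(badSelector) : 'Add a data-testid to the target element',
    };
  }
  if (isTimingFailure(error)) {
    return { failureType: 'timing', suggestion: suggestDeterministicWait(testCode) };
  }
  if (isDynamicDataFailure(error)) {
    return { failureType: 'dynamic-data', suggestion: suggestFlexibleAssertion(error.message) };
  }
  if (isNetworkFailure(error)) {
    return { failureType: 'network', suggestion: 'Mock the failing endpoint with page.route() / cy.intercept()' };
  }
  return { failureType: 'unknown', suggestion: 'No catalog match — triage manually' };
}
```
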
## Healing Workflow
|
||||
|
||||
1. **Run test** → Capture failure
|
||||
2. **Identify pattern** → Match error against diagnostic signatures
|
||||
3. **Apply fix** → Use pattern-based healing strategy
|
||||
4. **Re-run test** → Validate fix (max 3 iterations)
|
||||
5. **Mark unfixable** → Use `test.fixme()` if healing fails after 3 attempts (see the loop sketched below)
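
A rough sketch of how that loop might be driven from a script, assuming the dispatcher sketched in the pattern catalog above and one Playwright CLI run per attempt; the script name, failure-capture approach, and fix application are illustrative, not an existing BMM utility.

```typescript
// scripts/heal-spec.ts (illustrative sketch — script name and runner wiring are assumptions)
import { execSync } from 'node:child_process';
import { readFileSync } from 'node:fs';
import { suggestHealing } from '../src/testing/healing/heal-dispatcher';

const MAX_HEALING_ATTEMPTS = 3;

export function healSpec(specPath: string): void {
  for (let attempt = 1; attempt <= MAX_HEALING_ATTEMPTS; attempt++) {
    try {
      // 1. Run test
      execSync(`npx playwright test ${specPath}`, { stdio: 'pipe' });
      console.log(`${specPath} passed on attempt ${attempt}`);
      return;
    } catch (failure: any) {
      // 2. Identify pattern, 3. Surface the suggested fix
      const error = new Error(String(failure.stdout ?? failure.message));
      const { failureType, suggestion } = suggestHealing(error, readFileSync(specPath, 'utf8'));
      console.log(`Attempt ${attempt}: ${failureType} failure detected`);
      console.log(`Suggested fix:\n${suggestion}`);
      // 4. The agent (or a human) applies the fix before the next iteration
    }
  }
  // 5. Mark unfixable
  console.log(`Healing failed after ${MAX_HEALING_ATTEMPTS} attempts — mark the spec with test.fixme()`);
}
```
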
## Healing Checklist
|
||||
|
||||
Before enabling auto-healing in workflows:
|
||||
|
||||
- [ ] **Failure catalog documented**: Common patterns identified (selectors, timing, data, network, hard waits)
|
||||
- [ ] **Diagnostic signatures defined**: Error message patterns for each failure type
|
||||
- [ ] **Healing strategies documented**: Fix patterns for each failure type
|
||||
- [ ] **Prevention patterns documented**: Best practices to avoid recurrence
|
||||
- [ ] **Healing iteration limit set**: Max 3 attempts before marking test.fixme()
|
||||
- [ ] **MCP integration optional**: Graceful degradation without Playwright MCP
|
||||
- [ ] **Pattern-based fallback**: Use knowledge base patterns when MCP unavailable
|
||||
- [ ] **Healing report generated**: Document what was healed and how
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*automate` (auto-healing after test generation), `*atdd` (optional healing for acceptance tests)
|
||||
- **Related fragments**: `selector-resilience.md` (selector debugging), `timing-debugging.md` (race condition fixes), `network-first.md` (interception patterns), `data-factories.md` (dynamic data handling)
|
||||
- **Tools**: Error message parsing, AST analysis for code patterns, Playwright MCP (optional), pattern matching
|
||||
|
||||
_Source: Playwright test-healer patterns, production test failure analysis, common anti-patterns from test-resources-for-ai_
|
||||
@@ -146,3 +146,328 @@ Examples:
|
||||
- `1.3-UNIT-001`
|
||||
- `1.3-INT-002`
|
||||
- `1.3-E2E-001`
|
||||
|
||||
## Real Code Examples
|
||||
|
||||
### Example 1: E2E Test (Full User Journey)
|
||||
|
||||
**Scenario**: User logs in, navigates to dashboard, and places an order.
|
||||
|
||||
```typescript
|
||||
// tests/e2e/checkout-flow.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
import { createUser, createProduct } from '../test-utils/factories';
|
||||
|
||||
test.describe('Checkout Flow', () => {
|
||||
test('user can complete purchase with saved payment method', async ({ page, apiRequest }) => {
|
||||
// Setup: Seed data via API (fast!)
|
||||
const user = createUser({ email: 'buyer@example.com', hasSavedCard: true });
|
||||
const product = createProduct({ name: 'Widget', price: 29.99, stock: 10 });
|
||||
|
||||
await apiRequest.post('/api/users', { data: user });
|
||||
await apiRequest.post('/api/products', { data: product });
|
||||
|
||||
// Network-first: Intercept BEFORE action
|
||||
const loginPromise = page.waitForResponse('**/api/auth/login');
|
||||
const cartPromise = page.waitForResponse('**/api/cart');
|
||||
const orderPromise = page.waitForResponse('**/api/orders');
|
||||
|
||||
// Step 1: Login
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', user.email);
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await loginPromise;
|
||||
|
||||
// Assert: Dashboard visible
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
await expect(page.getByText(`Welcome, ${user.name}`)).toBeVisible();
|
||||
|
||||
// Step 2: Add product to cart
|
||||
await page.goto(`/products/${product.id}`);
|
||||
await page.click('[data-testid="add-to-cart"]');
|
||||
await cartPromise;
|
||||
await expect(page.getByText('Added to cart')).toBeVisible();
|
||||
|
||||
// Step 3: Checkout with saved payment
|
||||
await page.goto('/checkout');
|
||||
await expect(page.getByText('Visa ending in 1234')).toBeVisible(); // Saved card
|
||||
await page.click('[data-testid="use-saved-card"]');
|
||||
await page.click('[data-testid="place-order"]');
|
||||
await orderPromise;
|
||||
|
||||
// Assert: Order confirmation
|
||||
await expect(page.getByText('Order Confirmed')).toBeVisible();
|
||||
await expect(page.getByText(/Order #\d+/)).toBeVisible();
|
||||
await expect(page.getByText('$29.99')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points (E2E)**:
|
||||
|
||||
- Tests complete user journey across multiple pages
|
||||
- API setup for data (fast), UI for assertions (user-centric)
|
||||
- Network-first interception to prevent flakiness
|
||||
- Validates critical revenue path end-to-end
|
||||
|
||||
### Example 2: Integration Test (API/Service Layer)
|
||||
|
||||
**Scenario**: UserService creates user and assigns role via AuthRepository.
|
||||
|
||||
```typescript
|
||||
// tests/integration/user-service.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
import { createUser } from '../test-utils/factories';
|
||||
|
||||
test.describe('UserService Integration', () => {
|
||||
test('should create user with admin role via API', async ({ request }) => {
|
||||
const userData = createUser({ role: 'admin' });
|
||||
|
||||
// Direct API call (no UI)
|
||||
const response = await request.post('/api/users', {
|
||||
data: userData,
|
||||
});
|
||||
|
||||
expect(response.status()).toBe(201);
|
||||
|
||||
const createdUser = await response.json();
|
||||
expect(createdUser.id).toBeTruthy();
|
||||
expect(createdUser.email).toBe(userData.email);
|
||||
expect(createdUser.role).toBe('admin');
|
||||
|
||||
// Verify database state
|
||||
const getResponse = await request.get(`/api/users/${createdUser.id}`);
|
||||
expect(getResponse.status()).toBe(200);
|
||||
|
||||
const fetchedUser = await getResponse.json();
|
||||
expect(fetchedUser.role).toBe('admin');
|
||||
expect(fetchedUser.permissions).toContain('user:delete');
|
||||
expect(fetchedUser.permissions).toContain('user:update');
|
||||
|
||||
// Cleanup
|
||||
await request.delete(`/api/users/${createdUser.id}`);
|
||||
});
|
||||
|
||||
test('should validate email uniqueness constraint', async ({ request }) => {
|
||||
const userData = createUser({ email: 'duplicate@example.com' });
|
||||
|
||||
// Create first user
|
||||
const response1 = await request.post('/api/users', { data: userData });
|
||||
expect(response1.status()).toBe(201);
|
||||
|
||||
const user1 = await response1.json();
|
||||
|
||||
// Attempt duplicate email
|
||||
const response2 = await request.post('/api/users', { data: userData });
|
||||
expect(response2.status()).toBe(409); // Conflict
|
||||
const error = await response2.json();
|
||||
expect(error.message).toContain('Email already exists');
|
||||
|
||||
// Cleanup
|
||||
await request.delete(`/api/users/${user1.id}`);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points (Integration)**:
|
||||
|
||||
- Tests service layer + database interaction
|
||||
- No UI involved—pure API validation
|
||||
- Business logic focus (role assignment, constraints)
|
||||
- Faster than E2E, more realistic than unit tests
|
||||
|
||||
### Example 3: Component Test (Isolated UI Component)
|
||||
|
||||
**Scenario**: Test button component in isolation with props and user interactions.
|
||||
|
||||
```typescript
|
||||
// src/components/Button.cy.tsx (Cypress Component Test)
|
||||
import { Button } from './Button';
|
||||
|
||||
describe('Button Component', () => {
|
||||
it('should render with correct label', () => {
|
||||
cy.mount(<Button label="Click Me" />);
|
||||
cy.contains('Click Me').should('be.visible');
|
||||
});
|
||||
|
||||
it('should call onClick handler when clicked', () => {
|
||||
const onClickSpy = cy.stub().as('onClick');
|
||||
cy.mount(<Button label="Submit" onClick={onClickSpy} />);
|
||||
|
||||
cy.get('button').click();
|
||||
cy.get('@onClick').should('have.been.calledOnce');
|
||||
});
|
||||
|
||||
it('should be disabled when disabled prop is true', () => {
|
||||
cy.mount(<Button label="Disabled" disabled={true} />);
|
||||
cy.get('button').should('be.disabled');
|
||||
cy.get('button').should('have.attr', 'aria-disabled', 'true');
|
||||
});
|
||||
|
||||
it('should show loading spinner when loading', () => {
|
||||
cy.mount(<Button label="Loading" loading={true} />);
|
||||
cy.get('[data-testid="spinner"]').should('be.visible');
|
||||
cy.get('button').should('be.disabled');
|
||||
});
|
||||
|
||||
it('should apply variant styles correctly', () => {
|
||||
cy.mount(<Button label="Primary" variant="primary" />);
|
||||
cy.get('button').should('have.class', 'btn-primary');
|
||||
|
||||
cy.mount(<Button label="Secondary" variant="secondary" />);
|
||||
cy.get('button').should('have.class', 'btn-secondary');
|
||||
});
|
||||
});
|
||||
|
||||
// Playwright Component Test equivalent
|
||||
import { test, expect } from '@playwright/experimental-ct-react';
|
||||
import { Button } from './Button';
|
||||
|
||||
test.describe('Button Component', () => {
|
||||
test('should call onClick handler when clicked', async ({ mount }) => {
|
||||
let clicked = false;
|
||||
const component = await mount(
|
||||
<Button label="Submit" onClick={() => { clicked = true; }} />
|
||||
);
|
||||
|
||||
await component.getByRole('button').click();
|
||||
expect(clicked).toBe(true);
|
||||
});
|
||||
|
||||
test('should be disabled when loading', async ({ mount }) => {
|
||||
const component = await mount(<Button label="Loading" loading={true} />);
|
||||
await expect(component.getByRole('button')).toBeDisabled();
|
||||
await expect(component.getByTestId('spinner')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points (Component)**:
|
||||
|
||||
- Tests UI component in isolation (no full app)
|
||||
- Props + user interactions + visual states
|
||||
- Faster than E2E, more realistic than unit tests for UI
|
||||
- Great for design system components
|
||||
|
||||
### Example 4: Unit Test (Pure Function)
|
||||
|
||||
**Scenario**: Test pure business logic function without framework dependencies.
|
||||
|
||||
```typescript
|
||||
// src/utils/price-calculator.test.ts (Jest/Vitest)
|
||||
import { calculateDiscount, applyTaxes, calculateTotal } from './price-calculator';
|
||||
|
||||
describe('PriceCalculator', () => {
|
||||
describe('calculateDiscount', () => {
|
||||
it('should apply percentage discount correctly', () => {
|
||||
const result = calculateDiscount(100, { type: 'percentage', value: 20 });
|
||||
expect(result).toBe(80);
|
||||
});
|
||||
|
||||
it('should apply fixed amount discount correctly', () => {
|
||||
const result = calculateDiscount(100, { type: 'fixed', value: 15 });
|
||||
expect(result).toBe(85);
|
||||
});
|
||||
|
||||
it('should not apply discount below zero', () => {
|
||||
const result = calculateDiscount(10, { type: 'fixed', value: 20 });
|
||||
expect(result).toBe(0);
|
||||
});
|
||||
|
||||
it('should handle no discount', () => {
|
||||
const result = calculateDiscount(100, { type: 'none', value: 0 });
|
||||
expect(result).toBe(100);
|
||||
});
|
||||
});
|
||||
|
||||
describe('applyTaxes', () => {
|
||||
it('should calculate tax correctly for US', () => {
|
||||
const result = applyTaxes(100, { country: 'US', rate: 0.08 });
|
||||
expect(result).toBe(108);
|
||||
});
|
||||
|
||||
it('should calculate tax correctly for EU (VAT)', () => {
|
||||
const result = applyTaxes(100, { country: 'DE', rate: 0.19 });
|
||||
expect(result).toBe(119);
|
||||
});
|
||||
|
||||
it('should handle zero tax rate', () => {
|
||||
const result = applyTaxes(100, { country: 'US', rate: 0 });
|
||||
expect(result).toBe(100);
|
||||
});
|
||||
});
|
||||
|
||||
describe('calculateTotal', () => {
|
||||
it('should calculate total with discount and taxes', () => {
|
||||
const items = [
|
||||
{ price: 50, quantity: 2 }, // 100
|
||||
{ price: 30, quantity: 1 }, // 30
|
||||
];
|
||||
const discount = { type: 'percentage', value: 10 }; // -13
|
||||
const tax = { country: 'US', rate: 0.08 }; // +9.36
|
||||
|
||||
const result = calculateTotal(items, discount, tax);
|
||||
expect(result).toBeCloseTo(126.36, 2);
|
||||
});
|
||||
|
||||
it('should handle empty items array', () => {
|
||||
const result = calculateTotal([], { type: 'none', value: 0 }, { country: 'US', rate: 0 });
|
||||
expect(result).toBe(0);
|
||||
});
|
||||
|
||||
it('should calculate correctly without discount or tax', () => {
|
||||
const items = [{ price: 25, quantity: 4 }];
|
||||
const result = calculateTotal(items, { type: 'none', value: 0 }, { country: 'US', rate: 0 });
|
||||
expect(result).toBe(100);
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points (Unit)**:
|
||||
|
||||
- Pure function testing—no framework dependencies
|
||||
- Fast execution (milliseconds)
|
||||
- Edge case coverage (zero, negative, empty inputs)
|
||||
- High cyclomatic complexity handled at unit level
|
||||
|
||||
## When to Use Which Level
|
||||
|
||||
| Scenario | Unit | Integration | E2E |
|
||||
| ---------------------- | ------------- | ----------------- | ------------- |
|
||||
| Pure business logic | ✅ Primary | ❌ Overkill | ❌ Overkill |
|
||||
| Database operations | ❌ Can't test | ✅ Primary | ❌ Overkill |
|
||||
| API contracts | ❌ Can't test | ✅ Primary | ⚠️ Supplement |
|
||||
| User journeys | ❌ Can't test | ❌ Can't test | ✅ Primary |
|
||||
| Component props/events | ✅ Partial | ⚠️ Component test | ❌ Overkill |
|
||||
| Visual regression | ❌ Can't test | ⚠️ Component test | ✅ Primary |
|
||||
| Error handling (logic) | ✅ Primary | ⚠️ Integration | ❌ Overkill |
|
||||
| Error handling (UI) | ❌ Partial | ⚠️ Component test | ✅ Primary |
|
||||
|
||||
## Anti-Pattern Examples
|
||||
|
||||
**❌ BAD: E2E test for business logic**
|
||||
|
||||
```typescript
|
||||
// DON'T DO THIS
|
||||
test('calculate discount via UI', async ({ page }) => {
|
||||
await page.goto('/calculator');
|
||||
await page.fill('[data-testid="price"]', '100');
|
||||
await page.fill('[data-testid="discount"]', '20');
|
||||
await page.click('[data-testid="calculate"]');
|
||||
await expect(page.getByText('$80')).toBeVisible();
|
||||
});
|
||||
// Problem: Slow, brittle, tests logic that should be unit tested
|
||||
```
|
||||
|
||||
**✅ GOOD: Unit test for business logic**
|
||||
|
||||
```typescript
|
||||
test('calculate discount', () => {
|
||||
expect(calculateDiscount(100, 20)).toBe(80);
|
||||
});
|
||||
// Fast, reliable, isolated
|
||||
```
|
||||
|
||||
_Source: Murat Testing Philosophy (test pyramid), existing test-levels-framework.md structure._
|
||||
|
||||
@@ -172,3 +172,202 @@ Review and adjust priorities based on:
|
||||
- Usage analytics
|
||||
- Test failure history
|
||||
- Business priority changes
|
||||
|
||||
---
|
||||
|
||||
## Automated Priority Classification
|
||||
|
||||
### Example: Priority Calculator (Risk-Based Automation)
|
||||
|
||||
```typescript
|
||||
// src/testing/priority-calculator.ts
|
||||
|
||||
export type Priority = 'P0' | 'P1' | 'P2' | 'P3';
|
||||
|
||||
export type PriorityFactors = {
|
||||
revenueImpact: 'critical' | 'high' | 'medium' | 'low' | 'none';
|
||||
userImpact: 'all' | 'majority' | 'some' | 'few' | 'minimal';
|
||||
securityRisk: boolean;
|
||||
complianceRequired: boolean;
|
||||
previousFailure: boolean;
|
||||
complexity: 'high' | 'medium' | 'low';
|
||||
usage: 'frequent' | 'regular' | 'occasional' | 'rare';
|
||||
};
|
||||
|
||||
/**
|
||||
* Calculate test priority based on multiple factors
|
||||
* Mirrors the priority decision tree with objective criteria
|
||||
*/
|
||||
export function calculatePriority(factors: PriorityFactors): Priority {
|
||||
const { revenueImpact, userImpact, securityRisk, complianceRequired, previousFailure, complexity, usage } = factors;
|
||||
|
||||
// P0: Revenue-critical, security, or compliance
|
||||
if (revenueImpact === 'critical' || securityRisk || complianceRequired || (previousFailure && revenueImpact === 'high')) {
|
||||
return 'P0';
|
||||
}
|
||||
|
||||
// P0: High revenue + high complexity + frequent usage
|
||||
if (revenueImpact === 'high' && complexity === 'high' && usage === 'frequent') {
|
||||
return 'P0';
|
||||
}
|
||||
|
||||
// P1: Core user journey (majority impacted + frequent usage)
|
||||
if (userImpact === 'all' || userImpact === 'majority') {
|
||||
if (usage === 'frequent' || complexity === 'high') {
|
||||
return 'P1';
|
||||
}
|
||||
}
|
||||
|
||||
// P1: High revenue OR high complexity with regular usage
|
||||
if ((revenueImpact === 'high' && usage === 'regular') || (complexity === 'high' && usage === 'frequent')) {
|
||||
return 'P1';
|
||||
}
|
||||
|
||||
// P2: Secondary features (some impact, occasional usage)
|
||||
if (userImpact === 'some' || usage === 'occasional') {
|
||||
return 'P2';
|
||||
}
|
||||
|
||||
// P3: Rarely used, low impact
|
||||
return 'P3';
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate priority justification (for audit trail)
|
||||
*/
|
||||
export function justifyPriority(factors: PriorityFactors): string {
|
||||
const priority = calculatePriority(factors);
|
||||
const reasons: string[] = [];
|
||||
|
||||
if (factors.revenueImpact === 'critical') reasons.push('critical revenue impact');
|
||||
if (factors.securityRisk) reasons.push('security-critical');
|
||||
if (factors.complianceRequired) reasons.push('compliance requirement');
|
||||
if (factors.previousFailure) reasons.push('regression prevention');
|
||||
if (factors.userImpact === 'all' || factors.userImpact === 'majority') {
|
||||
reasons.push(`impacts ${factors.userImpact} users`);
|
||||
}
|
||||
if (factors.complexity === 'high') reasons.push('high complexity');
|
||||
if (factors.usage === 'frequent') reasons.push('frequently used');
|
||||
|
||||
return `${priority}: ${reasons.join(', ')}`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Example: Payment scenario priority calculation
|
||||
*/
|
||||
const paymentScenario: PriorityFactors = {
|
||||
revenueImpact: 'critical',
|
||||
userImpact: 'all',
|
||||
securityRisk: true,
|
||||
complianceRequired: true,
|
||||
previousFailure: false,
|
||||
complexity: 'high',
|
||||
usage: 'frequent',
|
||||
};
|
||||
|
||||
console.log(calculatePriority(paymentScenario)); // 'P0'
|
||||
console.log(justifyPriority(paymentScenario));
|
||||
// 'P0: critical revenue impact, security-critical, compliance requirement, impacts all users, high complexity, frequently used'
|
||||
```
|
||||
|
||||
### Example: Test Suite Tagging Strategy
|
||||
|
||||
```typescript
|
||||
// tests/e2e/checkout.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
// Tag tests with priority for selective execution
|
||||
test.describe('Checkout Flow', () => {
|
||||
test('valid payment completes successfully @p0 @smoke @revenue', async ({ page }) => {
|
||||
// P0: Revenue-critical happy path
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('payment-method').selectOption('credit-card');
|
||||
await page.getByTestId('card-number').fill('4242424242424242');
|
||||
await page.getByRole('button', { name: 'Place Order' }).click();
|
||||
|
||||
await expect(page.getByText('Order confirmed')).toBeVisible();
|
||||
});
|
||||
|
||||
test('expired card shows user-friendly error @p1 @error-handling', async ({ page }) => {
|
||||
// P1: Core error scenario (frequent user impact)
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('payment-method').selectOption('credit-card');
|
||||
await page.getByTestId('card-number').fill('4000000000000069'); // Test card: expired
|
||||
await page.getByRole('button', { name: 'Place Order' }).click();
|
||||
|
||||
await expect(page.getByText('Card expired. Please use a different card.')).toBeVisible();
|
||||
});
|
||||
|
||||
test('coupon code applies discount correctly @p2', async ({ page }) => {
|
||||
// P2: Secondary feature (nice-to-have)
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('coupon-code').fill('SAVE10');
|
||||
await page.getByRole('button', { name: 'Apply' }).click();
|
||||
|
||||
await expect(page.getByText('10% discount applied')).toBeVisible();
|
||||
});
|
||||
|
||||
test('gift message formatting preserved @p3', async ({ page }) => {
|
||||
// P3: Cosmetic feature (rarely used)
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('gift-message').fill('Happy Birthday!\n\nWith love.');
|
||||
await page.getByRole('button', { name: 'Place Order' }).click();
|
||||
|
||||
// Message formatting preserved (linebreaks intact)
|
||||
await expect(page.getByTestId('order-summary')).toContainText('Happy Birthday!');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Run tests by priority:**
|
||||
|
||||
```bash
|
||||
# P0 only (smoke tests, 2-5 min)
|
||||
npx playwright test --grep @p0
|
||||
|
||||
# P0 + P1 (core functionality, 10-15 min)
|
||||
npx playwright test --grep "@p0|@p1"
|
||||
|
||||
# Full regression (all priorities, 30+ min)
|
||||
npx playwright test
|
||||
```
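
The same tags can be wired into the Playwright config so CI jobs pick a suite via an environment variable instead of hand-written `--grep` flags. A minimal sketch, assuming a `TEST_SUITE` variable; the variable and band names are illustrative.

```typescript
// playwright.config.ts (sketch — the TEST_SUITE variable and band names are assumptions)
import { defineConfig } from '@playwright/test';

const SUITE_GREP: Record<string, RegExp> = {
  smoke: /@p0/, // P0 only (2-5 min)
  core: /@p0|@p1/, // P0 + P1 (10-15 min)
  full: /.*/, // full regression
};

export default defineConfig({
  testDir: './tests',
  grep: SUITE_GREP[process.env.TEST_SUITE ?? 'full'],
});

// Usage: TEST_SUITE=smoke npx playwright test
```
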
---
|
||||
|
||||
## Integration with Risk Scoring
|
||||
|
||||
Priority should align with risk score from `probability-impact.md`:
|
||||
|
||||
| Risk Score | Typical Priority | Rationale |
|
||||
| ---------- | ---------------- | ------------------------------------------ |
|
||||
| 9 | P0 | Critical blocker (probability=3, impact=3) |
|
||||
| 6-8 | P0 or P1 | High risk (requires mitigation) |
|
||||
| 4-5 | P1 or P2 | Medium risk (monitor closely) |
|
||||
| 1-3 | P2 or P3 | Low risk (document and defer) |
|
||||
|
||||
**Example**: Risk score 9 (checkout API failure) → P0 priority → comprehensive coverage required.
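
A small helper can make that alignment executable. This sketch assumes it sits next to `priority-calculator.ts` and deliberately picks the stricter end of each band from the table above, so teams relax priority on purpose (strong monitoring, easy rollback) rather than by default.

```typescript
// src/testing/risk-to-priority.ts (sketch — colocation with priority-calculator.ts is an assumption)
import type { Priority } from './priority-calculator';

/**
 * Map a probability × impact risk score (1-9) to a starting priority.
 * Uses the stricter end of each band from the table above.
 */
export function priorityFromRiskScore(riskScore: number): Priority {
  if (riskScore >= 9) return 'P0'; // critical blocker
  if (riskScore >= 6) return 'P0'; // high risk — mitigate before relaxing to P1
  if (riskScore >= 4) return 'P1'; // medium risk — monitor closely
  return 'P2'; // low risk — document and defer (P3 if rarely used)
}

// priorityFromRiskScore(9) → 'P0' (e.g. checkout API failure)
```
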
---
|
||||
|
||||
## Priority Checklist
|
||||
|
||||
Before finalizing test priorities:
|
||||
|
||||
- [ ] **Revenue impact assessed**: Payment, subscription, billing features → P0
|
||||
- [ ] **Security risks identified**: Auth, data exposure, injection attacks → P0
|
||||
- [ ] **Compliance requirements documented**: GDPR, PCI-DSS, SOC2 → P0
|
||||
- [ ] **User impact quantified**: >50% users → P0/P1, <10% → P2/P3
|
||||
- [ ] **Previous failures reviewed**: Regression prevention → increase priority
|
||||
- [ ] **Complexity evaluated**: >500 LOC or multiple dependencies → increase priority
|
||||
- [ ] **Usage metrics consulted**: Frequent use → P0/P1, rare use → P2/P3
|
||||
- [ ] **Monitoring coverage confirmed**: Strong monitoring → can decrease priority
|
||||
- [ ] **Rollback capability verified**: Easy rollback → can decrease priority
|
||||
- [ ] **Priorities tagged in tests**: @p0, @p1, @p2, @p3 for selective execution
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*automate` (priority-based test generation), `*test-design` (scenario prioritization), `*trace` (coverage validation by priority)
|
||||
- **Related fragments**: `risk-governance.md` (risk scoring), `probability-impact.md` (impact assessment), `selective-testing.md` (tag-based execution)
|
||||
- **Tools**: Playwright/Cypress grep for tag filtering, CI scripts for priority-based execution
|
||||
|
||||
_Source: Risk-based testing practices, test prioritization strategies, production incident analysis_
|
||||
|
||||
@@ -1,10 +1,664 @@
|
||||
# Test Quality Definition of Done
|
||||
|
||||
- No hard waits (`waitForTimeout`, `cy.wait(ms)`); rely on deterministic waits or event hooks.
|
||||
- Each spec <300 lines and executes in ≤1.5 minutes.
|
||||
- Tests are isolated, parallel-safe, and self-cleaning (seed via API/tasks, teardown after run).
|
||||
- Assertions stay visible in test bodies; avoid conditional logic controlling test flow.
|
||||
- Suites must pass locally and in CI with the same commands.
|
||||
- Promote new tests only after they have failed for the intended reason at least once.

_Source: Murat quality checklist._

## Principle

Tests must be deterministic, isolated, explicit, focused, and fast. Every test should execute in under 1.5 minutes, contain fewer than 300 lines, avoid hard waits and conditionals, keep assertions visible in test bodies, and clean up after itself for parallel execution.
|
||||
|
||||
## Rationale
|
||||
|
||||
Quality tests provide reliable signal about application health. Flaky tests erode confidence and waste engineering time. Tests that use hard waits (`waitForTimeout(3000)`) are non-deterministic and slow. Tests with hidden assertions or conditional logic become unmaintainable. Large tests (>300 lines) are hard to understand and debug. Slow tests (>1.5 min) block CI pipelines. Self-cleaning tests prevent state pollution in parallel runs.
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Deterministic Test Pattern
|
||||
|
||||
**Context**: When writing tests, eliminate all sources of non-determinism: hard waits, conditionals controlling flow, try-catch for flow control, and random data without seeds.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: Non-deterministic test with conditionals and hard waits
|
||||
test('user can view dashboard - FLAKY', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
await page.waitForTimeout(3000); // NEVER - arbitrary wait
|
||||
|
||||
// Conditional flow control - test behavior varies
|
||||
if (await page.locator('[data-testid="welcome-banner"]').isVisible()) {
|
||||
await page.click('[data-testid="dismiss-banner"]');
|
||||
await page.waitForTimeout(500);
|
||||
}
|
||||
|
||||
// Try-catch for flow control - hides real issues
|
||||
try {
|
||||
await page.click('[data-testid="load-more"]');
|
||||
} catch (e) {
|
||||
// Silently continue - test passes even if button missing
|
||||
}
|
||||
|
||||
// Random data without control
|
||||
const randomEmail = `user${Math.random()}@example.com`;
|
||||
await expect(page.getByText(randomEmail)).toBeVisible(); // Will fail randomly
|
||||
});
|
||||
|
||||
// ✅ GOOD: Deterministic test with explicit waits
|
||||
test('user can view dashboard', async ({ page, apiRequest }) => {
|
||||
const user = createUser({ email: 'test@example.com', hasSeenWelcome: true });
|
||||
|
||||
// Setup via API (fast, controlled)
|
||||
await apiRequest.post('/api/users', { data: user });
|
||||
|
||||
// Network-first: Intercept BEFORE navigate
|
||||
const dashboardPromise = page.waitForResponse((resp) => resp.url().includes('/api/dashboard') && resp.status() === 200);
|
||||
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Wait for actual response, not arbitrary time
|
||||
const dashboardResponse = await dashboardPromise;
|
||||
const dashboard = await dashboardResponse.json();
|
||||
|
||||
// Explicit assertions with controlled data
|
||||
await expect(page.getByText(`Welcome, ${user.name}`)).toBeVisible();
|
||||
await expect(page.getByTestId('dashboard-items')).toHaveCount(dashboard.items.length);
|
||||
|
||||
// No conditionals - test always executes same path
|
||||
// No try-catch - failures bubble up clearly
|
||||
});
|
||||
|
||||
// Cypress equivalent
|
||||
describe('Dashboard', () => {
|
||||
it('should display user dashboard', () => {
|
||||
const user = createUser({ email: 'test@example.com', hasSeenWelcome: true });
|
||||
|
||||
// Setup via task (fast, controlled)
|
||||
cy.task('db:seed', { users: [user] });
|
||||
|
||||
// Network-first interception
|
||||
cy.intercept('GET', '**/api/dashboard').as('getDashboard');
|
||||
|
||||
cy.visit('/dashboard');
|
||||
|
||||
// Deterministic wait for response
|
||||
cy.wait('@getDashboard').then((interception) => {
|
||||
const dashboard = interception.response.body;
|
||||
|
||||
// Explicit assertions
|
||||
cy.contains(`Welcome, ${user.name}`).should('be.visible');
|
||||
cy.get('[data-cy="dashboard-items"]').should('have.length', dashboard.items.length);
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Replace `waitForTimeout()` with `waitForResponse()` or element state checks
|
||||
- Never use if/else to control test flow - tests should be deterministic
|
||||
- Avoid try-catch for flow control - let failures bubble up clearly
|
||||
- Use factory functions with controlled data, not `Math.random()`
|
||||
- Network-first pattern prevents race conditions
|
||||
|
||||
### Example 2: Isolated Test with Cleanup
|
||||
|
||||
**Context**: When tests create data, they must clean up after themselves to prevent state pollution in parallel runs. Use fixture auto-cleanup or explicit teardown.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: Test leaves data behind, pollutes other tests
|
||||
test('admin can create user - POLLUTES STATE', async ({ page, apiRequest }) => {
|
||||
await page.goto('/admin/users');
|
||||
|
||||
// Hardcoded email - collides in parallel runs
|
||||
await page.fill('[data-testid="email"]', 'newuser@example.com');
|
||||
await page.fill('[data-testid="name"]', 'New User');
|
||||
await page.click('[data-testid="create-user"]');
|
||||
|
||||
await expect(page.getByText('User created')).toBeVisible();
|
||||
|
||||
// NO CLEANUP - user remains in database
|
||||
// Next test run fails: "Email already exists"
|
||||
});
|
||||
|
||||
// ✅ GOOD: Test cleans up with fixture auto-cleanup
|
||||
// playwright/support/fixtures/database-fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import { deleteRecord, seedDatabase } from '../helpers/db-helpers';
|
||||
|
||||
type DatabaseFixture = {
|
||||
seedUser: (userData: Partial<User>) => Promise<User>;
|
||||
};
|
||||
|
||||
export const test = base.extend<DatabaseFixture>({
|
||||
seedUser: async ({}, use) => {
|
||||
const createdUsers: string[] = [];
|
||||
|
||||
const seedUser = async (userData: Partial<User>) => {
|
||||
const user = await seedDatabase('users', userData);
|
||||
createdUsers.push(user.id); // Track for cleanup
|
||||
return user;
|
||||
};
|
||||
|
||||
await use(seedUser);
|
||||
|
||||
// Auto-cleanup: Delete all users created during test
|
||||
for (const userId of createdUsers) {
|
||||
await deleteRecord('users', userId);
|
||||
}
|
||||
createdUsers.length = 0;
|
||||
},
|
||||
});
|
||||
|
||||
// Use the fixture
|
||||
test('admin can create user', async ({ page, seedUser }) => {
|
||||
// Create admin with unique data
|
||||
const admin = await seedUser({
|
||||
email: faker.internet.email(), // Unique each run
|
||||
role: 'admin',
|
||||
});
|
||||
|
||||
await page.goto('/admin/users');
|
||||
|
||||
const newUserEmail = faker.internet.email(); // Unique
|
||||
await page.fill('[data-testid="email"]', newUserEmail);
|
||||
await page.fill('[data-testid="name"]', 'New User');
|
||||
await page.click('[data-testid="create-user"]');
|
||||
|
||||
await expect(page.getByText('User created')).toBeVisible();
|
||||
|
||||
  // Verify the new user appears in the admin list
  await expect(page.getByText(newUserEmail)).toBeVisible();
|
||||
|
||||
// Auto-cleanup happens via fixture teardown
|
||||
});
|
||||
|
||||
// Cypress equivalent with explicit cleanup
|
||||
describe('Admin User Management', () => {
|
||||
const createdUserIds: string[] = [];
|
||||
|
||||
afterEach(() => {
|
||||
// Cleanup: Delete all users created during test
|
||||
createdUserIds.forEach((userId) => {
|
||||
cy.task('db:delete', { table: 'users', id: userId });
|
||||
});
|
||||
createdUserIds.length = 0;
|
||||
});
|
||||
|
||||
it('should create user', () => {
|
||||
const admin = createUser({ role: 'admin' });
|
||||
const newUser = createUser(); // Unique data via faker
|
||||
|
||||
cy.task('db:seed', { users: [admin] }).then((result: any) => {
|
||||
createdUserIds.push(result.users[0].id);
|
||||
});
|
||||
|
||||
cy.visit('/admin/users');
|
||||
cy.get('[data-cy="email"]').type(newUser.email);
|
||||
cy.get('[data-cy="name"]').type(newUser.name);
|
||||
cy.get('[data-cy="create-user"]').click();
|
||||
|
||||
cy.contains('User created').should('be.visible');
|
||||
|
||||
// Track for cleanup
|
||||
cy.task('db:findByEmail', newUser.email).then((user: any) => {
|
||||
createdUserIds.push(user.id);
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Use fixtures with auto-cleanup via teardown (after `use()`)
|
||||
- Track all created resources in array during test execution
|
||||
- Use `faker` for unique data - prevents parallel collisions
|
||||
- Cypress: Use `afterEach()` with explicit cleanup
|
||||
- Never hardcode IDs or emails - always generate unique values (a minimal factory sketch follows)
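
For reference, a minimal version of the `createUser` factory imported throughout these examples might look like the sketch below; the real factory in `test-utils/factories` presumably carries more fields (roles, saved cards, welcome flags), so treat this as illustrative only.

```typescript
// tests/test-utils/factories.ts (sketch — the real factory carries more fields)
import { faker } from '@faker-js/faker';

export type UserData = {
  id: string;
  email: string;
  name: string;
  role: 'admin' | 'user';
};

/**
 * Build a user with unique, parallel-safe defaults; overrides win.
 */
export function createUser(overrides: Partial<UserData> = {}): UserData {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
    role: 'user',
    ...overrides,
  };
}

// createUser()                  → unique defaults, safe in parallel runs
// createUser({ role: 'admin' }) → override only what the test cares about
```
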
### Example 3: Explicit Assertions in Tests
|
||||
|
||||
**Context**: When validating test results, keep assertions visible in test bodies. Never hide assertions in helper functions - this obscures test intent and makes failures harder to diagnose.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: Assertions hidden in helper functions
|
||||
// helpers/api-validators.ts
|
||||
export async function validateUserCreation(response: Response, expectedEmail: string) {
|
||||
const user = await response.json();
|
||||
expect(response.status()).toBe(201);
|
||||
expect(user.email).toBe(expectedEmail);
|
||||
expect(user.id).toBeTruthy();
|
||||
expect(user.createdAt).toBeTruthy();
|
||||
// Hidden assertions - not visible in test
|
||||
}
|
||||
|
||||
test('create user via API - OPAQUE', async ({ request }) => {
|
||||
const userData = createUser({ email: 'test@example.com' });
|
||||
|
||||
const response = await request.post('/api/users', { data: userData });
|
||||
|
||||
// What assertions are running? Have to check helper.
|
||||
await validateUserCreation(response, userData.email);
|
||||
// When this fails, error is: "validateUserCreation failed" - NOT helpful
|
||||
});
|
||||
|
||||
// ✅ GOOD: Assertions explicit in test
|
||||
test('create user via API', async ({ request }) => {
|
||||
const userData = createUser({ email: 'test@example.com' });
|
||||
|
||||
const response = await request.post('/api/users', { data: userData });
|
||||
|
||||
// All assertions visible - clear test intent
|
||||
expect(response.status()).toBe(201);
|
||||
|
||||
const createdUser = await response.json();
|
||||
expect(createdUser.id).toBeTruthy();
|
||||
expect(createdUser.email).toBe(userData.email);
|
||||
expect(createdUser.name).toBe(userData.name);
|
||||
expect(createdUser.role).toBe('user');
|
||||
expect(createdUser.createdAt).toBeTruthy();
|
||||
expect(createdUser.isActive).toBe(true);
|
||||
|
||||
// When this fails, error is: "Expected role to be 'user', got 'admin'" - HELPFUL
|
||||
});
|
||||
|
||||
// ✅ ACCEPTABLE: Helper for data extraction, NOT assertions
|
||||
// helpers/api-extractors.ts
|
||||
export async function extractUserFromResponse(response: Response): Promise<User> {
|
||||
const user = await response.json();
|
||||
return user; // Just extracts, no assertions
|
||||
}
|
||||
|
||||
test('create user with extraction helper', async ({ request }) => {
|
||||
const userData = createUser({ email: 'test@example.com' });
|
||||
|
||||
const response = await request.post('/api/users', { data: userData });
|
||||
|
||||
// Extract data with helper (OK)
|
||||
const createdUser = await extractUserFromResponse(response);
|
||||
|
||||
// But keep assertions in test (REQUIRED)
|
||||
expect(response.status()).toBe(201);
|
||||
expect(createdUser.email).toBe(userData.email);
|
||||
expect(createdUser.role).toBe('user');
|
||||
});
|
||||
|
||||
// Cypress equivalent
|
||||
describe('User API', () => {
|
||||
it('should create user with explicit assertions', () => {
|
||||
const userData = createUser({ email: 'test@example.com' });
|
||||
|
||||
cy.request('POST', '/api/users', userData).then((response) => {
|
||||
// All assertions visible in test
|
||||
expect(response.status).to.equal(201);
|
||||
expect(response.body.id).to.exist;
|
||||
expect(response.body.email).to.equal(userData.email);
|
||||
expect(response.body.name).to.equal(userData.name);
|
||||
expect(response.body.role).to.equal('user');
|
||||
expect(response.body.createdAt).to.exist;
|
||||
expect(response.body.isActive).to.be.true;
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
// ✅ GOOD: Parametrized tests for bulk validation (each case still asserts explicitly)
|
||||
test.describe('User creation validation', () => {
|
||||
const testCases = [
|
||||
{ field: 'email', value: 'test@example.com', expected: 'test@example.com' },
|
||||
{ field: 'name', value: 'Test User', expected: 'Test User' },
|
||||
{ field: 'role', value: 'admin', expected: 'admin' },
|
||||
{ field: 'isActive', value: true, expected: true },
|
||||
];
|
||||
|
||||
for (const { field, value, expected } of testCases) {
|
||||
test(`should set ${field} correctly`, async ({ request }) => {
|
||||
const userData = createUser({ [field]: value });
|
||||
|
||||
const response = await request.post('/api/users', { data: userData });
|
||||
const user = await response.json();
|
||||
|
||||
// Parametrized assertion - still explicit
|
||||
expect(user[field]).toBe(expected);
|
||||
});
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Never hide `expect()` calls in helper functions
|
||||
- Helpers can extract/transform data, but assertions stay in tests
|
||||
- Parametrized tests are acceptable for bulk validation (still explicit)
|
||||
- Explicit assertions make failures actionable: "Expected X, got Y"
|
||||
- Hidden assertions produce vague failures: "Helper function failed"
|
||||
|
||||
### Example 4: Test Length Limits
|
||||
|
||||
**Context**: When tests grow beyond 300 lines, they become hard to understand, debug, and maintain. Refactor long tests by extracting setup helpers, splitting scenarios, or using fixtures.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: 400-line monolithic test (truncated for example)
|
||||
test('complete user journey - TOO LONG', async ({ page, request }) => {
|
||||
// 50 lines of setup
|
||||
const admin = createUser({ role: 'admin' });
|
||||
await request.post('/api/users', { data: admin });
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', admin.email);
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.click('[data-testid="login"]');
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
|
||||
// 100 lines of user creation
|
||||
await page.goto('/admin/users');
|
||||
const newUser = createUser();
|
||||
await page.fill('[data-testid="email"]', newUser.email);
|
||||
// ... 95 more lines of form filling, validation, etc.
|
||||
|
||||
// 100 lines of permissions assignment
|
||||
await page.click('[data-testid="assign-permissions"]');
|
||||
// ... 95 more lines
|
||||
|
||||
// 100 lines of notification preferences
|
||||
await page.click('[data-testid="notification-settings"]');
|
||||
// ... 95 more lines
|
||||
|
||||
// 50 lines of cleanup
|
||||
await request.delete(`/api/users/${newUser.id}`);
|
||||
// ... 45 more lines
|
||||
|
||||
// TOTAL: 400 lines - impossible to understand or debug
|
||||
});
|
||||
|
||||
// ✅ GOOD: Split into focused tests with shared fixture
|
||||
// playwright/support/fixtures/admin-fixture.ts
|
||||
export const test = base.extend({
|
||||
adminPage: async ({ page, request }, use) => {
|
||||
// Shared setup: Login as admin
|
||||
const admin = createUser({ role: 'admin' });
|
||||
await request.post('/api/users', { data: admin });
|
||||
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', admin.email);
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.click('[data-testid="login"]');
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
|
||||
await use(page); // Provide logged-in page
|
||||
|
||||
// Cleanup handled by fixture
|
||||
},
|
||||
});
|
||||
|
||||
// Test 1: User creation (50 lines)
|
||||
test('admin can create user', async ({ adminPage, seedUser }) => {
|
||||
await adminPage.goto('/admin/users');
|
||||
|
||||
const newUser = createUser();
|
||||
await adminPage.fill('[data-testid="email"]', newUser.email);
|
||||
await adminPage.fill('[data-testid="name"]', newUser.name);
|
||||
await adminPage.click('[data-testid="role-dropdown"]');
|
||||
await adminPage.click('[data-testid="role-user"]');
|
||||
await adminPage.click('[data-testid="create-user"]');
|
||||
|
||||
await expect(adminPage.getByText('User created')).toBeVisible();
|
||||
await expect(adminPage.getByText(newUser.email)).toBeVisible();
|
||||
|
||||
  // UI confirms creation above; field-level checks (e.g. assigned role) belong in the API integration tests
|
||||
});
|
||||
|
||||
// Test 2: Permission assignment (60 lines)
|
||||
test('admin can assign permissions', async ({ adminPage, seedUser }) => {
|
||||
const user = await seedUser({ email: faker.internet.email() });
|
||||
|
||||
await adminPage.goto(`/admin/users/${user.id}`);
|
||||
await adminPage.click('[data-testid="assign-permissions"]');
|
||||
await adminPage.check('[data-testid="permission-read"]');
|
||||
await adminPage.check('[data-testid="permission-write"]');
|
||||
await adminPage.click('[data-testid="save-permissions"]');
|
||||
|
||||
await expect(adminPage.getByText('Permissions updated')).toBeVisible();
|
||||
|
||||
// Verify permissions assigned
|
||||
const response = await adminPage.request.get(`/api/users/${user.id}`);
|
||||
const updated = await response.json();
|
||||
expect(updated.permissions).toContain('read');
|
||||
expect(updated.permissions).toContain('write');
|
||||
});
|
||||
|
||||
// Test 3: Notification preferences (70 lines)
|
||||
test('admin can update notification preferences', async ({ adminPage, seedUser }) => {
|
||||
const user = await seedUser({ email: faker.internet.email() });
|
||||
|
||||
await adminPage.goto(`/admin/users/${user.id}/notifications`);
|
||||
await adminPage.check('[data-testid="email-notifications"]');
|
||||
await adminPage.uncheck('[data-testid="sms-notifications"]');
|
||||
await adminPage.selectOption('[data-testid="frequency"]', 'daily');
|
||||
await adminPage.click('[data-testid="save-preferences"]');
|
||||
|
||||
await expect(adminPage.getByText('Preferences saved')).toBeVisible();
|
||||
|
||||
// Verify preferences
|
||||
const response = await adminPage.request.get(`/api/users/${user.id}/preferences`);
|
||||
const prefs = await response.json();
|
||||
expect(prefs.emailEnabled).toBe(true);
|
||||
expect(prefs.smsEnabled).toBe(false);
|
||||
expect(prefs.frequency).toBe('daily');
|
||||
});
|
||||
|
||||
// TOTAL: 3 tests × 60 lines avg = 180 lines
|
||||
// Each test is focused, debuggable, and under 300 lines
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Split monolithic tests into focused scenarios (<300 lines each)
|
||||
- Extract common setup into fixtures (auto-runs for each test)
|
||||
- Each test validates one concern (user creation, permissions, preferences)
|
||||
- Failures are easier to diagnose: "Permission assignment failed" vs "Complete journey failed"
|
||||
- Tests can run in parallel (isolated concerns)
|
||||
|
||||
### Example 5: Execution Time Optimization
|
||||
|
||||
**Context**: When tests take longer than 1.5 minutes, they slow CI pipelines and feedback loops. Optimize by using API setup instead of UI navigation, parallelizing independent operations, and avoiding unnecessary waits.
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: 4-minute test (slow setup, sequential operations)
|
||||
test('user completes order - SLOW (4 min)', async ({ page }) => {
|
||||
// Step 1: Manual signup via UI (90 seconds)
|
||||
await page.goto('/signup');
|
||||
await page.fill('[data-testid="email"]', 'buyer@example.com');
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.fill('[data-testid="confirm-password"]', 'password123');
|
||||
await page.fill('[data-testid="name"]', 'Buyer User');
|
||||
await page.click('[data-testid="signup"]');
|
||||
await page.waitForURL('/verify-email'); // Wait for email verification
|
||||
// ... manual email verification flow
|
||||
|
||||
// Step 2: Manual product creation via UI (60 seconds)
|
||||
await page.goto('/admin/products');
|
||||
await page.fill('[data-testid="product-name"]', 'Widget');
|
||||
// ... 20 more fields
|
||||
await page.click('[data-testid="create-product"]');
|
||||
|
||||
// Step 3: Navigate to checkout (30 seconds)
|
||||
await page.goto('/products');
|
||||
await page.waitForTimeout(5000); // Unnecessary hard wait
|
||||
await page.click('[data-testid="product-widget"]');
|
||||
await page.waitForTimeout(3000); // Unnecessary
|
||||
await page.click('[data-testid="add-to-cart"]');
|
||||
await page.waitForTimeout(2000); // Unnecessary
|
||||
|
||||
// Step 4: Complete checkout (40 seconds)
|
||||
await page.goto('/checkout');
|
||||
await page.waitForTimeout(5000); // Unnecessary
|
||||
await page.fill('[data-testid="credit-card"]', '4111111111111111');
|
||||
// ... more form filling
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
await page.waitForTimeout(10000); // Unnecessary
|
||||
|
||||
await expect(page.getByText('Order Confirmed')).toBeVisible();
|
||||
|
||||
// TOTAL: ~240 seconds (4 minutes)
|
||||
});
|
||||
|
||||
// ✅ GOOD: 45-second test (API setup, parallel ops, deterministic waits)
|
||||
test('user completes order', async ({ page, apiRequest }) => {
|
||||
// Step 1: API setup (parallel, 5 seconds total)
|
||||
const [user, product] = await Promise.all([
|
||||
// Create user via API (fast)
|
||||
apiRequest
|
||||
.post('/api/users', {
|
||||
data: createUser({
|
||||
email: 'buyer@example.com',
|
||||
emailVerified: true, // Skip verification
|
||||
}),
|
||||
})
|
||||
.then((r) => r.json()),
|
||||
|
||||
// Create product via API (fast)
|
||||
apiRequest
|
||||
.post('/api/products', {
|
||||
data: createProduct({
|
||||
name: 'Widget',
|
||||
price: 29.99,
|
||||
stock: 10,
|
||||
}),
|
||||
})
|
||||
.then((r) => r.json()),
|
||||
]);
|
||||
|
||||
// Step 2: Auth setup via session cookie (instant, 0 seconds)
|
||||
await page.context().addCookies([
|
||||
{
|
||||
name: 'auth_token',
|
||||
value: user.token,
|
||||
domain: 'localhost',
|
||||
path: '/',
|
||||
},
|
||||
]);
|
||||
|
||||
// Step 3: Network-first interception BEFORE navigation (10 seconds)
|
||||
const cartPromise = page.waitForResponse('**/api/cart');
|
||||
const orderPromise = page.waitForResponse('**/api/orders');
|
||||
|
||||
await page.goto(`/products/${product.id}`);
|
||||
await page.click('[data-testid="add-to-cart"]');
|
||||
await cartPromise; // Deterministic wait (no hard wait)
|
||||
|
||||
// Step 4: Checkout with network waits (30 seconds)
|
||||
await page.goto('/checkout');
|
||||
await page.fill('[data-testid="credit-card"]', '4111111111111111');
|
||||
await page.fill('[data-testid="cvv"]', '123');
|
||||
await page.fill('[data-testid="expiry"]', '12/25');
|
||||
await page.click('[data-testid="submit-order"]');
|
||||
await orderPromise; // Deterministic wait (no hard wait)
|
||||
|
||||
await expect(page.getByText('Order Confirmed')).toBeVisible();
|
||||
await expect(page.getByText(`Order #${product.id}`)).toBeVisible();
|
||||
|
||||
// TOTAL: ~45 seconds (roughly 5x faster than the 4-minute version)
|
||||
});
|
||||
|
||||
// Cypress equivalent
|
||||
describe('Order Flow', () => {
|
||||
it('should complete purchase quickly', () => {
|
||||
// Step 1: API setup (parallel, fast)
|
||||
const user = createUser({ emailVerified: true });
|
||||
const product = createProduct({ name: 'Widget', price: 29.99 });
|
||||
|
||||
cy.task('db:seed', { users: [user], products: [product] });
|
||||
|
||||
// Step 2: Auth setup via session (instant)
|
||||
cy.setCookie('auth_token', user.token);
|
||||
|
||||
// Step 3: Network-first interception
|
||||
cy.intercept('POST', '**/api/cart').as('addToCart');
|
||||
cy.intercept('POST', '**/api/orders').as('createOrder');
|
||||
|
||||
cy.visit(`/products/${product.id}`);
|
||||
cy.get('[data-cy="add-to-cart"]').click();
|
||||
cy.wait('@addToCart'); // Deterministic wait
|
||||
|
||||
// Step 4: Checkout
|
||||
cy.visit('/checkout');
|
||||
cy.get('[data-cy="credit-card"]').type('4111111111111111');
|
||||
cy.get('[data-cy="cvv"]').type('123');
|
||||
cy.get('[data-cy="expiry"]').type('12/25');
|
||||
cy.get('[data-cy="submit-order"]').click();
|
||||
cy.wait('@createOrder'); // Deterministic wait
|
||||
|
||||
cy.contains('Order Confirmed').should('be.visible');
|
||||
cy.contains(`Order #${product.id}`).should('be.visible');
|
||||
});
|
||||
});
|
||||
|
||||
// Additional optimization: Shared auth state (0 seconds per test)
|
||||
// playwright/support/global-setup.ts
import { chromium } from '@playwright/test';

export default async function globalSetup() {
|
||||
const browser = await chromium.launch();
|
||||
const page = await browser.newPage();
|
||||
|
||||
// Create admin user once for all tests
|
||||
const admin = createUser({ role: 'admin', emailVerified: true });
|
||||
await page.request.post('/api/users', { data: admin });
|
||||
|
||||
// Login once, save session
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', admin.email);
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.click('[data-testid="login"]');
|
||||
|
||||
// Save auth state for reuse
|
||||
await page.context().storageState({ path: 'playwright/.auth/admin.json' });
|
||||
|
||||
await browser.close();
|
||||
}
|
||||
|
||||
// Use shared auth in tests (instant)
|
||||
test.use({ storageState: 'playwright/.auth/admin.json' });
|
||||
|
||||
test('admin action', async ({ page }) => {
|
||||
// Already logged in - no auth overhead (0 seconds)
|
||||
await page.goto('/admin');
|
||||
// ... test logic
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Use API for data setup (10-50x faster than UI)
|
||||
- Run independent operations in parallel (`Promise.all`)
|
||||
- Replace hard waits with deterministic waits (`waitForResponse`)
|
||||
- Reuse auth sessions via `storageState` (Playwright) or `setCookie` (Cypress)
|
||||
- Skip unnecessary flows (email verification, multi-step signups)
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*atdd` (test generation quality), `*automate` (test expansion quality), `*test-review` (quality validation)
|
||||
- **Related fragments**:
|
||||
- `network-first.md` - Deterministic waiting strategies
|
||||
- `data-factories.md` - Isolated, parallel-safe data patterns
|
||||
- `fixture-architecture.md` - Setup extraction and cleanup
|
||||
- `test-levels-framework.md` - Choosing appropriate test granularity for speed
|
||||
|
||||
## Core Quality Checklist
|
||||
|
||||
Every test must pass these criteria:
|
||||
|
||||
- [ ] **No Hard Waits** - Use `waitForResponse`, `waitForLoadState`, or element state (not `waitForTimeout`)
|
||||
- [ ] **No Conditionals** - Tests execute the same path every time (no if/else, try/catch for flow control)
|
||||
- [ ] **< 300 Lines** - Keep tests focused; split large tests or extract setup to fixtures
|
||||
- [ ] **< 1.5 Minutes** - Optimize with API setup, parallel operations, and shared auth
|
||||
- [ ] **Self-Cleaning** - Use fixtures with auto-cleanup or explicit `afterEach()` teardown (see the sketch after this checklist)
|
||||
- [ ] **Explicit Assertions** - Keep `expect()` calls in test bodies, not hidden in helpers
|
||||
- [ ] **Unique Data** - Use `faker` for dynamic data; never hardcode IDs or emails
|
||||
- [ ] **Parallel-Safe** - Tests don't share state; run successfully with `--workers=4`
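
A minimal sketch of several of these criteria working together, assuming a hypothetical `/api/orders` endpoint and illustrative field names: the fixture seeds unique faker data, cleanup runs automatically in teardown, and the `expect()` calls stay in the test body.

```typescript
// playwright/support/fixtures/order-fixture.ts (illustrative sketch)
import { test as base, expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

export const test = base.extend<{ seededOrder: { id: string; customerEmail: string } }>({
  seededOrder: async ({ request }, use) => {
    // Unique data per test run - safe under --workers=4
    const order = { customerEmail: faker.internet.email(), amount: 49.99 };
    const response = await request.post('/api/orders', { data: order });
    const created = await response.json();

    await use({ id: created.id, customerEmail: order.customerEmail });

    // Self-cleaning: teardown runs even when the test fails
    await request.delete(`/api/orders/${created.id}`);
  },
});

// Explicit assertion in the test body, not hidden in a helper
test('order appears in history', async ({ page, seededOrder }) => {
  await page.goto('/orders');
  await expect(page.getByText(seededOrder.customerEmail)).toBeVisible();
});
```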
|
||||
|
||||
_Source: Murat quality checklist, Definition of Done requirements (lines 370-381, 406-422)._
|
||||
|
||||
372
src/modules/bmm/testarch/knowledge/timing-debugging.md
Normal file
@@ -0,0 +1,372 @@
|
||||
# Timing Debugging and Race Condition Fixes
|
||||
|
||||
## Principle
|
||||
|
||||
Race conditions arise when tests make assumptions about asynchronous timing (network, animations, state updates). **Deterministic waiting** eliminates flakiness by explicitly waiting for observable events (network responses, element state changes) instead of arbitrary timeouts.
|
||||
|
||||
## Rationale
|
||||
|
||||
**The Problem**: Tests pass locally but fail in CI (different timing), or pass/fail randomly (race conditions). Hard waits (`waitForTimeout`, `sleep`) mask timing issues without solving them.
|
||||
|
||||
**The Solution**: Replace all hard waits with event-based waits (`waitForResponse`, `waitFor({ state })`). Implement network-first pattern (intercept before navigate). Use explicit state checks (loading spinner detached, data loaded). This makes tests deterministic regardless of network speed or system load.
|
||||
|
||||
**Why This Matters**:
|
||||
|
||||
- Eliminates flaky tests (0 tolerance for timing-based failures)
|
||||
- Works consistently across environments (local, CI, production-like)
|
||||
- Faster test execution (no unnecessary waits)
|
||||
- Clearer test intent (explicit about what we're waiting for)
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Race Condition Identification (Network-First Pattern)
|
||||
|
||||
**Context**: Prevent race conditions by intercepting network requests before navigation
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/timing/race-condition-prevention.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Race Condition Prevention Patterns', () => {
|
||||
test('❌ Anti-Pattern: Navigate then intercept (race condition)', async ({ page, context }) => {
|
||||
// BAD: Navigation starts before interception ready
|
||||
await page.goto('/products'); // ⚠️ Race! API might load before route is set
|
||||
|
||||
await context.route('**/api/products', (route) => {
|
||||
route.fulfill({ status: 200, body: JSON.stringify({ products: [] }) });
|
||||
});
|
||||
|
||||
// Test may see real API response or mock (non-deterministic)
|
||||
});
|
||||
|
||||
test('✅ Pattern: Intercept BEFORE navigate (deterministic)', async ({ page, context }) => {
|
||||
// GOOD: Interception ready before navigation
|
||||
await context.route('**/api/products', (route) => {
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({
|
||||
products: [
|
||||
{ id: 1, name: 'Product A', price: 29.99 },
|
||||
{ id: 2, name: 'Product B', price: 49.99 },
|
||||
],
|
||||
}),
|
||||
});
|
||||
});
|
||||
|
||||
const responsePromise = page.waitForResponse('**/api/products');
|
||||
|
||||
await page.goto('/products'); // Navigation happens AFTER route is ready
|
||||
await responsePromise; // Explicit wait for network
|
||||
|
||||
// Test sees mock response reliably (deterministic)
|
||||
await expect(page.getByText('Product A')).toBeVisible();
|
||||
});
|
||||
|
||||
test('✅ Pattern: Wait for element state change (loading → loaded)', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Wait for loading indicator to appear (confirms load started)
|
||||
await page.getByTestId('loading-spinner').waitFor({ state: 'visible' });
|
||||
|
||||
// Wait for loading indicator to disappear (confirms load complete)
|
||||
await page.getByTestId('loading-spinner').waitFor({ state: 'detached' });
|
||||
|
||||
// Content now reliably visible
|
||||
await expect(page.getByTestId('dashboard-data')).toBeVisible();
|
||||
});
|
||||
|
||||
test('✅ Pattern: Explicit visibility check (not just presence)', async ({ page }) => {
|
||||
await page.goto('/modal-demo');
|
||||
|
||||
await page.getByRole('button', { name: 'Open Modal' }).click();
|
||||
|
||||
// ❌ Bad: Element exists but may not be visible yet
|
||||
// await expect(page.getByTestId('modal')).toBeAttached()
|
||||
|
||||
// ✅ Good: Wait for visibility (accounts for animations)
|
||||
await expect(page.getByTestId('modal')).toBeVisible();
|
||||
await expect(page.getByRole('heading', { name: 'Modal Title' })).toBeVisible();
|
||||
});
|
||||
|
||||
test('❌ Anti-Pattern: waitForLoadState("networkidle") in SPAs', async ({ page }) => {
|
||||
// ⚠️ Deprecated for SPAs (WebSocket connections never idle)
|
||||
// await page.goto('/dashboard')
|
||||
// await page.waitForLoadState('networkidle') // May timeout in SPAs
|
||||
|
||||
// ✅ Better: Wait for specific API response
|
||||
const responsePromise = page.waitForResponse('**/api/dashboard');
|
||||
await page.goto('/dashboard');
|
||||
await responsePromise;
|
||||
|
||||
await expect(page.getByText('Dashboard loaded')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Network-first: ALWAYS intercept before navigate (prevents race conditions)
|
||||
- State changes: Wait for loading spinner detached (explicit load completion)
|
||||
- Visibility vs presence: `toBeVisible()` accounts for animations, `toBeAttached()` doesn't
|
||||
- Avoid networkidle: Unreliable in SPAs (WebSocket, polling connections)
|
||||
- Explicit waits: Document exactly what we're waiting for
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Deterministic Waiting Patterns (Event-Based, Not Time-Based)
|
||||
|
||||
**Context**: Replace all hard waits with observable event waits
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/timing/deterministic-waits.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Deterministic Waiting Patterns', () => {
|
||||
test('waitForResponse() with URL pattern', async ({ page }) => {
|
||||
const responsePromise = page.waitForResponse('**/api/products');
|
||||
|
||||
await page.goto('/products');
|
||||
await responsePromise; // Deterministic (waits for exact API call)
|
||||
|
||||
await expect(page.getByText('Products loaded')).toBeVisible();
|
||||
});
|
||||
|
||||
test('waitForResponse() with predicate function', async ({ page }) => {
|
||||
const responsePromise = page.waitForResponse((resp) => resp.url().includes('/api/search') && resp.status() === 200);
|
||||
|
||||
await page.goto('/search');
|
||||
await page.getByPlaceholder('Search').fill('laptop');
|
||||
await page.getByRole('button', { name: 'Search' }).click();
|
||||
|
||||
await responsePromise; // Wait for successful search response
|
||||
|
||||
await expect(page.getByTestId('search-results')).toBeVisible();
|
||||
});
|
||||
|
||||
test('waitForFunction() for custom conditions', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Wait for custom JavaScript condition
|
||||
await page.waitForFunction(() => {
|
||||
const element = document.querySelector('[data-testid="user-count"]');
|
||||
return element && parseInt(element.textContent || '0') > 0;
|
||||
});
|
||||
|
||||
// User count now loaded
|
||||
await expect(page.getByTestId('user-count')).not.toHaveText('0');
|
||||
});
|
||||
|
||||
test('waitFor() element state (attached, visible, hidden, detached)', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// Wait for element to be attached to DOM
|
||||
await page.getByTestId('product-list').waitFor({ state: 'attached' });
|
||||
|
||||
// Wait for element to be visible (animations complete)
|
||||
await page.getByTestId('product-list').waitFor({ state: 'visible' });
|
||||
|
||||
// Perform action
|
||||
await page.getByText('Product A').click();
|
||||
|
||||
// Wait for modal to be hidden (close animation complete)
|
||||
await page.getByTestId('modal').waitFor({ state: 'hidden' });
|
||||
});
|
||||
|
||||
test('Cypress: cy.wait() with aliased intercepts', async () => {
|
||||
// Cypress example (not Playwright)
|
||||
/*
|
||||
cy.intercept('GET', '/api/products').as('getProducts')
|
||||
cy.visit('/products')
|
||||
cy.wait('@getProducts') // Deterministic wait for specific request
|
||||
|
||||
cy.get('[data-testid="product-list"]').should('be.visible')
|
||||
*/
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- `waitForResponse()`: Wait for specific API calls (URL pattern or predicate)
|
||||
- `waitForFunction()`: Wait for custom JavaScript conditions
|
||||
- `waitFor({ state })`: Wait for element state changes (attached, visible, hidden, detached)
|
||||
- Cypress `cy.wait('@alias')`: Deterministic wait for aliased intercepts
|
||||
- All waits are event-based (not time-based)
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Timing Anti-Patterns (What NEVER to Do)
|
||||
|
||||
**Context**: Common timing mistakes that cause flakiness
|
||||
|
||||
**Problem Examples**:
|
||||
|
||||
```typescript
|
||||
// tests/timing/anti-patterns.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('Timing Anti-Patterns to Avoid', () => {
|
||||
test('❌ NEVER: page.waitForTimeout() (arbitrary delay)', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// ❌ Bad: Arbitrary 3-second wait (flaky)
|
||||
// await page.waitForTimeout(3000)
|
||||
// Problem: Might be too short (CI slower) or too long (wastes time)
|
||||
|
||||
// ✅ Good: Wait for observable event
|
||||
await page.waitForResponse('**/api/dashboard');
|
||||
await expect(page.getByText('Dashboard loaded')).toBeVisible();
|
||||
});
|
||||
|
||||
test('❌ NEVER: cy.wait(number) without alias (arbitrary delay)', async () => {
|
||||
// Cypress example
|
||||
/*
|
||||
// ❌ Bad: Arbitrary delay
|
||||
cy.visit('/products')
|
||||
cy.wait(2000) // Flaky!
|
||||
|
||||
// ✅ Good: Wait for specific request
|
||||
cy.intercept('GET', '/api/products').as('getProducts')
|
||||
cy.visit('/products')
|
||||
cy.wait('@getProducts') // Deterministic
|
||||
*/
|
||||
});
|
||||
|
||||
test('❌ NEVER: Multiple hard waits in sequence (compounding delays)', async ({ page }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
// ❌ Bad: Stacked hard waits (6+ seconds wasted)
|
||||
// await page.waitForTimeout(2000) // Wait for form
|
||||
// await page.getByTestId('email').fill('test@example.com')
|
||||
// await page.waitForTimeout(1000) // Wait for validation
|
||||
// await page.getByTestId('submit').click()
|
||||
// await page.waitForTimeout(3000) // Wait for redirect
|
||||
|
||||
// ✅ Good: Event-based waits (no wasted time)
|
||||
await page.getByTestId('checkout-form').waitFor({ state: 'visible' });
|
||||
await page.getByTestId('email').fill('test@example.com');
|
||||
await page.waitForResponse('**/api/validate-email');
|
||||
await page.getByTestId('submit').click();
|
||||
await page.waitForURL('**/confirmation');
|
||||
});
|
||||
|
||||
test('❌ NEVER: waitForLoadState("networkidle") in SPAs', async ({ page }) => {
|
||||
// ❌ Bad: Unreliable in SPAs (WebSocket connections never idle)
|
||||
// await page.goto('/dashboard')
|
||||
// await page.waitForLoadState('networkidle') // Timeout in SPAs!
|
||||
|
||||
// ✅ Good: Wait for specific API responses
|
||||
await page.goto('/dashboard');
|
||||
await page.waitForResponse('**/api/dashboard');
|
||||
await page.waitForResponse('**/api/user');
|
||||
await expect(page.getByTestId('dashboard-content')).toBeVisible();
|
||||
});
|
||||
|
||||
test('❌ NEVER: Sleep/setTimeout in tests', async ({ page }) => {
|
||||
await page.goto('/products');
|
||||
|
||||
// ❌ Bad: Node.js sleep (blocks test thread)
|
||||
// await new Promise(resolve => setTimeout(resolve, 2000))
|
||||
|
||||
// ✅ Good: Playwright auto-waits for element
|
||||
await expect(page.getByText('Products loaded')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Why These Fail**:
|
||||
|
||||
- **Hard waits**: Arbitrary timeouts (too short → flaky, too long → slow)
|
||||
- **Stacked waits**: Compound delays (wasteful, unreliable)
|
||||
- **networkidle**: Broken in SPAs (WebSocket/polling never idle)
|
||||
- **Sleep**: Blocks execution (wastes time, doesn't solve race conditions)
|
||||
|
||||
**Better Approach**: Use event-based waits from examples above
|
||||
|
||||
---
|
||||
|
||||
## Async Debugging Techniques
|
||||
|
||||
### Technique 1: Promise Chain Analysis
|
||||
|
||||
```typescript
|
||||
test('debug async waterfall with console logs', async ({ page }) => {
|
||||
console.log('1. Starting navigation...');
|
||||
await page.goto('/products');
|
||||
|
||||
console.log('2. Waiting for API response...');
|
||||
const response = await page.waitForResponse('**/api/products');
|
||||
console.log('3. API responded:', response.status());
|
||||
|
||||
console.log('4. Waiting for UI update...');
|
||||
await expect(page.getByText('Products loaded')).toBeVisible();
|
||||
console.log('5. Test complete');
|
||||
|
||||
// Console output shows exactly where timing issue occurs
|
||||
});
|
||||
```
|
||||
|
||||
### Technique 2: Network Waterfall Inspection (DevTools)
|
||||
|
||||
```typescript
|
||||
test('inspect network timing with trace viewer', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Generate trace for analysis
|
||||
// npx playwright test --trace on
|
||||
// npx playwright show-trace trace.zip
|
||||
|
||||
// In trace viewer:
|
||||
// 1. Check Network tab for API call timing
|
||||
// 2. Identify slow requests (>1s response time)
|
||||
// 3. Find race conditions (overlapping requests)
|
||||
// 4. Verify request order (dependencies)
|
||||
});
|
||||
```
|
||||
|
||||
### Technique 3: Trace Viewer for Timing Visualization
|
||||
|
||||
```typescript
|
||||
test('use trace viewer to debug timing', async ({ page }) => {
|
||||
// Run with trace: npx playwright test --trace on
|
||||
|
||||
await page.goto('/checkout');
|
||||
await page.getByTestId('submit').click();
|
||||
|
||||
// In trace viewer, examine:
|
||||
// - Timeline: See exact timing of each action
|
||||
// - Snapshots: Hover to see DOM state at each moment
|
||||
// - Network: Identify slow/failed requests
|
||||
// - Console: Check for async errors
|
||||
|
||||
await expect(page.getByText('Success')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Race Condition Checklist
|
||||
|
||||
Before deploying tests:
|
||||
|
||||
- [ ] **Network-first pattern**: All routes intercepted BEFORE navigation (no race conditions)
|
||||
- [ ] **Explicit waits**: Every navigation followed by `waitForResponse()` or state check
|
||||
- [ ] **No hard waits**: Zero instances of `waitForTimeout()`, `cy.wait(number)`, `sleep()`
|
||||
- [ ] **Element state waits**: Loading spinners use `waitFor({ state: 'detached' })`
|
||||
- [ ] **Visibility checks**: Use `toBeVisible()` (accounts for animations), not just `toBeAttached()`
|
||||
- [ ] **Response validation**: Wait for successful responses (`resp.ok()` or `status === 200`)
|
||||
- [ ] **Trace viewer analysis**: Generate traces to identify timing issues (network waterfall, console errors)
|
||||
- [ ] **CI/local parity**: Tests pass reliably in both environments (no timing assumptions)
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*automate` (healing timing failures), `*test-review` (detect hard wait anti-patterns), `*framework` (configure timeout standards)
|
||||
- **Related fragments**: `test-healing-patterns.md` (race condition diagnosis), `network-first.md` (interception patterns), `playwright-config.md` (timeout configuration), `visual-debugging.md` (trace viewer analysis)
|
||||
- **Tools**: Playwright Inspector (`--debug`), Trace Viewer (`--trace on`), DevTools Network tab
|
||||
|
||||
_Source: Playwright timing best practices, network-first pattern from test-resources-for-ai, production race condition debugging_
|
||||
@@ -1,9 +1,524 @@
|
||||
# Visual Debugging and Developer Ergonomics
|
||||
|
||||
- Keep Playwright trace viewer, Cypress runner, and Storybook accessible in CI artifacts to speed up reproduction.
|
||||
- Record short screen captures only-on-failure; pair them with HAR or console logs to avoid guesswork.
|
||||
- Document common trace navigation steps (network tab, action timeline) so new contributors diagnose issues quickly.
|
||||
- Encourage live-debug sessions with component harnesses to validate behaviour before writing full E2E specs.
|
||||
- Integrate accessibility tooling (axe, Playwright audits) into the same debug workflow to catch regressions early.
|
||||
## Principle
|
||||
|
||||
_Source: Murat DX blog posts, Playwright book appendix on debugging._
|
||||
Fast feedback loops and transparent debugging artifacts are critical for maintaining test reliability and developer confidence. Visual debugging tools (trace viewers, screenshots, videos, HAR files) turn cryptic test failures into actionable insights, reducing triage time from hours to minutes.
|
||||
|
||||
## Rationale
|
||||
|
||||
**The Problem**: CI failures often provide minimal context—a timeout, a selector mismatch, or a network error—forcing developers to reproduce issues locally (if they can). This wastes time and discourages test maintenance.
|
||||
|
||||
**The Solution**: Capture rich debugging artifacts **only on failure** to balance storage costs with diagnostic value. Modern tools like Playwright Trace Viewer, Cypress Debug UI, and HAR recordings provide interactive, time-travel debugging that reveals exactly what the test saw at each step.
|
||||
|
||||
**Why This Matters**:
|
||||
|
||||
- Reduces failure triage time by 80-90% (visual context vs logs alone)
|
||||
- Enables debugging without local reproduction
|
||||
- Improves test maintenance confidence (clear failure root cause)
|
||||
- Catches timing/race conditions that are hard to reproduce locally
|
||||
|
||||
## Pattern Examples
|
||||
|
||||
### Example 1: Playwright Trace Viewer Configuration (Production Pattern)
|
||||
|
||||
**Context**: Capture traces on first retry only (balances storage and diagnostics)
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright.config.ts
|
||||
import { defineConfig } from '@playwright/test';
|
||||
|
||||
export default defineConfig({
|
||||
use: {
|
||||
// Visual debugging artifacts (space-efficient)
|
||||
trace: 'on-first-retry', // Only when test fails once
|
||||
screenshot: 'only-on-failure', // Not on success
|
||||
video: 'retain-on-failure', // Delete on pass
|
||||
|
||||
// Context for debugging
|
||||
baseURL: process.env.BASE_URL || 'http://localhost:3000',
|
||||
|
||||
// Timeout context
|
||||
actionTimeout: 15_000, // 15s for clicks/fills
|
||||
navigationTimeout: 30_000, // 30s for page loads
|
||||
},
|
||||
|
||||
// CI-specific artifact retention
|
||||
reporter: [
|
||||
['html', { outputFolder: 'playwright-report', open: 'never' }],
|
||||
['junit', { outputFile: 'results.xml' }],
|
||||
['list'], // Console output
|
||||
],
|
||||
|
||||
// Failure handling
|
||||
retries: process.env.CI ? 2 : 0, // Retry in CI to capture trace
|
||||
workers: process.env.CI ? 1 : undefined,
|
||||
});
|
||||
```
|
||||
|
||||
**Opening and Using Trace Viewer**:
|
||||
|
||||
```bash
|
||||
# After test failure in CI, download trace artifact
|
||||
# Then open locally:
|
||||
npx playwright show-trace path/to/trace.zip
|
||||
|
||||
# Or serve trace viewer:
|
||||
npx playwright show-report
|
||||
```
|
||||
|
||||
**Key Features to Use in Trace Viewer**:
|
||||
|
||||
1. **Timeline**: See each action (click, navigate, assertion) with timing
|
||||
2. **Snapshots**: Hover over timeline to see DOM state at that moment
|
||||
3. **Network Tab**: Inspect all API calls, headers, payloads, timing
|
||||
4. **Console Tab**: View console.log/error messages
|
||||
5. **Source Tab**: See test code with execution markers
|
||||
6. **Metadata**: Browser, OS, test duration, screenshots
|
||||
|
||||
**Why This Works**:
|
||||
|
||||
- `on-first-retry` avoids capturing traces for flaky passes (saves storage)
|
||||
- Screenshots + video give visual context without trace overhead
|
||||
- Interactive timeline makes timing issues obvious (race conditions, slow API)
|
||||
|
||||
---
|
||||
|
||||
### Example 2: HAR File Recording for Network Debugging
|
||||
|
||||
**Context**: Capture all network activity for reproducible API debugging
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/checkout-with-har.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
import path from 'path';
|
||||
|
||||
test.describe('Checkout Flow with HAR Recording', () => {
|
||||
test('should complete payment with full network capture', async ({ page, context }) => {
|
||||
// Start HAR recording BEFORE navigation
|
||||
await context.routeFromHAR(path.join(__dirname, '../fixtures/checkout.har'), {
|
||||
url: '**/api/**', // Only capture API calls
|
||||
update: true, // Update HAR if file exists
|
||||
});
|
||||
|
||||
await page.goto('/checkout');
|
||||
|
||||
// Interact with page
|
||||
await page.getByTestId('payment-method').selectOption('credit-card');
|
||||
await page.getByTestId('card-number').fill('4242424242424242');
|
||||
await page.getByTestId('submit-payment').click();
|
||||
|
||||
// Wait for payment confirmation
|
||||
await expect(page.getByTestId('success-message')).toBeVisible();
|
||||
|
||||
// HAR file saved to fixtures/checkout.har
|
||||
// Contains all network requests/responses for replay
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Using HAR for Deterministic Mocking**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/checkout-replay-har.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
import path from 'path';
|
||||
|
||||
test('should replay checkout flow from HAR', async ({ page, context }) => {
|
||||
// Replay network from HAR (no real API calls)
|
||||
await context.routeFromHAR(path.join(__dirname, '../fixtures/checkout.har'), {
|
||||
url: '**/api/**',
|
||||
update: false, // Read-only mode
|
||||
});
|
||||
|
||||
await page.goto('/checkout');
|
||||
|
||||
// Same test, but network responses come from HAR file
|
||||
await page.getByTestId('payment-method').selectOption('credit-card');
|
||||
await page.getByTestId('card-number').fill('4242424242424242');
|
||||
await page.getByTestId('submit-payment').click();
|
||||
|
||||
await expect(page.getByTestId('success-message')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- **`update: true`** records new HAR or updates existing (for flaky API debugging)
|
||||
- **`update: false`** replays from HAR (deterministic, no real API)
|
||||
- Filter by URL pattern (`**/api/**`) to avoid capturing static assets
|
||||
- HAR files are human-readable JSON (easy to inspect/modify)
|
||||
|
||||
**When to Use HAR**:
|
||||
|
||||
- Debugging flaky tests caused by API timing/responses
|
||||
- Creating deterministic mocks for integration tests
|
||||
- Analyzing third-party API behavior (Stripe, Auth0)
|
||||
- Reproducing production issues locally (record HAR in staging)
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Custom Artifact Capture (Console Logs + Network on Failure)
|
||||
|
||||
**Context**: Capture additional debugging context automatically on test failure
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright/support/fixtures/debug-fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import fs from 'fs';
|
||||
import path from 'path';
|
||||
|
||||
type DebugFixture = {
|
||||
captureDebugArtifacts: () => Promise<void>;
|
||||
};
|
||||
|
||||
export const test = base.extend<DebugFixture>({
|
||||
captureDebugArtifacts: async ({ page }, use, testInfo) => {
|
||||
const consoleLogs: string[] = [];
|
||||
const networkRequests: Array<{ url: string; status: number; method: string }> = [];
|
||||
|
||||
// Capture console messages
|
||||
page.on('console', (msg) => {
|
||||
consoleLogs.push(`[${msg.type()}] ${msg.text()}`);
|
||||
});
|
||||
|
||||
// Capture network requests
|
||||
page.on('request', (request) => {
|
||||
networkRequests.push({
|
||||
url: request.url(),
|
||||
method: request.method(),
|
||||
status: 0, // Will be updated on response
|
||||
});
|
||||
});
|
||||
|
||||
page.on('response', (response) => {
|
||||
const req = networkRequests.find((r) => r.url === response.url());
|
||||
if (req) req.status = response.status();
|
||||
});
|
||||
|
||||
await use(async () => {
|
||||
// This function can be called manually in tests
|
||||
// But it also runs automatically on failure via afterEach
|
||||
});
|
||||
|
||||
// After test completes, save artifacts if failed
|
||||
if (testInfo.status !== testInfo.expectedStatus) {
|
||||
const artifactDir = path.join(testInfo.outputDir, 'debug-artifacts');
|
||||
fs.mkdirSync(artifactDir, { recursive: true });
|
||||
|
||||
// Save console logs
|
||||
fs.writeFileSync(path.join(artifactDir, 'console.log'), consoleLogs.join('\n'), 'utf-8');
|
||||
|
||||
// Save network summary
|
||||
fs.writeFileSync(path.join(artifactDir, 'network.json'), JSON.stringify(networkRequests, null, 2), 'utf-8');
|
||||
|
||||
console.log(`Debug artifacts saved to: ${artifactDir}`);
|
||||
}
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Usage in Tests**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/payment-with-debug.spec.ts
|
||||
import { test, expect } from '../support/fixtures/debug-fixture';
|
||||
|
||||
test('payment flow captures debug artifacts on failure', async ({ page, captureDebugArtifacts }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
// Test will automatically capture console + network on failure
|
||||
await page.getByTestId('submit-payment').click();
|
||||
await expect(page.getByTestId('success-message')).toBeVisible({ timeout: 5000 });
|
||||
|
||||
// If this fails, console.log and network.json saved automatically
|
||||
});
|
||||
```
|
||||
|
||||
**CI Integration (GitHub Actions)**:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/e2e.yml
|
||||
name: E2E Tests with Artifacts
|
||||
on: [push, pull_request]
|
||||
|
||||
jobs:
|
||||
test:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: '.nvmrc'
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run Playwright tests
|
||||
run: npm run test:e2e
|
||||
continue-on-error: true # Capture artifacts even on failure
|
||||
|
||||
- name: Upload test artifacts on failure
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: playwright-artifacts
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Fixtures automatically capture context without polluting test code
|
||||
- Only saves artifacts on failure (storage-efficient)
|
||||
- CI uploads artifacts for post-mortem analysis
|
||||
- `continue-on-error: true` ensures artifact upload even when tests fail
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Accessibility Debugging Integration (axe-core in Trace Viewer)
|
||||
|
||||
**Context**: Catch accessibility regressions during visual debugging
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// playwright/support/fixtures/a11y-fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import AxeBuilder from '@axe-core/playwright';
|
||||
|
||||
type A11yFixture = {
|
||||
checkA11y: () => Promise<void>;
|
||||
};
|
||||
|
||||
export const test = base.extend<A11yFixture>({
|
||||
checkA11y: async ({ page }, use) => {
|
||||
await use(async () => {
|
||||
// Run axe accessibility scan
|
||||
const results = await new AxeBuilder({ page }).analyze();
|
||||
|
||||
// Attach results to test report (visible in trace viewer)
|
||||
if (results.violations.length > 0) {
|
||||
console.log(`Found ${results.violations.length} accessibility violations:`);
|
||||
results.violations.forEach((violation) => {
|
||||
console.log(`- [${violation.impact}] ${violation.id}: ${violation.description}`);
|
||||
console.log(` Help: ${violation.helpUrl}`);
|
||||
});
|
||||
|
||||
throw new Error(`Accessibility violations found: ${results.violations.length}`);
|
||||
}
|
||||
});
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Usage with Visual Debugging**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/checkout-a11y.spec.ts
|
||||
import { test, expect } from '../support/fixtures/a11y-fixture';
|
||||
|
||||
test('checkout page is accessible', async ({ page, checkA11y }) => {
|
||||
await page.goto('/checkout');
|
||||
|
||||
// Verify page loaded
|
||||
await expect(page.getByRole('heading', { name: 'Checkout' })).toBeVisible();
|
||||
|
||||
// Run accessibility check
|
||||
await checkA11y();
|
||||
|
||||
// If violations found, test fails and trace captures:
|
||||
// - Screenshot showing the problematic element
|
||||
// - Console log with violation details
|
||||
// - Network tab showing any failed resource loads
|
||||
});
|
||||
```
|
||||
|
||||
**Trace Viewer Benefits**:
|
||||
|
||||
- **Screenshot shows visual context** of accessibility issue (contrast, missing labels)
|
||||
- **Console tab shows axe-core violations** with impact level and helpUrl
|
||||
- **DOM snapshot** allows inspecting ARIA attributes at failure point
|
||||
- **Network tab** reveals if icon fonts or images failed (common a11y issue)
|
||||
|
||||
**Cypress Equivalent**:
|
||||
|
||||
```javascript
|
||||
// cypress/support/commands.ts
|
||||
import 'cypress-axe';
|
||||
|
||||
// Renamed so it does not collide with the checkA11y command cypress-axe already registers
Cypress.Commands.add('checkA11yAndLog', (context = null, options = {}) => {
|
||||
cy.injectAxe(); // Inject axe-core
|
||||
cy.checkA11y(context, options, (violations) => {
|
||||
if (violations.length) {
|
||||
cy.task('log', `Found ${violations.length} accessibility violations`);
|
||||
violations.forEach((violation) => {
|
||||
cy.task('log', `- [${violation.impact}] ${violation.id}: ${violation.description}`);
|
||||
});
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
// tests/e2e/checkout-a11y.cy.ts
|
||||
describe('Checkout Accessibility', () => {
|
||||
it('should have no a11y violations', () => {
|
||||
cy.visit('/checkout');
|
||||
cy.injectAxe();
|
||||
cy.checkA11y();
|
||||
// On failure, Cypress UI shows:
|
||||
// - Screenshot of page
|
||||
// - Console log with violation details
|
||||
// - Network tab with API calls
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- Accessibility checks integrate seamlessly with visual debugging
|
||||
- Violations are captured in trace viewer/Cypress UI automatically
|
||||
- Provides actionable links (helpUrl) to fix issues
|
||||
- Screenshots show visual context (contrast, layout)
|
||||
|
||||
---
|
||||
|
||||
### Example 5: Time-Travel Debugging Workflow (Playwright Inspector)
|
||||
|
||||
**Context**: Debug tests interactively with step-through execution
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```typescript
|
||||
// tests/e2e/checkout-debug.spec.ts
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test('debug checkout flow step-by-step', async ({ page }) => {
|
||||
// Set breakpoint by uncommenting this:
|
||||
// await page.pause()
|
||||
|
||||
await page.goto('/checkout');
|
||||
|
||||
// Use Playwright Inspector to:
|
||||
// 1. Step through each action
|
||||
// 2. Inspect DOM at each step
|
||||
// 3. View network calls per action
|
||||
// 4. Take screenshots manually
|
||||
|
||||
await page.getByTestId('payment-method').selectOption('credit-card');
|
||||
|
||||
// Pause here to inspect form state
|
||||
// await page.pause()
|
||||
|
||||
await page.getByTestId('card-number').fill('4242424242424242');
|
||||
await page.getByTestId('submit-payment').click();
|
||||
|
||||
await expect(page.getByTestId('success-message')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
**Running with Inspector**:
|
||||
|
||||
```bash
|
||||
# Open Playwright Inspector (GUI debugger)
|
||||
npx playwright test --debug
|
||||
|
||||
# Or use headed mode with slowMo
|
||||
npx playwright test --headed --slow-mo=1000
|
||||
|
||||
# Debug specific test
|
||||
npx playwright test checkout-debug.spec.ts --debug
|
||||
|
||||
# Set environment variable for persistent debugging
|
||||
PWDEBUG=1 npx playwright test
|
||||
```
|
||||
|
||||
**Inspector Features**:
|
||||
|
||||
1. **Step-through execution**: Click "Next" to execute one action at a time
|
||||
2. **DOM inspector**: Hover over elements to see selectors
|
||||
3. **Network panel**: See API calls with timing
|
||||
4. **Console panel**: View console.log output
|
||||
5. **Pick locator**: Click element in browser to get selector
|
||||
6. **Record mode**: Record interactions to generate test code
|
||||
|
||||
**Common Debugging Patterns**:
|
||||
|
||||
```typescript
|
||||
// Pattern 1: Debug selector issues
|
||||
test('debug selector', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
await page.pause(); // Inspector opens
|
||||
|
||||
// In Inspector console, test selectors:
|
||||
// page.getByTestId('user-menu') ✅
|
||||
// page.getByRole('button', { name: 'Profile' }) ✅
|
||||
// page.locator('.btn-primary') ❌ (fragile)
|
||||
});
|
||||
|
||||
// Pattern 2: Debug timing issues
|
||||
test('debug network timing', async ({ page }) => {
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Set up network listener BEFORE interaction
|
||||
const responsePromise = page.waitForResponse('**/api/users');
|
||||
await page.getByTestId('load-users').click();
|
||||
|
||||
await page.pause(); // Check network panel for timing
|
||||
|
||||
const response = await responsePromise;
|
||||
expect(response.status()).toBe(200);
|
||||
});
|
||||
|
||||
// Pattern 3: Debug state changes
|
||||
test('debug state mutation', async ({ page }) => {
|
||||
await page.goto('/cart');
|
||||
|
||||
// Check initial state
|
||||
await expect(page.getByTestId('cart-count')).toHaveText('0');
|
||||
|
||||
await page.pause(); // Inspect DOM
|
||||
|
||||
await page.getByTestId('add-to-cart').click();
|
||||
|
||||
await page.pause(); // Inspect DOM again (compare state)
|
||||
|
||||
await expect(page.getByTestId('cart-count')).toHaveText('1');
|
||||
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
|
||||
- `page.pause()` opens Inspector at that exact moment
|
||||
- Inspector shows DOM state, network activity, console at pause point
|
||||
- "Pick locator" feature helps find robust selectors
|
||||
- Record mode generates test code from manual interactions
|
||||
|
||||
---
|
||||
|
||||
## Visual Debugging Checklist
|
||||
|
||||
Before deploying tests to CI, ensure:
|
||||
|
||||
- [ ] **Artifact configuration**: `trace: 'on-first-retry'`, `screenshot: 'only-on-failure'`, `video: 'retain-on-failure'`
|
||||
- [ ] **CI artifact upload**: GitHub Actions/GitLab CI configured to upload `test-results/` and `playwright-report/`
|
||||
- [ ] **HAR recording**: Set up for flaky API tests (record once, replay deterministically)
|
||||
- [ ] **Custom debug fixtures**: Console logs + network summary captured on failure
|
||||
- [ ] **Accessibility integration**: axe-core violations visible in trace viewer
|
||||
- [ ] **Trace viewer docs**: README explains how to open traces locally (`npx playwright show-trace`)
|
||||
- [ ] **Inspector workflow**: Document `--debug` flag for interactive debugging
|
||||
- [ ] **Storage optimization**: Artifacts deleted after 30 days (CI retention policy)
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **Used in workflows**: `*framework` (initial setup), `*ci` (artifact upload), `*test-review` (validate artifact config)
|
||||
- **Related fragments**: `playwright-config.md` (artifact configuration), `ci-burn-in.md` (CI artifact upload), `test-quality.md` (debugging best practices)
|
||||
- **Tools**: Playwright Trace Viewer, Cypress Debug UI, axe-core, HAR files
|
||||
|
||||
_Source: Playwright official docs, Murat testing philosophy (visual debugging manifesto), SEON production debugging patterns_
|
||||
|
||||
@@ -17,3 +17,6 @@ test-quality,Test Quality Definition of Done,"Execution limits, isolation rules,
|
||||
nfr-criteria,NFR Review Criteria,"Security, performance, reliability, maintainability status definitions","nfr,assessment,quality",knowledge/nfr-criteria.md
|
||||
test-levels,Test Levels Framework,"Guidelines for choosing unit, integration, or end-to-end coverage","testing,levels,selection",knowledge/test-levels-framework.md
|
||||
test-priorities,Test Priorities Matrix,"P0–P3 criteria, coverage targets, execution ordering","testing,prioritization,risk",knowledge/test-priorities-matrix.md
|
||||
test-healing-patterns,Test Healing Patterns,"Common failure patterns and automated fixes","healing,debugging,patterns",knowledge/test-healing-patterns.md
|
||||
selector-resilience,Selector Resilience,"Robust selector strategies and debugging techniques","selectors,locators,debugging",knowledge/selector-resilience.md
|
||||
timing-debugging,Timing Debugging,"Race condition identification and deterministic wait fixes","timing,async,debugging",knowledge/timing-debugging.md
|
||||
|
||||
|
@@ -9,13 +9,18 @@ This directory houses the per-command workflows used by the Test Architect agent
|
||||
- `automate` – expands regression coverage after implementation.
|
||||
- `ci` – bootstraps CI/CD pipelines aligned with TEA practices.
|
||||
- `test-design` – combines risk assessment and coverage planning.
|
||||
- `trace` – maps requirements to implemented automated tests.
|
||||
- `trace` – maps requirements to tests (Phase 1) and makes quality gate decisions (Phase 2).
|
||||
- `nfr-assess` – evaluates non-functional requirements.
|
||||
- `gate` – records the release decision in the gate file.
|
||||
- `test-review` – reviews test quality using knowledge base patterns and generates quality score.
|
||||
|
||||
**Note**: The `gate` workflow has been merged into `trace` as Phase 2. The `*trace` command now performs both requirements-to-tests traceability mapping AND quality gate decision (PASS/CONCERNS/FAIL/WAIVED) in a single atomic operation.
|
||||
|
||||
Each subdirectory contains:
|
||||
|
||||
- `instructions.md` – the slim workflow instructions.
|
||||
- `workflow.yaml` – metadata consumed by the BMAD workflow runner.
|
||||
- `README.md` – comprehensive workflow documentation with usage, inputs, outputs, and integration notes.
|
||||
- `instructions.md` – detailed workflow steps in pure markdown v4.0 format.
|
||||
- `workflow.yaml` – metadata, variables, and configuration for BMAD workflow runner.
|
||||
- `checklist.md` – validation checklist for quality assurance and completeness verification.
|
||||
- `template.md` – output template for workflow deliverables (where applicable).
|
||||
|
||||
The TEA agent now invokes these workflows via `run-workflow` rather than executing instruction files directly.
|
||||
|
||||
672
src/modules/bmm/workflows/testarch/atdd/README.md
Normal file
@@ -0,0 +1,672 @@
|
||||
# ATDD (Acceptance Test-Driven Development) Workflow
|
||||
|
||||
Generates failing acceptance tests BEFORE implementation following TDD's red-green-refactor cycle. Creates comprehensive test coverage at appropriate levels (E2E, API, Component) with supporting infrastructure (fixtures, factories, mocks) and provides an implementation checklist to guide development toward passing tests.
|
||||
|
||||
**Core Principle**: Tests fail first (red phase), guide development to green, then enable confident refactoring.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *atdd
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- User story is approved with clear acceptance criteria
|
||||
- Development is about to begin (before any implementation code)
|
||||
- Team is practicing Test-Driven Development (TDD)
|
||||
- Need to establish test-first contract with DEV team
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Story markdown** (`{story_file}`): User story with acceptance criteria, functional requirements, and technical constraints
|
||||
- **Framework configuration**: Test framework config (playwright.config.ts or cypress.config.ts) from framework workflow
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `story_file`: Path to story markdown with acceptance criteria (required)
|
||||
- `test_dir`: Directory for test files (default: `{project-root}/tests`)
|
||||
- `test_framework`: Detected from framework workflow (playwright or cypress)
|
||||
- `test_levels`: Which test levels to generate (default: "e2e,api,component")
|
||||
- `primary_level`: Primary test level for acceptance criteria (default: "e2e")
|
||||
- `start_failing`: Tests must fail initially - red phase (default: true)
|
||||
- `use_given_when_then`: BDD-style test structure (default: true)
|
||||
- `network_first`: Route interception before navigation to prevent race conditions (default: true)
|
||||
- `one_assertion_per_test`: Atomic test design (default: true)
|
||||
- `generate_factories`: Create data factory stubs using faker (default: true)
|
||||
- `generate_fixtures`: Create fixture architecture with auto-cleanup (default: true)
|
||||
- `auto_cleanup`: Fixtures clean up their data automatically (default: true)
|
||||
- `include_data_testids`: List required data-testid attributes for DEV (default: true)
|
||||
- `include_mock_requirements`: Document mock/stub needs (default: true)
|
||||
- `auto_load_knowledge`: Load fixture-architecture, data-factories, component-tdd fragments (default: true)
|
||||
- `share_with_dev`: Provide implementation checklist to DEV agent (default: true)
|
||||
- `output_checklist`: Path for implementation checklist (default: `{output_folder}/atdd-checklist-{story_id}.md`)
|
||||
|
||||
**Optional Context:**
|
||||
|
||||
- **Test design document**: For risk/priority context alignment (P0-P3 scenarios)
|
||||
- **Existing fixtures/helpers**: For consistency with established patterns
|
||||
- **Architecture documents**: For understanding system boundaries and integration points
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
- **ATDD Checklist** (`atdd-checklist-{story_id}.md`): Implementation guide (outlined in the sketch after this list) containing:
|
||||
- Story summary and acceptance criteria breakdown
|
||||
- Test files created with paths and line counts
|
||||
- Data factories created with patterns
|
||||
- Fixtures created with auto-cleanup logic
|
||||
- Mock requirements for external services
|
||||
- Required data-testid attributes list
|
||||
- Implementation checklist mapping tests to code tasks
|
||||
- Red-green-refactor workflow guidance
|
||||
- Execution commands for running tests
|
||||
|
||||
**Test Files Created:**
|
||||
|
||||
- **E2E tests** (`tests/e2e/{feature-name}.spec.ts`): Full user journey tests for critical paths
|
||||
- **API tests** (`tests/api/{feature-name}.api.spec.ts`): Business logic and service contract tests
|
||||
- **Component tests** (`tests/component/{ComponentName}.test.tsx`): UI component behavior tests
|
||||
|
||||
**Supporting Infrastructure:**
|
||||
|
||||
- **Data factories** (`tests/support/factories/{entity}.factory.ts`): Factory functions using @faker-js/faker for generating test data with overrides support (sketched after this list)
|
||||
- **Test fixtures** (`tests/support/fixtures/{feature}.fixture.ts`): Playwright fixtures with setup/teardown and auto-cleanup
|
||||
- **Mock/stub documentation**: Requirements for external service mocking (payment gateways, email services, etc.)
|
||||
- **data-testid requirements**: List of required test IDs for stable selectors in UI implementation
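
A sketch of the factory shape described above, assuming `@faker-js/faker` and a hypothetical `User` entity (field names are illustrative):

```typescript
// tests/support/factories/user.factory.ts (illustrative sketch)
import { faker } from '@faker-js/faker';

export type User = {
  email: string;
  name: string;
  role: 'user' | 'admin';
};

// Overrides let each test pin only the fields it cares about
export const createUser = (overrides: Partial<User> = {}): User => ({
  email: faker.internet.email(),
  name: faker.person.fullName(),
  role: 'user',
  ...overrides,
});

// Usage: createUser({ role: 'admin' }) yields a unique admin user per call
```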
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- All tests must fail initially (red phase verified by local test run)
|
||||
- Failure messages are clear and actionable
|
||||
- Tests use Given-When-Then format for readability (see the sketch after this list)
|
||||
- Network-first pattern applied (route interception before navigation)
|
||||
- One assertion per test (atomic test design)
|
||||
- No hard waits or sleeps (explicit waits only)
|
||||
|
||||
## Key Features
|
||||
|
||||
### Red-Green-Refactor Cycle
|
||||
|
||||
**RED Phase** (TEA Agent responsibility):
|
||||
|
||||
- Write failing tests first defining expected behavior
|
||||
- Tests fail for right reason (missing implementation, not test bugs)
|
||||
- All supporting infrastructure (factories, fixtures, mocks) created
|
||||
|
||||
**GREEN Phase** (DEV Agent responsibility):
|
||||
|
||||
- Implement minimal code to pass one test at a time
|
||||
- Use implementation checklist as guide
|
||||
- Run tests frequently to verify progress (see the example commands below)
|
||||
|
||||
**REFACTOR Phase** (DEV Agent responsibility):
|
||||
|
||||
- Improve code quality with confidence (tests provide safety net)
|
||||
- Extract duplications, optimize performance
|
||||
- Ensure tests still pass after changes
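
In practice the cycle is driven from the command line; a typical sequence (assuming the Playwright setup from the framework workflow and the illustrative spec above) might be:

```bash
# RED: run the new specs and confirm they fail for the right reason
npx playwright test tests/e2e/password-reset.spec.ts

# GREEN: implement one behavior at a time, re-running the targeted test
npx playwright test tests/e2e/password-reset.spec.ts --grep "confirmation"

# REFACTOR: clean up with the full suite as the safety net
npx playwright test
```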
|
||||
|
||||
### Test Level Selection Framework
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys (login, checkout, core workflows)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- Characteristics: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Business logic validation
|
||||
- Service contracts and data transformations
|
||||
- Backend integration without UI
|
||||
- Characteristics: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing (click, hover, keyboard navigation)
|
||||
- Visual regression and state management
|
||||
- Characteristics: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Pure business logic and algorithms
|
||||
- Edge cases and error handling
|
||||
- Minimal dependencies
|
||||
- Characteristics: Fastest, most granular
|
||||
|
||||
**Selection Strategy**: Avoid duplicate coverage. Use E2E for critical happy path, API for business logic variations, component for UI edge cases, unit for pure logic.
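For example, a rule like "orders must contain at least one item" is a business-logic variation best covered at the API level rather than E2E. A minimal sketch using Playwright's `request` fixture (the `/api/orders` endpoint and payload fields are assumptions for illustration, not part of any specific story):

```typescript
// tests/api/orders.api.spec.ts — endpoint and fields are illustrative assumptions
import { test, expect } from '@playwright/test';
import { createUser } from '../support/factories/user.factory';

test('POST /api/orders should reject an order with no items', async ({ request }) => {
  // GIVEN: a user and an order payload with an empty item list
  const user = createUser();

  // WHEN: the order is submitted directly to the API (no UI involved)
  const response = await request.post('/api/orders', {
    data: { userId: user.id, items: [] },
  });

  // THEN: the business rule is enforced with a 400 response
  expect(response.status()).toBe(400);
});
```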
|
||||
|
||||
### Recording Mode (NEW - Phase 2.5)
|
||||
|
||||
The **atdd** workflow can record complex UI interactions instead of relying on AI generation.
|
||||
|
||||
**Activation**: Automatic for complex UI when `config.tea_use_mcp_enhancements` is true and Playwright MCP is available

**Fallback**: AI generation (silent, automatic)
|
||||
|
||||
**When to Use Recording Mode:**
|
||||
|
||||
- ✅ Complex UI interactions (drag-drop, multi-step forms, wizards)
|
||||
- ✅ Visual workflows (modals, dialogs, animations)
|
||||
- ✅ Unclear requirements (exploratory, discovering expected behavior)
|
||||
- ✅ Multi-page flows (checkout, registration, onboarding)
|
||||
- ❌ NOT for simple CRUD (AI generation faster)
|
||||
- ❌ NOT for API-only tests (no UI to record)
|
||||
|
||||
**When to Use AI Generation (Default):**
|
||||
|
||||
- ✅ Clear acceptance criteria available
|
||||
- ✅ Standard patterns (login, CRUD, navigation)
|
||||
- ✅ Need many tests quickly
|
||||
- ✅ API/backend tests (no UI interaction)
|
||||
|
||||
**How Test Generation Works (Default - AI-Based):**
|
||||
|
||||
TEA generates tests using AI by:
|
||||
|
||||
1. **Analyzing acceptance criteria** from story markdown
|
||||
2. **Inferring selectors** from requirement descriptions (e.g., "login button" → `[data-testid="login-button"]`)
|
||||
3. **Synthesizing test code** based on knowledge base patterns
|
||||
4. **Estimating interactions** using common UI patterns (click, type, verify)
|
||||
5. **Applying best practices** from knowledge fragments (Given-When-Then, network-first, fixtures)
|
||||
|
||||
**This works well for:**
|
||||
|
||||
- ✅ Clear requirements with known UI patterns
|
||||
- ✅ Standard workflows (login, CRUD, navigation)
|
||||
- ✅ When selectors follow conventions (data-testid attributes)
|
||||
|
||||
**What MCP Adds (Interactive Verification & Enhancement):**
|
||||
|
||||
When Playwright MCP is available, TEA **additionally**:
|
||||
|
||||
1. **Verifies generated tests** by:
|
||||
- **Launching real browser** with `generator_setup_page`
|
||||
- **Executing generated test steps** with `browser_*` tools (`navigate`, `click`, `type`)
|
||||
- **Seeing actual UI** with `browser_snapshot` (visual verification)
|
||||
- **Discovering real selectors** with `browser_generate_locator` (auto-generate from live DOM)
|
||||
|
||||
2. **Enhances AI-generated tests** by:
|
||||
- **Validating selectors exist** in actual DOM (not just guesses)
|
||||
- **Verifying behavior** with `browser_verify_text`, `browser_verify_visible`, `browser_verify_url`
|
||||
- **Capturing actual interaction log** with `generator_read_log`
|
||||
- **Refining test code** with real observed behavior
|
||||
|
||||
3. **Catches issues early** by:
|
||||
- **Finding missing selectors** before DEV implements (requirements clarification)
|
||||
- **Discovering edge cases** not in requirements (loading states, error messages)
|
||||
- **Validating assumptions** about UI structure and behavior
|
||||
|
||||
**Key Benefits of MCP Enhancement:**
|
||||
|
||||
- ✅ **AI generates tests** (fast, based on requirements) **+** **MCP verifies tests** (accurate, based on reality)
|
||||
- ✅ **Accurate selectors**: Validated against actual DOM, not just inferred
|
||||
- ✅ **Visual validation**: TEA sees what user sees (modals, animations, state changes)
|
||||
- ✅ **Complex flows**: Records multi-step interactions precisely
|
||||
- ✅ **Edge case discovery**: Observes actual app behavior beyond requirements
|
||||
- ✅ **Selector resilience**: MCP generates robust locators from live page (role-based, text-based, fallback chains)
|
||||
|
||||
**Example Enhancement Flow:**
|
||||
|
||||
```
|
||||
1. AI generates test based on acceptance criteria
|
||||
→ await page.click('[data-testid="submit-button"]')
|
||||
|
||||
2. MCP verifies selector exists (browser_generate_locator)
|
||||
→ Found: button[type="submit"].btn-primary
|
||||
→ No data-testid attribute exists!
|
||||
|
||||
3. TEA refines test with actual selector
|
||||
→ await page.locator('button[type="submit"]').click()
|
||||
→ Documents requirement: "Add data-testid='submit-button' to button"
|
||||
```
|
||||
|
||||
**Recording Workflow (MCP-Based):**
|
||||
|
||||
```
|
||||
1. Set generation_mode: "recording"
|
||||
2. Use generator_setup_page to init recording session
|
||||
3. For each acceptance criterion:
|
||||
a. Execute scenario with browser_* tools:
|
||||
- browser_navigate, browser_click, browser_type
|
||||
- browser_select, browser_check
|
||||
b. Add verifications with browser_verify_* tools:
|
||||
- browser_verify_text, browser_verify_visible
|
||||
- browser_verify_url
|
||||
c. Capture log with generator_read_log
|
||||
d. Generate test with generator_write_test
|
||||
4. Enhance generated tests with knowledge base patterns:
|
||||
- Add Given-When-Then comments
|
||||
- Replace selectors with data-testid
|
||||
- Add network-first interception
|
||||
- Add fixtures/factories
|
||||
5. Verify tests fail (RED phase)
|
||||
```
|
||||
|
||||
**Example: Recording a Checkout Flow**
|
||||
|
||||
```markdown
|
||||
Recording session for: "User completes checkout with credit card"
|
||||
|
||||
Actions recorded:
|
||||
|
||||
1. browser_navigate('/cart')
|
||||
2. browser_click('[data-testid="checkout-button"]')
|
||||
3. browser_type('[data-testid="card-number"]', '4242424242424242')
|
||||
4. browser_type('[data-testid="expiry"]', '12/25')
|
||||
5. browser_type('[data-testid="cvv"]', '123')
|
||||
6. browser_click('[data-testid="place-order"]')
|
||||
7. browser_verify_text('Order confirmed')
|
||||
8. browser_verify_url('/confirmation')
|
||||
|
||||
Generated test (enhanced):
|
||||
|
||||
- Given-When-Then structure added
|
||||
- data-testid selectors used
|
||||
- Network-first payment API mock added
|
||||
- Card factory created for test data
|
||||
- Test verified to FAIL (checkout not implemented)
|
||||
```
|
||||
|
||||
**Graceful Degradation:**
|
||||
|
||||
- Recording mode is OPTIONAL (default: AI generation)
|
||||
- Requires Playwright MCP (falls back to AI if unavailable)
|
||||
- Generated tests enhanced with knowledge base patterns
|
||||
- Same quality output regardless of generation method
|
||||
|
||||
### Given-When-Then Structure
|
||||
|
||||
All tests follow BDD format for clarity:
|
||||
|
||||
```typescript
|
||||
test('should display error for invalid credentials', async ({ page }) => {
|
||||
// GIVEN: User is on login page
|
||||
await page.goto('/login');
|
||||
|
||||
// WHEN: User submits invalid credentials
|
||||
await page.fill('[data-testid="email-input"]', 'invalid@example.com');
|
||||
await page.fill('[data-testid="password-input"]', 'wrongpassword');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// THEN: Error message is displayed
|
||||
await expect(page.locator('[data-testid="error-message"]')).toHaveText('Invalid email or password');
|
||||
});
|
||||
```
|
||||
|
||||
### Network-First Testing Pattern
|
||||
|
||||
**Critical pattern to prevent race conditions**:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Intercept BEFORE navigation
|
||||
await page.route('**/api/data', handler);
|
||||
await page.goto('/page');
|
||||
|
||||
// ❌ WRONG: Navigate then intercept (race condition)
|
||||
await page.goto('/page');
|
||||
await page.route('**/api/data', handler); // Too late!
|
||||
```
|
||||
|
||||
Always set up route interception before navigating to pages that make network requests.
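As a concrete sketch, stubbing a payment endpoint before navigation means the page's very first request already hits the mock (the endpoint, response shape, and test IDs here are assumptions, not part of any specific story):

```typescript
import { test, expect } from '@playwright/test';

test('should show confirmation when payment succeeds', async ({ page }) => {
  // GIVEN: the payment API is stubbed BEFORE navigation (network-first)
  await page.route('**/api/payments', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ status: 'succeeded', id: 'pay_123' }),
    }),
  );

  // WHEN: the user lands on checkout and places the order
  await page.goto('/checkout');
  await page.click('[data-testid="place-order"]');

  // THEN: the confirmation is displayed
  await expect(page.locator('[data-testid="order-confirmation"]')).toBeVisible();
});
```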
|
||||
|
||||
### Data Factory Architecture
|
||||
|
||||
Use faker for all test data generation:
|
||||
|
||||
```typescript
|
||||
// tests/support/factories/user.factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export const createUser = (overrides = {}) => ({
|
||||
id: faker.number.int(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
createdAt: faker.date.recent().toISOString(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
|
||||
```
|
||||
|
||||
**Factory principles:**
|
||||
|
||||
- Use faker for random data (no hardcoded values to prevent collisions)
|
||||
- Support overrides for specific test scenarios
|
||||
- Generate complete valid objects matching API contracts
|
||||
- Include helper functions for bulk creation
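Factories can also compose one another for related entities. A brief sketch building on the user factory above (the order fields are illustrative assumptions, not a defined API contract):

```typescript
// tests/support/factories/order.factory.ts — fields are illustrative assumptions
import { faker } from '@faker-js/faker';
import { createUser } from './user.factory';

export const createOrder = (overrides = {}) => ({
  id: faker.string.uuid(),
  user: createUser(), // nested factory for the owning user
  totalCents: faker.number.int({ min: 100, max: 50000 }),
  status: 'pending',
  ...overrides, // e.g. createOrder({ status: 'shipped' })
});
```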
|
||||
|
||||
### Fixture Architecture with Auto-Cleanup
|
||||
|
||||
Playwright fixtures with automatic data cleanup:
|
||||
|
||||
```typescript
// tests/support/fixtures/auth.fixture.ts
import { test as base } from '@playwright/test';
import { createUser } from '../factories/user.factory';
// deleteUser is assumed to be a project helper that removes the user via the API

export const test = base.extend({
  authenticatedUser: async ({ page }, use) => {
    // Setup: Create and authenticate user
    const user = await createUser();
    await page.goto('/login');
    await page.fill('[data-testid="email"]', user.email);
    await page.fill('[data-testid="password"]', 'password123');
    await page.click('[data-testid="login-button"]');
    await page.waitForURL('/dashboard');

    // Provide to test
    await use(user);

    // Cleanup: Delete user (automatic)
    await deleteUser(user.id);
  },
});
```
|
||||
|
||||
**Fixture principles:**
|
||||
|
||||
- Auto-cleanup (always delete created data in teardown)
|
||||
- Composable (fixtures can use other fixtures via mergeTests)
|
||||
- Isolated (each test gets fresh data)
|
||||
- Type-safe with TypeScript
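When a test needs more than one of these, Playwright's `mergeTests` combines independent fixture files without inheritance chains — a brief sketch assuming a second, hypothetical `data.fixture.ts` alongside the auth fixture shown above:

```typescript
// tests/support/fixtures/index.ts
import { mergeTests } from '@playwright/test';
import { test as authTest } from './auth.fixture';
import { test as dataTest } from './data.fixture'; // hypothetical second fixture file

// Tests that import this `test` receive fixtures from both sources
export const test = mergeTests(authTest, dataTest);
export { expect } from '@playwright/test';
```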
|
||||
|
||||
### One Assertion Per Test (Atomic Design)
|
||||
|
||||
Each test should verify exactly one behavior:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: One assertion
|
||||
test('should display user name', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Multiple assertions (not atomic)
|
||||
test('should display user info', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
|
||||
});
|
||||
```
|
||||
|
||||
**Why?** If the second assertion fails, you don't know whether the first is still valid. Split into separate tests for clear failure diagnosis.
|
||||
|
||||
### Implementation Checklist for DEV
|
||||
|
||||
Maps each failing test to concrete implementation tasks:
|
||||
|
||||
```markdown
|
||||
## Implementation Checklist
|
||||
|
||||
### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add `data-testid` attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Implement error handling
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
```
|
||||
|
||||
Provides clear path from red to green for each test.
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before this workflow:**
|
||||
|
||||
- **framework** workflow: Must run first to establish test framework architecture (Playwright or Cypress config, directory structure, base fixtures)
|
||||
- **test-design** workflow: Optional but recommended for P0-P3 priority alignment and risk assessment context
|
||||
|
||||
**After this workflow:**
|
||||
|
||||
- **DEV agent** implements features guided by failing tests and implementation checklist
|
||||
- **test-review** workflow: Review generated test quality before sharing with DEV team
|
||||
- **automate** workflow: After story completion, expand regression suite with additional edge case coverage
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **Story approval process**: ATDD runs after story is approved but before DEV begins implementation
|
||||
- **Quality gates**: Failing tests serve as acceptance criteria for story completion (all tests must pass)
|
||||
|
||||
## Important Notes
|
||||
|
||||
### ATDD is Test-First, Not Test-After
|
||||
|
||||
**Critical timing**: Tests must be written BEFORE any implementation code. This ensures:
|
||||
|
||||
- Tests define the contract (what needs to be built)
|
||||
- Implementation is guided by tests (no over-engineering)
|
||||
- Tests verify behavior, not implementation details
|
||||
- Confidence in refactoring (tests catch regressions)
|
||||
|
||||
### All Tests Must Fail Initially
|
||||
|
||||
**Red phase verification is mandatory**:
|
||||
|
||||
- Run tests locally after creation to confirm RED phase
|
||||
- Failure should be due to missing implementation, not test bugs
|
||||
- Failure messages should be clear and actionable
|
||||
- Document expected failure messages in ATDD checklist
|
||||
|
||||
If a test passes before implementation, it's not testing the right thing.
|
||||
|
||||
### Use data-testid for Stable Selectors
|
||||
|
||||
**Why data-testid?**
|
||||
|
||||
- CSS classes change frequently (styling refactors)
|
||||
- IDs may not be unique or stable
|
||||
- Text content changes with localization
|
||||
- data-testid is explicit contract between tests and UI
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Stable selector
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// ❌ FRAGILE: Class-based selector
|
||||
await page.click('.btn.btn-primary.login-btn');
|
||||
```
|
||||
|
||||
ATDD checklist includes complete list of required data-testid attributes for DEV team.
|
||||
|
||||
### No Hard Waits or Sleeps
|
||||
|
||||
**Use explicit waits only**:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Explicit wait for condition
|
||||
await page.waitForSelector('[data-testid="user-name"]');
|
||||
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();
|
||||
|
||||
// ❌ WRONG: Hard wait (flaky, slow)
|
||||
await page.waitForTimeout(2000);
|
||||
```
|
||||
|
||||
Playwright's auto-waiting is preferred (expect() automatically waits up to timeout).
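When a longer window is genuinely needed, extend the assertion's timeout rather than sleeping (the selector below is illustrative):

```typescript
// Still an explicit, condition-based wait — just with a larger budget
await expect(page.locator('[data-testid="report-ready"]')).toBeVisible({ timeout: 15_000 });
```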
|
||||
|
||||
### Component Tests for Complex UI Only
|
||||
|
||||
**When to use component tests:**
|
||||
|
||||
- Complex UI interactions (drag-drop, keyboard navigation)
|
||||
- Form validation logic
|
||||
- State management within component
|
||||
- Visual edge cases
|
||||
|
||||
**When NOT to use:**
|
||||
|
||||
- Simple rendering (snapshot tests are sufficient)
|
||||
- Integration with backend (use E2E or API tests)
|
||||
- Full user journeys (use E2E tests)
|
||||
|
||||
Component tests are valuable but should complement, not replace, E2E and API tests.
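For reference, a minimal component test with Playwright Component Testing might look like the sketch below (the `LoginForm` component, its props, and its import path are assumptions for illustration):

```tsx
// tests/component/LoginForm.test.tsx — component and props are hypothetical
import { test, expect } from '@playwright/experimental-ct-react';
import { LoginForm } from '../../src/components/LoginForm';

test('should disable submit until both fields are filled', async ({ mount }) => {
  // GIVEN: the form mounted in isolation
  const component = await mount(<LoginForm onSubmit={() => {}} />);

  // WHEN: only the email field is filled
  await component.getByTestId('email-input').fill('user@example.com');

  // THEN: the submit button stays disabled
  await expect(component.getByTestId('login-button')).toBeDisabled();
});
```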
|
||||
|
||||
### Auto-Cleanup is Non-Negotiable
|
||||
|
||||
**Every test must clean up its data**:
|
||||
|
||||
- Use fixtures with automatic teardown
|
||||
- Never leave test data in database/storage
|
||||
- Each test should be isolated (no shared state)
|
||||
|
||||
**Cleanup patterns:**
|
||||
|
||||
- Fixtures: Cleanup in teardown function
|
||||
- Factories: Provide deletion helpers
|
||||
- Tests: Use `test.afterEach()` for manual cleanup if needed
|
||||
|
||||
Without auto-cleanup, tests become flaky and depend on execution order.
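Where a dedicated fixture is overkill, the same guarantee can come from a manual `test.afterEach()` hook — a sketch assuming a REST deletion endpoint (the `/api/users/:id` route and test IDs are assumptions):

```typescript
import { test, expect } from '@playwright/test';
import { createUser } from '../support/factories/user.factory';

let createdUserId: number | undefined;

// Manual cleanup: remove whatever the test created (deletion endpoint is an assumption)
test.afterEach(async ({ request }) => {
  if (createdUserId) {
    await request.delete(`/api/users/${createdUserId}`);
    createdUserId = undefined;
  }
});

test('should create a user profile', async ({ page }) => {
  const user = createUser();
  createdUserId = user.id;

  await page.goto('/profile/new');
  await page.fill('[data-testid="name-input"]', user.name);
  await page.click('[data-testid="save-profile"]');

  await expect(page.locator('[data-testid="profile-saved"]')).toBeVisible();
});
```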
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's test.extend()
|
||||
- **data-factories.md** - Factory patterns using @faker-js/faker for random test data generation with overrides support
|
||||
- **component-tdd.md** - Component test strategies using Playwright Component Testing (@playwright/experimental-ct-react)
|
||||
- **network-first.md** - Route interception patterns (intercept before navigation to prevent race conditions)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, one assertion per test, determinism, isolation)
|
||||
- **test-levels-framework.md** - Test level selection framework (E2E vs API vs Component vs Unit)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping and additional references.
|
||||
|
||||
## Example Output
|
||||
|
||||
After running this workflow, the ATDD checklist will contain:
|
||||
|
||||
````markdown
|
||||
# ATDD Checklist - Epic 3, Story 5: User Authentication
|
||||
|
||||
## Story Summary
|
||||
|
||||
As a user, I want to log in with email and password so that I can access my personalized dashboard.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
1. User can log in with valid credentials
|
||||
2. User sees error message with invalid credentials
|
||||
3. User is redirected to dashboard after successful login
|
||||
|
||||
## Failing Tests Created (RED Phase)
|
||||
|
||||
### E2E Tests (3 tests)
|
||||
|
||||
- `tests/e2e/user-authentication.spec.ts` (87 lines)
|
||||
- ✅ should log in with valid credentials (RED - missing /login route)
|
||||
- ✅ should display error for invalid credentials (RED - error message not implemented)
|
||||
- ✅ should redirect to dashboard after login (RED - redirect logic missing)
|
||||
|
||||
### API Tests (2 tests)
|
||||
|
||||
- `tests/api/auth.api.spec.ts` (54 lines)
|
||||
- ✅ POST /api/auth/login - should return token for valid credentials (RED - endpoint not implemented)
|
||||
- ✅ POST /api/auth/login - should return 401 for invalid credentials (RED - validation missing)
|
||||
|
||||
## Data Factories Created
|
||||
|
||||
- `tests/support/factories/user.factory.ts` - createUser(), createUsers(count)
|
||||
|
||||
## Fixtures Created
|
||||
|
||||
- `tests/support/fixtures/auth.fixture.ts` - authenticatedUser fixture with auto-cleanup
|
||||
|
||||
## Required data-testid Attributes
|
||||
|
||||
### Login Page
|
||||
|
||||
- `email-input` - Email input field
|
||||
- `password-input` - Password input field
|
||||
- `login-button` - Submit button
|
||||
- `error-message` - Error message container
|
||||
|
||||
### Dashboard Page
|
||||
|
||||
- `user-name` - User name display
|
||||
- `logout-button` - Logout button
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add data-testid attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
### Test: Display Error for Invalid Credentials
|
||||
|
||||
- [ ] Add error state management
|
||||
- [ ] Display error message UI
|
||||
- [ ] Add `data-testid="error-message"`
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
### Test: Redirect to Dashboard After Login
|
||||
|
||||
- [ ] Implement redirect logic after successful auth
|
||||
- [ ] Verify authentication token stored
|
||||
- [ ] Add dashboard route protection
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests
|
||||
npm run test:e2e
|
||||
|
||||
# Run specific test file
|
||||
npm run test:e2e -- user-authentication.spec.ts
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
npm run test:e2e -- --headed
|
||||
|
||||
# Debug specific test
|
||||
npm run test:e2e -- user-authentication.spec.ts --debug
|
||||
```
|
||||
````
|
||||
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
**RED Phase** (Complete):
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created
|
||||
- ✅ data-testid requirements documented
|
||||
|
||||
**GREEN Phase** (DEV Team - Next Steps):
|
||||
|
||||
1. Pick one failing test from checklist
|
||||
2. Implement minimal code to make it pass
|
||||
3. Run test to verify green
|
||||
4. Check off task in checklist
|
||||
5. Move to next test
|
||||
6. Repeat until all tests pass
|
||||
|
||||
**REFACTOR Phase** (DEV Team - After All Tests Pass):
|
||||
|
||||
1. All tests passing (green)
|
||||
2. Improve code quality (extract functions, optimize)
|
||||
3. Remove duplications
|
||||
4. Ensure tests still pass after each refactor
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Review this checklist with team
|
||||
2. Run failing tests to confirm RED phase: `npm run test:e2e`
|
||||
3. Begin implementation using checklist as guide
|
||||
4. Share progress in daily standup
|
||||
5. When all tests pass, run `bmad sm story-approved` to move story to DONE
|
||||
|
||||
This comprehensive checklist guides DEV team from red to green with clear tasks and validation steps.
|
||||
@@ -0,0 +1,363 @@
|
||||
# ATDD Checklist - Epic {epic_num}, Story {story_num}: {story_title}
|
||||
|
||||
**Date:** {date}
|
||||
**Author:** {user_name}
|
||||
**Primary Test Level:** {primary_level}
|
||||
|
||||
---
|
||||
|
||||
## Story Summary
|
||||
|
||||
{Brief 2-3 sentence summary of the user story}
|
||||
|
||||
**As a** {user_role}
|
||||
**I want** {feature_description}
|
||||
**So that** {business_value}
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
{List all testable acceptance criteria from the story}
|
||||
|
||||
1. {Acceptance criterion 1}
|
||||
2. {Acceptance criterion 2}
|
||||
3. {Acceptance criterion 3}
|
||||
|
||||
---
|
||||
|
||||
## Failing Tests Created (RED Phase)
|
||||
|
||||
### E2E Tests ({e2e_test_count} tests)
|
||||
|
||||
**File:** `{e2e_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each E2E test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
### API Tests ({api_test_count} tests)
|
||||
|
||||
**File:** `{api_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each API test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
### Component Tests ({component_test_count} tests)
|
||||
|
||||
**File:** `{component_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each component test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
---
|
||||
|
||||
## Data Factories Created
|
||||
|
||||
{List all data factory files created with their exports}
|
||||
|
||||
### {Entity} Factory
|
||||
|
||||
**File:** `tests/support/factories/{entity}.factory.ts`
|
||||
|
||||
**Exports:**
|
||||
|
||||
- `create{Entity}(overrides?)` - Create single entity with optional overrides
|
||||
- `create{Entity}s(count)` - Create array of entities
|
||||
|
||||
**Example Usage:**
|
||||
|
||||
```typescript
|
||||
const user = createUser({ email: 'specific@example.com' });
|
||||
const users = createUsers(5); // Generate 5 random users
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Fixtures Created
|
||||
|
||||
{List all test fixture files created with their fixture names and descriptions}
|
||||
|
||||
### {Feature} Fixtures
|
||||
|
||||
**File:** `tests/support/fixtures/{feature}.fixture.ts`
|
||||
|
||||
**Fixtures:**
|
||||
|
||||
- `{fixtureName}` - {description_of_what_fixture_provides}
|
||||
- **Setup:** {what_setup_does}
|
||||
- **Provides:** {what_test_receives}
|
||||
- **Cleanup:** {what_cleanup_does}
|
||||
|
||||
**Example Usage:**
|
||||
|
||||
```typescript
|
||||
import { test } from './fixtures/{feature}.fixture';
|
||||
|
||||
test('should do something', async ({ {fixtureName} }) => {
|
||||
// {fixtureName} is ready to use with auto-cleanup
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Mock Requirements
|
||||
|
||||
{Document external services that need mocking and their requirements}
|
||||
|
||||
### {Service Name} Mock
|
||||
|
||||
**Endpoint:** `{HTTP_METHOD} {endpoint_url}`
|
||||
|
||||
**Success Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
{success_response_example}
|
||||
}
|
||||
```
|
||||
|
||||
**Failure Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
{failure_response_example}
|
||||
}
|
||||
```
|
||||
|
||||
**Notes:** {any_special_mock_requirements}
|
||||
|
||||
---
|
||||
|
||||
## Required data-testid Attributes
|
||||
|
||||
{List all data-testid attributes required in UI implementation for test stability}
|
||||
|
||||
### {Page or Component Name}
|
||||
|
||||
- `{data-testid-name}` - {description_of_element}
|
||||
- `{data-testid-name}` - {description_of_element}
|
||||
|
||||
**Implementation Example:**
|
||||
|
||||
```tsx
|
||||
<button data-testid="login-button">Log In</button>
|
||||
<input data-testid="email-input" type="email" />
|
||||
<div data-testid="error-message">{errorText}</div>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
{Map each failing test to concrete implementation tasks that will make it pass}
|
||||
|
||||
### Test: {test_name_1}
|
||||
|
||||
**File:** `{test_file_path}`
|
||||
|
||||
**Tasks to make this test pass:**
|
||||
|
||||
- [ ] {Implementation task 1}
|
||||
- [ ] {Implementation task 2}
|
||||
- [ ] {Implementation task 3}
|
||||
- [ ] Add required data-testid attributes: {list_of_testids}
|
||||
- [ ] Run test: `{test_execution_command}`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
**Estimated Effort:** {effort_estimate} hours
|
||||
|
||||
---
|
||||
|
||||
### Test: {test_name_2}
|
||||
|
||||
**File:** `{test_file_path}`
|
||||
|
||||
**Tasks to make this test pass:**
|
||||
|
||||
- [ ] {Implementation task 1}
|
||||
- [ ] {Implementation task 2}
|
||||
- [ ] {Implementation task 3}
|
||||
- [ ] Add required data-testid attributes: {list_of_testids}
|
||||
- [ ] Run test: `{test_execution_command}`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
**Estimated Effort:** {effort_estimate} hours
|
||||
|
||||
---
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests for this story
|
||||
{test_command_all}
|
||||
|
||||
# Run specific test file
|
||||
{test_command_specific_file}
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
{test_command_headed}
|
||||
|
||||
# Debug specific test
|
||||
{test_command_debug}
|
||||
|
||||
# Run tests with coverage
|
||||
{test_command_coverage}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
### RED Phase (Complete) ✅
|
||||
|
||||
**TEA Agent Responsibilities:**
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created with auto-cleanup
|
||||
- ✅ Mock requirements documented
|
||||
- ✅ data-testid requirements listed
|
||||
- ✅ Implementation checklist created
|
||||
|
||||
**Verification:**
|
||||
|
||||
- All tests run and fail as expected
|
||||
- Failure messages are clear and actionable
|
||||
- Tests fail due to missing implementation, not test bugs
|
||||
|
||||
---
|
||||
|
||||
### GREEN Phase (DEV Team - Next Steps)
|
||||
|
||||
**DEV Agent Responsibilities:**
|
||||
|
||||
1. **Pick one failing test** from implementation checklist (start with highest priority)
|
||||
2. **Read the test** to understand expected behavior
|
||||
3. **Implement minimal code** to make that specific test pass
|
||||
4. **Run the test** to verify it now passes (green)
|
||||
5. **Check off the task** in implementation checklist
|
||||
6. **Move to next test** and repeat
|
||||
|
||||
**Key Principles:**
|
||||
|
||||
- One test at a time (don't try to fix all at once)
|
||||
- Minimal implementation (don't over-engineer)
|
||||
- Run tests frequently (immediate feedback)
|
||||
- Use implementation checklist as roadmap
|
||||
|
||||
**Progress Tracking:**
|
||||
|
||||
- Check off tasks as you complete them
|
||||
- Share progress in daily standup
|
||||
- Mark story as IN PROGRESS in `bmm-workflow-status.md`
|
||||
|
||||
---
|
||||
|
||||
### REFACTOR Phase (DEV Team - After All Tests Pass)
|
||||
|
||||
**DEV Agent Responsibilities:**
|
||||
|
||||
1. **Verify all tests pass** (green phase complete)
|
||||
2. **Review code for quality** (readability, maintainability, performance)
|
||||
3. **Extract duplications** (DRY principle)
|
||||
4. **Optimize performance** (if needed)
|
||||
5. **Ensure tests still pass** after each refactor
|
||||
6. **Update documentation** (if API contracts change)
|
||||
|
||||
**Key Principles:**
|
||||
|
||||
- Tests provide safety net (refactor with confidence)
|
||||
- Make small refactors (easier to debug if tests fail)
|
||||
- Run tests after each change
|
||||
- Don't change test behavior (only implementation)
|
||||
|
||||
**Completion:**
|
||||
|
||||
- All tests pass
|
||||
- Code quality meets team standards
|
||||
- No duplications or code smells
|
||||
- Ready for code review and story approval
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Review this checklist** with team in standup or planning
|
||||
2. **Run failing tests** to confirm RED phase: `{test_command_all}`
|
||||
3. **Begin implementation** using implementation checklist as guide
|
||||
4. **Work one test at a time** (red → green for each)
|
||||
5. **Share progress** in daily standup
|
||||
6. **When all tests pass**, refactor code for quality
|
||||
7. **When refactoring complete**, run `bmad sm story-approved` to move story to DONE
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References Applied
|
||||
|
||||
This ATDD workflow consulted the following knowledge fragments:
|
||||
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's `test.extend()`
|
||||
- **data-factories.md** - Factory patterns using `@faker-js/faker` for random test data generation with overrides support
|
||||
- **component-tdd.md** - Component test strategies using Playwright Component Testing
|
||||
- **network-first.md** - Route interception patterns (intercept BEFORE navigation to prevent race conditions)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, one assertion per test, determinism, isolation)
|
||||
- **test-levels-framework.md** - Test level selection framework (E2E vs API vs Component vs Unit)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping.
|
||||
|
||||
---
|
||||
|
||||
## Test Execution Evidence
|
||||
|
||||
### Initial Test Run (RED Phase Verification)
|
||||
|
||||
**Command:** `{test_command_all}`
|
||||
|
||||
**Results:**
|
||||
|
||||
```
|
||||
{paste_test_run_output_showing_all_tests_failing}
|
||||
```
|
||||
|
||||
**Summary:**
|
||||
|
||||
- Total tests: {total_test_count}
|
||||
- Passing: 0 (expected)
|
||||
- Failing: {total_test_count} (expected)
|
||||
- Status: ✅ RED phase verified
|
||||
|
||||
**Expected Failure Messages:**
|
||||
{list_expected_failure_messages_for_each_test}
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
{Any additional notes, context, or special considerations for this story}
|
||||
|
||||
- {Note 1}
|
||||
- {Note 2}
|
||||
- {Note 3}
|
||||
|
||||
---
|
||||
|
||||
## Contact
|
||||
|
||||
**Questions or Issues?**
|
||||
|
||||
- Ask in team standup
|
||||
- Tag @{tea_agent_username} in Slack/Discord
|
||||
- Refer to `testarch/README.md` for workflow documentation
|
||||
- Consult `testarch/knowledge/` for testing best practices
|
||||
|
||||
---
|
||||
|
||||
**Generated by BMad TEA Agent** - {date}
|
||||
373
src/modules/bmm/workflows/testarch/atdd/checklist.md
Normal file
@@ -0,0 +1,373 @@
|
||||
# ATDD Workflow Validation Checklist
|
||||
|
||||
Use this checklist to validate that the ATDD workflow has been executed correctly and all deliverables meet quality standards.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting this workflow, verify:
|
||||
|
||||
- [ ] Story approved with clear acceptance criteria (AC must be testable)
|
||||
- [ ] Development sandbox/environment ready
|
||||
- [ ] Framework scaffolding exists (run `framework` workflow if missing)
|
||||
- [ ] Test framework configuration available (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Package.json has test dependencies installed (Playwright or Cypress)
|
||||
|
||||
**Halt if missing:** Framework scaffolding or story acceptance criteria
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Story Context and Requirements
|
||||
|
||||
- [ ] Story markdown file loaded and parsed successfully
|
||||
- [ ] All acceptance criteria identified and extracted
|
||||
- [ ] Affected systems and components identified
|
||||
- [ ] Technical constraints documented
|
||||
- [ ] Framework configuration loaded (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Test directory structure identified from config
|
||||
- [ ] Existing fixture patterns reviewed for consistency
|
||||
- [ ] Similar test patterns searched and found in `{test_dir}`
|
||||
- [ ] Knowledge base fragments loaded:
|
||||
- [ ] `fixture-architecture.md`
|
||||
- [ ] `data-factories.md`
|
||||
- [ ] `component-tdd.md`
|
||||
- [ ] `network-first.md`
|
||||
- [ ] `test-quality.md`
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Test Level Selection and Strategy
|
||||
|
||||
- [ ] Each acceptance criterion analyzed for appropriate test level
|
||||
- [ ] Test level selection framework applied (E2E vs API vs Component vs Unit)
|
||||
- [ ] E2E tests: Critical user journeys and multi-system integration identified
|
||||
- [ ] API tests: Business logic and service contracts identified
|
||||
- [ ] Component tests: UI component behavior and interactions identified
|
||||
- [ ] Unit tests: Pure logic and edge cases identified (if applicable)
|
||||
- [ ] Duplicate coverage avoided (same behavior not tested at multiple levels unnecessarily)
|
||||
- [ ] Tests prioritized using P0-P3 framework (if test-design document exists)
|
||||
- [ ] Primary test level set in `primary_level` variable (typically E2E or API)
|
||||
- [ ] Test levels documented in ATDD checklist
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Failing Tests Generated
|
||||
|
||||
### Test File Structure Created
|
||||
|
||||
- [ ] Test files organized in appropriate directories:
|
||||
- [ ] `tests/e2e/` for end-to-end tests
|
||||
- [ ] `tests/api/` for API tests
|
||||
- [ ] `tests/component/` for component tests
|
||||
- [ ] `tests/support/` for infrastructure (fixtures, factories, helpers)
|
||||
|
||||
### E2E Tests (If Applicable)
|
||||
|
||||
- [ ] E2E test files created in `tests/e2e/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] Tests use `data-testid` selectors (not CSS classes or fragile selectors)
|
||||
- [ ] One assertion per test (atomic test design)
|
||||
- [ ] No hard waits or sleeps (explicit waits only)
|
||||
- [ ] Network-first pattern applied (route interception BEFORE navigation)
|
||||
- [ ] Tests fail initially (RED phase verified by local test run)
|
||||
- [ ] Failure messages are clear and actionable
|
||||
|
||||
### API Tests (If Applicable)
|
||||
|
||||
- [ ] API test files created in `tests/api/`
|
||||
- [ ] Tests follow Given-When-Then format
|
||||
- [ ] API contracts validated (request/response structure)
|
||||
- [ ] HTTP status codes verified
|
||||
- [ ] Response body validation includes all required fields
|
||||
- [ ] Error cases tested (400, 401, 403, 404, 500)
|
||||
- [ ] Tests fail initially (RED phase verified)
|
||||
|
||||
### Component Tests (If Applicable)
|
||||
|
||||
- [ ] Component test files created in `tests/component/`
|
||||
- [ ] Tests follow Given-When-Then format
|
||||
- [ ] Component mounting works correctly
|
||||
- [ ] Interaction testing covers user actions (click, hover, keyboard)
|
||||
- [ ] State management within component validated
|
||||
- [ ] Props and events tested
|
||||
- [ ] Tests fail initially (RED phase verified)
|
||||
|
||||
### Test Quality Validation
|
||||
|
||||
- [ ] All tests use Given-When-Then structure with clear comments
|
||||
- [ ] All tests have descriptive names explaining what they test
|
||||
- [ ] No duplicate tests (same behavior tested multiple times)
|
||||
- [ ] No flaky patterns (race conditions, timing issues)
|
||||
- [ ] No test interdependencies (tests can run in any order)
|
||||
- [ ] Tests are deterministic (same input always produces same result)
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Data Infrastructure Built
|
||||
|
||||
### Data Factories Created
|
||||
|
||||
- [ ] Factory files created in `tests/support/factories/`
|
||||
- [ ] All factories use `@faker-js/faker` for random data generation (no hardcoded values)
|
||||
- [ ] Factories support overrides for specific test scenarios
|
||||
- [ ] Factories generate complete valid objects matching API contracts
|
||||
- [ ] Helper functions for bulk creation provided (e.g., `createUsers(count)`)
|
||||
- [ ] Factory exports are properly typed (TypeScript)
|
||||
|
||||
### Test Fixtures Created
|
||||
|
||||
- [ ] Fixture files created in `tests/support/fixtures/`
|
||||
- [ ] All fixtures use Playwright's `test.extend()` pattern
|
||||
- [ ] Fixtures have setup phase (arrange test preconditions)
|
||||
- [ ] Fixtures provide data to tests via `await use(data)`
|
||||
- [ ] Fixtures have teardown phase with auto-cleanup (delete created data)
|
||||
- [ ] Fixtures are composable (can use other fixtures if needed)
|
||||
- [ ] Fixtures are isolated (each test gets fresh data)
|
||||
- [ ] Fixtures are type-safe (TypeScript types defined)
|
||||
|
||||
### Mock Requirements Documented
|
||||
|
||||
- [ ] External service mocking requirements identified
|
||||
- [ ] Mock endpoints documented with URLs and methods
|
||||
- [ ] Success response examples provided
|
||||
- [ ] Failure response examples provided
|
||||
- [ ] Mock requirements documented in ATDD checklist for DEV team
|
||||
|
||||
### data-testid Requirements Listed
|
||||
|
||||
- [ ] All required data-testid attributes identified from E2E tests
|
||||
- [ ] data-testid list organized by page or component
|
||||
- [ ] Each data-testid has clear description of element it targets
|
||||
- [ ] data-testid list included in ATDD checklist for DEV team
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Implementation Checklist Created
|
||||
|
||||
- [ ] Implementation checklist created with clear structure
|
||||
- [ ] Each failing test mapped to concrete implementation tasks
|
||||
- [ ] Tasks include:
|
||||
- [ ] Route/component creation
|
||||
- [ ] Business logic implementation
|
||||
- [ ] API integration
|
||||
- [ ] data-testid attribute additions
|
||||
- [ ] Error handling
|
||||
- [ ] Test execution command
|
||||
- [ ] Completion checkbox
|
||||
- [ ] Red-Green-Refactor workflow documented in checklist
|
||||
- [ ] RED phase marked as complete (TEA responsibility)
|
||||
- [ ] GREEN phase tasks listed for DEV team
|
||||
- [ ] REFACTOR phase guidance provided
|
||||
- [ ] Execution commands provided:
|
||||
- [ ] Run all tests: `npm run test:e2e`
|
||||
- [ ] Run specific test file
|
||||
- [ ] Run in headed mode
|
||||
- [ ] Debug specific test
|
||||
- [ ] Estimated effort included (hours or story points)
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Deliverables Generated
|
||||
|
||||
### ATDD Checklist Document Created
|
||||
|
||||
- [ ] Output file created at `{output_folder}/atdd-checklist-{story_id}.md`
|
||||
- [ ] Document follows template structure from `atdd-checklist-template.md`
|
||||
- [ ] Document includes all required sections:
|
||||
- [ ] Story summary
|
||||
- [ ] Acceptance criteria breakdown
|
||||
- [ ] Failing tests created (paths and line counts)
|
||||
- [ ] Data factories created
|
||||
- [ ] Fixtures created
|
||||
- [ ] Mock requirements
|
||||
- [ ] Required data-testid attributes
|
||||
- [ ] Implementation checklist
|
||||
- [ ] Red-green-refactor workflow
|
||||
- [ ] Execution commands
|
||||
- [ ] Next steps for DEV team
|
||||
|
||||
### All Tests Verified to Fail (RED Phase)
|
||||
|
||||
- [ ] Full test suite run locally before finalizing
|
||||
- [ ] All tests fail as expected (RED phase confirmed)
|
||||
- [ ] No tests passing before implementation (if passing, test is invalid)
|
||||
- [ ] Failure messages documented in ATDD checklist
|
||||
- [ ] Failures are due to missing implementation, not test bugs
|
||||
- [ ] Test run output captured for reference
|
||||
|
||||
### Summary Provided
|
||||
|
||||
- [ ] Summary includes:
|
||||
- [ ] Story ID
|
||||
- [ ] Primary test level
|
||||
- [ ] Test counts (E2E, API, Component)
|
||||
- [ ] Test file paths
|
||||
- [ ] Factory count
|
||||
- [ ] Fixture count
|
||||
- [ ] Mock requirements count
|
||||
- [ ] data-testid count
|
||||
- [ ] Implementation task count
|
||||
- [ ] Estimated effort
|
||||
- [ ] Next steps for DEV team
|
||||
- [ ] Output file path
|
||||
- [ ] Knowledge base references applied
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Test Design Quality
|
||||
|
||||
- [ ] Tests are readable (clear Given-When-Then structure)
|
||||
- [ ] Tests are maintainable (use factories and fixtures, not hardcoded data)
|
||||
- [ ] Tests are isolated (no shared state between tests)
|
||||
- [ ] Tests are deterministic (no race conditions or flaky patterns)
|
||||
- [ ] Tests are atomic (one assertion per test)
|
||||
- [ ] Tests are fast (no unnecessary waits or delays)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] fixture-architecture.md patterns applied to all fixtures
|
||||
- [ ] data-factories.md patterns applied to all factories
|
||||
- [ ] network-first.md patterns applied to E2E tests with network requests
|
||||
- [ ] component-tdd.md patterns applied to component tests
|
||||
- [ ] test-quality.md principles applied to all test design
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] All TypeScript types are correct and complete
|
||||
- [ ] No linting errors in generated test files
|
||||
- [ ] Consistent naming conventions followed
|
||||
- [ ] Imports are organized and correct
|
||||
- [ ] Code follows project style guide
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With DEV Agent
|
||||
|
||||
- [ ] ATDD checklist provides clear implementation guidance
|
||||
- [ ] Implementation tasks are granular and actionable
|
||||
- [ ] data-testid requirements are complete and clear
|
||||
- [ ] Mock requirements include all necessary details
|
||||
- [ ] Execution commands work correctly
|
||||
|
||||
### With Story Workflow
|
||||
|
||||
- [ ] Story ID correctly referenced in output files
|
||||
- [ ] Acceptance criteria from story accurately reflected in tests
|
||||
- [ ] Technical constraints from story considered in test design
|
||||
|
||||
### With Framework Workflow
|
||||
|
||||
- [ ] Test framework configuration correctly detected and used
|
||||
- [ ] Directory structure matches framework setup
|
||||
- [ ] Fixtures and helpers follow established patterns
|
||||
- [ ] Naming conventions consistent with framework standards
|
||||
|
||||
### With test-design Workflow (If Available)
|
||||
|
||||
- [ ] P0 scenarios from test-design prioritized in ATDD
|
||||
- [ ] Risk assessment from test-design considered in test coverage
|
||||
- [ ] Coverage strategy from test-design aligned with ATDD tests
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
All of the following must be true before marking this workflow as complete:
|
||||
|
||||
- [ ] **Story acceptance criteria analyzed** and mapped to appropriate test levels
|
||||
- [ ] **Failing tests created** at all appropriate levels (E2E, API, Component)
|
||||
- [ ] **Given-When-Then format** used consistently across all tests
|
||||
- [ ] **RED phase verified** by local test run (all tests failing as expected)
|
||||
- [ ] **Network-first pattern** applied to E2E tests with network requests
|
||||
- [ ] **Data factories created** using faker (no hardcoded test data)
|
||||
- [ ] **Fixtures created** with auto-cleanup in teardown
|
||||
- [ ] **Mock requirements documented** for external services
|
||||
- [ ] **data-testid attributes listed** for DEV team
|
||||
- [ ] **Implementation checklist created** mapping tests to code tasks
|
||||
- [ ] **Red-green-refactor workflow documented** in ATDD checklist
|
||||
- [ ] **Execution commands provided** and verified to work
|
||||
- [ ] **ATDD checklist document created** and saved to correct location
|
||||
- [ ] **Output file formatted correctly** using template structure
|
||||
- [ ] **Knowledge base references applied** and documented in summary
|
||||
- [ ] **No test quality issues** (flaky patterns, race conditions, hardcoded data)
|
||||
|
||||
---
|
||||
|
||||
## Common Issues and Resolutions
|
||||
|
||||
### Issue: Tests pass before implementation
|
||||
|
||||
**Problem:** A test passes even though no implementation code exists yet.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Review test to ensure it's testing actual behavior, not mocked/stubbed behavior
|
||||
- Check if test is accidentally using existing functionality
|
||||
- Verify test assertions are correct and meaningful
|
||||
- Rewrite test to fail until implementation is complete
|
||||
|
||||
### Issue: Network-first pattern not applied
|
||||
|
||||
**Problem:** Route interception happens after navigation, causing race conditions.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Move `await page.route()` calls BEFORE `await page.goto()`
|
||||
- Review `network-first.md` knowledge fragment
|
||||
- Update all E2E tests to follow network-first pattern
|
||||
|
||||
### Issue: Hardcoded test data in tests
|
||||
|
||||
**Problem:** Tests use hardcoded strings/numbers instead of factories.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Replace all hardcoded data with factory function calls
|
||||
- Use `faker` for all random data generation
|
||||
- Update data-factories to support all required test scenarios
|
||||
|
||||
### Issue: Fixtures missing auto-cleanup
|
||||
|
||||
**Problem:** Fixtures create data but don't clean it up in teardown.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Add cleanup logic after `await use(data)` in fixture
|
||||
- Call deletion/cleanup functions in teardown
|
||||
- Verify cleanup works by checking database/storage after test run
|
||||
|
||||
### Issue: Tests have multiple assertions
|
||||
|
||||
**Problem:** Tests verify multiple behaviors in single test (not atomic).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Split into separate tests (one assertion per test)
|
||||
- Each test should verify exactly one behavior
|
||||
- Use descriptive test names to clarify what each test verifies
|
||||
|
||||
### Issue: Tests depend on execution order
|
||||
|
||||
**Problem:** Tests fail when run in isolation or different order.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove shared state between tests
|
||||
- Each test should create its own test data
|
||||
- Use fixtures for consistent setup across tests
|
||||
- Verify tests can run with `.only` flag
|
||||
|
||||
---
|
||||
|
||||
## Notes for TEA Agent
|
||||
|
||||
- **Preflight halt is critical:** Do not proceed if story has no acceptance criteria or framework is missing
|
||||
- **RED phase verification is mandatory:** Tests must fail before sharing with DEV team
|
||||
- **Network-first pattern:** Route interception BEFORE navigation prevents race conditions
|
||||
- **One assertion per test:** Atomic tests provide clear failure diagnosis
|
||||
- **Auto-cleanup is non-negotiable:** Every fixture must clean up data in teardown
|
||||
- **Use knowledge base:** Load relevant fragments (fixture-architecture, data-factories, network-first, component-tdd, test-quality) for guidance
|
||||
- **Share with DEV agent:** ATDD checklist provides implementation roadmap from red to green
|
||||
@@ -1,43 +1,785 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Acceptance TDD v3.0
|
||||
# Acceptance Test-Driven Development (ATDD)
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/atdd" name="Acceptance Test Driven Development">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Story is approved with clear acceptance criteria.</i>
|
||||
<i>- Development sandbox/environment is ready.</i>
|
||||
<i>- Framework scaffolding exists (run `*framework` if missing).</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Confirm each requirement above; halt if any are missing.</action>
|
||||
</step>
|
||||
<step n="2" title="Author Failing Acceptance Tests">
|
||||
<action>Clarify acceptance criteria and affected systems.</action>
|
||||
<action>Select appropriate test level (E2E/API/Component).</action>
|
||||
<action>Create failing tests using Given-When-Then with network interception before navigation.</action>
|
||||
<action>Build data factories and fixture stubs for required entities.</action>
|
||||
<action>Outline mocks/fixtures infrastructure the dev team must provide.</action>
|
||||
<action>Generate component tests for critical UI logic.</action>
|
||||
<action>Compile an implementation checklist mapping each test to code work.</action>
|
||||
<action>Share failing tests and checklist with the dev agent, maintaining red → green → refactor loop.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Output failing acceptance test files, component test stubs, fixture/mocks skeleton, implementation checklist, and data-testid requirements.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If acceptance criteria are ambiguous or the framework is missing, halt and request clarification/set up.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify ATDD-related fragments (fixture-architecture, data-factories, component-tdd) and load them from `knowledge/`.</i>
|
||||
<i>Start red; one assertion per test; keep setup visible (no hidden shared state).</i>
|
||||
<i>Remind devs to run tests before writing production code; update checklist as tests turn green.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>Failing acceptance/component test suite plus implementation checklist.</i>
|
||||
</output>
|
||||
</task>
|
||||
**Workflow ID**: `bmad/bmm/testarch/atdd`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Generates failing acceptance tests BEFORE implementation following TDD's red-green-refactor cycle. This workflow creates comprehensive test coverage at appropriate levels (E2E, API, Component) with supporting infrastructure (fixtures, factories, mocks) and provides an implementation checklist to guide development.
|
||||
|
||||
**Core Principle**: Tests fail first (red phase), then guide development to green, then enable confident refactoring.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Story approved with clear acceptance criteria
|
||||
- ✅ Development sandbox/environment ready
|
||||
- ✅ Framework scaffolding exists (run `framework` workflow if missing)
|
||||
- ✅ Test framework configuration available (playwright.config.ts or cypress.config.ts)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Load Story Context and Requirements
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Read Story Markdown**
|
||||
- Load story file from `{story_file}` variable
|
||||
- Extract acceptance criteria (all testable requirements)
|
||||
- Identify affected systems and components
|
||||
- Note any technical constraints or dependencies
|
||||
|
||||
2. **Load Framework Configuration**
|
||||
- Read framework config (playwright.config.ts or cypress.config.ts)
|
||||
- Identify test directory structure
|
||||
- Check existing fixture patterns
|
||||
- Note test runner capabilities
|
||||
|
||||
3. **Load Existing Test Patterns**
|
||||
- Search `{test_dir}` for similar tests
|
||||
- Identify reusable fixtures and helpers
|
||||
- Check data factory patterns
|
||||
- Note naming conventions
|
||||
|
||||
4. **Load Knowledge Base Fragments**
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to load:
|
||||
- `fixture-architecture.md` - Test fixture patterns with auto-cleanup (pure function → fixture → mergeTests composition, 406 lines, 5 examples)
|
||||
- `data-factories.md` - Factory patterns using faker (override patterns, nested factories, API seeding, 498 lines, 5 examples)
|
||||
- `component-tdd.md` - Component test strategies (red-green-refactor, provider isolation, accessibility, visual regression, 480 lines, 4 examples)
|
||||
- `network-first.md` - Route interception patterns (intercept before navigate, HAR capture, deterministic waiting, 489 lines, 5 examples)
|
||||
- `test-quality.md` - Test design principles (deterministic tests, isolated with cleanup, explicit assertions, length limits, execution time optimization, 658 lines, 5 examples)
|
||||
- `test-healing-patterns.md` - Common failure patterns and healing strategies (stale selectors, race conditions, dynamic data, network errors, hard waits, 648 lines, 5 examples)
|
||||
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, dynamic patterns, anti-patterns, 541 lines, 4 examples)
|
||||
- `timing-debugging.md` - Race condition prevention and async debugging (network-first, deterministic waiting, anti-patterns, 370 lines, 3 examples)
|
||||
|
||||
**Halt Condition:** If story has no acceptance criteria or framework is missing, HALT with message: "ATDD requires clear acceptance criteria and test framework setup"
|
||||
|
||||
---
|
||||
|
||||
## Step 1.5: Generation Mode Selection (NEW - Phase 2.5)
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Detect Generation Mode**
|
||||
|
||||
Determine mode based on scenario complexity:
|
||||
|
||||
**AI Generation Mode (DEFAULT)**:
|
||||
- Clear acceptance criteria with standard patterns
|
||||
- Uses: AI-generated tests from requirements
|
||||
- Appropriate for: CRUD, auth, navigation, API tests
|
||||
- Fastest approach
|
||||
|
||||
**Recording Mode (OPTIONAL - Complex UI)**:
|
||||
- Complex UI interactions (drag-drop, wizards, multi-page flows)
|
||||
- Uses: Interactive test recording with Playwright MCP
|
||||
- Appropriate for: Visual workflows, unclear requirements
|
||||
- Only if config.tea_use_mcp_enhancements is true AND MCP available
|
||||
|
||||
2. **AI Generation Mode (DEFAULT - Continue to Step 2)**
|
||||
|
||||
For standard scenarios:
|
||||
- Continue with existing workflow (Step 2: Select Test Levels and Strategy)
|
||||
- AI generates tests based on acceptance criteria from Step 1
|
||||
- Use knowledge base patterns for test structure
|
||||
|
||||
3. **Recording Mode (OPTIONAL - Complex UI Only)**
|
||||
|
||||
For complex UI scenarios AND config.tea_use_mcp_enhancements is true:
|
||||
|
||||
**A. Check MCP Availability**
|
||||
|
||||
If Playwright MCP tools are available in your IDE:
|
||||
- Use MCP recording mode (Step 3.B)
|
||||
|
||||
If MCP unavailable:
|
||||
- Fallback to AI generation mode (silent, automatic)
|
||||
- Continue to Step 2
|
||||
|
||||
**B. Interactive Test Recording (MCP-Based)**
|
||||
|
||||
Use Playwright MCP test-generator tools:
|
||||
|
||||
**Setup:**
|
||||
|
||||
```
|
||||
1. Use generator_setup_page to initialize recording session
|
||||
2. Navigate to application starting URL (from story context)
|
||||
3. Ready to record user interactions
|
||||
```
|
||||
|
||||
**Recording Process (Per Acceptance Criterion):**
|
||||
|
||||
```
|
||||
4. Read acceptance criterion from story
|
||||
5. Manually execute test scenario using browser_* tools:
|
||||
- browser_navigate: Navigate to pages
|
||||
- browser_click: Click buttons, links, elements
|
||||
- browser_type: Fill form fields
|
||||
- browser_select: Select dropdown options
|
||||
- browser_check: Check/uncheck checkboxes
|
||||
6. Add verification steps using browser_verify_* tools:
|
||||
- browser_verify_text: Verify text content
|
||||
- browser_verify_visible: Verify element visibility
|
||||
- browser_verify_url: Verify URL navigation
|
||||
7. Capture interaction log with generator_read_log
|
||||
8. Generate test file with generator_write_test
|
||||
9. Repeat for next acceptance criterion
|
||||
```
|
||||
|
||||
**Post-Recording Enhancement:**
|
||||
|
||||
```
|
||||
10. Review generated test code
|
||||
11. Enhance with knowledge base patterns:
|
||||
- Add Given-When-Then comments
|
||||
- Replace recorded selectors with data-testid (if needed)
|
||||
- Add network-first interception (from network-first.md)
|
||||
- Add fixtures for auth/data setup (from fixture-architecture.md)
|
||||
- Use factories for test data (from data-factories.md)
|
||||
12. Verify tests fail (missing implementation)
|
||||
13. Continue to Step 4 (Build Data Infrastructure)
|
||||
```
|
||||
|
||||
**When to Use Recording Mode:**
|
||||
- ✅ Complex UI interactions (drag-drop, multi-step forms, wizards)
|
||||
- ✅ Visual workflows (modals, dialogs, animations)
|
||||
- ✅ Unclear requirements (exploratory, discovering expected behavior)
|
||||
- ✅ Multi-page flows (checkout, registration, onboarding)
|
||||
- ❌ NOT for simple CRUD (AI generation faster)
|
||||
- ❌ NOT for API-only tests (no UI to record)
|
||||
|
||||
**When to Use AI Generation (Default):**
|
||||
- ✅ Clear acceptance criteria available
|
||||
- ✅ Standard patterns (login, CRUD, navigation)
|
||||
- ✅ Need many tests quickly
|
||||
- ✅ API/backend tests (no UI interaction)
|
||||
|
||||
4. **Proceed to Test Level Selection**
|
||||
|
||||
After mode selection:
|
||||
- AI Generation: Continue to Step 2 (Select Test Levels and Strategy)
|
||||
- Recording: Skip to Step 4 (Build Data Infrastructure) - tests already generated
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Select Test Levels and Strategy
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Analyze Acceptance Criteria**
|
||||
|
||||
For each acceptance criterion, determine:
|
||||
- Does it require full user journey? → E2E test
|
||||
- Does it test business logic/API contract? → API test
|
||||
- Does it validate UI component behavior? → Component test
|
||||
- Can it be unit tested? → Unit test
|
||||
|
||||
2. **Apply Test Level Selection Framework**
|
||||
|
||||
**Knowledge Base Reference**: `test-levels-framework.md`
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
- Critical user journeys (login, checkout, core workflow)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- **Characteristics**: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
- Business logic validation
|
||||
- Service contracts
|
||||
- Data transformations
|
||||
- **Characteristics**: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing
|
||||
- Visual regression
|
||||
- **Characteristics**: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
- Pure business logic
|
||||
- Edge cases
|
||||
- Error handling
|
||||
- **Characteristics**: Fastest, most granular
|
||||
|
||||
3. **Avoid Duplicate Coverage**
|
||||
|
||||
Don't test same behavior at multiple levels unless necessary:
|
||||
- Use E2E for critical happy path only
|
||||
- Use API tests for complex business logic variations
|
||||
- Use component tests for UI interaction edge cases
|
||||
- Use unit tests for pure logic edge cases
|
||||
|
||||
4. **Prioritize Tests**
|
||||
|
||||
If test-design document exists, align with priority levels:
|
||||
- P0 scenarios → Must cover in failing tests
|
||||
- P1 scenarios → Should cover if time permits
|
||||
- P2/P3 scenarios → Optional for this iteration
|
||||
|
||||
**Decision Point:** Set `primary_level` variable to main test level for this story (typically E2E or API)
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Generate Failing Tests
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Test File Structure**
|
||||
|
||||
```
|
||||
tests/
|
||||
├── e2e/
|
||||
│ └── {feature-name}.spec.ts # E2E acceptance tests
|
||||
├── api/
|
||||
│ └── {feature-name}.api.spec.ts # API contract tests
|
||||
├── component/
|
||||
│ └── {ComponentName}.test.tsx # Component tests
|
||||
└── support/
|
||||
├── fixtures/ # Test fixtures
|
||||
├── factories/ # Data factories
|
||||
└── helpers/ # Utility functions
|
||||
```
|
||||
|
||||
2. **Write Failing E2E Tests (If Applicable)**
|
||||
|
||||
**Use Given-When-Then format:**
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('User Login', () => {
|
||||
test('should display error for invalid credentials', async ({ page }) => {
|
||||
// GIVEN: User is on login page
|
||||
await page.goto('/login');
|
||||
|
||||
// WHEN: User submits invalid credentials
|
||||
await page.fill('[data-testid="email-input"]', 'invalid@example.com');
|
||||
await page.fill('[data-testid="password-input"]', 'wrongpassword');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// THEN: Error message is displayed
|
||||
await expect(page.locator('[data-testid="error-message"]')).toHaveText('Invalid email or password');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Critical patterns:**
|
||||
- One assertion per test (atomic tests)
|
||||
- Explicit waits (no hard waits/sleeps)
|
||||
- Network-first approach (route interception before navigation)
|
||||
- data-testid selectors for stability
|
||||
- Clear Given-When-Then structure
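
The explicit-waits rule above can be illustrated briefly; the `data-testid` value is hypothetical:

```typescript
// ❌ Hard wait: arbitrary sleep, flaky under load
await page.waitForTimeout(2000);

// ✅ Explicit wait: the assertion retries until the element appears (or times out)
await expect(page.locator('[data-testid="success-banner"]')).toBeVisible();
```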
|
||||
|
||||
3. **Apply Network-First Pattern**
|
||||
|
||||
**Knowledge Base Reference**: `network-first.md`
|
||||
|
||||
```typescript
|
||||
test('should load user dashboard after login', async ({ page }) => {
|
||||
// CRITICAL: Intercept routes BEFORE navigation
|
||||
await page.route('**/api/user', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
body: JSON.stringify({ id: 1, name: 'Test User' }),
|
||||
}),
|
||||
);
|
||||
|
||||
// NOW navigate
|
||||
await page.goto('/dashboard');
|
||||
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('Test User');
|
||||
});
|
||||
```
|
||||
|
||||
4. **Write Failing API Tests (If Applicable)**
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('User API', () => {
|
||||
test('POST /api/users - should create new user', async ({ request }) => {
|
||||
// GIVEN: Valid user data
|
||||
const userData = {
|
||||
email: 'newuser@example.com',
|
||||
name: 'New User',
|
||||
};
|
||||
|
||||
// WHEN: Creating user via API
|
||||
const response = await request.post('/api/users', {
|
||||
data: userData,
|
||||
});
|
||||
|
||||
// THEN: User is created successfully
|
||||
expect(response.status()).toBe(201);
|
||||
const body = await response.json();
|
||||
expect(body).toMatchObject({
|
||||
email: userData.email,
|
||||
name: userData.name,
|
||||
id: expect.any(Number),
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
5. **Write Failing Component Tests (If Applicable)**
|
||||
|
||||
**Knowledge Base Reference**: `component-tdd.md`
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/experimental-ct-react';
|
||||
import { LoginForm } from './LoginForm';
|
||||
|
||||
test.describe('LoginForm Component', () => {
|
||||
test('should disable submit button when fields are empty', async ({ mount }) => {
|
||||
// GIVEN: LoginForm is mounted
|
||||
const component = await mount(<LoginForm />);
|
||||
|
||||
// WHEN: Form is initially rendered
|
||||
const submitButton = component.locator('button[type="submit"]');
|
||||
|
||||
// THEN: Submit button is disabled
|
||||
await expect(submitButton).toBeDisabled();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
6. **Verify Tests Fail Initially**
|
||||
|
||||
**Critical verification:**
|
||||
- Run tests locally to confirm they fail
|
||||
- Failure should be due to missing implementation, not test errors
|
||||
- Failure messages should be clear and actionable
|
||||
- All tests must be in RED phase before sharing with DEV
|
||||
|
||||
**Important:** Tests MUST fail initially. If a test passes before implementation, it's not a valid acceptance test.
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Build Data Infrastructure
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Data Factories**
|
||||
|
||||
**Knowledge Base Reference**: `data-factories.md`
|
||||
|
||||
```typescript
|
||||
// tests/support/factories/user.factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export const createUser = (overrides = {}) => ({
|
||||
id: faker.number.int(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
createdAt: faker.date.recent().toISOString(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
|
||||
```
|
||||
|
||||
**Factory principles:**
|
||||
- Use faker for random data (no hardcoded values)
|
||||
- Support overrides for specific scenarios
|
||||
- Generate complete valid objects
|
||||
- Include helper functions for bulk creation
|
||||
|
||||
2. **Create Test Fixtures**
|
||||
|
||||
**Knowledge Base Reference**: `fixture-architecture.md`
|
||||
|
||||
```typescript
|
||||
// tests/support/fixtures/auth.fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
|
||||
export const test = base.extend({
|
||||
authenticatedUser: async ({ page }, use) => {
|
||||
// Setup: Create and authenticate user
|
||||
const user = await createUser();
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', user.email);
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await page.waitForURL('/dashboard');
|
||||
|
||||
// Provide to test
|
||||
await use(user);
|
||||
|
||||
// Cleanup: Delete user
|
||||
await deleteUser(user.id);
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Fixture principles:**
|
||||
- Auto-cleanup (always delete created data)
|
||||
- Composable (fixtures can use other fixtures)
|
||||
- Isolated (each test gets fresh data)
|
||||
- Type-safe
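
The "composable" principle can be sketched with Playwright's `mergeTests`; this assumes a second, hypothetical data fixture file exists alongside the auth fixture above:

```typescript
// tests/support/fixtures/index.ts (illustrative sketch)
import { mergeTests } from '@playwright/test';
import { test as authTest } from './auth.fixture';
import { test as dataTest } from './data.fixture'; // hypothetical second fixture

// Compose independent fixtures into one test object that tests import
export const test = mergeTests(authTest, dataTest);
export { expect } from '@playwright/test';
```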
|
||||
|
||||
3. **Document Mock Requirements**
|
||||
|
||||
If external services need mocking, document requirements:
|
||||
|
||||
```markdown
|
||||
### Mock Requirements for DEV Team
|
||||
|
||||
**Payment Gateway Mock**:
|
||||
|
||||
- Endpoint: `POST /api/payments`
|
||||
- Success response: `{ status: 'success', transactionId: '123' }`
|
||||
- Failure response: `{ status: 'failed', error: 'Insufficient funds' }`
|
||||
|
||||
**Email Service Mock**:
|
||||
|
||||
- Should not send real emails in test environment
|
||||
- Log email contents for verification
|
||||
```
|
||||
|
||||
4. **List Required data-testid Attributes**
|
||||
|
||||
```markdown
|
||||
### Required data-testid Attributes
|
||||
|
||||
**Login Page**:
|
||||
|
||||
- `email-input` - Email input field
|
||||
- `password-input` - Password input field
|
||||
- `login-button` - Submit button
|
||||
- `error-message` - Error message container
|
||||
|
||||
**Dashboard Page**:
|
||||
|
||||
- `user-name` - User name display
|
||||
- `logout-button` - Logout button
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Create Implementation Checklist
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Map Tests to Implementation Tasks**
|
||||
|
||||
For each failing test, create corresponding implementation task:
|
||||
|
||||
```markdown
|
||||
## Implementation Checklist
|
||||
|
||||
### Epic X - User Authentication
|
||||
|
||||
#### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add `data-testid` attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Implement error handling
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
#### Test: Display Error for Invalid Credentials
|
||||
|
||||
- [ ] Add error state management
|
||||
- [ ] Display error message UI
|
||||
- [ ] Add `data-testid="error-message"`
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
```
|
||||
|
||||
2. **Include Red-Green-Refactor Guidance**
|
||||
|
||||
```markdown
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
**RED Phase** (Complete):
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created
|
||||
- ✅ Mock requirements documented
|
||||
|
||||
**GREEN Phase** (DEV Team):
|
||||
|
||||
1. Pick one failing test
|
||||
2. Implement minimal code to make it pass
|
||||
3. Run test to verify green
|
||||
4. Move to next test
|
||||
5. Repeat until all tests pass
|
||||
|
||||
**REFACTOR Phase** (DEV Team):
|
||||
|
||||
1. All tests passing (green)
|
||||
2. Improve code quality
|
||||
3. Extract duplications
|
||||
4. Optimize performance
|
||||
5. Ensure tests still pass
|
||||
```
|
||||
|
||||
3. **Add Execution Commands**
|
||||
|
||||
````markdown
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests
|
||||
npm run test:e2e
|
||||
|
||||
# Run specific test file
|
||||
npm run test:e2e -- login.spec.ts
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
npm run test:e2e -- --headed
|
||||
|
||||
# Debug specific test
|
||||
npm run test:e2e -- login.spec.ts --debug
|
||||
```
|
||||
````
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Generate Deliverables
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create ATDD Checklist Document**
|
||||
|
||||
Use template structure at `{installed_path}/atdd-checklist-template.md`:
|
||||
- Story summary
|
||||
- Acceptance criteria breakdown
|
||||
- Test files created (with paths)
|
||||
- Data factories created
|
||||
- Fixtures created
|
||||
- Mock requirements
|
||||
- Required data-testid attributes
|
||||
- Implementation checklist
|
||||
- Red-green-refactor workflow
|
||||
- Execution commands
|
||||
|
||||
2. **Verify All Tests Fail**
|
||||
|
||||
Before finalizing:
|
||||
- Run full test suite locally
|
||||
- Confirm all tests in RED phase
|
||||
- Document expected failure messages
|
||||
- Ensure failures are due to missing implementation, not test bugs
|
||||
|
||||
3. **Write to Output File**
|
||||
|
||||
Save to `{output_folder}/atdd-checklist-{story_id}.md`
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Red-Green-Refactor Cycle
|
||||
|
||||
**RED Phase** (TEA responsibility):
|
||||
|
||||
- Write failing tests first
|
||||
- Tests define expected behavior
|
||||
- Tests must fail for right reason (missing implementation)
|
||||
|
||||
**GREEN Phase** (DEV responsibility):
|
||||
|
||||
- Implement minimal code to pass tests
|
||||
- One test at a time
|
||||
- Don't over-engineer
|
||||
|
||||
**REFACTOR Phase** (DEV responsibility):
|
||||
|
||||
- Improve code quality with confidence
|
||||
- Tests provide safety net
|
||||
- Extract duplications, optimize
|
||||
|
||||
### Given-When-Then Structure
|
||||
|
||||
**GIVEN** (Setup):
|
||||
|
||||
- Arrange test preconditions
|
||||
- Create necessary data
|
||||
- Navigate to starting point
|
||||
|
||||
**WHEN** (Action):
|
||||
|
||||
- Execute the behavior being tested
|
||||
- Single action per test
|
||||
|
||||
**THEN** (Assertion):
|
||||
|
||||
- Verify expected outcome
|
||||
- One assertion per test (atomic)
|
||||
|
||||
### Network-First Testing
|
||||
|
||||
**Critical pattern:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Intercept BEFORE navigation
|
||||
await page.route('**/api/data', handler);
|
||||
await page.goto('/page');
|
||||
|
||||
// ❌ WRONG: Navigate then intercept (race condition)
|
||||
await page.goto('/page');
|
||||
await page.route('**/api/data', handler); // Too late!
|
||||
```
|
||||
|
||||
### Data Factory Best Practices
|
||||
|
||||
**Use faker for all test data:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Random data
|
||||
email: faker.internet.email();
|
||||
|
||||
// ❌ WRONG: Hardcoded data (collisions, maintenance burden)
|
||||
email: 'test@example.com';
|
||||
```
|
||||
|
||||
**Auto-cleanup principle:**
|
||||
|
||||
- Every factory that creates data must provide cleanup
|
||||
- Fixtures automatically cleanup in teardown
|
||||
- No manual cleanup in test code
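
One way to honor auto-cleanup is to have the factory track what it creates so a fixture can delete it in teardown. A sketch, extending the `createUser` factory shown earlier and assuming a REST-style `/api/users` endpoint:

```typescript
// tests/support/factories/user.factory.ts (cleanup-aware sketch)
import type { APIRequestContext } from '@playwright/test';

const createdUserIds: number[] = [];

export const createUserViaApi = async (request: APIRequestContext, overrides = {}) => {
  const response = await request.post('/api/users', { data: createUser(overrides) });
  const user = await response.json();
  createdUserIds.push(user.id); // remember for teardown
  return user;
};

// Called from a fixture's teardown, so tests never clean up manually
export const cleanupUsers = async (request: APIRequestContext) => {
  for (const id of createdUserIds.splice(0)) {
    await request.delete(`/api/users/${id}`);
  }
};
```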
|
||||
|
||||
### One Assertion Per Test
|
||||
|
||||
**Atomic test design:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: One assertion
|
||||
test('should display user name', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Multiple assertions (not atomic)
|
||||
test('should display user info', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
|
||||
});
|
||||
```
|
||||
|
||||
**Why?** If the second assertion fails, you don't know whether the first is still valid.
|
||||
|
||||
### Component Test Strategy
|
||||
|
||||
**When to use component tests:**
|
||||
|
||||
- Complex UI interactions (drag-drop, keyboard nav)
|
||||
- Form validation logic
|
||||
- State management within component
|
||||
- Visual edge cases
|
||||
|
||||
**When NOT to use:**
|
||||
|
||||
- Simple rendering (snapshot tests are sufficient)
|
||||
- Integration with backend (use E2E or API tests)
|
||||
- Full user journeys (use E2E tests)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Core Fragments (Auto-loaded in Step 1):**
|
||||
|
||||
- `fixture-architecture.md` - Pure function → fixture → mergeTests patterns (406 lines, 5 examples)
|
||||
- `data-factories.md` - Factory patterns with faker, overrides, API seeding (498 lines, 5 examples)
|
||||
- `component-tdd.md` - Red-green-refactor, provider isolation, accessibility, visual regression (480 lines, 4 examples)
|
||||
- `network-first.md` - Intercept before navigate, HAR capture, deterministic waiting (489 lines, 5 examples)
|
||||
- `test-quality.md` - Deterministic tests, cleanup, explicit assertions, length/time limits (658 lines, 5 examples)
|
||||
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
|
||||
- `selector-resilience.md` - Selector hierarchy (data-testid > ARIA > text > CSS), dynamic patterns, anti-patterns (541 lines, 4 examples)
|
||||
- `timing-debugging.md` - Race condition prevention, deterministic waiting, async debugging (370 lines, 3 examples)
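
The selector hierarchy from `selector-resilience.md` maps directly onto Playwright locators; roughly, in order of preference (element names are illustrative, used inside any test with a `page` fixture):

```typescript
// 1. data-testid — most resilient to refactors
page.getByTestId('submit-button');

// 2. ARIA role + accessible name — resilient and accessibility-friendly
page.getByRole('button', { name: 'Submit' });

// 3. Visible text — acceptable when copy is stable
page.getByText('Submit');

// 4. CSS classes — last resort, brittle under styling changes
page.locator('.submit-btn');
```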
|
||||
|
||||
**Reference for Test Level Selection:**
|
||||
|
||||
- `test-levels-framework.md` - E2E vs API vs Component vs Unit decision framework (467 lines, 4 examples)
|
||||
|
||||
**Manual Reference (Optional):**
|
||||
|
||||
- Use `tea-index.csv` to find additional specialized fragments as needed
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## ATDD Complete - Tests in RED Phase
|
||||
|
||||
**Story**: {story_id}
|
||||
**Primary Test Level**: {primary_level}
|
||||
|
||||
**Failing Tests Created**:
|
||||
|
||||
- E2E tests: {e2e_count} tests in {e2e_files}
|
||||
- API tests: {api_count} tests in {api_files}
|
||||
- Component tests: {component_count} tests in {component_files}
|
||||
|
||||
**Supporting Infrastructure**:
|
||||
|
||||
- Data factories: {factory_count} factories created
|
||||
- Fixtures: {fixture_count} fixtures with auto-cleanup
|
||||
- Mock requirements: {mock_count} services documented
|
||||
|
||||
**Implementation Checklist**:
|
||||
|
||||
- Total tasks: {task_count}
|
||||
- Estimated effort: {effort_estimate} hours
|
||||
|
||||
**Required data-testid Attributes**: {data_testid_count} attributes documented
|
||||
|
||||
**Next Steps for DEV Team**:
|
||||
|
||||
1. Run failing tests: `npm run test:e2e`
|
||||
2. Review implementation checklist
|
||||
3. Implement one test at a time (RED → GREEN)
|
||||
4. Refactor with confidence (tests provide safety net)
|
||||
5. Share progress in daily standup
|
||||
|
||||
**Output File**: {output_file}
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture patterns
|
||||
- Data factory patterns with faker
|
||||
- Network-first route interception
|
||||
- Component TDD strategies
|
||||
- Test quality principles
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Story acceptance criteria analyzed and mapped to tests
|
||||
- [ ] Appropriate test levels selected (E2E, API, Component)
|
||||
- [ ] All tests written in Given-When-Then format
|
||||
- [ ] All tests fail initially (RED phase verified)
|
||||
- [ ] Network-first pattern applied (route interception before navigation)
|
||||
- [ ] Data factories created with faker
|
||||
- [ ] Fixtures created with auto-cleanup
|
||||
- [ ] Mock requirements documented for DEV team
|
||||
- [ ] Required data-testid attributes listed
|
||||
- [ ] Implementation checklist created with clear tasks
|
||||
- [ ] Red-green-refactor workflow documented
|
||||
- [ ] Execution commands provided
|
||||
- [ ] Output file created and formatted correctly
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
@@ -1,25 +1,81 @@
|
||||
# Test Architect workflow: atdd
|
||||
name: testarch-atdd
|
||||
description: "Generate failing acceptance tests before implementation."
|
||||
description: "Generate failing acceptance tests before implementation using TDD red-green-refactor cycle"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/atdd"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/atdd-checklist-template.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Story context
|
||||
story_file: "" # Path to story markdown with acceptance criteria
|
||||
test_dir: "{project-root}/tests"
|
||||
test_framework: "" # Detected from framework workflow (playwright, cypress)
|
||||
|
||||
# Test level selection
|
||||
test_levels: "e2e,api,component" # Which levels to generate
|
||||
primary_level: "e2e" # Primary test level for acceptance criteria
|
||||
include_component_tests: true # Generate component tests for UI logic
|
||||
|
||||
# ATDD approach
|
||||
start_failing: true # Tests must fail initially (red phase)
|
||||
use_given_when_then: true # BDD-style test structure
|
||||
network_first: true # Route interception before navigation
|
||||
one_assertion_per_test: true # Atomic test design
|
||||
|
||||
# Data and fixtures
|
||||
generate_factories: true # Create data factory stubs
|
||||
generate_fixtures: true # Create fixture architecture
|
||||
auto_cleanup: true # Fixtures clean up their data
|
||||
|
||||
# Output configuration
|
||||
output_checklist: "{output_folder}/atdd-checklist-{story_id}.md"
|
||||
include_data_testids: true # List required data-testid attributes
|
||||
include_mock_requirements: true # Document mock/stub needs
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load fixture-architecture, data-factories, component-tdd fragments
|
||||
share_with_dev: true # Provide implementation checklist to DEV agent
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/atdd-checklist-{story_id}.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read story markdown, framework config
|
||||
- write_file # Create test files, checklist, factory stubs
|
||||
- create_directory # Create test directories
|
||||
- list_files # Find existing fixtures and helpers
|
||||
- search_repo # Search for similar test patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with acceptance criteria (required)"
|
||||
- framework_config: "Test framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
- existing_fixtures: "Current fixture patterns for consistency"
|
||||
- test_design: "Test design document (optional, for risk/priority context)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- atdd
|
||||
- test-architect
|
||||
- tdd
|
||||
- red-green-refactor
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
869 src/modules/bmm/workflows/testarch/automate/README.md Normal file
@@ -0,0 +1,869 @@
|
||||
# Automate Workflow
|
||||
|
||||
Expands test automation coverage by generating comprehensive test suites at appropriate levels (E2E, API, Component, Unit) with supporting infrastructure. This workflow operates in **dual mode** - works seamlessly WITH or WITHOUT BMad artifacts.
|
||||
|
||||
**Core Principle**: Generate prioritized, deterministic tests that avoid duplicate coverage and follow testing best practices.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *automate
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- **BMad-Integrated**: After story implementation to expand coverage beyond ATDD tests
|
||||
- **Standalone**: Point at any codebase/feature and generate tests independently ("work out of thin air")
|
||||
- **Auto-discover**: No targets specified - scans codebase for features needing tests
|
||||
|
||||
## Inputs
|
||||
|
||||
**Execution Modes:**
|
||||
|
||||
1. **BMad-Integrated Mode** (story available) - OPTIONAL
|
||||
2. **Standalone Mode** (no BMad artifacts) - Direct code analysis
|
||||
3. **Auto-discover Mode** (no targets) - Scan for coverage gaps
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Framework configuration**: Test framework config (playwright.config.ts or cypress.config.ts) - REQUIRED
|
||||
|
||||
**Optional Context (BMad-Integrated Mode):**
|
||||
|
||||
- **Story markdown** (`{story_file}`): User story with acceptance criteria (enhances coverage targeting but NOT required)
|
||||
- **Tech spec**: Technical specification (provides architectural context)
|
||||
- **Test design**: Risk/priority context (P0-P3 alignment)
|
||||
- **PRD**: Product requirements (business context)
|
||||
|
||||
**Optional Context (Standalone Mode):**
|
||||
|
||||
- **Source code**: Feature implementation to analyze
|
||||
- **Existing tests**: Current test suite for gap analysis
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `standalone_mode`: Can work without BMad artifacts (default: true)
|
||||
- `story_file`: Path to story markdown (optional)
|
||||
- `target_feature`: Feature name or directory to analyze (e.g., "user-authentication" or "src/auth/")
|
||||
- `target_files`: Specific files to analyze (comma-separated paths)
|
||||
- `test_dir`: Directory for test files (default: `{project-root}/tests`)
|
||||
- `source_dir`: Source code directory (default: `{project-root}/src`)
|
||||
- `auto_discover_features`: Automatically find features needing tests (default: true)
|
||||
- `analyze_coverage`: Check existing test coverage gaps (default: true)
|
||||
- `coverage_target`: Coverage strategy - "critical-paths", "comprehensive", "selective" (default: "critical-paths")
|
||||
- `test_levels`: Which levels to generate - "e2e,api,component,unit" (default: all)
|
||||
- `avoid_duplicate_coverage`: Don't test same behavior at multiple levels (default: true)
|
||||
- `include_p0`: Include P0 critical path tests (default: true)
|
||||
- `include_p1`: Include P1 high priority tests (default: true)
|
||||
- `include_p2`: Include P2 medium priority tests (default: true)
|
||||
- `include_p3`: Include P3 low priority tests (default: false)
|
||||
- `use_given_when_then`: BDD-style test structure (default: true)
|
||||
- `one_assertion_per_test`: Atomic test design (default: true)
|
||||
- `network_first`: Route interception before navigation (default: true)
|
||||
- `deterministic_waits`: No hard waits or sleeps (default: true)
|
||||
- `generate_fixtures`: Create/enhance fixture architecture (default: true)
|
||||
- `generate_factories`: Create/enhance data factories (default: true)
|
||||
- `update_helpers`: Add utility functions (default: true)
|
||||
- `use_test_design`: Load test-design.md if exists (default: true)
|
||||
- `use_tech_spec`: Load tech-spec.md if exists (default: true)
|
||||
- `use_prd`: Load PRD.md if exists (default: true)
|
||||
- `update_readme`: Update test README with new specs (default: true)
|
||||
- `update_package_scripts`: Add test execution scripts (default: true)
|
||||
- `output_summary`: Path for automation summary (default: `{output_folder}/automation-summary.md`)
|
||||
- `max_test_duration`: Maximum seconds per test (default: 90)
|
||||
- `max_file_lines`: Maximum lines per test file (default: 300)
|
||||
- `require_self_cleaning`: All tests must clean up data (default: true)
|
||||
- `auto_load_knowledge`: Load relevant knowledge fragments (default: true)
|
||||
- `run_tests_after_generation`: Verify tests pass/fail as expected (default: true)
|
||||
- `auto_validate`: Run generated tests after creation (default: true) **NEW**
|
||||
- `auto_heal_failures`: Enable automatic healing (default: false, opt-in) **NEW**
|
||||
- `max_healing_iterations`: Maximum healing attempts per test (default: 3) **NEW**
|
||||
- `fail_on_unhealable`: Fail workflow if tests can't be healed (default: false) **NEW**
|
||||
- `mark_unhealable_as_fixme`: Mark unfixable tests with test.fixme() (default: true) **NEW**
|
||||
- `use_mcp_healing`: Use Playwright MCP if available (default: true) **NEW**
|
||||
- `healing_knowledge_fragments`: Healing patterns to load (default: "test-healing-patterns,selector-resilience,timing-debugging") **NEW**
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
- **Automation Summary** (`automation-summary.md`): Comprehensive report containing:
|
||||
- Execution mode (BMad-Integrated, Standalone, Auto-discover)
|
||||
- Feature analysis (source files analyzed, coverage gaps)
|
||||
- Tests created (E2E, API, Component, Unit) with counts and paths
|
||||
- Infrastructure created (fixtures, factories, helpers)
|
||||
- Test execution instructions
|
||||
- Coverage analysis (P0-P3 breakdown, coverage percentage)
|
||||
- Definition of Done checklist
|
||||
- Next steps and recommendations
|
||||
|
||||
**Test Files Created:**
|
||||
|
||||
- **E2E tests** (`tests/e2e/{feature-name}.spec.ts`): Critical user journeys (P0-P1)
|
||||
- **API tests** (`tests/api/{feature-name}.api.spec.ts`): Business logic and contracts (P1-P2)
|
||||
- **Component tests** (`tests/component/{ComponentName}.test.tsx`): UI behavior (P1-P2)
|
||||
- **Unit tests** (`tests/unit/{module-name}.test.ts`): Pure logic (P2-P3)
|
||||
|
||||
**Supporting Infrastructure:**
|
||||
|
||||
- **Fixtures** (`tests/support/fixtures/{feature}.fixture.ts`): Setup/teardown with auto-cleanup
|
||||
- **Data factories** (`tests/support/factories/{entity}.factory.ts`): Random test data using faker
|
||||
- **Helpers** (`tests/support/helpers/{utility}.ts`): Utility functions (waitFor, retry, etc.)
|
||||
|
||||
**Documentation Updates:**
|
||||
|
||||
- **Test README** (`tests/README.md`): Test suite overview, execution instructions, priority tagging, patterns
|
||||
- **package.json scripts**: Test execution commands (test:e2e, test:e2e:p0, test:api, etc.)
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- All tests follow Given-When-Then format
|
||||
- All tests have priority tags ([P0], [P1], [P2], [P3])
|
||||
- All tests use data-testid selectors (stable, not CSS classes)
|
||||
- All tests are self-cleaning (fixtures with auto-cleanup)
|
||||
- No hard waits or flaky patterns (deterministic)
|
||||
- Test files under 300 lines (lean and focused)
|
||||
- Tests run under 1.5 minutes each (fast feedback)
|
||||
|
||||
## Key Features
|
||||
|
||||
### Dual-Mode Operation
|
||||
|
||||
**BMad-Integrated Mode** (story available):
|
||||
|
||||
- Uses story acceptance criteria for coverage targeting
|
||||
- Aligns with test-design risk/priority assessment
|
||||
- Expands ATDD tests with edge cases and negative paths
|
||||
- Optional - story enhances coverage but not required
|
||||
|
||||
**Standalone Mode** (no story):
|
||||
|
||||
- Analyzes source code independently
|
||||
- Identifies coverage gaps automatically
|
||||
- Generates tests based on code analysis
|
||||
- Works with any project (BMad or non-BMad)
|
||||
|
||||
**Auto-discover Mode** (no targets):
|
||||
|
||||
- Scans codebase for features needing tests
|
||||
- Prioritizes features with no coverage
|
||||
- Generates comprehensive test plan
|
||||
|
||||
### Avoid Duplicate Coverage
|
||||
|
||||
**Critical principle**: Don't test same behavior at multiple levels
|
||||
|
||||
**Good coverage strategy:**
|
||||
|
||||
- **E2E**: User can login → Dashboard loads (critical happy path only)
|
||||
- **API**: POST /auth/login returns correct status codes (variations: 200, 401, 400)
|
||||
- **Component**: LoginForm validates input (UI edge cases: empty fields, invalid format)
|
||||
- **Unit**: validateEmail() logic (pure function edge cases)
|
||||
|
||||
**Bad coverage (duplicate):**
|
||||
|
||||
- E2E: User can login → Dashboard loads
|
||||
- E2E: User can login with different emails → Dashboard loads (unnecessary duplication)
|
||||
- API: POST /auth/login returns 200 (already covered in E2E)
|
||||
|
||||
Use E2E sparingly for critical paths. Use API/Component/Unit for variations and edge cases.
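
A sketch of how that split looks in practice: the single E2E happy path stays as shown later, while status-code variations move down to the API level (endpoints and payloads are assumptions):

```typescript
// tests/api/auth.api.spec.ts — variations live here, not as extra E2E journeys
import { test, expect } from '@playwright/test';

test('[P1] POST /auth/login - invalid credentials return 401', async ({ request }) => {
  const response = await request.post('/auth/login', {
    data: { email: 'user@example.com', password: 'wrong' },
  });
  expect(response.status()).toBe(401);
});

test('[P2] POST /auth/login - missing fields return 400', async ({ request }) => {
  const response = await request.post('/auth/login', { data: {} });
  expect(response.status()).toBe(400);
});
```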
|
||||
|
||||
### Healing Capabilities (NEW - Phase 2.5)
|
||||
|
||||
**automate** can automatically validate and heal test failures after generation (healing itself is opt-in via `auto_heal_failures`).
|
||||
|
||||
**Configuration**: Controlled by `config.tea_use_mcp_enhancements` (default: true)
|
||||
|
||||
- If true + MCP available → MCP-assisted healing
|
||||
- If true + MCP unavailable → Pattern-based healing
|
||||
- If false → No healing, document failures for manual review
|
||||
|
||||
**Constants**: Max 3 healing attempts, unfixable tests marked as `test.fixme()`
|
||||
|
||||
**How Healing Works (Default - Pattern-Based):**
|
||||
|
||||
TEA heals tests using pattern-based analysis by:
|
||||
|
||||
1. **Parsing error messages** from test output logs
|
||||
2. **Matching patterns** against known failure signatures
|
||||
3. **Applying fixes** from healing knowledge fragments:
|
||||
- `test-healing-patterns.md` - Common failure patterns (selectors, timing, data, network)
|
||||
- `selector-resilience.md` - Selector refactoring (CSS → data-testid, nth() → filter())
|
||||
- `timing-debugging.md` - Race condition fixes (hard waits → event-based waits)
|
||||
4. **Re-running tests** to verify fix (max 3 iterations)
|
||||
5. **Marking unfixable tests** as `test.fixme()` with detailed comments
|
||||
|
||||
**This works well for:**
|
||||
|
||||
- ✅ Common failure patterns (stale selectors, timing issues, dynamic data)
|
||||
- ✅ Text-based errors with clear signatures
|
||||
- ✅ Issues documented in knowledge base
|
||||
- ✅ Automated CI environments without browser access
|
||||
|
||||
**What MCP Adds (Interactive Debugging Enhancement):**
|
||||
|
||||
When Playwright MCP is available, TEA **additionally**:
|
||||
|
||||
1. **Debugs failures interactively** before applying pattern-based fixes:
|
||||
- **Pause test execution** with `playwright_test_debug_test` (step through, inspect state)
|
||||
- **See visual failure context** with `browser_snapshot` (screenshot of failure state)
|
||||
- **Inspect live DOM** with browser tools (find why selector doesn't match)
|
||||
- **Analyze console logs** with `browser_console_messages` (JS errors, warnings, debug output)
|
||||
- **Inspect network activity** with `browser_network_requests` (failed API calls, CORS errors, timeouts)
|
||||
|
||||
2. **Enhances pattern-based fixes** with real-world data:
|
||||
- **Pattern match identifies issue** (e.g., "stale selector")
|
||||
- **MCP discovers actual selector** with `browser_generate_locator` from live page
|
||||
- **TEA applies refined fix** using real DOM structure (not just pattern guess)
|
||||
- **Verification happens in browser** (see if fix works visually)
|
||||
|
||||
3. **Catches root causes** pattern matching might miss:
|
||||
- **Network failures**: MCP shows 500 error on API call (not just timeout)
|
||||
- **JS errors**: MCP shows `TypeError: undefined` in console (not just "element not found")
|
||||
- **Timing issues**: MCP shows loading spinner still visible (not just "selector timeout")
|
||||
- **State problems**: MCP shows modal blocking button (not just "not clickable")
|
||||
|
||||
**Key Benefits of MCP Enhancement:**
|
||||
|
||||
- ✅ **Pattern-based fixes** (fast, automated) **+** **MCP verification** (accurate, context-aware)
|
||||
- ✅ **Visual debugging**: See exactly what user sees when test fails
|
||||
- ✅ **DOM inspection**: Discover why selectors don't match (element missing, wrong attributes, dynamic IDs)
|
||||
- ✅ **Network visibility**: Identify API failures, slow requests, CORS issues
|
||||
- ✅ **Console analysis**: Catch JS errors that break page functionality
|
||||
- ✅ **Robust selectors**: Generate locators from actual DOM (role, text, testid hierarchy)
|
||||
- ✅ **Faster iteration**: Debug and fix in same browser session (no restart needed)
|
||||
- ✅ **Higher success rate**: MCP helps diagnose failures pattern matching can't solve
|
||||
|
||||
**Example Enhancement Flow:**
|
||||
|
||||
```
|
||||
1. Pattern-based healing identifies issue
|
||||
→ Error: "Locator '.submit-btn' resolved to 0 elements"
|
||||
→ Pattern match: Stale selector (CSS class)
|
||||
→ Suggested fix: Replace with data-testid
|
||||
|
||||
2. MCP enhances diagnosis (if available)
|
||||
→ browser_snapshot shows button exists but has class ".submit-button" (not ".submit-btn")
|
||||
→ browser_generate_locator finds: button[type="submit"].submit-button
|
||||
→ browser_console_messages shows no errors
|
||||
|
||||
3. TEA applies refined fix
|
||||
→ await page.locator('button[type="submit"]').click()
|
||||
→ (More accurate than pattern-based guess)
|
||||
```
|
||||
|
||||
**Healing Modes:**
|
||||
|
||||
1. **MCP-Enhanced Healing** (when Playwright MCP available):
|
||||
- Pattern-based analysis **+** Interactive debugging
|
||||
- Visual context with `browser_snapshot`
|
||||
- Console log analysis with `browser_console_messages`
|
||||
- Network inspection with `browser_network_requests`
|
||||
- Live DOM inspection with `browser_generate_locator`
|
||||
- Step-by-step debugging with `playwright_test_debug_test`
|
||||
|
||||
2. **Pattern-Based Healing** (always available):
|
||||
- Error message parsing and pattern matching
|
||||
- Automated fixes from healing knowledge fragments
|
||||
- Text-based analysis (no visual/DOM inspection)
|
||||
- Works in CI without browser access
|
||||
|
||||
**Healing Workflow:**
|
||||
|
||||
```
|
||||
1. Generate tests → Run tests
|
||||
2. IF pass → Success ✅
|
||||
3. IF fail AND auto_heal_failures=false → Report failures ⚠️
|
||||
4. IF fail AND auto_heal_failures=true → Enter healing loop:
|
||||
a. Identify failure pattern (selector, timing, data, network)
|
||||
b. Apply automated fix from knowledge base
|
||||
c. Re-run test (max 3 iterations)
|
||||
d. IF healed → Success ✅
|
||||
e. IF unhealable → Mark test.fixme() with detailed comment
|
||||
```
|
||||
|
||||
**Example Healing Outcomes:**
|
||||
|
||||
```typescript
|
||||
// ❌ Original (failing): CSS class selector
|
||||
await page.locator('.btn-primary').click();
|
||||
|
||||
// ✅ Healed: data-testid selector
|
||||
await page.getByTestId('submit-button').click();
|
||||
|
||||
// ❌ Original (failing): Hard wait
|
||||
await page.waitForTimeout(3000);
|
||||
|
||||
// ✅ Healed: Network-first pattern
|
||||
await page.waitForResponse('**/api/data');
|
||||
|
||||
// ❌ Original (failing): Hardcoded ID
|
||||
await expect(page.getByText('User 123')).toBeVisible();
|
||||
|
||||
// ✅ Healed: Regex pattern
|
||||
await expect(page.getByText(/User \d+/)).toBeVisible();
|
||||
```
|
||||
|
||||
**Unfixable Tests (Marked as test.fixme()):**
|
||||
|
||||
```typescript
|
||||
test.fixme('[P1] should handle complex interaction', async ({ page }) => {
|
||||
// FIXME: Test healing failed after 3 attempts
|
||||
// Failure: "Locator 'button[data-action="submit"]' resolved to 0 elements"
|
||||
// Attempted fixes:
|
||||
// 1. Replaced with page.getByTestId('submit-button') - still failing
|
||||
// 2. Replaced with page.getByRole('button', { name: 'Submit' }) - still failing
|
||||
// 3. Added waitForLoadState('networkidle') - still failing
|
||||
// Manual investigation needed: Selector may require application code changes
|
||||
// TODO: Review with team, may need data-testid added to button component
|
||||
// Original test code...
|
||||
});
|
||||
```
|
||||
|
||||
**When to Enable Healing:**
|
||||
|
||||
- ✅ Enable for greenfield projects (catch generated test issues early)
|
||||
- ✅ Enable for brownfield projects (auto-fix legacy selector patterns)
|
||||
- ❌ Disable if environment not ready (application not deployed/seeded)
|
||||
- ❌ Disable if preferring manual review of all generated tests
|
||||
|
||||
**Healing Report Example:**
|
||||
|
||||
```markdown
|
||||
## Test Healing Report
|
||||
|
||||
**Auto-Heal Enabled**: true
|
||||
**Healing Mode**: Pattern-based
|
||||
**Iterations Allowed**: 3
|
||||
|
||||
### Validation Results
|
||||
|
||||
- **Total tests**: 10
|
||||
- **Passing**: 7
|
||||
- **Failing**: 3
|
||||
|
||||
### Healing Outcomes
|
||||
|
||||
**Successfully Healed (2 tests):**
|
||||
|
||||
- `tests/e2e/login.spec.ts:15` - Stale selector (CSS class → data-testid)
|
||||
- `tests/e2e/checkout.spec.ts:42` - Race condition (added network-first interception)
|
||||
|
||||
**Unable to Heal (1 test):**
|
||||
|
||||
- `tests/e2e/complex-flow.spec.ts:67` - Marked as test.fixme()
|
||||
- Requires application code changes (add data-testid to component)
|
||||
|
||||
### Healing Patterns Applied
|
||||
|
||||
- **Selector fixes**: 1
|
||||
- **Timing fixes**: 1
|
||||
```
|
||||
|
||||
**Graceful Degradation:**
|
||||
|
||||
- Healing is OPTIONAL (default: disabled)
|
||||
- Works without Playwright MCP (pattern-based fallback)
|
||||
- Unfixable tests marked clearly (not silently broken)
|
||||
- Manual investigation path documented
|
||||
|
||||
### Recording Mode (NEW - Phase 2.5)
|
||||
|
||||
**automate** can record complex UI interactions instead of AI generation.
|
||||
|
||||
**Activation**: Automatic for complex UI scenarios when config.tea_use_mcp_enhancements is true and MCP available
|
||||
|
||||
- Complex scenarios: drag-drop, wizards, multi-page flows
|
||||
- Fallback: AI generation (silent, automatic)
|
||||
|
||||
**When to Use Recording Mode:**
|
||||
|
||||
- ✅ Complex UI interactions (drag-drop, multi-step forms, wizards)
|
||||
- ✅ Visual workflows (modals, dialogs, animations, transitions)
|
||||
- ✅ Unclear requirements (exploratory, discovering behavior)
|
||||
- ✅ Multi-page flows (checkout, registration, onboarding)
|
||||
- ❌ NOT for simple CRUD (AI generation faster)
|
||||
- ❌ NOT for API-only tests (no UI to record)
|
||||
|
||||
**When to Use AI Generation (Default):**
|
||||
|
||||
- ✅ Clear requirements available
|
||||
- ✅ Standard patterns (login, CRUD, navigation)
|
||||
- ✅ Need many tests quickly
|
||||
- ✅ API/backend tests (no UI interaction)
|
||||
|
||||
**Recording Workflow (Same as atdd):**
|
||||
|
||||
```
|
||||
1. Set generation_mode: "recording"
|
||||
2. Use generator_setup_page to init recording
|
||||
3. For each test scenario:
|
||||
- Execute with browser_* tools (navigate, click, type, select)
|
||||
- Add verifications with browser_verify_* tools
|
||||
- Capture log and generate test file
|
||||
4. Enhance with knowledge base patterns:
|
||||
- Given-When-Then structure
|
||||
- data-testid selectors
|
||||
- Network-first interception
|
||||
- Fixtures/factories
|
||||
5. Validate (run tests if auto_validate enabled)
|
||||
6. Heal if needed (if auto_heal_failures enabled)
|
||||
```
|
||||
|
||||
**Combination: Recording + Healing:**
|
||||
|
||||
automate can use BOTH recording and healing together:
|
||||
|
||||
- Generate tests via recording (complex flows captured interactively)
|
||||
- Run tests to validate (auto_validate)
|
||||
- Heal failures automatically (auto_heal_failures)
|
||||
|
||||
This is particularly powerful for brownfield projects where:
|
||||
|
||||
- Requirements unclear → Use recording to capture existing behavior
|
||||
- Application complex → Recording captures nuances AI might miss
|
||||
- Tests may fail → Healing fixes common issues automatically
|
||||
|
||||
**Graceful Degradation:**
|
||||
|
||||
- Recording mode is OPTIONAL (default: AI generation)
|
||||
- Requires Playwright MCP (falls back to AI if unavailable)
|
||||
- Works with or without healing enabled
|
||||
- Same quality output regardless of generation method
|
||||
|
||||
### Test Level Selection Framework
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys (login, checkout, core workflows)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- Characteristics: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Business logic validation
|
||||
- Service contracts and data transformations
|
||||
- Backend integration without UI
|
||||
- Characteristics: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing (click, hover, keyboard navigation)
|
||||
- State management within component
|
||||
- Characteristics: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Pure business logic and algorithms
|
||||
- Edge cases and error handling
|
||||
- Minimal dependencies
|
||||
- Characteristics: Fastest, most granular
|
||||
|
||||
### Priority Classification (P0-P3)
|
||||
|
||||
**P0 (Critical - Every commit)**:
|
||||
|
||||
- Critical user paths that must always work
|
||||
- Security-critical functionality (auth, permissions)
|
||||
- Data integrity scenarios
|
||||
- Run in pre-commit hooks or PR checks
|
||||
|
||||
**P1 (High - PR to main)**:
|
||||
|
||||
- Important features with high user impact
|
||||
- Integration points between systems
|
||||
- Error handling for common failures
|
||||
- Run before merging to main branch
|
||||
|
||||
**P2 (Medium - Nightly)**:
|
||||
|
||||
- Edge cases with moderate impact
|
||||
- Less-critical feature variations
|
||||
- Performance/load testing
|
||||
- Run in nightly CI builds
|
||||
|
||||
**P3 (Low - On-demand)**:
|
||||
|
||||
- Nice-to-have validations
|
||||
- Rarely-used features
|
||||
- Exploratory testing scenarios
|
||||
- Run manually or weekly
|
||||
|
||||
**Priority tagging enables selective execution:**
|
||||
|
||||
```bash
|
||||
npm run test:e2e:p0 # Run only P0 tests (critical paths)
|
||||
npm run test:e2e:p1 # Run P0 + P1 tests (pre-merge)
|
||||
```
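
Under the hood, those npm scripts can simply apply Playwright's `grep` filter against the priority tag in the test title. A sketch, assuming the `[P0]`/`[P1]` naming shown above:

```typescript
// playwright.config.ts (illustrative excerpt)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    // npm run test:e2e:p0 → playwright test --project=p0
    { name: 'p0', grep: /\[P0\]/ },
    // npm run test:e2e:p1 → playwright test --project=p1 (P0 + P1)
    { name: 'p1', grep: /\[P0\]|\[P1\]/ },
  ],
});
```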
|
||||
|
||||
### Given-When-Then Test Structure
|
||||
|
||||
All tests follow BDD format for clarity:
|
||||
|
||||
```typescript
|
||||
test('[P0] should login with valid credentials and load dashboard', async ({ page }) => {
|
||||
// GIVEN: User is on login page
|
||||
await page.goto('/login');
|
||||
|
||||
// WHEN: User submits valid credentials
|
||||
await page.fill('[data-testid="email-input"]', 'user@example.com');
|
||||
await page.fill('[data-testid="password-input"]', 'Password123!');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// THEN: User is redirected to dashboard
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
### One Assertion Per Test (Atomic Design)
|
||||
|
||||
Each test verifies exactly one behavior:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: One assertion
|
||||
test('[P0] should display user name', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Multiple assertions (not atomic)
|
||||
test('[P0] should display user info', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
|
||||
});
|
||||
```
|
||||
|
||||
**Why?** If the second assertion fails, you don't know whether the first is still valid. Split them into separate tests for clear failure diagnosis.
|
||||
|
||||
### Network-First Testing Pattern
|
||||
|
||||
**Critical pattern to prevent race conditions**:
|
||||
|
||||
```typescript
|
||||
test('should load user dashboard after login', async ({ page }) => {
|
||||
// CRITICAL: Intercept routes BEFORE navigation
|
||||
await page.route('**/api/user', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
body: JSON.stringify({ id: 1, name: 'Test User' }),
|
||||
}),
|
||||
);
|
||||
|
||||
// NOW navigate
|
||||
await page.goto('/dashboard');
|
||||
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('Test User');
|
||||
});
|
||||
```
|
||||
|
||||
Always set up route interception before navigating to pages that make network requests.
|
||||
|
||||
### Fixture Architecture with Auto-Cleanup
|
||||
|
||||
Playwright fixtures with automatic data cleanup:
|
||||
|
||||
```typescript
|
||||
// tests/support/fixtures/auth.fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import { createUser, deleteUser } from '../factories/user.factory';
|
||||
|
||||
export const test = base.extend({
|
||||
authenticatedUser: async ({ page }, use) => {
|
||||
// Setup: Create and authenticate user
|
||||
const user = await createUser();
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', user.email);
|
||||
await page.fill('[data-testid="password"]', user.password);
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await page.waitForURL('/dashboard');
|
||||
|
||||
// Provide to test
|
||||
await use(user);
|
||||
|
||||
// Cleanup: Delete user automatically
|
||||
await deleteUser(user.id);
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Fixture principles:**
|
||||
|
||||
- Auto-cleanup (always delete created data in teardown)
|
||||
- Composable (fixtures can use other fixtures)
|
||||
- Isolated (each test gets fresh data)
|
||||
- Type-safe with TypeScript
|
||||
|
||||
### Data Factory Architecture
|
||||
|
||||
Use faker for all test data generation:
|
||||
|
||||
```typescript
|
||||
// tests/support/factories/user.factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export const createUser = (overrides = {}) => ({
|
||||
id: faker.number.int(),
|
||||
email: faker.internet.email(),
|
||||
password: faker.internet.password(),
|
||||
name: faker.person.fullName(),
|
||||
role: 'user',
|
||||
createdAt: faker.date.recent().toISOString(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
|
||||
|
||||
// API helper for cleanup
|
||||
export const deleteUser = async (userId: number) => {
|
||||
await fetch(`/api/users/${userId}`, { method: 'DELETE' });
|
||||
};
|
||||
```
|
||||
|
||||
**Factory principles:**
|
||||
|
||||
- Use faker for random data (no hardcoded values to prevent collisions)
|
||||
- Support overrides for specific test scenarios
|
||||
- Generate complete valid objects matching API contracts
|
||||
- Include helper functions for bulk creation and cleanup
|
||||
|
||||
### No Page Objects
|
||||
|
||||
**Do NOT create page object classes.** Keep tests simple and direct:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Direct test
|
||||
test('should login', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', 'user@example.com');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Page object abstraction
|
||||
class LoginPage {
|
||||
async login(email, password) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
Use fixtures for setup/teardown, not page objects for actions.
|
||||
|
||||
### Deterministic Tests Only
|
||||
|
||||
**No flaky patterns allowed:**
|
||||
|
||||
```typescript
|
||||
// ❌ WRONG: Hard wait
|
||||
await page.waitForTimeout(2000);
|
||||
|
||||
// ✅ CORRECT: Explicit wait
|
||||
await page.waitForSelector('[data-testid="user-name"]');
|
||||
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();
|
||||
|
||||
// ❌ WRONG: Conditional flow
|
||||
if (await element.isVisible()) {
|
||||
await element.click();
|
||||
}
|
||||
|
||||
// ✅ CORRECT: Deterministic assertion
|
||||
await expect(element).toBeVisible();
|
||||
await element.click();
|
||||
|
||||
// ❌ WRONG: Try-catch for test logic
|
||||
try {
|
||||
await element.click();
|
||||
} catch (e) {
|
||||
// Test shouldn't catch errors
|
||||
}
|
||||
|
||||
// ✅ CORRECT: Let test fail if element not found
|
||||
await element.click();
|
||||
```
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before this workflow:**
|
||||
|
||||
- **framework** workflow: Establish test framework architecture (Playwright/Cypress config, directory structure) - REQUIRED
|
||||
- **test-design** workflow: Optional for P0-P3 priority alignment and risk assessment context (BMad-Integrated mode only)
|
||||
- **atdd** workflow: Optional - automate expands beyond ATDD tests with edge cases (BMad-Integrated mode only)
|
||||
|
||||
**After this workflow:**
|
||||
|
||||
- **trace** workflow: Update traceability matrix with new test coverage (Phase 1) and make quality gate decision (Phase 2)
|
||||
- **CI pipeline**: Run tests in burn-in loop to detect flaky patterns
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **DEV agent**: Tests validate implementation correctness
|
||||
- **Story workflow**: Tests cover acceptance criteria (BMad-Integrated mode only)
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Works Out of Thin Air
|
||||
|
||||
**automate does NOT require BMad artifacts:**
|
||||
|
||||
- Can analyze any codebase independently
|
||||
- User can point TEA at a feature: "automate tests for src/auth/"
|
||||
- Works on non-BMad projects
|
||||
- BMad artifacts (story, tech-spec, PRD) are OPTIONAL enhancements, not requirements
|
||||
|
||||
**Similar to:**
|
||||
|
||||
- **framework**: Can scaffold tests on any project
|
||||
- **ci**: Can generate CI config without BMad context
|
||||
|
||||
**Different from:**
|
||||
|
||||
- **atdd**: REQUIRES story with acceptance criteria (halt if missing)
|
||||
- **test-design**: REQUIRES PRD/epic context (halt if missing)
|
||||
- **trace (Phase 2)**: REQUIRES test results for gate decision (halt if missing)
|
||||
|
||||
### File Size Limits
|
||||
|
||||
**Keep test files lean (under 300 lines):**
|
||||
|
||||
- If file exceeds limit, split into multiple files by feature area
|
||||
- Group related tests in describe blocks
|
||||
- Extract common setup to fixtures
|
||||
|
||||
### Quality Standards Enforced
|
||||
|
||||
**Every test must:**
|
||||
|
||||
- ✅ Use Given-When-Then format
|
||||
- ✅ Have clear, descriptive name with priority tag
|
||||
- ✅ One assertion per test (atomic)
|
||||
- ✅ No hard waits or sleeps
|
||||
- ✅ Use data-testid selectors (not CSS classes)
|
||||
- ✅ Self-cleaning (fixtures with auto-cleanup)
|
||||
- ✅ Deterministic (no flaky patterns)
|
||||
- ✅ Fast (under 90 seconds)
|
||||
|
||||
**Forbidden patterns:**
|
||||
|
||||
- ❌ Hard waits: `await page.waitForTimeout(2000)`
|
||||
- ❌ Conditional flow: `if (await element.isVisible()) { ... }`
|
||||
- ❌ Try-catch for test logic
|
||||
- ❌ Hardcoded test data (use factories with faker)
|
||||
- ❌ Page objects
|
||||
- ❌ Shared state between tests
|
||||
|
||||
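Put together, a test that meets the standards above and avoids the forbidden patterns looks roughly like this sketch (the selectors, route, and error element are assumptions for illustration, not part of the workflow's output):

```typescript
import { test, expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

// [P1] tag in the title enables selective runs, e.g. --grep "\[P1\]"
test('[P1] shows an error for invalid credentials', async ({ page }) => {
  // Given: the login page and random, unregistered credentials (no hardcoded data)
  const email = faker.internet.email();
  const password = faker.internet.password();
  await page.goto('/login');

  // When: the user submits the invalid credentials
  await page.getByTestId('email').fill(email);
  await page.getByTestId('password').fill(password);
  await page.getByTestId('submit').click();

  // Then: a single deterministic assertion, no waits, no conditionals
  await expect(page.getByTestId('login-error')).toBeVisible();
});
```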
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **test-levels-framework.md** - Test level selection (E2E vs API vs Component vs Unit) with characteristics and use cases
|
||||
- **test-priorities.md** - Priority classification (P0-P3) with execution timing and risk alignment
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's test.extend()
|
||||
- **data-factories.md** - Factory patterns using @faker-js/faker for random test data generation with overrides
|
||||
- **selective-testing.md** - Targeted test execution strategies for CI optimization
|
||||
- **ci-burn-in.md** - Flaky test detection patterns (10 iterations to catch intermittent failures)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, determinism, isolation, atomic assertions)
|
||||
|
||||
**Healing Knowledge (If `auto_heal_failures` enabled):**
|
||||
|
||||
- **test-healing-patterns.md** - Common failure patterns and automated fixes (selectors, timing, data, network, hard waits)
|
||||
- **selector-resilience.md** - Robust selector strategies and debugging (data-testid hierarchy, filter vs nth, anti-patterns)
|
||||
- **timing-debugging.md** - Race condition identification and deterministic wait fixes (network-first, event-based waits)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping (22 fragments total).
|
||||
|
||||
## Example Output
|
||||
|
||||
### BMad-Integrated Mode
|
||||
|
||||
````markdown
# Automation Summary - User Authentication

**Date:** 2025-10-14
**Story:** Epic 3, Story 5
**Coverage Target:** critical-paths

## Tests Created

### E2E Tests (2 tests, P0-P1)

- `tests/e2e/user-authentication.spec.ts` (87 lines)
  - [P0] Login with valid credentials → Dashboard loads
  - [P1] Display error for invalid credentials

### API Tests (3 tests, P1-P2)

- `tests/api/auth.api.spec.ts` (102 lines)
  - [P1] POST /auth/login - valid credentials → 200 + token
  - [P1] POST /auth/login - invalid credentials → 401 + error
  - [P2] POST /auth/login - missing fields → 400 + validation

### Component Tests (2 tests, P1)

- `tests/component/LoginForm.test.tsx` (45 lines)
  - [P1] Empty fields → submit button disabled
  - [P1] Valid input → submit button enabled

## Infrastructure Created

- Fixtures: `tests/support/fixtures/auth.fixture.ts`
- Factories: `tests/support/factories/user.factory.ts`

## Test Execution

```bash
npm run test:e2e      # Run all tests
npm run test:e2e:p0   # Critical paths only
npm run test:e2e:p1   # P0 + P1 tests
```

## Coverage Analysis

**Total:** 7 tests (P0: 1, P1: 5, P2: 1)
**Levels:** E2E: 2, API: 3, Component: 2

✅ All acceptance criteria covered
✅ Happy path (E2E + API)
✅ Error cases (API)
✅ UI validation (Component)
````
|
||||
|
||||
### Standalone Mode
|
||||
|
||||
```markdown
|
||||
# Automation Summary - src/auth/
|
||||
|
||||
**Date:** 2025-10-14
|
||||
**Target:** src/auth/ (standalone analysis)
|
||||
**Coverage Target:** critical-paths
|
||||
|
||||
## Feature Analysis
|
||||
|
||||
**Source Files Analyzed:**
|
||||
- `src/auth/login.ts`
|
||||
- `src/auth/session.ts`
|
||||
- `src/auth/validation.ts`
|
||||
|
||||
**Existing Coverage:** 0 tests found
|
||||
|
||||
**Coverage Gaps:**
|
||||
- ❌ No E2E tests for login flow
|
||||
- ❌ No API tests for /auth/login endpoint
|
||||
- ❌ No unit tests for validateEmail()
|
||||
|
||||
## Tests Created
|
||||
|
||||
{Same structure as BMad-Integrated mode}
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. **High Priority (P0-P1):**
|
||||
- Add E2E test for password reset flow
|
||||
- Add API tests for token refresh endpoint
|
||||
|
||||
2. **Medium Priority (P2):**
|
||||
- Add unit tests for session timeout logic
|
||||
````
|
||||
|
||||
Ready to continue?
|
||||
580
src/modules/bmm/workflows/testarch/automate/checklist.md
Normal file
@@ -0,0 +1,580 @@
|
||||
# Automate Workflow Validation Checklist
|
||||
|
||||
Use this checklist to validate that the automate workflow has been executed correctly and all deliverables meet quality standards.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting this workflow, verify:
|
||||
|
||||
- [ ] Framework scaffolding configured (playwright.config.ts or cypress.config.ts exists)
|
||||
- [ ] Test directory structure exists (tests/ folder with subdirectories)
|
||||
- [ ] Package.json has test framework dependencies installed
|
||||
|
||||
**Halt only if:** Framework scaffolding is completely missing (run `framework` workflow first)
|
||||
|
||||
**Note:** BMad artifacts (story, tech-spec, PRD) are OPTIONAL - workflow can run without them
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Execution Mode Determination and Context Loading
|
||||
|
||||
### Mode Detection
|
||||
|
||||
- [ ] Execution mode correctly determined:
|
||||
- [ ] BMad-Integrated Mode (story_file variable set) OR
|
||||
- [ ] Standalone Mode (target_feature or target_files set) OR
|
||||
- [ ] Auto-discover Mode (no targets specified)
|
||||
|
||||
### BMad Artifacts (If Available - OPTIONAL)
|
||||
|
||||
- [ ] Story markdown loaded (if `{story_file}` provided)
|
||||
- [ ] Acceptance criteria extracted from story (if available)
|
||||
- [ ] Tech-spec.md loaded (if `{use_tech_spec}` true and file exists)
|
||||
- [ ] Test-design.md loaded (if `{use_test_design}` true and file exists)
|
||||
- [ ] PRD.md loaded (if `{use_prd}` true and file exists)
|
||||
- [ ] **Note**: Absence of BMad artifacts does NOT halt workflow
|
||||
|
||||
### Framework Configuration
|
||||
|
||||
- [ ] Test framework config loaded (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Test directory structure identified from `{test_dir}`
|
||||
- [ ] Existing test patterns reviewed
|
||||
- [ ] Test runner capabilities noted (parallel execution, fixtures, etc.)
|
||||
|
||||
### Coverage Analysis
|
||||
|
||||
- [ ] Existing test files searched in `{test_dir}` (if `{analyze_coverage}` true)
|
||||
- [ ] Tested features vs untested features identified
|
||||
- [ ] Coverage gaps mapped (tests to source files)
|
||||
- [ ] Existing fixture and factory patterns checked
|
||||
|
||||
### Knowledge Base Fragments Loaded
|
||||
|
||||
- [ ] `test-levels-framework.md` - Test level selection
|
||||
- [ ] `test-priorities.md` - Priority classification (P0-P3)
|
||||
- [ ] `fixture-architecture.md` - Fixture patterns with auto-cleanup
|
||||
- [ ] `data-factories.md` - Factory patterns using faker
|
||||
- [ ] `selective-testing.md` - Targeted test execution strategies
|
||||
- [ ] `ci-burn-in.md` - Flaky test detection patterns
|
||||
- [ ] `test-quality.md` - Test design principles
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Automation Targets Identification
|
||||
|
||||
### Target Determination
|
||||
|
||||
**BMad-Integrated Mode (if story available):**
|
||||
|
||||
- [ ] Acceptance criteria mapped to test scenarios
|
||||
- [ ] Features implemented in story identified
|
||||
- [ ] Existing ATDD tests checked (if any)
|
||||
- [ ] Expansion beyond ATDD planned (edge cases, negative paths)
|
||||
|
||||
**Standalone Mode (if no story):**
|
||||
|
||||
- [ ] Specific feature analyzed (if `{target_feature}` specified)
|
||||
- [ ] Specific files analyzed (if `{target_files}` specified)
|
||||
- [ ] Features auto-discovered (if `{auto_discover_features}` true)
|
||||
- [ ] Features prioritized by:
|
||||
- [ ] No test coverage (highest priority)
|
||||
- [ ] Complex business logic
|
||||
- [ ] External integrations (API, database, auth)
|
||||
- [ ] Critical user paths (login, checkout, etc.)
|
||||
|
||||
### Test Level Selection
|
||||
|
||||
- [ ] Test level selection framework applied (from `test-levels-framework.md`)
|
||||
- [ ] E2E tests identified: Critical user journeys, multi-system integration
|
||||
- [ ] API tests identified: Business logic, service contracts, data transformations
|
||||
- [ ] Component tests identified: UI behavior, interactions, state management
|
||||
- [ ] Unit tests identified: Pure logic, edge cases, error handling
|
||||
|
||||
### Duplicate Coverage Avoidance
|
||||
|
||||
- [ ] Same behavior NOT tested at multiple levels unnecessarily
|
||||
- [ ] E2E used for critical happy path only
|
||||
- [ ] API tests used for business logic variations
|
||||
- [ ] Component tests used for UI interaction edge cases
|
||||
- [ ] Unit tests used for pure logic edge cases
|
||||
|
||||
### Priority Assignment
|
||||
|
||||
- [ ] Test priorities assigned using `test-priorities.md` framework
|
||||
- [ ] P0 tests: Critical paths, security-critical, data integrity
|
||||
- [ ] P1 tests: Important features, integration points, error handling
|
||||
- [ ] P2 tests: Edge cases, less-critical variations, performance
|
||||
- [ ] P3 tests: Nice-to-have, rarely-used features, exploratory
|
||||
- [ ] Priority variables respected:
|
||||
- [ ] `{include_p0}` = true (always include)
|
||||
- [ ] `{include_p1}` = true (high priority)
|
||||
- [ ] `{include_p2}` = true (medium priority)
|
||||
- [ ] `{include_p3}` = false (low priority, skip by default)
|
||||
|
||||
### Coverage Plan Created
|
||||
|
||||
- [ ] Test coverage plan documented
|
||||
- [ ] What will be tested at each level listed
|
||||
- [ ] Priorities assigned to each test
|
||||
- [ ] Coverage strategy clear (critical-paths, comprehensive, or selective)
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Test Infrastructure Generated
|
||||
|
||||
### Fixture Architecture
|
||||
|
||||
- [ ] Existing fixtures checked in `tests/support/fixtures/`
|
||||
- [ ] Fixture architecture created/enhanced (if `{generate_fixtures}` true)
|
||||
- [ ] All fixtures use Playwright's `test.extend()` pattern
|
||||
- [ ] All fixtures have auto-cleanup in teardown
|
||||
- [ ] Common fixtures created/enhanced:
|
||||
- [ ] authenticatedUser (with auto-delete)
|
||||
- [ ] apiRequest (authenticated client)
|
||||
- [ ] mockNetwork (external service mocking)
|
||||
- [ ] testDatabase (with auto-cleanup)
|
||||
|
||||
### Data Factories
|
||||
|
||||
- [ ] Existing factories checked in `tests/support/factories/`
|
||||
- [ ] Factory architecture created/enhanced (if `{generate_factories}` true)
|
||||
- [ ] All factories use `@faker-js/faker` for random data (no hardcoded values)
|
||||
- [ ] All factories support overrides for specific scenarios
|
||||
- [ ] Common factories created/enhanced:
|
||||
- [ ] User factory (email, password, name, role)
|
||||
- [ ] Product factory (name, price, SKU)
|
||||
- [ ] Order factory (items, total, status)
|
||||
- [ ] Cleanup helpers provided (e.g., deleteUser(), deleteProduct())
|
||||
|
||||
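As a reference shape for these checks, a user factory might be sketched as follows (the field names and types are assumptions, not requirements):

```typescript
// tests/support/factories/user.factory.ts (illustrative sketch)
import { faker } from '@faker-js/faker';

export interface User {
  email: string;
  password: string;
  name: string;
  role: 'admin' | 'member';
}

// Random data by default; overrides pin only the values a scenario cares about
export function buildUser(overrides: Partial<User> = {}): User {
  return {
    email: faker.internet.email(),
    password: faker.internet.password({ length: 12 }),
    name: faker.person.fullName(),
    role: 'member',
    ...overrides,
  };
}
```

Usage such as `buildUser({ role: 'admin' })` keeps every other field random, which is what the no-hardcoded-values checks above look for.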
### Helper Utilities
|
||||
|
||||
- [ ] Existing helpers checked in `tests/support/helpers/` (if `{update_helpers}` true)
|
||||
- [ ] Common utilities created/enhanced:
|
||||
- [ ] waitFor (polling for complex conditions)
|
||||
- [ ] retry (retry helper for flaky operations)
|
||||
- [ ] testData (test data generation)
|
||||
- [ ] assertions (custom assertion helpers)
|
||||
|
||||
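For orientation, a `waitFor` polling helper of the kind listed above could look like this sketch:

```typescript
// tests/support/helpers/wait-for.ts (illustrative sketch)
export async function waitFor<T>(
  condition: () => Promise<T | undefined>,
  { timeoutMs = 10_000, intervalMs = 250 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // Poll the condition until it yields a value or the deadline passes
    const result = await condition();
    if (result !== undefined) return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`waitFor: condition not met within ${timeoutMs}ms`);
}
```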
---
|
||||
|
||||
## Step 4: Test Files Generated
|
||||
|
||||
### Test File Structure
|
||||
|
||||
- [ ] Test files organized correctly:
|
||||
- [ ] `tests/e2e/` for E2E tests
|
||||
- [ ] `tests/api/` for API tests
|
||||
- [ ] `tests/component/` for component tests
|
||||
- [ ] `tests/unit/` for unit tests
|
||||
- [ ] `tests/support/` for fixtures/factories/helpers
|
||||
|
||||
### E2E Tests (If Applicable)
|
||||
|
||||
- [ ] E2E test files created in `tests/e2e/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags ([P0], [P1], [P2], [P3]) in test name
|
||||
- [ ] All tests use data-testid selectors (not CSS classes)
|
||||
- [ ] One assertion per test (atomic design)
|
||||
- [ ] No hard waits or sleeps (explicit waits only)
|
||||
- [ ] Network-first pattern applied (route interception BEFORE navigation)
|
||||
- [ ] Clear Given-When-Then comments in test code
|
||||
|
||||
### API Tests (If Applicable)
|
||||
|
||||
- [ ] API test files created in `tests/api/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] API contracts validated (request/response structure)
|
||||
- [ ] HTTP status codes verified
|
||||
- [ ] Response body validation includes required fields
|
||||
- [ ] Error cases tested (400, 401, 403, 404, 500)
|
||||
- [ ] JWT token format validated (if auth tests)
|
||||
|
||||
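An API test satisfying these checks might be sketched as follows (the endpoint and response shape are assumptions):

```typescript
import { test, expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

test('[P1] POST /auth/login rejects invalid credentials with 401', async ({ request }) => {
  // Given: random credentials that are not registered
  const payload = { email: faker.internet.email(), password: faker.internet.password() };

  // When: the login endpoint is called
  const response = await request.post('/auth/login', { data: payload });

  // Then: contract is honoured, both status code and error body
  expect(response.status()).toBe(401);
  const body = await response.json();
  expect(body).toHaveProperty('error');
});
```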
### Component Tests (If Applicable)
|
||||
|
||||
- [ ] Component test files created in `tests/component/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] Component mounting works correctly
|
||||
- [ ] Interaction testing covers user actions (click, hover, keyboard)
|
||||
- [ ] State management validated
|
||||
- [ ] Props and events tested
|
||||
|
||||
### Unit Tests (If Applicable)
|
||||
|
||||
- [ ] Unit test files created in `tests/unit/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] Pure logic tested (no dependencies)
|
||||
- [ ] Edge cases covered
|
||||
- [ ] Error handling tested
|
||||
|
||||
### Quality Standards Enforced
|
||||
|
||||
- [ ] All tests use Given-When-Then format with clear comments
|
||||
- [ ] All tests have descriptive names with priority tags
|
||||
- [ ] No duplicate tests (same behavior tested multiple times)
|
||||
- [ ] No flaky patterns (race conditions, timing issues)
|
||||
- [ ] No test interdependencies (tests can run in any order)
|
||||
- [ ] Tests are deterministic (same input always produces same result)
|
||||
- [ ] All tests use data-testid selectors (E2E tests)
|
||||
- [ ] No hard waits: `await page.waitForTimeout()` (forbidden)
|
||||
- [ ] No conditional flow: `if (await element.isVisible())` (forbidden)
|
||||
- [ ] No try-catch for test logic (only for cleanup)
|
||||
- [ ] No hardcoded test data (use factories with faker)
|
||||
- [ ] No page object classes (tests are direct and simple)
|
||||
- [ ] No shared state between tests
|
||||
|
||||
### Network-First Pattern Applied
|
||||
|
||||
- [ ] Route interception set up BEFORE navigation (E2E tests with network requests)
|
||||
- [ ] `page.route()` called before `page.goto()` to prevent race conditions
|
||||
- [ ] Network-first pattern verified in all E2E tests that make API calls
|
||||
|
||||
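Sketch of the network-first pattern these checks describe (the route pattern, payload, and selector are illustrative):

```typescript
import { test, expect } from '@playwright/test';

test('[P1] renders the user list from the API', async ({ page }) => {
  // Given: the route is intercepted BEFORE navigation (network-first)
  await page.route('**/api/users', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([{ id: 1, name: 'Test User' }]),
    }),
  );

  // When: the page is opened, so even the first request is covered
  await page.goto('/users');

  // Then: the mocked data is rendered
  await expect(page.getByTestId('user-list')).toBeVisible();
});
```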
---
|
||||
|
||||
## Step 5: Test Validation and Healing (NEW - Phase 2.5)
|
||||
|
||||
### Healing Configuration
|
||||
|
||||
- [ ] Healing configuration checked:
|
||||
- [ ] `{auto_validate}` setting noted (default: true)
|
||||
- [ ] `{auto_heal_failures}` setting noted (default: false)
|
||||
- [ ] `{max_healing_iterations}` setting noted (default: 3)
|
||||
- [ ] `{use_mcp_healing}` setting noted (default: true)
|
||||
|
||||
### Healing Knowledge Fragments Loaded (If Healing Enabled)
|
||||
|
||||
- [ ] `test-healing-patterns.md` loaded (common failure patterns and fixes)
|
||||
- [ ] `selector-resilience.md` loaded (selector refactoring guide)
|
||||
- [ ] `timing-debugging.md` loaded (race condition fixes)
|
||||
|
||||
### Test Execution and Validation
|
||||
|
||||
- [ ] Generated tests executed (if `{auto_validate}` true)
|
||||
- [ ] Test results captured:
|
||||
- [ ] Total tests run
|
||||
- [ ] Passing tests count
|
||||
- [ ] Failing tests count
|
||||
- [ ] Error messages and stack traces captured
|
||||
|
||||
### Healing Loop (If Enabled and Tests Failed)
|
||||
|
||||
- [ ] Healing loop entered (if `{auto_heal_failures}` true AND tests failed)
|
||||
- [ ] For each failing test:
|
||||
- [ ] Failure pattern identified (selector, timing, data, network, hard wait)
|
||||
- [ ] Appropriate healing strategy applied:
|
||||
- [ ] Stale selector → Replaced with data-testid or ARIA role
|
||||
- [ ] Race condition → Added network-first interception or state waits
|
||||
- [ ] Dynamic data → Replaced hardcoded values with regex/dynamic generation
|
||||
- [ ] Network error → Added route mocking
|
||||
- [ ] Hard wait → Replaced with event-based wait
|
||||
- [ ] Healed test re-run to validate fix
|
||||
- [ ] Iteration count tracked (max 3 attempts)
|
||||
|
||||
### Unfixable Tests Handling
|
||||
|
||||
- [ ] Tests that couldn't be healed after 3 iterations marked with `test.fixme()` (if `{mark_unhealable_as_fixme}` true)
|
||||
- [ ] Detailed comment added to test.fixme() tests:
|
||||
- [ ] What failure occurred
|
||||
- [ ] What healing was attempted (3 iterations)
|
||||
- [ ] Why healing failed
|
||||
- [ ] Manual investigation steps needed
|
||||
- [ ] Original test logic preserved in comments
|
||||
|
||||
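A sketch of how an unhealable test could be marked, with the failure details below purely illustrative:

```typescript
import { test } from '@playwright/test';

// Could not be healed after 3 iterations; left for manual investigation
test.fixme('[P2] exports the monthly report as CSV', async ({ page }) => {
  // Failure: the download event never fires in CI (passes locally)
  // Healing attempted: event-based wait, route mocking, extended timeout (3 iterations)
  // Why it failed: the export relies on a third-party service unreachable from CI
  // Manual follow-up: stub the export service or cover this path with an API test
  // Original test logic preserved below
  await page.goto('/reports');
  await page.getByTestId('export-csv').click();
});
```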
### Healing Report Generated
|
||||
|
||||
- [ ] Healing report generated (if healing attempted)
|
||||
- [ ] Report includes:
|
||||
- [ ] Auto-heal enabled status
|
||||
- [ ] Healing mode (MCP-assisted or Pattern-based)
|
||||
- [ ] Iterations allowed (max_healing_iterations)
|
||||
- [ ] Validation results (total, passing, failing)
|
||||
- [ ] Successfully healed tests (count, file:line, fix applied)
|
||||
- [ ] Unable to heal tests (count, file:line, reason)
|
||||
- [ ] Healing patterns applied (selector fixes, timing fixes, data fixes)
|
||||
- [ ] Knowledge base references used
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Documentation and Scripts Updated
|
||||
|
||||
### Test README Updated
|
||||
|
||||
- [ ] `tests/README.md` created or updated (if `{update_readme}` true)
|
||||
- [ ] Test suite structure overview included
|
||||
- [ ] Test execution instructions provided (all, specific files, by priority)
|
||||
- [ ] Fixture usage examples provided
|
||||
- [ ] Factory usage examples provided
|
||||
- [ ] Priority tagging convention explained ([P0], [P1], [P2], [P3])
|
||||
- [ ] How to write new tests documented
|
||||
- [ ] Common patterns documented
|
||||
- [ ] Anti-patterns documented (what to avoid)
|
||||
|
||||
### package.json Scripts Updated
|
||||
|
||||
- [ ] package.json scripts added/updated (if `{update_package_scripts}` true)
|
||||
- [ ] `test:e2e` script for all E2E tests
|
||||
- [ ] `test:e2e:p0` script for P0 tests only
|
||||
- [ ] `test:e2e:p1` script for P0 + P1 tests
|
||||
- [ ] `test:api` script for API tests
|
||||
- [ ] `test:component` script for component tests
|
||||
- [ ] `test:unit` script for unit tests (if applicable)
|
||||
|
||||
### Test Suite Executed
|
||||
|
||||
- [ ] Test suite run locally (if `{run_tests_after_generation}` true)
|
||||
- [ ] Test results captured (passing/failing counts)
|
||||
- [ ] No flaky patterns detected (tests are deterministic)
|
||||
- [ ] Setup requirements documented (if any)
|
||||
- [ ] Known issues documented (if any)
|
||||
|
||||
---
|
||||
|
||||
## Step 7: Automation Summary Generated
|
||||
|
||||
### Automation Summary Document
|
||||
|
||||
- [ ] Output file created at `{output_summary}`
|
||||
- [ ] Document includes execution mode (BMad-Integrated, Standalone, Auto-discover)
|
||||
- [ ] Feature analysis included (source files, coverage gaps) - Standalone mode
|
||||
- [ ] Tests created listed (E2E, API, Component, Unit) with counts and paths
|
||||
- [ ] Infrastructure created listed (fixtures, factories, helpers)
|
||||
- [ ] Test execution instructions provided
|
||||
- [ ] Coverage analysis included:
|
||||
- [ ] Total test count
|
||||
- [ ] Priority breakdown (P0, P1, P2, P3 counts)
|
||||
- [ ] Test level breakdown (E2E, API, Component, Unit counts)
|
||||
- [ ] Coverage percentage (if calculated)
|
||||
- [ ] Coverage status (acceptance criteria covered, gaps identified)
|
||||
- [ ] Definition of Done checklist included
|
||||
- [ ] Next steps provided
|
||||
- [ ] Recommendations included (if Standalone mode)
|
||||
|
||||
### Summary Provided to User
|
||||
|
||||
- [ ] Concise summary output provided
|
||||
- [ ] Total tests created across test levels
|
||||
- [ ] Priority breakdown (P0, P1, P2, P3 counts)
|
||||
- [ ] Infrastructure counts (fixtures, factories, helpers)
|
||||
- [ ] Test execution command provided
|
||||
- [ ] Output file path provided
|
||||
- [ ] Next steps listed
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Test Design Quality
|
||||
|
||||
- [ ] Tests are readable (clear Given-When-Then structure)
|
||||
- [ ] Tests are maintainable (use factories/fixtures, not hardcoded data)
|
||||
- [ ] Tests are isolated (no shared state between tests)
|
||||
- [ ] Tests are deterministic (no race conditions or flaky patterns)
|
||||
- [ ] Tests are atomic (one assertion per test)
|
||||
- [ ] Tests are fast (no unnecessary waits or delays)
|
||||
- [ ] Tests are lean (files under {max_file_lines} lines)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Test level selection framework applied (from `test-levels-framework.md`)
|
||||
- [ ] Priority classification applied (from `test-priorities.md`)
|
||||
- [ ] Fixture architecture patterns applied (from `fixture-architecture.md`)
|
||||
- [ ] Data factory patterns applied (from `data-factories.md`)
|
||||
- [ ] Selective testing strategies considered (from `selective-testing.md`)
|
||||
- [ ] Flaky test detection patterns considered (from `ci-burn-in.md`)
|
||||
- [ ] Test quality principles applied (from `test-quality.md`)
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] All TypeScript types are correct and complete
|
||||
- [ ] No linting errors in generated test files
|
||||
- [ ] Consistent naming conventions followed
|
||||
- [ ] Imports are organized and correct
|
||||
- [ ] Code follows project style guide
|
||||
- [ ] No console.log or debug statements in test code
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With Framework Workflow
|
||||
|
||||
- [ ] Test framework configuration detected and used
|
||||
- [ ] Directory structure matches framework setup
|
||||
- [ ] Fixtures and helpers follow established patterns
|
||||
- [ ] Naming conventions consistent with framework standards
|
||||
|
||||
### With BMad Workflows (If Available - OPTIONAL)
|
||||
|
||||
**With Story Workflow:**
|
||||
|
||||
- [ ] Story ID correctly referenced in output (if story available)
|
||||
- [ ] Acceptance criteria from story reflected in tests (if story available)
|
||||
- [ ] Technical constraints from story considered (if story available)
|
||||
|
||||
**With test-design Workflow:**
|
||||
|
||||
- [ ] P0 scenarios from test-design prioritized (if test-design available)
|
||||
- [ ] Risk assessment from test-design considered (if test-design available)
|
||||
- [ ] Coverage strategy aligned with test-design (if test-design available)
|
||||
|
||||
**With atdd Workflow:**
|
||||
|
||||
- [ ] Existing ATDD tests checked (if story had ATDD workflow run)
|
||||
- [ ] Expansion beyond ATDD planned (edge cases, negative paths)
|
||||
- [ ] No duplicate coverage with ATDD tests
|
||||
|
||||
### With CI Pipeline
|
||||
|
||||
- [ ] Tests can run in CI environment
|
||||
- [ ] Tests are parallelizable (no shared state)
|
||||
- [ ] Tests have appropriate timeouts
|
||||
- [ ] Tests clean up their data (no CI environment pollution)
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
All of the following must be true before marking this workflow as complete:
|
||||
|
||||
- [ ] **Execution mode determined** (BMad-Integrated, Standalone, or Auto-discover)
|
||||
- [ ] **Framework configuration loaded** and validated
|
||||
- [ ] **Coverage analysis completed** (gaps identified if analyze_coverage true)
|
||||
- [ ] **Automation targets identified** (what needs testing)
|
||||
- [ ] **Test levels selected** appropriately (E2E, API, Component, Unit)
|
||||
- [ ] **Duplicate coverage avoided** (same behavior not tested at multiple levels)
|
||||
- [ ] **Test priorities assigned** (P0, P1, P2, P3)
|
||||
- [ ] **Fixture architecture created/enhanced** with auto-cleanup
|
||||
- [ ] **Data factories created/enhanced** using faker (no hardcoded data)
|
||||
- [ ] **Helper utilities created/enhanced** (if needed)
|
||||
- [ ] **Test files generated** at appropriate levels (E2E, API, Component, Unit)
|
||||
- [ ] **Given-When-Then format used** consistently across all tests
|
||||
- [ ] **Priority tags added** to all test names ([P0], [P1], [P2], [P3])
|
||||
- [ ] **data-testid selectors used** in E2E tests (not CSS classes)
|
||||
- [ ] **Network-first pattern applied** (route interception before navigation)
|
||||
- [ ] **Quality standards enforced** (no hard waits, no flaky patterns, self-cleaning, deterministic)
|
||||
- [ ] **Test README updated** with execution instructions and patterns
|
||||
- [ ] **package.json scripts updated** with test execution commands
|
||||
- [ ] **Test suite run locally** (if run_tests_after_generation true)
|
||||
- [ ] **Tests validated** (if auto_validate enabled)
|
||||
- [ ] **Failures healed** (if auto_heal_failures enabled and tests failed)
|
||||
- [ ] **Healing report generated** (if healing attempted)
|
||||
- [ ] **Unfixable tests marked** with test.fixme() and detailed comments (if any)
|
||||
- [ ] **Automation summary created** and saved to correct location
|
||||
- [ ] **Output file formatted correctly**
|
||||
- [ ] **Knowledge base references applied** and documented (including healing fragments if used)
|
||||
- [ ] **No test quality issues** (flaky patterns, race conditions, hardcoded data, page objects)
|
||||
|
||||
---
|
||||
|
||||
## Common Issues and Resolutions
|
||||
|
||||
### Issue: BMad artifacts not found
|
||||
|
||||
**Problem:** Story, tech-spec, or PRD files not found when variables are set.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- **automate does NOT require BMad artifacts** - they are OPTIONAL enhancements
|
||||
- If files not found, switch to Standalone Mode automatically
|
||||
- Analyze source code directly without BMad context
|
||||
- Continue workflow without halting
|
||||
|
||||
### Issue: Framework configuration not found
|
||||
|
||||
**Problem:** No playwright.config.ts or cypress.config.ts found.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- **HALT workflow** - framework is required
|
||||
- Message: "Framework scaffolding required. Run `bmad tea *framework` first."
|
||||
- User must run framework workflow before automate
|
||||
|
||||
### Issue: No automation targets identified
|
||||
|
||||
**Problem:** Neither story, target_feature, nor target_files specified, and auto-discover finds nothing.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Check if source_dir variable is correct
|
||||
- Verify source code exists in project
|
||||
- Ask user to specify target_feature or target_files explicitly
|
||||
- Provide examples: `target_feature: "src/auth/"` or `target_files: "src/auth/login.ts,src/auth/session.ts"`
|
||||
|
||||
### Issue: Duplicate coverage detected
|
||||
|
||||
**Problem:** Same behavior tested at multiple levels (E2E + API + Component).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Review test level selection framework (test-levels-framework.md)
|
||||
- Use E2E for critical happy path ONLY
|
||||
- Use API for business logic variations
|
||||
- Use Component for UI edge cases
|
||||
- Remove redundant tests that duplicate coverage
|
||||
|
||||
### Issue: Tests have hardcoded data
|
||||
|
||||
**Problem:** Tests use hardcoded email addresses, passwords, or other data.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Replace all hardcoded data with factory function calls
|
||||
- Use faker for all random data generation
|
||||
- Update data-factories to support all required test scenarios
|
||||
- Example: `createUser({ email: faker.internet.email() })`
|
||||
|
||||
### Issue: Tests are flaky
|
||||
|
||||
**Problem:** Tests fail intermittently, pass on retry.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove all hard waits (`page.waitForTimeout()`)
|
||||
- Use explicit waits (`page.waitForSelector()`)
|
||||
- Apply network-first pattern (route interception before navigation)
|
||||
- Remove conditional flow (`if (await element.isVisible())`)
|
||||
- Ensure tests are deterministic (no race conditions)
|
||||
- Run burn-in loop (10 iterations) to detect flakiness
|
||||
|
||||
### Issue: Fixtures don't clean up data
|
||||
|
||||
**Problem:** Test data persists after test run, causing test pollution.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Ensure all fixtures have cleanup in teardown phase
|
||||
- Cleanup happens AFTER `await use(data)`
|
||||
- Call deletion/cleanup functions (deleteUser, deleteProduct, etc.)
|
||||
- Verify cleanup works by checking database/storage after test run
|
||||
|
||||
### Issue: Tests too slow
|
||||
|
||||
**Problem:** Tests take longer than 90 seconds (max_test_duration).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove unnecessary waits and delays
|
||||
- Use parallel execution where possible
|
||||
- Mock external services (don't make real API calls)
|
||||
- Use API tests instead of E2E for business logic
|
||||
- Optimize test data creation (use in-memory database, etc.)
|
||||
|
||||
---
|
||||
|
||||
## Notes for TEA Agent
|
||||
|
||||
- **automate is flexible:** Can work with or without BMad artifacts (story, tech-spec, PRD are OPTIONAL)
|
||||
- **Standalone mode is powerful:** Analyze any codebase and generate tests independently
|
||||
- **Auto-discover mode:** Scan codebase for features needing tests when no targets specified
|
||||
- **Framework is the ONLY hard requirement:** HALT if framework config missing, otherwise proceed
|
||||
- **Avoid duplicate coverage:** E2E for critical paths only, API/Component for variations
|
||||
- **Priority tagging enables selective execution:** P0 tests run on every commit, P1 on PR, P2 nightly
|
||||
- **Network-first pattern prevents race conditions:** Route interception BEFORE navigation
|
||||
- **No page objects:** Keep tests simple, direct, and maintainable
|
||||
- **Use knowledge base:** Load relevant fragments (test-levels, test-priorities, fixture-architecture, data-factories, healing patterns) for guidance
|
||||
- **Deterministic tests only:** No hard waits, no conditional flow, no flaky patterns allowed
|
||||
- **Optional healing:** auto_heal_failures disabled by default (opt-in for automatic test healing)
|
||||
- **Graceful degradation:** Healing works without Playwright MCP (pattern-based fallback)
|
||||
- **Unfixable tests handled:** Mark with test.fixme() and detailed comments (not silently broken)
|
||||
File diff suppressed because it is too large
@@ -1,25 +1,110 @@
|
||||
# Test Architect workflow: automate
|
||||
name: testarch-automate
|
||||
description: "Expand automation coverage after implementation."
|
||||
description: "Expand test automation coverage after implementation or analyze existing codebase to generate comprehensive test suite"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/automate"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: false
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Execution mode
|
||||
standalone_mode: true # Can work without BMad artifacts (true) or integrate with BMad (false)
|
||||
|
||||
# Target specification (flexible - can be story, feature, or directory)
|
||||
story_file: "" # Path to story markdown (optional - only if BMad workflow)
|
||||
target_feature: "" # Feature name or directory to analyze (e.g., "user-authentication" or "src/auth/")
|
||||
target_files: "" # Specific files to analyze (comma-separated paths)
|
||||
|
||||
# Discovery and analysis
|
||||
test_dir: "{project-root}/tests"
|
||||
source_dir: "{project-root}/src"
|
||||
auto_discover_features: true # Automatically find features needing tests
|
||||
analyze_coverage: true # Check existing test coverage gaps
|
||||
|
||||
# Coverage strategy
|
||||
coverage_target: "critical-paths" # critical-paths, comprehensive, selective
|
||||
test_levels: "e2e,api,component,unit" # Which levels to generate (comma-separated)
|
||||
avoid_duplicate_coverage: true # Don't test same behavior at multiple levels
|
||||
|
||||
# Test priorities (from test-priorities.md knowledge fragment)
|
||||
include_p0: true # Critical paths (every commit)
|
||||
include_p1: true # High priority (PR to main)
|
||||
include_p2: true # Medium priority (nightly)
|
||||
include_p3: false # Low priority (on-demand)
|
||||
|
||||
# Test design principles
|
||||
use_given_when_then: true # BDD-style test structure
|
||||
one_assertion_per_test: true # Atomic test design
|
||||
network_first: true # Route interception before navigation
|
||||
deterministic_waits: true # No hard waits or sleeps
|
||||
|
||||
# Infrastructure generation
|
||||
generate_fixtures: true # Create/enhance fixture architecture
|
||||
generate_factories: true # Create/enhance data factories
|
||||
update_helpers: true # Add utility functions
|
||||
|
||||
# Integration with BMad artifacts (when available)
|
||||
use_test_design: true # Load test-design.md if exists
|
||||
use_tech_spec: true # Load tech-spec.md if exists
|
||||
use_prd: true # Load PRD.md if exists
|
||||
|
||||
# Output configuration
|
||||
update_readme: true # Update test README with new specs
|
||||
update_package_scripts: true # Add test execution scripts
|
||||
output_summary: "{output_folder}/automation-summary.md"
|
||||
|
||||
# Quality gates
|
||||
max_test_duration: 90 # seconds (1.5 minutes per test)
|
||||
max_file_lines: 300 # lines (keep tests lean)
|
||||
require_self_cleaning: true # All tests must clean up data
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load test-levels, test-priorities, fixture-architecture, selective-testing, ci-burn-in
|
||||
run_tests_after_generation: true # Verify tests pass/fail as expected
|
||||
auto_validate: true # Always validate generated tests
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/automation-summary.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read source code, existing tests, BMad artifacts
|
||||
- write_file # Create test files, fixtures, factories, summaries
|
||||
- create_directory # Create test directories
|
||||
- list_files # Discover features and existing tests
|
||||
- search_repo # Find coverage gaps and patterns
|
||||
- glob # Find test files and source files
|
||||
|
||||
# Recommended inputs (optional - depends on mode)
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with acceptance criteria (optional - BMad mode only)"
|
||||
- tech_spec: "Technical specification (optional - BMad mode only)"
|
||||
- test_design: "Test design document with risk/priority (optional - BMad mode only)"
|
||||
- source_code: "Feature implementation to analyze (required for standalone mode)"
|
||||
- existing_tests: "Current test suite for gap analysis (always helpful)"
|
||||
- framework_config: "Test framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- automation
|
||||
- test-architect
|
||||
- regression
|
||||
- coverage
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
493
src/modules/bmm/workflows/testarch/ci/README.md
Normal file
@@ -0,0 +1,493 @@
|
||||
# CI/CD Pipeline Setup Workflow
|
||||
|
||||
Scaffolds a production-ready CI/CD quality pipeline with test execution, burn-in loops for flaky test detection, parallel sharding, and artifact collection. This workflow creates platform-specific CI configuration optimized for fast feedback (< 45 min total) and reliable test execution with 20× speedup over sequential runs.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *ci
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Test framework is configured and tests pass locally
|
||||
- Team is ready to enable continuous integration
|
||||
- Existing CI pipeline needs optimization or modernization
|
||||
- Burn-in loop is needed for flaky test detection
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Framework config** (playwright.config.ts, cypress.config.ts): Determines test commands and configuration
|
||||
- **package.json**: Dependencies and scripts for caching strategy
|
||||
- **.nvmrc**: Node version for CI (optional, defaults to Node 20 LTS)
|
||||
|
||||
**Optional Context Files:**
|
||||
|
||||
- **Existing CI config**: To update rather than create new
|
||||
- **.git/config**: For CI platform auto-detection
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `ci_platform`: Auto-detected (github-actions/gitlab-ci/circle-ci) or explicit
|
||||
- `test_framework`: Detected from framework config (playwright/cypress)
|
||||
- `parallel_jobs`: Number of parallel shards (default: 4)
|
||||
- `burn_in_enabled`: Enable burn-in loop (default: true)
|
||||
- `burn_in_iterations`: Burn-in iterations (default: 10)
|
||||
- `selective_testing_enabled`: Run only changed tests (default: true)
|
||||
- `artifact_retention_days`: Artifact storage duration (default: 30)
|
||||
- `cache_enabled`: Enable dependency caching (default: true)
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverables:**
|
||||
|
||||
1. **CI Configuration File**
|
||||
- `.github/workflows/test.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- Platform-specific optimizations and best practices
|
||||
|
||||
2. **Pipeline Stages**
|
||||
- **Lint**: Code quality checks (<2 min)
|
||||
- **Test**: Parallel execution with 4 shards (<10 min per shard)
|
||||
- **Burn-In**: Flaky test detection with 10 iterations (<30 min)
|
||||
- **Report**: Aggregate results and publish artifacts
|
||||
|
||||
3. **Helper Scripts**
|
||||
- `scripts/test-changed.sh`: Selective testing (run only affected tests)
|
||||
- `scripts/ci-local.sh`: Local CI mirror for debugging
|
||||
- `scripts/burn-in.sh`: Standalone burn-in execution
|
||||
|
||||
4. **Documentation**
|
||||
- `docs/ci.md`: Pipeline guide, debugging, secrets setup
|
||||
- `docs/ci-secrets-checklist.md`: Required secrets and configuration
|
||||
- Inline comments in CI configuration files
|
||||
|
||||
5. **Optimization Features**
|
||||
- Dependency caching (npm + browser binaries): 2-5 min savings
|
||||
- Parallel sharding: 75% time reduction
|
||||
- Retry logic: Handles transient failures (2 retries)
|
||||
- Failure-only artifacts: Cost-effective debugging
|
||||
|
||||
**Performance Targets:**
|
||||
|
||||
- Lint: <2 minutes
|
||||
- Test (per shard): <10 minutes
|
||||
- Burn-in: <30 minutes
|
||||
- **Total: <45 minutes** (20× faster than sequential)
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- ✅ Git repository initialized
|
||||
- ✅ Local tests pass before CI setup
|
||||
- ✅ Framework configuration exists
|
||||
- ✅ CI platform accessible
|
||||
|
||||
## Key Features
|
||||
|
||||
### Burn-In Loop for Flaky Test Detection
|
||||
|
||||
**Critical production pattern:**
|
||||
|
||||
```yaml
burn-in:
  runs-on: ubuntu-latest
  steps:
    - run: |
        for i in {1..10}; do
          echo "🔥 Burn-in iteration $i/10"
          npm run test:e2e || exit 1
        done
```
|
||||
|
||||
**Purpose**: Runs tests 10 times to catch non-deterministic failures before they reach main branch.
|
||||
|
||||
**When to run:**
|
||||
|
||||
- On PRs to main/develop
|
||||
- Weekly on cron schedule
|
||||
- After test infrastructure changes
|
||||
|
||||
**Failure threshold**: Even ONE failure → tests are flaky, must fix before merging.
|
||||
|
||||
### Parallel Sharding
|
||||
|
||||
**Splits tests across 4 jobs:**
|
||||
|
||||
```yaml
strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npm run test:e2e -- --shard=${{ matrix.shard }}/4
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 75% time reduction (40 min → 10 min per shard)
|
||||
- Faster feedback on PRs
|
||||
- Configurable shard count
|
||||
|
||||
### Smart Caching
|
||||
|
||||
**Node modules + browser binaries:**
|
||||
|
||||
```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 2-5 min savings per run
|
||||
- Consistent across builds
|
||||
- Automatic invalidation on dependency changes
|
||||
|
||||
### Selective Testing
|
||||
|
||||
**Run only tests affected by code changes:**
|
||||
|
||||
```bash
|
||||
# scripts/test-changed.sh
|
||||
CHANGED_FILES=$(git diff --name-only HEAD~1)
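# ...derive AFFECTED_TESTS from CHANGED_FILES here (mapping is project-specific and elided in this excerpt)...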
|
||||
npm run test:e2e -- --grep="$AFFECTED_TESTS"
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 50-80% time reduction for focused PRs
|
||||
- Faster feedback cycle
|
||||
- Full suite still runs on main branch
|
||||
|
||||
### Failure-Only Artifacts
|
||||
|
||||
**Upload debugging materials only on test failures:**
|
||||
|
||||
- Traces (Playwright): 5-10 MB per test
|
||||
- Screenshots: 100-500 KB each
|
||||
- Videos: 2-5 MB per test
|
||||
- HTML reports: 1-2 MB
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- Reduces storage costs by 90%
|
||||
- Maintains full debugging capability
|
||||
- 30-day retention default
|
||||
|
||||
### Local CI Mirror
|
||||
|
||||
**Debug CI failures locally:**
|
||||
|
||||
```bash
|
||||
./scripts/ci-local.sh
|
||||
# Runs: lint → test → burn-in (3 iterations)
|
||||
```
|
||||
|
||||
**Mirrors CI environment:**
|
||||
|
||||
- Same Node version
|
||||
- Same commands
|
||||
- Reduced burn-in (3 vs 10 for faster feedback)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
Automatically consults TEA knowledge base:
|
||||
|
||||
- `ci-burn-in.md` - Burn-in loop patterns and iterations
|
||||
- `selective-testing.md` - Changed test detection strategies
|
||||
- `visual-debugging.md` - Artifact collection best practices
|
||||
- `test-quality.md` - CI-specific quality criteria
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before ci:**
|
||||
|
||||
- **framework**: Sets up test infrastructure and configuration
|
||||
- **test-design** (optional): Plans test coverage strategy
|
||||
|
||||
**After ci:**
|
||||
|
||||
- **atdd**: Generate failing tests that run in CI
|
||||
- **automate**: Expand test coverage that CI executes
|
||||
- **trace (Phase 2)**: Use CI results for quality gate decisions
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **dev-story**: Tests run in CI after story implementation
|
||||
- **retrospective**: CI metrics inform process improvements
|
||||
|
||||
**Updates:**
|
||||
|
||||
- `bmm-workflow-status.md`: Adds CI setup to Quality & Testing Progress section
|
||||
|
||||
## Important Notes
|
||||
|
||||
### CI Platform Auto-Detection
|
||||
|
||||
**GitHub Actions** (default):
|
||||
|
||||
- Auto-selected if `github.com` in git remote
|
||||
- Free 2000 min/month for private repos
|
||||
- Unlimited for public repos
|
||||
- `.github/workflows/test.yml`
|
||||
|
||||
**GitLab CI**:
|
||||
|
||||
- Auto-selected if `gitlab.com` in git remote
|
||||
- Free 400 min/month
|
||||
- `.gitlab-ci.yml`
|
||||
|
||||
**Circle CI** / **Jenkins**:
|
||||
|
||||
- User must specify explicitly
|
||||
- Templates provided for both
|
||||
|
||||
### Burn-In Strategy
|
||||
|
||||
**Iterations:**
|
||||
|
||||
- **3**: Quick feedback (local development)
|
||||
- **10**: Standard (PR checks) ← recommended
|
||||
- **100**: High-confidence (release branches)
|
||||
|
||||
**When to run:**
|
||||
|
||||
- ✅ On PRs to main/develop
|
||||
- ✅ Weekly scheduled (cron)
|
||||
- ✅ After test infra changes
|
||||
- ❌ Not on every commit (too slow)
|
||||
|
||||
**Cost-benefit:**
|
||||
|
||||
- 30 minutes of CI time → Prevents hours of debugging flaky tests
|
||||
|
||||
### Artifact Collection Strategy
|
||||
|
||||
**Failure-only collection:**
|
||||
|
||||
- Saves 90% storage costs
|
||||
- Maintains debugging capability
|
||||
- Automatic cleanup after retention period
|
||||
|
||||
**What to collect:**
|
||||
|
||||
- Traces: Full execution context (Playwright)
|
||||
- Screenshots: Visual evidence
|
||||
- Videos: Interaction playback
|
||||
- HTML reports: Detailed results
|
||||
- Console logs: Error messages
|
||||
|
||||
**What NOT to collect:**
|
||||
|
||||
- Passing test artifacts (waste of space)
|
||||
- Large binaries
|
||||
- Sensitive data (use secrets instead)
|
||||
|
||||
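With Playwright, failure-only collection is typically expressed in the config rather than the CI file; a sketch:

```typescript
// playwright.config.ts (excerpt, illustrative)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'retain-on-failure', // full execution context, kept only for failed tests
    screenshot: 'only-on-failure', // visual evidence on failure
    video: 'retain-on-failure', // interaction playback on failure
  },
});
```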
### Selective Testing Trade-offs
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 50-80% time reduction for focused changes
|
||||
- Faster feedback loop
|
||||
- Lower CI costs
|
||||
|
||||
**Risks:**
|
||||
|
||||
- May miss integration issues
|
||||
- Relies on accurate change detection
|
||||
- False positives if detection is too aggressive
|
||||
|
||||
**Mitigation:**
|
||||
|
||||
- Always run full suite on merge to main
|
||||
- Use burn-in loop on main branch
|
||||
- Monitor for missed issues
|
||||
|
||||
### Parallelism Configuration
|
||||
|
||||
**4 shards** (default):
|
||||
|
||||
- Optimal for 40-80 test files
|
||||
- ~10 min per shard
|
||||
- Balances speed vs resource usage
|
||||
|
||||
**Adjust if:**
|
||||
|
||||
- Tests complete in <5 min → reduce shards
|
||||
- Tests take >15 min → increase shards
|
||||
- CI limits concurrent jobs → reduce shards
|
||||
|
||||
**Formula:**
|
||||
|
||||
```
|
||||
Total test time / Target shard time = Optimal shards
|
||||
Example: 40 min / 10 min = 4 shards
|
||||
```
|
||||
|
||||
### Retry Logic
|
||||
|
||||
**2 retries** (default):
|
||||
|
||||
- Handles transient network issues
|
||||
- Mitigates race conditions
|
||||
- Does NOT mask flaky tests (burn-in catches those)
|
||||
|
||||
**When retries trigger:**
|
||||
|
||||
- Network timeouts
|
||||
- Service unavailability
|
||||
- Resource constraints
|
||||
|
||||
**When retries DON'T help:**
|
||||
|
||||
- Assertion failures (logic errors)
|
||||
- Flaky tests (non-deterministic)
|
||||
- Configuration errors
|
||||
|
||||
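In a Playwright setup, the retry budget usually lives in the framework config rather than the pipeline; a sketch:

```typescript
// playwright.config.ts (excerpt, illustrative)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // 2 retries in CI to absorb transient infrastructure failures,
  // 0 locally so flakiness stays visible during development
  retries: process.env.CI ? 2 : 0,
});
```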
### Notification Setup (Optional)
|
||||
|
||||
**Supported channels:**
|
||||
|
||||
- Slack: Webhook integration
|
||||
- Email: SMTP configuration
|
||||
- Discord: Webhook integration
|
||||
|
||||
**Configuration:**
|
||||
|
||||
```yaml
|
||||
notify_on_failure: true
|
||||
notification_channels: 'slack'
|
||||
# Requires SLACK_WEBHOOK secret in CI settings
|
||||
```
|
||||
|
||||
**Best practice:** Enable for main/develop branches only, not PRs.
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
After workflow completion, verify:
|
||||
|
||||
- [ ] CI configuration file created and syntactically valid
|
||||
- [ ] Burn-in loop configured (10 iterations)
|
||||
- [ ] Parallel sharding enabled (4 jobs)
|
||||
- [ ] Caching configured (dependencies + browsers)
|
||||
- [ ] Artifact collection on failure only
|
||||
- [ ] Helper scripts created and executable
|
||||
- [ ] Documentation complete (ci.md, secrets checklist)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
- [ ] First CI run triggered and passes
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
## Example Execution
|
||||
|
||||
**Scenario 1: New GitHub Actions setup**
|
||||
|
||||
```bash
|
||||
bmad tea *ci
|
||||
|
||||
# TEA detects:
|
||||
# - GitHub repository (github.com in git remote)
|
||||
# - Playwright framework
|
||||
# - Node 20 from .nvmrc
|
||||
# - 60 test files
|
||||
|
||||
# TEA scaffolds:
|
||||
# - .github/workflows/test.yml
|
||||
# - 4-shard parallel execution
|
||||
# - Burn-in loop (10 iterations)
|
||||
# - Dependency + browser caching
|
||||
# - Failure artifacts (traces, screenshots)
|
||||
# - Helper scripts
|
||||
# - Documentation
|
||||
|
||||
# Result:
|
||||
# Total CI time: 42 minutes (was 8 hours sequential)
|
||||
# - Lint: 1.5 min
|
||||
# - Test (4 shards): 9 min each
|
||||
# - Burn-in: 28 min
|
||||
```
|
||||
|
||||
**Scenario 2: Update existing GitLab CI**
|
||||
|
||||
```bash
|
||||
bmad tea *ci
|
||||
|
||||
# TEA detects:
|
||||
# - Existing .gitlab-ci.yml
|
||||
# - Cypress framework
|
||||
# - No caching configured
|
||||
|
||||
# TEA asks: "Update existing CI or create new?"
|
||||
# User: "Update"
|
||||
|
||||
# TEA enhances:
|
||||
# - Adds burn-in job
|
||||
# - Configures caching (cache: paths)
|
||||
# - Adds parallel: 4
|
||||
# - Updates artifact collection
|
||||
# - Documents secrets needed
|
||||
|
||||
# Result:
|
||||
# CI time reduced from 45 min → 12 min
|
||||
```
|
||||
|
||||
**Scenario 3: Standalone burn-in setup**
|
||||
|
||||
```bash
|
||||
# User wants only burn-in, no full CI
|
||||
bmad tea *ci
|
||||
# Set burn_in_enabled: true, skip other stages
|
||||
|
||||
# TEA creates:
|
||||
# - Minimal workflow with burn-in only
|
||||
# - scripts/burn-in.sh for local testing
|
||||
# - Documentation for running burn-in
|
||||
|
||||
# Use case:
|
||||
# - Validate test stability before full CI setup
|
||||
# - Debug intermittent failures
|
||||
# - Confidence check before release
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue: "Git repository not found"**
|
||||
|
||||
- **Cause**: No .git/ directory
|
||||
- **Solution**: Run `git init` and `git remote add origin <url>`
|
||||
|
||||
**Issue: "Tests fail locally but should set up CI anyway"**
|
||||
|
||||
- **Cause**: Workflow halts if local tests fail
|
||||
- **Solution**: Fix tests first, or temporarily skip preflight (not recommended)
|
||||
|
||||
**Issue: "CI takes longer than 10 min per shard"**
|
||||
|
||||
- **Cause**: Too many tests per shard
|
||||
- **Solution**: Increase shard count (e.g., 4 → 8)
|
||||
|
||||
**Issue: "Burn-in passes locally but fails in CI"**
|
||||
|
||||
- **Cause**: Environment differences (timing, resources)
|
||||
- **Solution**: Use `scripts/ci-local.sh` to mirror CI environment
|
||||
|
||||
**Issue: "Caching not working"**
|
||||
|
||||
- **Cause**: Cache key mismatch or cache limit exceeded
|
||||
- **Solution**: Check cache key formula, verify platform limits
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **framework**: Set up test infrastructure → [framework/README.md](../framework/README.md)
|
||||
- **atdd**: Generate acceptance tests → [atdd/README.md](../atdd/README.md)
|
||||
- **automate**: Expand test coverage → [automate/README.md](../automate/README.md)
|
||||
- **trace**: Traceability and quality gate decisions → [trace/README.md](../trace/README.md)
|
||||
|
||||
## Version History
|
||||
|
||||
- **v4.0 (BMad v6)**: Pure markdown instructions, enhanced workflow.yaml, burn-in loop integration
|
||||
- **v3.x**: XML format instructions, basic CI setup
|
||||
- **v2.x**: Legacy task-based approach
|
||||
246
src/modules/bmm/workflows/testarch/ci/checklist.md
Normal file
@@ -0,0 +1,246 @@
|
||||
# CI/CD Pipeline Setup - Validation Checklist
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- [ ] Git repository initialized (`.git/` exists)
|
||||
- [ ] Git remote configured (`git remote -v` shows origin)
|
||||
- [ ] Test framework configured (`playwright.config.*` or `cypress.config.*`)
|
||||
- [ ] Local tests pass (`npm run test:e2e` succeeds)
|
||||
- [ ] Team agrees on CI platform
|
||||
- [ ] Access to CI platform settings (if updating)
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Preflight Checks
|
||||
|
||||
- [ ] Git repository validated
|
||||
- [ ] Framework configuration detected
|
||||
- [ ] Local test execution successful
|
||||
- [ ] CI platform detected or selected
|
||||
- [ ] Node version identified (.nvmrc or default)
|
||||
- [ ] No blocking issues found
|
||||
|
||||
### Step 2: CI Pipeline Configuration
|
||||
|
||||
- [ ] CI configuration file created (`.github/workflows/test.yml` or `.gitlab-ci.yml`)
|
||||
- [ ] File is syntactically valid (no YAML errors)
|
||||
- [ ] Correct framework commands configured
|
||||
- [ ] Node version matches project
|
||||
- [ ] Test directory paths correct
|
||||
|
||||
### Step 3: Parallel Sharding
|
||||
|
||||
- [ ] Matrix strategy configured (4 shards default)
|
||||
- [ ] Shard syntax correct for framework
|
||||
- [ ] fail-fast set to false
|
||||
- [ ] Shard count appropriate for test suite size
|
||||
|
||||
### Step 4: Burn-In Loop
|
||||
|
||||
- [ ] Burn-in job created
|
||||
- [ ] 10 iterations configured
|
||||
- [ ] Proper exit on failure (`|| exit 1`)
|
||||
- [ ] Runs on appropriate triggers (PR, cron)
|
||||
- [ ] Failure artifacts uploaded
|
||||
|
||||
### Step 5: Caching Configuration
|
||||
|
||||
- [ ] Dependency cache configured (npm/yarn)
|
||||
- [ ] Cache key uses lockfile hash
|
||||
- [ ] Browser cache configured (Playwright/Cypress)
|
||||
- [ ] Restore-keys defined for fallback
|
||||
- [ ] Cache paths correct for platform
|
||||
|
||||
### Step 6: Artifact Collection
|
||||
|
||||
- [ ] Artifacts upload on failure only
|
||||
- [ ] Correct artifact paths (test-results/, traces/, etc.)
|
||||
- [ ] Retention days set (30 default)
|
||||
- [ ] Artifact names unique per shard
|
||||
- [ ] No sensitive data in artifacts
|
||||
|
||||
### Step 7: Retry Logic
|
||||
|
||||
- [ ] Retry action/strategy configured
|
||||
- [ ] Max attempts: 2-3
|
||||
- [ ] Timeout appropriate (30 min)
|
||||
- [ ] Retry only on transient errors
|
||||
|
||||
### Step 8: Helper Scripts
|
||||
|
||||
- [ ] `scripts/test-changed.sh` created
|
||||
- [ ] `scripts/ci-local.sh` created
|
||||
- [ ] `scripts/burn-in.sh` created (optional)
|
||||
- [ ] Scripts are executable (`chmod +x`)
|
||||
- [ ] Scripts use correct test commands
|
||||
- [ ] Shebang present (`#!/bin/bash`)
|
||||
|
||||
### Step 9: Documentation
|
||||
|
||||
- [ ] `docs/ci.md` created with pipeline guide
|
||||
- [ ] `docs/ci-secrets-checklist.md` created
|
||||
- [ ] Required secrets documented
|
||||
- [ ] Setup instructions clear
|
||||
- [ ] Troubleshooting section included
|
||||
- [ ] Badge URLs provided (optional)
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Configuration Validation
|
||||
|
||||
- [ ] CI file loads without errors
|
||||
- [ ] All paths resolve correctly
|
||||
- [ ] No hardcoded values (use env vars)
|
||||
- [ ] Triggers configured (push, pull_request, schedule)
|
||||
- [ ] Platform-specific syntax correct
|
||||
|
||||
### Execution Validation
|
||||
|
||||
- [ ] First CI run triggered (push to remote)
|
||||
- [ ] Pipeline starts without errors
|
||||
- [ ] All jobs appear in CI dashboard
|
||||
- [ ] Caching works (check logs for cache hit)
|
||||
- [ ] Tests execute in parallel
|
||||
- [ ] Artifacts collected on failure
|
||||
|
||||
### Performance Validation
|
||||
|
||||
- [ ] Lint stage: <2 minutes
|
||||
- [ ] Test stage (per shard): <10 minutes
|
||||
- [ ] Burn-in stage: <30 minutes
|
||||
- [ ] Total pipeline: <45 minutes
|
||||
- [ ] Cache reduces install time by 2-5 minutes
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Best Practices Compliance
|
||||
|
||||
- [ ] Burn-in loop follows production patterns
|
||||
- [ ] Parallel sharding configured optimally
|
||||
- [ ] Failure-only artifact collection
|
||||
- [ ] Selective testing enabled (optional)
|
||||
- [ ] Retry logic handles transient failures only
|
||||
- [ ] No secrets in configuration files
|
||||
|
||||
### Knowledge Base Alignment
|
||||
|
||||
- [ ] Burn-in pattern matches `ci-burn-in.md`
|
||||
- [ ] Selective testing matches `selective-testing.md`
|
||||
- [ ] Artifact collection matches `visual-debugging.md`
|
||||
- [ ] Test quality matches `test-quality.md`
|
||||
|
||||
### Security Checks
|
||||
|
||||
- [ ] No credentials in CI configuration
|
||||
- [ ] Secrets use platform secret management
|
||||
- [ ] Environment variables for sensitive data
|
||||
- [ ] Artifact retention appropriate (not too long)
|
||||
- [ ] No debug output exposing secrets
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] `bmm-workflow-status.md` exists
|
||||
- [ ] CI setup logged in Quality & Testing Progress section
|
||||
- [ ] Status updated with completion timestamp
|
||||
- [ ] Platform and configuration noted
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Relevant knowledge fragments loaded
|
||||
- [ ] Patterns applied from knowledge base
|
||||
- [ ] Documentation references knowledge base
|
||||
- [ ] Knowledge base references in README
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] `framework` workflow completed first
|
||||
- [ ] Can proceed to `atdd` workflow after CI setup
|
||||
- [ ] Can proceed to `automate` workflow
|
||||
- [ ] CI integrates with `gate` workflow
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All must be true:**
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All process steps completed
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] First CI run successful
|
||||
- [ ] Performance targets met
|
||||
- [ ] Documentation complete
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Commit CI configuration
|
||||
2. [ ] Push to remote repository
|
||||
3. [ ] Configure required secrets in CI platform
|
||||
4. [ ] Open PR to trigger first CI run
|
||||
5. [ ] Monitor and verify pipeline execution
|
||||
6. [ ] Adjust parallelism if needed (based on actual run times)
|
||||
7. [ ] Set up notifications (optional)
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `atdd` workflow for test generation
|
||||
2. [ ] Run `automate` workflow for coverage expansion
|
||||
3. [ ] Run `gate` workflow for quality gates
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails:
|
||||
|
||||
1. [ ] Delete CI configuration file
|
||||
2. [ ] Remove helper scripts directory
|
||||
3. [ ] Remove documentation (docs/ci.md, etc.)
|
||||
4. [ ] Clear CI platform secrets (if added)
|
||||
5. [ ] Review error logs
|
||||
6. [ ] Fix issues and retry workflow
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: CI file syntax errors
|
||||
|
||||
- **Solution**: Validate YAML syntax online or with linter
|
||||
|
||||
**Issue**: Tests fail in CI but pass locally
|
||||
|
||||
- **Solution**: Use `scripts/ci-local.sh` to mirror CI environment
|
||||
|
||||
**Issue**: Caching not working
|
||||
|
||||
- **Solution**: Check cache key formula, verify paths
|
||||
|
||||
**Issue**: Burn-in too slow
|
||||
|
||||
- **Solution**: Reduce iterations or run on cron only
|
||||
|
||||
### Platform-Specific
|
||||
|
||||
**GitHub Actions:**
|
||||
|
||||
- Secrets: Repository Settings → Secrets and variables → Actions
|
||||
- Runners: Ubuntu latest recommended
|
||||
- Concurrency limits: 20 jobs for free tier
|
||||
|
||||
**GitLab CI:**
|
||||
|
||||
- Variables: Project Settings → CI/CD → Variables
|
||||
- Runners: Shared or project-specific
|
||||
- Pipeline quota: 400 minutes/month free tier
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items validated.
|
||||
|
||||
**Completed by:** ______________

**Date:** ______________

**Platform:** ______________ (GitHub Actions / GitLab CI)

**Notes:** ______________________________
|
||||
@@ -0,0 +1,165 @@
|
||||
# GitHub Actions CI/CD Pipeline for Test Execution
|
||||
# Generated by BMad TEA Agent - Test Architect Module
|
||||
# Optimized for: Playwright/Cypress, Parallel Sharding, Burn-In Loop
|
||||
|
||||
name: Test Pipeline
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main, develop]
|
||||
pull_request:
|
||||
branches: [main, develop]
|
||||
schedule:
|
||||
# Weekly burn-in on Sundays at 2 AM UTC
|
||||
- cron: "0 2 * * 0"
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
# Lint stage - Code quality checks
|
||||
lint:
|
||||
name: Lint
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 5
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run linter
|
||||
run: npm run lint
|
||||
|
||||
# Test stage - Parallel execution with sharding
|
||||
test:
|
||||
name: Test (Shard ${{ matrix.shard }})
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 30
|
||||
needs: lint
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
shard: [1, 2, 3, 4]
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
restore-keys: |
|
||||
${{ runner.os }}-playwright-
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Install Playwright browsers
|
||||
run: npx playwright install --with-deps chromium
|
||||
|
||||
- name: Run tests (shard ${{ matrix.shard }}/4)
|
||||
run: npm run test:e2e -- --shard=${{ matrix.shard }}/4
|
||||
|
||||
- name: Upload test results
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-results-${{ matrix.shard }}
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
|
||||
# Burn-in stage - Flaky test detection
|
||||
burn-in:
|
||||
name: Burn-In (Flaky Detection)
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 60
|
||||
needs: test
|
||||
# Only run burn-in on PRs to main/develop or on schedule
|
||||
if: github.event_name == 'pull_request' || github.event_name == 'schedule'
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Install Playwright browsers
|
||||
run: npx playwright install --with-deps chromium
|
||||
|
||||
- name: Run burn-in loop (10 iterations)
|
||||
run: |
|
||||
echo "🔥 Starting burn-in loop - detecting flaky tests"
|
||||
for i in {1..10}; do
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
echo "✅ Burn-in complete - no flaky tests detected"
|
||||
|
||||
- name: Upload burn-in failure artifacts
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: burn-in-failures
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
|
||||
# Report stage - Aggregate and publish results
|
||||
report:
|
||||
name: Test Report
|
||||
runs-on: ubuntu-latest
|
||||
needs: [test, burn-in]
|
||||
if: always()
|
||||
|
||||
steps:
|
||||
- name: Download all artifacts
|
||||
uses: actions/download-artifact@v4
|
||||
with:
|
||||
path: artifacts
|
||||
|
||||
- name: Generate summary
|
||||
run: |
|
||||
echo "## Test Execution Summary" >> $GITHUB_STEP_SUMMARY
|
||||
echo "" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Status**: ${{ needs.test.result }}" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Burn-in**: ${{ needs.burn-in.result }}" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Shards**: 4" >> $GITHUB_STEP_SUMMARY
|
||||
echo "" >> $GITHUB_STEP_SUMMARY
|
||||
|
||||
if [ "${{ needs.burn-in.result }}" == "failure" ]; then
|
||||
echo "⚠️ **Flaky tests detected** - Review burn-in artifacts" >> $GITHUB_STEP_SUMMARY
|
||||
fi
|
||||
src/modules/bmm/workflows/testarch/ci/gitlab-ci-template.yaml (new file, 128 lines)
@@ -0,0 +1,128 @@
|
||||
# GitLab CI/CD Pipeline for Test Execution
|
||||
# Generated by BMad TEA Agent - Test Architect Module
|
||||
# Optimized for: Playwright/Cypress, Parallel Sharding, Burn-In Loop
|
||||
|
||||
stages:
|
||||
- lint
|
||||
- test
|
||||
- burn-in
|
||||
- report
|
||||
|
||||
variables:
|
||||
# Disable git depth for accurate change detection
|
||||
GIT_DEPTH: 0
|
||||
# Use npm ci for faster, deterministic installs
|
||||
npm_config_cache: "$CI_PROJECT_DIR/.npm"
|
||||
# Playwright browser cache
|
||||
PLAYWRIGHT_BROWSERS_PATH: "$CI_PROJECT_DIR/.cache/ms-playwright"
|
||||
|
||||
# Caching configuration
|
||||
cache:
|
||||
key:
|
||||
files:
|
||||
- package-lock.json
|
||||
paths:
|
||||
- .npm/
|
||||
- .cache/ms-playwright/
|
||||
- node_modules/
|
||||
|
||||
# Lint stage - Code quality checks
|
||||
lint:
|
||||
stage: lint
|
||||
image: node:20
|
||||
script:
|
||||
- npm ci
|
||||
- npm run lint
|
||||
timeout: 5 minutes
|
||||
|
||||
# Test stage - Parallel execution with sharding
|
||||
.test-template: &test-template
|
||||
stage: test
|
||||
image: node:20
|
||||
needs:
|
||||
- lint
|
||||
before_script:
|
||||
- npm ci
|
||||
- npx playwright install --with-deps chromium
|
||||
artifacts:
|
||||
when: on_failure
|
||||
paths:
|
||||
- test-results/
|
||||
- playwright-report/
|
||||
expire_in: 30 days
|
||||
timeout: 30 minutes
|
||||
|
||||
test:shard-1:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=1/4
|
||||
|
||||
test:shard-2:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=2/4
|
||||
|
||||
test:shard-3:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=3/4
|
||||
|
||||
test:shard-4:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=4/4
|
||||
|
||||
# Burn-in stage - Flaky test detection
|
||||
burn-in:
|
||||
stage: burn-in
|
||||
image: node:20
|
||||
needs:
|
||||
- test:shard-1
|
||||
- test:shard-2
|
||||
- test:shard-3
|
||||
- test:shard-4
|
||||
# Only run burn-in on merge requests to main/develop or on schedule
|
||||
rules:
|
||||
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
|
||||
- if: '$CI_PIPELINE_SOURCE == "schedule"'
|
||||
before_script:
|
||||
- npm ci
|
||||
- npx playwright install --with-deps chromium
|
||||
script:
|
||||
- |
|
||||
echo "🔥 Starting burn-in loop - detecting flaky tests"
|
||||
for i in {1..10}; do
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
echo "✅ Burn-in complete - no flaky tests detected"
|
||||
artifacts:
|
||||
when: on_failure
|
||||
paths:
|
||||
- test-results/
|
||||
- playwright-report/
|
||||
expire_in: 30 days
|
||||
timeout: 60 minutes
|
||||
|
||||
# Report stage - Aggregate results
|
||||
report:
|
||||
stage: report
|
||||
image: alpine:latest
|
||||
needs:
|
||||
- test:shard-1
|
||||
- test:shard-2
|
||||
- test:shard-3
|
||||
- test:shard-4
|
||||
- burn-in
|
||||
when: always
|
||||
script:
|
||||
- |
|
||||
echo "## Test Execution Summary"
|
||||
echo ""
|
||||
echo "- Pipeline: $CI_PIPELINE_ID"
|
||||
echo "- Shards: 4"
|
||||
echo "- Branch: $CI_COMMIT_REF_NAME"
|
||||
echo ""
|
||||
echo "View detailed results in job artifacts"
|
||||
@@ -1,43 +1,517 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# CI/CD Enablement v3.0
|
||||
# CI/CD Pipeline Setup
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/ci" name="CI/CD Enablement">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Git repository is initialized.</i>
|
||||
<i>- Local test suite passes.</i>
|
||||
<i>- Team agrees on target environments.</i>
|
||||
<i>- Access to CI platform settings/secrets is available.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Confirm all items above; halt if prerequisites are unmet.</action>
|
||||
</step>
|
||||
<step n="2" title="Configure Pipeline">
|
||||
<action>Detect CI platform (default GitHub Actions; ask about GitLab/CircleCI/etc.).</action>
|
||||
<action>Scaffold workflow (e.g., `.github/workflows/test.yml`) with appropriate triggers and caching (Node version from `.nvmrc`, browsers).</action>
|
||||
<action>Stage jobs sequentially (lint → unit → component → e2e) with matrix parallelization (shard by file, not test).</action>
|
||||
<action>Add selective execution script(s) for affected tests plus burn-in job rerunning changed specs 3x to catch flakiness.</action>
|
||||
<action>Attach artifacts on failure (traces/videos/HAR) and configure retries/backoff/concurrency controls.</action>
|
||||
<action>Document required secrets/environment variables and wire Slack/email notifications; provide local mirror script.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Produce workflow file(s), helper scripts (`test-changed`, burn-in), README/ci.md updates, secrets checklist, and any dashboard/badge configuration.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If git repo is absent, tests fail, or CI platform is unspecified, halt and request setup.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Use `{project-root}/bmad/bmm/testarch/tea-index.csv` to load CI-focused fragments (ci-burn-in, selective-testing, visual-debugging) before finalising recommendations.</i>
|
||||
<i>Target ~20× speedups via parallel shards and caching; keep jobs under 10 minutes.</i>
|
||||
<i>Use `wait-on-timeout` ≈120s for app startup; ensure local `npm test` mirrors CI run.</i>
|
||||
<i>Mention alternative platform paths when not on GitHub.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>CI pipeline configuration and guidance ready for team adoption.</i>
|
||||
</output>
|
||||
</task>
|
||||
**Workflow ID**: `bmad/bmm/testarch/ci`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Scaffolds a production-ready CI/CD quality pipeline with test execution, burn-in loops for flaky test detection, parallel sharding, artifact collection, and notification configuration. This workflow creates platform-specific CI configuration optimized for fast feedback and reliable test execution.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Git repository is initialized (`.git/` directory exists)
|
||||
- ✅ Local test suite passes (`npm run test:e2e` succeeds)
|
||||
- ✅ Test framework is configured (from `framework` workflow)
|
||||
- ✅ Team agrees on target CI platform (GitHub Actions, GitLab CI, Circle CI, etc.)
|
||||
- ✅ Access to CI platform settings/secrets available (if updating existing pipeline)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Run Preflight Checks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Verify Git Repository**
|
||||
- Check for `.git/` directory
|
||||
- Confirm remote repository configured (`git remote -v`)
|
||||
- If not initialized, HALT with message: "Git repository required for CI/CD setup"
|
||||
|
||||
2. **Validate Test Framework**
|
||||
- Look for `playwright.config.*` or `cypress.config.*`
|
||||
- Read framework configuration to extract:
|
||||
- Test directory location
|
||||
- Test command
|
||||
- Reporter configuration
|
||||
- Timeout settings
|
||||
- If not found, HALT with message: "Run `framework` workflow first to set up test infrastructure"
|
||||
|
||||
3. **Run Local Tests**
|
||||
- Execute `npm run test:e2e` (or equivalent from package.json)
|
||||
- Ensure tests pass before CI setup
|
||||
- If tests fail, HALT with message: "Fix failing tests before setting up CI/CD"
|
||||
|
||||
4. **Detect CI Platform**
|
||||
- Check for existing CI configuration:
|
||||
- `.github/workflows/*.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- `.circleci/config.yml` (Circle CI)
|
||||
- `Jenkinsfile` (Jenkins)
|
||||
- If found, ask user: "Update existing CI configuration or create new?"
|
||||
- If not found, detect platform from git remote:
|
||||
- `github.com` → GitHub Actions (default)
|
||||
- `gitlab.com` → GitLab CI
|
||||
- Ask user if unable to auto-detect
|
||||
|
||||
5. **Read Environment Configuration**
|
||||
- Check for `.nvmrc` to determine Node version
|
||||
- Default to Node 20 LTS if not found
|
||||
- Read `package.json` to identify dependencies (affects caching strategy)
|
||||
|
||||
**Halt Condition:** If preflight checks fail, stop immediately and report which requirement failed.
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Scaffold CI Pipeline
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Select CI Platform Template**
|
||||
|
||||
Based on detection or user preference, use the appropriate template:
|
||||
|
||||
**GitHub Actions** (`.github/workflows/test.yml`):
|
||||
- Most common platform
|
||||
- Excellent caching and matrix support
|
||||
- Free for public repos, generous free tier for private
|
||||
|
||||
**GitLab CI** (`.gitlab-ci.yml`):
|
||||
- Integrated with GitLab
|
||||
- Built-in registry and runners
|
||||
- Powerful pipeline features
|
||||
|
||||
**Circle CI** (`.circleci/config.yml`):
|
||||
- Fast execution with parallelism
|
||||
- Docker-first approach
|
||||
- Enterprise features
|
||||
|
||||
**Jenkins** (`Jenkinsfile`):
|
||||
- Self-hosted option
|
||||
- Maximum customization
|
||||
- Requires infrastructure management
|
||||
|
||||
2. **Generate Pipeline Configuration**
|
||||
|
||||
Use templates from `{installed_path}/` directory:
|
||||
- `github-actions-template.yml`
|
||||
- `gitlab-ci-template.yml`
|
||||
|
||||
**Key pipeline stages:**
|
||||
|
||||
```yaml
|
||||
stages:
|
||||
- lint # Code quality checks
|
||||
- test # Test execution (parallel shards)
|
||||
- burn-in # Flaky test detection
|
||||
- report # Aggregate results and publish
|
||||
```
|
||||
|
||||
3. **Configure Test Execution**
|
||||
|
||||
**Parallel Sharding:**
|
||||
|
||||
```yaml
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
shard: [1, 2, 3, 4]
|
||||
|
||||
steps:
|
||||
- name: Run tests
|
||||
run: npm run test:e2e -- --shard=${{ matrix.shard }}/${{ strategy.job-total }}
|
||||
```
|
||||
|
||||
**Purpose:** Splits tests into N parallel jobs for faster execution (target: <10 min per shard)
|
||||
|
||||
4. **Add Burn-In Loop**
|
||||
|
||||
**Critical pattern from production systems:**
|
||||
|
||||
```yaml
|
||||
burn-in:
|
||||
name: Flaky Test Detection
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: '.nvmrc'
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run burn-in loop (10 iterations)
|
||||
run: |
|
||||
for i in {1..10}; do
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
|
||||
- name: Upload failure artifacts
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: burn-in-failures
|
||||
path: test-results/
|
||||
retention-days: 30
|
||||
```
|
||||
|
||||
**Purpose:** Runs tests multiple times to catch non-deterministic failures before they reach main branch.
|
||||
|
||||
**When to run:**
|
||||
- On pull requests to main/develop
|
||||
- Weekly on cron schedule
|
||||
- After significant test infrastructure changes
|
||||
|
||||
5. **Configure Caching**
|
||||
|
||||
**Node modules cache:**
|
||||
|
||||
```yaml
|
||||
- name: Cache dependencies
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.npm
|
||||
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
|
||||
restore-keys: |
|
||||
${{ runner.os }}-node-
|
||||
```
|
||||
|
||||
**Browser binaries cache (Playwright):**
|
||||
|
||||
```yaml
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
```
|
||||
|
||||
**Purpose:** Reduces CI execution time by 2-5 minutes per run.
|
||||
|
||||
6. **Configure Artifact Collection**
|
||||
|
||||
**Failure artifacts only:**
|
||||
|
||||
```yaml
|
||||
- name: Upload test results
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-results-${{ matrix.shard }}
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
```
|
||||
|
||||
**Artifacts to collect:**
|
||||
- Traces (Playwright) - full debugging context
|
||||
- Screenshots - visual evidence of failures
|
||||
- Videos - interaction playback
|
||||
- HTML reports - detailed test results
|
||||
- Console logs - error messages and warnings
|
||||
|
||||
7. **Add Retry Logic**
|
||||
|
||||
```yaml
|
||||
- name: Run tests with retries
|
||||
uses: nick-invision/retry@v2
|
||||
with:
|
||||
timeout_minutes: 30
|
||||
max_attempts: 3
|
||||
retry_on: error
|
||||
command: npm run test:e2e
|
||||
```
|
||||
|
||||
**Purpose:** Handles transient failures (network issues, race conditions)
|
||||
|
||||
8. **Configure Notifications** (Optional)
|
||||
|
||||
If `notify_on_failure` is enabled:
|
||||
|
||||
```yaml
|
||||
- name: Notify on failure
|
||||
if: failure()
|
||||
uses: 8398a7/action-slack@v3
|
||||
with:
|
||||
status: ${{ job.status }}
|
||||
text: 'Test failures detected in PR #${{ github.event.pull_request.number }}'
|
||||
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
|
||||
```
|
||||
|
||||
9. **Generate Helper Scripts**
|
||||
|
||||
**Selective testing script** (`scripts/test-changed.sh`):
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Run only tests for changed files
|
||||
|
||||
CHANGED_FILES=$(git diff --name-only HEAD~1)
|
||||
|
||||
if echo "$CHANGED_FILES" | grep -q "src/.*\.ts$"; then
|
||||
echo "Running affected tests..."
|
||||
npm run test:e2e -- --grep="$(echo $CHANGED_FILES | sed 's/src\///g' | sed 's/\.ts//g')"
|
||||
else
|
||||
echo "No test-affecting changes detected"
|
||||
fi
|
||||
```
|
||||
|
||||
**Local mirror script** (`scripts/ci-local.sh`):
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Mirror CI execution locally for debugging
|
||||
|
||||
echo "🔍 Running CI pipeline locally..."
|
||||
|
||||
# Lint
|
||||
npm run lint || exit 1
|
||||
|
||||
# Tests
|
||||
npm run test:e2e || exit 1
|
||||
|
||||
# Burn-in (reduced iterations)
|
||||
for i in {1..3}; do
|
||||
echo "🔥 Burn-in $i/3"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
|
||||
echo "✅ Local CI pipeline passed"
|
||||
```
|
||||
|
||||
10. **Generate Documentation**
|
||||
|
||||
**CI README** (`docs/ci.md`):
|
||||
- Pipeline stages and purpose
|
||||
- How to run locally
|
||||
- Debugging failed CI runs
|
||||
- Secrets and environment variables needed
|
||||
- Notification setup
|
||||
- Badge URLs for README
|
||||
|
||||
**Secrets checklist** (`docs/ci-secrets-checklist.md`):
|
||||
- Required secrets list (SLACK_WEBHOOK, etc.)
|
||||
- Where to configure in CI platform
|
||||
- Security best practices
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Deliverables
|
||||
|
||||
### Primary Artifacts Created
|
||||
|
||||
1. **CI Configuration File**
|
||||
- `.github/workflows/test.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- `.circleci/config.yml` (Circle CI)
|
||||
|
||||
2. **Pipeline Stages**
|
||||
- **Lint**: Code quality checks (ESLint, Prettier)
|
||||
- **Test**: Parallel test execution (4 shards)
|
||||
- **Burn-in**: Flaky test detection (10 iterations)
|
||||
- **Report**: Result aggregation and publishing
|
||||
|
||||
3. **Helper Scripts**
|
||||
- `scripts/test-changed.sh` - Selective testing
|
||||
- `scripts/ci-local.sh` - Local CI mirror
|
||||
- `scripts/burn-in.sh` - Standalone burn-in execution
|
||||
|
||||
4. **Documentation**
|
||||
- `docs/ci.md` - CI pipeline guide
|
||||
- `docs/ci-secrets-checklist.md` - Required secrets
|
||||
- Inline comments in CI configuration
|
||||
|
||||
5. **Optimization Features**
|
||||
- Dependency caching (npm, browser binaries)
|
||||
- Parallel sharding (4 jobs default)
|
||||
- Retry logic (2 retries on failure)
|
||||
- Failure-only artifact upload
|
||||
|
||||
### Performance Targets
|
||||
|
||||
- **Lint stage**: <2 minutes
|
||||
- **Test stage** (per shard): <10 minutes
|
||||
- **Burn-in stage**: <30 minutes (10 iterations)
|
||||
- **Total pipeline**: <45 minutes
|
||||
|
||||
**Speedup:** 20× faster than sequential execution through parallelism and caching.
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify and load relevant knowledge fragments:
|
||||
|
||||
- `ci-burn-in.md` - Burn-in loop patterns: 10-iteration detection, GitHub Actions workflow, shard orchestration, selective execution (678 lines, 4 examples)
|
||||
- `selective-testing.md` - Changed test detection strategies: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
|
||||
- `visual-debugging.md` - Artifact collection best practices: trace viewer, HAR recording, custom artifacts, accessibility integration (522 lines, 5 examples)
|
||||
- `test-quality.md` - CI-specific test quality criteria: deterministic tests, isolated with cleanup, explicit assertions, length/time optimization (658 lines, 5 examples)
|
||||
- `playwright-config.md` - CI-optimized configuration: parallelization, artifact output, project dependencies, sharding (722 lines, 5 examples)
|
||||
|
||||
### CI Platform-Specific Guidance
|
||||
|
||||
**GitHub Actions:**
|
||||
|
||||
- Use `actions/cache` for caching
|
||||
- Matrix strategy for parallelism
|
||||
- Secrets in repository settings
|
||||
- Free 2000 minutes/month for private repos
|
||||
|
||||
**GitLab CI:**
|
||||
|
||||
- Use `.gitlab-ci.yml` in root
|
||||
- `cache:` directive for caching
|
||||
- Parallel execution with `parallel: 4`
|
||||
- Variables in project CI/CD settings
|
||||
|
||||
**Circle CI:**
|
||||
|
||||
- Use `.circleci/config.yml`
|
||||
- Docker executors recommended
|
||||
- Parallelism with `parallelism: 4`
|
||||
- Context for shared secrets
|
||||
|
||||
### Burn-In Loop Strategy
|
||||
|
||||
**When to run:**
|
||||
|
||||
- ✅ On PRs to main/develop branches
|
||||
- ✅ Weekly on schedule (cron)
|
||||
- ✅ After test infrastructure changes
|
||||
- ❌ Not on every commit (too slow)
|
||||
|
||||
**Iterations:**
|
||||
|
||||
- **10 iterations** for thorough detection
|
||||
- **3 iterations** for quick feedback
|
||||
- **100 iterations** for high-confidence stability
|
||||
|
||||
**Failure threshold:**
|
||||
|
||||
- Even ONE failure in burn-in → tests are flaky
|
||||
- Must fix before merging
|
||||
|
||||
### Artifact Retention
|
||||
|
||||
**Failure artifacts only:**
|
||||
|
||||
- Saves storage costs
|
||||
- Maintains debugging capability
|
||||
- 30-day retention default
|
||||
|
||||
**Artifact types:**
|
||||
|
||||
- Traces (Playwright) - 5-10 MB per test
|
||||
- Screenshots - 100-500 KB per screenshot
|
||||
- Videos - 2-5 MB per test
|
||||
- HTML reports - 1-2 MB per run
|
||||
|
||||
### Selective Testing

**Detect changed files:**

```bash
git diff --name-only HEAD~1
```

**Run affected tests only** (a TypeScript sketch of this mapping follows this subsection):

- Faster feedback for small changes
- Full suite still runs on main branch
- Reduces CI time by 50-80% for focused PRs

**Trade-off:**

- May miss integration issues
- Run full suite at least on merge

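For teams that prefer a Node-based helper over the generated `scripts/test-changed.sh`, the following is a minimal TypeScript sketch of diff-based selection. The `src/foo.ts` to `tests/e2e/foo.spec.ts` mapping and the Playwright invocation are assumptions, not the workflow's actual script.

```typescript
// scripts/select-tests.ts - hedged sketch, not the generated test-changed.sh
import { execSync } from 'node:child_process';
import { existsSync } from 'node:fs';

// Collect source files changed since the previous commit
const changed = execSync('git diff --name-only HEAD~1', { encoding: 'utf8' })
  .split('\n')
  .filter((file) => /^src\/.*\.ts$/.test(file));

// Map src/foo.ts -> tests/e2e/foo.spec.ts (assumed convention) and keep only specs that exist
const specs = changed
  .map((file) => file.replace(/^src\//, 'tests/e2e/').replace(/\.ts$/, '.spec.ts'))
  .filter((spec) => existsSync(spec));

if (specs.length === 0) {
  console.log('No test-affecting changes detected - skipping selective run.');
} else {
  execSync(`npx playwright test ${specs.join(' ')}`, { stdio: 'inherit' });
}
```

As noted above, this is a speed optimization for focused PRs; the full suite should still run on merges to main.
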
### Local CI Mirror
|
||||
|
||||
**Purpose:** Debug CI failures locally
|
||||
|
||||
**Usage:**
|
||||
|
||||
```bash
|
||||
./scripts/ci-local.sh
|
||||
```
|
||||
|
||||
**Mirrors CI environment:**
|
||||
|
||||
- Same Node version
|
||||
- Same test command
|
||||
- Same stages (lint → test → burn-in)
|
||||
- Reduced burn-in iterations (3 vs 10)
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## CI/CD Pipeline Complete
|
||||
|
||||
**Platform**: GitHub Actions (or GitLab CI, etc.)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Pipeline configuration: .github/workflows/test.yml
|
||||
- ✅ Burn-in loop: 10 iterations for flaky detection
|
||||
- ✅ Parallel sharding: 4 jobs for fast execution
|
||||
- ✅ Caching: Dependencies + browser binaries
|
||||
- ✅ Artifact collection: Failure-only traces/screenshots/videos
|
||||
- ✅ Helper scripts: test-changed.sh, ci-local.sh, burn-in.sh
|
||||
- ✅ Documentation: docs/ci.md, docs/ci-secrets-checklist.md
|
||||
|
||||
**Performance:**
|
||||
|
||||
- Lint: <2 min
|
||||
- Test (per shard): <10 min
|
||||
- Burn-in: <30 min
|
||||
- Total: <45 min (20× speedup vs sequential)
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Commit CI configuration: `git add .github/workflows/test.yml && git commit -m "ci: add test pipeline"`
|
||||
2. Push to remote: `git push`
|
||||
3. Configure required secrets in CI platform settings (see docs/ci-secrets-checklist.md)
|
||||
4. Open a PR to trigger first CI run
|
||||
5. Monitor pipeline execution and adjust parallelism if needed
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Burn-in loop pattern (ci-burn-in.md)
|
||||
- Selective testing strategy (selective-testing.md)
|
||||
- Artifact collection (visual-debugging.md)
|
||||
- Test quality criteria (test-quality.md)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] CI configuration file created and syntactically valid
|
||||
- [ ] Burn-in loop configured (10 iterations)
|
||||
- [ ] Parallel sharding enabled (4 jobs)
|
||||
- [ ] Caching configured (dependencies + browsers)
|
||||
- [ ] Artifact collection on failure only
|
||||
- [ ] Helper scripts created and executable (`chmod +x`)
|
||||
- [ ] Documentation complete (ci.md, secrets checklist)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
@@ -1,25 +1,89 @@
|
||||
# Test Architect workflow: ci
|
||||
name: testarch-ci
|
||||
description: "Scaffold or update the CI/CD quality pipeline."
|
||||
description: "Scaffold CI/CD quality pipeline with test execution, burn-in loops, and artifact collection"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/ci"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
ci_platform: "auto" # auto, github-actions, gitlab-ci, circle-ci, jenkins
|
||||
test_framework: "" # Detected from framework workflow (playwright, cypress)
|
||||
test_dir: "{project-root}/tests"
|
||||
config_file: "" # Framework config file path
|
||||
node_version_source: "{project-root}/.nvmrc" # Node version for CI
|
||||
|
||||
# Execution configuration
|
||||
parallel_jobs: 4 # Number of parallel test shards
|
||||
burn_in_enabled: true # Enable burn-in loop for flaky test detection
|
||||
burn_in_iterations: 10 # Number of burn-in iterations
|
||||
selective_testing_enabled: true # Enable changed test detection
|
||||
|
||||
# Artifact configuration
|
||||
artifact_retention_days: 30
|
||||
upload_artifacts_on: "failure" # failure, always, never
|
||||
artifact_types: "traces,screenshots,videos,html-report" # Comma-separated
|
||||
|
||||
# Performance tuning
|
||||
cache_enabled: true # Enable dependency caching
|
||||
browser_cache_enabled: true # Cache browser binaries
|
||||
timeout_minutes: 60 # Overall job timeout
|
||||
test_timeout_minutes: 30 # Individual test run timeout
|
||||
|
||||
# Notification configuration
|
||||
notify_on_failure: false # Enable notifications (requires setup)
|
||||
notification_channels: "" # slack, email, discord
|
||||
|
||||
# Output artifacts
|
||||
generate_ci_readme: true
|
||||
generate_local_mirror_script: true
|
||||
generate_secrets_checklist: true
|
||||
|
||||
# CI-specific optimizations
|
||||
use_matrix_strategy: true # Parallel execution across OS/browsers
|
||||
use_sharding: true # Split tests into shards
|
||||
retry_failed_tests: true
|
||||
retry_count: 2
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{project-root}/.github/workflows/test.yml" # GitHub Actions default
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read .nvmrc, package.json, framework config
|
||||
- write_file # Create CI config, scripts, documentation
|
||||
- create_directory # Create .github/workflows/ or .gitlab-ci/ directories
|
||||
- list_files # Detect existing CI configuration
|
||||
- search_repo # Find test files for selective testing
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- framework_config: "Framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
- package_json: "Project dependencies and scripts"
|
||||
- nvmrc: ".nvmrc for Node version (optional, defaults to LTS)"
|
||||
- existing_ci: "Existing CI configuration to update (optional)"
|
||||
- git_info: "Git repository information for platform detection"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- ci-cd
|
||||
- test-architect
|
||||
- pipeline
|
||||
- automation
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts, auto-detect when possible
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
src/modules/bmm/workflows/testarch/framework/README.md (new file, 340 lines)
@@ -0,0 +1,340 @@
|
||||
# Test Framework Setup Workflow
|
||||
|
||||
Initializes a production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, configuration, and industry best practices. This workflow scaffolds the complete testing infrastructure for modern web applications, providing a robust foundation for test automation.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *framework
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Starting a new project that needs test infrastructure
|
||||
- Migrating from an older testing approach
|
||||
- Setting up testing from scratch
|
||||
- Standardizing test architecture across teams
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **package.json**: Project dependencies and scripts to detect project type and bundler
|
||||
|
||||
**Optional Context Files:**
|
||||
|
||||
- **Architecture docs** (solution-architecture.md, tech-spec.md): Informs framework configuration decisions
|
||||
- **Existing tests**: Detects current framework to avoid conflicts
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `test_framework`: Auto-detected (playwright/cypress) or manually specified
|
||||
- `project_type`: Auto-detected from package.json (react/vue/angular/next/node)
|
||||
- `bundler`: Auto-detected from package.json (vite/webpack/rollup/esbuild)
|
||||
- `test_dir`: Root test directory (default: `{project-root}/tests`)
|
||||
- `use_typescript`: Prefer TypeScript configuration (default: true)
|
||||
- `framework_preference`: Auto-detection or force specific framework (default: "auto")
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverables:**
|
||||
|
||||
1. **Configuration File**
|
||||
- `playwright.config.ts` or `cypress.config.ts` with production-ready settings
|
||||
- Timeouts: action 15s, navigation 30s, test 60s
|
||||
- Reporters: HTML + JUnit XML
|
||||
- Failure-only artifacts (traces, screenshots, videos)
|
||||
|
||||
2. **Directory Structure**
|
||||
|
||||
```
|
||||
tests/
|
||||
├── e2e/ # Test files (organize as needed)
|
||||
├── support/ # Framework infrastructure (key pattern)
|
||||
│ ├── fixtures/ # Test fixtures with auto-cleanup
|
||||
│ │ ├── index.ts # Fixture merging
|
||||
│ │ └── factories/ # Data factories (faker-based)
|
||||
│ ├── helpers/ # Utility functions
|
||||
│ └── page-objects/ # Page object models (optional)
|
||||
└── README.md # Setup and usage guide
|
||||
```
|
||||
|
||||
**Note**: Test organization (e2e/, api/, integration/, etc.) is flexible. The **support/** folder contains reusable fixtures, helpers, and factories - the core framework pattern.
|
||||
|
||||
3. **Environment Configuration**
|
||||
- `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL`, auth credentials
|
||||
- `.nvmrc` with Node version (LTS)
|
||||
|
||||
4. **Test Infrastructure**
|
||||
- Fixture architecture using `mergeTests` pattern
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Sample tests demonstrating best practices
|
||||
- Helper utilities for common operations
|
||||
|
||||
5. **Documentation**
|
||||
- `tests/README.md` with comprehensive setup instructions
|
||||
- Inline comments explaining configuration choices
|
||||
- References to TEA knowledge base
|
||||
|
||||
**Secondary Deliverables:**
|
||||
|
||||
- Updated `package.json` with minimal test script (`test:e2e`)
|
||||
- Sample test demonstrating fixture usage
|
||||
- Network-first testing patterns
|
||||
- Selector strategy guidance (data-testid)
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- ✅ No existing framework detected (prevents conflicts)
|
||||
- ✅ package.json exists and is valid
|
||||
- ✅ Framework auto-detection successful or explicit choice provided
|
||||
- ✅ Sample test runs successfully
|
||||
- ✅ All generated files are syntactically correct
|
||||
|
||||
## Key Features
|
||||
|
||||
### Smart Framework Selection
|
||||
|
||||
- **Auto-detection logic** based on project characteristics:
|
||||
- **Playwright** recommended for: Large repos (100+ files), performance-critical apps, multi-browser support, complex debugging needs
|
||||
- **Cypress** recommended for: Small teams prioritizing DX, component testing focus, real-time test development
|
||||
- Falls back to Playwright as default if uncertain
|
||||
|
||||
### Production-Ready Patterns

- **Fixture Architecture**: Pure function → fixture → `mergeTests` composition pattern (see the sketch after this list)
- **Auto-Cleanup**: Fixtures automatically clean up test data in teardown
- **Network-First**: Route interception before navigation to prevent race conditions
- **Failure-Only Artifacts**: Screenshots/videos/traces only captured on failure to reduce storage
- **Parallel Execution**: Configured for optimal CI performance

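A minimal sketch of that composition pattern, assuming the generated `tests/support/` layout and a hypothetical `UserFactory` (the scaffolded files may differ in detail):

```typescript
// tests/support/fixtures/index.ts - minimal sketch of the fixture pattern
import { test as base, mergeTests } from '@playwright/test';
import { UserFactory } from './factories/user-factory'; // hypothetical factory module

// One concern per fixture; cleanup runs automatically after use() returns.
const userFixture = base.extend<{ userFactory: UserFactory }>({
  userFactory: async ({ request }, use) => {
    const factory = new UserFactory(request);
    await use(factory);      // test body runs here
    await factory.cleanup(); // auto-cleanup in teardown
  },
});

// Compose independent fixtures into the single `test` that specs import.
export const test = mergeTests(userFixture /*, authFixture, networkFixture */);
export { expect } from '@playwright/test';
```

Specs then import `test` from this index, so every test picks up the cleanup behavior without repeating setup code.
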
### Industry Best Practices
|
||||
|
||||
- **Selector Strategy**: Prescriptive guidance on `data-testid` attributes
|
||||
- **Data Factories**: Faker-based factories for realistic test data
|
||||
- **Contract Testing**: Recommends Pact for microservices architectures
|
||||
- **Error Handling**: Comprehensive timeout and retry configuration
|
||||
- **Reporting**: Multiple reporter formats (HTML, JUnit, console)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
Automatically consults TEA knowledge base:
|
||||
|
||||
- `fixture-architecture.md` - Pure function → fixture → mergeTests pattern
|
||||
- `data-factories.md` - Faker-based factories with auto-cleanup
|
||||
- `network-first.md` - Network interception before navigation
|
||||
- `playwright-config.md` - Playwright-specific best practices
|
||||
- `test-config.md` - General configuration guidelines
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before framework:**
|
||||
|
||||
- **plan-project** (Phase 2): Determines project scope and testing needs
|
||||
- **workflow-status**: Verifies project readiness
|
||||
|
||||
**After framework:**
|
||||
|
||||
- **ci**: Scaffold CI/CD pipeline using framework configuration
|
||||
- **test-design**: Plan test coverage strategy for the project
|
||||
- **atdd**: Generate failing acceptance tests using the framework
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **solution-architecture** (Phase 3): Aligns test structure with system architecture
|
||||
- **tech-spec**: Uses technical specifications to inform test configuration
|
||||
|
||||
**Updates:**
|
||||
|
||||
- `bmm-workflow-status.md`: Adds framework initialization to Quality & Testing Progress section
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Preflight Checks
|
||||
|
||||
**Critical requirements** verified before scaffolding:
|
||||
|
||||
- package.json exists in project root
|
||||
- No modern E2E framework already configured
|
||||
- Architecture/stack context available
|
||||
|
||||
If any check fails, workflow **HALTS** and notifies user.
|
||||
|
||||
### Framework-Specific Guidance
|
||||
|
||||
**Playwright Advantages:**
|
||||
|
||||
- Worker parallelism (significantly faster for large suites)
|
||||
- Trace viewer (powerful debugging with screenshots, network, console logs)
|
||||
- Multi-language support (TypeScript, JavaScript, Python, C#, Java)
|
||||
- Built-in API testing capabilities
|
||||
- Better handling of multiple browser contexts
|
||||
|
||||
**Cypress Advantages:**
|
||||
|
||||
- Superior developer experience (real-time reloading)
|
||||
- Excellent for component testing
|
||||
- Simpler setup for small teams
|
||||
- Better suited for watch mode during development
|
||||
|
||||
**Avoid Cypress when:**
|
||||
|
||||
- API chains are heavy and complex
|
||||
- Multi-tab/window scenarios are common
|
||||
- Worker parallelism is critical for CI performance
|
||||
|
||||
### Selector Strategy

**Always recommend:**

- `data-testid` attributes for UI elements (framework-agnostic); a short example follows this list
- `data-cy` attributes if Cypress is chosen (Cypress-specific)
- Avoid brittle CSS selectors or XPath

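A small illustration in Playwright syntax (the route and `submit-order` test id are hypothetical):

```typescript
import { test } from '@playwright/test';

test('submits an order via a stable test id', async ({ page }) => {
  await page.goto('/checkout');

  // Stable: tied to an explicit data-testid the team controls
  await page.getByTestId('submit-order').click();

  // Brittle alternative to avoid - coupled to markup structure and styling:
  // await page.locator('div.checkout > button.btn-primary:nth-child(3)').click();
});
```
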
### Standalone Operation
|
||||
|
||||
This workflow operates independently:
|
||||
|
||||
- **No story required**: Can be run at project initialization
|
||||
- **No epic context needed**: Works for greenfield and brownfield projects
|
||||
- **Autonomous**: Auto-detects configuration and proceeds without user input
|
||||
|
||||
### Output Summary Format
|
||||
|
||||
After completion, provides structured summary:
|
||||
|
||||
```markdown
|
||||
## Framework Scaffold Complete
|
||||
|
||||
**Framework Selected**: Playwright (or Cypress)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Configuration file: playwright.config.ts
|
||||
- ✅ Directory structure: tests/e2e/, tests/support/
|
||||
- ✅ Environment config: .env.example
|
||||
- ✅ Node version: .nvmrc
|
||||
- ✅ Fixture architecture: tests/support/fixtures/
|
||||
- ✅ Data factories: tests/support/fixtures/factories/
|
||||
- ✅ Sample tests: tests/e2e/example.spec.ts
|
||||
- ✅ Documentation: tests/README.md
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Copy .env.example to .env and fill in environment variables
|
||||
2. Run npm install to install test dependencies
|
||||
3. Run npm run test:e2e to execute sample tests
|
||||
4. Review tests/README.md for detailed setup instructions
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture pattern (pure functions + mergeTests)
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Network-first testing safeguards
|
||||
- Failure-only artifact capture
|
||||
```
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
After workflow completion, verify:
|
||||
|
||||
- [ ] Configuration file created and syntactically valid
|
||||
- [ ] Directory structure exists with all folders
|
||||
- [ ] Environment configuration generated (.env.example, .nvmrc)
|
||||
- [ ] Sample tests run successfully (npm run test:e2e)
|
||||
- [ ] Documentation complete and accurate (tests/README.md)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
- [ ] package.json scripts updated correctly
|
||||
- [ ] Fixtures and factories follow patterns from knowledge base
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
## Example Execution
|
||||
|
||||
**Scenario 1: New React + Vite project**
|
||||
|
||||
```bash
|
||||
# User runs framework workflow
|
||||
bmad tea *framework
|
||||
|
||||
# TEA detects:
|
||||
# - React project (from package.json)
|
||||
# - Vite bundler
|
||||
# - No existing test framework
|
||||
# - 150+ files (recommends Playwright)
|
||||
|
||||
# TEA scaffolds:
|
||||
# - playwright.config.ts with Vite detection
|
||||
# - Component testing configuration
|
||||
# - React Testing Library helpers
|
||||
# - Sample component + E2E tests
|
||||
```
|
||||
|
||||
**Scenario 2: Existing Node.js API project**
|
||||
|
||||
```bash
|
||||
# User runs framework workflow
|
||||
bmad tea *framework
|
||||
|
||||
# TEA detects:
|
||||
# - Node.js backend (no frontend framework)
|
||||
# - Express framework
|
||||
# - Small project (50 files)
|
||||
# - API endpoints in routes/
|
||||
|
||||
# TEA scaffolds:
|
||||
# - playwright.config.ts focused on API testing
|
||||
# - tests/api/ directory structure
|
||||
# - API helper utilities
|
||||
# - Sample API tests with auth
|
||||
```
|
||||
|
||||
**Scenario 3: Cypress preferred (explicit)**
|
||||
|
||||
```bash
|
||||
# User sets framework preference
|
||||
# (in workflow config: framework_preference: "cypress")
|
||||
|
||||
bmad tea *framework
|
||||
|
||||
# TEA scaffolds:
|
||||
# - cypress.config.ts
|
||||
# - tests/e2e/ with Cypress patterns
|
||||
# - Cypress-specific commands
|
||||
# - data-cy selector strategy
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue: "Existing test framework detected"**
|
||||
|
||||
- **Cause**: `playwright.config.*` or `cypress.config.*` already exists
|
||||
- **Solution**: Use `upgrade-framework` workflow (TBD) or manually remove existing config
|
||||
|
||||
**Issue: "Cannot detect project type"**
|
||||
|
||||
- **Cause**: package.json missing or malformed
|
||||
- **Solution**: Ensure package.json exists and has valid dependencies
|
||||
|
||||
**Issue: "Sample test fails to run"**
|
||||
|
||||
- **Cause**: Missing dependencies or incorrect BASE_URL
|
||||
- **Solution**: Run `npm install` and configure `.env` with correct URLs
|
||||
|
||||
**Issue: "TypeScript compilation errors"**
|
||||
|
||||
- **Cause**: Missing @types packages or tsconfig misconfiguration
|
||||
- **Solution**: Ensure TypeScript and type definitions are installed
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **ci**: Scaffold CI/CD pipeline → [ci/README.md](../ci/README.md)
|
||||
- **test-design**: Plan test coverage → [test-design/README.md](../test-design/README.md)
|
||||
- **atdd**: Generate acceptance tests → [atdd/README.md](../atdd/README.md)
|
||||
- **automate**: Expand regression suite → [automate/README.md](../automate/README.md)
|
||||
|
||||
## Version History
|
||||
|
||||
- **v4.0 (BMad v6)**: Pure markdown instructions, enhanced workflow.yaml, comprehensive README
|
||||
- **v3.x**: XML format instructions
|
||||
- **v2.x**: Legacy task-based approach
|
||||
src/modules/bmm/workflows/testarch/framework/checklist.md (new file, 321 lines)
@@ -0,0 +1,321 @@
|
||||
# Test Framework Setup - Validation Checklist
|
||||
|
||||
This checklist ensures the framework workflow completes successfully and all deliverables meet quality standards.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting the workflow:
|
||||
|
||||
- [ ] Project root contains valid `package.json`
|
||||
- [ ] No existing modern E2E framework detected (`playwright.config.*`, `cypress.config.*`)
|
||||
- [ ] Project type identifiable (React, Vue, Angular, Next.js, Node, etc.)
|
||||
- [ ] Bundler identifiable (Vite, Webpack, Rollup, esbuild) or not applicable
|
||||
- [ ] User has write permissions to create directories and files
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Preflight Checks
|
||||
|
||||
- [ ] package.json successfully read and parsed
|
||||
- [ ] Project type extracted correctly
|
||||
- [ ] Bundler identified (or marked as N/A for backend projects)
|
||||
- [ ] No framework conflicts detected
|
||||
- [ ] Architecture documents located (if available)
|
||||
|
||||
### Step 2: Framework Selection
|
||||
|
||||
- [ ] Framework auto-detection logic executed
|
||||
- [ ] Framework choice justified (Playwright vs Cypress)
|
||||
- [ ] Framework preference respected (if explicitly set)
|
||||
- [ ] User notified of framework selection and rationale
|
||||
|
||||
### Step 3: Directory Structure
|
||||
|
||||
- [ ] `tests/` root directory created
|
||||
- [ ] `tests/e2e/` directory created (or user's preferred structure)
|
||||
- [ ] `tests/support/` directory created (critical pattern)
|
||||
- [ ] `tests/support/fixtures/` directory created
|
||||
- [ ] `tests/support/fixtures/factories/` directory created
|
||||
- [ ] `tests/support/helpers/` directory created
|
||||
- [ ] `tests/support/page-objects/` directory created (if applicable)
|
||||
- [ ] All directories have correct permissions
|
||||
|
||||
**Note**: Test organization is flexible (e2e/, api/, integration/). The **support/** folder is the key pattern.
|
||||
|
||||
### Step 4: Configuration Files

- [ ] Framework config file created (`playwright.config.ts` or `cypress.config.ts`); see the sketch after this list
- [ ] Config file uses TypeScript (if `use_typescript: true`)
- [ ] Timeouts configured correctly (action: 15s, navigation: 30s, test: 60s)
- [ ] Base URL configured with environment variable fallback
- [ ] Trace/screenshot/video set to retain-on-failure
- [ ] Multiple reporters configured (HTML + JUnit + console)
- [ ] Parallel execution enabled
- [ ] CI-specific settings configured (retries, workers)
- [ ] Config file is syntactically valid (no compilation errors)

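A minimal `playwright.config.ts` sketch that would satisfy these checks; the values mirror the defaults named in this step, and the localhost fallback URL is an assumption:

```typescript
// playwright.config.ts - minimal sketch matching the checklist defaults above
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests/e2e',
  timeout: 60_000,                     // test: 60s
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000', // env fallback
    actionTimeout: 15_000,             // action: 15s
    navigationTimeout: 30_000,         // navigation: 30s
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  reporter: [
    ['html'],
    ['junit', { outputFile: 'test-results/junit.xml' }],
    ['list'],                          // console output
  ],
  fullyParallel: true,
  retries: process.env.CI ? 2 : 0,     // CI-specific retries
  workers: process.env.CI ? 4 : undefined,
});
```
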
### Step 5: Environment Configuration
|
||||
|
||||
- [ ] `.env.example` created in project root
|
||||
- [ ] `TEST_ENV` variable defined
|
||||
- [ ] `BASE_URL` variable defined with default
|
||||
- [ ] `API_URL` variable defined (if applicable)
|
||||
- [ ] Authentication variables defined (if applicable)
|
||||
- [ ] Feature flag variables defined (if applicable)
|
||||
- [ ] `.nvmrc` created with appropriate Node version
|
||||
|
||||
### Step 6: Fixture Architecture
|
||||
|
||||
- [ ] `tests/support/fixtures/index.ts` created
|
||||
- [ ] Base fixture extended from Playwright/Cypress
|
||||
- [ ] Type definitions for fixtures created
|
||||
- [ ] mergeTests pattern implemented (if multiple fixtures)
|
||||
- [ ] Auto-cleanup logic included in fixtures
|
||||
- [ ] Fixture architecture follows knowledge base patterns
|
||||
|
||||
### Step 7: Data Factories

- [ ] At least one factory created (e.g., UserFactory; see the sketch after this list)
- [ ] Factories use @faker-js/faker for realistic data
- [ ] Factories track created entities (for cleanup)
- [ ] Factories implement `cleanup()` method
- [ ] Factories integrate with fixtures
- [ ] Factories follow knowledge base patterns

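A minimal sketch of such a factory; the `/api/users` endpoint and response shape are assumptions:

```typescript
// tests/support/fixtures/factories/user-factory.ts - minimal sketch
import { faker } from '@faker-js/faker';
import type { APIRequestContext } from '@playwright/test';

export class UserFactory {
  private createdIds: string[] = []; // tracked for cleanup

  constructor(private request: APIRequestContext) {}

  async createUser(overrides: Record<string, unknown> = {}) {
    const payload = {
      name: faker.person.fullName(),
      email: faker.internet.email(),
      ...overrides,
    };
    const response = await this.request.post('/api/users', { data: payload });
    const user = await response.json();
    this.createdIds.push(user.id);
    return user;
  }

  async cleanup() {
    // Delete everything this factory created, newest first
    for (const id of this.createdIds.reverse()) {
      await this.request.delete(`/api/users/${id}`);
    }
    this.createdIds = [];
  }
}
```
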
### Step 8: Sample Tests

- [ ] Example test file created (`tests/e2e/example.spec.ts`; see the sketch after this list)
- [ ] Test uses fixture architecture
- [ ] Test demonstrates data factory usage
- [ ] Test uses proper selector strategy (data-testid)
- [ ] Test follows Given-When-Then structure
- [ ] Test includes proper assertions
- [ ] Network interception demonstrated (if applicable)

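A minimal sketch of a sample spec that would tick these boxes, reusing the hypothetical fixture index and `UserFactory` sketched earlier (the route and test id are also assumptions):

```typescript
// tests/e2e/example.spec.ts - minimal sketch, not the generated sample
import { test, expect } from '../support/fixtures';

test('newly created user sees their profile name', async ({ page, userFactory }) => {
  // Given: a user exists (factory data is cleaned up automatically in teardown)
  const user = await userFactory.createUser();

  // When: the profile page for that user is opened
  await page.goto(`/profile/${user.id}`);

  // Then: the header shows the generated name
  await expect(page.getByTestId('profile-name')).toHaveText(user.name);
});
```
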
### Step 9: Helper Utilities
|
||||
|
||||
- [ ] API helper created (if API testing needed)
|
||||
- [ ] Network helper created (if network mocking needed)
|
||||
- [ ] Auth helper created (if authentication needed)
|
||||
- [ ] Helpers follow functional patterns
|
||||
- [ ] Helpers have proper error handling
|
||||
|
||||
### Step 10: Documentation
|
||||
|
||||
- [ ] `tests/README.md` created
|
||||
- [ ] Setup instructions included
|
||||
- [ ] Running tests section included
|
||||
- [ ] Architecture overview section included
|
||||
- [ ] Best practices section included
|
||||
- [ ] CI integration section included
|
||||
- [ ] Knowledge base references included
|
||||
- [ ] Troubleshooting section included
|
||||
|
||||
### Step 11: Package.json Updates
|
||||
|
||||
- [ ] Minimal test script added to package.json: `test:e2e`
|
||||
- [ ] Test framework dependency added (if not already present)
|
||||
- [ ] Type definitions added (if TypeScript)
|
||||
- [ ] Users can extend with additional scripts as needed
|
||||
|
||||
---
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Configuration Validation
|
||||
|
||||
- [ ] Config file loads without errors
|
||||
- [ ] Config file passes linting (if linter configured)
|
||||
- [ ] Config file uses correct syntax for chosen framework
|
||||
- [ ] All paths in config resolve correctly
|
||||
- [ ] Reporter output directories exist or are created on test run
|
||||
|
||||
### Test Execution Validation
|
||||
|
||||
- [ ] Sample test runs successfully
|
||||
- [ ] Test execution produces expected output (pass/fail)
|
||||
- [ ] Test artifacts generated correctly (traces, screenshots, videos)
|
||||
- [ ] Test report generated successfully
|
||||
- [ ] No console errors or warnings during test run
|
||||
|
||||
### Directory Structure Validation
|
||||
|
||||
- [ ] All required directories exist
|
||||
- [ ] Directory structure matches framework conventions
|
||||
- [ ] No duplicate or conflicting directories
|
||||
- [ ] Directories accessible with correct permissions
|
||||
|
||||
### File Integrity Validation
|
||||
|
||||
- [ ] All generated files are syntactically correct
|
||||
- [ ] No placeholder text left in files (e.g., "TODO", "FIXME")
|
||||
- [ ] All imports resolve correctly
|
||||
- [ ] No hardcoded credentials or secrets in files
|
||||
- [ ] All file paths use correct separators for OS
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] Generated code follows project coding standards
|
||||
- [ ] TypeScript types are complete and accurate (no `any` unless necessary)
|
||||
- [ ] No unused imports or variables
|
||||
- [ ] Consistent code formatting (matches project style)
|
||||
- [ ] No linting errors in generated files
|
||||
|
||||
### Best Practices Compliance

- [ ] Fixture architecture follows pure function → fixture → mergeTests pattern
- [ ] Data factories implement auto-cleanup
- [ ] Network interception occurs before navigation (see the sketch after this list)
- [ ] Selectors use data-testid strategy
- [ ] Artifacts only captured on failure
- [ ] Tests follow Given-When-Then structure
- [ ] No hard-coded waits or sleeps

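A minimal sketch of the network-first ordering check above, with a hypothetical `/api/orders` endpoint and `empty-state` test id:

```typescript
import { test, expect } from '@playwright/test';

test('orders page renders a mocked, empty order list', async ({ page }) => {
  // Register the route BEFORE navigation so the very first request is intercepted
  await page.route('**/api/orders', (route) =>
    route.fulfill({ status: 200, contentType: 'application/json', body: '[]' }),
  );

  await page.goto('/orders'); // navigate only after the route exists

  await expect(page.getByTestId('empty-state')).toBeVisible();
  // Anti-pattern: calling page.goto() first can let the real request through.
});
```
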
### Knowledge Base Alignment
|
||||
|
||||
- [ ] Fixture pattern matches `fixture-architecture.md`
|
||||
- [ ] Data factories match `data-factories.md`
|
||||
- [ ] Network handling matches `network-first.md`
|
||||
- [ ] Config follows `playwright-config.md` or `test-config.md`
|
||||
- [ ] Test quality matches `test-quality.md`
|
||||
|
||||
### Security Checks
|
||||
|
||||
- [ ] No credentials in configuration files
|
||||
- [ ] .env.example contains placeholders, not real values
|
||||
- [ ] Sensitive test data handled securely
|
||||
- [ ] API keys and tokens use environment variables
|
||||
- [ ] No secrets committed to version control
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] `bmm-workflow-status.md` exists
|
||||
- [ ] Framework initialization logged in Quality & Testing Progress section
|
||||
- [ ] Status file updated with completion timestamp
|
||||
- [ ] Status file shows framework: Playwright or Cypress
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Relevant knowledge fragments identified from tea-index.csv
|
||||
- [ ] Knowledge fragments successfully loaded
|
||||
- [ ] Patterns from knowledge base applied correctly
|
||||
- [ ] Knowledge base references included in documentation
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] Can proceed to `ci` workflow after completion
|
||||
- [ ] Can proceed to `test-design` workflow after completion
|
||||
- [ ] Can proceed to `atdd` workflow after completion
|
||||
- [ ] Framework setup compatible with downstream workflows
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All of the following must be true:**
|
||||
|
||||
- [ ] All prerequisite checks passed
|
||||
- [ ] All process steps completed without errors
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] Sample test executes successfully
|
||||
- [ ] User can run `npm run test:e2e` without errors
|
||||
- [ ] Documentation is complete and accurate
|
||||
- [ ] No critical issues or blockers identified
|
||||
|
||||
---
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Copy `.env.example` to `.env`
|
||||
2. [ ] Fill in environment-specific values in `.env`
|
||||
3. [ ] Run `npm install` to install test dependencies
|
||||
4. [ ] Run `npm run test:e2e` to verify setup
|
||||
5. [ ] Review `tests/README.md` for project-specific guidance
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `ci` workflow to set up CI/CD pipeline
|
||||
2. [ ] Run `test-design` workflow to plan test coverage
|
||||
3. [ ] Run `atdd` workflow when ready to develop stories
|
||||
|
||||
---
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails and needs to be rolled back:
|
||||
|
||||
1. [ ] Delete `tests/` directory
|
||||
2. [ ] Remove test scripts from package.json
|
||||
3. [ ] Delete `.env.example` (if created)
|
||||
4. [ ] Delete `.nvmrc` (if created)
|
||||
5. [ ] Delete framework config file
|
||||
6. [ ] Remove test dependencies from package.json (if added)
|
||||
7. [ ] Run `npm install` to clean up node_modules
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: Config file has TypeScript errors
|
||||
|
||||
- **Solution**: Ensure `@playwright/test` or `cypress` types are installed
|
||||
|
||||
**Issue**: Sample test fails to run
|
||||
|
||||
- **Solution**: Check BASE_URL in .env, ensure app is running
|
||||
|
||||
**Issue**: Fixture cleanup not working
|
||||
|
||||
- **Solution**: Verify cleanup() is called in fixture teardown
|
||||
|
||||
**Issue**: Network interception not working
|
||||
|
||||
- **Solution**: Ensure route setup occurs before page.goto() (see the sketch below)
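For example, a minimal Playwright sketch of the network-first rule — the `/api/users` endpoint and `data-testid` values are illustrative placeholders:

```typescript
import { test, expect } from '@playwright/test';

test('stubs the users API before navigating', async ({ page }) => {
  // Register the route handler BEFORE page.goto(); otherwise the first
  // request can slip through to the real backend.
  await page.route('**/api/users', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([{ id: 1, name: 'Test User' }]),
    }),
  );

  // Wait deterministically for the stubbed response instead of sleeping.
  const usersResponse = page.waitForResponse('**/api/users');
  await page.goto('/users');
  await usersResponse;

  await expect(page.locator('[data-testid="user-row"]')).toHaveCount(1);
});
```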
|
||||
|
||||
### Framework-Specific Considerations
|
||||
|
||||
**Playwright:**
|
||||
|
||||
- Requires Node.js 18+
|
||||
- Browser binaries auto-installed on first run
|
||||
- Trace viewer requires running `npx playwright show-trace`
|
||||
|
||||
**Cypress:**
|
||||
|
||||
- Requires Node.js 18+
|
||||
- Cypress app opens on first run
|
||||
- Component testing requires additional setup
|
||||
|
||||
### Version Compatibility
|
||||
|
||||
- [ ] Node.js version matches .nvmrc
|
||||
- [ ] Framework version compatible with Node.js version
|
||||
- [ ] TypeScript version compatible with framework
|
||||
- [ ] All peer dependencies satisfied
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items checked and validated.
|
||||
|
||||
**Completed by:** ______________

**Date:** ______________

**Framework:** ______________ (Playwright / Cypress)

**Notes:** ________________________________________
|
||||
@@ -1,43 +1,455 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Test Framework Setup v3.0
|
||||
# Test Framework Setup
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/framework" name="Test Framework Setup">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Confirm `package.json` exists.</i>
|
||||
<i>- Verify no modern E2E harness is already configured.</i>
|
||||
<i>- Have architectural/stack context available.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Run Preflight Checks">
|
||||
<action>Validate each preflight requirement; stop immediately if any fail.</action>
|
||||
</step>
|
||||
<step n="2" title="Scaffold Framework">
|
||||
<action>Identify framework stack from `package.json` (React/Vue/Angular/Next.js) and bundler (Vite/Webpack/Rollup/esbuild).</action>
|
||||
<action>Select Playwright for large/perf-critical repos, Cypress for small DX-first teams.</action>
|
||||
<action>Create folders `{framework}/tests/`, `{framework}/support/fixtures/`, `{framework}/support/helpers/`.</action>
|
||||
<action>Configure timeouts (action 15s, navigation 30s, test 60s) and reporters (HTML + JUnit).</action>
|
||||
<action>Generate `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL` plus `.nvmrc`.</action>
|
||||
<action>Implement pure function → fixture → `mergeTests` pattern and faker-based data factories.</action>
|
||||
<action>Enable failure-only screenshots/videos and document setup in README.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Produce Playwright/Cypress scaffold (config + support tree), `.env.example`, `.nvmrc`, seed tests, and README instructions.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If prerequisites fail or an existing harness is detected, halt and notify the user.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify and load the `knowledge/` fragments relevant to this task (fixtures, network, config).</i>
|
||||
<i>Playwright: take advantage of worker parallelism, trace viewer, multi-language support.</i>
|
||||
<i>Cypress: avoid when dependent API chains are heavy; consider component testing (Vitest/Cypress CT).</i>
|
||||
<i>Contract testing: suggest Pact for microservices; always recommend data-cy/data-testid selectors.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>Scaffolded framework assets and summary of what was created.</i>
|
||||
</output>
|
||||
</task>
```
|
||||
**Workflow ID**: `bmad/bmm/testarch/framework`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Initialize a production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, configuration, and best practices. This workflow scaffolds the complete testing infrastructure for modern web applications.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ `package.json` exists in project root
|
||||
- ✅ No modern E2E test harness is already configured (check for existing `playwright.config.*` or `cypress.config.*`)
|
||||
- ✅ Architectural/stack context available (project type, bundler, dependencies)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Run Preflight Checks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Validate package.json**
|
||||
- Read `{project-root}/package.json`
|
||||
- Extract project type (React, Vue, Angular, Next.js, Node, etc.)
|
||||
- Identify bundler (Vite, Webpack, Rollup, esbuild)
|
||||
- Note existing test dependencies
|
||||
|
||||
2. **Check for Existing Framework**
|
||||
- Search for `playwright.config.*`, `cypress.config.*`, `cypress.json`
|
||||
- Check `package.json` for `@playwright/test` or `cypress` dependencies
|
||||
- If found, HALT with message: "Existing test framework detected. Use workflow `upgrade-framework` instead."
|
||||
|
||||
3. **Gather Context**
|
||||
- Look for architecture documents (`solution-architecture.md`, `tech-spec*.md`)
|
||||
- Check for API documentation or endpoint lists
|
||||
- Identify authentication requirements
|
||||
|
||||
**Halt Condition:** If preflight checks fail, stop immediately and report which requirement failed.
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Scaffold Framework
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Framework Selection**
|
||||
|
||||
**Default Logic:**
|
||||
- **Playwright** (recommended for):
|
||||
- Large repositories (100+ files)
|
||||
- Performance-critical applications
|
||||
- Multi-browser support needed
|
||||
- Complex user flows requiring video/trace debugging
|
||||
- Projects requiring worker parallelism
|
||||
|
||||
- **Cypress** (recommended for):
|
||||
- Small teams prioritizing developer experience
|
||||
- Component testing focus
|
||||
- Real-time reloading during test development
|
||||
- Simpler setup requirements
|
||||
|
||||
**Detection Strategy:**
|
||||
- Check `package.json` for existing preference
|
||||
- Consider `project_size` variable from workflow config
|
||||
- Use `framework_preference` variable if set
|
||||
- Default to **Playwright** if uncertain (see the detection sketch below)
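A minimal sketch of how this detection could be automated — the function name is illustrative and not part of the workflow itself:

```typescript
import { readFileSync } from 'node:fs';

type Framework = 'playwright' | 'cypress';

function detectFramework(
  pkgPath = 'package.json',
  preference: 'auto' | Framework = 'auto',
): Framework {
  if (preference !== 'auto') return preference; // explicit framework_preference wins

  const pkg = JSON.parse(readFileSync(pkgPath, 'utf8'));
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };

  if (deps['@playwright/test']) return 'playwright';
  if (deps['cypress']) return 'cypress';

  // Nothing detected: default to Playwright, per the logic above.
  return 'playwright';
}
```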
|
||||
|
||||
2. **Create Directory Structure**
|
||||
|
||||
```
|
||||
{project-root}/
|
||||
├── tests/ # Root test directory
|
||||
│ ├── e2e/ # Test files (users organize as needed)
|
||||
│ ├── support/ # Framework infrastructure (key pattern)
|
||||
│ │ ├── fixtures/ # Test fixtures (data, mocks)
|
||||
│ │ ├── helpers/ # Utility functions
|
||||
│ │ └── page-objects/ # Page object models (optional)
|
||||
│ └── README.md # Test suite documentation
|
||||
```
|
||||
|
||||
**Note**: Users organize test files (e2e/, api/, integration/, component/) as needed. The **support/** folder is the critical pattern for fixtures and helpers used across tests.
|
||||
|
||||
3. **Generate Configuration File**
|
||||
|
||||
**For Playwright** (`playwright.config.ts` or `playwright.config.js`):
|
||||
|
||||
```typescript
|
||||
import { defineConfig, devices } from '@playwright/test';
|
||||
|
||||
export default defineConfig({
|
||||
testDir: './tests/e2e',
|
||||
fullyParallel: true,
|
||||
forbidOnly: !!process.env.CI,
|
||||
retries: process.env.CI ? 2 : 0,
|
||||
workers: process.env.CI ? 1 : undefined,
|
||||
|
||||
timeout: 60 * 1000, // Test timeout: 60s
|
||||
expect: {
|
||||
timeout: 15 * 1000, // Assertion timeout: 15s
|
||||
},
|
||||
|
||||
use: {
|
||||
baseURL: process.env.BASE_URL || 'http://localhost:3000',
|
||||
trace: 'retain-on-failure',
|
||||
screenshot: 'only-on-failure',
|
||||
video: 'retain-on-failure',
|
||||
actionTimeout: 15 * 1000, // Action timeout: 15s
|
||||
navigationTimeout: 30 * 1000, // Navigation timeout: 30s
|
||||
},
|
||||
|
||||
reporter: [['html', { outputFolder: 'test-results/html' }], ['junit', { outputFile: 'test-results/junit.xml' }], ['list']],
|
||||
|
||||
projects: [
|
||||
{ name: 'chromium', use: { ...devices['Desktop Chrome'] } },
|
||||
{ name: 'firefox', use: { ...devices['Desktop Firefox'] } },
|
||||
{ name: 'webkit', use: { ...devices['Desktop Safari'] } },
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
**For Cypress** (`cypress.config.ts` or `cypress.config.js`):
|
||||
|
||||
```typescript
|
||||
import { defineConfig } from 'cypress';
|
||||
|
||||
export default defineConfig({
|
||||
e2e: {
|
||||
baseUrl: process.env.BASE_URL || 'http://localhost:3000',
|
||||
specPattern: 'tests/e2e/**/*.cy.{js,jsx,ts,tsx}',
|
||||
supportFile: 'tests/support/e2e.ts',
|
||||
video: false,
|
||||
screenshotOnRunFailure: true,
|
||||
|
||||
setupNodeEvents(on, config) {
|
||||
// implement node event listeners here
|
||||
},
|
||||
},
|
||||
|
||||
retries: {
|
||||
runMode: 2,
|
||||
openMode: 0,
|
||||
},
|
||||
|
||||
defaultCommandTimeout: 15000,
|
||||
requestTimeout: 30000,
|
||||
responseTimeout: 30000,
|
||||
pageLoadTimeout: 60000,
|
||||
});
|
||||
```
|
||||
|
||||
4. **Generate Environment Configuration**
|
||||
|
||||
Create `.env.example`:
|
||||
|
||||
```bash
|
||||
# Test Environment Configuration
|
||||
TEST_ENV=local
|
||||
BASE_URL=http://localhost:3000
|
||||
API_URL=http://localhost:3001/api
|
||||
|
||||
# Authentication (if applicable)
|
||||
TEST_USER_EMAIL=test@example.com
|
||||
TEST_USER_PASSWORD=
|
||||
|
||||
# Feature Flags (if applicable)
|
||||
FEATURE_FLAG_NEW_UI=true
|
||||
|
||||
# API Keys (if applicable)
|
||||
TEST_API_KEY=
|
||||
```
|
||||
|
||||
5. **Generate Node Version File**
|
||||
|
||||
Create `.nvmrc`:
|
||||
|
||||
```
|
||||
20.11.0
|
||||
```
|
||||
|
||||
(Use Node version from existing `.nvmrc` or default to current LTS)
|
||||
|
||||
6. **Implement Fixture Architecture**
|
||||
|
||||
**Knowledge Base Reference**: `testarch/knowledge/fixture-architecture.md`
|
||||
|
||||
Create `tests/support/fixtures/index.ts`:
|
||||
|
||||
```typescript
|
||||
import { test as base } from '@playwright/test';
|
||||
import { UserFactory } from './factories/user-factory';
|
||||
|
||||
type TestFixtures = {
|
||||
userFactory: UserFactory;
|
||||
};
|
||||
|
||||
export const test = base.extend<TestFixtures>({
|
||||
userFactory: async ({}, use) => {
|
||||
const factory = new UserFactory();
|
||||
await use(factory);
|
||||
await factory.cleanup(); // Auto-cleanup
|
||||
},
|
||||
});
|
||||
|
||||
export { expect } from '@playwright/test';
|
||||
```
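If the factory fixture above were split into its own module alongside other fixture files, Playwright's `mergeTests` composes them into a single `test` export. A sketch — the `api-fixtures` module below is hypothetical:

```typescript
import { mergeTests } from '@playwright/test';
import { test as factoryTest } from './user-fixtures'; // the factory fixture above, extracted to its own file
import { test as apiTest } from './api-fixtures'; // hypothetical second fixture module

// Compose independent fixture modules into one `test` object for specs to import.
export const test = mergeTests(factoryTest, apiTest);
export { expect } from '@playwright/test';
```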
|
||||
|
||||
7. **Implement Data Factories**
|
||||
|
||||
**Knowledge Base Reference**: `testarch/knowledge/data-factories.md`
|
||||
|
||||
Create `tests/support/fixtures/factories/user-factory.ts`:
|
||||
|
||||
```typescript
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export class UserFactory {
|
||||
private createdUsers: string[] = [];
|
||||
|
||||
async createUser(overrides = {}) {
|
||||
const user = {
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
password: faker.internet.password({ length: 12 }),
|
||||
...overrides,
|
||||
};
|
||||
|
||||
// API call to create user
|
||||
const response = await fetch(`${process.env.API_URL}/users`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify(user),
|
||||
});
|
||||
|
||||
const created = await response.json();
|
||||
this.createdUsers.push(created.id);
|
||||
return created;
|
||||
}
|
||||
|
||||
async cleanup() {
|
||||
// Delete all created users
|
||||
for (const userId of this.createdUsers) {
|
||||
await fetch(`${process.env.API_URL}/users/${userId}`, {
|
||||
method: 'DELETE',
|
||||
});
|
||||
}
|
||||
this.createdUsers = [];
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
8. **Generate Sample Tests**
|
||||
|
||||
Create `tests/e2e/example.spec.ts`:
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '../support/fixtures';
|
||||
|
||||
test.describe('Example Test Suite', () => {
|
||||
test('should load homepage', async ({ page }) => {
|
||||
await page.goto('/');
|
||||
await expect(page).toHaveTitle(/Home/i);
|
||||
});
|
||||
|
||||
test('should create user and login', async ({ page, userFactory }) => {
|
||||
// Create test user
|
||||
const user = await userFactory.createUser();
|
||||
|
||||
// Login
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email-input"]', user.email);
|
||||
await page.fill('[data-testid="password-input"]', user.password);
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// Assert login success
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
9. **Update package.json Scripts**
|
||||
|
||||
Add minimal test script to `package.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"test:e2e": "playwright test"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Note**: Users can add additional scripts as needed (e.g., `--ui`, `--headed`, `--debug`, `show-report`).
|
||||
|
||||
10. **Generate Documentation**
|
||||
|
||||
Create `tests/README.md` with setup instructions (see Step 3 deliverables).
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Deliverables
|
||||
|
||||
### Primary Artifacts Created
|
||||
|
||||
1. **Configuration File**
|
||||
- `playwright.config.ts` or `cypress.config.ts`
|
||||
- Timeouts: action 15s, navigation 30s, test 60s
|
||||
- Reporters: HTML + JUnit XML
|
||||
|
||||
2. **Directory Structure**
|
||||
- `tests/` with `e2e/`, `api/`, `support/` subdirectories
|
||||
- `support/fixtures/` for test fixtures
|
||||
- `support/helpers/` for utility functions
|
||||
|
||||
3. **Environment Configuration**
|
||||
- `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL`
|
||||
- `.nvmrc` with Node version
|
||||
|
||||
4. **Test Infrastructure**
|
||||
- Fixture architecture (`mergeTests` pattern)
|
||||
- Data factories (faker-based, with auto-cleanup)
|
||||
- Sample tests demonstrating patterns
|
||||
|
||||
5. **Documentation**
|
||||
- `tests/README.md` with setup instructions
|
||||
- Comments in config files explaining options
|
||||
|
||||
### README Contents
|
||||
|
||||
The generated `tests/README.md` should include:
|
||||
|
||||
- **Setup Instructions**: How to install dependencies, configure environment
|
||||
- **Running Tests**: Commands for local execution, headed mode, debug mode
|
||||
- **Architecture Overview**: Fixture pattern, data factories, page objects
|
||||
- **Best Practices**: Selector strategy (data-testid), test isolation, cleanup
|
||||
- **CI Integration**: How tests run in CI/CD pipeline
|
||||
- **Knowledge Base References**: Links to relevant TEA knowledge fragments
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify and load relevant knowledge fragments:
|
||||
|
||||
- `fixture-architecture.md` - Pure function → fixture → `mergeTests` composition with auto-cleanup (406 lines, 5 examples)
|
||||
- `data-factories.md` - Faker-based factories with overrides, nested factories, API seeding, auto-cleanup (498 lines, 5 examples)
|
||||
- `network-first.md` - Network-first testing safeguards: intercept before navigate, HAR capture, deterministic waiting (489 lines, 5 examples)
|
||||
- `playwright-config.md` - Playwright-specific configuration: environment-based, timeout standards, artifact output, parallelization, project config (722 lines, 5 examples)
|
||||
- `test-quality.md` - Test design principles: deterministic, isolated with cleanup, explicit assertions, length/time limits (658 lines, 5 examples)
|
||||
|
||||
### Framework-Specific Guidance
|
||||
|
||||
**Playwright Advantages:**
|
||||
|
||||
- Worker parallelism (significantly faster for large suites)
|
||||
- Trace viewer (powerful debugging with screenshots, network, console)
|
||||
- Multi-language support (TypeScript, JavaScript, Python, C#, Java)
|
||||
- Built-in API testing capabilities
|
||||
- Better handling of multiple browser contexts (see the sketch below)
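A short sketch of what multi-context handling looks like in practice — routes and flow are illustrative:

```typescript
import { test } from '@playwright/test';

test('two users interact in separate sessions', async ({ browser }) => {
  // Each context is an isolated session (cookies, storage), so one test can
  // drive two authenticated users side by side.
  const aliceContext = await browser.newContext();
  const bobContext = await browser.newContext();

  const alice = await aliceContext.newPage();
  const bob = await bobContext.newPage();

  await alice.goto('/login');
  await bob.goto('/login');
  // ...authenticate each user and exercise the cross-user flow...

  await aliceContext.close();
  await bobContext.close();
});
```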
|
||||
|
||||
**Cypress Advantages:**
|
||||
|
||||
- Superior developer experience (real-time reloading)
|
||||
- Excellent for component testing (Cypress CT or use Vitest)
|
||||
- Simpler setup for small teams
|
||||
- Better suited for watch mode during development
|
||||
|
||||
**Avoid Cypress when:**
|
||||
|
||||
- API chains are heavy and complex
|
||||
- Multi-tab/window scenarios are common
|
||||
- Worker parallelism is critical for CI performance
|
||||
|
||||
### Selector Strategy
|
||||
|
||||
**Always recommend**:
|
||||
|
||||
- `data-testid` attributes for UI elements
|
||||
- `data-cy` attributes if Cypress is chosen
|
||||
- Avoid brittle CSS selectors or XPath (see the example below)
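For example — the markup and test IDs here are illustrative:

```typescript
import { test, expect } from '@playwright/test';

// Markup under test (illustrative):
//   <button data-testid="submit-order">Place order</button>
test('submits an order', async ({ page }) => {
  await page.goto('/checkout');

  // Stable, behavior-agnostic selector — survives CSS and copy changes.
  await page.getByTestId('submit-order').click();
  await expect(page.getByTestId('order-confirmation')).toBeVisible();

  // A Cypress suite would target [data-cy="submit-order"] instead.
});
```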
|
||||
|
||||
### Contract Testing
|
||||
|
||||
For microservices architectures, **recommend Pact** for consumer-driven contract testing alongside E2E tests.
|
||||
|
||||
### Failure Artifacts
|
||||
|
||||
Configure **failure-only** capture:
|
||||
|
||||
- Screenshots: only on failure
|
||||
- Videos: retain on failure (delete on success)
|
||||
- Traces: retain on failure (Playwright)
|
||||
|
||||
This reduces storage overhead while maintaining debugging capability.
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## Framework Scaffold Complete
|
||||
|
||||
**Framework Selected**: Playwright (or Cypress)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Configuration file: `playwright.config.ts`
|
||||
- ✅ Directory structure: `tests/e2e/`, `tests/support/`
|
||||
- ✅ Environment config: `.env.example`
|
||||
- ✅ Node version: `.nvmrc`
|
||||
- ✅ Fixture architecture: `tests/support/fixtures/`
|
||||
- ✅ Data factories: `tests/support/fixtures/factories/`
|
||||
- ✅ Sample tests: `tests/e2e/example.spec.ts`
|
||||
- ✅ Documentation: `tests/README.md`
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Copy `.env.example` to `.env` and fill in environment variables
|
||||
2. Run `npm install` to install test dependencies
|
||||
3. Run `npm run test:e2e` to execute sample tests
|
||||
4. Review `tests/README.md` for detailed setup instructions
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture pattern (pure functions + mergeTests)
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Network-first testing safeguards
|
||||
- Failure-only artifact capture
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Configuration file created and valid
|
||||
- [ ] Directory structure exists
|
||||
- [ ] Environment configuration generated
|
||||
- [ ] Sample tests run successfully
|
||||
- [ ] Documentation complete and accurate
|
||||
- [ ] No errors or warnings during scaffold
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
@@ -1,25 +1,67 @@
|
||||
# Test Architect workflow: framework
|
||||
name: testarch-framework
|
||||
description: "Initialize or refresh the test framework harness."
|
||||
description: "Initialize production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, and configuration"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/framework"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
test_framework: "" # playwright or cypress - auto-detect from package.json or ask
|
||||
project_type: "" # react, vue, angular, next, node - detected from package.json
|
||||
bundler: "" # vite, webpack, rollup, esbuild - detected from package.json
|
||||
test_dir: "{project-root}/tests" # Root test directory
|
||||
config_file: "" # Will be set to {project-root}/{framework}.config.{ts|js}
|
||||
use_typescript: true # Prefer TypeScript configuration
|
||||
standalone_mode: true # Can run without story context
|
||||
|
||||
# Framework selection criteria
|
||||
framework_preference: "auto" # auto, playwright, cypress
|
||||
project_size: "auto" # auto, small, large - influences framework choice
|
||||
|
||||
# Output artifacts
|
||||
generate_env_example: true
|
||||
generate_nvmrc: true
|
||||
generate_readme: true
|
||||
generate_sample_tests: true
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{test_dir}/README.md" # Main deliverable is test setup README
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read package.json, existing configs
|
||||
- write_file # Create config files, helpers, fixtures, tests
|
||||
- create_directory # Create test directory structure
|
||||
- list_files # Check for existing framework
|
||||
- search_repo # Find architecture docs
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- package_json: "package.json with project dependencies and scripts"
|
||||
- architecture_docs: "Architecture or tech stack documentation (optional)"
|
||||
- existing_tests: "Existing test files to detect current framework (optional)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- setup
|
||||
- test-architect
|
||||
- framework
|
||||
- initialization
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts; auto-detect when possible
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
@@ -1,39 +0,0 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Quality Gate v3.0
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/gate" name="Quality Gate">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Latest assessments (risk/test design, trace, automation, NFR) are available.</i>
|
||||
<i>- Team has consensus on fixes/mitigations.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Gather required assessments and confirm consensus; halt if information is stale or missing.</action>
|
||||
</step>
|
||||
<step n="2" title="Determine Gate Decision">
|
||||
<action>Assemble story metadata (id, title, links) for the gate file.</action>
|
||||
<action>Apply deterministic rules: PASS (all critical issues resolved), CONCERNS (minor residual risk), FAIL (critical blockers), WAIVED (business-approved waiver).</action>
|
||||
<action>Document rationale, residual risks, owners, due dates, and waiver details where applicable.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Update gate YAML with schema fields (story info, status, rationale, waiver, top issues, risk summary, recommendations, NFR validation, history).</action>
|
||||
<action>Provide summary message for the team highlighting decision and next steps.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If reviews are incomplete or risk data is outdated, halt and request the necessary reruns.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Pull the risk-governance, probability-impact, and test-quality fragments via `{project-root}/bmad/bmm/testarch/tea-index.csv` before issuing a gate decision.</i>
|
||||
<i>FAIL whenever unresolved P0 risks/tests or security issues remain.</i>
|
||||
<i>CONCERNS when mitigations are planned but residual risk exists; WAIVED requires reason, approver, and expiry.</i>
|
||||
<i>Maintain audit trail in the history section.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>Gate YAML entry and communication summary documenting the decision.</i>
|
||||
</output>
|
||||
</task>
|
||||
```
|
||||
@@ -1,25 +0,0 @@
|
||||
# Test Architect workflow: gate
|
||||
name: testarch-gate
|
||||
description: "Record the quality gate decision for the story."
|
||||
author: "BMad"
|
||||
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/gate"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
|
||||
template: false
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- gate
|
||||
- test-architect
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
iterative: true
|
||||
469 src/modules/bmm/workflows/testarch/nfr-assess/README.md (Normal file)
@@ -0,0 +1,469 @@
|
||||
# Non-Functional Requirements Assessment Workflow
|
||||
|
||||
**Workflow ID:** `testarch-nfr`
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Command:** `bmad tea *nfr-assess`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The **nfr-assess** workflow performs a comprehensive assessment of non-functional requirements (NFRs) to validate that the implementation meets performance, security, reliability, and maintainability standards before release. It uses evidence-based validation with deterministic PASS/CONCERNS/FAIL rules and provides actionable recommendations for remediation.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- Assess multiple NFR categories (performance, security, reliability, maintainability, custom)
|
||||
- Validate NFRs against defined thresholds from tech specs, PRD, or defaults
|
||||
- Classify status deterministically (PASS/CONCERNS/FAIL) based on evidence
|
||||
- Never guess thresholds - mark as CONCERNS if unknown
|
||||
- Generate CI/CD-ready YAML snippets for quality gates
|
||||
- Provide quick wins and recommended actions for remediation
|
||||
- Create evidence checklists for gaps
|
||||
|
||||
---
|
||||
|
||||
## When to Use This Workflow
|
||||
|
||||
Use `*nfr-assess` when you need to:
|
||||
|
||||
- ✅ Validate non-functional requirements before release
|
||||
- ✅ Assess performance against defined thresholds
|
||||
- ✅ Verify security requirements are met
|
||||
- ✅ Validate reliability and error handling
|
||||
- ✅ Check maintainability standards (coverage, quality, documentation)
|
||||
- ✅ Generate NFR assessment reports for stakeholders
|
||||
- ✅ Create gate-ready metrics for CI/CD pipelines
|
||||
|
||||
**Typical Timing:**
|
||||
|
||||
- Before release (validate all NFRs)
|
||||
- Before PR merge (validate critical NFRs)
|
||||
- During sprint retrospectives (assess maintainability)
|
||||
- After performance testing (validate performance NFRs)
|
||||
- After security audit (validate security NFRs)
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Implementation deployed locally or accessible for evaluation
|
||||
- Evidence sources available (test results, metrics, logs, CI results)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- NFR requirements defined in tech-spec.md, PRD.md, or story
|
||||
- Test results from performance, security, reliability tests
|
||||
- Application metrics (response times, error rates, throughput)
|
||||
- CI/CD pipeline results for burn-in validation
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- NFR targets are undefined and cannot be obtained → Halt and request definition
|
||||
- Implementation is not accessible for evaluation → Halt and request deployment
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Usage (BMad Mode)
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess
|
||||
```
|
||||
|
||||
The workflow will:
|
||||
|
||||
1. Read tech-spec.md for NFR requirements
|
||||
2. Gather evidence from test results, metrics, logs
|
||||
3. Assess each NFR category against thresholds
|
||||
4. Generate NFR assessment report
|
||||
5. Save to `bmad/output/nfr-assessment.md`
|
||||
|
||||
### Standalone Mode (No Tech Spec)
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess --feature-name "User Authentication"
|
||||
```
|
||||
|
||||
### Custom Configuration
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess \
|
||||
--assess-performance true \
|
||||
--assess-security true \
|
||||
--assess-reliability true \
|
||||
--assess-maintainability true \
|
||||
--performance-response-time-ms 500 \
|
||||
--security-score-min 85
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
1. **Load Context** - Read tech spec, PRD, knowledge base fragments
|
||||
2. **Identify NFRs** - Determine categories and thresholds
|
||||
3. **Gather Evidence** - Read test results, metrics, logs, CI results
|
||||
4. **Assess NFRs** - Apply deterministic PASS/CONCERNS/FAIL rules
|
||||
5. **Identify Actions** - Quick wins, recommended actions, monitoring hooks
|
||||
6. **Generate Deliverables** - NFR assessment report, gate YAML, evidence checklist
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### NFR Assessment Report (`nfr-assessment.md`)
|
||||
|
||||
Comprehensive markdown file with:
|
||||
|
||||
- Executive summary (overall status, critical issues)
|
||||
- Assessment by category (performance, security, reliability, maintainability)
|
||||
- Evidence for each NFR (test results, metrics, thresholds)
|
||||
- Status classification (PASS/CONCERNS/FAIL)
|
||||
- Quick wins section
|
||||
- Recommended actions section
|
||||
- Evidence gaps checklist
|
||||
|
||||
### Gate YAML Snippet (Optional)
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
concerns: 1
|
||||
blockers: false
|
||||
```
|
||||
|
||||
### Evidence Checklist (Optional)
|
||||
|
||||
- List of NFRs with missing or incomplete evidence
|
||||
- Owners for evidence collection
|
||||
- Suggested evidence sources
|
||||
- Deadlines for evidence collection
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories
|
||||
|
||||
### Performance
|
||||
|
||||
**Criteria:** Response time, throughput, resource usage, scalability
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Response time p95: 500ms
|
||||
- Throughput: 100 RPS
|
||||
- CPU usage: < 70%
|
||||
- Memory usage: < 80%
|
||||
|
||||
**Evidence Sources:** Load test results, APM data, Lighthouse reports, Playwright traces
|
||||
|
||||
---
|
||||
|
||||
### Security
|
||||
|
||||
**Criteria:** Authentication, authorization, data protection, vulnerability management
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Security score: >= 85/100
|
||||
- Critical vulnerabilities: 0
|
||||
- High vulnerabilities: < 3
|
||||
- MFA enabled
|
||||
|
||||
**Evidence Sources:** SAST results, DAST results, dependency scanning, pentest reports
|
||||
|
||||
---
|
||||
|
||||
### Reliability
|
||||
|
||||
**Criteria:** Availability, error handling, fault tolerance, disaster recovery
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Uptime: >= 99.9%
|
||||
- Error rate: < 0.1%
|
||||
- MTTR: < 15 minutes
|
||||
- CI burn-in: 100 consecutive runs
|
||||
|
||||
**Evidence Sources:** Uptime monitoring, error logs, CI burn-in results, chaos tests
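One way to produce CI burn-in evidence is to re-run the suite repeatedly and stop at the first failure. A minimal Node sketch, assuming a Playwright suite and a run count supplied via a hypothetical `BURN_IN_RUNS` variable:

```typescript
// burn-in.ts — re-run the E2E suite N times and stop at the first failure.
import { execSync } from 'node:child_process';

const RUNS = Number(process.env.BURN_IN_RUNS ?? 100);

for (let i = 1; i <= RUNS; i++) {
  console.log(`Burn-in run ${i}/${RUNS}`);
  try {
    execSync('npx playwright test', { stdio: 'inherit' });
  } catch {
    console.error(`Failed on run ${i} — suite is not stable enough to pass burn-in.`);
    process.exit(1);
  }
}
console.log(`All ${RUNS} runs passed.`);
```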
|
||||
|
||||
---
|
||||
|
||||
### Maintainability
|
||||
|
||||
**Criteria:** Code quality, test coverage, documentation, technical debt
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Test coverage: >= 80%
|
||||
- Code quality: >= 85/100
|
||||
- Technical debt: < 5%
|
||||
- Documentation: >= 90%
|
||||
|
||||
**Evidence Sources:** Coverage reports, static analysis, documentation audit, test review
|
||||
|
||||
---
|
||||
|
||||
## Assessment Rules
|
||||
|
||||
### PASS ✅
|
||||
|
||||
- Evidence exists AND meets or exceeds threshold
|
||||
- No concerns flagged in evidence
|
||||
- Quality is acceptable
|
||||
|
||||
### CONCERNS ⚠️
|
||||
|
||||
- Threshold is UNKNOWN (not defined)
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
|
||||
### FAIL ❌
|
||||
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Quality is unacceptable
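A minimal sketch of how these rules could be applied mechanically. Two assumptions to note: "close to threshold" is interpreted as a 10% relative margin, and missing evidence takes the lenient CONCERNS branch — whether it is CONCERNS or FAIL in practice depends on how critical the NFR is:

```typescript
type NfrStatus = 'PASS' | 'CONCERNS' | 'FAIL';

interface NfrCheck {
  threshold?: number; // undefined = UNKNOWN (never guessed)
  measured?: number; // undefined = missing evidence
  higherIsBetter: boolean; // e.g. coverage: true, response time: false
}

// Deterministic classification mirroring the rules above.
function classify({ threshold, measured, higherIsBetter }: NfrCheck): NfrStatus {
  if (threshold === undefined) return 'CONCERNS'; // unknown threshold
  if (measured === undefined) return 'CONCERNS'; // missing/incomplete evidence

  const meets = higherIsBetter ? measured >= threshold : measured <= threshold;
  if (!meets) return 'FAIL';

  // Within 10% of the threshold counts as CONCERNS even when it passes.
  const margin = Math.abs(measured - threshold) / threshold;
  return margin < 0.1 ? 'CONCERNS' : 'PASS';
}

// Example: p95 response time 480ms against a 500ms ceiling → CONCERNS (within 10%).
classify({ threshold: 500, measured: 480, higherIsBetter: false });
```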
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### workflow.yaml Variables
|
||||
|
||||
```yaml
|
||||
variables:
|
||||
# NFR categories to assess
|
||||
assess_performance: true
|
||||
assess_security: true
|
||||
assess_reliability: true
|
||||
assess_maintainability: true
|
||||
|
||||
# Custom NFR categories
|
||||
custom_nfr_categories: '' # e.g., "accessibility,compliance"
|
||||
|
||||
# Evidence sources
|
||||
test_results_dir: '{project-root}/test-results'
|
||||
metrics_dir: '{project-root}/metrics'
|
||||
logs_dir: '{project-root}/logs'
|
||||
include_ci_results: true
|
||||
|
||||
# Thresholds
|
||||
performance_response_time_ms: 500
|
||||
performance_throughput_rps: 100
|
||||
security_score_min: 85
|
||||
reliability_uptime_pct: 99.9
|
||||
maintainability_coverage_pct: 80
|
||||
|
||||
# Assessment configuration
|
||||
use_deterministic_rules: true
|
||||
never_guess_thresholds: true
|
||||
require_evidence: true
|
||||
suggest_monitoring: true
|
||||
|
||||
# Output configuration
|
||||
output_file: '{output_folder}/nfr-assessment.md'
|
||||
generate_gate_yaml: true
|
||||
generate_evidence_checklist: true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base Integration
|
||||
|
||||
This workflow automatically loads relevant knowledge fragments:
|
||||
|
||||
- `nfr-criteria.md` - Non-functional requirements criteria
|
||||
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability
|
||||
- `test-quality.md` - Test quality expectations (maintainability)
|
||||
- `playwright-config.md` - Performance configuration patterns
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Full NFR Assessment Before Release
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
# NFR Assessment - Story 1.3
|
||||
|
||||
**Overall Status:** PASS ✅ (No blockers)
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
- Response Time p95: PASS ✅ (320ms < 500ms threshold)
|
||||
- Throughput: PASS ✅ (250 RPS > 100 RPS threshold)
|
||||
|
||||
## Security Assessment
|
||||
|
||||
- Authentication: PASS ✅ (MFA enforced)
|
||||
- Data Protection: PASS ✅ (AES-256 + TLS 1.3)
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
- Uptime: PASS ✅ (99.95% > 99.9% threshold)
|
||||
- Error Rate: PASS ✅ (0.05% < 0.1% threshold)
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
- Test Coverage: PASS ✅ (87% > 80% threshold)
|
||||
- Code Quality: PASS ✅ (92/100 > 85/100 threshold)
|
||||
|
||||
Gate Status: PASS ✅ - Ready for release
|
||||
```
|
||||
|
||||
### Example 2: NFR Assessment with Concerns
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess --feature-name "User Authentication"
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
# NFR Assessment - User Authentication
|
||||
|
||||
**Overall Status:** CONCERNS ⚠️ (1 HIGH issue)
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** CONCERNS ⚠️
|
||||
- **Threshold:** MFA enabled for all users
|
||||
- **Actual:** MFA optional (not enforced)
|
||||
- **Evidence:** Security audit (security-audit-2025-10-14.md)
|
||||
- **Recommendation:** HIGH - Enforce MFA for all new accounts
|
||||
|
||||
## Quick Wins
|
||||
|
||||
1. **Enforce MFA (Security)** - HIGH - 4 hours
|
||||
- Add configuration flag to enforce MFA
|
||||
- No code changes needed
|
||||
|
||||
Gate Status: CONCERNS ⚠️ - Address HIGH priority issues before release
|
||||
```
|
||||
|
||||
### Example 3: Performance-Only Assessment
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess \
|
||||
--assess-performance true \
|
||||
--assess-security false \
|
||||
--assess-reliability false \
|
||||
--assess-maintainability false
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "NFR thresholds not defined"
|
||||
|
||||
- Check tech-spec.md for NFR requirements
|
||||
- Check PRD.md for product-level SLAs
|
||||
- Check story file for feature-specific requirements
|
||||
- If thresholds truly unknown, mark as CONCERNS and recommend defining them
|
||||
|
||||
### "No evidence found"
|
||||
|
||||
- Check evidence directories (test-results, metrics, logs)
|
||||
- Check CI/CD pipeline for test results
|
||||
- If evidence truly missing, mark NFR as "NO EVIDENCE" and recommend generating it
|
||||
|
||||
### "CONCERNS status but no threshold exceeded"
|
||||
|
||||
- CONCERNS is correct when threshold is UNKNOWN or evidence is MISSING/INCOMPLETE
|
||||
- CONCERNS is also correct when evidence is close to threshold (within 10%)
|
||||
- Document why CONCERNS was assigned in assessment report
|
||||
|
||||
### "FAIL status blocks release"
|
||||
|
||||
- This is intentional - FAIL means critical NFR not met
|
||||
- Recommend remediation actions with specific steps
|
||||
- Re-run assessment after remediation
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
- **testarch-test-design** → `*nfr-assess` - Define NFR requirements, then assess
|
||||
- **testarch-framework** → `*nfr-assess` - Set up frameworks, then validate NFRs
|
||||
- **testarch-ci** → `*nfr-assess` - Configure CI, then assess reliability with burn-in
|
||||
- `*nfr-assess` → **testarch-trace (Phase 2)** - Assess NFRs, then apply quality gates
|
||||
- `*nfr-assess` → **testarch-test-review** - Assess maintainability, then review tests
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Never Guess Thresholds**
|
||||
- If threshold is unknown, mark as CONCERNS
|
||||
- Recommend defining threshold in tech-spec.md
|
||||
- Don't infer thresholds from similar features
|
||||
|
||||
2. **Evidence-Based Assessment**
|
||||
- Every assessment must be backed by evidence
|
||||
- Mark NFRs without evidence as "NO EVIDENCE"
|
||||
- Don't assume or infer - require explicit evidence
|
||||
|
||||
3. **Deterministic Rules**
|
||||
- Apply PASS/CONCERNS/FAIL consistently
|
||||
- Document reasoning for each classification
|
||||
- Use same rules across all NFR categories
|
||||
|
||||
4. **Actionable Recommendations**
|
||||
- Provide specific steps, not generic advice
|
||||
- Include priority, effort estimate, owner suggestion
|
||||
- Focus on quick wins first
|
||||
|
||||
5. **Gate Integration**
|
||||
- Enable `generate_gate_yaml` for CI/CD integration
|
||||
- Use YAML snippets in pipeline quality gates
|
||||
- Export metrics for dashboard visualization
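As an illustration, a pipeline step could consume the gate YAML snippet and fail the build on blockers. A sketch assuming the snippet is written to a file named `nfr-gate.yaml` and that `js-yaml` is installed:

```typescript
// check-gate.ts — fail the pipeline when the NFR gate is not clean.
import { readFileSync } from 'node:fs';
import { load } from 'js-yaml';

const gate = load(readFileSync('nfr-gate.yaml', 'utf8')) as {
  nfr_assessment: { overall_status: string; blockers: boolean };
};

const { overall_status, blockers } = gate.nfr_assessment;
if (blockers || overall_status === 'FAIL') {
  console.error(`NFR gate failed: ${overall_status}`);
  process.exit(1);
}
console.log(`NFR gate: ${overall_status}`);
```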
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
| Status | Criteria | Action |
|
||||
| ----------- | ---------------------------- | --------------------------- |
|
||||
| PASS ✅ | All NFRs have PASS status | Ready for release |
|
||||
| CONCERNS ⚠️ | Any NFR has CONCERNS status | Address before next release |
|
||||
| FAIL ❌ | Critical NFR has FAIL status | Do not release - BLOCKER |
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *test-design` - Define NFR requirements and test plan
|
||||
- `bmad tea *framework` - Set up performance/security testing frameworks
|
||||
- `bmad tea *ci` - Configure CI/CD for NFR validation
|
||||
- `bmad tea *trace` (Phase 2) - Apply quality gates using NFR assessment metrics
|
||||
- `bmad tea *test-review` - Review test quality (maintainability NFR)
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- [Instructions](./instructions.md) - Detailed workflow steps
|
||||
- [Checklist](./checklist.md) - Validation checklist
|
||||
- [Template](./nfr-report-template.md) - NFR assessment report template
|
||||
- [Knowledge Base](../../testarch/knowledge/) - NFR criteria and best practices
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
405 src/modules/bmm/workflows/testarch/nfr-assess/checklist.md (Normal file)
@@ -0,0 +1,405 @@
|
||||
# Non-Functional Requirements Assessment - Validation Checklist
|
||||
|
||||
**Workflow:** `testarch-nfr`
|
||||
**Purpose:** Ensure comprehensive and evidence-based NFR assessment with actionable recommendations
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites Validation
|
||||
|
||||
- [ ] Implementation is deployed and accessible for evaluation
|
||||
- [ ] Evidence sources are available (test results, metrics, logs, CI results)
|
||||
- [ ] NFR categories are determined (performance, security, reliability, maintainability, custom)
|
||||
- [ ] Evidence directories exist and are accessible (`test_results_dir`, `metrics_dir`, `logs_dir`)
|
||||
- [ ] Knowledge base is loaded (nfr-criteria, ci-burn-in, test-quality)
|
||||
|
||||
---
|
||||
|
||||
## Context Loading
|
||||
|
||||
- [ ] Tech-spec.md loaded successfully (if available)
|
||||
- [ ] PRD.md loaded (if available)
|
||||
- [ ] Story file loaded (if applicable)
|
||||
- [ ] Relevant knowledge fragments loaded from `tea-index.csv`:
|
||||
- [ ] `nfr-criteria.md`
|
||||
- [ ] `ci-burn-in.md`
|
||||
- [ ] `test-quality.md`
|
||||
- [ ] `playwright-config.md` (if using Playwright)
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories and Thresholds
|
||||
|
||||
### Performance
|
||||
|
||||
- [ ] Response time threshold defined or marked as UNKNOWN
|
||||
- [ ] Throughput threshold defined or marked as UNKNOWN
|
||||
- [ ] Resource usage thresholds defined or marked as UNKNOWN
|
||||
- [ ] Scalability requirements defined or marked as UNKNOWN
|
||||
|
||||
### Security
|
||||
|
||||
- [ ] Authentication requirements defined or marked as UNKNOWN
|
||||
- [ ] Authorization requirements defined or marked as UNKNOWN
|
||||
- [ ] Data protection requirements defined or marked as UNKNOWN
|
||||
- [ ] Vulnerability management thresholds defined or marked as UNKNOWN
|
||||
- [ ] Compliance requirements identified (GDPR, HIPAA, PCI-DSS, etc.)
|
||||
|
||||
### Reliability
|
||||
|
||||
- [ ] Availability (uptime) threshold defined or marked as UNKNOWN
|
||||
- [ ] Error rate threshold defined or marked as UNKNOWN
|
||||
- [ ] MTTR (Mean Time To Recovery) threshold defined or marked as UNKNOWN
|
||||
- [ ] Fault tolerance requirements defined or marked as UNKNOWN
|
||||
- [ ] Disaster recovery requirements defined (RTO, RPO) or marked as UNKNOWN
|
||||
|
||||
### Maintainability
|
||||
|
||||
- [ ] Test coverage threshold defined or marked as UNKNOWN
|
||||
- [ ] Code quality threshold defined or marked as UNKNOWN
|
||||
- [ ] Technical debt threshold defined or marked as UNKNOWN
|
||||
- [ ] Documentation completeness threshold defined or marked as UNKNOWN
|
||||
|
||||
### Custom NFR Categories (if applicable)
|
||||
|
||||
- [ ] Custom NFR category 1: Thresholds defined or marked as UNKNOWN
|
||||
- [ ] Custom NFR category 2: Thresholds defined or marked as UNKNOWN
|
||||
- [ ] Custom NFR category 3: Thresholds defined or marked as UNKNOWN
|
||||
|
||||
---
|
||||
|
||||
## Evidence Gathering
|
||||
|
||||
### Performance Evidence
|
||||
|
||||
- [ ] Load test results collected (JMeter, k6, Gatling, etc.)
|
||||
- [ ] Application metrics collected (response times, throughput, resource usage)
|
||||
- [ ] APM data collected (New Relic, Datadog, Dynatrace, etc.)
|
||||
- [ ] Lighthouse reports collected (if web app)
|
||||
- [ ] Playwright performance traces collected (if applicable)
|
||||
|
||||
### Security Evidence
|
||||
|
||||
- [ ] SAST results collected (SonarQube, Checkmarx, Veracode, etc.)
|
||||
- [ ] DAST results collected (OWASP ZAP, Burp Suite, etc.)
|
||||
- [ ] Dependency scanning results collected (Snyk, Dependabot, npm audit)
|
||||
- [ ] Penetration test reports collected (if available)
|
||||
- [ ] Security audit logs collected
|
||||
- [ ] Compliance audit results collected (if applicable)
|
||||
|
||||
### Reliability Evidence
|
||||
|
||||
- [ ] Uptime monitoring data collected (Pingdom, UptimeRobot, StatusCake)
|
||||
- [ ] Error logs collected
|
||||
- [ ] Error rate metrics collected
|
||||
- [ ] CI burn-in results collected (stability over time)
|
||||
- [ ] Chaos engineering test results collected (if available)
|
||||
- [ ] Failover/recovery test results collected (if available)
|
||||
- [ ] Incident reports and postmortems collected (if applicable)
|
||||
|
||||
### Maintainability Evidence
|
||||
|
||||
- [ ] Code coverage reports collected (Istanbul, NYC, c8, JaCoCo)
|
||||
- [ ] Static analysis results collected (ESLint, SonarQube, CodeClimate)
|
||||
- [ ] Technical debt metrics collected
|
||||
- [ ] Documentation audit results collected
|
||||
- [ ] Test review report collected (from test-review workflow, if available)
|
||||
- [ ] Git metrics collected (code churn, commit frequency, etc.)
|
||||
|
||||
---
|
||||
|
||||
## NFR Assessment with Deterministic Rules
|
||||
|
||||
### Performance Assessment
|
||||
|
||||
- [ ] Response time assessed against threshold
|
||||
- [ ] Throughput assessed against threshold
|
||||
- [ ] Resource usage assessed against threshold
|
||||
- [ ] Scalability assessed against requirements
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, metric name)
|
||||
|
||||
### Security Assessment
|
||||
|
||||
- [ ] Authentication strength assessed against requirements
|
||||
- [ ] Authorization controls assessed against requirements
|
||||
- [ ] Data protection assessed against requirements
|
||||
- [ ] Vulnerability management assessed against thresholds
|
||||
- [ ] Compliance assessed against requirements
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, scan result)
|
||||
|
||||
### Reliability Assessment
|
||||
|
||||
- [ ] Availability (uptime) assessed against threshold
|
||||
- [ ] Error rate assessed against threshold
|
||||
- [ ] MTTR assessed against threshold
|
||||
- [ ] Fault tolerance assessed against requirements
|
||||
- [ ] Disaster recovery assessed against requirements (RTO, RPO)
|
||||
- [ ] CI burn-in assessed (stability over time)
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, monitoring data)
|
||||
|
||||
### Maintainability Assessment
|
||||
|
||||
- [ ] Test coverage assessed against threshold
|
||||
- [ ] Code quality assessed against threshold
|
||||
- [ ] Technical debt assessed against threshold
|
||||
- [ ] Documentation completeness assessed against threshold
|
||||
- [ ] Test quality assessed (from test-review, if available)
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, coverage report)
|
||||
|
||||
### Custom NFR Assessment (if applicable)
|
||||
|
||||
- [ ] Custom NFR 1 assessed against threshold with justification
|
||||
- [ ] Custom NFR 2 assessed against threshold with justification
|
||||
- [ ] Custom NFR 3 assessed against threshold with justification
|
||||
|
||||
---
|
||||
|
||||
## Status Classification Validation
|
||||
|
||||
### PASS Criteria Verified
|
||||
|
||||
- [ ] Evidence exists for PASS status
|
||||
- [ ] Evidence meets or exceeds threshold
|
||||
- [ ] No concerns flagged in evidence
|
||||
- [ ] Quality is acceptable
|
||||
|
||||
### CONCERNS Criteria Verified
|
||||
|
||||
- [ ] Threshold is UNKNOWN (documented) OR
|
||||
- [ ] Evidence is MISSING or INCOMPLETE (documented) OR
|
||||
- [ ] Evidence is close to threshold (within 10%, documented) OR
|
||||
- [ ] Evidence shows intermittent issues (documented)
|
||||
|
||||
### FAIL Criteria Verified
|
||||
|
||||
- [ ] Evidence exists BUT does not meet threshold (documented) OR
|
||||
- [ ] Critical evidence is MISSING (documented) OR
|
||||
- [ ] Evidence shows consistent failures (documented) OR
|
||||
- [ ] Quality is unacceptable (documented)
|
||||
|
||||
### No Threshold Guessing
|
||||
|
||||
- [ ] All thresholds are either defined or marked as UNKNOWN
|
||||
- [ ] No thresholds were guessed or inferred
|
||||
- [ ] All UNKNOWN thresholds result in CONCERNS status
|
||||
|
||||
---
|
||||
|
||||
## Quick Wins and Recommended Actions
|
||||
|
||||
### Quick Wins Identified
|
||||
|
||||
- [ ] Low-effort, high-impact improvements identified for CONCERNS/FAIL
|
||||
- [ ] Configuration changes (no code changes) identified
|
||||
- [ ] Optimization opportunities identified (caching, indexing, compression)
|
||||
- [ ] Monitoring additions identified (detect issues before failures)
|
||||
|
||||
### Recommended Actions
|
||||
|
||||
- [ ] Specific remediation steps provided (not generic advice)
|
||||
- [ ] Priority assigned (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- [ ] Estimated effort provided (hours, days)
|
||||
- [ ] Owner suggestions provided (dev, ops, security)
|
||||
|
||||
### Monitoring Hooks
|
||||
|
||||
- [ ] Performance monitoring suggested (APM, synthetic monitoring)
|
||||
- [ ] Error tracking suggested (Sentry, Rollbar, error logs)
|
||||
- [ ] Security monitoring suggested (intrusion detection, audit logs)
|
||||
- [ ] Alerting thresholds suggested (notify before breach)
|
||||
|
||||
### Fail-Fast Mechanisms
|
||||
|
||||
- [ ] Circuit breakers suggested for reliability
|
||||
- [ ] Rate limiting suggested for performance
|
||||
- [ ] Validation gates suggested for security
|
||||
- [ ] Smoke tests suggested for maintainability
|
||||
|
||||
---
|
||||
|
||||
## Deliverables Generated
|
||||
|
||||
### NFR Assessment Report
|
||||
|
||||
- [ ] File created at `{output_folder}/nfr-assessment.md`
|
||||
- [ ] Template from `nfr-report-template.md` used
|
||||
- [ ] Executive summary included (overall status, critical issues)
|
||||
- [ ] Assessment by category included (performance, security, reliability, maintainability)
|
||||
- [ ] Evidence for each NFR documented
|
||||
- [ ] Status classifications documented (PASS/CONCERNS/FAIL)
|
||||
- [ ] Findings summary included (PASS count, CONCERNS count, FAIL count)
|
||||
- [ ] Quick wins section included
|
||||
- [ ] Recommended actions section included
|
||||
- [ ] Evidence gaps checklist included
|
||||
|
||||
### Gate YAML Snippet (if enabled)
|
||||
|
||||
- [ ] YAML snippet generated
|
||||
- [ ] Date included
|
||||
- [ ] Categories status included (performance, security, reliability, maintainability)
|
||||
- [ ] Overall status included (PASS/CONCERNS/FAIL)
|
||||
- [ ] Issue counts included (critical, high, medium, concerns)
|
||||
- [ ] Blockers flag included (true/false)
|
||||
- [ ] Recommendations included
|
||||
|
||||
### Evidence Checklist (if enabled)
|
||||
|
||||
- [ ] All NFRs with MISSING or INCOMPLETE evidence listed
|
||||
- [ ] Owners assigned for evidence collection
|
||||
- [ ] Suggested evidence sources provided
|
||||
- [ ] Deadlines set for evidence collection
|
||||
|
||||
### Updated Story File (if enabled and requested)
|
||||
|
||||
- [ ] "NFR Assessment" section added to story markdown
|
||||
- [ ] Link to NFR assessment report included
|
||||
- [ ] Overall status and critical issues included
|
||||
- [ ] Gate status included
|
||||
|
||||
---
|
||||
|
||||
## Quality Assurance
|
||||
|
||||
### Accuracy Checks
|
||||
|
||||
- [ ] All NFR categories assessed (none skipped)
|
||||
- [ ] All thresholds documented (defined or UNKNOWN)
|
||||
- [ ] All evidence sources documented (file paths, metric names)
|
||||
- [ ] Status classifications are deterministic and consistent
|
||||
- [ ] No false positives (status correctly assigned)
|
||||
- [ ] No false negatives (all issues identified)
|
||||
|
||||
### Completeness Checks
|
||||
|
||||
- [ ] All NFR categories covered (performance, security, reliability, maintainability, custom)
|
||||
- [ ] All evidence sources checked (test results, metrics, logs, CI results)
|
||||
- [ ] All status types used appropriately (PASS, CONCERNS, FAIL)
|
||||
- [ ] All NFRs with CONCERNS/FAIL have recommendations
|
||||
- [ ] All evidence gaps have owners and deadlines
|
||||
|
||||
### Actionability Checks
|
||||
|
||||
- [ ] Recommendations are specific (not generic)
|
||||
- [ ] Remediation steps are clear and actionable
|
||||
- [ ] Priorities are assigned (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- [ ] Effort estimates are provided (hours, days)
|
||||
- [ ] Owners are suggested (dev, ops, security)
|
||||
|
||||
---
|
||||
|
||||
## Integration with BMad Artifacts
|
||||
|
||||
### With tech-spec.md
|
||||
|
||||
- [ ] Tech spec loaded for NFR requirements and thresholds
|
||||
- [ ] Performance targets extracted
|
||||
- [ ] Security requirements extracted
|
||||
- [ ] Reliability SLAs extracted
|
||||
- [ ] Architectural decisions considered
|
||||
|
||||
### With test-design.md
|
||||
|
||||
- [ ] Test design loaded for NFR test plan
|
||||
- [ ] Test priorities referenced (P0/P1/P2/P3)
|
||||
- [ ] Assessment aligned with planned NFR validation
|
||||
|
||||
### With PRD.md
|
||||
|
||||
- [ ] PRD loaded for product-level NFR context
|
||||
- [ ] User experience goals considered
|
||||
- [ ] Unstated requirements checked
|
||||
- [ ] Product-level SLAs referenced
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates Validation
|
||||
|
||||
### Release Blocker (FAIL)
|
||||
|
||||
- [ ] Critical NFR status checked (security, reliability)
|
||||
- [ ] Performance failures assessed for user impact
|
||||
- [ ] Release blocker flagged if critical NFR has FAIL status
|
||||
|
||||
### PR Blocker (HIGH CONCERNS)
|
||||
|
||||
- [ ] High-priority NFR status checked
|
||||
- [ ] Multiple CONCERNS assessed
|
||||
- [ ] PR blocker flagged if HIGH priority issues exist
|
||||
|
||||
### Warning (CONCERNS)
|
||||
|
||||
- [ ] Any NFR with CONCERNS status flagged
|
||||
- [ ] Missing or incomplete evidence documented
|
||||
- [ ] Warning issued to address before next release
|
||||
|
||||
### Pass (PASS)
|
||||
|
||||
- [ ] All NFRs have PASS status
|
||||
- [ ] No blockers or concerns exist
|
||||
- [ ] Ready for release confirmed
|
||||
|
||||
---
|
||||
|
||||
## Non-Prescriptive Validation
|
||||
|
||||
- [ ] NFR categories adapted to team needs
|
||||
- [ ] Thresholds appropriate for project context
|
||||
- [ ] Assessment criteria customized as needed
|
||||
- [ ] Teams can extend with custom NFR categories
|
||||
- [ ] Integration with external tools supported (New Relic, Datadog, SonarQube, JIRA)
|
||||
|
||||
---
|
||||
|
||||
## Documentation and Communication
|
||||
|
||||
- [ ] NFR assessment report is readable and well-formatted
|
||||
- [ ] Tables render correctly in markdown
|
||||
- [ ] Code blocks have proper syntax highlighting
|
||||
- [ ] Links are valid and accessible
|
||||
- [ ] Recommendations are clear and prioritized
|
||||
- [ ] Overall status is prominent and unambiguous
|
||||
- [ ] Executive summary provides quick understanding
|
||||
|
||||
---
|
||||
|
||||
## Final Validation
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All NFR categories assessed with evidence (or gaps documented)
|
||||
- [ ] No thresholds were guessed (all defined or UNKNOWN)
|
||||
- [ ] Status classifications are deterministic and justified
|
||||
- [ ] Quick wins identified for all CONCERNS/FAIL
|
||||
- [ ] Recommended actions are specific and actionable
|
||||
- [ ] Evidence gaps documented with owners and deadlines
|
||||
- [ ] NFR assessment report generated and saved
|
||||
- [ ] Gate YAML snippet generated (if enabled)
|
||||
- [ ] Evidence checklist generated (if enabled)
|
||||
- [ ] Workflow completed successfully
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**NFR Assessment Status:**
|
||||
|
||||
- [ ] ✅ PASS - All NFRs meet requirements, ready for release
|
||||
- [ ] ⚠️ CONCERNS - Some NFRs have concerns, address before next release
|
||||
- [ ] ❌ FAIL - Critical NFRs not met, BLOCKER for release
|
||||
|
||||
**Next Actions:**
|
||||
|
||||
- If PASS ✅: Proceed to `*gate` workflow or release
|
||||
- If CONCERNS ⚠️: Address HIGH/CRITICAL issues, re-run `*nfr-assess`
|
||||
- If FAIL ❌: Resolve FAIL status NFRs, re-run `*nfr-assess`
|
||||
|
||||
**Critical Issues:** {COUNT}
|
||||
**High Priority Issues:** {COUNT}
|
||||
**Concerns:** {COUNT}
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
@@ -1,39 +1,722 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
# Non-Functional Requirements Assessment - Instructions v4.0
|
||||
|
||||
# NFR Assessment v3.0
|
||||
**Workflow:** `testarch-nfr`
|
||||
**Purpose:** Assess non-functional requirements (performance, security, reliability, maintainability) before release with evidence-based validation
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Format:** Pure Markdown v4.0 (no XML blocks)
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/nfr-assess" name="NFR Assessment">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Implementation is deployed locally or accessible for evaluation.</i>
|
||||
<i>- Non-functional goals/SLAs are defined or discoverable.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Confirm prerequisites; halt if targets are unknown and cannot be clarified.</action>
|
||||
</step>
|
||||
<step n="2" title="Assess NFRs">
|
||||
<action>Identify which NFRs to assess (default: Security, Performance, Reliability, Maintainability).</action>
|
||||
<action>Gather thresholds from story/architecture/technical preferences; mark unknown targets.</action>
|
||||
<action>Inspect evidence (tests, telemetry, logs) for each NFR and classify status using deterministic PASS/CONCERNS/FAIL rules.</action>
|
||||
<action>List quick wins and recommended actions for any concerns/failures.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Produce NFR assessment markdown summarizing evidence, status, and actions; update gate YAML block with NFR findings; compile checklist of evidence gaps and owners.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If NFR targets are undefined and cannot be obtained, halt and request definition.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Load the `nfr-criteria`, `ci-burn-in`, and relevant fragments via `{project-root}/bmad/bmm/testarch/tea-index.csv` to ground the assessment.</i>
|
||||
<i>Unknown thresholds default to CONCERNS—never guess.</i>
|
||||
<i>Ensure every NFR has evidence or call it out explicitly.</i>
|
||||
<i>Suggest monitoring hooks and fail-fast mechanisms when gaps exist.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>NFR assessment report with actionable follow-ups and gate snippet.</i>
|
||||
</output>
|
||||
</task>
```
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow performs a comprehensive assessment of non-functional requirements (NFRs) to validate that the implementation meets performance, security, reliability, and maintainability standards before release. It uses evidence-based validation with deterministic PASS/CONCERNS/FAIL rules and provides actionable recommendations for remediation.
|
||||
|
||||
**Key Capabilities:**
|
||||
|
||||
- Assess multiple NFR categories (performance, security, reliability, maintainability, custom)
|
||||
- Validate NFRs against defined thresholds from tech specs, PRD, or defaults
|
||||
- Classify status deterministically (PASS/CONCERNS/FAIL) based on evidence
|
||||
- Never guess thresholds - mark as CONCERNS if unknown
|
||||
- Generate gate-ready YAML snippets for CI/CD integration
|
||||
- Provide quick wins and recommended actions for remediation
|
||||
- Create evidence checklists for gaps
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Implementation deployed locally or accessible for evaluation
|
||||
- Evidence sources available (test results, metrics, logs, CI results)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- NFR requirements defined in tech-spec.md, PRD.md, or story
|
||||
- Test results from performance, security, reliability tests
|
||||
- Application metrics (response times, error rates, throughput)
|
||||
- CI/CD pipeline results for burn-in validation
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- If NFR targets are undefined and cannot be obtained, halt and request definition
|
||||
- If implementation is not accessible for evaluation, halt and request deployment
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### Step 1: Load Context and Knowledge Base
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
|
||||
- `nfr-criteria.md` - Non-functional requirements criteria and thresholds (security, performance, reliability, maintainability with code examples, 658 lines, 4 examples)
|
||||
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability validation (10-iteration detection, sharding, selective execution, 678 lines, 4 examples)
|
||||
- `test-quality.md` - Test quality expectations for maintainability (deterministic, isolated, explicit assertions, length/time limits, 658 lines, 5 examples)
|
||||
- `playwright-config.md` - Performance configuration patterns: parallelization, timeout standards, artifact output (722 lines, 5 examples)
|
||||
- `error-handling.md` - Reliability validation patterns: scoped exceptions, retry validation, telemetry logging, graceful degradation (736 lines, 4 examples)
|
||||
|
||||
2. Read story file (if provided):
|
||||
- Extract NFR requirements
|
||||
- Identify specific thresholds or SLAs
|
||||
- Note any custom NFR categories
|
||||
|
||||
3. Read related BMad artifacts (if available):
|
||||
- `tech-spec.md` - Technical NFR requirements and targets
|
||||
- `PRD.md` - Product-level NFR context (user expectations)
|
||||
- `test-design.md` - NFR test plan and priorities
|
||||
|
||||
**Output:** Complete understanding of NFR targets, evidence sources, and validation criteria
|
||||
|
||||
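As a rough sketch of how an automation step might resolve these fragment paths from `tea-index.csv`, assuming the index exposes `id` and `path` columns (the column names and naive CSV parsing are assumptions, not part of the workflow contract):

```typescript
// Illustrative sketch only: assumes tea-index.csv has header columns `id,path,...`
// and uses naive CSV splitting (values containing commas would need a real parser).
import { readFileSync } from "node:fs";
import { join } from "node:path";

const WANTED = ["nfr-criteria", "ci-burn-in", "test-quality", "playwright-config", "error-handling"];

function loadFragmentPaths(projectRoot: string): string[] {
  const csv = readFileSync(join(projectRoot, "bmad/bmm/testarch/tea-index.csv"), "utf8");
  const [header, ...rows] = csv.trim().split("\n");
  const cols = header.split(",").map((c) => c.trim());
  const idIdx = cols.indexOf("id");
  const pathIdx = cols.indexOf("path");
  return rows
    .map((row) => row.split(","))
    .filter((cells) => WANTED.includes(cells[idIdx]?.trim()))
    .map((cells) => join(projectRoot, cells[pathIdx].trim()));
}
```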
---
|
||||
|
||||
### Step 2: Identify NFR Categories and Thresholds
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Determine which NFR categories to assess (default: performance, security, reliability, maintainability):
|
||||
- **Performance**: Response time, throughput, resource usage
|
||||
- **Security**: Authentication, authorization, data protection, vulnerability scanning
|
||||
- **Reliability**: Error handling, recovery, availability, fault tolerance
|
||||
- **Maintainability**: Code quality, test coverage, documentation, technical debt
|
||||
|
||||
2. Add custom NFR categories if specified (e.g., accessibility, internationalization, compliance)
|
||||
|
||||
3. Gather thresholds for each NFR:
|
||||
- From tech-spec.md (primary source)
|
||||
- From PRD.md (product-level SLAs)
|
||||
- From story file (feature-specific requirements)
|
||||
- From workflow variables (default thresholds)
|
||||
- Mark thresholds as UNKNOWN if not defined
|
||||
|
||||
4. Never guess thresholds - if a threshold is unknown, mark the NFR as CONCERNS
|
||||
|
||||
**Output:** Complete list of NFRs to assess with defined (or UNKNOWN) thresholds
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Gather Evidence
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR category, discover evidence sources:
|
||||
|
||||
**Performance Evidence:**
|
||||
- Load test results (JMeter, k6, Lighthouse)
|
||||
- Application metrics (response times, throughput, resource usage)
|
||||
- Performance monitoring data (New Relic, Datadog, APM)
|
||||
- Playwright performance traces (if applicable)
|
||||
|
||||
**Security Evidence:**
|
||||
- Security scan results (SAST, DAST, dependency scanning)
|
||||
- Authentication/authorization test results
|
||||
- Penetration test reports
|
||||
- Vulnerability assessment reports
|
||||
- Compliance audit results
|
||||
|
||||
**Reliability Evidence:**
|
||||
- Error logs and error rates
|
||||
- Uptime monitoring data
|
||||
- Chaos engineering test results
|
||||
- Failover/recovery test results
|
||||
- CI burn-in results (stability over time)
|
||||
|
||||
**Maintainability Evidence:**
|
||||
- Code coverage reports (Istanbul, NYC, c8)
|
||||
- Static analysis results (ESLint, SonarQube)
|
||||
- Technical debt metrics
|
||||
- Documentation completeness
|
||||
- Test quality assessment (from test-review workflow)
|
||||
|
||||
2. Read relevant files from evidence directories:
|
||||
- `{test_results_dir}` for test execution results
|
||||
- `{metrics_dir}` for application metrics
|
||||
- `{logs_dir}` for application logs
|
||||
- CI/CD pipeline results (if `include_ci_results` is true)
|
||||
|
||||
3. Mark NFRs without evidence as "NO EVIDENCE" - never infer or assume
|
||||
|
||||
**Output:** Comprehensive evidence inventory for each NFR
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Assess NFRs with Deterministic Rules
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR, apply deterministic PASS/CONCERNS/FAIL rules:
|
||||
|
||||
**PASS Criteria:**
|
||||
- Evidence exists AND meets defined threshold
|
||||
- No concerns flagged in evidence
|
||||
- Example: Response time is 350ms (threshold: 500ms) → PASS
|
||||
|
||||
**CONCERNS Criteria:**
|
||||
- Threshold is UNKNOWN (not defined)
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
- Example: Response time is 480ms (threshold: 500ms, 96% of threshold) → CONCERNS
|
||||
|
||||
**FAIL Criteria:**
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Example: Response time is 750ms (threshold: 500ms) → FAIL
|
||||
|
||||
2. Document findings for each NFR:
|
||||
- Status (PASS/CONCERNS/FAIL)
|
||||
- Evidence source (file path, test name, metric name)
|
||||
- Actual value vs threshold
|
||||
- Justification for status classification
|
||||
|
||||
3. Classify severity based on category:
|
||||
- **CRITICAL**: Security failures, reliability failures (affect users immediately)
|
||||
- **HIGH**: Performance failures, maintainability failures (affect users soon)
|
||||
- **MEDIUM**: Concerns without failures (may affect users eventually)
|
||||
- **LOW**: Missing evidence for non-critical NFRs
|
||||
|
||||
**Output:** Complete NFR assessment with deterministic status classifications
|
||||
|
||||
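The classification rules above can be expressed as a small, deterministic function. This is a minimal sketch assuming numeric evidence and a single threshold per NFR; the type names and shape are illustrative:

```typescript
type NfrStatus = "PASS" | "CONCERNS" | "FAIL";

interface NfrEvidence {
  actual?: number;         // measured value, e.g. p95 latency in ms; undefined means missing evidence
  threshold?: number;      // defined target; undefined means UNKNOWN
  lowerIsBetter?: boolean; // true for latency/error rate, false for throughput/coverage
}

// Deterministic classification mirroring the rules above:
// UNKNOWN threshold or missing evidence -> CONCERNS; within 10% of threshold -> CONCERNS.
function classifyNfr({ actual, threshold, lowerIsBetter = true }: NfrEvidence): NfrStatus {
  if (threshold === undefined || actual === undefined) return "CONCERNS";
  const meets = lowerIsBetter ? actual <= threshold : actual >= threshold;
  if (!meets) return "FAIL";
  const margin = Math.abs(threshold - actual) / threshold;
  return margin < 0.1 ? "CONCERNS" : "PASS";
}

console.log(classifyNfr({ actual: 350, threshold: 500 })); // "PASS"
console.log(classifyNfr({ actual: 480, threshold: 500 })); // "CONCERNS" (within 10%)
console.log(classifyNfr({ actual: 750, threshold: 500 })); // "FAIL"
```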
---
|
||||
|
||||
### Step 5: Identify Quick Wins and Recommended Actions
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR with CONCERNS or FAIL status, identify quick wins:
|
||||
- Low-effort, high-impact improvements
|
||||
- Configuration changes (no code changes needed)
|
||||
- Optimization opportunities (caching, indexing, compression)
|
||||
- Monitoring additions (detect issues before they become failures)
|
||||
|
||||
2. Provide recommended actions for each issue:
|
||||
- Specific steps to remediate (not generic advice)
|
||||
- Priority (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- Estimated effort (hours, days)
|
||||
- Owner suggestion (dev, ops, security)
|
||||
|
||||
3. Suggest monitoring hooks for gaps:
|
||||
- Add performance monitoring (APM, synthetic monitoring)
|
||||
- Add error tracking (Sentry, Rollbar, error logs)
|
||||
- Add security monitoring (intrusion detection, audit logs)
|
||||
- Add alerting thresholds (notify before thresholds are breached)
|
||||
|
||||
4. Suggest fail-fast mechanisms:
|
||||
- Add circuit breakers for reliability
|
||||
- Add rate limiting for performance
|
||||
- Add validation gates for security
|
||||
- Add smoke tests for maintainability
|
||||
|
||||
**Output:** Actionable remediation plan with prioritized recommendations
|
||||
|
||||
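As a hedged illustration of the "circuit breaker" fail-fast suggestion above, a minimal sketch might look like the following; it is framework-agnostic and not a production implementation:

```typescript
// Minimal circuit-breaker sketch: after maxFailures consecutive errors, fail fast
// for resetMs instead of waiting on a degraded dependency.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures = 5, private resetMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.maxFailures && Date.now() - this.openedAt < this.resetMs) {
      throw new Error("Circuit open - failing fast instead of calling the dependency");
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }
}
```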
---
|
||||
|
||||
### Step 6: Generate Deliverables
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Create NFR assessment markdown file:
|
||||
- Use template from `nfr-report-template.md`
|
||||
- Include executive summary (overall status, critical issues)
|
||||
- Add NFR-by-NFR assessment (status, evidence, thresholds)
|
||||
- Add findings summary (PASS count, CONCERNS count, FAIL count)
|
||||
- Add quick wins section
|
||||
- Add recommended actions section
|
||||
- Add evidence gaps checklist
|
||||
- Save to `{output_folder}/nfr-assessment.md`
|
||||
|
||||
2. Generate gate YAML snippet (if enabled):
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
concerns: 2
|
||||
blockers: false
|
||||
```
|
||||
|
||||
3. Generate evidence checklist (if enabled):
|
||||
- List all NFRs with MISSING or INCOMPLETE evidence
|
||||
- Assign owners for evidence collection
|
||||
- Suggest evidence sources (tests, metrics, logs)
|
||||
- Set deadlines for evidence collection
|
||||
|
||||
4. Update story file (if enabled and requested):
|
||||
- Add "NFR Assessment" section to story markdown
|
||||
- Link to NFR assessment report
|
||||
- Include overall status and critical issues
|
||||
- Add gate status
|
||||
|
||||
**Output:** Complete NFR assessment documentation ready for review and CI/CD integration
|
||||
|
||||
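To show how the gate YAML snippet can be consumed by a pipeline, here is a minimal CI check; the file name `nfr-gate.yaml` and the use of the `js-yaml` parser are assumptions, not part of the workflow contract:

```typescript
import { readFileSync } from "node:fs";
import { load } from "js-yaml";

interface NfrGate {
  nfr_assessment: { overall_status: string; blockers: boolean; critical_issues: number };
}

// Read the generated snippet and block the pipeline on FAIL status or blockers.
const gate = load(readFileSync("nfr-gate.yaml", "utf8")) as NfrGate;
const { overall_status, blockers, critical_issues } = gate.nfr_assessment;

if (blockers || overall_status === "FAIL" || critical_issues > 0) {
  console.error(`NFR gate failed: status=${overall_status}, critical=${critical_issues}`);
  process.exit(1);
}
console.log(`NFR gate: ${overall_status} - proceeding`);
```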
---
|
||||
|
||||
## Non-Prescriptive Approach
|
||||
|
||||
**Minimal Examples:** This workflow provides principles and patterns, not rigid templates. Teams should adapt NFR categories, thresholds, and assessment criteria to their needs.
|
||||
|
||||
**Key Patterns to Follow:**
|
||||
|
||||
- Use evidence-based validation (no guessing or inference)
|
||||
- Apply deterministic rules (consistent PASS/CONCERNS/FAIL classification)
|
||||
- Never guess thresholds (mark as CONCERNS if unknown)
|
||||
- Provide actionable recommendations (specific steps, not generic advice)
|
||||
- Generate gate-ready artifacts (YAML snippets for CI/CD)
|
||||
|
||||
**Extend as Needed:**
|
||||
|
||||
- Add custom NFR categories (accessibility, internationalization, compliance)
|
||||
- Integrate with external tools (New Relic, Datadog, SonarQube, JIRA)
|
||||
- Add custom thresholds and rules
|
||||
- Link to external assessment systems
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories and Criteria
|
||||
|
||||
### Performance
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Response time (p50, p95, p99 percentiles)
|
||||
- Throughput (requests per second, transactions per second)
|
||||
- Resource usage (CPU, memory, disk, network)
|
||||
- Scalability (horizontal, vertical)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Response time p95: 500ms
|
||||
- Throughput: 100 RPS
|
||||
- CPU usage: < 70% average
|
||||
- Memory usage: < 80% max
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Load test results (JMeter, k6, Gatling)
|
||||
- APM data (New Relic, Datadog, Dynatrace)
|
||||
- Lighthouse reports (for web apps)
|
||||
- Playwright performance traces
|
||||
|
||||
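For example, a small script might compare a k6 `--summary-export` file against the p95 target; the metric layout shown (`"p(95)"` under `metrics.http_req_duration`) is an assumption about the export format, and the file path is hypothetical:

```typescript
import { readFileSync } from "node:fs";

const THRESHOLD_P95_MS = 500;
const summary = JSON.parse(readFileSync("test-results/load-summary.json", "utf8"));
const p95 = summary.metrics?.http_req_duration?.["p(95)"];

if (p95 === undefined) {
  console.warn("No p95 evidence found - mark Response Time as CONCERNS");
} else {
  console.log(`p95=${p95}ms vs ${THRESHOLD_P95_MS}ms ->`, p95 <= THRESHOLD_P95_MS ? "PASS" : "FAIL");
}
```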
---
|
||||
|
||||
### Security
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Authentication (login security, session management)
|
||||
- Authorization (access control, permissions)
|
||||
- Data protection (encryption, PII handling)
|
||||
- Vulnerability management (SAST, DAST, dependency scanning)
|
||||
- Compliance (GDPR, HIPAA, PCI-DSS)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Security score: >= 85/100
|
||||
- Critical vulnerabilities: 0
|
||||
- High vulnerabilities: < 3
|
||||
- Authentication strength: MFA enabled
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- SAST results (SonarQube, Checkmarx, Veracode)
|
||||
- DAST results (OWASP ZAP, Burp Suite)
|
||||
- Dependency scanning (Snyk, Dependabot, npm audit)
|
||||
- Penetration test reports
|
||||
- Security audit logs
|
||||
|
||||
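A hedged example of checking the default vulnerability thresholds (0 critical, fewer than 3 high) from `npm audit --json`; the `metadata.vulnerabilities` shape is taken from npm's JSON output and should be verified against your npm version:

```typescript
import { execSync } from "node:child_process";

let out: string;
try {
  out = execSync("npm audit --json", { encoding: "utf8" });
} catch (err: any) {
  // npm audit exits non-zero when vulnerabilities are found; the JSON is still on stdout
  out = err.stdout?.toString() ?? "{}";
}

const audit = JSON.parse(out);
const { critical = 0, high = 0 } = audit.metadata?.vulnerabilities ?? {};

const status = critical > 0 || high >= 3 ? "FAIL" : high > 0 ? "CONCERNS" : "PASS";
console.log(`Vulnerabilities: critical=${critical}, high=${high} -> ${status}`);
```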
---
|
||||
|
||||
### Reliability
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Availability (uptime percentage)
|
||||
- Error handling (graceful degradation, error recovery)
|
||||
- Fault tolerance (redundancy, failover)
|
||||
- Disaster recovery (backup, restore, RTO/RPO)
|
||||
- Stability (CI burn-in, chaos engineering)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Uptime: >= 99.9% (three nines)
|
||||
- Error rate: < 0.1% (1 in 1000 requests)
|
||||
- MTTR (Mean Time To Recovery): < 15 minutes
|
||||
- CI burn-in: 100 consecutive successful runs
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Uptime monitoring (Pingdom, UptimeRobot, StatusCake)
|
||||
- Error logs and error rates
|
||||
- CI burn-in results (see `ci-burn-in.md`)
|
||||
- Chaos engineering test results (Chaos Monkey, Gremlin)
|
||||
- Incident reports and postmortems
|
||||
|
||||
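As a small illustration, the error-rate threshold can be evaluated from raw request and error counts; where those counts come from (logs, APM, load balancer metrics) is left to the team:

```typescript
function errorRateStatus(errorCount: number, requestCount: number, thresholdPct = 0.1) {
  const ratePct = (errorCount / requestCount) * 100;
  let status: "PASS" | "CONCERNS" | "FAIL";
  if (ratePct > thresholdPct) status = "FAIL";
  else if (ratePct > thresholdPct * 0.9) status = "CONCERNS"; // within 10% of the limit
  else status = "PASS";
  return { ratePct, status };
}

// 1 error in 2000 requests -> 0.05%, well under the 0.1% default -> PASS
console.log(errorRateStatus(1, 2000));
```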
---
|
||||
|
||||
### Maintainability
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Code quality (complexity, duplication, code smells)
|
||||
- Test coverage (unit, integration, E2E)
|
||||
- Documentation (code comments, README, architecture docs)
|
||||
- Technical debt (debt ratio, code churn)
|
||||
- Test quality (from test-review workflow)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Test coverage: >= 80%
|
||||
- Code quality score: >= 85/100
|
||||
- Technical debt ratio: < 5%
|
||||
- Documentation completeness: >= 90%
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Coverage reports (Istanbul, NYC, c8, JaCoCo)
|
||||
- Static analysis (ESLint, SonarQube, CodeClimate)
|
||||
- Documentation audit (manual or automated)
|
||||
- Test review report (from test-review workflow)
|
||||
- Git metrics (code churn, commit frequency)
|
||||
|
||||
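For instance, the coverage threshold can be checked against Istanbul/nyc's `json-summary` output; the path below is the reporter's conventional default and may differ in your setup:

```typescript
import { readFileSync } from "node:fs";

const MIN_COVERAGE_PCT = 80;
const summary = JSON.parse(readFileSync("coverage/coverage-summary.json", "utf8"));
const linePct: number = summary.total.lines.pct;

console.log(
  `Line coverage ${linePct}% vs ${MIN_COVERAGE_PCT}% ->`,
  linePct >= MIN_COVERAGE_PCT ? "PASS" : "FAIL"
);
```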
---
|
||||
|
||||
## Deterministic Assessment Rules
|
||||
|
||||
### PASS Rules
|
||||
|
||||
- Evidence exists
|
||||
- Evidence meets or exceeds threshold
|
||||
- No concerns flagged
|
||||
- Quality is acceptable
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 350ms p95
|
||||
Status: PASS ✅
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### CONCERNS Rules
|
||||
|
||||
- Threshold is UNKNOWN
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
- Quality is marginal
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 480ms p95 (96% of threshold)
|
||||
Status: CONCERNS ⚠️
|
||||
Recommendation: Optimize before production - very close to threshold
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### FAIL Rules
|
||||
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Quality is unacceptable
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 750ms p95 (150% of threshold)
|
||||
Status: FAIL ❌
|
||||
Recommendation: BLOCKER - optimize performance before release
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with BMad Artifacts
|
||||
|
||||
### With tech-spec.md
|
||||
|
||||
- Primary source for NFR requirements and thresholds
|
||||
- Load performance targets, security requirements, reliability SLAs
|
||||
- Use architectural decisions to understand NFR trade-offs
|
||||
|
||||
### With test-design.md
|
||||
|
||||
- Understand NFR test plan and priorities
|
||||
- Reference test priorities (P0/P1/P2/P3) for severity classification
|
||||
- Align assessment with planned NFR validation
|
||||
|
||||
### With PRD.md
|
||||
|
||||
- Understand product-level NFR expectations
|
||||
- Verify NFRs align with user experience goals
|
||||
- Check for unstated NFR requirements (implied by product goals)
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
### Release Blocker (FAIL)
|
||||
|
||||
- Critical NFR has FAIL status (security, reliability)
|
||||
- Performance failure affects user experience severely
|
||||
- Do not release until FAIL is resolved
|
||||
|
||||
### PR Blocker (HIGH CONCERNS)
|
||||
|
||||
- High-priority NFR has FAIL status
|
||||
- Multiple CONCERNS exist
|
||||
- Block PR merge until addressed
|
||||
|
||||
### Warning (CONCERNS)
|
||||
|
||||
- Any NFR has CONCERNS status
|
||||
- Evidence is missing or incomplete
|
||||
- Address before next release
|
||||
|
||||
### Pass (PASS)
|
||||
|
||||
- All NFRs have PASS status
|
||||
- No blockers or concerns
|
||||
- Ready for release
|
||||
|
||||
---
|
||||
|
||||
## Example NFR Assessment
|
||||
|
||||
````markdown
|
||||
# NFR Assessment - Story 1.3
|
||||
|
||||
**Feature:** User Authentication
|
||||
**Date:** 2025-10-14
|
||||
**Overall Status:** CONCERNS ⚠️ (1 HIGH issue)
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Assessment:** 3 PASS, 1 CONCERNS, 0 FAIL
|
||||
**Blockers:** None
|
||||
**High Priority Issues:** 1 (Security - MFA not enforced)
|
||||
**Recommendation:** Address security concern before release
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
### Response Time (p95)
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 500ms
|
||||
- **Actual:** 320ms (64% of threshold)
|
||||
- **Evidence:** Load test results (test-results/load-2025-10-14.json)
|
||||
- **Findings:** Response time well below threshold across all percentiles
|
||||
|
||||
### Throughput
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 100 RPS
|
||||
- **Actual:** 250 RPS (250% of threshold)
|
||||
- **Evidence:** Load test results (test-results/load-2025-10-14.json)
|
||||
- **Findings:** System handles 2.5x target load without degradation
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** CONCERNS ⚠️
|
||||
- **Threshold:** MFA enabled for all users
|
||||
- **Actual:** MFA optional (not enforced)
|
||||
- **Evidence:** Security audit (security-audit-2025-10-14.md)
|
||||
- **Findings:** MFA is implemented but not enforced by default
|
||||
- **Recommendation:** HIGH - Enforce MFA for all new accounts, provide migration path for existing users
|
||||
|
||||
### Data Protection
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** PII encrypted at rest and in transit
|
||||
- **Actual:** AES-256 at rest, TLS 1.3 in transit
|
||||
- **Evidence:** Security scan (security-scan-2025-10-14.json)
|
||||
- **Findings:** All PII properly encrypted
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
### Uptime
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 99.9% (three nines)
|
||||
- **Actual:** 99.95% over 30 days
|
||||
- **Evidence:** Uptime monitoring (uptime-report-2025-10-14.csv)
|
||||
- **Findings:** Exceeds target with margin
|
||||
|
||||
### Error Rate
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** < 0.1% (1 in 1000)
|
||||
- **Actual:** 0.05% (1 in 2000)
|
||||
- **Evidence:** Error logs (logs/errors-2025-10.log)
|
||||
- **Findings:** Error rate well below threshold
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
### Test Coverage
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** >= 80%
|
||||
- **Actual:** 87%
|
||||
- **Evidence:** Coverage report (coverage/lcov-report/index.html)
|
||||
- **Findings:** Coverage exceeds threshold with good distribution
|
||||
|
||||
### Code Quality
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** >= 85/100
|
||||
- **Actual:** 92/100
|
||||
- **Evidence:** SonarQube analysis (sonarqube-report-2025-10-14.pdf)
|
||||
- **Findings:** High code quality score with low technical debt
|
||||
|
||||
## Quick Wins
|
||||
|
||||
1. **Enforce MFA (Security)** - HIGH - 4 hours
|
||||
- Add configuration flag to enforce MFA for new accounts
|
||||
- No code changes needed, only config adjustment
|
||||
|
||||
## Recommended Actions
|
||||
|
||||
### Immediate (Before Release)
|
||||
|
||||
1. **Enforce MFA for all new accounts** - HIGH - 4 hours - Security Team
|
||||
- Add `ENFORCE_MFA=true` to production config
|
||||
- Update user onboarding flow to require MFA setup
|
||||
- Test MFA enforcement in staging environment
|
||||
|
||||
### Short-term (Next Sprint)
|
||||
|
||||
1. **Migrate existing users to MFA** - MEDIUM - 3 days - Product + Engineering
|
||||
- Design migration UX (prompt, incentives, deadline)
|
||||
- Implement migration flow with grace period
|
||||
- Communicate migration to existing users
|
||||
|
||||
## Evidence Gaps
|
||||
|
||||
- [ ] Chaos engineering test results (reliability)
|
||||
- Owner: DevOps Team
|
||||
- Deadline: 2025-10-21
|
||||
- Suggested evidence: Run chaos monkey tests in staging
|
||||
|
||||
- [ ] Penetration test report (security)
|
||||
- Owner: Security Team
|
||||
- Deadline: 2025-10-28
|
||||
- Suggested evidence: Schedule third-party pentest
|
||||
|
||||
## Gate YAML Snippet
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
story_id: '1.3'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
medium_priority_issues: 0
|
||||
concerns: 1
|
||||
blockers: false
|
||||
recommendations:
|
||||
- 'Enforce MFA for all new accounts (HIGH - 4 hours)'
|
||||
evidence_gaps: 2
|
||||
```
|
||||
````
|
||||
|
||||
## Recommendations Summary
|
||||
|
||||
- **Release Blocker:** None ✅
|
||||
- **High Priority:** 1 (Enforce MFA before release)
|
||||
- **Medium Priority:** 1 (Migrate existing users to MFA)
|
||||
- **Next Steps:** Address HIGH priority item, then proceed to gate workflow
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
Before completing this workflow, verify:
|
||||
|
||||
- ✅ All NFR categories assessed (performance, security, reliability, maintainability, custom)
|
||||
- ✅ Thresholds defined or marked as UNKNOWN
|
||||
- ✅ Evidence gathered for each NFR (or marked as MISSING)
|
||||
- ✅ Status classified deterministically (PASS/CONCERNS/FAIL)
|
||||
- ✅ No thresholds were guessed (marked as CONCERNS if unknown)
|
||||
- ✅ Quick wins identified for CONCERNS/FAIL
|
||||
- ✅ Recommended actions are specific and actionable
|
||||
- ✅ Evidence gaps documented with owners and deadlines
|
||||
- ✅ NFR assessment report generated and saved
|
||||
- ✅ Gate YAML snippet generated (if enabled)
|
||||
- ✅ Evidence checklist generated (if enabled)
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
- **Never Guess Thresholds:** If a threshold is unknown, mark as CONCERNS and recommend defining it
|
||||
- **Evidence-Based:** Every assessment must be backed by evidence (tests, metrics, logs, CI results)
|
||||
- **Deterministic Rules:** Use consistent PASS/CONCERNS/FAIL classification based on evidence
|
||||
- **Actionable Recommendations:** Provide specific steps, not generic advice
|
||||
- **Gate Integration:** Generate YAML snippets that can be consumed by CI/CD pipelines
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "NFR thresholds not defined"
|
||||
- Check tech-spec.md for NFR requirements
|
||||
- Check PRD.md for product-level SLAs
|
||||
- Check story file for feature-specific requirements
|
||||
- If thresholds truly unknown, mark as CONCERNS and recommend defining them
|
||||
|
||||
### "No evidence found"
|
||||
- Check evidence directories (test-results, metrics, logs)
|
||||
- Check CI/CD pipeline for test results
|
||||
- If evidence truly missing, mark NFR as "NO EVIDENCE" and recommend generating it
|
||||
|
||||
### "CONCERNS status but no threshold exceeded"
|
||||
- CONCERNS is correct when threshold is UNKNOWN or evidence is MISSING/INCOMPLETE
|
||||
- CONCERNS is also correct when evidence is close to threshold (within 10%)
|
||||
- Document why CONCERNS was assigned
|
||||
|
||||
### "FAIL status blocks release"
|
||||
- This is intentional - FAIL means critical NFR not met
|
||||
- Recommend remediation actions with specific steps
|
||||
- Re-run assessment after remediation
|
||||
|
||||
---
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **testarch-test-design** - Define NFR requirements and test plan
|
||||
- **testarch-framework** - Set up performance/security testing frameworks
|
||||
- **testarch-ci** - Configure CI/CD for NFR validation
|
||||
- **testarch-gate** - Use NFR assessment as input for quality gate decisions
|
||||
- **testarch-test-review** - Review test quality (maintainability NFR)
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
|
||||
@@ -0,0 +1,443 @@
|
||||
# NFR Assessment - {FEATURE_NAME}
|
||||
|
||||
**Date:** {DATE}
|
||||
**Story:** {STORY_ID} (if applicable)
|
||||
**Overall Status:** {OVERALL_STATUS} {STATUS_ICON}
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Assessment:** {PASS_COUNT} PASS, {CONCERNS_COUNT} CONCERNS, {FAIL_COUNT} FAIL
|
||||
|
||||
**Blockers:** {BLOCKER_COUNT} {BLOCKER_DESCRIPTION}
|
||||
|
||||
**High Priority Issues:** {HIGH_PRIORITY_COUNT} {HIGH_PRIORITY_DESCRIPTION}
|
||||
|
||||
**Recommendation:** {OVERALL_RECOMMENDATION}
|
||||
|
||||
---
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
### Response Time (p95)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Throughput
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Resource Usage
|
||||
|
||||
- **CPU Usage**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
- **Memory Usage**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
### Scalability
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
- **Recommendation:** {RECOMMENDATION} (if CONCERNS or FAIL)
|
||||
|
||||
### Authorization Controls
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Data Protection
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Vulnerability Management
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION} (e.g., "0 critical, <3 high vulnerabilities")
|
||||
- **Actual:** {ACTUAL_DESCRIPTION} (e.g., "0 critical, 1 high, 5 medium vulnerabilities")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Snyk scan results - scan-2025-10-14.json")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Compliance (if applicable)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Standards:** {COMPLIANCE_STANDARDS} (e.g., "GDPR, HIPAA, PCI-DSS")
|
||||
- **Actual:** {ACTUAL_COMPLIANCE_STATUS}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
### Availability (Uptime)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "99.9%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "99.95%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Uptime monitoring - uptime-report-2025-10-14.csv")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Error Rate
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<0.1%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "0.05%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Error logs - logs/errors-2025-10.log")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### MTTR (Mean Time To Recovery)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<15 minutes")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "12 minutes")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Incident reports - incidents/")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Fault Tolerance
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### CI Burn-In (Stability)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "100 consecutive successful runs")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "150 consecutive successful runs")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "CI burn-in results - ci-burn-in-2025-10-14.log")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Disaster Recovery (if applicable)
|
||||
|
||||
- **RTO (Recovery Time Objective)**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
- **RPO (Recovery Point Objective)**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
---
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
### Test Coverage
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=80%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "87%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Coverage report - coverage/lcov-report/index.html")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Code Quality
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=85/100")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "92/100")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "SonarQube analysis - sonarqube-report-2025-10-14.pdf")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Technical Debt
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<5% debt ratio")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "3.2% debt ratio")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "CodeClimate analysis - codeclimate-2025-10-14.json")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Documentation Completeness
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=90%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "95%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Documentation audit - docs-audit-2025-10-14.md")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Test Quality (from test-review, if available)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Test review report - test-review-2025-10-14.md")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Custom NFR Assessments (if applicable)
|
||||
|
||||
### {CUSTOM_NFR_NAME_1}
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### {CUSTOM_NFR_NAME_2}
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Quick Wins
|
||||
|
||||
{QUICK_WIN_COUNT} quick wins identified for immediate implementation:
|
||||
|
||||
1. **{QUICK_WIN_TITLE_1}** ({NFR_CATEGORY}) - {PRIORITY} - {ESTIMATED_EFFORT}
|
||||
- {QUICK_WIN_DESCRIPTION}
|
||||
- No code changes needed / Minimal code changes
|
||||
|
||||
2. **{QUICK_WIN_TITLE_2}** ({NFR_CATEGORY}) - {PRIORITY} - {ESTIMATED_EFFORT}
|
||||
- {QUICK_WIN_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Recommended Actions
|
||||
|
||||
### Immediate (Before Release) - CRITICAL/HIGH Priority
|
||||
|
||||
1. **{ACTION_TITLE_1}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
- {SPECIFIC_STEPS}
|
||||
- {VALIDATION_CRITERIA}
|
||||
|
||||
2. **{ACTION_TITLE_2}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
- {SPECIFIC_STEPS}
|
||||
- {VALIDATION_CRITERIA}
|
||||
|
||||
### Short-term (Next Sprint) - MEDIUM Priority
|
||||
|
||||
1. **{ACTION_TITLE_3}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
2. **{ACTION_TITLE_4}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
### Long-term (Backlog) - LOW Priority
|
||||
|
||||
1. **{ACTION_TITLE_5}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Monitoring Hooks
|
||||
|
||||
{MONITORING_HOOK_COUNT} monitoring hooks recommended to detect issues before failures:
|
||||
|
||||
### Performance Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_1} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
- [ ] {MONITORING_TOOL_2} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Security Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_3} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Reliability Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_4} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Alerting Thresholds
|
||||
|
||||
- [ ] {ALERT_DESCRIPTION} - Notify when {THRESHOLD_CONDITION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
---
|
||||
|
||||
## Fail-Fast Mechanisms
|
||||
|
||||
{FAIL_FAST_COUNT} fail-fast mechanisms recommended to prevent failures:
|
||||
|
||||
### Circuit Breakers (Reliability)
|
||||
|
||||
- [ ] {CIRCUIT_BREAKER_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Rate Limiting (Performance)
|
||||
|
||||
- [ ] {RATE_LIMITING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Validation Gates (Security)
|
||||
|
||||
- [ ] {VALIDATION_GATE_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Smoke Tests (Maintainability)
|
||||
|
||||
- [ ] {SMOKE_TEST_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
---
|
||||
|
||||
## Evidence Gaps
|
||||
|
||||
{EVIDENCE_GAP_COUNT} evidence gaps identified - action required:
|
||||
|
||||
- [ ] **{NFR_NAME_1}** ({NFR_CATEGORY})
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
- **Suggested Evidence:** {SUGGESTED_EVIDENCE_SOURCE}
|
||||
- **Impact:** {IMPACT_DESCRIPTION}
|
||||
|
||||
- [ ] **{NFR_NAME_2}** ({NFR_CATEGORY})
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
- **Suggested Evidence:** {SUGGESTED_EVIDENCE_SOURCE}
|
||||
- **Impact:** {IMPACT_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Findings Summary
|
||||
|
||||
| Category | PASS | CONCERNS | FAIL | Overall Status |
|
||||
| --------------- | ---------------- | -------------------- | ---------------- | ----------------------------------- |
|
||||
| Performance | {P_PASS_COUNT} | {P_CONCERNS_COUNT} | {P_FAIL_COUNT} | {P_STATUS} {P_ICON} |
|
||||
| Security | {S_PASS_COUNT} | {S_CONCERNS_COUNT} | {S_FAIL_COUNT} | {S_STATUS} {S_ICON} |
|
||||
| Reliability | {R_PASS_COUNT} | {R_CONCERNS_COUNT} | {R_FAIL_COUNT} | {R_STATUS} {R_ICON} |
|
||||
| Maintainability | {M_PASS_COUNT} | {M_CONCERNS_COUNT} | {M_FAIL_COUNT} | {M_STATUS} {M_ICON} |
|
||||
| **Total** | **{TOTAL_PASS}** | **{TOTAL_CONCERNS}** | **{TOTAL_FAIL}** | **{OVERALL_STATUS} {OVERALL_ICON}** |
|
||||
|
||||
---
|
||||
|
||||
## Gate YAML Snippet
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '{DATE}'
|
||||
story_id: '{STORY_ID}'
|
||||
feature_name: '{FEATURE_NAME}'
|
||||
categories:
|
||||
performance: '{PERFORMANCE_STATUS}'
|
||||
security: '{SECURITY_STATUS}'
|
||||
reliability: '{RELIABILITY_STATUS}'
|
||||
maintainability: '{MAINTAINABILITY_STATUS}'
|
||||
overall_status: '{OVERALL_STATUS}'
|
||||
critical_issues: { CRITICAL_COUNT }
|
||||
high_priority_issues: { HIGH_COUNT }
|
||||
medium_priority_issues: { MEDIUM_COUNT }
|
||||
concerns: { CONCERNS_COUNT }
|
||||
blockers: { BLOCKER_BOOLEAN } # true/false
|
||||
quick_wins: { QUICK_WIN_COUNT }
|
||||
evidence_gaps: { EVIDENCE_GAP_COUNT }
|
||||
recommendations:
|
||||
- '{RECOMMENDATION_1}'
|
||||
- '{RECOMMENDATION_2}'
|
||||
- '{RECOMMENDATION_3}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Artifacts
|
||||
|
||||
- **Story File:** {STORY_FILE_PATH} (if applicable)
|
||||
- **Tech Spec:** {TECH_SPEC_PATH} (if available)
|
||||
- **PRD:** {PRD_PATH} (if available)
|
||||
- **Test Design:** {TEST_DESIGN_PATH} (if available)
|
||||
- **Evidence Sources:**
|
||||
- Test Results: {TEST_RESULTS_DIR}
|
||||
- Metrics: {METRICS_DIR}
|
||||
- Logs: {LOGS_DIR}
|
||||
- CI Results: {CI_RESULTS_PATH}
|
||||
|
||||
---
|
||||
|
||||
## Recommendations Summary
|
||||
|
||||
**Release Blocker:** {RELEASE_BLOCKER_SUMMARY}
|
||||
|
||||
**High Priority:** {HIGH_PRIORITY_SUMMARY}
|
||||
|
||||
**Medium Priority:** {MEDIUM_PRIORITY_SUMMARY}
|
||||
|
||||
**Next Steps:** {NEXT_STEPS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**NFR Assessment:**
|
||||
|
||||
- Overall Status: {OVERALL_STATUS} {OVERALL_ICON}
|
||||
- Critical Issues: {CRITICAL_COUNT}
|
||||
- High Priority Issues: {HIGH_COUNT}
|
||||
- Concerns: {CONCERNS_COUNT}
|
||||
- Evidence Gaps: {EVIDENCE_GAP_COUNT}
|
||||
|
||||
**Gate Status:** {GATE_STATUS} {GATE_ICON}
|
||||
|
||||
**Next Actions:**
|
||||
|
||||
- If PASS ✅: Proceed to `*gate` workflow or release
|
||||
- If CONCERNS ⚠️: Address HIGH/CRITICAL issues, re-run `*nfr-assess`
|
||||
- If FAIL ❌: Resolve FAIL status NFRs, re-run `*nfr-assess`
|
||||
|
||||
**Generated:** {DATE}
|
||||
**Workflow:** testarch-nfr v4.0
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
@@ -1,25 +1,107 @@
|
||||
# Test Architect workflow: nfr-assess
|
||||
name: testarch-nfr
|
||||
description: "Assess non-functional requirements before release."
|
||||
description: "Assess non-functional requirements (performance, security, reliability, maintainability) before release with evidence-based validation"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/nfr-assess"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/nfr-report-template.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Target specification
|
||||
story_file: "" # Path to story markdown (optional)
|
||||
feature_name: "" # Feature to assess (if no story file)
|
||||
|
||||
# NFR categories to assess
|
||||
assess_performance: true # Response time, throughput, resource usage
|
||||
assess_security: true # Authentication, authorization, data protection
|
||||
assess_reliability: true # Error handling, recovery, availability
|
||||
assess_maintainability: true # Code quality, test coverage, documentation
|
||||
|
||||
# Custom NFR categories (comma-separated)
|
||||
custom_nfr_categories: "" # e.g., "accessibility,internationalization,compliance"
|
||||
|
||||
# Evidence sources
|
||||
test_results_dir: "{project-root}/test-results"
|
||||
metrics_dir: "{project-root}/metrics"
|
||||
logs_dir: "{project-root}/logs"
|
||||
include_ci_results: true # Analyze CI/CD pipeline results
|
||||
|
||||
# Thresholds (can be overridden)
|
||||
performance_response_time_ms: 500 # Target response time
|
||||
performance_throughput_rps: 100 # Target requests per second
|
||||
security_score_min: 85 # Minimum security score (0-100)
|
||||
reliability_uptime_pct: 99.9 # Target uptime percentage
|
||||
maintainability_coverage_pct: 80 # Minimum test coverage
|
||||
|
||||
# Assessment configuration
|
||||
use_deterministic_rules: true # PASS/CONCERNS/FAIL based on evidence
|
||||
never_guess_thresholds: true # Mark as CONCERNS if threshold unknown
|
||||
require_evidence: true # Every NFR must have evidence or be called out
|
||||
suggest_monitoring: true # Recommend monitoring hooks for gaps
|
||||
|
||||
# Integration with BMad artifacts
|
||||
use_tech_spec: true # Load tech-spec.md for NFR requirements
|
||||
use_prd: true # Load PRD.md for NFR context
|
||||
use_test_design: true # Load test-design.md for NFR test plan
|
||||
|
||||
# Output configuration
|
||||
output_file: "{output_folder}/nfr-assessment.md"
|
||||
generate_gate_yaml: true # Create gate YAML snippet with NFR status
|
||||
generate_evidence_checklist: true # Create checklist of evidence gaps
|
||||
update_story_file: false # Add NFR section to story (optional)
|
||||
|
||||
# Quality gates
|
||||
fail_on_critical_nfr: true # Fail if critical NFR has FAIL status
|
||||
warn_on_concerns: true # Warn if any NFR has CONCERNS status
|
||||
block_release_on_fail: true # Block release if NFR assessment fails
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load nfr-criteria, ci-burn-in fragments
|
||||
include_quick_wins: true # Suggest quick wins for concerns/failures
|
||||
include_recommended_actions: true # Provide actionable remediation steps
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/nfr-assessment.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read story, test results, metrics, logs, BMad artifacts
|
||||
- write_file # Create NFR assessment, gate YAML, evidence checklist
|
||||
- list_files # Discover test results, metrics, logs
|
||||
- search_repo # Find NFR-related tests and evidence
|
||||
- glob # Find result files matching patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with NFR requirements (optional)"
|
||||
- tech_spec: "Technical specification with NFR targets (recommended)"
|
||||
- test_results: "Test execution results (performance, security, etc.)"
|
||||
- metrics: "Application metrics (response times, error rates, etc.)"
|
||||
- logs: "Application logs for reliability analysis"
|
||||
- ci_results: "CI/CD pipeline results for burn-in validation"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- nfr
|
||||
- test-architect
|
||||
- performance
|
||||
- security
|
||||
- reliability
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
493
src/modules/bmm/workflows/testarch/test-design/README.md
Normal file
@@ -0,0 +1,493 @@
|
||||
# Test Design and Risk Assessment Workflow
|
||||
|
||||
Plans comprehensive test coverage strategy with risk assessment (probability × impact scoring), priority classification (P0-P3), and resource estimation. This workflow generates a test design document that identifies high-risk areas, maps requirements to appropriate test levels, and provides execution ordering for optimal feedback.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *test-design
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Planning test coverage before development starts
|
||||
- Assessing risks for an epic or story
|
||||
- Prioritizing test scenarios by business impact
|
||||
- Estimating testing effort and resources
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Story markdown**: Acceptance criteria and requirements
|
||||
- **PRD or epics.md**: High-level product context
|
||||
- **Architecture docs** (optional): Technical constraints and integration points
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `epic_num`: Epic number for scoped design
|
||||
- `story_path`: Specific story for design (optional)
|
||||
- `design_level`: full/targeted/minimal (default: full)
|
||||
- `risk_threshold`: Score for high-priority flag (default: 6)
|
||||
- `risk_categories`: TECH,SEC,PERF,DATA,BUS,OPS (all enabled)
|
||||
- `priority_levels`: P0,P1,P2,P3 (all enabled)
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
**Test Design Document** (`test-design-epic-{N}.md`):
|
||||
|
||||
1. **Risk Assessment Matrix**
|
||||
- Risk ID, category, description
|
||||
- Probability (1-3) × Impact (1-3) = Score
|
||||
- Scores ≥6 flagged as high-priority
|
||||
- Mitigation plans with owners and timelines
|
||||
|
||||
2. **Coverage Matrix**
|
||||
- Requirement → Test Level (E2E/API/Component/Unit)
|
||||
- Priority assignment (P0-P3)
|
||||
- Risk linkage
|
||||
- Test count estimates
|
||||
|
||||
3. **Execution Order**
|
||||
- Smoke tests (P0 subset, <5 min)
|
||||
- P0 tests (critical paths, <10 min)
|
||||
- P1 tests (important features, <30 min)
|
||||
- P2/P3 tests (full regression, <60 min)
|
||||
|
||||
4. **Resource Estimates**
|
||||
- Hours per priority level
|
||||
- Total effort in days
|
||||
- Tooling and data prerequisites
|
||||
|
||||
5. **Quality Gate Criteria**
|
||||
- P0 pass rate: 100%
|
||||
- P1 pass rate: ≥95%
|
||||
- High-risk mitigations: 100%
|
||||
- Coverage target: ≥80%
|
||||
|
||||
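A minimal sketch of evaluating these gate criteria, assuming per-priority pass/total counts and a coverage percentage have already been collected (the function shape is illustrative):

```typescript
interface PriorityResult {
  passed: number;
  total: number;
}

// Mirrors the criteria above: P0 at 100%, P1 at >=95%, coverage >=80%, all high-risk mitigations done.
function gatePasses(
  p0: PriorityResult,
  p1: PriorityResult,
  coveragePct: number,
  highRiskMitigated: boolean
): boolean {
  const p0Rate = p0.total === 0 ? 1 : p0.passed / p0.total;
  const p1Rate = p1.total === 0 ? 1 : p1.passed / p1.total;
  return p0Rate === 1 && p1Rate >= 0.95 && coveragePct >= 80 && highRiskMitigated;
}

console.log(gatePasses({ passed: 12, total: 12 }, { passed: 39, total: 40 }, 84, true)); // true
```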
## Key Features
|
||||
|
||||
### Risk Scoring Framework
|
||||
|
||||
**Probability × Impact = Risk Score**
|
||||
|
||||
**Probability** (1-3):
|
||||
|
||||
- 1 (Unlikely): <10% chance
|
||||
- 2 (Possible): 10-50% chance
|
||||
- 3 (Likely): >50% chance
|
||||
|
||||
**Impact** (1-3):
|
||||
|
||||
- 1 (Minor): Cosmetic, workaround exists
|
||||
- 2 (Degraded): Feature impaired, difficult workaround
|
||||
- 3 (Critical): System failure, no workaround
|
||||
|
||||
**Scores**:
|
||||
|
||||
- 1-2: Low risk (monitor)
|
||||
- 3-4: Medium risk (plan mitigation)
|
||||
- **6-9: High risk** (immediate mitigation required)
|
||||
|
||||
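This scoring can be captured in a few lines; the sketch below simply multiplies the two levels and applies the default high-risk threshold of 6:

```typescript
type Level = 1 | 2 | 3;

// Probability (1-3) x Impact (1-3); scores >= 6 are high risk and require immediate mitigation.
function riskScore(probability: Level, impact: Level) {
  const score = probability * impact;
  const band = score >= 6 ? "HIGH" : score >= 3 ? "MEDIUM" : "LOW";
  return { score, band, mitigationRequired: score >= 6 };
}

console.log(riskScore(3, 2)); // { score: 6, band: "HIGH", mitigationRequired: true }
```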
### Risk Categories (6 types)
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
|
||||
- Architecture flaws, integration failures
|
||||
- Scalability issues, technical debt
|
||||
|
||||
**SEC** (Security):
|
||||
|
||||
- Missing access controls, auth bypass
|
||||
- Data exposure, injection vulnerabilities
|
||||
|
||||
**PERF** (Performance):
|
||||
|
||||
- SLA violations, response time degradation
|
||||
- Resource exhaustion, scalability limits
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
|
||||
- Data loss/corruption, inconsistent state
|
||||
- Migration failures
|
||||
|
||||
**BUS** (Business Impact):
|
||||
|
||||
- UX degradation, business logic errors
|
||||
- Revenue impact, compliance violations
|
||||
|
||||
**OPS** (Operations):
|
||||
|
||||
- Deployment failures, configuration errors
|
||||
- Monitoring gaps, rollback issues
|
||||
|
||||
### Priority Classification (P0-P3)
|
||||
|
||||
**P0 (Critical)** - Run on every commit:
|
||||
|
||||
- Blocks core user journey
|
||||
- High-risk (score ≥6)
|
||||
- Revenue-impacting or security-critical
|
||||
|
||||
**P1 (High)** - Run on PR to main:
|
||||
|
||||
- Important user features
|
||||
- Medium-risk (score 3-4)
|
||||
- Common workflows
|
||||
|
||||
**P2 (Medium)** - Run nightly/weekly:
|
||||
|
||||
- Secondary features
|
||||
- Low-risk (score 1-2)
|
||||
- Edge cases
|
||||
|
||||
**P3 (Low)** - Run on-demand:
|
||||
|
||||
- Nice-to-have, exploratory
|
||||
- Performance benchmarks
|
||||
|
||||
### Test Level Selection
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys
|
||||
- Multi-system integration
|
||||
- Highest confidence, slowest
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Service contracts
|
||||
- Business logic validation
|
||||
- Fast feedback, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior
|
||||
- Visual regression
|
||||
- Fast, isolated
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Business logic, edge cases
|
||||
- Error handling
|
||||
- Fastest, most granular
|
||||
|
||||
**Key principle**: Avoid duplicate coverage - don't test the same behavior at multiple levels.
|
||||
|
||||
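For example, a promo-code calculation already exercised by the checkout E2E journey does not need its own E2E test; an API-level check keeps feedback fast. A hedged sketch, where the endpoint and payload are hypothetical:

```typescript
import { test, expect } from "@playwright/test";

test("applies percentage promo code to cart total @P1", async ({ request }) => {
  const response = await request.post("/api/cart/apply-promo", {
    data: { cartId: "cart-123", code: "SAVE10" },
  });
  expect(response.ok()).toBeTruthy();

  const { total, discount } = await response.json();
  expect(discount).toBeGreaterThan(0);
  expect(total).toBeLessThan(100); // assumes an original cart total of 100
});
```
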
### Exploratory Mode (NEW - Phase 2.5)

**test-design** supports UI exploration for brownfield applications with missing documentation.

**Activation**: Automatic when requirements are missing or incomplete for brownfield apps

- If `config.tea_use_mcp_enhancements` is true + MCP available → MCP-assisted exploration
- Otherwise → Manual exploration with user documentation

**When to Use Exploratory Mode:**

- ✅ Brownfield projects with missing documentation
- ✅ Legacy systems lacking requirements
- ✅ Undocumented features needing test coverage
- ✅ Unknown user journeys requiring discovery
- ❌ NOT for greenfield projects with clear requirements

**Exploration Modes:**

1. **MCP-Assisted Exploration** (if Playwright MCP available):
   - Interactive browser exploration using MCP tools
   - `planner_setup_page` - Initialize browser
   - `browser_navigate` - Explore pages
   - `browser_click` - Interact with UI elements
   - `browser_hover` - Reveal hidden menus
   - `browser_snapshot` - Capture state at each step
   - `browser_screenshot` - Document visually
   - `browser_console_messages` - Find JavaScript errors
   - `browser_network_requests` - Identify API endpoints

2. **Manual Exploration** (fallback without MCP):
   - User explores application manually
   - Documents findings in markdown:
     - Pages/features discovered
     - User journeys identified
     - API endpoints observed (DevTools Network)
     - JavaScript errors noted (DevTools Console)
     - Critical workflows mapped
   - Provides exploration findings to workflow

**Exploration Workflow:**

```
1. Enable exploratory_mode and set exploration_url
2. IF MCP available:
   - Use planner_setup_page to init browser
   - Explore UI with browser_* tools
   - Capture snapshots and screenshots
   - Monitor console and network
   - Document discoveries
3. IF MCP unavailable:
   - Notify user to explore manually
   - Wait for exploration findings
4. Convert discoveries to testable requirements
5. Continue with standard risk assessment (Step 2)
```

**Example Output from Exploratory Mode:**

```markdown
## Exploration Findings - Legacy Admin Panel

**Exploration URL**: https://admin.example.com
**Mode**: MCP-Assisted

### Discovered Features:

1. User Management (/admin/users)
   - List users (table with 10 columns)
   - Edit user (modal form)
   - Delete user (confirmation dialog)
   - Export to CSV (download button)

2. Reporting Dashboard (/admin/reports)
   - Date range picker
   - Filter by department
   - Generate PDF report
   - Email report to stakeholders

3. API Endpoints Discovered:
   - GET /api/admin/users
   - PUT /api/admin/users/:id
   - DELETE /api/admin/users/:id
   - POST /api/reports/generate

### User Journeys Mapped:

1. Admin deletes inactive user
   - Navigate to /admin/users
   - Click delete icon
   - Confirm in modal
   - User removed from table

2. Admin generates monthly report
   - Navigate to /admin/reports
   - Select date range (last month)
   - Click generate
   - Download PDF

### Risks Identified (from exploration):

- R-001 (SEC): No RBAC check observed (any admin can delete any user)
- R-002 (DATA): No confirmation on bulk delete
- R-003 (PERF): User table loads slowly (5s for 1000 rows)

**Next**: Proceed to risk assessment with discovered requirements
```

**Graceful Degradation:**

- Exploratory mode is OPTIONAL (default: disabled)
- Works without Playwright MCP (manual fallback)
- If exploration fails, can disable mode and provide requirements documentation
- Seamlessly transitions to standard risk assessment workflow

### Knowledge Base Integration

Automatically consults TEA knowledge base:

- `risk-governance.md` - Risk classification framework
- `probability-impact.md` - Risk scoring methodology
- `test-levels-framework.md` - Test level selection
- `test-priorities-matrix.md` - P0-P3 prioritization

## Integration with Other Workflows

**Before test-design:**

- **plan-project** (Phase 2): Creates PRD and epics
- **solution-architecture** (Phase 3): Defines technical approach
- **tech-spec** (Phase 3): Implementation details

**After test-design:**

- **atdd**: Generate failing tests for P0 scenarios
- **automate**: Expand coverage for P1/P2 scenarios
- **trace (Phase 2)**: Use quality gate criteria for release decisions

**Coordinates with:**

- **framework**: Test infrastructure must exist
- **ci**: Execution order maps to CI stages

**Updates:**

- `bmm-workflow-status.md`: Adds test design to Quality & Testing Progress

## Important Notes

### Evidence-Based Assessment

**Critical principle**: Base risk assessment on **evidence**, not speculation.

**Evidence sources:**

- PRD and user research
- Architecture documentation
- Historical bug data
- User feedback
- Security audit results

**When uncertain**: Document assumptions, request user clarification.

**Avoid**:

- Guessing business impact
- Assuming user behavior
- Inventing requirements

### Resource Estimation Formula

```
P0: 2 hours per test (setup + complex scenarios)
P1: 1 hour per test (standard coverage)
P2: 0.5 hours per test (simple scenarios)
P3: 0.25 hours per test (exploratory)

Total Days = Total Hours / 8
```

Example:

- 15 P0 × 2h = 30h
- 25 P1 × 1h = 25h
- 40 P2 × 0.5h = 20h
- **Total: 75 hours (~10 days)**

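The same formula as a small TypeScript sketch; the per-test hours are the defaults above, and the function and parameter names are illustrative:

```typescript
const HOURS_PER_TEST = { P0: 2, P1: 1, P2: 0.5, P3: 0.25 } as const;

type Priority = keyof typeof HOURS_PER_TEST;

function estimateEffort(counts: Partial<Record<Priority, number>>) {
  const totalHours = (Object.entries(counts) as [Priority, number][]).reduce(
    (sum, [priority, count]) => sum + count * HOURS_PER_TEST[priority],
    0,
  );
  return { totalHours, totalDays: Math.ceil(totalHours / 8) };
}

// 15 P0 + 25 P1 + 40 P2 → { totalHours: 75, totalDays: 10 }
console.log(estimateEffort({ P0: 15, P1: 25, P2: 40 }));
```
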
### Execution Order Strategy

**Smoke tests** (subset of P0, <5 min):

- Login successful
- Dashboard loads
- Core API responds

**Purpose**: Fast feedback, catch build-breaking issues immediately.

**P0 tests** (critical paths, <10 min):

- All scenarios blocking user journeys
- Security-critical flows

**P1 tests** (important features, <30 min):

- Common workflows
- Medium-risk areas

**P2/P3 tests** (full regression, <60 min):

- Edge cases
- Performance benchmarks

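In CI this ordering can be driven with tag filters. A hedged sketch of a stage-by-stage runner — the `@smoke`/`@P0` tags and time budgets mirror the strategy above and are assumptions, not requirements:

```typescript
import { execSync } from "node:child_process";

const stages = [
  { name: "smoke", grep: "@smoke", budgetMin: 5 },
  { name: "p0", grep: "@P0", budgetMin: 10 },
  { name: "p1", grep: "@P1", budgetMin: 30 },
  { name: "regression", grep: "@P2|@P3", budgetMin: 60 },
];

for (const stage of stages) {
  console.log(`Running ${stage.name} (budget: <${stage.budgetMin} min)`);
  // execSync throws on a non-zero exit code, so a broken stage stops the run early.
  execSync(`npx playwright test --grep "${stage.grep}"`, { stdio: "inherit" });
}
```
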
### Quality Gate Criteria

**Pass/Fail thresholds:**

- P0: 100% pass (no exceptions)
- P1: ≥95% pass (2-3 failures acceptable with waivers)
- P2/P3: ≥90% pass (informational)
- High-risk items: All mitigated or have approved waivers

**Coverage targets:**

- Critical paths: ≥80%
- Security scenarios: 100%
- Business logic: ≥70%

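Expressed as a gate check, the thresholds above reduce to a few comparisons. A minimal sketch in which the input shape and field names are illustrative:

```typescript
interface GateInput {
  p0PassRate: number;           // 0–1
  p1PassRate: number;           // 0–1
  highRisksMitigatedOrWaived: boolean;
  criticalPathCoverage: number; // 0–1
}

function gateDecision(input: GateInput): "PASS" | "FAIL" {
  const pass =
    input.p0PassRate === 1 &&
    input.p1PassRate >= 0.95 &&
    input.highRisksMitigatedOrWaived &&
    input.criticalPathCoverage >= 0.8;
  return pass ? "PASS" : "FAIL";
}
```
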
## Validation Checklist

After workflow completion:

- [ ] Risk assessment complete (all categories)
- [ ] Risks scored (probability × impact)
- [ ] High-priority risks (≥6) flagged
- [ ] Coverage matrix maps requirements to test levels
- [ ] Priorities assigned (P0-P3)
- [ ] Execution order defined
- [ ] Resource estimates provided
- [ ] Quality gate criteria defined
- [ ] Output file created

Refer to `checklist.md` for comprehensive validation.

## Example Execution

**Scenario: E-commerce checkout epic**

```bash
bmad tea *test-design
# Epic 3: Checkout flow redesign

# Risk Assessment identifies:
- R-001 (SEC): Payment bypass, P=2 × I=3 = 6 (HIGH)
- R-002 (PERF): Cart load time, P=3 × I=2 = 6 (HIGH)
- R-003 (BUS): Order confirmation email, P=2 × I=2 = 4 (MEDIUM)

# Coverage Plan:
P0 scenarios: 12 tests (payment security, order creation)
P1 scenarios: 18 tests (cart management, promo codes)
P2 scenarios: 25 tests (edge cases, error handling)

Total effort: 65 hours (~8 days)

# Test Levels:
- E2E: 8 tests (critical checkout path)
- API: 30 tests (business logic, payment processing)
- Unit: 17 tests (calculations, validations)

# Execution Order:
1. Smoke: Payment successful, order created (2 min)
2. P0: All payment & security flows (8 min)
3. P1: Cart & promo codes (20 min)
4. P2: Edge cases (40 min)

# Quality Gates:
- P0 pass rate: 100%
- P1 pass rate: ≥95%
- R-001 mitigated: Add payment validation layer
- R-002 mitigated: Implement cart caching
```

## Troubleshooting

**Issue: "Unable to score risks - missing context"**

- **Cause**: Insufficient documentation
- **Solution**: Request PRD, architecture docs, or user clarification

**Issue: "All tests marked as P0"**

- **Cause**: Over-prioritization
- **Solution**: Apply strict P0 criteria (blocks core journey + high risk + no workaround)

**Issue: "Duplicate coverage at multiple test levels"**

- **Cause**: Not following test pyramid
- **Solution**: Use E2E for critical paths only, API for logic, unit for edge cases

**Issue: "Resource estimates too high"**

- **Cause**: Complex test setup or insufficient automation
- **Solution**: Invest in fixtures/factories upfront, reduce per-test setup time

## Related Workflows

- **atdd**: Generate failing tests → [atdd/README.md](../atdd/README.md)
- **automate**: Expand regression coverage → [automate/README.md](../automate/README.md)
- **trace**: Traceability and quality gate decisions → [trace/README.md](../trace/README.md)
- **framework**: Test infrastructure → [framework/README.md](../framework/README.md)

## Version History

- **v4.0 (BMad v6)**: Pure markdown instructions, risk scoring framework, template-based output
- **v3.x**: XML format instructions
- **v2.x**: Legacy task-based approach

234
src/modules/bmm/workflows/testarch/test-design/checklist.md
Normal file
@@ -0,0 +1,234 @@
|
||||
# Test Design and Risk Assessment - Validation Checklist
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- [ ] Story markdown with clear acceptance criteria exists
|
||||
- [ ] PRD or epic documentation available
|
||||
- [ ] Architecture documents available (optional)
|
||||
- [ ] Requirements are testable and unambiguous
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] PRD.md read and requirements extracted
|
||||
- [ ] Epics.md or specific epic documentation loaded
|
||||
- [ ] Story markdown with acceptance criteria analyzed
|
||||
- [ ] Architecture documents reviewed (if available)
|
||||
- [ ] Existing test coverage analyzed
|
||||
- [ ] Knowledge base fragments loaded (risk-governance, probability-impact, test-levels, test-priorities)
|
||||
|
||||
### Step 2: Risk Assessment
|
||||
|
||||
- [ ] Genuine risks identified (not just features)
|
||||
- [ ] Risks classified by category (TECH/SEC/PERF/DATA/BUS/OPS)
|
||||
- [ ] Probability scored (1-3 for each risk)
|
||||
- [ ] Impact scored (1-3 for each risk)
|
||||
- [ ] Risk scores calculated (probability × impact)
|
||||
- [ ] High-priority risks (score ≥6) flagged
|
||||
- [ ] Mitigation plans defined for high-priority risks
|
||||
- [ ] Owners assigned for each mitigation
|
||||
- [ ] Timelines set for mitigations
|
||||
- [ ] Residual risk documented
|
||||
|
||||
### Step 3: Coverage Design
|
||||
|
||||
- [ ] Acceptance criteria broken into atomic scenarios
|
||||
- [ ] Test levels selected (E2E/API/Component/Unit)
|
||||
- [ ] No duplicate coverage across levels
|
||||
- [ ] Priority levels assigned (P0/P1/P2/P3)
|
||||
- [ ] P0 scenarios meet strict criteria (blocks core + high risk + no workaround)
|
||||
- [ ] Data prerequisites identified
|
||||
- [ ] Tooling requirements documented
|
||||
- [ ] Execution order defined (smoke → P0 → P1 → P2/P3)
|
||||
|
||||
### Step 4: Deliverables Generation
|
||||
|
||||
- [ ] Risk assessment matrix created
|
||||
- [ ] Coverage matrix created
|
||||
- [ ] Execution order documented
|
||||
- [ ] Resource estimates calculated
|
||||
- [ ] Quality gate criteria defined
|
||||
- [ ] Output file written to correct location
|
||||
- [ ] Output file uses template structure
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Risk Assessment Matrix
|
||||
|
||||
- [ ] All risks have unique IDs (R-001, R-002, etc.)
|
||||
- [ ] Each risk has category assigned
|
||||
- [ ] Probability values are 1, 2, or 3
|
||||
- [ ] Impact values are 1, 2, or 3
|
||||
- [ ] Scores calculated correctly (P × I)
|
||||
- [ ] High-priority risks (≥6) clearly marked
|
||||
- [ ] Mitigation strategies specific and actionable
|
||||
|
||||
### Coverage Matrix
|
||||
|
||||
- [ ] All requirements mapped to test levels
|
||||
- [ ] Priorities assigned to all scenarios
|
||||
- [ ] Risk linkage documented
|
||||
- [ ] Test counts realistic
|
||||
- [ ] Owners assigned where applicable
|
||||
- [ ] No duplicate coverage (same behavior at multiple levels)
|
||||
|
||||
### Execution Order
|
||||
|
||||
- [ ] Smoke tests defined (<5 min target)
|
||||
- [ ] P0 tests listed (<10 min target)
|
||||
- [ ] P1 tests listed (<30 min target)
|
||||
- [ ] P2/P3 tests listed (<60 min target)
|
||||
- [ ] Order optimizes for fast feedback
|
||||
|
||||
### Resource Estimates
|
||||
|
||||
- [ ] P0 hours calculated (count × 2 hours)
|
||||
- [ ] P1 hours calculated (count × 1 hour)
|
||||
- [ ] P2 hours calculated (count × 0.5 hours)
|
||||
- [ ] P3 hours calculated (count × 0.25 hours)
|
||||
- [ ] Total hours summed
|
||||
- [ ] Days estimate provided (hours / 8)
|
||||
- [ ] Estimates include setup time
|
||||
|
||||
### Quality Gate Criteria
|
||||
|
||||
- [ ] P0 pass rate threshold defined (should be 100%)
|
||||
- [ ] P1 pass rate threshold defined (typically ≥95%)
|
||||
- [ ] High-risk mitigation completion required
|
||||
- [ ] Coverage targets specified (≥80% recommended)
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Evidence-Based Assessment
|
||||
|
||||
- [ ] Risk assessment based on documented evidence
|
||||
- [ ] No speculation on business impact
|
||||
- [ ] Assumptions clearly documented
|
||||
- [ ] Clarifications requested where needed
|
||||
- [ ] Historical data referenced where available
|
||||
|
||||
### Risk Classification Accuracy
|
||||
|
||||
- [ ] TECH risks are architecture/integration issues
|
||||
- [ ] SEC risks are security vulnerabilities
|
||||
- [ ] PERF risks are performance/scalability concerns
|
||||
- [ ] DATA risks are data integrity issues
|
||||
- [ ] BUS risks are business/revenue impacts
|
||||
- [ ] OPS risks are deployment/operational issues
|
||||
|
||||
### Priority Assignment Accuracy
|
||||
|
||||
- [ ] P0: Truly blocks core functionality
|
||||
- [ ] P0: High-risk (score ≥6)
|
||||
- [ ] P0: No workaround exists
|
||||
- [ ] P1: Important but not blocking
|
||||
- [ ] P2/P3: Nice-to-have or edge cases
|
||||
|
||||
### Test Level Selection
|
||||
|
||||
- [ ] E2E used only for critical paths
|
||||
- [ ] API tests cover complex business logic
|
||||
- [ ] Component tests for UI interactions
|
||||
- [ ] Unit tests for edge cases and algorithms
|
||||
- [ ] No redundant coverage
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] risk-governance.md consulted
|
||||
- [ ] probability-impact.md applied
|
||||
- [ ] test-levels-framework.md referenced
|
||||
- [ ] test-priorities-matrix.md used
|
||||
- [ ] Additional fragments loaded as needed
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] bmm-workflow-status.md exists
|
||||
- [ ] Test design logged in Quality & Testing Progress
|
||||
- [ ] Epic number and scope documented
|
||||
- [ ] Completion timestamp recorded
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] Can proceed to `atdd` workflow with P0 scenarios
|
||||
- [ ] Can proceed to `automate` workflow with full coverage plan
|
||||
- [ ] Risk assessment informs `gate` workflow criteria
|
||||
- [ ] Integrates with `ci` workflow execution order
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All must be true:**
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All process steps completed
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] Output file complete and well-formatted
|
||||
- [ ] Team review scheduled (if required)
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Review risk assessment with team
|
||||
2. [ ] Prioritize mitigation for high-priority risks (score ≥6)
|
||||
3. [ ] Allocate resources per estimates
|
||||
4. [ ] Run `atdd` workflow to generate P0 tests
|
||||
5. [ ] Set up test data factories and fixtures
|
||||
6. [ ] Schedule team review of test design document
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `atdd` workflow for P0 test generation
|
||||
2. [ ] Run `framework` workflow if not already done
|
||||
3. [ ] Run `ci` workflow to configure pipeline stages
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails:
|
||||
|
||||
1. [ ] Delete output file
|
||||
2. [ ] Review error logs
|
||||
3. [ ] Fix missing context (PRD, architecture docs)
|
||||
4. [ ] Clarify ambiguous requirements
|
||||
5. [ ] Retry workflow
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: Too many P0 tests
|
||||
|
||||
- **Solution**: Apply strict P0 criteria - must block core AND high risk AND no workaround
|
||||
|
||||
**Issue**: Risk scores all high
|
||||
|
||||
- **Solution**: Differentiate between critical impact (3) and degraded impact (2) when scoring
|
||||
|
||||
**Issue**: Duplicate coverage across levels
|
||||
|
||||
- **Solution**: Use test pyramid - E2E for critical paths only
|
||||
|
||||
**Issue**: Resource estimates too high
|
||||
|
||||
- **Solution**: Invest in fixtures/factories to reduce per-test setup time
|
||||
|
||||
### Best Practices
|
||||
|
||||
- Base risk assessment on evidence, not assumptions
|
||||
- High-priority risks (≥6) require immediate mitigation
|
||||
- P0 tests should cover <10% of total scenarios
|
||||
- Avoid testing same behavior at multiple levels
|
||||
- Include smoke tests (P0 subset) for fast feedback
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items validated.
|
||||
|
||||
**Completed by:** **\*\***\_\_\_**\*\***
|
||||
**Date:** **\*\***\_\_\_**\*\***
|
||||
**Epic:** **\*\***\_\_\_**\*\***
|
||||
**Notes:** \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
@@ -1,44 +1,621 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Risk and Test Design v3.1
|
||||
# Test Design and Risk Assessment
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/test-design" name="Risk and Test Design">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Story markdown, acceptance criteria, PRD/architecture context are available.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Confirm inputs; halt if any are missing or unclear.</action>
|
||||
</step>
|
||||
<step n="2" title="Assess Risks">
|
||||
<action>Use `{project-root}/bmad/bmm/testarch/tea-index.csv` to load the `risk-governance`, `probability-impact`, and `test-levels` fragments before scoring.</action>
|
||||
<action>Filter requirements to isolate genuine risks; review PRD/architecture/story for unresolved gaps.</action>
|
||||
<action>Classify risks across TECH, SEC, PERF, DATA, BUS, OPS; request clarification when evidence is missing.</action>
|
||||
<action>Score probability (1 unlikely, 2 possible, 3 likely) and impact (1 minor, 2 degraded, 3 critical); compute totals and highlight scores ≥6.</action>
|
||||
<action>Plan mitigations with owners, timelines, and update residual risk expectations.</action>
|
||||
</step>
|
||||
<step n="3" title="Design Coverage">
|
||||
<action>Break acceptance criteria into atomic scenarios tied to mitigations.</action>
|
||||
<action>Load the `test-levels` fragment (knowledge/test-levels-framework.md) to select appropriate levels and avoid duplicate coverage.</action>
|
||||
<action>Load the `test-priorities` fragment (knowledge/test-priorities-matrix.md) to assign P0–P3 priorities and outline data/tooling prerequisites.</action>
|
||||
</step>
|
||||
<step n="4" title="Deliverables">
|
||||
<action>Create risk assessment markdown (category/probability/impact/score) with mitigation matrix and gate snippet totals.</action>
|
||||
<action>Produce coverage matrix (requirement/level/priority/mitigation) plus recommended execution order.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If story data or criteria are missing, halt and request them.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Category definitions: TECH=architecture flaws; SEC=missing controls; PERF=SLA risk; DATA=loss/corruption; BUS=user/business harm; OPS=deployment/run failures.</i>
|
||||
<i>Leverage `tea-index.csv` tags to find supporting evidence (e.g., fixture-architecture, selective-testing) without loading unnecessary files.</i>
|
||||
<i>Rely on evidence, not speculation; tie scenarios back to mitigations; keep scenarios independent and maintainable.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>Unified risk assessment and coverage strategy ready for implementation.</i>
|
||||
</output>
|
||||
</task>
|
||||
**Workflow ID**: `bmad/bmm/testarch/test-design`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Plans comprehensive test coverage strategy with risk assessment, priority classification, and execution ordering. This workflow generates a test design document that identifies high-risk areas, maps requirements to test levels, prioritizes scenarios (P0-P3), and provides resource estimates for the testing effort.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Story markdown with acceptance criteria available
|
||||
- ✅ PRD or epic documentation exists for context
|
||||
- ✅ Architecture documents available (optional but recommended)
|
||||
- ✅ Requirements are clear and testable
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Load Context and Requirements
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Read Requirements Documentation**
|
||||
- Load PRD.md for high-level product requirements
|
||||
- Read epics.md or specific epic for feature scope
|
||||
- Read story markdown for detailed acceptance criteria
|
||||
- Identify all testable requirements
|
||||
|
||||
2. **Load Architecture Context**
|
||||
- Read solution-architecture.md for system design
|
||||
- Read tech-spec for implementation details
|
||||
- Identify technical constraints and dependencies
|
||||
- Note integration points and external systems
|
||||
|
||||
3. **Analyze Existing Test Coverage**
|
||||
- Search for existing test files in `{test_dir}`
|
||||
- Identify coverage gaps
|
||||
- Note areas with insufficient testing
|
||||
- Check for flaky or outdated tests
|
||||
|
||||
4. **Load Knowledge Base Fragments**
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to load:
|
||||
- `risk-governance.md` - Risk classification framework (6 categories: TECH, SEC, PERF, DATA, BUS, OPS), automated scoring, gate decision engine, owner tracking (625 lines, 4 examples)
|
||||
- `probability-impact.md` - Risk scoring methodology (probability × impact matrix, automated classification, dynamic re-assessment, gate integration, 604 lines, 4 examples)
|
||||
- `test-levels-framework.md` - Test level selection guidance (E2E vs API vs Component vs Unit with decision matrix, characteristics, when to use each, 467 lines, 4 examples)
|
||||
- `test-priorities-matrix.md` - P0-P3 prioritization criteria (automated priority calculation, risk-based mapping, tagging strategy, time budgets, 389 lines, 2 examples)
|
||||
|
||||
**Halt Condition:** If story data or acceptance criteria are missing, check whether brownfield exploration is an option. If neither requirements nor exploration are possible, HALT with message: "Test design requires clear requirements, acceptance criteria, or brownfield app URL for exploration"
|
||||
|
||||
---
|
||||
|
||||
## Step 1.5: Mode Selection (NEW - Phase 2.5)
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Detect Planning Mode**
|
||||
|
||||
Determine mode based on context:
|
||||
|
||||
**Requirements-Based Mode (DEFAULT)**:
|
||||
- Have clear story/PRD with acceptance criteria
|
||||
- Uses: Existing workflow (Steps 2-4)
|
||||
- Appropriate for: Documented features, greenfield projects
|
||||
|
||||
**Exploratory Mode (OPTIONAL - Brownfield)**:
|
||||
- Missing/incomplete requirements AND brownfield application exists
|
||||
- Uses: UI exploration to discover functionality
|
||||
- Appropriate for: Undocumented brownfield apps, legacy systems
|
||||
|
||||
2. **Requirements-Based Mode (DEFAULT - Skip to Step 2)**
|
||||
|
||||
If requirements are clear:
|
||||
- Continue with existing workflow (Step 2: Assess and Classify Risks)
|
||||
- Use loaded requirements from Step 1
|
||||
- Proceed with risk assessment based on documented requirements
|
||||
|
||||
3. **Exploratory Mode (OPTIONAL - Brownfield Apps)**
|
||||
|
||||
If exploring brownfield application:
|
||||
|
||||
**A. Check MCP Availability**
|
||||
|
||||
If config.tea_use_mcp_enhancements is true AND Playwright MCP tools available:
|
||||
- Use MCP-assisted exploration (Step 3.B)
|
||||
|
||||
If MCP unavailable OR config.tea_use_mcp_enhancements is false:
|
||||
- Use manual exploration fallback (Step 3.C)
|
||||
|
||||
**B. MCP-Assisted Exploration (If MCP Tools Available)**
|
||||
|
||||
Use Playwright MCP browser tools to explore UI:
|
||||
|
||||
**Setup:**
|
||||
|
||||
```
|
||||
1. Use planner_setup_page to initialize browser
|
||||
2. Navigate to {exploration_url}
|
||||
3. Capture initial state with browser_snapshot
|
||||
```
|
||||
|
||||
**Exploration Process:**
|
||||
|
||||
```
|
||||
4. Use browser_navigate to explore different pages
|
||||
5. Use browser_click to interact with buttons, links, forms
|
||||
6. Use browser_hover to reveal hidden menus/tooltips
|
||||
7. Capture browser_snapshot at each significant state
|
||||
8. Take browser_screenshot for documentation
|
||||
9. Monitor browser_console_messages for JavaScript errors
|
||||
10. Track browser_network_requests to identify API calls
|
||||
11. Map user flows and interactive elements
|
||||
12. Document discovered functionality
|
||||
```
|
||||
|
||||
**Discovery Documentation:**
|
||||
- Create list of discovered features (pages, workflows, forms)
|
||||
- Identify user journeys (navigation paths)
|
||||
- Map API endpoints (from network requests)
|
||||
- Note error states (from console messages)
|
||||
- Capture screenshots for visual reference
|
||||
|
||||
**Convert to Test Scenarios:**
|
||||
- Transform discoveries into testable requirements
|
||||
- Prioritize based on user flow criticality
|
||||
- Identify risks from discovered functionality
|
||||
- Continue with Step 2 (Assess and Classify Risks) using discovered requirements
|
||||
|
||||
**C. Manual Exploration Fallback (If MCP Unavailable)**
|
||||
|
||||
If Playwright MCP is not available:
|
||||
|
||||
**Notify User:**
|
||||
|
||||
```markdown
|
||||
Exploratory mode enabled but Playwright MCP unavailable.
|
||||
|
||||
**Manual exploration required:**
|
||||
|
||||
1. Open application at: {exploration_url}
|
||||
2. Explore all pages, workflows, and features
|
||||
3. Document findings in markdown:
|
||||
- List of pages/features discovered
|
||||
- User journeys identified
|
||||
- API endpoints observed (DevTools Network tab)
|
||||
- JavaScript errors noted (DevTools Console)
|
||||
- Critical workflows mapped
|
||||
|
||||
4. Provide exploration findings to continue workflow
|
||||
|
||||
**Alternative:** Disable exploratory_mode and provide requirements documentation
|
||||
```
|
||||
|
||||
Wait for user to provide exploration findings, then:
|
||||
- Parse user-provided discovery documentation
|
||||
- Convert to testable requirements
|
||||
- Continue with Step 2 (risk assessment)
|
||||
|
||||
4. **Proceed to Risk Assessment**
|
||||
|
||||
After mode selection (Requirements-Based OR Exploratory):
|
||||
- Continue to Step 2: Assess and Classify Risks
|
||||
- Use requirements from documentation (Requirements-Based) OR discoveries (Exploratory)
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Assess and Classify Risks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Identify Genuine Risks**
|
||||
|
||||
Filter requirements to isolate actual risks (not just features):
|
||||
- Unresolved technical gaps
|
||||
- Security vulnerabilities
|
||||
- Performance bottlenecks
|
||||
- Data loss or corruption potential
|
||||
- Business impact failures
|
||||
- Operational deployment issues
|
||||
|
||||
2. **Classify Risks by Category**
|
||||
|
||||
Use these standard risk categories:
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
- Architecture flaws
|
||||
- Integration failures
|
||||
- Scalability issues
|
||||
- Technical debt
|
||||
|
||||
**SEC** (Security):
|
||||
- Missing access controls
|
||||
- Authentication bypass
|
||||
- Data exposure
|
||||
- Injection vulnerabilities
|
||||
|
||||
**PERF** (Performance):
|
||||
- SLA violations
|
||||
- Response time degradation
|
||||
- Resource exhaustion
|
||||
- Scalability limits
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
- Data loss
|
||||
- Data corruption
|
||||
- Inconsistent state
|
||||
- Migration failures
|
||||
|
||||
**BUS** (Business Impact):
|
||||
- User experience degradation
|
||||
- Business logic errors
|
||||
- Revenue impact
|
||||
- Compliance violations
|
||||
|
||||
**OPS** (Operations):
|
||||
- Deployment failures
|
||||
- Configuration errors
|
||||
- Monitoring gaps
|
||||
- Rollback issues
|
||||
|
||||
3. **Score Risk Probability**
|
||||
|
||||
Rate likelihood (1-3):
|
||||
- **1 (Unlikely)**: <10% chance, edge case
|
||||
- **2 (Possible)**: 10-50% chance, known scenario
|
||||
- **3 (Likely)**: >50% chance, common occurrence
|
||||
|
||||
4. **Score Risk Impact**
|
||||
|
||||
Rate severity (1-3):
|
||||
- **1 (Minor)**: Cosmetic, workaround exists, limited users
|
||||
- **2 (Degraded)**: Feature impaired, workaround difficult, affects many users
|
||||
- **3 (Critical)**: System failure, data loss, no workaround, blocks usage
|
||||
|
||||
5. **Calculate Risk Score**
|
||||
|
||||
```
|
||||
Risk Score = Probability × Impact
|
||||
|
||||
Scores:
|
||||
1-2: Low risk (monitor)
|
||||
3-4: Medium risk (plan mitigation)
|
||||
6-9: High risk (immediate mitigation required)
|
||||
```
|
||||
|
||||
6. **Highlight High-Priority Risks**
|
||||
|
||||
Flag all risks with score ≥6 for immediate attention.
|
||||
|
||||
7. **Request Clarification**
|
||||
|
||||
If evidence is missing or assumptions required:
|
||||
- Document assumptions clearly
|
||||
- Request user clarification
|
||||
- Do NOT speculate on business impact
|
||||
|
||||
8. **Plan Mitigations**
|
||||
|
||||
For each high-priority risk:
|
||||
- Define mitigation strategy
|
||||
- Assign owner (dev, QA, ops)
|
||||
- Set timeline
|
||||
- Update residual risk expectation
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Design Test Coverage
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Break Down Acceptance Criteria**
|
||||
|
||||
Convert each acceptance criterion into atomic test scenarios:
|
||||
- One scenario per testable behavior
|
||||
- Scenarios are independent
|
||||
- Scenarios are repeatable
|
||||
- Scenarios tie back to risk mitigations
|
||||
|
||||
2. **Select Appropriate Test Levels**
|
||||
|
||||
**Knowledge Base Reference**: `test-levels-framework.md`
|
||||
|
||||
Map requirements to optimal test levels (avoid duplication):
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
- Critical user journeys
|
||||
- Multi-system integration
|
||||
- Production-like environment
|
||||
- Highest confidence, slowest execution
|
||||
|
||||
**API (Integration)**:
|
||||
- Service contracts
|
||||
- Business logic validation
|
||||
- Fast feedback
|
||||
- Good for complex scenarios
|
||||
|
||||
**Component**:
|
||||
- UI component behavior
|
||||
- Interaction testing
|
||||
- Visual regression
|
||||
- Fast, isolated
|
||||
|
||||
**Unit**:
|
||||
- Business logic
|
||||
- Edge cases
|
||||
- Error handling
|
||||
- Fastest, most granular
|
||||
|
||||
**Avoid duplicate coverage**: Don't test same behavior at multiple levels unless necessary.
|
||||
|
||||
3. **Assign Priority Levels**
|
||||
|
||||
**Knowledge Base Reference**: `test-priorities-matrix.md`
|
||||
|
||||
**P0 (Critical)**:
|
||||
- Blocks core user journey
|
||||
- High-risk areas (score ≥6)
|
||||
- Revenue-impacting
|
||||
- Security-critical
|
||||
- **Run on every commit**
|
||||
|
||||
**P1 (High)**:
|
||||
- Important user features
|
||||
- Medium-risk areas (score 3-4)
|
||||
- Common workflows
|
||||
- **Run on PR to main**
|
||||
|
||||
**P2 (Medium)**:
|
||||
- Secondary features
|
||||
- Low-risk areas (score 1-2)
|
||||
- Edge cases
|
||||
- **Run nightly or weekly**
|
||||
|
||||
**P3 (Low)**:
|
||||
- Nice-to-have
|
||||
- Exploratory
|
||||
- Performance benchmarks
|
||||
- **Run on-demand**
|
||||
|
||||
4. **Outline Data and Tooling Prerequisites**
|
||||
|
||||
For each test scenario, identify:
|
||||
- Test data requirements (factories, fixtures)
|
||||
- External services (mocks, stubs)
|
||||
- Environment setup
|
||||
- Tools and dependencies
|
||||
|
||||
5. **Define Execution Order**
|
||||
|
||||
Recommend test execution sequence:
|
||||
1. **Smoke tests** (P0 subset, <5 min)
|
||||
2. **P0 tests** (critical paths, <10 min)
|
||||
3. **P1 tests** (important features, <30 min)
|
||||
4. **P2/P3 tests** (full regression, <60 min)
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Generate Deliverables
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Risk Assessment Matrix**
|
||||
|
||||
Use template structure:
|
||||
|
||||
```markdown
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation |
|
||||
| ------- | -------- | ----------- | ----------- | ------ | ----- | --------------- |
|
||||
| R-001 | SEC | Auth bypass | 2 | 3 | 6 | Add authz check |
|
||||
```
|
||||
|
||||
2. **Create Coverage Matrix**
|
||||
|
||||
```markdown
|
||||
| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
|
||||
| ----------- | ---------- | -------- | --------- | ---------- | ----- |
|
||||
| Login flow | E2E | P0 | R-001 | 3 | QA |
|
||||
```
|
||||
|
||||
3. **Document Execution Order**
|
||||
|
||||
```markdown
|
||||
### Smoke Tests (<5 min)
|
||||
|
||||
- Login successful
|
||||
- Dashboard loads
|
||||
|
||||
### P0 Tests (<10 min)
|
||||
|
||||
- [Full P0 list]
|
||||
|
||||
### P1 Tests (<30 min)
|
||||
|
||||
- [Full P1 list]
|
||||
```
|
||||
|
||||
4. **Include Resource Estimates**
|
||||
|
||||
```markdown
|
||||
### Test Effort Estimates
|
||||
|
||||
- P0 scenarios: 15 tests × 2 hours = 30 hours
|
||||
- P1 scenarios: 25 tests × 1 hour = 25 hours
|
||||
- P2 scenarios: 40 tests × 0.5 hour = 20 hours
|
||||
- **Total:** 75 hours (~10 days)
|
||||
```
|
||||
|
||||
5. **Add Gate Criteria**
|
||||
|
||||
```markdown
|
||||
### Quality Gate Criteria
|
||||
|
||||
- All P0 tests pass (100%)
|
||||
- P1 tests pass rate ≥95%
|
||||
- No high-risk (score ≥6) items unmitigated
|
||||
- Test coverage ≥80% for critical paths
|
||||
```
|
||||
|
||||
6. **Write to Output File**
|
||||
|
||||
Save to `{output_folder}/test-design-epic-{epic_num}.md` using template structure.
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Risk Category Definitions
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
|
||||
- Architecture flaws or technical debt
|
||||
- Integration complexity
|
||||
- Scalability concerns
|
||||
|
||||
**SEC** (Security):
|
||||
|
||||
- Missing security controls
|
||||
- Authentication/authorization gaps
|
||||
- Data exposure risks
|
||||
|
||||
**PERF** (Performance):
|
||||
|
||||
- SLA risk or performance degradation
|
||||
- Resource constraints
|
||||
- Scalability bottlenecks
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
|
||||
- Data loss or corruption potential
|
||||
- State consistency issues
|
||||
- Migration risks
|
||||
|
||||
**BUS** (Business Impact):
|
||||
|
||||
- User experience harm
|
||||
- Business logic errors
|
||||
- Revenue or compliance impact
|
||||
|
||||
**OPS** (Operations):
|
||||
|
||||
- Deployment or runtime failures
|
||||
- Configuration issues
|
||||
- Monitoring/observability gaps
|
||||
|
||||
### Risk Scoring Methodology
|
||||
|
||||
**Probability × Impact = Risk Score**
|
||||
|
||||
Examples:
|
||||
|
||||
- High likelihood (3) × Critical impact (3) = **Score 9** (highest priority)
|
||||
- Possible (2) × Critical (3) = **Score 6** (high priority threshold)
|
||||
- Unlikely (1) × Minor (1) = **Score 1** (low priority)
|
||||
|
||||
**Threshold**: Scores ≥6 require immediate mitigation.
|
||||
|
||||
### Test Level Selection Strategy
|
||||
|
||||
**Avoid duplication:**
|
||||
|
||||
- Don't test same behavior at E2E and API level
|
||||
- Use E2E for critical paths only
|
||||
- Use API tests for complex business logic
|
||||
- Use unit tests for edge cases
|
||||
|
||||
**Tradeoffs:**
|
||||
|
||||
- E2E: High confidence, slow execution, brittle
|
||||
- API: Good balance, fast, stable
|
||||
- Unit: Fastest feedback, narrow scope
|
||||
|
||||
### Priority Assignment Guidelines

**P0 criteria** (all must be true):

- Blocks core functionality
- High-risk (score ≥6)
- No workaround exists
- Affects majority of users

**P1 criteria**:

- Important feature
- Medium risk (score 3-5)
- Workaround exists but difficult

**P2/P3**: Everything else, prioritized by value

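A compact sketch of these guidelines as a decision function; the predicate names are illustrative, and the P1 branch is an approximation of the criteria above:

```typescript
interface Scenario {
  blocksCoreFunctionality: boolean;
  riskScore: number; // probability × impact, 1–9
  workaroundExists: boolean;
  affectsMajorityOfUsers: boolean;
  importantFeature: boolean;
}

function assignPriority(s: Scenario): "P0" | "P1" | "P2/P3" {
  if (
    s.blocksCoreFunctionality &&
    s.riskScore >= 6 &&
    !s.workaroundExists &&
    s.affectsMajorityOfUsers
  ) {
    return "P0";
  }
  if (s.importantFeature && s.riskScore >= 3) {
    return "P1";
  }
  return "P2/P3";
}
```
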
### Knowledge Base Integration

**Core Fragments (Auto-loaded in Step 1):**

- `risk-governance.md` - Risk classification (6 categories), automated scoring, gate decision engine, coverage traceability, owner tracking (625 lines, 4 examples)
- `probability-impact.md` - Probability × impact matrix, automated classification thresholds, dynamic re-assessment, gate integration (604 lines, 4 examples)
- `test-levels-framework.md` - E2E vs API vs Component vs Unit decision framework with characteristics matrix (467 lines, 4 examples)
- `test-priorities-matrix.md` - P0-P3 automated priority calculation, risk-based mapping, tagging strategy, time budgets (389 lines, 2 examples)

**Reference for Test Planning:**

- `selective-testing.md` - Execution strategy: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
- `fixture-architecture.md` - Data setup patterns: pure function → fixture → mergeTests, auto-cleanup (406 lines, 5 examples)

**Manual Reference (Optional):**

- Use `tea-index.csv` to find additional specialized fragments as needed

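As an illustration of the "pure function → fixture → mergeTests" pattern that `fixture-architecture.md` describes, here is a minimal Playwright sketch; the user factory, API route, and response shape are assumptions made for the example:

```typescript
import { test as base, mergeTests } from "@playwright/test";

// Pure function: deterministic data creation, no framework coupling.
function buildUser(overrides: Partial<{ name: string; role: string }> = {}) {
  return { name: "Test User", role: "viewer", ...overrides };
}

// Fixture: wraps the pure function and owns setup plus auto-cleanup.
const userTest = base.extend<{ user: { name: string; role: string } }>({
  user: async ({ request }, use) => {
    const created = await request.post("/api/admin/users", { data: buildUser({ role: "admin" }) });
    const { id, ...user } = await created.json();
    await use(user);
    await request.delete(`/api/admin/users/${id}`); // auto-cleanup
  },
});

// mergeTests: compose this fixture set with others as the suite grows.
export const test = mergeTests(userTest);
```
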
### Evidence-Based Assessment
|
||||
|
||||
**Critical principle:** Base risk assessment on evidence, not speculation.
|
||||
|
||||
**Evidence sources:**
|
||||
|
||||
- PRD and user research
|
||||
- Architecture documentation
|
||||
- Historical bug data
|
||||
- User feedback
|
||||
- Security audit results
|
||||
|
||||
**Avoid:**
|
||||
|
||||
- Guessing business impact
|
||||
- Assuming user behavior
|
||||
- Inventing requirements
|
||||
|
||||
**When uncertain:** Document assumptions and request clarification from user.
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## Test Design Complete
|
||||
|
||||
**Epic**: {epic_num}
|
||||
**Scope**: {design_level}
|
||||
|
||||
**Risk Assessment**:
|
||||
|
||||
- Total risks identified: {count}
|
||||
- High-priority risks (≥6): {high_count}
|
||||
- Categories: {categories}
|
||||
|
||||
**Coverage Plan**:
|
||||
|
||||
- P0 scenarios: {p0_count} ({p0_hours} hours)
|
||||
- P1 scenarios: {p1_count} ({p1_hours} hours)
|
||||
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
|
||||
- **Total effort**: {total_hours} hours (~{total_days} days)
|
||||
|
||||
**Test Levels**:
|
||||
|
||||
- E2E: {e2e_count}
|
||||
- API: {api_count}
|
||||
- Component: {component_count}
|
||||
- Unit: {unit_count}
|
||||
|
||||
**Quality Gate Criteria**:
|
||||
|
||||
- P0 pass rate: 100%
|
||||
- P1 pass rate: ≥95%
|
||||
- High-risk mitigations: 100%
|
||||
- Coverage: ≥80%
|
||||
|
||||
**Output File**: {output_file}
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Review risk assessment with team
|
||||
2. Prioritize mitigation for high-risk items (score ≥6)
|
||||
3. Run `atdd` workflow to generate failing tests for P0 scenarios
|
||||
4. Allocate resources per effort estimates
|
||||
5. Set up test data factories and fixtures
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Risk assessment complete with all categories
|
||||
- [ ] All risks scored (probability × impact)
|
||||
- [ ] High-priority risks (≥6) flagged
|
||||
- [ ] Coverage matrix maps requirements to test levels
|
||||
- [ ] Priority levels assigned (P0-P3)
|
||||
- [ ] Execution order defined
|
||||
- [ ] Resource estimates provided
|
||||
- [ ] Quality gate criteria defined
|
||||
- [ ] Output file created and formatted correctly
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
@@ -0,0 +1,285 @@
|
||||
# Test Design: Epic {epic_num} - {epic_title}
|
||||
|
||||
**Date:** {date}
|
||||
**Author:** {user_name}
|
||||
**Status:** Draft / Approved
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Scope:** {design_level} test design for Epic {epic_num}
|
||||
|
||||
**Risk Summary:**
|
||||
|
||||
- Total risks identified: {total_risks}
|
||||
- High-priority risks (≥6): {high_priority_count}
|
||||
- Critical categories: {top_categories}
|
||||
|
||||
**Coverage Summary:**
|
||||
|
||||
- P0 scenarios: {p0_count} ({p0_hours} hours)
|
||||
- P1 scenarios: {p1_count} ({p1_hours} hours)
|
||||
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
|
||||
- **Total effort**: {total_hours} hours (~{total_days} days)
|
||||
|
||||
---
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
### High-Priority Risks (Score ≥6)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner | Timeline |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------------ | ------- | -------- |
|
||||
| R-001 | SEC | {description} | 2 | 3 | 6 | {mitigation} | {owner} | {date} |
|
||||
| R-002 | PERF | {description} | 3 | 2 | 6 | {mitigation} | {owner} | {date} |
|
||||
|
||||
### Medium-Priority Risks (Score 3-4)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------------ | ------- |
|
||||
| R-003 | TECH | {description} | 2 | 2 | 4 | {mitigation} | {owner} |
|
||||
| R-004 | DATA | {description} | 1 | 3 | 3 | {mitigation} | {owner} |
|
||||
|
||||
### Low-Priority Risks (Score 1-2)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Action |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------- |
|
||||
| R-005 | OPS | {description} | 1 | 2 | 2 | Monitor |
|
||||
| R-006 | BUS | {description} | 1 | 1 | 1 | Monitor |
|
||||
|
||||
### Risk Category Legend
|
||||
|
||||
- **TECH**: Technical/Architecture (flaws, integration, scalability)
|
||||
- **SEC**: Security (access controls, auth, data exposure)
|
||||
- **PERF**: Performance (SLA violations, degradation, resource limits)
|
||||
- **DATA**: Data Integrity (loss, corruption, inconsistency)
|
||||
- **BUS**: Business Impact (UX harm, logic errors, revenue)
|
||||
- **OPS**: Operations (deployment, config, monitoring)
|
||||
|
||||
---
|
||||
|
||||
## Test Coverage Plan
|
||||
|
||||
### P0 (Critical) - Run on every commit
|
||||
|
||||
**Criteria**: Blocks core journey + High risk (≥6) + No workaround
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | E2E | R-001 | 3 | QA | {notes} |
|
||||
| {requirement} | API | R-002 | 5 | QA | {notes} |
|
||||
|
||||
**Total P0**: {p0_count} tests, {p0_hours} hours
|
||||
|
||||
### P1 (High) - Run on PR to main
|
||||
|
||||
**Criteria**: Important features + Medium risk (3-4) + Common workflows
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | API | R-003 | 4 | QA | {notes} |
|
||||
| {requirement} | Component | - | 6 | DEV | {notes} |
|
||||
|
||||
**Total P1**: {p1_count} tests, {p1_hours} hours
|
||||
|
||||
### P2 (Medium) - Run nightly/weekly
|
||||
|
||||
**Criteria**: Secondary features + Low risk (1-2) + Edge cases
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | API | R-004 | 8 | QA | {notes} |
|
||||
| {requirement} | Unit | - | 15 | DEV | {notes} |
|
||||
|
||||
**Total P2**: {p2_count} tests, {p2_hours} hours
|
||||
|
||||
### P3 (Low) - Run on-demand
|
||||
|
||||
**Criteria**: Nice-to-have + Exploratory + Performance benchmarks
|
||||
|
||||
| Requirement | Test Level | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | ---------- | ----- | ------- |
|
||||
| {requirement} | E2E | 2 | QA | {notes} |
|
||||
| {requirement} | Unit | 8 | DEV | {notes} |
|
||||
|
||||
**Total P3**: {p3_count} tests, {p3_hours} hours
|
||||
|
||||
---
|
||||
|
||||
## Execution Order
|
||||
|
||||
### Smoke Tests (<5 min)
|
||||
|
||||
**Purpose**: Fast feedback, catch build-breaking issues
|
||||
|
||||
- [ ] {scenario} (30s)
|
||||
- [ ] {scenario} (45s)
|
||||
- [ ] {scenario} (1min)
|
||||
|
||||
**Total**: {smoke_count} scenarios
|
||||
|
||||
### P0 Tests (<10 min)
|
||||
|
||||
**Purpose**: Critical path validation
|
||||
|
||||
- [ ] {scenario} (E2E)
|
||||
- [ ] {scenario} (API)
|
||||
- [ ] {scenario} (API)
|
||||
|
||||
**Total**: {p0_count} scenarios
|
||||
|
||||
### P1 Tests (<30 min)
|
||||
|
||||
**Purpose**: Important feature coverage
|
||||
|
||||
- [ ] {scenario} (API)
|
||||
- [ ] {scenario} (Component)
|
||||
|
||||
**Total**: {p1_count} scenarios
|
||||
|
||||
### P2/P3 Tests (<60 min)
|
||||
|
||||
**Purpose**: Full regression coverage
|
||||
|
||||
- [ ] {scenario} (Unit)
|
||||
- [ ] {scenario} (API)
|
||||
|
||||
**Total**: {p2p3_count} scenarios
|
||||
|
||||
---
|
||||
|
||||
## Resource Estimates
|
||||
|
||||
### Test Development Effort
|
||||
|
||||
| Priority | Count | Hours/Test | Total Hours | Notes |
|
||||
| --------- | ----------------- | ---------- | ----------------- | ----------------------- |
|
||||
| P0 | {p0_count} | 2.0 | {p0_hours} | Complex setup, security |
|
||||
| P1 | {p1_count} | 1.0 | {p1_hours} | Standard coverage |
|
||||
| P2 | {p2_count} | 0.5 | {p2_hours} | Simple scenarios |
|
||||
| P3 | {p3_count} | 0.25 | {p3_hours} | Exploratory |
|
||||
| **Total** | **{total_count}** | **-** | **{total_hours}** | **~{total_days} days** |
|
||||
|
||||
### Prerequisites
|
||||
|
||||
**Test Data:**
|
||||
|
||||
- {factory_name} factory (faker-based, auto-cleanup)
|
||||
- {fixture_name} fixture (setup/teardown)
|
||||
|
||||
**Tooling:**
|
||||
|
||||
- {tool} for {purpose}
|
||||
- {tool} for {purpose}
|
||||
|
||||
**Environment:**
|
||||
|
||||
- {env_requirement}
|
||||
- {env_requirement}
|
||||
|
||||
---
|
||||
|
||||
## Quality Gate Criteria
|
||||
|
||||
### Pass/Fail Thresholds
|
||||
|
||||
- **P0 pass rate**: 100% (no exceptions)
|
||||
- **P1 pass rate**: ≥95% (waivers required for failures)
|
||||
- **P2/P3 pass rate**: ≥90% (informational)
|
||||
- **High-risk mitigations**: 100% complete or approved waivers
|
||||
|
||||
### Coverage Targets
|
||||
|
||||
- **Critical paths**: ≥80%
|
||||
- **Security scenarios**: 100%
|
||||
- **Business logic**: ≥70%
|
||||
- **Edge cases**: ≥50%
|
||||
|
||||
### Non-Negotiable Requirements
|
||||
|
||||
- [ ] All P0 tests pass
|
||||
- [ ] No high-risk (≥6) items unmitigated
|
||||
- [ ] Security tests (SEC category) pass 100%
|
||||
- [ ] Performance targets met (PERF category)
|
||||
|
||||
---
|
||||
|
||||
## Mitigation Plans
|
||||
|
||||
### R-001: {Risk Description} (Score: 6)
|
||||
|
||||
**Mitigation Strategy:** {detailed_mitigation}
|
||||
**Owner:** {owner}
|
||||
**Timeline:** {date}
|
||||
**Status:** Planned / In Progress / Complete
|
||||
**Verification:** {how_to_verify}
|
||||
|
||||
### R-002: {Risk Description} (Score: 6)
|
||||
|
||||
**Mitigation Strategy:** {detailed_mitigation}
|
||||
**Owner:** {owner}
|
||||
**Timeline:** {date}
|
||||
**Status:** Planned / In Progress / Complete
|
||||
**Verification:** {how_to_verify}
|
||||
|
||||
---
|
||||
|
||||
## Assumptions and Dependencies
|
||||
|
||||
### Assumptions
|
||||
|
||||
1. {assumption}
|
||||
2. {assumption}
|
||||
3. {assumption}
|
||||
|
||||
### Dependencies
|
||||
|
||||
1. {dependency} - Required by {date}
|
||||
2. {dependency} - Required by {date}
|
||||
|
||||
### Risks to Plan
|
||||
|
||||
- **Risk**: {risk_to_plan}
|
||||
- **Impact**: {impact}
|
||||
- **Contingency**: {contingency}
|
||||
|
||||
---
|
||||
|
||||
## Approval
|
||||
|
||||
**Test Design Approved By:**
|
||||
|
||||
- [ ] Product Manager: **\*\***\_\_\_**\*\*** Date: **\*\***\_\_\_**\*\***
|
||||
- [ ] Tech Lead: **\*\***\_\_\_**\*\*** Date: **\*\***\_\_\_**\*\***
|
||||
- [ ] QA Lead: **\*\***\_\_\_**\*\*** Date: **\*\***\_\_\_**\*\***
|
||||
|
||||
**Comments:**
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
## Appendix
|
||||
|
||||
### Knowledge Base References
|
||||
|
||||
- `risk-governance.md` - Risk classification framework
|
||||
- `probability-impact.md` - Risk scoring methodology
|
||||
- `test-levels-framework.md` - Test level selection
|
||||
- `test-priorities-matrix.md` - P0-P3 prioritization
|
||||
|
||||
### Related Documents
|
||||
|
||||
- PRD: {prd_link}
|
||||
- Epic: {epic_link}
|
||||
- Architecture: {arch_link}
|
||||
- Tech Spec: {tech_spec_link}
|
||||
|
||||
---
|
||||
|
||||
**Generated by**: BMad TEA Agent - Test Architect Module
|
||||
**Workflow**: `bmad/bmm/testarch/test-design`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
@@ -1,25 +1,79 @@
|
||||
# Test Architect workflow: test-design
|
||||
name: testarch-plan
|
||||
description: "Plan risk mitigation and test coverage before development."
|
||||
name: testarch-test-design
|
||||
description: "Plan risk mitigation and test coverage strategy before development with risk assessment and prioritization"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/test-design"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/test-design-template.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Target scope
|
||||
epic_num: "" # Epic number for scoped design
|
||||
story_path: "" # Specific story for design (optional)
|
||||
design_level: "full" # full, targeted, minimal
|
||||
|
||||
# Risk assessment configuration
|
||||
risk_assessment_enabled: true
|
||||
risk_threshold: 6 # Scores >= 6 are high-priority (probability × impact)
|
||||
risk_categories: "TECH,SEC,PERF,DATA,BUS,OPS" # Comma-separated
|
||||
|
||||
# Coverage planning
|
||||
priority_levels: "P0,P1,P2,P3" # Test priorities
|
||||
test_levels: "e2e,api,integration,unit,component" # Test levels to consider
|
||||
selective_testing_strategy: "risk-based" # risk-based, coverage-based, hybrid
|
||||
|
||||
# Output configuration
|
||||
output_file: "{output_folder}/test-design-epic-{epic_num}.md"
|
||||
include_risk_matrix: true
|
||||
include_coverage_matrix: true
|
||||
include_execution_order: true
|
||||
include_resource_estimates: true
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load relevant knowledge fragments
|
||||
include_mitigation_plan: true
|
||||
include_gate_criteria: true
|
||||
standalone_mode: false # Can run without epic context
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/test-design-epic-{epic_num}.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read PRD, epics, stories, architecture docs
|
||||
- write_file # Create test design document
|
||||
- list_files # Find related documentation
|
||||
- search_repo # Search for existing tests and patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- prd: "Product Requirements Document for context"
|
||||
- epics: "Epic documentation (epics.md or specific epic)"
|
||||
- story: "Story markdown with acceptance criteria"
|
||||
- architecture: "Architecture documents (solution-architecture.md, tech-spec)"
|
||||
- existing_tests: "Current test coverage for gap analysis"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- planning
|
||||
- test-architect
|
||||
- risk-assessment
|
||||
- coverage
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
775
src/modules/bmm/workflows/testarch/test-review/README.md
Normal file
@@ -0,0 +1,775 @@
|
||||
# Test Quality Review Workflow

The Test Quality Review workflow performs comprehensive quality validation of test code using TEA's knowledge base of best practices. It detects flaky patterns, validates structure, and provides actionable feedback to improve test maintainability and reliability.

## Overview

This workflow reviews test quality against proven patterns from TEA's knowledge base including fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. It generates a quality score (0-100) with detailed feedback on violations and recommendations.

**Key Features:**

- **Knowledge-Based Review**: Applies patterns from 19+ knowledge fragments in tea-index.csv
- **Quality Scoring**: 0-100 score with letter grade (A+ to F) based on violations
- **Multi-Scope Review**: Single file, directory, or entire test suite
- **Pattern Detection**: Identifies hard waits, race conditions, shared state, conditionals
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions, test length
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
- **Code Examples**: Every issue includes recommended fix with code snippets
- **Integration**: Works with story files, test-design, acceptance criteria context

---
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *test-review
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- After `*atdd` workflow → validate generated acceptance tests
|
||||
- After `*automate` workflow → ensure regression suite quality
|
||||
- After developer writes tests → provide quality feedback
|
||||
- Before `*gate` workflow → confirm test quality before release
|
||||
- User explicitly requests review: `bmad tea *test-review`
|
||||
- Periodic quality audits of existing test suite
|
||||
|
||||
**Typical workflow sequence:**
|
||||
|
||||
1. `*atdd` → Generate failing acceptance tests
|
||||
2. **`*test-review`** → Validate test quality ⬅️ YOU ARE HERE (option 1)
|
||||
3. `*dev story` → Implement feature with tests passing
|
||||
4. **`*test-review`** → Review implementation tests ⬅️ YOU ARE HERE (option 2)
|
||||
5. `*automate` → Expand regression suite
|
||||
6. **`*test-review`** → Validate new regression tests ⬅️ YOU ARE HERE (option 3)
|
||||
7. `*gate` → Final quality gate decision
|
||||
|
||||
---
|
||||
|
||||
## Inputs
|
||||
|
||||
### Required Context Files
|
||||
|
||||
- **Test File(s)**: One or more test files to review (auto-discovered or explicitly provided)
|
||||
- **Test Framework Config**: playwright.config.ts, jest.config.js, etc. (for context)
|
||||
|
||||
### Recommended Context Files
|
||||
|
||||
- **Story File**: Acceptance criteria for context (e.g., `story-1.3.md`)
|
||||
- **Test Design**: Priority context (P0/P1/P2/P3) from test-design.md
|
||||
- **Knowledge Base**: tea-index.csv with best practice fragments (required for thorough review)
|
||||
|
||||
### Workflow Variables
|
||||
|
||||
Key variables that control review behavior (configured in `workflow.yaml`):
|
||||
|
||||
- **review_scope**: `single` | `directory` | `suite` (default: `single`)
|
||||
- `single`: Review one test file
|
||||
- `directory`: Review all tests in a directory
|
||||
- `suite`: Review entire test suite
|
||||
|
||||
- **quality_score_enabled**: Enable 0-100 quality scoring (default: `true`)
|
||||
- **append_to_file**: Add inline comments to test files (default: `false`)
|
||||
- **check_against_knowledge**: Use tea-index.csv fragments (default: `true`)
|
||||
- **strict_mode**: Fail on any violation vs advisory only (default: `false`)
|
||||
|
||||
**Quality Criteria Flags** (all default to `true`):
|
||||
|
||||
- `check_given_when_then`: BDD format validation
|
||||
- `check_test_ids`: Test ID conventions
|
||||
- `check_priority_markers`: P0/P1/P2/P3 classification
|
||||
- `check_hard_waits`: Detect sleep(), wait(X)
|
||||
- `check_determinism`: No conditionals/try-catch abuse
|
||||
- `check_isolation`: Tests clean up, no shared state
|
||||
- `check_fixture_patterns`: Pure function → Fixture → mergeTests
|
||||
- `check_data_factories`: Factory usage vs hardcoded data
|
||||
- `check_network_first`: Route intercept before navigate
|
||||
- `check_assertions`: Explicit assertions present
|
||||
- `check_test_length`: Warn if >300 lines
|
||||
- `check_test_duration`: Warn if >1.5 min
|
||||
- `check_flakiness_patterns`: Common flaky patterns
|
||||
|
||||
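For instance, a review of a legacy directory might relax a couple of these flags. This is a minimal sketch of the relevant `workflow.yaml` overrides — the key names come from the list above, while the specific values are only an illustration:

```yaml
review_scope: "directory"
test_dir: "tests/legacy/"
strict_mode: false # advisory feedback only
check_priority_markers: false # legacy tests predate P0-P3 classification
check_test_length: false # known long files, tracked separately
```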
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### Primary Deliverable
|
||||
|
||||
**Test Quality Review Report** (`test-review-{filename}.md`):
|
||||
|
||||
- **Executive Summary**: Overall assessment, key strengths/weaknesses, recommendation
|
||||
- **Quality Score**: 0-100 score with letter grade (A+ to F)
|
||||
- **Quality Criteria Assessment**: Table with all criteria evaluated (PASS/WARN/FAIL)
|
||||
- **Critical Issues**: P0/P1 violations that must be fixed
|
||||
- **Recommendations**: P2/P3 violations that should be fixed
|
||||
- **Best Practices Examples**: Good patterns found in tests
|
||||
- **Knowledge Base References**: Links to detailed guidance
|
||||
|
||||
Each issue includes:
|
||||
|
||||
- Code location (file:line)
|
||||
- Explanation of problem
|
||||
- Recommended fix with code example
|
||||
- Knowledge base fragment reference
|
||||
|
||||
### Secondary Outputs
|
||||
|
||||
- **Inline Comments**: TODO comments in test files at violation locations (if enabled)
|
||||
- **Quality Badge**: Badge with score (e.g., "Test Quality: 87/100 (A)")
|
||||
- **Story Update**: Test quality section appended to story file (if enabled)
|
||||
|
||||
### Validation Safeguards
|
||||
|
||||
- ✅ All knowledge base fragments loaded successfully
|
||||
- ✅ Test files parsed and structure analyzed
|
||||
- ✅ All enabled quality criteria evaluated
|
||||
- ✅ Violations categorized by severity (P0/P1/P2/P3)
|
||||
- ✅ Quality score calculated with breakdown
|
||||
- ✅ Actionable feedback with code examples provided
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Explained
|
||||
|
||||
### 1. BDD Format (Given-When-Then)
|
||||
|
||||
**PASS**: Tests use clear Given-When-Then structure
|
||||
|
||||
```typescript
|
||||
// Given: User is logged in
|
||||
const user = await createTestUser();
|
||||
await loginPage.login(user.email, user.password);
|
||||
|
||||
// When: User navigates to dashboard
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Then: User sees welcome message
|
||||
await expect(page.locator('[data-testid="welcome"]')).toContainText(user.name);
|
||||
```
|
||||
|
||||
**FAIL**: Tests lack structure, hard to understand intent
|
||||
|
||||
```typescript
|
||||
await page.goto('/dashboard');
|
||||
await page.click('.button');
|
||||
await expect(page.locator('.text')).toBeVisible();
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, tdd-cycles.md
|
||||
|
||||
---
|
||||
|
||||
### 2. Test IDs
|
||||
|
||||
**PASS**: All tests have IDs following convention
|
||||
|
||||
```typescript
test.describe('1.3-E2E-001: User Login Flow', () => {
  test('should log in successfully with valid credentials', async ({ page }) => {
    // Test implementation
  });
});
```
|
||||
|
||||
**FAIL**: No test IDs, can't trace to requirements
|
||||
|
||||
```typescript
test.describe('Login', () => {
  test('login works', async ({ page }) => {
    // Test implementation
  });
});
```
|
||||
|
||||
**Knowledge**: traceability.md, test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 3. Priority Markers
|
||||
|
||||
**PASS**: Tests classified as P0/P1/P2/P3
|
||||
|
||||
```typescript
test.describe('P0: Critical User Journey - Checkout', () => {
  // Critical tests
});

test.describe('P2: Edge Case - International Addresses', () => {
  // Nice-to-have tests
});
```
|
||||
|
||||
**Knowledge**: test-priorities.md, risk-governance.md
|
||||
|
||||
---
|
||||
|
||||
### 4. No Hard Waits
|
||||
|
||||
**PASS**: No sleep(), wait(), hardcoded delays
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Explicit wait for condition
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
|
||||
```
|
||||
|
||||
**FAIL**: Hard waits introduce flakiness
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Hard wait
|
||||
await page.waitForTimeout(2000);
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, network-first.md
|
||||
|
||||
---
|
||||
|
||||
### 5. Determinism
|
||||
|
||||
**PASS**: Tests work deterministically, no conditionals
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Deterministic test
|
||||
await expect(page.locator('[data-testid="status"]')).toHaveText('Active');
|
||||
```
|
||||
|
||||
**FAIL**: Conditionals make tests unpredictable
|
||||
|
||||
```typescript
// ❌ Bad: Conditional logic
const status = await page.locator('[data-testid="status"]').textContent();
if (status === 'Active') {
  await page.click('[data-testid="deactivate"]');
} else {
  await page.click('[data-testid="activate"]');
}
```
|
||||
|
||||
**Knowledge**: test-quality.md, data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 6. Isolation
|
||||
|
||||
**PASS**: Tests clean up, no shared state
|
||||
|
||||
```typescript
test.afterEach(async ({ page, testUser }) => {
  // Cleanup: Delete test user
  await api.deleteUser(testUser.id);
});
```
|
||||
|
||||
**FAIL**: Shared state, tests depend on order
|
||||
|
||||
```typescript
// ❌ Bad: Shared global variable
let userId: string;

test('create user', async () => {
  userId = await createUser(); // Sets global
});

test('update user', async () => {
  await updateUser(userId); // Depends on previous test
});
```
|
||||
|
||||
**Knowledge**: test-quality.md, data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 7. Fixture Patterns
|
||||
|
||||
**PASS**: Pure function → Fixture → mergeTests
|
||||
|
||||
```typescript
// ✅ Good: Pure function fixture
const createAuthenticatedPage = async (page: Page, user: User) => {
  await page.goto('/login');
  await page.fill('[name="email"]', user.email);
  await page.fill('[name="password"]', user.password);
  await page.click('[type="submit"]');
  return page;
};

const test = base.extend({
  authenticatedPage: async ({ page }, use) => {
    const user = createTestUser();
    const authedPage = await createAuthenticatedPage(page, user);
    await use(authedPage);
  },
});
```
|
||||
|
||||
**FAIL**: No fixtures, repeated setup
|
||||
|
||||
```typescript
// ❌ Bad: Repeated setup in every test
test('test 1', async ({ page }) => {
  await page.goto('/login');
  await page.fill('[name="email"]', 'test@example.com');
  await page.fill('[name="password"]', 'password123');
  await page.click('[type="submit"]');
  // Test logic
});
```
|
||||
|
||||
**Knowledge**: fixture-architecture.md
|
||||
|
||||
---
|
||||
|
||||
### 8. Data Factories
|
||||
|
||||
**PASS**: Factory functions with overrides
|
||||
|
||||
```typescript
// ✅ Good: Factory function
import { createTestUser } from './factories/user-factory';

test('user can update profile', async ({ page }) => {
  const user = createTestUser({ role: 'admin' });
  await api.createUser(user); // API-first setup
  // Test UI interaction
});
```
|
||||
|
||||
**FAIL**: Hardcoded test data
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Magic strings
|
||||
await page.fill('[name="email"]', 'test@example.com');
|
||||
await page.fill('[name="phone"]', '555-1234');
|
||||
```
|
||||
|
||||
**Knowledge**: data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 9. Network-First Pattern
|
||||
|
||||
**PASS**: Route intercept before navigate
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Intercept before navigation
|
||||
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers }));
|
||||
await page.goto('/users'); // Navigate after route setup
|
||||
```
|
||||
|
||||
**FAIL**: Race condition risk
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Navigate before intercept
|
||||
await page.goto('/users');
|
||||
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers })); // Too late!
|
||||
```
|
||||
|
||||
**Knowledge**: network-first.md
|
||||
|
||||
---
|
||||
|
||||
### 10. Explicit Assertions
|
||||
|
||||
**PASS**: Clear, specific assertions
|
||||
|
||||
```typescript
|
||||
await expect(page.locator('[data-testid="username"]')).toHaveText('John Doe');
|
||||
await expect(page.locator('[data-testid="status"]')).toHaveClass(/active/);
|
||||
```
|
||||
|
||||
**FAIL**: Missing or vague assertions
|
||||
|
||||
```typescript
|
||||
await page.locator('[data-testid="username"]').isVisible(); // No assertion!
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 11. Test Length
|
||||
|
||||
**PASS**: ≤300 lines per file (ideal: ≤200)
|
||||
**WARN**: 301-500 lines (consider splitting)
|
||||
**FAIL**: >500 lines (too large)
|
||||
|
||||
**Knowledge**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 12. Test Duration
|
||||
|
||||
**PASS**: ≤1.5 minutes per test (target: <30 seconds)
|
||||
**WARN**: 1.5-3 minutes (consider optimization)
|
||||
**FAIL**: >3 minutes (too slow)
|
||||
|
||||
**Knowledge**: test-quality.md, selective-testing.md
|
||||
|
||||
---
|
||||
|
||||
### 13. Flakiness Patterns
|
||||
|
||||
Common flaky patterns detected:
|
||||
|
||||
- Tight timeouts (e.g., `{ timeout: 1000 }`)
|
||||
- Race conditions (navigation before route interception)
|
||||
- Timing-dependent assertions
|
||||
- Retry logic hiding flakiness
|
||||
- Environment-dependent assumptions
|
||||
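For example, the first two patterns above typically look like this in Playwright (a hypothetical snippet, assuming a `page` fixture is in scope):

```typescript
// ❌ Flaky: tight timeout makes the assertion timing-dependent
await expect(page.locator('[data-testid="toast"]')).toBeVisible({ timeout: 1000 });

// ✅ More robust: rely on the default timeout and assert on observable state
await expect(page.locator('[data-testid="toast"]')).toBeVisible();
await expect(page.locator('[data-testid="toast"]')).toHaveText('Saved');
```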
|
||||
**Knowledge**: test-quality.md, network-first.md, ci-burn-in.md
|
||||
|
||||
---
|
||||
|
||||
## Quality Scoring
|
||||
|
||||
### Score Calculation
|
||||
|
||||
```
|
||||
Starting Score: 100
|
||||
|
||||
Deductions:
|
||||
- Critical Violations (P0): -10 points each
|
||||
- High Violations (P1): -5 points each
|
||||
- Medium Violations (P2): -2 points each
|
||||
- Low Violations (P3): -1 point each
|
||||
|
||||
Bonus Points (max +30):
|
||||
+ Excellent BDD structure: +5
|
||||
+ Comprehensive fixtures: +5
|
||||
+ Comprehensive data factories: +5
|
||||
+ Network-first pattern consistently used: +5
|
||||
+ Perfect isolation (all tests clean up): +5
|
||||
+ All test IDs present and correct: +5
|
||||
|
||||
Final Score: max(0, min(100, Starting Score - Violations + Bonus))
|
||||
```
|
||||
|
||||
### Quality Grades
|
||||
|
||||
- **90-100** (A+): Excellent - Production-ready, best practices followed
|
||||
- **80-89** (A): Good - Minor improvements recommended
|
||||
- **70-79** (B): Acceptable - Some issues to address
|
||||
- **60-69** (C): Needs Improvement - Several issues detected
|
||||
- **<60** (F): Critical Issues - Significant problems, not production-ready
|
||||
|
||||
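To make the formula concrete, here is a minimal TypeScript sketch of the deduction-and-bonus calculation; the `Violation` shape and the way bonuses are counted are illustrative, not the workflow's actual data model:

```typescript
type Severity = 'P0' | 'P1' | 'P2' | 'P3';

interface Violation {
  severity: Severity;
  criterion: string;
  location: string; // file:line
}

// Point deductions per severity, as defined in the scoring rules above
const DEDUCTIONS: Record<Severity, number> = { P0: 10, P1: 5, P2: 2, P3: 1 };

function calculateQualityScore(violations: Violation[], bonusesEarned: number): number {
  const deducted = violations.reduce((sum, v) => sum + DEDUCTIONS[v.severity], 0);
  const bonus = Math.min(30, bonusesEarned * 5); // each bonus is +5, capped at +30
  return Math.max(0, Math.min(100, 100 - deducted + bonus));
}

function gradeFor(score: number): string {
  if (score >= 90) return 'A+';
  if (score >= 80) return 'A';
  if (score >= 70) return 'B';
  if (score >= 60) return 'C';
  return 'F';
}
```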
---
|
||||
|
||||
## Example Scenarios
|
||||
|
||||
### Scenario 1: Excellent Quality (Score: 95)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: checkout-flow.spec.ts
|
||||
|
||||
**Quality Score**: 95/100 (A+ - Excellent)
|
||||
**Recommendation**: Approve - Production Ready
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Excellent test quality with comprehensive coverage and best practices throughout.
|
||||
Tests demonstrate expert-level patterns including fixture architecture, data
|
||||
factories, network-first approach, and perfect isolation.
|
||||
|
||||
**Strengths:**
|
||||
✅ Clear Given-When-Then structure in all tests
|
||||
✅ Comprehensive fixtures for authenticated states
|
||||
✅ Data factories with faker.js for realistic test data
|
||||
✅ Network-first pattern prevents race conditions
|
||||
✅ Perfect test isolation with cleanup
|
||||
✅ All test IDs present (1.2-E2E-001 through 1.2-E2E-005)
|
||||
|
||||
**Minor Recommendations:**
|
||||
⚠️ One test slightly verbose (245 lines) - consider extracting helper function
|
||||
|
||||
**Recommendation**: Approve without changes. Use as reference for other tests.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 2: Good Quality (Score: 82)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: user-profile.spec.ts
|
||||
|
||||
**Quality Score**: 82/100 (A - Good)
|
||||
**Recommendation**: Approve with Comments
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Solid test quality with good structure and coverage. A few improvements would
|
||||
enhance maintainability and reduce flakiness risk.
|
||||
|
||||
**Strengths:**
|
||||
✅ Good BDD structure
|
||||
✅ Test IDs present
|
||||
✅ Explicit assertions
|
||||
|
||||
**Issues to Address:**
|
||||
⚠️ 2 hard waits detected (lines 34, 67) - use explicit waits instead
|
||||
⚠️ Hardcoded test data (line 23) - use factory functions
|
||||
⚠️ Missing cleanup in one test (line 89) - add afterEach hook
|
||||
|
||||
**Recommendation**: Address hard waits before merging. Other improvements
|
||||
can be addressed in follow-up PR.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 3: Needs Improvement (Score: 68)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: legacy-report.spec.ts
|
||||
|
||||
**Quality Score**: 68/100 (C - Needs Improvement)
|
||||
**Recommendation**: Request Changes
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Test has several quality issues that should be addressed before merging.
|
||||
Primarily concerns around flakiness risk and maintainability.
|
||||
|
||||
**Critical Issues:**
|
||||
❌ 5 hard waits detected (flakiness risk)
|
||||
❌ Race condition: navigation before route interception (line 45)
|
||||
❌ Shared global state between tests (line 12)
|
||||
❌ Missing test IDs (can't trace to requirements)
|
||||
|
||||
**Recommendations:**
|
||||
⚠️ Test file is 487 lines - consider splitting
|
||||
⚠️ Hardcoded data throughout - use factories
|
||||
⚠️ Missing cleanup in afterEach
|
||||
|
||||
**Recommendation**: Address all critical issues (❌) before re-review.
|
||||
Significant refactoring needed.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 4: Critical Issues (Score: 42)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: data-export.spec.ts
|
||||
|
||||
**Quality Score**: 42/100 (F - Critical Issues)
|
||||
**Recommendation**: Block - Not Production Ready
|
||||
|
||||
## Executive Summary
|
||||
|
||||
CRITICAL: Test has severe quality issues that make it unsuitable for
|
||||
production. Significant refactoring required.
|
||||
|
||||
**Critical Issues:**
|
||||
❌ 12 hard waits (page.waitForTimeout) throughout
|
||||
❌ No test IDs or structure
|
||||
❌ Try/catch blocks swallowing errors (lines 23, 45, 67, 89)
|
||||
❌ No cleanup - tests leave data in database
|
||||
❌ Conditional logic (if/else) throughout tests
|
||||
❌ No assertions in 3 tests (tests do nothing!)
|
||||
❌ 687 lines - far too large
|
||||
❌ Multiple race conditions
|
||||
❌ Hardcoded credentials in plain text (SECURITY ISSUE)
|
||||
|
||||
**Recommendation**: BLOCK MERGE. Complete rewrite recommended following
|
||||
TEA knowledge base patterns. Suggest pairing session with QA engineer.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Test Review
|
||||
|
||||
1. **atdd** - Generates acceptance tests → TEA reviews for quality
|
||||
2. **dev story** - Developer implements tests → TEA provides feedback
|
||||
3. **automate** - Expands regression suite → TEA validates new tests
|
||||
|
||||
### After Test Review
|
||||
|
||||
1. **Developer** - Addresses critical issues, improves based on recommendations
|
||||
2. **gate** - Test quality feeds into release decision (high-quality tests increase confidence)
|
||||
|
||||
### Coordinates With
|
||||
|
||||
- **Story File**: Review links to acceptance criteria for context
|
||||
- **Test Design**: Review validates tests align with P0/P1/P2/P3 prioritization
|
||||
- **Knowledge Base**: All feedback references tea-index.csv fragments
|
||||
|
||||
---
|
||||
|
||||
## Review Scopes
|
||||
|
||||
### Single File Review
|
||||
|
||||
```bash
|
||||
# Review specific test file
|
||||
bmad tea *test-review
|
||||
# Provide test_file_path when prompted: tests/auth/login.spec.ts
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Reviewing tests just written
|
||||
- PR review of specific test file
|
||||
- Debugging flaky test
|
||||
- Learning test quality patterns
|
||||
|
||||
---
|
||||
|
||||
### Directory Review
|
||||
|
||||
```bash
|
||||
# Review all tests in directory
|
||||
bmad tea *test-review
|
||||
# Provide review_scope: directory
|
||||
# Provide test_dir: tests/auth/
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Feature branch has multiple test files
|
||||
- Reviewing entire feature test suite
|
||||
- Auditing test quality for module
|
||||
|
||||
---
|
||||
|
||||
### Suite Review
|
||||
|
||||
```bash
|
||||
# Review entire test suite
|
||||
bmad tea *test-review
|
||||
# Provide review_scope: suite
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Periodic quality audit (monthly/quarterly)
|
||||
- Before major release
|
||||
- Identifying patterns across codebase
|
||||
- Establishing quality baseline
|
||||
|
||||
---
|
||||
|
||||
## Configuration Examples
|
||||
|
||||
### Strict Review (Fail on Violations)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
quality_score_enabled: true
|
||||
strict_mode: true # Fail if score <70
|
||||
check_against_knowledge: true
|
||||
# All check_* flags: true
|
||||
```
|
||||
|
||||
Use for: PR gates, production releases
|
||||
|
||||
---
|
||||
|
||||
### Balanced Review (Advisory)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
quality_score_enabled: true
|
||||
strict_mode: false # Advisory only
|
||||
check_against_knowledge: true
|
||||
# All check_* flags: true
|
||||
```
|
||||
|
||||
Use for: Most development workflows (default)
|
||||
|
||||
---
|
||||
|
||||
### Focused Review (Specific Criteria)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
check_hard_waits: true
|
||||
check_flakiness_patterns: true
|
||||
check_network_first: true
|
||||
# Other checks: false
|
||||
```
|
||||
|
||||
Use for: Debugging flaky tests, targeted improvements
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
|
||||
2. **Context Matters**: Some violations may be justified (document with comments)
|
||||
3. **Knowledge-Based**: All feedback grounded in proven patterns
|
||||
4. **Actionable**: Every issue includes recommended fix with code example
|
||||
5. **Quality Score**: Use as indicator, not absolute measure
|
||||
6. **Continuous Improvement**: Review tests periodically as patterns evolve
|
||||
7. **Learning Tool**: Use reviews to learn best practices, not just find bugs
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **test-quality.md** - Definition of Done (no hard waits, <300 lines, <1.5 min, self-cleaning)
|
||||
- **fixture-architecture.md** - Pure function → Fixture → mergeTests pattern
|
||||
- **network-first.md** - Route intercept before navigate (race condition prevention)
|
||||
- **data-factories.md** - Factory functions with overrides, API-first setup
|
||||
- **test-levels-framework.md** - E2E vs API vs Component vs Unit appropriateness
|
||||
- **playwright-config.md** - Environment-based configuration patterns
|
||||
- **tdd-cycles.md** - Red-Green-Refactor patterns
|
||||
- **selective-testing.md** - Duplicate coverage detection
|
||||
- **ci-burn-in.md** - Flakiness detection patterns
|
||||
- **test-priorities.md** - P0/P1/P2/P3 classification framework
|
||||
- **traceability.md** - Requirements-to-tests mapping
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Problem: Quality score seems too low
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Review violation breakdown - focus on critical issues first
|
||||
- Consider project context - some patterns may be justified
|
||||
- Check if criteria are appropriate for project type
|
||||
- Score is indicator, not absolute - focus on actionable feedback
|
||||
|
||||
---
|
||||
|
||||
### Problem: No test files found
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Verify test_dir path is correct
|
||||
- Check test file extensions (*.spec.ts, *.test.js, etc.)
|
||||
- Use glob pattern to discover: `tests/**/*.spec.ts`
|
||||
|
||||
---
|
||||
|
||||
### Problem: Knowledge fragments not loading
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Verify tea-index.csv exists in testarch/ directory
|
||||
- Check fragment file paths are correct in tea-index.csv
|
||||
- Ensure auto_load_knowledge: true in workflow variables
|
||||
|
||||
---
|
||||
|
||||
### Problem: Too many false positives
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Add justification comments in code for legitimate violations
|
||||
- Adjust check_* flags to disable specific criteria
|
||||
- Use strict_mode: false for advisory-only feedback
|
||||
- Context matters - document why pattern is appropriate
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *atdd` - Generate acceptance tests (review after generation)
|
||||
- `bmad tea *automate` - Expand regression suite (review new tests)
|
||||
- `bmad tea *gate` - Quality gate decision (test quality feeds into decision)
|
||||
- `bmad dev story` - Implement story (review tests after implementation)
|
||||
470 src/modules/bmm/workflows/testarch/test-review/checklist.md Normal file
@@ -0,0 +1,470 @@
|
||||
# Test Quality Review - Validation Checklist
|
||||
|
||||
Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Test File Discovery
|
||||
|
||||
- [ ] Test file(s) identified for review (single/directory/suite scope)
|
||||
- [ ] Test files exist and are readable
|
||||
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
|
||||
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)
|
||||
|
||||
### Knowledge Base Loading
|
||||
|
||||
- [ ] tea-index.csv loaded successfully
|
||||
- [ ] `test-quality.md` loaded (Definition of Done)
|
||||
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
|
||||
- [ ] `network-first.md` loaded (Route intercept before navigate)
|
||||
- [ ] `data-factories.md` loaded (Factory patterns)
|
||||
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
|
||||
- [ ] All other enabled fragments loaded successfully
|
||||
|
||||
### Context Gathering
|
||||
|
||||
- [ ] Story file discovered or explicitly provided (if available)
|
||||
- [ ] Test design document discovered or explicitly provided (if available)
|
||||
- [ ] Acceptance criteria extracted from story (if available)
|
||||
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] Review scope determined (single/directory/suite)
|
||||
- [ ] Test file paths collected
|
||||
- [ ] Related artifacts discovered (story, test-design)
|
||||
- [ ] Knowledge base fragments loaded successfully
|
||||
- [ ] Quality criteria flags read from workflow variables
|
||||
|
||||
### Step 2: Test File Parsing
|
||||
|
||||
**For Each Test File:**
|
||||
|
||||
- [ ] File read successfully
|
||||
- [ ] File size measured (lines, KB)
|
||||
- [ ] File structure parsed (describe blocks, it blocks)
|
||||
- [ ] Test IDs extracted (if present)
|
||||
- [ ] Priority markers extracted (if present)
|
||||
- [ ] Imports analyzed
|
||||
- [ ] Dependencies identified
|
||||
|
||||
**Test Structure Analysis:**
|
||||
|
||||
- [ ] Describe block count calculated
|
||||
- [ ] It/test block count calculated
|
||||
- [ ] BDD structure identified (Given-When-Then)
|
||||
- [ ] Fixture usage detected
|
||||
- [ ] Data factory usage detected
|
||||
- [ ] Network interception patterns identified
|
||||
- [ ] Assertions counted
|
||||
- [ ] Waits and timeouts cataloged
|
||||
- [ ] Conditionals (if/else) detected
|
||||
- [ ] Try/catch blocks detected
|
||||
- [ ] Shared state or globals detected
|
||||
|
||||
### Step 3: Quality Criteria Validation
|
||||
|
||||
**For Each Enabled Criterion:**
|
||||
|
||||
#### BDD Format (if `check_given_when_then: true`)
|
||||
|
||||
- [ ] Given-When-Then structure evaluated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with line numbers
|
||||
- [ ] Examples of good/bad patterns noted
|
||||
|
||||
#### Test IDs (if `check_test_ids: true`)
|
||||
|
||||
- [ ] Test ID presence validated
|
||||
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Missing IDs cataloged
|
||||
|
||||
#### Priority Markers (if `check_priority_markers: true`)
|
||||
|
||||
- [ ] P0/P1/P2/P3 classification validated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Missing priorities cataloged
|
||||
|
||||
#### Hard Waits (if `check_hard_waits: true`)
|
||||
|
||||
- [ ] sleep(), waitForTimeout(), hardcoded delays detected
|
||||
- [ ] Justification comments checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with line numbers and recommended fixes
|
||||
|
||||
#### Determinism (if `check_determinism: true`)
|
||||
|
||||
- [ ] Conditionals (if/else/switch) detected
|
||||
- [ ] Try/catch abuse detected
|
||||
- [ ] Random values (Math.random, Date.now) detected
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Isolation (if `check_isolation: true`)
|
||||
|
||||
- [ ] Cleanup hooks (afterEach/afterAll) validated
|
||||
- [ ] Shared state detected
|
||||
- [ ] Global variable mutations detected
|
||||
- [ ] Resource cleanup verified
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Fixture Patterns (if `check_fixture_patterns: true`)
|
||||
|
||||
- [ ] Fixtures detected (test.extend)
|
||||
- [ ] Pure functions validated
|
||||
- [ ] mergeTests usage checked
|
||||
- [ ] beforeEach complexity analyzed
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Data Factories (if `check_data_factories: true`)
|
||||
|
||||
- [ ] Factory functions detected
|
||||
- [ ] Hardcoded data (magic strings/numbers) detected
|
||||
- [ ] Faker.js or similar usage validated
|
||||
- [ ] API-first setup pattern checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Network-First (if `check_network_first: true`)
|
||||
|
||||
- [ ] page.route() before page.goto() validated
|
||||
- [ ] Race conditions detected (route after navigate)
|
||||
- [ ] waitForResponse patterns checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Assertions (if `check_assertions: true`)
|
||||
|
||||
- [ ] Explicit assertions counted
|
||||
- [ ] Implicit waits without assertions detected
|
||||
- [ ] Assertion specificity validated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Test Length (if `check_test_length: true`)
|
||||
|
||||
- [ ] File line count calculated
|
||||
- [ ] Threshold comparison (≤300 lines ideal)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Splitting recommendations generated (if >300 lines)
|
||||
|
||||
#### Test Duration (if `check_test_duration: true`)
|
||||
|
||||
- [ ] Test complexity analyzed (as proxy for duration if no execution data)
|
||||
- [ ] Threshold comparison (≤1.5 min target)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Optimization recommendations generated
|
||||
|
||||
#### Flakiness Patterns (if `check_flakiness_patterns: true`)
|
||||
|
||||
- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
|
||||
- [ ] Race conditions detected
|
||||
- [ ] Timing-dependent assertions detected
|
||||
- [ ] Retry logic detected
|
||||
- [ ] Environment-dependent assumptions detected
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Quality Score Calculation
|
||||
|
||||
**Violation Counting:**
|
||||
|
||||
- [ ] Critical (P0) violations counted
|
||||
- [ ] High (P1) violations counted
|
||||
- [ ] Medium (P2) violations counted
|
||||
- [ ] Low (P3) violations counted
|
||||
- [ ] Violation breakdown by criterion recorded
|
||||
|
||||
**Score Calculation:**
|
||||
|
||||
- [ ] Starting score: 100
|
||||
- [ ] Critical violations deducted (-10 each)
|
||||
- [ ] High violations deducted (-5 each)
|
||||
- [ ] Medium violations deducted (-2 each)
|
||||
- [ ] Low violations deducted (-1 each)
|
||||
- [ ] Bonus points added (max +30):
|
||||
- [ ] Excellent BDD structure (+5 if applicable)
|
||||
- [ ] Comprehensive fixtures (+5 if applicable)
|
||||
- [ ] Comprehensive data factories (+5 if applicable)
|
||||
- [ ] Network-first pattern (+5 if applicable)
|
||||
- [ ] Perfect isolation (+5 if applicable)
|
||||
- [ ] All test IDs present (+5 if applicable)
|
||||
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))
|
||||
|
||||
**Quality Grade:**
|
||||
|
||||
- [ ] Grade assigned based on score:
|
||||
- 90-100: A+ (Excellent)
|
||||
- 80-89: A (Good)
|
||||
- 70-79: B (Acceptable)
|
||||
- 60-69: C (Needs Improvement)
|
||||
- <60: F (Critical Issues)
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Review Report Generation
|
||||
|
||||
**Report Sections Created:**
|
||||
|
||||
- [ ] **Header Section**:
|
||||
- [ ] Test file(s) reviewed listed
|
||||
- [ ] Review date recorded
|
||||
- [ ] Review scope noted (single/directory/suite)
|
||||
- [ ] Quality score and grade displayed
|
||||
|
||||
- [ ] **Executive Summary**:
|
||||
- [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
|
||||
- [ ] Key strengths listed (3-5 bullet points)
|
||||
- [ ] Key weaknesses listed (3-5 bullet points)
|
||||
- [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
|
||||
|
||||
- [ ] **Quality Criteria Assessment**:
|
||||
- [ ] Table with all criteria evaluated
|
||||
- [ ] Status for each criterion (PASS/WARN/FAIL)
|
||||
- [ ] Violation count per criterion
|
||||
|
||||
- [ ] **Critical Issues (Must Fix)**:
|
||||
- [ ] P0/P1 violations listed
|
||||
- [ ] Code location provided for each (file:line)
|
||||
- [ ] Issue explanation clear
|
||||
- [ ] Recommended fix provided with code example
|
||||
- [ ] Knowledge base reference provided
|
||||
|
||||
- [ ] **Recommendations (Should Fix)**:
|
||||
- [ ] P2/P3 violations listed
|
||||
- [ ] Code location provided for each (file:line)
|
||||
- [ ] Issue explanation clear
|
||||
- [ ] Recommended improvement provided with code example
|
||||
- [ ] Knowledge base reference provided
|
||||
|
||||
- [ ] **Best Practices Examples** (if good patterns found):
|
||||
- [ ] Good patterns highlighted from tests
|
||||
- [ ] Knowledge base fragments referenced
|
||||
- [ ] Examples provided for others to follow
|
||||
|
||||
- [ ] **Knowledge Base References**:
|
||||
- [ ] All fragments consulted listed
|
||||
- [ ] Links to detailed guidance provided
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Optional Outputs Generation
|
||||
|
||||
**Inline Comments** (if `generate_inline_comments: true`):
|
||||
|
||||
- [ ] Inline comments generated at violation locations
|
||||
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
|
||||
- [ ] Comments added to test files (no logic changes)
|
||||
- [ ] Test files remain valid and executable
|
||||
|
||||
**Quality Badge** (if `generate_quality_badge: true`):
|
||||
|
||||
- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
|
||||
- [ ] Badge format suitable for README or documentation
|
||||
- [ ] Badge saved to output folder
|
||||
|
||||
**Story Update** (if `append_to_story: true` and story file exists):
|
||||
|
||||
- [ ] "Test Quality Review" section created
|
||||
- [ ] Quality score included
|
||||
- [ ] Critical issues summarized
|
||||
- [ ] Link to full review report provided
|
||||
- [ ] Story file updated successfully
|
||||
|
||||
---
|
||||
|
||||
### Step 7: Save and Notify
|
||||
|
||||
**Outputs Saved:**
|
||||
|
||||
- [ ] Review report saved to `{output_file}`
|
||||
- [ ] Inline comments written to test files (if enabled)
|
||||
- [ ] Quality badge saved (if enabled)
|
||||
- [ ] Story file updated (if enabled)
|
||||
- [ ] All outputs are valid and readable
|
||||
|
||||
**Summary Message Generated:**
|
||||
|
||||
- [ ] Quality score and grade included
|
||||
- [ ] Critical issue count stated
|
||||
- [ ] Recommendation provided (Approve/Request changes/Block)
|
||||
- [ ] Next steps clarified
|
||||
- [ ] Message displayed to user
|
||||
|
||||
---
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Review Report Completeness
|
||||
|
||||
- [ ] All required sections present
|
||||
- [ ] No placeholder text or TODOs in report
|
||||
- [ ] All code locations are accurate (file:line)
|
||||
- [ ] All code examples are valid and demonstrate fix
|
||||
- [ ] All knowledge base references are correct
|
||||
|
||||
### Review Report Accuracy
|
||||
|
||||
- [ ] Quality score matches violation breakdown
|
||||
- [ ] Grade matches score range
|
||||
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
|
||||
- [ ] Violations correctly attributed to quality criteria
|
||||
- [ ] No false positives (violations are legitimate issues)
|
||||
- [ ] No false negatives (critical issues not missed)
|
||||
|
||||
### Review Report Clarity
|
||||
|
||||
- [ ] Executive summary is clear and actionable
|
||||
- [ ] Issue explanations are understandable
|
||||
- [ ] Recommended fixes are implementable
|
||||
- [ ] Code examples are correct and runnable
|
||||
- [ ] Recommendation (Approve/Request changes) is clear
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Knowledge-Based Validation
|
||||
|
||||
- [ ] All feedback grounded in knowledge base fragments
|
||||
- [ ] Recommendations follow proven patterns
|
||||
- [ ] No arbitrary or opinion-based feedback
|
||||
- [ ] Knowledge fragment references accurate and relevant
|
||||
|
||||
### Actionable Feedback
|
||||
|
||||
- [ ] Every issue includes recommended fix
|
||||
- [ ] Every fix includes code example
|
||||
- [ ] Code examples demonstrate correct pattern
|
||||
- [ ] Fixes reference knowledge base for more detail
|
||||
|
||||
### Severity Classification
|
||||
|
||||
- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
|
||||
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
|
||||
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
|
||||
- [ ] Low (P3) issues are minor style/preference (verbose tests)
|
||||
|
||||
### Context Awareness
|
||||
|
||||
- [ ] Review considers project context (some patterns may be justified)
|
||||
- [ ] Violations with justification comments noted as acceptable
|
||||
- [ ] Edge cases acknowledged
|
||||
- [ ] Recommendations are pragmatic, not dogmatic
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Story File Integration
|
||||
|
||||
- [ ] Story file discovered correctly (if available)
|
||||
- [ ] Acceptance criteria extracted and used for context
|
||||
- [ ] Test quality section appended to story (if enabled)
|
||||
- [ ] Link to review report added to story
|
||||
|
||||
### Test Design Integration
|
||||
|
||||
- [ ] Test design document discovered correctly (if available)
|
||||
- [ ] Priority context (P0/P1/P2/P3) extracted and used
|
||||
- [ ] Review validates tests align with prioritization
|
||||
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] tea-index.csv loaded successfully
|
||||
- [ ] All required fragments loaded
|
||||
- [ ] Fragments applied correctly to validation
|
||||
- [ ] Fragment references in report are accurate
|
||||
|
||||
---
|
||||
|
||||
## Edge Cases and Special Situations
|
||||
|
||||
### Empty or Minimal Tests
|
||||
|
||||
- [ ] If test file is empty, report notes "No tests found"
|
||||
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
|
||||
- [ ] Score reflects lack of content appropriately
|
||||
|
||||
### Legacy Tests
|
||||
|
||||
- [ ] Legacy tests acknowledged in context
|
||||
- [ ] Review provides practical recommendations for improvement
|
||||
- [ ] Recognizes that complete refactor may not be feasible
|
||||
- [ ] Prioritizes critical issues (flakiness) over style
|
||||
|
||||
### Test Framework Variations
|
||||
|
||||
- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
|
||||
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
|
||||
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
|
||||
- [ ] Knowledge fragments applied appropriately for framework
|
||||
|
||||
### Justified Violations
|
||||
|
||||
- [ ] Violations with justification comments in code noted as acceptable
|
||||
- [ ] Justifications evaluated for legitimacy
|
||||
- [ ] Report acknowledges justified patterns
|
||||
- [ ] Score not penalized for justified violations
|
||||
|
||||
---
|
||||
|
||||
## Final Validation
|
||||
|
||||
### Review Completeness
|
||||
|
||||
- [ ] All enabled quality criteria evaluated
|
||||
- [ ] All test files in scope reviewed
|
||||
- [ ] All violations cataloged
|
||||
- [ ] All recommendations provided
|
||||
- [ ] Review report is comprehensive
|
||||
|
||||
### Review Accuracy
|
||||
|
||||
- [ ] Quality score is accurate
|
||||
- [ ] Violations are correct (no false positives)
|
||||
- [ ] Critical issues not missed (no false negatives)
|
||||
- [ ] Code locations are correct
|
||||
- [ ] Knowledge base references are accurate
|
||||
|
||||
### Review Usefulness
|
||||
|
||||
- [ ] Feedback is actionable
|
||||
- [ ] Recommendations are implementable
|
||||
- [ ] Code examples are correct
|
||||
- [ ] Review helps developer improve tests
|
||||
- [ ] Review educates on best practices
|
||||
|
||||
### Workflow Complete
|
||||
|
||||
- [ ] All checklist items completed
|
||||
- [ ] All outputs validated and saved
|
||||
- [ ] User notified with summary
|
||||
- [ ] Review ready for developer consumption
|
||||
- [ ] Follow-up actions identified (if any)
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
Record any issues, observations, or important context during workflow execution:
|
||||
|
||||
- **Test Framework**: [Playwright, Jest, Cypress, etc.]
|
||||
- **Review Scope**: [single file, directory, full suite]
|
||||
- **Quality Score**: [0-100 score, letter grade]
|
||||
- **Critical Issues**: [Count of P0/P1 violations]
|
||||
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
|
||||
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
|
||||
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]
|
||||
608 src/modules/bmm/workflows/testarch/test-review/instructions.md Normal file
@@ -0,0 +1,608 @@
|
||||
# Test Quality Review - Instructions v4.0
|
||||
|
||||
**Workflow:** `testarch-test-review`
|
||||
**Purpose:** Review test quality using TEA's comprehensive knowledge base and validate against best practices for maintainability, determinism, isolation, and flakiness prevention
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Format:** Pure Markdown v4.0 (no XML blocks)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow performs comprehensive test quality reviews using TEA's knowledge base of best practices. It validates tests against proven patterns for fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. The review generates actionable feedback with quality scoring.
|
||||
|
||||
**Key Capabilities:**
|
||||
|
||||
- **Knowledge-Based Review**: Applies patterns from tea-index.csv fragments
|
||||
- **Quality Scoring**: 0-100 score based on violations and best practices
|
||||
- **Multi-Scope**: Review single file, directory, or entire test suite
|
||||
- **Pattern Detection**: Identifies flaky patterns, hard waits, race conditions
|
||||
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions
|
||||
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
|
||||
- **Integration**: Works with story files, test-design, acceptance criteria
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Test file(s) to review (auto-discovered or explicitly provided)
|
||||
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- Story file with acceptance criteria (for context)
|
||||
- Test design document (for priority context)
|
||||
- Knowledge base fragments available in tea-index.csv
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- If test file path is invalid or file doesn't exist, halt and request correction
|
||||
- If test_dir is empty (no tests found), halt and notify user
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### Step 1: Load Context and Knowledge Base
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
|
||||
- `test-quality.md` - Definition of Done (deterministic tests, isolated with cleanup, explicit assertions, <300 lines, <1.5 min, 658 lines, 5 examples)
|
||||
- `fixture-architecture.md` - Pure function → Fixture → mergeTests composition with auto-cleanup (406 lines, 5 examples)
|
||||
- `network-first.md` - Route intercept before navigate to prevent race conditions (intercept before navigate, HAR capture, deterministic waiting, 489 lines, 5 examples)
|
||||
- `data-factories.md` - Factory functions with faker: overrides, nested factories, API-first setup (498 lines, 5 examples)
|
||||
- `test-levels-framework.md` - E2E vs API vs Component vs Unit appropriateness with decision matrix (467 lines, 4 examples)
|
||||
- `playwright-config.md` - Environment-based configuration with fail-fast validation (722 lines, 5 examples)
|
||||
- `component-tdd.md` - Red-Green-Refactor patterns with provider isolation, accessibility, visual regression (480 lines, 4 examples)
|
||||
- `selective-testing.md` - Duplicate coverage detection with tag-based, spec filter, diff-based selection (727 lines, 4 examples)
|
||||
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
|
||||
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, anti-patterns, 541 lines, 4 examples)
|
||||
- `timing-debugging.md` - Race condition prevention and async debugging techniques (370 lines, 3 examples)
|
||||
- `ci-burn-in.md` - Flaky test detection with 10-iteration burn-in loop (678 lines, 4 examples)
|
||||
|
||||
2. Determine review scope:
|
||||
- **single**: Review one test file (`test_file_path` provided)
|
||||
- **directory**: Review all tests in directory (`test_dir` provided)
|
||||
- **suite**: Review entire test suite (discover all test files)
|
||||
|
||||
3. Auto-discover related artifacts (if `auto_discover_story: true`):
|
||||
- Extract test ID from filename (e.g., `1.3-E2E-001.spec.ts` → story 1.3)
|
||||
- Search for story file (`story-1.3.md`)
|
||||
- Search for test design (`test-design-story-1.3.md` or `test-design-epic-1.md`)
|
||||
|
||||
4. Read story file for context (if available):
|
||||
- Extract acceptance criteria
|
||||
- Extract priority classification
|
||||
- Extract expected test IDs
|
||||
|
||||
**Output:** Complete knowledge base loaded, review scope determined, context gathered
|
||||
|
||||
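As a rough illustration of the auto-discovery in sub-step 3, deriving the story file from a test ID in the filename could look like this; the regex and naming convention are assumptions based on the examples above, not a fixed contract:

```typescript
// Hypothetical sketch: "1.3-E2E-001.spec.ts" -> story "1.3" -> "story-1.3.md"
function storyFileForTest(testFileName: string): string | undefined {
  const match = testFileName.match(/^(\d+\.\d+)-(E2E|API|INT|UNIT)-\d+/i);
  return match ? `story-${match[1]}.md` : undefined;
}

// storyFileForTest('1.3-E2E-001.spec.ts') === 'story-1.3.md'
```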
---
|
||||
|
||||
### Step 2: Discover and Parse Test Files
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Discover test files** based on scope:
|
||||
- **single**: Use `test_file_path` variable
|
||||
- **directory**: Use `glob` to find all test files in `test_dir` (e.g., `*.spec.ts`, `*.test.js`)
|
||||
- **suite**: Use `glob` to find all test files recursively from project root
|
||||
|
||||
2. **Parse test file metadata**:
|
||||
- File path and name
|
||||
- File size (warn if >15 KB or >300 lines)
|
||||
- Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
|
||||
- Imports and dependencies
|
||||
- Test structure (describe/context/it blocks)
|
||||
|
||||
3. **Extract test structure**:
|
||||
- Count of describe blocks (test suites)
|
||||
- Count of it/test blocks (individual tests)
|
||||
- Test IDs (if present, e.g., `test.describe('1.3-E2E-001')`)
|
||||
- Priority markers (if present, e.g., `test.describe.only` for P0)
|
||||
- BDD structure (Given-When-Then comments or steps)
|
||||
|
||||
4. **Identify test patterns**:
|
||||
- Fixtures used
|
||||
- Data factories used
|
||||
- Network interception patterns
|
||||
- Assertions used (expect, assert, toHaveText, etc.)
|
||||
- Waits and timeouts (page.waitFor, sleep, hardcoded delays)
|
||||
- Conditionals (if/else, switch, ternary)
|
||||
- Try/catch blocks
|
||||
- Shared state or globals
|
||||
|
||||
**Output:** Complete test file inventory with structure and pattern analysis
|
||||
|
||||
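The pattern identification in sub-step 4 can be approximated with simple source scanning. The following is only a heuristic sketch, not the workflow's actual parser:

```typescript
interface TestFileStats {
  describeBlocks: number;
  testBlocks: number;
  hardWaits: number;
  conditionals: number;
  assertions: number;
}

function analyzeTestSource(source: string): TestFileStats {
  // Count occurrences of a pattern in the file source
  const count = (re: RegExp) => (source.match(re) ?? []).length;
  return {
    describeBlocks: count(/\bdescribe(\.only|\.skip)?\s*\(/g),
    testBlocks: count(/\b(it|test)(\.only|\.skip)?\s*\(/g),
    hardWaits: count(/waitForTimeout\s*\(|\bsleep\s*\(/g),
    conditionals: count(/\bif\s*\(|\bswitch\s*\(/g),
    assertions: count(/\bexpect\s*\(/g),
  };
}
```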
---
|
||||
|
||||
### Step 3: Validate Against Quality Criteria
|
||||
|
||||
**Actions:**
|
||||
|
||||
For each test file, validate against quality criteria (configurable via workflow variables):
|
||||
|
||||
#### 1. BDD Format Validation (if `check_given_when_then: true`)
|
||||
|
||||
- ✅ **PASS**: Tests use Given-When-Then structure (comments or step organization)
|
||||
- ⚠️ **WARN**: Tests have some structure but not explicit GWT
|
||||
- ❌ **FAIL**: Tests lack clear structure, hard to understand intent
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, tdd-cycles.md
|
||||
|
||||
---
|
||||
|
||||
#### 2. Test ID Conventions (if `check_test_ids: true`)
|
||||
|
||||
- ✅ **PASS**: Test IDs present and follow convention (e.g., `1.3-E2E-001`, `2.1-API-005`)
|
||||
- ⚠️ **WARN**: Some test IDs missing or inconsistent
|
||||
- ❌ **FAIL**: No test IDs, can't trace tests to requirements
|
||||
|
||||
**Knowledge Fragment**: traceability.md, test-quality.md
|
||||
|
||||
---
|
||||
|
||||
#### 3. Priority Markers (if `check_priority_markers: true`)
|
||||
|
||||
- ✅ **PASS**: Tests classified as P0/P1/P2/P3 (via markers or test-design reference)
|
||||
- ⚠️ **WARN**: Some priority classifications missing
|
||||
- ❌ **FAIL**: No priority classification, can't determine criticality
|
||||
|
||||
**Knowledge Fragment**: test-priorities.md, risk-governance.md
|
||||
|
||||
---
|
||||
|
||||
#### 4. Hard Waits Detection (if `check_hard_waits: true`)
|
||||
|
||||
- ✅ **PASS**: No hard waits detected (no `sleep()`, `wait(5000)`, hardcoded delays)
|
||||
- ⚠️ **WARN**: Some hard waits used but with justification comments
|
||||
- ❌ **FAIL**: Hard waits detected without justification (flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- `sleep(1000)`, `setTimeout()`, `delay()`
|
||||
- `page.waitForTimeout(5000)` without explicit reason
|
||||
- `await new Promise(resolve => setTimeout(resolve, 3000))`
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, network-first.md
|
||||
|
||||
---
|
||||
|
||||
#### 5. Determinism Check (if `check_determinism: true`)
|
||||
|
||||
- ✅ **PASS**: Tests are deterministic (no conditionals, no try/catch abuse, no random values)
|
||||
- ⚠️ **WARN**: Some conditionals but with clear justification
|
||||
- ❌ **FAIL**: Tests use if/else, switch, or try/catch to control flow (flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- `if (condition) { test logic }` - tests should work deterministically
|
||||
- `try { test } catch { fallback }` - tests shouldn't swallow errors
|
||||
- `Math.random()`, `Date.now()` without factory abstraction
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, data-factories.md
|
||||
|
||||
---
|
||||
|
||||
#### 6. Isolation Validation (if `check_isolation: true`)
|
||||
|
||||
- ✅ **PASS**: Tests clean up resources, no shared state, can run in any order
|
||||
- ⚠️ **WARN**: Some cleanup missing but isolated enough
|
||||
- ❌ **FAIL**: Tests share state, depend on execution order, leave resources
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- afterEach/afterAll cleanup hooks present
|
||||
- No global variables mutated
|
||||
- Database/API state cleaned up after tests
|
||||
- Test data deleted or marked inactive
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, data-factories.md
|
||||
|
||||
---
|
||||
|
||||
#### 7. Fixture Patterns (if `check_fixture_patterns: true`)
|
||||
|
||||
- ✅ **PASS**: Uses pure function → Fixture → mergeTests pattern
|
||||
- ⚠️ **WARN**: Some fixtures used but not consistently
|
||||
- ❌ **FAIL**: No fixtures, tests repeat setup code (maintainability risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Fixtures defined (e.g., `test.extend({ customFixture: async ({}, use) => { ... }})`)
|
||||
- Pure functions used for fixture logic
|
||||
- mergeTests used to combine fixtures
|
||||
- No beforeEach with complex setup (should be in fixtures)
|
||||
|
||||
**Knowledge Fragment**: fixture-architecture.md
|
||||
|
||||
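Since the README's fixture example does not show `mergeTests`, here is a hedged sketch of composing two fixture sets with auto-cleanup; the helpers `fetchAuthToken`, `createTestUser`, and `deleteTestUser` are assumed and not part of this workflow:

```typescript
import { test as base, mergeTests } from '@playwright/test';

const authTest = base.extend<{ authToken: string }>({
  authToken: async ({}, use) => {
    const token = await fetchAuthToken(); // assumed helper
    await use(token);
  },
});

const dataTest = base.extend<{ testUser: { id: string } }>({
  testUser: async ({}, use) => {
    const user = await createTestUser(); // assumed factory
    await use(user);
    await deleteTestUser(user.id); // auto-cleanup after the test
  },
});

// Compose both fixture sets into a single test object
export const test = mergeTests(authTest, dataTest);
```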
---
|
||||
|
||||
#### 8. Data Factories (if `check_data_factories: true`)
|
||||
|
||||
- ✅ **PASS**: Uses factory functions with overrides, API-first setup
|
||||
- ⚠️ **WARN**: Some factories used but also hardcoded data
|
||||
- ❌ **FAIL**: Hardcoded test data, magic strings/numbers (maintainability risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Factory functions defined (e.g., `createUser()`, `generateInvoice()`)
|
||||
- Factories use faker.js or similar for realistic data
|
||||
- Factories accept overrides (e.g., `createUser({ email: 'custom@example.com' })`)
|
||||
- API-first setup (create via API, test via UI)
|
||||
|
||||
**Knowledge Fragment**: data-factories.md
|
||||
|
||||
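As a complement to the usage example in the README, a factory definition along these lines would satisfy the checks above (a minimal sketch assuming `@faker-js/faker`):

```typescript
import { faker } from '@faker-js/faker';

interface TestUser {
  email: string;
  name: string;
  role: 'admin' | 'member';
}

// Realistic defaults from faker; tests override only the fields they care about
export function createTestUser(overrides: Partial<TestUser> = {}): TestUser {
  return {
    email: faker.internet.email(),
    name: faker.person.fullName(),
    role: 'member',
    ...overrides,
  };
}

// Usage: const admin = createTestUser({ role: 'admin' });
```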
---
|
||||
|
||||
#### 9. Network-First Pattern (if `check_network_first: true`)
|
||||
|
||||
- ✅ **PASS**: Route interception set up BEFORE navigation (race condition prevention)
|
||||
- ⚠️ **WARN**: Some routes intercepted correctly, others after navigation
|
||||
- ❌ **FAIL**: Route interception after navigation (race condition risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- `page.route()` called before `page.goto()`
|
||||
- `page.waitForResponse()` used with explicit URL pattern
|
||||
- No navigation followed immediately by route setup
|
||||
|
||||
**Knowledge Fragment**: network-first.md
|
||||
|
||||
---
|
||||
|
||||
#### 10. Assertions (if `check_assertions: true`)
|
||||
|
||||
- ✅ **PASS**: Explicit assertions present (expect, assert, toHaveText)
|
||||
- ⚠️ **WARN**: Some tests rely on implicit waits instead of assertions
|
||||
- ❌ **FAIL**: Missing assertions, tests don't verify behavior
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Each test has at least one assertion
|
||||
- Assertions are specific (not just truthy checks)
|
||||
- Assertions use framework-provided matchers (toHaveText, toBeVisible)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
#### 11. Test Length (if `check_test_length: true`)
|
||||
|
||||
- ✅ **PASS**: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
|
||||
- ⚠️ **WARN**: Test file 301-500 lines (consider splitting)
|
||||
- ❌ **FAIL**: Test file >500 lines (too large, maintainability risk)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
#### 12. Test Duration (if `check_test_duration: true`)
|
||||
|
||||
- ✅ **PASS**: Individual tests ≤1.5 minutes (target: <30 seconds)
|
||||
- ⚠️ **WARN**: Some tests 1.5-3 minutes (consider optimization)
|
||||
- ❌ **FAIL**: Tests >3 minutes (too slow, impacts CI/CD)
|
||||
|
||||
**Note:** Duration estimation based on complexity analysis if execution data unavailable
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, selective-testing.md
|
||||
|
||||
---
|
||||
|
||||
#### 13. Flakiness Patterns (if `check_flakiness_patterns: true`)
|
||||
|
||||
- ✅ **PASS**: No known flaky patterns detected
|
||||
- ⚠️ **WARN**: Some potential flaky patterns (e.g., tight timeouts, race conditions)
|
||||
- ❌ **FAIL**: Multiple flaky patterns detected (high flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- Tight timeouts (e.g., `{ timeout: 1000 }`)
|
||||
- Race conditions (navigation before route interception)
|
||||
- Timing-dependent assertions (e.g., checking timestamps)
|
||||
- Retry logic in tests (hides flakiness)
|
||||
- Environment-dependent assumptions (hardcoded URLs, ports)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, network-first.md, ci-burn-in.md
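For example, one flaky pattern from the list above (an environment-dependent assumption) and a more robust alternative; URLs are illustrative and assume Playwright's `baseURL` is configured:

```typescript
import { test, expect } from '@playwright/test';

// ❌ Flaky: a hardcoded host/port only works on one machine or environment
// await page.goto('http://localhost:3000/login');

test('login page renders', async ({ page }) => {
  // ✅ A relative path resolves against the configured baseURL per environment
  await page.goto('/login');
  await expect(page.getByRole('heading', { name: 'Sign in' })).toBeVisible();
});
```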
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Calculate Quality Score
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Count violations** by severity:
|
||||
- **Critical (P0)**: Hard waits without justification, no assertions, race conditions, shared state
|
||||
- **High (P1)**: Missing test IDs, no BDD structure, hardcoded data, missing fixtures
|
||||
- **Medium (P2)**: Long test files (>300 lines), missing priorities, some conditionals
|
||||
- **Low (P3)**: Minor style issues, incomplete cleanup, verbose tests
|
||||
|
||||
2. **Calculate quality score** (if `quality_score_enabled: true`):
|
||||
|
||||
```
|
||||
Starting Score: 100
|
||||
|
||||
Critical Violations: -10 points each
|
||||
High Violations: -5 points each
|
||||
Medium Violations: -2 points each
|
||||
Low Violations: -1 point each
|
||||
|
||||
Bonus Points:
|
||||
+ Excellent BDD structure: +5
|
||||
+ Comprehensive fixtures: +5
|
||||
+ Comprehensive data factories: +5
|
||||
+ Network-first pattern: +5
|
||||
+ Perfect isolation: +5
|
||||
+ All test IDs present: +5
|
||||
|
||||
Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
|
||||
```
|
||||
|
||||
3. **Quality Grade**:
|
||||
- **90-100**: Excellent (A+)
|
||||
- **80-89**: Good (A)
|
||||
- **70-79**: Acceptable (B)
|
||||
- **60-69**: Needs Improvement (C)
|
||||
- **<60**: Critical Issues (F)
|
||||
|
||||
**Output:** Quality score calculated with violation breakdown
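The scoring rule above can be expressed as a small helper; this is a sketch of the formula only (violation counts would come from Step 3):

```typescript
interface ViolationCounts {
  critical: number; // P0: -10 each
  high: number;     // P1: -5 each
  medium: number;   // P2: -2 each
  low: number;      // P3: -1 each
}

function qualityScore(v: ViolationCounts, bonus: number): number {
  const deductions = v.critical * 10 + v.high * 5 + v.medium * 2 + v.low;
  return Math.max(0, Math.min(100, 100 - deductions + bonus));
}

// e.g. 1 critical, 2 high, 0 medium, 1 low with +10 bonus → 89
qualityScore({ critical: 1, high: 2, medium: 0, low: 1 }, 10);
```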
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Generate Review Report
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Create review report** using `test-review-template.md`:
|
||||
|
||||
**Header Section:**
|
||||
- Test file(s) reviewed
|
||||
- Review date
|
||||
- Review scope (single/directory/suite)
|
||||
- Quality score and grade
|
||||
|
||||
**Executive Summary:**
|
||||
- Overall assessment (Excellent/Good/Needs Improvement/Critical)
|
||||
- Key strengths
|
||||
- Key weaknesses
|
||||
- Recommendation (Approve/Approve with comments/Request changes)
|
||||
|
||||
**Quality Criteria Assessment:**
|
||||
- Table with all criteria evaluated
|
||||
- Status for each (PASS/WARN/FAIL)
|
||||
- Violation count per criterion
|
||||
|
||||
**Critical Issues (Must Fix):**
|
||||
- Priority P0/P1 violations
|
||||
- Code location (file:line)
|
||||
- Explanation of issue
|
||||
- Recommended fix
|
||||
- Knowledge base reference
|
||||
|
||||
**Recommendations (Should Fix):**
|
||||
- Priority P2/P3 violations
|
||||
- Code location (file:line)
|
||||
- Explanation of issue
|
||||
- Recommended improvement
|
||||
- Knowledge base reference
|
||||
|
||||
**Best Practices Examples:**
|
||||
- Highlight good patterns found in tests
|
||||
- Reference knowledge base fragments
|
||||
- Provide examples for others to follow
|
||||
|
||||
**Knowledge Base References:**
|
||||
- List all fragments consulted
|
||||
- Provide links to detailed guidance
|
||||
|
||||
2. **Generate inline comments** (if `generate_inline_comments: true`):
|
||||
- Add TODO comments in test files at violation locations
|
||||
- Format: `// TODO (TEA Review): [Issue description] - See test-review-{filename}.md`
|
||||
- Never modify test logic, only add comments
|
||||
|
||||
3. **Generate quality badge** (if `generate_quality_badge: true`):
|
||||
- Create badge with quality score (e.g., "Test Quality: 87/100 (A)")
|
||||
- Format for inclusion in README or documentation
|
||||
|
||||
4. **Append to story file** (if `append_to_story: true` and story file exists):
|
||||
- Add "Test Quality Review" section to story
|
||||
- Include quality score and critical issues
|
||||
- Link to full review report
|
||||
|
||||
**Output:** Comprehensive review report with actionable feedback
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Save Outputs and Notify
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Save review report** to `{output_file}`
|
||||
2. **Save inline comments** to test files (if enabled)
|
||||
3. **Save quality badge** to output folder (if enabled)
|
||||
4. **Update story file** (if enabled)
|
||||
5. **Generate summary message** for user:
|
||||
- Quality score and grade
|
||||
- Critical issue count
|
||||
- Recommendation
|
||||
|
||||
**Output:** All review artifacts saved and user notified
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Decision Matrix
|
||||
|
||||
| Criterion | PASS | WARN | FAIL | Knowledge Fragment |
|
||||
| ------------------ | ------------------------- | -------------- | ------------------- | ----------------------- |
|
||||
| BDD Format | Given-When-Then present | Some structure | No structure | test-quality.md |
|
||||
| Test IDs | All tests have IDs | Some missing | No IDs | traceability.md |
|
||||
| Priority Markers | All classified | Some missing | No classification | test-priorities.md |
|
||||
| Hard Waits | No hard waits | Some justified | Hard waits present | test-quality.md |
|
||||
| Determinism | No conditionals/random | Some justified | Conditionals/random | test-quality.md |
|
||||
| Isolation | Clean up, no shared state | Some gaps | Shared state | test-quality.md |
|
||||
| Fixture Patterns | Pure fn → Fixture | Some fixtures | No fixtures | fixture-architecture.md |
|
||||
| Data Factories | Factory functions | Some factories | Hardcoded data | data-factories.md |
|
||||
| Network-First | Intercept before navigate | Some correct | Race conditions | network-first.md |
|
||||
| Assertions | Explicit assertions | Some implicit | Missing assertions | test-quality.md |
|
||||
| Test Length | ≤300 lines | 301-500 lines | >500 lines | test-quality.md |
|
||||
| Test Duration | ≤1.5 min | 1.5-3 min | >3 min | test-quality.md |
|
||||
| Flakiness Patterns | No flaky patterns | Some potential | Multiple patterns | ci-burn-in.md |
|
||||
|
||||
---
|
||||
|
||||
## Example Review Summary
|
||||
|
||||
````markdown
|
||||
# Test Quality Review: auth-login.spec.ts
|
||||
|
||||
**Quality Score**: 78/100 (B - Acceptable)
|
||||
**Review Date**: 2025-10-14
|
||||
**Recommendation**: Approve with Comments
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Overall, the test demonstrates good structure and coverage of the login flow. However, there are several areas for improvement to enhance maintainability and prevent flakiness.
|
||||
|
||||
**Strengths:**
|
||||
|
||||
- Excellent BDD structure with clear Given-When-Then comments
|
||||
- Good use of test IDs (1.3-E2E-001, 1.3-E2E-002)
|
||||
- Comprehensive assertions on authentication state
|
||||
|
||||
**Weaknesses:**
|
||||
|
||||
- Hard wait detected (page.waitForTimeout(2000)) - flakiness risk
|
||||
- Hardcoded test data (email: 'test@example.com') - use factories instead
|
||||
- Missing fixture for common login setup - DRY violation
|
||||
|
||||
**Recommendation**: Address the critical issue (hard wait) before merging. Other improvements can be addressed in a follow-up PR.
|
||||
|
||||
## Critical Issues (Must Fix)
|
||||
|
||||
### 1. Hard Wait Detected (Line 45)
|
||||
|
||||
**Severity**: P0 (Critical)
|
||||
**Issue**: `await page.waitForTimeout(2000)` introduces flakiness
|
||||
**Fix**: Use explicit wait for element or network request instead
|
||||
**Knowledge**: See test-quality.md, network-first.md
|
||||
|
||||
```typescript
|
||||
// ❌ Bad (current)
|
||||
await page.waitForTimeout(2000);
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
|
||||
// ✅ Good (recommended)
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
|
||||
```
|
||||
|
||||
|
||||
## Recommendations (Should Fix)
|
||||
|
||||
### 1. Use Data Factory for Test User (Lines 23, 32, 41)
|
||||
|
||||
**Severity**: P1 (High)
|
||||
**Issue**: Hardcoded email 'test@example.com' - maintainability risk
|
||||
**Fix**: Create factory function for test users
|
||||
**Knowledge**: See data-factories.md
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended)
|
||||
import { createTestUser } from './factories/user-factory';
|
||||
|
||||
const testUser = createTestUser({ role: 'admin' });
|
||||
await loginPage.login(testUser.email, testUser.password);
|
||||
```
|
||||
|
||||
### 2. Extract Login Setup to Fixture (Lines 18-28)
|
||||
|
||||
**Severity**: P1 (High)
|
||||
**Issue**: Login setup repeated across tests - DRY violation
|
||||
**Fix**: Create fixture for authenticated state
|
||||
**Knowledge**: See fixture-architecture.md
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended)
|
||||
const test = base.extend({
|
||||
authenticatedPage: async ({ page }, use) => {
|
||||
const user = createTestUser();
|
||||
await loginPage.login(user.email, user.password);
|
||||
await use(page);
|
||||
},
|
||||
});
|
||||
|
||||
test('user can access dashboard', async ({ authenticatedPage }) => {
|
||||
// Test starts already logged in
|
||||
});
|
||||
```
|
||||
|
||||
## Quality Score Breakdown
|
||||
|
||||
- Starting Score: 100
|
||||
- Critical Violations (1 × -10): -10
|
||||
- High Violations (2 × -5): -10
|
||||
- Medium Violations (5 × -2): -10
|
||||
- Low Violations (2 × -1): -2
|
||||
- Bonus (BDD +5, Test IDs +5): +10
|
||||
- **Final Score**: 78/100 (B)
|
||||
|
||||
````
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Test Review
|
||||
|
||||
- **atdd**: Generate acceptance tests (TEA reviews them for quality)
|
||||
- **automate**: Expand regression suite (TEA reviews new tests)
|
||||
- **dev story**: Developer writes implementation tests (TEA reviews them)
|
||||
|
||||
### After Test Review
|
||||
|
||||
- **Developer**: Addresses critical issues, improves based on recommendations
|
||||
- **gate**: Test quality review feeds into gate decision (high-quality tests increase confidence)
|
||||
|
||||
### Coordinates With
|
||||
|
||||
- **Story File**: Review links to acceptance criteria context
|
||||
- **Test Design**: Review validates tests align with prioritization
|
||||
- **Knowledge Base**: Review references fragments for detailed guidance
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
|
||||
2. **Context Matters**: Some violations may be justified for specific scenarios
|
||||
3. **Knowledge-Based**: All feedback grounded in proven patterns from tea-index.csv
|
||||
4. **Actionable**: Every issue includes recommended fix with code examples
|
||||
5. **Quality Score**: Use as indicator, not absolute measure
|
||||
6. **Continuous Improvement**: Review same tests periodically as patterns evolve
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Problem: No test files found**
|
||||
- Verify test_dir path is correct
|
||||
- Check test file extensions match glob pattern
|
||||
- Ensure test files exist in expected location
|
||||
|
||||
**Problem: Quality score seems too low/high**
|
||||
- Review violation counts - may need to adjust thresholds
|
||||
- Consider context - some projects have different standards
|
||||
- Focus on critical issues first, not just score
|
||||
|
||||
**Problem: Inline comments not generated**
|
||||
- Check generate_inline_comments: true in variables
|
||||
- Verify write permissions on test files
|
||||
- Remember `append_to_file: false` produces a separate report instead of inline comments
|
||||
|
||||
**Problem: Knowledge fragments not loading**
|
||||
- Verify tea-index.csv exists in testarch/ directory
|
||||
- Check fragment file paths are correct
|
||||
- Ensure auto_load_knowledge: true in variables
|
||||
|
||||
@@ -0,0 +1,388 @@
|
||||
# Test Quality Review: {test_filename}
|
||||
|
||||
**Quality Score**: {score}/100 ({grade} - {assessment})
|
||||
**Review Date**: {YYYY-MM-DD}
|
||||
**Review Scope**: {single | directory | suite}
|
||||
**Reviewer**: {user_name or TEA Agent}
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Overall Assessment**: {Excellent | Good | Acceptable | Needs Improvement | Critical Issues}
|
||||
|
||||
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
|
||||
|
||||
### Key Strengths
|
||||
|
||||
✅ {strength_1}
|
||||
✅ {strength_2}
|
||||
✅ {strength_3}
|
||||
|
||||
### Key Weaknesses
|
||||
|
||||
❌ {weakness_1}
|
||||
❌ {weakness_2}
|
||||
❌ {weakness_3}
|
||||
|
||||
### Summary
|
||||
|
||||
{1-2 paragraph summary of overall test quality, highlighting major findings and recommendation rationale}
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Assessment
|
||||
|
||||
| Criterion | Status | Violations | Notes |
|
||||
| ------------------------------------ | ------------------------------- | ---------- | ------------ |
|
||||
| BDD Format (Given-When-Then) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Test IDs | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Priority Markers (P0/P1/P2/P3) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Hard Waits (sleep, waitForTimeout) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Determinism (no conditionals) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Isolation (cleanup, no shared state) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Fixture Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Data Factories | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Network-First Pattern | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Explicit Assertions | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Test Length (≤300 lines) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {lines} | {brief_note} |
|
||||
| Test Duration (≤1.5 min) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {duration} | {brief_note} |
|
||||
| Flakiness Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
|
||||
**Total Violations**: {critical_count} Critical, {high_count} High, {medium_count} Medium, {low_count} Low
|
||||
|
||||
---
|
||||
|
||||
## Quality Score Breakdown
|
||||
|
||||
```
|
||||
Starting Score: 100
|
||||
Critical Violations: -{critical_count} × 10 = -{critical_deduction}
|
||||
High Violations: -{high_count} × 5 = -{high_deduction}
|
||||
Medium Violations: -{medium_count} × 2 = -{medium_deduction}
|
||||
Low Violations: -{low_count} × 1 = -{low_deduction}
|
||||
|
||||
Bonus Points:
|
||||
Excellent BDD: +{0|5}
|
||||
Comprehensive Fixtures: +{0|5}
|
||||
Data Factories: +{0|5}
|
||||
Network-First: +{0|5}
|
||||
Perfect Isolation: +{0|5}
|
||||
All Test IDs: +{0|5}
|
||||
--------
|
||||
Total Bonus: +{bonus_total}
|
||||
|
||||
Final Score: {final_score}/100
|
||||
Grade: {grade}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Critical Issues (Must Fix)
|
||||
|
||||
{If no critical issues: "No critical issues detected. ✅"}
|
||||
|
||||
{For each critical issue:}
|
||||
|
||||
### {issue_number}. {Issue Title}
|
||||
|
||||
**Severity**: P0 (Critical)
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Criterion**: {criterion_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Issue Description**:
|
||||
{Detailed explanation of what the problem is and why it's critical}
|
||||
|
||||
**Current Code**:
|
||||
|
||||
```typescript
|
||||
// ❌ Bad (current implementation)
|
||||
{
|
||||
code_snippet_showing_problem;
|
||||
}
|
||||
```
|
||||
|
||||
**Recommended Fix**:
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended approach)
|
||||
{
|
||||
code_snippet_showing_solution;
|
||||
}
|
||||
```
|
||||
|
||||
**Why This Matters**:
|
||||
{Explanation of impact - flakiness risk, maintainability, reliability}
|
||||
|
||||
**Related Violations**:
|
||||
{If similar issue appears elsewhere, note line numbers}
|
||||
|
||||
---
|
||||
|
||||
## Recommendations (Should Fix)
|
||||
|
||||
{If no recommendations: "No additional recommendations. Test quality is excellent. ✅"}
|
||||
|
||||
{For each recommendation:}
|
||||
|
||||
### {rec_number}. {Recommendation Title}
|
||||
|
||||
**Severity**: {P1 (High) | P2 (Medium) | P3 (Low)}
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Criterion**: {criterion_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Issue Description**:
|
||||
{Detailed explanation of what could be improved and why}
|
||||
|
||||
**Current Code**:
|
||||
|
||||
```typescript
|
||||
// ⚠️ Could be improved (current implementation)
|
||||
{
|
||||
code_snippet_showing_current_approach;
|
||||
}
|
||||
```
|
||||
|
||||
**Recommended Improvement**:
|
||||
|
||||
```typescript
|
||||
// ✅ Better approach (recommended)
|
||||
{
|
||||
code_snippet_showing_improvement;
|
||||
}
|
||||
```
|
||||
|
||||
**Benefits**:
|
||||
{Explanation of benefits - maintainability, readability, reusability}
|
||||
|
||||
**Priority**:
|
||||
{Why this is P1/P2/P3 - urgency and impact}
|
||||
|
||||
---
|
||||
|
||||
## Best Practices Found
|
||||
|
||||
{If good patterns found, highlight them}
|
||||
|
||||
{For each best practice:}
|
||||
|
||||
### {practice_number}. {Best Practice Title}
|
||||
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Pattern**: {pattern_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Why This Is Good**:
|
||||
{Explanation of why this pattern is excellent}
|
||||
|
||||
**Code Example**:
|
||||
|
||||
```typescript
|
||||
// ✅ Excellent pattern demonstrated in this test
|
||||
{
|
||||
code_snippet_showing_best_practice;
|
||||
}
|
||||
```
|
||||
|
||||
**Use as Reference**:
|
||||
{Encourage using this pattern in other tests}
|
||||
|
||||
---
|
||||
|
||||
## Test File Analysis
|
||||
|
||||
### File Metadata
|
||||
|
||||
- **File Path**: `{relative_path_from_project_root}`
|
||||
- **File Size**: {line_count} lines, {kb_size} KB
|
||||
- **Test Framework**: {Playwright | Jest | Cypress | Vitest | Other}
|
||||
- **Language**: {TypeScript | JavaScript}
|
||||
|
||||
### Test Structure
|
||||
|
||||
- **Describe Blocks**: {describe_count}
|
||||
- **Test Cases (it/test)**: {test_count}
|
||||
- **Average Test Length**: {avg_lines_per_test} lines per test
|
||||
- **Fixtures Used**: {fixture_count} ({fixture_names})
|
||||
- **Data Factories Used**: {factory_count} ({factory_names})
|
||||
|
||||
### Test Coverage Scope
|
||||
|
||||
- **Test IDs**: {test_id_list}
|
||||
- **Priority Distribution**:
|
||||
- P0 (Critical): {p0_count} tests
|
||||
- P1 (High): {p1_count} tests
|
||||
- P2 (Medium): {p2_count} tests
|
||||
- P3 (Low): {p3_count} tests
|
||||
- Unknown: {unknown_count} tests
|
||||
|
||||
### Assertions Analysis
|
||||
|
||||
- **Total Assertions**: {assertion_count}
|
||||
- **Assertions per Test**: {avg_assertions_per_test} (avg)
|
||||
- **Assertion Types**: {assertion_types_used}
|
||||
|
||||
---
|
||||
|
||||
## Context and Integration
|
||||
|
||||
### Related Artifacts
|
||||
|
||||
{If story file found:}
|
||||
|
||||
- **Story File**: [{story_filename}]({story_path})
|
||||
- **Acceptance Criteria Mapped**: {ac_mapped}/{ac_total} ({ac_coverage}%)
|
||||
|
||||
{If test-design found:}
|
||||
|
||||
- **Test Design**: [{test_design_filename}]({test_design_path})
|
||||
- **Risk Assessment**: {risk_level}
|
||||
- **Priority Framework**: P0-P3 applied
|
||||
|
||||
### Acceptance Criteria Validation
|
||||
|
||||
{If story file available, map tests to ACs:}
|
||||
|
||||
| Acceptance Criterion | Test ID | Status | Notes |
|
||||
| -------------------- | --------- | -------------------------- | ------- |
|
||||
| {AC_1} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
| {AC_2} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
| {AC_3} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
|
||||
**Coverage**: {covered_count}/{total_count} criteria covered ({coverage_percentage}%)
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This review consulted the following knowledge base fragments:
|
||||
|
||||
- **[test-quality.md](../../../testarch/knowledge/test-quality.md)** - Definition of Done for tests (no hard waits, <300 lines, <1.5 min, self-cleaning)
|
||||
- **[fixture-architecture.md](../../../testarch/knowledge/fixture-architecture.md)** - Pure function → Fixture → mergeTests pattern
|
||||
- **[network-first.md](../../../testarch/knowledge/network-first.md)** - Route intercept before navigate (race condition prevention)
|
||||
- **[data-factories.md](../../../testarch/knowledge/data-factories.md)** - Factory functions with overrides, API-first setup
|
||||
- **[test-levels-framework.md](../../../testarch/knowledge/test-levels-framework.md)** - E2E vs API vs Component vs Unit appropriateness
|
||||
- **[tdd-cycles.md](../../../testarch/knowledge/tdd-cycles.md)** - Red-Green-Refactor patterns
|
||||
- **[selective-testing.md](../../../testarch/knowledge/selective-testing.md)** - Duplicate coverage detection
|
||||
- **[ci-burn-in.md](../../../testarch/knowledge/ci-burn-in.md)** - Flakiness detection patterns (10-iteration loop)
|
||||
- **[test-priorities.md](../../../testarch/knowledge/test-priorities.md)** - P0/P1/P2/P3 classification framework
|
||||
- **[traceability.md](../../../testarch/knowledge/traceability.md)** - Requirements-to-tests mapping
|
||||
|
||||
See [tea-index.csv](../../../testarch/tea-index.csv) for complete knowledge base.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
### Immediate Actions (Before Merge)
|
||||
|
||||
1. **{action_1}** - {description}
|
||||
- Priority: {P0 | P1 | P2}
|
||||
- Owner: {team_or_person}
|
||||
- Estimated Effort: {time_estimate}
|
||||
|
||||
2. **{action_2}** - {description}
|
||||
- Priority: {P0 | P1 | P2}
|
||||
- Owner: {team_or_person}
|
||||
- Estimated Effort: {time_estimate}
|
||||
|
||||
### Follow-up Actions (Future PRs)
|
||||
|
||||
1. **{action_1}** - {description}
|
||||
- Priority: {P2 | P3}
|
||||
- Target: {next_sprint | backlog}
|
||||
|
||||
2. **{action_2}** - {description}
|
||||
- Priority: {P2 | P3}
|
||||
- Target: {next_sprint | backlog}
|
||||
|
||||
### Re-Review Needed?
|
||||
|
||||
{✅ No re-review needed - approve as-is}
|
||||
{⚠️ Re-review after critical fixes - request changes, then re-review}
|
||||
{❌ Major refactor required - block merge, pair programming recommended}
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
|
||||
|
||||
**Rationale**:
|
||||
{1-2 paragraph explanation of recommendation based on findings}
|
||||
|
||||
**For Approve**:
|
||||
|
||||
> Test quality is excellent/good with {score}/100 score. {Minor issues noted can be addressed in follow-up PRs.} Tests are production-ready and follow best practices.
|
||||
|
||||
**For Approve with Comments**:
|
||||
|
||||
> Test quality is acceptable with {score}/100 score. {High-priority recommendations should be addressed but don't block merge.} Critical issues resolved, but improvements would enhance maintainability.
|
||||
|
||||
**For Request Changes**:
|
||||
|
||||
> Test quality needs improvement with {score}/100 score. {Critical issues must be fixed before merge.} {X} critical violations detected that pose flakiness/maintainability risks.
|
||||
|
||||
**For Block**:
|
||||
|
||||
> Test quality is insufficient with {score}/100 score. {Multiple critical issues make tests unsuitable for production.} Recommend pairing session with QA engineer to apply patterns from knowledge base.
|
||||
|
||||
---
|
||||
|
||||
## Appendix
|
||||
|
||||
### Violation Summary by Location
|
||||
|
||||
{Table of all violations sorted by line number:}
|
||||
|
||||
| Line | Severity | Criterion | Issue | Fix |
|
||||
| ------ | ------------- | ----------- | ------------- | ----------- |
|
||||
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
|
||||
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
|
||||
|
||||
### Quality Trends
|
||||
|
||||
{If reviewing same file multiple times, show trend:}
|
||||
|
||||
| Review Date | Score | Grade | Critical Issues | Trend |
|
||||
| ------------ | ------------- | --------- | --------------- | ----------- |
|
||||
| {YYYY-MM-DD} | {score_1}/100 | {grade_1} | {count_1} | ⬆️ Improved |
|
||||
| {YYYY-MM-DD} | {score_2}/100 | {grade_2} | {count_2} | ⬇️ Declined |
|
||||
| {YYYY-MM-DD} | {score_3}/100 | {grade_3} | {count_3} | ➡️ Stable |
|
||||
|
||||
### Related Reviews
|
||||
|
||||
{If reviewing multiple files in directory/suite:}
|
||||
|
||||
| File | Score | Grade | Critical | Status |
|
||||
| -------- | ----------- | ------- | -------- | ------------------ |
|
||||
| {file_1} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
| {file_2} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
| {file_3} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
|
||||
**Suite Average**: {avg_score}/100 ({avg_grade})
|
||||
|
||||
---
|
||||
|
||||
## Review Metadata
|
||||
|
||||
**Generated By**: BMad TEA Agent (Test Architect)
|
||||
**Workflow**: testarch-test-review v4.0
|
||||
**Review ID**: test-review-{filename}-{YYYYMMDD}
|
||||
**Timestamp**: {YYYY-MM-DD HH:MM:SS}
|
||||
**Version**: 1.0
|
||||
|
||||
---
|
||||
|
||||
## Feedback on This Review
|
||||
|
||||
If you have questions or feedback on this review:
|
||||
|
||||
1. Review patterns in knowledge base: `testarch/knowledge/`
|
||||
2. Consult tea-index.csv for detailed guidance
|
||||
3. Request clarification on specific violations
|
||||
4. Pair with QA engineer to apply patterns
|
||||
|
||||
This review is guidance, not rigid rules. Context matters - if a pattern is justified, document it with a comment.
|
||||
99
src/modules/bmm/workflows/testarch/test-review/workflow.yaml
Normal file
@@ -0,0 +1,99 @@
|
||||
# Test Architect workflow: test-review
|
||||
name: testarch-test-review
|
||||
description: "Review test quality using comprehensive knowledge base and best practices validation"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/test-review"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/test-review-template.md"
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Review target
|
||||
test_file_path: "" # Explicit test file to review (if not provided, auto-discover)
|
||||
test_dir: "{project-root}/tests"
|
||||
review_scope: "single" # single (one file), directory (folder), suite (all tests)
|
||||
|
||||
# Review configuration
|
||||
quality_score_enabled: true # Calculate 0-100 quality score
|
||||
append_to_file: false # true = inline comments, false = separate report
|
||||
check_against_knowledge: true # Use tea-index.csv fragments for validation
|
||||
strict_mode: false # Strict = fail on any violation, Relaxed = advisory only
|
||||
|
||||
# Quality criteria to check
|
||||
check_given_when_then: true # BDD format validation
|
||||
check_test_ids: true # Test ID conventions (e.g., 1.3-E2E-001)
|
||||
check_priority_markers: true # P0/P1/P2/P3 classification
|
||||
check_hard_waits: true # Detect sleep(), wait(X), hardcoded delays
|
||||
check_determinism: true # No conditionals (if/else), no try/catch abuse
|
||||
check_isolation: true # Tests clean up, no shared state
|
||||
check_fixture_patterns: true # Pure function → Fixture → mergeTests
|
||||
check_data_factories: true # Factory usage vs hardcoded data
|
||||
check_network_first: true # Route intercept before navigate
|
||||
check_assertions: true # Explicit assertions, not implicit waits
|
||||
check_test_length: true # Warn if >300 lines per file
|
||||
check_test_duration: true # Warn if individual test >1.5 min
|
||||
check_flakiness_patterns: true # Common flaky patterns (race conditions, timing)
|
||||
|
||||
# Integration with BMad artifacts
|
||||
use_story_file: true # Load story for context (acceptance criteria)
|
||||
use_test_design: true # Load test-design for priority context
|
||||
auto_discover_story: true # Find related story by test ID
|
||||
|
||||
# Output configuration
|
||||
output_file: "{output_folder}/test-review-{filename}.md"
|
||||
generate_inline_comments: false # Add TODO comments in test files
|
||||
generate_quality_badge: true # Create quality badge/score
|
||||
append_to_story: false # Add review section to story file
|
||||
|
||||
# Knowledge base fragments to load
|
||||
knowledge_fragments:
|
||||
- test-quality.md # Definition of Done for tests
|
||||
- fixture-architecture.md # Pure function → Fixture patterns
|
||||
- network-first.md # Route interception before navigation
|
||||
- data-factories.md # Factory patterns and best practices
|
||||
- test-levels-framework.md # E2E vs API vs Component vs Unit
|
||||
- playwright-config.md # Configuration patterns (if Playwright)
|
||||
- tdd-cycles.md # Red-Green-Refactor patterns
|
||||
- selective-testing.md # Duplicate coverage detection
|
||||
|
||||
# Default output file (fallback)
|
||||
default_output_file: "{output_folder}/test-review.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read test files, story, test-design
|
||||
- write_file # Create review report
|
||||
- list_files # Discover test files in directory
|
||||
- search_repo # Find tests by patterns
|
||||
- glob # Find test files matching patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- test_file: "Test file to review (single file mode)"
|
||||
- test_dir: "Directory of tests to review (directory mode)"
|
||||
- story: "Related story for acceptance criteria context (optional)"
|
||||
- test_design: "Test design for priority context (optional)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- test-architect
|
||||
- code-review
|
||||
- quality
|
||||
- best-practices
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true # Can review multiple files
|
||||
|
||||
web_bundle: false
|
||||
802
src/modules/bmm/workflows/testarch/trace/README.md
Normal file
@@ -0,0 +1,802 @@
|
||||
# Requirements Traceability & Quality Gate Workflow
|
||||
|
||||
**Workflow ID:** `testarch-trace`
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Command:** `bmad tea *trace`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The **trace** workflow operates in two sequential phases to validate test coverage and deployment readiness:
|
||||
|
||||
**PHASE 1 - REQUIREMENTS TRACEABILITY:** Generates a comprehensive requirements-to-tests traceability matrix that maps acceptance criteria to implemented tests, identifies coverage gaps, and provides actionable recommendations.
|
||||
|
||||
**PHASE 2 - QUALITY GATE DECISION:** Makes deterministic release decisions (PASS/CONCERNS/FAIL/WAIVED) based on traceability results, test execution evidence, and non-functional requirements validation.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- Maps acceptance criteria to specific test cases across all levels (E2E, API, Component, Unit)
|
||||
- Classifies coverage status (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
|
||||
- Prioritizes gaps by risk level (P0/P1/P2/P3)
|
||||
- Applies deterministic decision rules for deployment readiness
|
||||
- Generates gate decisions with evidence and rationale
|
||||
- Supports waivers for business-approved exceptions
|
||||
- Updates workflow status and notifies stakeholders
|
||||
- Creates CI/CD-ready YAML snippets for quality gates
|
||||
- Detects duplicate coverage across test levels
|
||||
- Verifies test quality (assertions, structure, performance)
|
||||
|
||||
---
|
||||
|
||||
## When to Use This Workflow
|
||||
|
||||
Use `*trace` when you need to:
|
||||
|
||||
### Phase 1 - Traceability
|
||||
|
||||
- ✅ Validate that all acceptance criteria have test coverage
|
||||
- ✅ Identify coverage gaps before release or PR merge
|
||||
- ✅ Generate traceability documentation for compliance or audits
|
||||
- ✅ Ensure critical paths (P0/P1) are fully tested
|
||||
- ✅ Detect duplicate coverage across test levels
|
||||
- ✅ Assess test quality across your suite
|
||||
|
||||
### Phase 2 - Gate Decision (Optional)
|
||||
|
||||
- ✅ Make final go/no-go deployment decision
|
||||
- ✅ Validate test execution results against thresholds
|
||||
- ✅ Evaluate non-functional requirements (security, performance)
|
||||
- ✅ Generate audit trail for release approval
|
||||
- ✅ Handle business waivers for critical deadlines
|
||||
- ✅ Notify stakeholders of gate decision
|
||||
|
||||
**Typical Timing:**
|
||||
|
||||
- After tests are implemented (post-ATDD or post-development)
|
||||
- Before merging a PR (validate P0/P1 coverage)
|
||||
- Before release (validate full coverage and make gate decision)
|
||||
- During sprint retrospectives (assess test quality)
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Phase 1 - Traceability (Required)
|
||||
|
||||
- Acceptance criteria (from story file OR inline)
|
||||
- Implemented test suite (or acknowledged gaps)
|
||||
|
||||
### Phase 2 - Gate Decision (Required if `enable_gate_decision: true`)
|
||||
|
||||
- Test execution results (CI/CD test reports, pass/fail rates)
|
||||
- Test design with risk priorities (P0/P1/P2/P3)
|
||||
|
||||
### Recommended
|
||||
|
||||
- `test-design.md` - Risk assessment and test priorities
|
||||
- `nfr-assessment.md` - Non-functional requirements validation (for release gates)
|
||||
- `tech-spec.md` - Technical implementation details
|
||||
- Test framework configuration (playwright.config.ts, jest.config.js)
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- Story lacks any tests AND gaps are not acknowledged → Run `*atdd` first
|
||||
- Acceptance criteria are completely missing → Provide criteria or story file
|
||||
- Phase 2 enabled but test execution results missing → Warn and skip gate decision
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Usage (Both Phases)
|
||||
|
||||
```bash
|
||||
bmad tea *trace
|
||||
```
|
||||
|
||||
The workflow will:
|
||||
|
||||
1. **Phase 1**: Read story file, extract acceptance criteria, auto-discover tests, generate traceability matrix
|
||||
2. **Phase 2**: Load test execution results, apply decision rules, generate gate decision document
|
||||
3. Save traceability matrix to `bmad/output/traceability-matrix.md`
|
||||
4. Save gate decision to `bmad/output/gate-decision-story-X.X.md`
|
||||
|
||||
### Phase 1 Only (Skip Gate Decision)
|
||||
|
||||
```bash
|
||||
bmad tea *trace --enable-gate-decision false
|
||||
```
|
||||
|
||||
### Custom Configuration
|
||||
|
||||
```bash
|
||||
bmad tea *trace \
|
||||
--story-file "bmad/output/story-1.3.md" \
|
||||
--test-results "ci-artifacts/test-report.xml" \
|
||||
--min-p0-coverage 100 \
|
||||
--min-p1-coverage 90 \
|
||||
--min-p0-pass-rate 100 \
|
||||
--min-p1-pass-rate 95
|
||||
```
|
||||
|
||||
### Standalone Mode (No Story File)
|
||||
|
||||
```bash
|
||||
bmad tea *trace --acceptance-criteria "AC-1: User can login with email..."
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### PHASE 1: Requirements Traceability
|
||||
|
||||
1. **Load Context** - Read story, test design, tech spec, knowledge base
|
||||
2. **Discover Tests** - Auto-find tests related to story (by ID, describe blocks, file paths)
|
||||
3. **Map Criteria** - Link acceptance criteria to specific test cases
|
||||
4. **Analyze Gaps** - Identify missing coverage and prioritize by risk
|
||||
5. **Verify Quality** - Check test quality (assertions, structure, performance)
|
||||
6. **Generate Deliverables** - Create traceability matrix, gate YAML, coverage badge
|
||||
|
||||
### PHASE 2: Quality Gate Decision (if `enable_gate_decision: true`)
|
||||
|
||||
7. **Gather Evidence** - Load traceability results, test execution reports, NFR assessments
|
||||
8. **Apply Decision Rules** - Evaluate against thresholds (PASS/CONCERNS/FAIL/WAIVED)
|
||||
9. **Document Decision** - Create gate decision document with evidence and rationale
|
||||
10. **Update Status & Notify** - Append to bmm-workflow-status.md, notify stakeholders
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### Phase 1: Traceability Matrix (`traceability-matrix.md`)
|
||||
|
||||
Comprehensive markdown file with:
|
||||
|
||||
- Coverage summary table (by priority)
|
||||
- Detailed criterion-to-test mapping
|
||||
- Gap analysis with recommendations
|
||||
- Quality assessment for each test
|
||||
- Gate YAML snippet
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
# Traceability Matrix - Story 1.3
|
||||
|
||||
## Coverage Summary
|
||||
|
||||
| Priority | Total | FULL | Coverage % | Status |
|
||||
| -------- | ----- | ---- | ---------- | ------- |
|
||||
| P0 | 3 | 3 | 100% | ✅ PASS |
|
||||
| P1 | 5 | 4 | 80% | ⚠️ WARN |
|
||||
|
||||
Gate Status: CONCERNS ⚠️ (P1 coverage below 90%)
|
||||
```
|
||||
|
||||
### Phase 2: Gate Decision Document (`gate-decision-{type}-{id}.md`)
|
||||
|
||||
**Decision Document** with:
|
||||
|
||||
- **Decision**: PASS / CONCERNS / FAIL / WAIVED with clear rationale
|
||||
- **Evidence Summary**: Test results, coverage, NFRs, quality validation
|
||||
- **Decision Criteria Table**: Each criterion with threshold, actual, status
|
||||
- **Rationale**: Explanation of decision based on evidence
|
||||
- **Residual Risks**: Unresolved issues (for CONCERNS/WAIVED)
|
||||
- **Waiver Details**: Approver, justification, remediation plan (for WAIVED)
|
||||
- **Next Steps**: Action items for each decision type
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Story 1.3 - User Login
|
||||
|
||||
**Decision**: ⚠️ CONCERNS
|
||||
**Date**: 2025-10-15
|
||||
|
||||
## Decision Criteria
|
||||
|
||||
| Criterion | Threshold | Actual | Status |
|
||||
| ------------ | --------- | ------ | ------- |
|
||||
| P0 Coverage | ≥100% | 100% | ✅ PASS |
|
||||
| P1 Coverage | ≥90% | 88% | ⚠️ WARN |
|
||||
| Overall Pass | ≥90% | 96% | ✅ PASS |
|
||||
|
||||
**Decision**: CONCERNS (P1 coverage 88% below 90% threshold)
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Deploy with monitoring
|
||||
- Create follow-up story for AC-5 test
|
||||
```
|
||||
|
||||
### Secondary Outputs
|
||||
|
||||
- **Gate YAML**: Machine-readable snippet for CI/CD integration
|
||||
- **Status Update**: Appends decision to `bmm-workflow-status.md` history
|
||||
- **Stakeholder Notification**: Auto-generated summary message
|
||||
- **Updated Story File**: Traceability section added (optional)
|
||||
|
||||
---
|
||||
|
||||
## Decision Logic (Phase 2)
|
||||
|
||||
### PASS Decision ✅
|
||||
|
||||
**All criteria met:**
|
||||
|
||||
- ✅ P0 coverage ≥ 100%
|
||||
- ✅ P1 coverage ≥ 90%
|
||||
- ✅ Overall coverage ≥ 80%
|
||||
- ✅ P0 test pass rate = 100%
|
||||
- ✅ P1 test pass rate ≥ 95%
|
||||
- ✅ Overall test pass rate ≥ 90%
|
||||
- ✅ Security issues = 0
|
||||
- ✅ Critical NFR failures = 0
|
||||
|
||||
**Action:** Deploy to production with standard monitoring
|
||||
|
||||
---
|
||||
|
||||
### CONCERNS Decision ⚠️
|
||||
|
||||
**P0 criteria met, but P1 criteria degraded:**
|
||||
|
||||
- ✅ P0 coverage = 100%
|
||||
- ⚠️ P1 coverage 80-89% (below 90% threshold)
|
||||
- ⚠️ P1 test pass rate 90-94% (below 95% threshold)
|
||||
- ✅ No security issues
|
||||
- ✅ No critical NFR failures
|
||||
|
||||
**Residual Risks:** Minor P1 issues, edge cases, non-critical gaps
|
||||
|
||||
**Action:** Deploy with enhanced monitoring, create backlog stories for fixes
|
||||
|
||||
**Note:** CONCERNS does NOT block deployment but requires acknowledgment
|
||||
|
||||
---
|
||||
|
||||
### FAIL Decision ❌
|
||||
|
||||
**Any P0 criterion failed:**
|
||||
|
||||
- ❌ P0 coverage <100% (missing critical tests)
|
||||
- OR ❌ P0 test pass rate <100% (failing critical tests)
|
||||
- OR ❌ P1 coverage <80% (significant gap)
|
||||
- OR ❌ Security issues >0
|
||||
- OR ❌ Critical NFR failures >0
|
||||
|
||||
**Critical Blockers:** P0 test failures, security vulnerabilities, critical NFRs
|
||||
|
||||
**Action:** Block deployment, fix critical issues, re-run gate after fixes
|
||||
|
||||
---
|
||||
|
||||
### WAIVED Decision 🔓
|
||||
|
||||
**FAIL status + business-approved waiver:**
|
||||
|
||||
- ❌ Original decision: FAIL
|
||||
- 🔓 Waiver approved by: {VP Engineering / CTO / Product Owner}
|
||||
- 📋 Business justification: {regulatory deadline, contractual obligation}
|
||||
- 📅 Waiver expiry: {date - does NOT apply to future releases}
|
||||
- 🔧 Remediation plan: {fix in next release, due date}
|
||||
|
||||
**Action:** Deploy with business approval, aggressive monitoring, fix ASAP
|
||||
|
||||
**Important:** Waivers NEVER apply to P0 security issues or data corruption risks
|
||||
|
||||
---
|
||||
|
||||
## Coverage Classifications (Phase 1)
|
||||
|
||||
- **FULL** ✅ - All scenarios validated at appropriate level(s)
|
||||
- **PARTIAL** ⚠️ - Some coverage but missing edge cases or levels
|
||||
- **NONE** ❌ - No test coverage at any level
|
||||
- **UNIT-ONLY** ⚠️ - Only unit tests (missing integration/E2E validation)
|
||||
- **INTEGRATION-ONLY** ⚠️ - Only API/Component tests (missing unit confidence)
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
| Priority | Coverage Requirement | Pass Rate Requirement | Severity | Action |
|
||||
| -------- | -------------------- | --------------------- | -------- | ------------------ |
|
||||
| P0 | 100% | 100% | BLOCKER | Do not release |
|
||||
| P1 | 90% | 95% | HIGH | Block PR merge |
|
||||
| P2 | 80% (recommended) | 85% (recommended) | MEDIUM | Address in nightly |
|
||||
| P3 | No requirement | No requirement | LOW | Optional |
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### workflow.yaml Variables
|
||||
|
||||
```yaml
|
||||
variables:
|
||||
# Target specification
|
||||
story_file: '' # Path to story markdown
|
||||
acceptance_criteria: '' # Inline criteria if no story
|
||||
|
||||
# Test discovery
|
||||
test_dir: '{project-root}/tests'
|
||||
auto_discover_tests: true
|
||||
|
||||
# Traceability configuration
|
||||
coverage_levels: 'e2e,api,component,unit'
|
||||
map_by_test_id: true
|
||||
map_by_describe: true
|
||||
map_by_filename: true
|
||||
|
||||
# Gap analysis
|
||||
prioritize_by_risk: true
|
||||
suggest_missing_tests: true
|
||||
check_duplicate_coverage: true
|
||||
|
||||
# Output configuration
|
||||
output_file: '{output_folder}/traceability-matrix.md'
|
||||
generate_gate_yaml: true
|
||||
generate_coverage_badge: true
|
||||
update_story_file: true
|
||||
|
||||
# Quality gates (Phase 1 recommendations)
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 90
|
||||
min_overall_coverage: 80
|
||||
|
||||
# PHASE 2: Gate Decision Variables
|
||||
enable_gate_decision: true # Run gate decision after traceability
|
||||
|
||||
# Gate target specification
|
||||
gate_type: 'story' # story | epic | release | hotfix
|
||||
|
||||
# Gate decision configuration
|
||||
decision_mode: 'deterministic' # deterministic | manual
|
||||
allow_waivers: true
|
||||
require_evidence: true
|
||||
|
||||
# Input sources for gate
|
||||
nfr_file: '' # Path to nfr-assessment.md (optional)
|
||||
test_results: '' # Path to test execution results (required for Phase 2)
|
||||
|
||||
# Decision criteria thresholds
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 95
|
||||
min_overall_pass_rate: 90
|
||||
max_critical_nfrs_fail: 0
|
||||
max_security_issues: 0
|
||||
|
||||
# Risk tolerance
|
||||
allow_p2_failures: true
|
||||
allow_p3_failures: true
|
||||
escalate_p1_failures: true
|
||||
|
||||
# Gate output configuration
|
||||
gate_output_file: '{output_folder}/gate-decision-{gate_type}-{story_id}.md'
|
||||
append_to_history: true
|
||||
notify_stakeholders: true
|
||||
|
||||
# Advanced gate options
|
||||
check_all_workflows_complete: true
|
||||
validate_evidence_freshness: true
|
||||
require_sign_off: false
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base Integration
|
||||
|
||||
This workflow automatically loads relevant knowledge fragments:
|
||||
|
||||
**Phase 1 (Traceability):**
|
||||
|
||||
- `traceability.md` - Requirements mapping patterns
|
||||
- `test-priorities.md` - P0/P1/P2/P3 risk framework
|
||||
- `risk-governance.md` - Risk-based testing approach
|
||||
- `test-quality.md` - Definition of Done for tests
|
||||
- `selective-testing.md` - Duplicate coverage patterns
|
||||
|
||||
**Phase 2 (Gate Decision):**
|
||||
|
||||
- `risk-governance.md` - Quality gate criteria and decision framework
|
||||
- `probability-impact.md` - Risk scoring for residual risks
|
||||
- `test-quality.md` - Quality standards validation
|
||||
- `test-priorities.md` - Priority classification framework
|
||||
|
||||
---
|
||||
|
||||
## Example Scenarios
|
||||
|
||||
### Example 1: Full Coverage with Gate PASS
|
||||
|
||||
```bash
|
||||
# Validate coverage and make gate decision
|
||||
bmad tea *trace --story-file "bmad/output/story-1.3.md" \
|
||||
--test-results "ci-artifacts/test-report.xml"
|
||||
```
|
||||
|
||||
**Phase 1 Output:**
|
||||
|
||||
```markdown
|
||||
# Traceability Matrix - Story 1.3
|
||||
|
||||
## Coverage Summary
|
||||
|
||||
| Priority | Total | FULL | Coverage % | Status |
|
||||
| -------- | ----- | ---- | ---------- | ------- |
|
||||
| P0 | 3 | 3 | 100% | ✅ PASS |
|
||||
| P1 | 5 | 5 | 100% | ✅ PASS |
|
||||
|
||||
Gate Status: Ready for Phase 2 ✅
|
||||
```
|
||||
|
||||
**Phase 2 Output:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Story 1.3
|
||||
|
||||
**Decision**: ✅ PASS
|
||||
|
||||
Evidence:
|
||||
|
||||
- P0 Coverage: 100% ✅
|
||||
- P1 Coverage: 100% ✅
|
||||
- P0 Pass Rate: 100% (12/12 tests) ✅
|
||||
- P1 Pass Rate: 98% (45/46 tests) ✅
|
||||
- Overall Pass Rate: 96% ✅
|
||||
|
||||
Next Steps:
|
||||
|
||||
1. Deploy to staging
|
||||
2. Monitor for 24 hours
|
||||
3. Deploy to production
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Gap Identification with CONCERNS Decision
|
||||
|
||||
```bash
|
||||
# Find gaps and evaluate readiness
|
||||
bmad tea *trace --story-file "bmad/output/story-2.1.md" \
|
||||
--test-results "ci-artifacts/test-report.xml"
|
||||
```
|
||||
|
||||
**Phase 1 Output:**
|
||||
|
||||
```markdown
|
||||
## Gap Analysis
|
||||
|
||||
### Critical Gaps (BLOCKER)
|
||||
|
||||
- None ✅
|
||||
|
||||
### High Priority Gaps (PR BLOCKER)
|
||||
|
||||
1. **AC-3: Password reset email edge cases**
|
||||
- Recommend: Add 2.1-API-001 (email service integration)
|
||||
- Impact: Users may not recover accounts in error scenarios
|
||||
```
|
||||
|
||||
**Phase 2 Output:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Story 2.1
|
||||
|
||||
**Decision**: ⚠️ CONCERNS
|
||||
|
||||
Evidence:
|
||||
|
||||
- P0 Coverage: 100% ✅
|
||||
- P1 Coverage: 88% ⚠️ (below 90%)
|
||||
- Test Pass Rate: 96% ✅
|
||||
|
||||
Residual Risks:
|
||||
|
||||
- AC-3 missing E2E test for email error handling
|
||||
|
||||
Next Steps:
|
||||
|
||||
- Deploy with monitoring
|
||||
- Create follow-up story for AC-3 test
|
||||
- Monitor production for edge cases
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Critical Blocker with FAIL Decision
|
||||
|
||||
```bash
|
||||
# Critical issues detected
|
||||
bmad tea *trace --story-file "bmad/output/story-3.2.md" \
|
||||
--test-results "ci-artifacts/test-report.xml"
|
||||
```
|
||||
|
||||
**Phase 1 Output:**
|
||||
|
||||
```markdown
|
||||
## Gap Analysis
|
||||
|
||||
### Critical Gaps (BLOCKER)
|
||||
|
||||
1. **AC-2: Invalid login security validation**
|
||||
- Priority: P0
|
||||
- Status: NONE (no tests)
|
||||
- Impact: Security vulnerability - users can bypass login
|
||||
```
|
||||
|
||||
**Phase 2 Output:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Story 3.2
|
||||
|
||||
**Decision**: ❌ FAIL
|
||||
|
||||
Critical Blockers:
|
||||
|
||||
- P0 Coverage: 80% ❌ (AC-2 missing)
|
||||
- Security Risk: Login bypass vulnerability
|
||||
|
||||
Next Steps:
|
||||
|
||||
1. BLOCK DEPLOYMENT IMMEDIATELY
|
||||
2. Add P0 test for AC-2: 3.2-E2E-004
|
||||
3. Re-run full test suite
|
||||
4. Re-run gate after fixes verified
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Business Override with WAIVED Decision
|
||||
|
||||
```bash
|
||||
# FAIL with business waiver
|
||||
bmad tea *trace --story-file "bmad/output/release-2.4.0.md" \
|
||||
--test-results "ci-artifacts/test-report.xml" \
|
||||
--allow-waivers true
|
||||
```
|
||||
|
||||
**Phase 2 Output:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Release 2.4.0
|
||||
|
||||
**Original Decision**: ❌ FAIL
|
||||
**Final Decision**: 🔓 WAIVED
|
||||
|
||||
Waiver Details:
|
||||
|
||||
- Approver: Jane Doe, VP Engineering
|
||||
- Reason: GDPR compliance deadline (regulatory, Oct 15)
|
||||
- Expiry: 2025-10-15 (does NOT apply to v2.5.0)
|
||||
- Monitoring: Enhanced error tracking
|
||||
- Remediation: Fix in v2.4.1 hotfix (due Oct 20)
|
||||
|
||||
Business Justification:
|
||||
Release contains critical GDPR features required by law. Failed
|
||||
test affects legacy feature used by <1% of users. Workaround available.
|
||||
|
||||
Next Steps:
|
||||
|
||||
1. Deploy v2.4.0 with waiver approval
|
||||
2. Monitor error rates aggressively
|
||||
3. Fix issue in v2.4.1 (Oct 20)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Phase 1 Issues
|
||||
|
||||
#### "No tests found for this story"
|
||||
|
||||
- Run `*atdd` workflow first to generate failing acceptance tests
|
||||
- Check test file naming conventions (may not match story ID pattern)
|
||||
- Verify test directory path is correct (`test_dir` variable)
|
||||
|
||||
#### "Cannot determine coverage status"
|
||||
|
||||
- Tests may lack explicit mapping (no test IDs, unclear describe blocks)
|
||||
- Add test IDs: `{STORY_ID}-{LEVEL}-{SEQ}` (e.g., `1.3-E2E-001`)
|
||||
- Use Given-When-Then narrative in test descriptions
|
||||
|
||||
#### "P0 coverage below 100%"
|
||||
|
||||
- This is a **BLOCKER** - do not release
|
||||
- Identify missing P0 tests in gap analysis
|
||||
- Run `*atdd` workflow to generate missing tests
|
||||
- Verify P0 classification is correct with stakeholders
|
||||
|
||||
#### "Duplicate coverage detected"
|
||||
|
||||
- Review `selective-testing.md` knowledge fragment
|
||||
- Determine if overlap is acceptable (defense in depth) or wasteful
|
||||
- Consolidate tests at appropriate level (logic → unit, journey → E2E)
|
||||
|
||||
### Phase 2 Issues
|
||||
|
||||
#### "Test execution results missing"
|
||||
|
||||
- Phase 2 gate decision requires `test_results` (CI/CD test reports)
|
||||
- If missing, Phase 2 will be skipped with warning
|
||||
- Provide JUnit XML, TAP, or JSON test report path via `test_results` variable
|
||||
|
||||
#### "Gate decision is FAIL but deployment needed urgently"
|
||||
|
||||
- Request business waiver (if `allow_waivers: true`)
|
||||
- Document approver, justification, mitigation plan
|
||||
- Create follow-up stories to address gaps
|
||||
- Use WAIVED decision only for non-P0 gaps
|
||||
- **Never waive**: Security issues, data corruption risks
|
||||
|
||||
#### "Assessments are stale (>7 days old)"
|
||||
|
||||
- Re-run `*test-design` workflow
|
||||
- Re-run traceability (Phase 1)
|
||||
- Re-run `*nfr-assess` workflow
|
||||
- Update evidence files before gate decision
|
||||
|
||||
#### "Unclear decision (edge case)"
|
||||
|
||||
- Switch to manual mode: `decision_mode: manual`
|
||||
- Document assumptions and rationale clearly
|
||||
- Escalate to tech lead or architect for guidance
|
||||
- Consider waiver if business-critical
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Trace
|
||||
|
||||
1. **testarch-test-design** - Define test priorities (P0/P1/P2/P3)
|
||||
2. **testarch-atdd** - Generate failing acceptance tests
|
||||
3. **testarch-automate** - Expand regression suite
|
||||
|
||||
### After Trace (Phase 2 Decision)
|
||||
|
||||
- **PASS**: Proceed to deployment workflow
|
||||
- **CONCERNS**: Deploy with monitoring, create remediation backlog stories
|
||||
- **FAIL**: Block deployment, fix issues, re-run trace workflow
|
||||
- **WAIVED**: Deploy with business approval, escalate monitoring
|
||||
|
||||
### Complements
|
||||
|
||||
- `*trace` → **testarch-nfr-assess** - Use NFR validation in gate decision
|
||||
- `*trace` → **testarch-test-review** - Flag quality issues for review
|
||||
- **CI/CD Pipeline** - Use gate YAML for automated quality gates
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Phase 1 - Traceability
|
||||
|
||||
1. **Run Trace After Test Implementation**
|
||||
- Don't run `*trace` before tests exist (run `*atdd` first)
|
||||
- Trace is most valuable after initial test suite is written
|
||||
|
||||
2. **Prioritize by Risk**
|
||||
- P0 gaps are BLOCKERS (must fix before release)
|
||||
- P1 gaps are HIGH priority (block PR merge)
|
||||
- P3 gaps are acceptable (fix if time permits)
|
||||
|
||||
3. **Explicit Mapping** (see the sketch after this list)
|
||||
- Use test IDs (`1.3-E2E-001`) for clear traceability
|
||||
- Reference criteria in describe blocks
|
||||
- Use Given-When-Then narrative
|
||||
|
||||
4. **Avoid Duplicate Coverage**
|
||||
- Test each behavior at appropriate level only
|
||||
- Unit tests for logic, E2E for journeys
|
||||
- Only overlap for defense in depth on critical paths
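A hedged example of the explicit-mapping practice; the story, AC, and selectors are placeholders:

```typescript
import { test, expect } from '@playwright/test';

// Test ID 1.3-E2E-001 ties this spec to Story 1.3, AC-1
test.describe('1.3-E2E-001: User login (AC-1)', () => {
  test('logs in with valid credentials', async ({ page }) => {
    // Given a registered user on the login page
    await page.goto('/login');
    // When they submit valid credentials
    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('correct-horse-battery');
    await page.getByRole('button', { name: 'Sign in' }).click();
    // Then they land on the dashboard
    await expect(page).toHaveURL(/\/dashboard/);
  });
});
```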
|
||||
|
||||
### Phase 2 - Gate Decision
|
||||
|
||||
5. **Evidence is King**
|
||||
- Never make gate decisions without fresh test results
|
||||
- Validate evidence freshness (<7 days old)
|
||||
- Link to all evidence sources (reports, logs, artifacts)
|
||||
|
||||
6. **P0 is Sacred**
|
||||
- P0 failures ALWAYS result in FAIL (no exceptions except waivers)
|
||||
- P0 = Critical user journeys, security, data integrity
|
||||
- Waivers require VP/CTO approval + business justification
|
||||
|
||||
7. **Waivers are Temporary**
|
||||
- Waiver applies ONLY to specific release
|
||||
- Issue must be fixed in next release
|
||||
- Never waive: security, data corruption, compliance violations
|
||||
|
||||
8. **CONCERNS is Not PASS**
|
||||
- CONCERNS means "deploy with monitoring"
|
||||
- Create follow-up stories for issues
|
||||
- Do not ignore CONCERNS repeatedly
|
||||
|
||||
9. **Automate Gate Integration**
|
||||
- Enable `generate_gate_yaml` for CI/CD integration
|
||||
- Use YAML snippets in pipeline quality gates
|
||||
- Export metrics for dashboard visualization
|
||||
|
||||
---
|
||||
|
||||
## Configuration Examples
|
||||
|
||||
### Strict Gate (Zero Tolerance)
|
||||
|
||||
```yaml
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 100
|
||||
min_overall_coverage: 90
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 100
|
||||
min_overall_pass_rate: 95
|
||||
allow_waivers: false
|
||||
max_security_issues: 0
|
||||
max_critical_nfrs_fail: 0
|
||||
```
|
||||
|
||||
Use for: Financial systems, healthcare, security-critical features
|
||||
|
||||
---
|
||||
|
||||
### Balanced Gate (Production Standard - Default)
|
||||
|
||||
```yaml
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 90
|
||||
min_overall_coverage: 80
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 95
|
||||
min_overall_pass_rate: 90
|
||||
allow_waivers: true
|
||||
max_security_issues: 0
|
||||
max_critical_nfrs_fail: 0
|
||||
```
|
||||
|
||||
Use for: Most production releases
|
||||
|
||||
---
|
||||
|
||||
### Relaxed Gate (Early Development)
|
||||
|
||||
```yaml
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 80
|
||||
min_overall_coverage: 70
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 85
|
||||
min_overall_pass_rate: 80
|
||||
allow_waivers: true
|
||||
allow_p2_failures: true
|
||||
allow_p3_failures: true
|
||||
```
|
||||
|
||||
Use for: Alpha/beta releases, internal tools, proof-of-concept
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *test-design` - Define test priorities and risk assessment
- `bmad tea *atdd` - Generate failing acceptance tests for gaps
- `bmad tea *automate` - Expand regression suite based on gaps
- `bmad tea *nfr-assess` - Validate non-functional requirements (for gate)
- `bmad tea *test-review` - Review test quality issues flagged by trace
- `bmad sm story-approved` - Mark story as complete (triggers gate)
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- [Instructions](./instructions.md) - Detailed workflow steps (both phases)
- [Checklist](./checklist.md) - Validation checklist
- [Template](./trace-template.md) - Traceability matrix template
- [Knowledge Base](../../testarch/knowledge/) - Testing best practices
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->

src/modules/bmm/workflows/testarch/trace/checklist.md (new file, 654 lines)
@@ -0,0 +1,654 @@
|
||||
# Requirements Traceability & Gate Decision - Validation Checklist
|
||||
|
||||
**Workflow:** `testarch-trace`
|
||||
**Purpose:** Ensure complete traceability matrix with actionable gap analysis AND make deployment readiness decision (PASS/CONCERNS/FAIL/WAIVED)
|
||||
|
||||
This checklist covers **two sequential phases**:
|
||||
|
||||
- **PHASE 1**: Requirements Traceability (always executed)
|
||||
- **PHASE 2**: Quality Gate Decision (executed if `enable_gate_decision: true`)
|
||||
|
||||
---
|
||||
|
||||
# PHASE 1: REQUIREMENTS TRACEABILITY
|
||||
|
||||
## Prerequisites Validation
|
||||
|
||||
- [ ] Acceptance criteria are available (from story file OR inline)
|
||||
- [ ] Test suite exists (or gaps are acknowledged and documented)
|
||||
- [ ] Test directory path is correct (`test_dir` variable)
|
||||
- [ ] Story file is accessible (if using BMad mode)
|
||||
- [ ] Knowledge base is loaded (test-priorities, traceability, risk-governance)
|
||||
|
||||
---
|
||||
|
||||
## Context Loading
|
||||
|
||||
- [ ] Story file read successfully (if applicable)
|
||||
- [ ] Acceptance criteria extracted correctly
|
||||
- [ ] Story ID identified (e.g., 1.3)
|
||||
- [ ] `test-design.md` loaded (if available)
|
||||
- [ ] `tech-spec.md` loaded (if available)
|
||||
- [ ] `PRD.md` loaded (if available)
|
||||
- [ ] Relevant knowledge fragments loaded from `tea-index.csv`
|
||||
|
||||
---
|
||||
|
||||
## Test Discovery and Cataloging
|
||||
|
||||
- [ ] Tests auto-discovered using multiple strategies (test IDs, describe blocks, file paths); see the sketch after this list
|
||||
- [ ] Tests categorized by level (E2E, API, Component, Unit)
|
||||
- [ ] Test metadata extracted:
|
||||
- [ ] Test IDs (e.g., 1.3-E2E-001)
|
||||
- [ ] Describe/context blocks
|
||||
- [ ] It blocks (individual test cases)
|
||||
- [ ] Given-When-Then structure (if BDD)
|
||||
- [ ] Priority markers (P0/P1/P2/P3)
|
||||
- [ ] All relevant test files found (no tests missed due to naming conventions)
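
One possible shape of the auto-discovery pass, assuming Node 20+ and the `1.3-E2E-001` style of test IDs used throughout this guide; the file-name filter is an assumption:

```typescript
// Hypothetical discovery pass: walk the test directory and collect test IDs such as "1.3-E2E-001"
import { readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

const TEST_ID = /\b\d+\.\d+-(E2E|API|COMPONENT|UNIT)-\d{3}\b/g;

function discoverTestIds(testDir: string): Map<string, string[]> {
  const byFile = new Map<string, string[]>();
  // Node 20+: recursive readdir returns relative paths for the whole tree
  for (const entry of readdirSync(testDir, { recursive: true }) as string[]) {
    if (!/\.(spec|test)\.(ts|tsx|js)$/.test(entry)) continue;
    const file = join(testDir, entry);
    const ids = readFileSync(file, 'utf8').match(TEST_ID) ?? [];
    if (ids.length) byFile.set(file, [...new Set(ids)]);
  }
  return byFile;
}
```
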
---
|
||||
|
||||
## Criteria-to-Test Mapping
|
||||
|
||||
- [ ] Each acceptance criterion mapped to tests (or marked as NONE)
|
||||
- [ ] Explicit references found (test IDs, describe blocks mentioning criterion)
|
||||
- [ ] Test level documented (E2E, API, Component, Unit)
|
||||
- [ ] Given-When-Then narrative verified for alignment
|
||||
- [ ] Traceability matrix table generated:
|
||||
- [ ] Criterion ID
|
||||
- [ ] Description
|
||||
- [ ] Test ID
|
||||
- [ ] Test File
|
||||
- [ ] Test Level
|
||||
- [ ] Coverage Status
|
||||
|
||||
---
|
||||
|
||||
## Coverage Classification
|
||||
|
||||
- [ ] Coverage status classified for each criterion:
|
||||
- [ ] **FULL** - All scenarios validated at appropriate level(s)
|
||||
- [ ] **PARTIAL** - Some coverage but missing edge cases or levels
|
||||
- [ ] **NONE** - No test coverage at any level
|
||||
- [ ] **UNIT-ONLY** - Only unit tests (missing integration/E2E validation)
|
||||
- [ ] **INTEGRATION-ONLY** - Only API/Component tests (missing unit confidence)
|
||||
- [ ] Classification justifications provided
|
||||
- [ ] Edge cases considered in FULL vs PARTIAL determination
|
||||
|
||||
---
|
||||
|
||||
## Duplicate Coverage Detection
|
||||
|
||||
- [ ] Duplicate coverage checked across test levels
|
||||
- [ ] Acceptable overlap identified (defense in depth for critical paths)
|
||||
- [ ] Unacceptable duplication flagged (same validation at multiple levels)
|
||||
- [ ] Recommendations provided for consolidation
|
||||
- [ ] Selective testing principles applied
|
||||
|
||||
---
|
||||
|
||||
## Gap Analysis
|
||||
|
||||
- [ ] Coverage gaps identified:
|
||||
- [ ] Criteria with NONE status
|
||||
- [ ] Criteria with PARTIAL status
|
||||
- [ ] Criteria with UNIT-ONLY status
|
||||
- [ ] Criteria with INTEGRATION-ONLY status
|
||||
- [ ] Gaps prioritized by risk level using test-priorities framework:
|
||||
- [ ] **CRITICAL** - P0 criteria without FULL coverage (BLOCKER)
|
||||
- [ ] **HIGH** - P1 criteria without FULL coverage (PR blocker)
|
||||
- [ ] **MEDIUM** - P2 criteria without FULL coverage (nightly gap)
|
||||
- [ ] **LOW** - P3 criteria without FULL coverage (acceptable)
|
||||
- [ ] Specific test recommendations provided for each gap:
|
||||
- [ ] Suggested test level (E2E, API, Component, Unit)
|
||||
- [ ] Test description (Given-When-Then)
|
||||
- [ ] Recommended test ID (e.g., 1.3-E2E-004)
|
||||
- [ ] Explanation of why test is needed
|
||||
|
||||
---
|
||||
|
||||
## Coverage Metrics
|
||||
|
||||
- [ ] Overall coverage percentage calculated (FULL coverage / total criteria; see the sketch after this list)
|
||||
- [ ] P0 coverage percentage calculated
|
||||
- [ ] P1 coverage percentage calculated
|
||||
- [ ] P2 coverage percentage calculated (if applicable)
|
||||
- [ ] Coverage by level calculated:
|
||||
- [ ] E2E coverage %
|
||||
- [ ] API coverage %
|
||||
- [ ] Component coverage %
|
||||
- [ ] Unit coverage %
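
The percentages above are simple ratios; here is a sketch of the arithmetic, with an illustrative data shape that is not mandated by the workflow:

```typescript
// Hypothetical shape: one entry per acceptance criterion after classification
type Coverage = 'FULL' | 'PARTIAL' | 'NONE' | 'UNIT-ONLY' | 'INTEGRATION-ONLY';
interface CriterionRow { id: string; priority: 'P0' | 'P1' | 'P2' | 'P3'; coverage: Coverage }

// Coverage % = criteria with FULL coverage / total criteria, optionally scoped to a priority bucket
function coveragePct(rows: CriterionRow[], priority?: CriterionRow['priority']): number {
  const scoped = priority ? rows.filter((r) => r.priority === priority) : rows;
  if (scoped.length === 0) return 100; // nothing to cover counts as fully covered
  const full = scoped.filter((r) => r.coverage === 'FULL').length;
  return Math.round((full / scoped.length) * 100);
}

// Example: overall and P0 coverage for a tiny matrix
const matrix: CriterionRow[] = [
  { id: 'AC-1', priority: 'P0', coverage: 'FULL' },
  { id: 'AC-2', priority: 'P0', coverage: 'FULL' },
  { id: 'AC-3', priority: 'P1', coverage: 'PARTIAL' },
];
console.log(coveragePct(matrix), coveragePct(matrix, 'P0')); // 67 100
```
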
---
|
||||
|
||||
## Test Quality Verification
|
||||
|
||||
For each mapped test, verify (see the sketch after these lists):
|
||||
|
||||
- [ ] Explicit assertions are present (not hidden in helpers)
|
||||
- [ ] Test follows Given-When-Then structure
|
||||
- [ ] No hard waits or sleeps (deterministic waiting only)
|
||||
- [ ] Self-cleaning (test cleans up its data)
|
||||
- [ ] File size < 300 lines
|
||||
- [ ] Test duration < 90 seconds
|
||||
|
||||
Quality issues flagged:
|
||||
|
||||
- [ ] **BLOCKER** issues identified (missing assertions, hard waits, flaky patterns)
|
||||
- [ ] **WARNING** issues identified (large files, slow tests, unclear structure)
|
||||
- [ ] **INFO** issues identified (style inconsistencies, missing documentation)
|
||||
|
||||
Knowledge fragments referenced:
|
||||
|
||||
- [ ] `test-quality.md` for Definition of Done
|
||||
- [ ] `fixture-architecture.md` for self-cleaning patterns
|
||||
- [ ] `network-first.md` for Playwright best practices
|
||||
- [ ] `data-factories.md` for test data patterns
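
A compact sketch of what a test meeting these criteria can look like, using Playwright's API request fixture; the endpoints and test ID are hypothetical, and a configured `baseURL` is assumed:

```typescript
// Hypothetical API test illustrating the checklist items above: explicit assertion,
// deterministic waiting (no sleeps), and self-cleaning test data.
import { test, expect } from '@playwright/test';

test('1.3-API-001: password reset email is queued for a known user', async ({ request }) => {
  // Given: a throwaway user created by this test (self-cleaning, no shared state)
  const created = await request.post('/api/test-users', {
    data: { email: `reset-${Date.now()}@example.com` },
  });
  const { id, email } = await created.json();

  try {
    // When: a password reset is requested
    const res = await request.post('/api/password-reset', { data: { email } });

    // Then: explicit assertion in the test body, not hidden in a helper
    expect(res.status()).toBe(202);
  } finally {
    // Cleanup: the test removes its own data instead of relying on global teardown
    await request.delete(`/api/test-users/${id}`);
  }
});
```
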
---
|
||||
|
||||
## Phase 1 Deliverables Generated
|
||||
|
||||
### Traceability Matrix Markdown
|
||||
|
||||
- [ ] File created at `{output_folder}/traceability-matrix.md`
|
||||
- [ ] Template from `trace-template.md` used
|
||||
- [ ] Full mapping table included
|
||||
- [ ] Coverage status section included
|
||||
- [ ] Gap analysis section included
|
||||
- [ ] Quality assessment section included
|
||||
- [ ] Recommendations section included
|
||||
|
||||
### Coverage Badge/Metric (if enabled)
|
||||
|
||||
- [ ] Badge markdown generated
|
||||
- [ ] Metrics exported to JSON for CI/CD integration
|
||||
|
||||
### Updated Story File (if enabled)
|
||||
|
||||
- [ ] "Traceability" section added to story markdown
|
||||
- [ ] Link to traceability matrix included
|
||||
- [ ] Coverage summary included
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 Quality Assurance
|
||||
|
||||
### Accuracy Checks
|
||||
|
||||
- [ ] All acceptance criteria accounted for (none skipped)
|
||||
- [ ] Test IDs correctly formatted (e.g., 1.3-E2E-001)
|
||||
- [ ] File paths are correct and accessible
|
||||
- [ ] Coverage percentages calculated correctly
|
||||
- [ ] No false positives (tests incorrectly mapped to criteria)
|
||||
- [ ] No false negatives (existing tests missed in mapping)
|
||||
|
||||
### Completeness Checks
|
||||
|
||||
- [ ] All test levels considered (E2E, API, Component, Unit)
|
||||
- [ ] All priorities considered (P0, P1, P2, P3)
|
||||
- [ ] All coverage statuses used appropriately (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
|
||||
- [ ] All gaps have recommendations
|
||||
- [ ] All quality issues have severity and remediation guidance
|
||||
|
||||
### Actionability Checks
|
||||
|
||||
- [ ] Recommendations are specific (not generic)
|
||||
- [ ] Test IDs suggested for new tests
|
||||
- [ ] Given-When-Then provided for recommended tests
|
||||
- [ ] Impact explained for each gap
|
||||
- [ ] Priorities clear (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 Documentation
|
||||
|
||||
- [ ] Traceability matrix is readable and well-formatted
|
||||
- [ ] Tables render correctly in markdown
|
||||
- [ ] Code blocks have proper syntax highlighting
|
||||
- [ ] Links are valid and accessible
|
||||
- [ ] Recommendations are clear and prioritized
|
||||
|
||||
---
|
||||
|
||||
# PHASE 2: QUALITY GATE DECISION
|
||||
|
||||
**Note**: Phase 2 executes only if `enable_gate_decision: true` in workflow.yaml
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Evidence Gathering
|
||||
|
||||
- [ ] Test execution results obtained (CI/CD pipeline, test framework reports)
|
||||
- [ ] Story/epic/release file identified and read
|
||||
- [ ] Test design document discovered or explicitly provided (if available)
|
||||
- [ ] Traceability matrix discovered or explicitly provided (available from Phase 1)
|
||||
- [ ] NFR assessment discovered or explicitly provided (if available)
|
||||
- [ ] Code coverage report discovered or explicitly provided (if available)
|
||||
- [ ] Burn-in results discovered or explicitly provided (if available)
|
||||
|
||||
### Evidence Validation
|
||||
|
||||
- [ ] Evidence freshness validated (warn if >7 days old, recommend re-running workflows)
|
||||
- [ ] All required assessments available or user acknowledged gaps
|
||||
- [ ] Test results are complete (not partial or interrupted runs)
|
||||
- [ ] Test results match current codebase (not from outdated branch)
|
||||
|
||||
### Knowledge Base Loading
|
||||
|
||||
- [ ] `risk-governance.md` loaded successfully
|
||||
- [ ] `probability-impact.md` loaded successfully
|
||||
- [ ] `test-quality.md` loaded successfully
|
||||
- [ ] `test-priorities.md` loaded successfully
|
||||
- [ ] `ci-burn-in.md` loaded (if burn-in results available)
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] Gate type identified (story/epic/release/hotfix)
|
||||
- [ ] Target ID extracted (story_id, epic_num, or release_version)
|
||||
- [ ] Decision thresholds loaded from workflow variables
|
||||
- [ ] Risk tolerance configuration loaded
|
||||
- [ ] Waiver policy loaded
|
||||
|
||||
### Step 2: Evidence Parsing
|
||||
|
||||
**Test Results:**
|
||||
|
||||
- [ ] Total test count extracted
|
||||
- [ ] Passed test count extracted
|
||||
- [ ] Failed test count extracted
|
||||
- [ ] Skipped test count extracted
|
||||
- [ ] Test duration extracted
|
||||
- [ ] P0 test pass rate calculated
|
||||
- [ ] P1 test pass rate calculated
|
||||
- [ ] Overall test pass rate calculated (see the sketch after this step's lists)
|
||||
|
||||
**Quality Assessments:**
|
||||
|
||||
- [ ] P0/P1/P2/P3 scenarios extracted from test-design.md (if available)
|
||||
- [ ] Risk scores extracted from test-design.md (if available)
|
||||
- [ ] Coverage percentages extracted from traceability-matrix.md (available from Phase 1)
|
||||
- [ ] Coverage gaps extracted from traceability-matrix.md (available from Phase 1)
|
||||
- [ ] NFR status extracted from nfr-assessment.md (if available)
|
||||
- [ ] Security issues count extracted from nfr-assessment.md (if available)
|
||||
|
||||
**Code Coverage:**
|
||||
|
||||
- [ ] Line coverage percentage extracted (if available)
|
||||
- [ ] Branch coverage percentage extracted (if available)
|
||||
- [ ] Function coverage percentage extracted (if available)
|
||||
- [ ] Critical path coverage validated (if available)
|
||||
|
||||
**Burn-in Results:**
|
||||
|
||||
- [ ] Burn-in iterations count extracted (if available)
|
||||
- [ ] Flaky tests count extracted (if available)
|
||||
- [ ] Stability score calculated (if available)
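
A rough sketch of how the pass-rate figures above could be derived from parsed CI results; the result shape and the `[P0]` title convention are assumptions rather than something this workflow mandates:

```typescript
// Hypothetical shape of parsed CI results: one record per executed test
interface TestResult { title: string; status: 'passed' | 'failed' | 'skipped' }

// Priority inferred from a marker in the title, e.g. "[P0] 1.3-E2E-001 ..."
function passRate(results: TestResult[], priority?: 'P0' | 'P1' | 'P2' | 'P3'): number {
  const scoped = priority ? results.filter((r) => r.title.includes(`[${priority}]`)) : results;
  const executed = scoped.filter((r) => r.status !== 'skipped');
  if (executed.length === 0) return 100;
  const passed = executed.filter((r) => r.status === 'passed').length;
  return Math.round((passed / executed.length) * 100);
}
```
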
### Step 3: Decision Rules Application
|
||||
|
||||
**P0 Criteria Evaluation:**
|
||||
|
||||
- [ ] P0 test pass rate evaluated (must be 100%)
|
||||
- [ ] P0 acceptance criteria coverage evaluated (must be 100%)
|
||||
- [ ] Security issues count evaluated (must be 0)
|
||||
- [ ] Critical NFR failures evaluated (must be 0)
|
||||
- [ ] Flaky tests evaluated (must be 0 if burn-in enabled)
|
||||
- [ ] P0 decision recorded: PASS or FAIL
|
||||
|
||||
**P1 Criteria Evaluation:**
|
||||
|
||||
- [ ] P1 test pass rate evaluated (threshold: min_p1_pass_rate)
|
||||
- [ ] P1 acceptance criteria coverage evaluated (threshold: 95%)
|
||||
- [ ] Overall test pass rate evaluated (threshold: min_overall_pass_rate)
|
||||
- [ ] Code coverage evaluated (threshold: min_coverage)
|
||||
- [ ] P1 decision recorded: PASS or CONCERNS
|
||||
|
||||
**P2/P3 Criteria Evaluation:**
|
||||
|
||||
- [ ] P2 failures tracked (informational, don't block if allow_p2_failures: true)
|
||||
- [ ] P3 failures tracked (informational, don't block if allow_p3_failures: true)
|
||||
- [ ] Residual risks documented
|
||||
|
||||
**Final Decision:**
|
||||
|
||||
- [ ] Decision determined: PASS / CONCERNS / FAIL / WAIVED
|
||||
- [ ] Decision rationale documented
|
||||
- [ ] Decision is deterministic (follows rules, not arbitrary; see the sketch below)
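
Because the rules are deterministic, they can be expressed as a small function. This is only a sketch of the logic described in this step; the thresholds and field names are illustrative:

```typescript
type Decision = 'PASS' | 'CONCERNS' | 'FAIL';

interface GateEvidence {
  p0PassRate: number;      // %
  p0Coverage: number;      // %
  securityIssues: number;
  criticalNfrFailures: number;
  flakyTests: number;
  p1PassRate: number;      // %
  p1Coverage: number;      // %
  overallPassRate: number; // %
}

// Any P0 criterion failing => FAIL; P1 shortfalls => CONCERNS; otherwise PASS.
// A FAIL may later become WAIVED through an explicitly approved business waiver (not modeled here).
function decideGate(
  e: GateEvidence,
  t = { p1PassRate: 95, p1Coverage: 90, overallPassRate: 90 }, // illustrative defaults
): Decision {
  const p0Failed =
    e.p0PassRate < 100 ||
    e.p0Coverage < 100 ||
    e.securityIssues > 0 ||
    e.criticalNfrFailures > 0 ||
    e.flakyTests > 0;
  if (p0Failed) return 'FAIL';

  const p1Shortfall =
    e.p1PassRate < t.p1PassRate ||
    e.p1Coverage < t.p1Coverage ||
    e.overallPassRate < t.overallPassRate;
  return p1Shortfall ? 'CONCERNS' : 'PASS';
}
```
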
### Step 4: Documentation
|
||||
|
||||
**Gate Decision Document Created:**
|
||||
|
||||
- [ ] Story/epic/release info section complete (ID, title, description, links)
|
||||
- [ ] Decision clearly stated (PASS / CONCERNS / FAIL / WAIVED)
|
||||
- [ ] Decision date recorded
|
||||
- [ ] Evaluator recorded (user or agent name)
|
||||
|
||||
**Evidence Summary Documented:**
|
||||
|
||||
- [ ] Test results summary complete (total, passed, failed, pass rates)
|
||||
- [ ] Coverage summary complete (P0/P1 criteria, code coverage)
|
||||
- [ ] NFR validation summary complete (security, performance, reliability, maintainability)
|
||||
- [ ] Flakiness summary complete (burn-in iterations, flaky test count)
|
||||
|
||||
**Rationale Documented:**
|
||||
|
||||
- [ ] Decision rationale clearly explained
|
||||
- [ ] Key evidence highlighted
|
||||
- [ ] Assumptions and caveats noted (if any)
|
||||
|
||||
**Residual Risks Documented (if CONCERNS or WAIVED):**
|
||||
|
||||
- [ ] Unresolved P1/P2 issues listed
|
||||
- [ ] Probability × impact estimated for each risk
|
||||
- [ ] Mitigations or workarounds described
|
||||
|
||||
**Waivers Documented (if WAIVED):**
|
||||
|
||||
- [ ] Waiver reason documented (business justification)
|
||||
- [ ] Waiver approver documented (name, role)
|
||||
- [ ] Waiver expiry date documented
|
||||
- [ ] Remediation plan documented (fix in next release, due date)
|
||||
- [ ] Monitoring plan documented
|
||||
|
||||
**Critical Issues Documented (if FAIL or CONCERNS):**
|
||||
|
||||
- [ ] Top 5-10 critical issues listed
|
||||
- [ ] Priority assigned to each issue (P0/P1/P2)
|
||||
- [ ] Owner assigned to each issue
|
||||
- [ ] Due date assigned to each issue
|
||||
|
||||
**Recommendations Documented:**
|
||||
|
||||
- [ ] Next steps clearly stated for decision type
|
||||
- [ ] Deployment recommendation provided
|
||||
- [ ] Monitoring recommendations provided (if applicable)
|
||||
- [ ] Remediation recommendations provided (if applicable)
|
||||
|
||||
### Step 5: Status Updates and Notifications
|
||||
|
||||
**Status File Updated:**
|
||||
|
||||
- [ ] Gate decision appended to bmm-workflow-status.md (if append_to_history: true)
|
||||
- [ ] Format correct: `[DATE] Gate Decision: DECISION - Target {ID} - {rationale}`
|
||||
- [ ] Status file committed or staged for commit
|
||||
|
||||
**Gate YAML Created:**
|
||||
|
||||
- [ ] Gate YAML snippet generated with decision and criteria
|
||||
- [ ] Evidence references included in YAML
|
||||
- [ ] Next steps included in YAML
|
||||
- [ ] YAML file saved to output folder
|
||||
|
||||
**Stakeholder Notification Generated:**
|
||||
|
||||
- [ ] Notification subject line created
|
||||
- [ ] Notification body created with summary
|
||||
- [ ] Recipients identified (PM, SM, DEV lead, stakeholders)
|
||||
- [ ] Notification ready for delivery (if notify_stakeholders: true)
|
||||
|
||||
**Outputs Saved:**
|
||||
|
||||
- [ ] Gate decision document saved to `{output_file}`
|
||||
- [ ] Gate YAML saved to `{output_folder}/gate-decision-{target}.yaml`
|
||||
- [ ] All outputs are valid and readable
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 Output Validation
|
||||
|
||||
### Gate Decision Document
|
||||
|
||||
**Completeness:**
|
||||
|
||||
- [ ] All required sections present (info, decision, evidence, rationale, next steps)
|
||||
- [ ] No placeholder text or TODOs left in document
|
||||
- [ ] All evidence references are accurate and complete
|
||||
- [ ] All links to artifacts are valid
|
||||
|
||||
**Accuracy:**
|
||||
|
||||
- [ ] Decision matches applied criteria rules
|
||||
- [ ] Test results match CI/CD pipeline output
|
||||
- [ ] Coverage percentages match reports
|
||||
- [ ] NFR status matches assessment document
|
||||
- [ ] No contradictions or inconsistencies
|
||||
|
||||
**Clarity:**
|
||||
|
||||
- [ ] Decision rationale is clear and unambiguous
|
||||
- [ ] Technical jargon is explained or avoided
|
||||
- [ ] Stakeholders can understand next steps
|
||||
- [ ] Recommendations are actionable
|
||||
|
||||
### Gate YAML
|
||||
|
||||
**Format:**
|
||||
|
||||
- [ ] YAML is valid (no syntax errors)
|
||||
- [ ] All required fields present (target, decision, date, evaluator, criteria, evidence)
|
||||
- [ ] Field values are correct data types (numbers, strings, dates)
|
||||
|
||||
**Content:**
|
||||
|
||||
- [ ] Criteria values match decision document
|
||||
- [ ] Evidence references are accurate
|
||||
- [ ] Next steps align with decision type
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 Quality Checks
|
||||
|
||||
### Decision Integrity
|
||||
|
||||
- [ ] Decision is deterministic (follows rules, not arbitrary)
|
||||
- [ ] P0 failures result in FAIL decision (unless waived)
|
||||
- [ ] Security issues result in FAIL decision (unless waived - but should never be waived)
|
||||
- [ ] Waivers have business justification and approver (if WAIVED)
|
||||
- [ ] Residual risks are documented (if CONCERNS or WAIVED)
|
||||
|
||||
### Evidence-Based
|
||||
|
||||
- [ ] Decision is based on actual test results (not guesses)
|
||||
- [ ] All claims are supported by evidence
|
||||
- [ ] No assumptions without documentation
|
||||
- [ ] Evidence sources are cited (CI run IDs, report URLs)
|
||||
|
||||
### Transparency
|
||||
|
||||
- [ ] Decision rationale is transparent and auditable
|
||||
- [ ] Criteria evaluation is documented step-by-step
|
||||
- [ ] Any deviations from standard process are explained
|
||||
- [ ] Waiver justifications are clear (if applicable)
|
||||
|
||||
### Consistency
|
||||
|
||||
- [ ] Decision aligns with risk-governance knowledge fragment
|
||||
- [ ] Priority framework (P0/P1/P2/P3) applied consistently
|
||||
- [ ] Terminology consistent with test-quality knowledge fragment
|
||||
- [ ] Decision matrix followed correctly
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 Integration Points
|
||||
|
||||
### BMad Workflow Status
|
||||
|
||||
- [ ] Gate decision added to `bmm-workflow-status.md`
|
||||
- [ ] Format matches existing gate history entries
|
||||
- [ ] Timestamp is accurate
|
||||
- [ ] Decision summary is concise (<80 chars)
|
||||
|
||||
### CI/CD Pipeline
|
||||
|
||||
- [ ] Gate YAML is CI/CD-compatible
|
||||
- [ ] YAML can be parsed by pipeline automation
|
||||
- [ ] Decision can be used to block/allow deployments
|
||||
- [ ] Evidence references are accessible to pipeline
|
||||
|
||||
### Stakeholders
|
||||
|
||||
- [ ] Notification message is clear and actionable
|
||||
- [ ] Decision is explained in non-technical terms
|
||||
- [ ] Next steps are specific and time-bound
|
||||
- [ ] Recipients are appropriate for decision type
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 Compliance and Audit
|
||||
|
||||
### Audit Trail
|
||||
|
||||
- [ ] Decision date and time recorded
|
||||
- [ ] Evaluator identified (user or agent)
|
||||
- [ ] All evidence sources cited
|
||||
- [ ] Decision criteria documented
|
||||
- [ ] Rationale clearly explained
|
||||
|
||||
### Traceability
|
||||
|
||||
- [ ] Gate decision traceable to story/epic/release
|
||||
- [ ] Evidence traceable to specific test runs
|
||||
- [ ] Assessments traceable to workflows that created them
|
||||
- [ ] Waiver traceable to approver (if applicable)
|
||||
|
||||
### Compliance
|
||||
|
||||
- [ ] Security requirements validated (no unresolved vulnerabilities)
|
||||
- [ ] Quality standards met or waived with justification
|
||||
- [ ] Regulatory requirements addressed (if applicable)
|
||||
- [ ] Documentation sufficient for external audit
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 Edge Cases and Exceptions
|
||||
|
||||
### Missing Evidence
|
||||
|
||||
- [ ] If test-design.md missing, decision still possible with test results + trace
|
||||
- [ ] If traceability-matrix.md missing, decision still possible with test results (but Phase 1 should provide it)
|
||||
- [ ] If nfr-assessment.md missing, NFR validation marked as NOT ASSESSED
|
||||
- [ ] If code coverage missing, coverage criterion marked as NOT ASSESSED
|
||||
- [ ] User acknowledged gaps in evidence or provided alternative proof
|
||||
|
||||
### Stale Evidence
|
||||
|
||||
- [ ] Evidence freshness checked (if validate_evidence_freshness: true)
|
||||
- [ ] Warnings issued for assessments >7 days old
|
||||
- [ ] User acknowledged stale evidence or re-ran workflows
|
||||
- [ ] Decision document notes any stale evidence used
|
||||
|
||||
### Conflicting Evidence
|
||||
|
||||
- [ ] Conflicts between test results and assessments resolved
|
||||
- [ ] Most recent/authoritative source identified
|
||||
- [ ] Conflict resolution documented in decision rationale
|
||||
- [ ] User consulted if conflict cannot be resolved
|
||||
|
||||
### Waiver Scenarios
|
||||
|
||||
- [ ] Waiver only used for FAIL decision (not PASS or CONCERNS)
|
||||
- [ ] Waiver has business justification (not technical convenience)
|
||||
- [ ] Waiver has named approver with authority (VP/CTO/PO)
|
||||
- [ ] Waiver has expiry date (does NOT apply to future releases)
|
||||
- [ ] Waiver has remediation plan with concrete due date
|
||||
- [ ] Security vulnerabilities are NOT waived (enforced)
|
||||
|
||||
---
|
||||
|
||||
# FINAL VALIDATION (Both Phases)
|
||||
|
||||
## Non-Prescriptive Validation
|
||||
|
||||
- [ ] Traceability format adapted to team needs (not rigid template)
|
||||
- [ ] Examples are minimal and focused on patterns
|
||||
- [ ] Teams can extend with custom classifications
|
||||
- [ ] Integration with external systems supported (JIRA, Azure DevOps)
|
||||
- [ ] Compliance requirements considered (if applicable)
|
||||
|
||||
---
|
||||
|
||||
## Documentation and Communication
|
||||
|
||||
- [ ] All documents are readable and well-formatted
|
||||
- [ ] Tables render correctly in markdown
|
||||
- [ ] Code blocks have proper syntax highlighting
|
||||
- [ ] Links are valid and accessible
|
||||
- [ ] Recommendations are clear and prioritized
|
||||
- [ ] Gate decision is prominent and unambiguous (Phase 2)
|
||||
|
||||
---
|
||||
|
||||
## Final Validation
|
||||
|
||||
**Phase 1 (Traceability):**
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All acceptance criteria mapped or gaps documented
|
||||
- [ ] P0 coverage is 100% OR documented as BLOCKER
|
||||
- [ ] Gap analysis is complete and prioritized
|
||||
- [ ] Test quality issues identified and flagged
|
||||
- [ ] Deliverables generated and saved
|
||||
|
||||
**Phase 2 (Gate Decision):**
|
||||
|
||||
- [ ] All quality evidence gathered
|
||||
- [ ] Decision criteria applied correctly
|
||||
- [ ] Decision rationale documented
|
||||
- [ ] Gate YAML ready for CI/CD integration
|
||||
- [ ] Status file updated (if enabled)
|
||||
- [ ] Stakeholders notified (if enabled)
|
||||
|
||||
**Workflow Complete:**
|
||||
|
||||
- [ ] Phase 1 completed successfully
|
||||
- [ ] Phase 2 completed successfully (if enabled)
|
||||
- [ ] All outputs validated and saved
|
||||
- [ ] Ready to proceed based on gate decision
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**Phase 1 - Traceability Status:**
|
||||
|
||||
- [ ] ✅ PASS - All quality gates met, no critical gaps
|
||||
- [ ] ⚠️ WARN - P1 gaps exist, address before PR merge
|
||||
- [ ] ❌ FAIL - P0 gaps exist, BLOCKER for release
|
||||
|
||||
**Phase 2 - Gate Decision Status (if enabled):**
|
||||
|
||||
- [ ] ✅ PASS - Deploy to production
|
||||
- [ ] ⚠️ CONCERNS - Deploy with monitoring
|
||||
- [ ] ❌ FAIL - Block deployment, fix issues
|
||||
- [ ] 🔓 WAIVED - Deploy with business approval and remediation plan
|
||||
|
||||
**Next Actions:**
|
||||
|
||||
- If PASS (both phases): Proceed to deployment
|
||||
- If WARN/CONCERNS: Address gaps/issues, proceed with monitoring
|
||||
- If FAIL (either phase): Run `*atdd` for missing tests, fix issues, re-run `*trace`
|
||||
- If WAIVED: Deploy with approved waiver, schedule remediation
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
Record any issues, deviations, or important observations during workflow execution:
|
||||
|
||||
- **Phase 1 Issues**: [Note any traceability mapping challenges, missing tests, quality concerns]
|
||||
- **Phase 2 Issues**: [Note any missing, stale, or conflicting evidence]
|
||||
- **Decision Rationale**: [Document any nuanced reasoning or edge cases]
|
||||
- **Waiver Details**: [Document waiver negotiations or approvals]
|
||||
- **Follow-up Actions**: [List any actions required after gate decision]
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->

File diff suppressed because it is too large

src/modules/bmm/workflows/testarch/trace/trace-template.md (new file, 673 lines)
@@ -0,0 +1,673 @@
|
||||
# Traceability Matrix & Gate Decision - Story {STORY_ID}
|
||||
|
||||
**Story:** {STORY_TITLE}
|
||||
**Date:** {DATE}
|
||||
**Evaluator:** {user_name or TEA Agent}
|
||||
|
||||
---
|
||||
|
||||
## PHASE 1: REQUIREMENTS TRACEABILITY
|
||||
|
||||
### Coverage Summary
|
||||
|
||||
| Priority  | Total Criteria | FULL Coverage | Coverage % | Status       |
| --------- | -------------- | ------------- | ---------- | ------------ |
| P0        | {P0_TOTAL}     | {P0_FULL}     | {P0_PCT}%  | {P0_STATUS}  |
| P1        | {P1_TOTAL}     | {P1_FULL}     | {P1_PCT}%  | {P1_STATUS}  |
| P2        | {P2_TOTAL}     | {P2_FULL}     | {P2_PCT}%  | {P2_STATUS}  |
| P3        | {P3_TOTAL}     | {P3_FULL}     | {P3_PCT}%  | {P3_STATUS}  |
| **Total** | **{TOTAL}**    | **{FULL}**    | **{PCT}%** | **{STATUS}** |
|
||||
|
||||
**Legend:**
|
||||
|
||||
- ✅ PASS - Coverage meets quality gate threshold
|
||||
- ⚠️ WARN - Coverage below threshold but not critical
|
||||
- ❌ FAIL - Coverage below minimum threshold (blocker)
|
||||
|
||||
---
|
||||
|
||||
### Detailed Mapping
|
||||
|
||||
#### {CRITERION_ID}: {CRITERION_DESCRIPTION} ({PRIORITY})
|
||||
|
||||
- **Coverage:** {COVERAGE_STATUS} {STATUS_ICON}
|
||||
- **Tests:**
|
||||
- `{TEST_ID}` - {TEST_FILE}:{LINE}
|
||||
- **Given:** {GIVEN}
|
||||
- **When:** {WHEN}
|
||||
- **Then:** {THEN}
|
||||
- `{TEST_ID_2}` - {TEST_FILE_2}:{LINE}
|
||||
- **Given:** {GIVEN_2}
|
||||
- **When:** {WHEN_2}
|
||||
- **Then:** {THEN_2}
|
||||
|
||||
- **Gaps:** (if PARTIAL or UNIT-ONLY or INTEGRATION-ONLY)
|
||||
- Missing: {MISSING_SCENARIO_1}
|
||||
- Missing: {MISSING_SCENARIO_2}
|
||||
|
||||
- **Recommendation:** {RECOMMENDATION_TEXT}
|
||||
|
||||
---
|
||||
|
||||
#### Example: AC-1: User can login with email and password (P0)
|
||||
|
||||
- **Coverage:** FULL ✅
|
||||
- **Tests:**
|
||||
- `1.3-E2E-001` - tests/e2e/auth.spec.ts:12
|
||||
- **Given:** User has valid credentials
|
||||
- **When:** User submits login form
|
||||
- **Then:** User is redirected to dashboard
|
||||
- `1.3-UNIT-001` - tests/unit/auth-service.spec.ts:8
|
||||
- **Given:** Valid email and password hash
|
||||
- **When:** validateCredentials is called
|
||||
- **Then:** Returns user object
|
||||
|
||||
---
|
||||
|
||||
#### Example: AC-3: User can reset password via email (P1)
|
||||
|
||||
- **Coverage:** PARTIAL ⚠️
|
||||
- **Tests:**
|
||||
- `1.3-E2E-003` - tests/e2e/auth.spec.ts:44
|
||||
- **Given:** User requests password reset
|
||||
- **When:** User clicks reset link in email
|
||||
- **Then:** User can set new password
|
||||
|
||||
- **Gaps:**
|
||||
- Missing: Email delivery validation
|
||||
- Missing: Expired token handling (error path)
|
||||
- Missing: Invalid token handling (security test)
|
||||
- Missing: Unit test for token generation logic
|
||||
|
||||
- **Recommendation:** Add `1.3-API-001` for email service integration testing and `1.3-UNIT-003` for token generation logic. Add `1.3-E2E-004` for error path validation (expired/invalid tokens).
|
||||
|
||||
---
|
||||
|
||||
### Gap Analysis
|
||||
|
||||
#### Critical Gaps (BLOCKER) ❌
|
||||
|
||||
{CRITICAL_GAP_COUNT} gaps found. **Do not release until resolved.**
|
||||
|
||||
1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P0)
|
||||
- Current Coverage: {COVERAGE_STATUS}
|
||||
- Missing Tests: {MISSING_TEST_DESCRIPTION}
|
||||
- Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
|
||||
- Impact: {IMPACT_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
#### High Priority Gaps (PR BLOCKER) ⚠️
|
||||
|
||||
{HIGH_GAP_COUNT} gaps found. **Address before PR merge.**
|
||||
|
||||
1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P1)
|
||||
- Current Coverage: {COVERAGE_STATUS}
|
||||
- Missing Tests: {MISSING_TEST_DESCRIPTION}
|
||||
- Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
|
||||
- Impact: {IMPACT_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
#### Medium Priority Gaps (Nightly) ⚠️
|
||||
|
||||
{MEDIUM_GAP_COUNT} gaps found. **Address in nightly test improvements.**
|
||||
|
||||
1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P2)
|
||||
- Current Coverage: {COVERAGE_STATUS}
|
||||
- Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
|
||||
|
||||
---
|
||||
|
||||
#### Low Priority Gaps (Optional) ℹ️
|
||||
|
||||
{LOW_GAP_COUNT} gaps found. **Optional - add if time permits.**
|
||||
|
||||
1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P3)
|
||||
- Current Coverage: {COVERAGE_STATUS}
|
||||
|
||||
---
|
||||
|
||||
### Quality Assessment
|
||||
|
||||
#### Tests with Issues
|
||||
|
||||
**BLOCKER Issues** ❌
|
||||
|
||||
- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}
|
||||
|
||||
**WARNING Issues** ⚠️
|
||||
|
||||
- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}
|
||||
|
||||
**INFO Issues** ℹ️
|
||||
|
||||
- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}
|
||||
|
||||
---
|
||||
|
||||
#### Example Quality Issues
|
||||
|
||||
**WARNING Issues** ⚠️
|
||||
|
||||
- `1.3-E2E-001` - 145 seconds (exceeds 90s target) - Optimize fixture setup to reduce test duration
|
||||
- `1.3-UNIT-005` - 320 lines (exceeds 300 line limit) - Split into multiple focused test files
|
||||
|
||||
**INFO Issues** ℹ️
|
||||
|
||||
- `1.3-E2E-002` - Missing Given-When-Then structure - Refactor describe block to use BDD format
|
||||
|
||||
---
|
||||
|
||||
#### Tests Passing Quality Gates
|
||||
|
||||
**{PASSING_TEST_COUNT}/{TOTAL_TEST_COUNT} tests ({PASSING_PCT}%) meet all quality criteria** ✅
|
||||
|
||||
---
|
||||
|
||||
### Duplicate Coverage Analysis
|
||||
|
||||
#### Acceptable Overlap (Defense in Depth)
|
||||
|
||||
- {CRITERION_ID}: Tested at unit (business logic) and E2E (user journey) ✅
|
||||
|
||||
#### Unacceptable Duplication ⚠️
|
||||
|
||||
- {CRITERION_ID}: Same validation at E2E and Component level
|
||||
- Recommendation: Remove {TEST_ID} or consolidate with {OTHER_TEST_ID}
|
||||
|
||||
---
|
||||
|
||||
### Coverage by Test Level
|
||||
|
||||
| Test Level | Tests | Criteria Covered | Coverage % |
|
||||
| ---------- | ----------------- | -------------------- | ---------------- |
|
||||
| E2E | {E2E_COUNT} | {E2E_CRITERIA} | {E2E_PCT}% |
|
||||
| API | {API_COUNT} | {API_CRITERIA} | {API_PCT}% |
|
||||
| Component | {COMP_COUNT} | {COMP_CRITERIA} | {COMP_PCT}% |
|
||||
| Unit | {UNIT_COUNT} | {UNIT_CRITERIA} | {UNIT_PCT}% |
|
||||
| **Total** | **{TOTAL_TESTS}** | **{TOTAL_CRITERIA}** | **{TOTAL_PCT}%** |
|
||||
|
||||
---
|
||||
|
||||
### Traceability Recommendations
|
||||
|
||||
#### Immediate Actions (Before PR Merge)
|
||||
|
||||
1. **{ACTION_1}** - {DESCRIPTION}
|
||||
2. **{ACTION_2}** - {DESCRIPTION}
|
||||
|
||||
#### Short-term Actions (This Sprint)
|
||||
|
||||
1. **{ACTION_1}** - {DESCRIPTION}
|
||||
2. **{ACTION_2}** - {DESCRIPTION}
|
||||
|
||||
#### Long-term Actions (Backlog)
|
||||
|
||||
1. **{ACTION_1}** - {DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
#### Example Recommendations
|
||||
|
||||
**Immediate Actions (Before PR Merge)**
|
||||
|
||||
1. **Add P1 Password Reset Tests** - Implement `1.3-API-001` for email service integration and `1.3-E2E-004` for error path validation. P1 coverage currently at 80%, target is 90%.
|
||||
2. **Optimize Slow E2E Test** - Refactor `1.3-E2E-001` to use faster fixture setup. Currently 145s, target is <90s.
|
||||
|
||||
**Short-term Actions (This Sprint)**
|
||||
|
||||
1. **Enhance P2 Coverage** - Add E2E validation for session timeout (`1.3-E2E-005`). Currently UNIT-ONLY coverage.
|
||||
2. **Split Large Test File** - Break `1.3-UNIT-005` (320 lines) into multiple focused test files (<300 lines each).
|
||||
|
||||
**Long-term Actions (Backlog)**
|
||||
|
||||
1. **Enrich P3 Coverage** - Add tests for edge cases in P3 criteria if time permits.
|
||||
|
||||
---
|
||||
|
||||
## PHASE 2: QUALITY GATE DECISION
|
||||
|
||||
**Gate Type:** {story | epic | release | hotfix}
|
||||
**Decision Mode:** {deterministic | manual}
|
||||
|
||||
---
|
||||
|
||||
### Evidence Summary
|
||||
|
||||
#### Test Execution Results
|
||||
|
||||
- **Total Tests**: {total_count}
|
||||
- **Passed**: {passed_count} ({pass_percentage}%)
|
||||
- **Failed**: {failed_count} ({fail_percentage}%)
|
||||
- **Skipped**: {skipped_count} ({skip_percentage}%)
|
||||
- **Duration**: {total_duration}
|
||||
|
||||
**Priority Breakdown:**
|
||||
|
||||
- **P0 Tests**: {p0_passed}/{p0_total} passed ({p0_pass_rate}%) {✅ | ❌}
|
||||
- **P1 Tests**: {p1_passed}/{p1_total} passed ({p1_pass_rate}%) {✅ | ⚠️ | ❌}
|
||||
- **P2 Tests**: {p2_passed}/{p2_total} passed ({p2_pass_rate}%) {informational}
|
||||
- **P3 Tests**: {p3_passed}/{p3_total} passed ({p3_pass_rate}%) {informational}
|
||||
|
||||
**Overall Pass Rate**: {overall_pass_rate}% {✅ | ⚠️ | ❌}
|
||||
|
||||
**Test Results Source**: {CI_run_id | test_report_url | local_run}
|
||||
|
||||
---
|
||||
|
||||
#### Coverage Summary (from Phase 1)
|
||||
|
||||
**Requirements Coverage:**
|
||||
|
||||
- **P0 Acceptance Criteria**: {p0_covered}/{p0_total} covered ({p0_coverage}%) {✅ | ❌}
|
||||
- **P1 Acceptance Criteria**: {p1_covered}/{p1_total} covered ({p1_coverage}%) {✅ | ⚠️ | ❌}
|
||||
- **P2 Acceptance Criteria**: {p2_covered}/{p2_total} covered ({p2_coverage}%) {informational}
|
||||
- **Overall Coverage**: {overall_coverage}%
|
||||
|
||||
**Code Coverage** (if available):
|
||||
|
||||
- **Line Coverage**: {line_coverage}% {✅ | ⚠️ | ❌}
|
||||
- **Branch Coverage**: {branch_coverage}% {✅ | ⚠️ | ❌}
|
||||
- **Function Coverage**: {function_coverage}% {✅ | ⚠️ | ❌}
|
||||
|
||||
**Coverage Source**: {coverage_report_url | coverage_file_path}
|
||||
|
||||
---
|
||||
|
||||
#### Non-Functional Requirements (NFRs)
|
||||
|
||||
**Security**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
|
||||
|
||||
- Security Issues: {security_issue_count}
|
||||
- {details_if_issues}
|
||||
|
||||
**Performance**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
|
||||
|
||||
- {performance_metrics_summary}
|
||||
|
||||
**Reliability**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
|
||||
|
||||
- {reliability_metrics_summary}
|
||||
|
||||
**Maintainability**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
|
||||
|
||||
- {maintainability_metrics_summary}
|
||||
|
||||
**NFR Source**: {nfr_assessment_file_path | not_assessed}
|
||||
|
||||
---
|
||||
|
||||
#### Flakiness Validation
|
||||
|
||||
**Burn-in Results** (if available):
|
||||
|
||||
- **Burn-in Iterations**: {iteration_count} (e.g., 10)
|
||||
- **Flaky Tests Detected**: {flaky_test_count} {✅ if 0 | ❌ if >0}
|
||||
- **Stability Score**: {stability_percentage}%
|
||||
|
||||
**Flaky Tests List** (if any):
|
||||
|
||||
- {flaky_test_1_name} - {failure_rate}
|
||||
- {flaky_test_2_name} - {failure_rate}
|
||||
|
||||
**Burn-in Source**: {CI_burn_in_run_id | not_available}
|
||||
|
||||
---
|
||||
|
||||
### Decision Criteria Evaluation
|
||||
|
||||
#### P0 Criteria (Must ALL Pass)
|
||||
|
||||
| Criterion             | Threshold | Actual                    | Status               |
| --------------------- | --------- | ------------------------- | -------------------- |
| P0 Coverage           | 100%      | {p0_coverage}%            | {✅ PASS \| ❌ FAIL} |
| P0 Test Pass Rate     | 100%      | {p0_pass_rate}%           | {✅ PASS \| ❌ FAIL} |
| Security Issues       | 0         | {security_issue_count}    | {✅ PASS \| ❌ FAIL} |
| Critical NFR Failures | 0         | {critical_nfr_fail_count} | {✅ PASS \| ❌ FAIL} |
| Flaky Tests           | 0         | {flaky_test_count}        | {✅ PASS \| ❌ FAIL} |
|
||||
|
||||
**P0 Evaluation**: {✅ ALL PASS | ❌ ONE OR MORE FAILED}
|
||||
|
||||
---
|
||||
|
||||
#### P1 Criteria (Required for PASS, May Accept for CONCERNS)
|
||||
|
||||
| Criterion              | Threshold                 | Actual               | Status                              |
| ---------------------- | ------------------------- | -------------------- | ----------------------------------- |
| P1 Coverage            | ≥{min_p1_coverage}%       | {p1_coverage}%       | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| P1 Test Pass Rate      | ≥{min_p1_pass_rate}%      | {p1_pass_rate}%      | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| Overall Test Pass Rate | ≥{min_overall_pass_rate}% | {overall_pass_rate}% | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| Overall Coverage       | ≥{min_coverage}%          | {overall_coverage}%  | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
|
||||
|
||||
**P1 Evaluation**: {✅ ALL PASS | ⚠️ SOME CONCERNS | ❌ FAILED}
|
||||
|
||||
---
|
||||
|
||||
#### P2/P3 Criteria (Informational, Don't Block)
|
||||
|
||||
| Criterion | Actual | Notes |
|
||||
| ----------------- | --------------- | ------------------------------------------------------------ |
|
||||
| P2 Test Pass Rate | {p2_pass_rate}% | {allow_p2_failures ? "Tracked, doesn't block" : "Evaluated"} |
|
||||
| P3 Test Pass Rate | {p3_pass_rate}% | {allow_p3_failures ? "Tracked, doesn't block" : "Evaluated"} |
|
||||
|
||||
---
|
||||
|
||||
### GATE DECISION: {PASS | CONCERNS | FAIL | WAIVED}
|
||||
|
||||
---
|
||||
|
||||
### Rationale
|
||||
|
||||
{Explain decision based on criteria evaluation}
|
||||
|
||||
{Highlight key evidence that drove decision}
|
||||
|
||||
{Note any assumptions or caveats}
|
||||
|
||||
**Example (PASS):**
|
||||
|
||||
> All P0 criteria met with 100% coverage and pass rates across critical tests. All P1 criteria exceeded thresholds with 98% overall pass rate and 92% coverage. No security issues detected. No flaky tests in validation. Feature is ready for production deployment with standard monitoring.
|
||||
|
||||
**Example (CONCERNS):**
|
||||
|
||||
> All P0 criteria met, ensuring critical user journeys are protected. However, P1 coverage (88%) falls below threshold (90%) due to missing E2E test for AC-5 edge case. Overall pass rate (96%) is excellent. Issues are non-critical and have acceptable workarounds. Risk is low enough to deploy with enhanced monitoring.
|
||||
|
||||
**Example (FAIL):**
|
||||
|
||||
> CRITICAL BLOCKERS DETECTED:
|
||||
>
|
||||
> 1. P0 coverage incomplete (80%) - AC-2 security validation missing
|
||||
> 2. P0 test failures (75% pass rate) in core search functionality
|
||||
> 3. Unresolved SQL injection vulnerability in search filter (CRITICAL)
|
||||
>
|
||||
> Release MUST BE BLOCKED until P0 issues are resolved. Security vulnerability cannot be waived.
|
||||
|
||||
**Example (WAIVED):**
|
||||
|
||||
> Original decision was FAIL due to P0 test failure in legacy Excel 2007 export module (affects <1% of users). However, release contains critical GDPR compliance features required by regulatory deadline (Oct 15). Business has approved waiver given:
|
||||
>
|
||||
> - Regulatory priority overrides legacy module risk
|
||||
> - Workaround available (use Excel 2010+)
|
||||
> - Issue will be fixed in v2.4.1 hotfix (due Oct 20)
|
||||
> - Enhanced monitoring in place
|
||||
|
||||
---
|
||||
|
||||
### {Section: Delete if not applicable}
|
||||
|
||||
#### Residual Risks (For CONCERNS or WAIVED)
|
||||
|
||||
List unresolved P1/P2 issues that don't block release but should be tracked:
|
||||
|
||||
1. **{Risk Description}**
|
||||
- **Priority**: P1 | P2
|
||||
- **Probability**: Low | Medium | High
|
||||
- **Impact**: Low | Medium | High
|
||||
- **Risk Score**: {probability × impact}
|
||||
- **Mitigation**: {workaround or monitoring plan}
|
||||
- **Remediation**: {fix in next sprint/release}
|
||||
|
||||
**Overall Residual Risk**: {LOW | MEDIUM | HIGH}
|
||||
|
||||
---
|
||||
|
||||
#### Waiver Details (For WAIVED only)
|
||||
|
||||
**Original Decision**: ❌ FAIL
|
||||
|
||||
**Reason for Failure**:
|
||||
|
||||
- {list_of_blocking_issues}
|
||||
|
||||
**Waiver Information**:
|
||||
|
||||
- **Waiver Reason**: {business_justification}
|
||||
- **Waiver Approver**: {name}, {role} (e.g., Jane Doe, VP Engineering)
|
||||
- **Approval Date**: {YYYY-MM-DD}
|
||||
- **Waiver Expiry**: {YYYY-MM-DD} (**NOTE**: Does NOT apply to next release)
|
||||
|
||||
**Monitoring Plan**:
|
||||
|
||||
- {enhanced_monitoring_1}
|
||||
- {enhanced_monitoring_2}
|
||||
- {escalation_criteria}
|
||||
|
||||
**Remediation Plan**:
|
||||
|
||||
- **Fix Target**: {next_release_version} (e.g., v2.4.1 hotfix)
|
||||
- **Due Date**: {YYYY-MM-DD}
|
||||
- **Owner**: {team_or_person}
|
||||
- **Verification**: {how_fix_will_be_verified}
|
||||
|
||||
**Business Justification**:
|
||||
{detailed_explanation_of_why_waiver_is_acceptable}
|
||||
|
||||
---
|
||||
|
||||
#### Critical Issues (For FAIL or CONCERNS)
|
||||
|
||||
Top blockers requiring immediate attention:
|
||||
|
||||
| Priority | Issue | Description | Owner | Due Date | Status |
|
||||
| -------- | ------------- | ------------------- | ------------ | ------------ | ------------------ |
|
||||
| P0 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
|
||||
| P0 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
|
||||
| P1 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
|
||||
|
||||
**Blocking Issues Count**: {p0_blocker_count} P0 blockers, {p1_blocker_count} P1 issues
|
||||
|
||||
---
|
||||
|
||||
### Gate Recommendations
|
||||
|
||||
#### For PASS Decision ✅
|
||||
|
||||
1. **Proceed to deployment**
|
||||
- Deploy to staging environment
|
||||
- Validate with smoke tests
|
||||
- Monitor key metrics for 24-48 hours
|
||||
- Deploy to production with standard monitoring
|
||||
|
||||
2. **Post-Deployment Monitoring**
|
||||
- {metric_1_to_monitor}
|
||||
- {metric_2_to_monitor}
|
||||
- {alert_thresholds}
|
||||
|
||||
3. **Success Criteria**
|
||||
- {success_criterion_1}
|
||||
- {success_criterion_2}
|
||||
|
||||
---
|
||||
|
||||
#### For CONCERNS Decision ⚠️
|
||||
|
||||
1. **Deploy with Enhanced Monitoring**
|
||||
- Deploy to staging with extended validation period
|
||||
- Enable enhanced logging/monitoring for known risk areas:
|
||||
- {risk_area_1}
|
||||
- {risk_area_2}
|
||||
- Set aggressive alerts for potential issues
|
||||
- Deploy to production with caution
|
||||
|
||||
2. **Create Remediation Backlog**
|
||||
- Create story: "{fix_title_1}" (Priority: {priority})
|
||||
- Create story: "{fix_title_2}" (Priority: {priority})
|
||||
- Target sprint: {next_sprint}
|
||||
|
||||
3. **Post-Deployment Actions**
|
||||
- Monitor {specific_areas} closely for {time_period}
|
||||
- Weekly status updates on remediation progress
|
||||
- Re-assess after fixes deployed
|
||||
|
||||
---
|
||||
|
||||
#### For FAIL Decision ❌
|
||||
|
||||
1. **Block Deployment Immediately**
|
||||
- Do NOT deploy to any environment
|
||||
- Notify stakeholders of blocking issues
|
||||
- Escalate to tech lead and PM
|
||||
|
||||
2. **Fix Critical Issues**
|
||||
- Address P0 blockers listed in Critical Issues section
|
||||
- Owner assignments confirmed
|
||||
- Due dates agreed upon
|
||||
- Daily standup on blocker resolution
|
||||
|
||||
3. **Re-Run Gate After Fixes**
|
||||
- Re-run full test suite after fixes
|
||||
- Re-run `bmad tea *trace` workflow
|
||||
- Verify decision is PASS before deploying
|
||||
|
||||
---
|
||||
|
||||
#### For WAIVED Decision 🔓
|
||||
|
||||
1. **Deploy with Business Approval**
|
||||
- Confirm waiver approver has signed off
|
||||
- Document waiver in release notes
|
||||
- Notify all stakeholders of waived risks
|
||||
|
||||
2. **Aggressive Monitoring**
|
||||
- {enhanced_monitoring_plan}
|
||||
- {escalation_procedures}
|
||||
- Daily checks on waived risk areas
|
||||
|
||||
3. **Mandatory Remediation**
|
||||
- Fix MUST be completed by {due_date}
|
||||
- Issue CANNOT be waived in next release
|
||||
- Track remediation progress weekly
|
||||
- Verify fix in next gate
|
||||
|
||||
---
|
||||
|
||||
### Next Steps
|
||||
|
||||
**Immediate Actions** (next 24-48 hours):
|
||||
|
||||
1. {action_1}
|
||||
2. {action_2}
|
||||
3. {action_3}
|
||||
|
||||
**Follow-up Actions** (next sprint/release):
|
||||
|
||||
1. {action_1}
|
||||
2. {action_2}
|
||||
3. {action_3}
|
||||
|
||||
**Stakeholder Communication**:
|
||||
|
||||
- Notify PM: {decision_summary}
|
||||
- Notify SM: {decision_summary}
|
||||
- Notify DEV lead: {decision_summary}
|
||||
|
||||
---
|
||||
|
||||
## Integrated YAML Snippet (CI/CD)
|
||||
|
||||
```yaml
|
||||
traceability_and_gate:
|
||||
# Phase 1: Traceability
|
||||
traceability:
|
||||
story_id: "{STORY_ID}"
|
||||
date: "{DATE}"
|
||||
coverage:
|
||||
overall: {OVERALL_PCT}%
|
||||
p0: {P0_PCT}%
|
||||
p1: {P1_PCT}%
|
||||
p2: {P2_PCT}%
|
||||
p3: {P3_PCT}%
|
||||
gaps:
|
||||
critical: {CRITICAL_COUNT}
|
||||
high: {HIGH_COUNT}
|
||||
medium: {MEDIUM_COUNT}
|
||||
low: {LOW_COUNT}
|
||||
quality:
|
||||
passing_tests: {PASSING_COUNT}
|
||||
total_tests: {TOTAL_TESTS}
|
||||
blocker_issues: {BLOCKER_COUNT}
|
||||
warning_issues: {WARNING_COUNT}
|
||||
recommendations:
|
||||
- "{RECOMMENDATION_1}"
|
||||
- "{RECOMMENDATION_2}"
|
||||
|
||||
# Phase 2: Gate Decision
|
||||
gate_decision:
|
||||
decision: "{PASS | CONCERNS | FAIL | WAIVED}"
|
||||
gate_type: "{story | epic | release | hotfix}"
|
||||
decision_mode: "{deterministic | manual}"
|
||||
criteria:
|
||||
p0_coverage: {p0_coverage}%
|
||||
p0_pass_rate: {p0_pass_rate}%
|
||||
p1_coverage: {p1_coverage}%
|
||||
p1_pass_rate: {p1_pass_rate}%
|
||||
overall_pass_rate: {overall_pass_rate}%
|
||||
overall_coverage: {overall_coverage}%
|
||||
security_issues: {security_issue_count}
|
||||
critical_nfrs_fail: {critical_nfr_fail_count}
|
||||
flaky_tests: {flaky_test_count}
|
||||
thresholds:
|
||||
min_p0_coverage: 100
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_coverage: {min_p1_coverage}
|
||||
min_p1_pass_rate: {min_p1_pass_rate}
|
||||
min_overall_pass_rate: {min_overall_pass_rate}
|
||||
min_coverage: {min_coverage}
|
||||
evidence:
|
||||
test_results: "{CI_run_id | test_report_url}"
|
||||
traceability: "{trace_file_path}"
|
||||
nfr_assessment: "{nfr_file_path}"
|
||||
code_coverage: "{coverage_report_url}"
|
||||
next_steps: "{brief_summary_of_recommendations}"
|
||||
waiver: # Only if WAIVED
|
||||
reason: "{business_justification}"
|
||||
approver: "{name}, {role}"
|
||||
expiry: "{YYYY-MM-DD}"
|
||||
remediation_due: "{YYYY-MM-DD}"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Artifacts
|
||||
|
||||
- **Story File:** {STORY_FILE_PATH}
|
||||
- **Test Design:** {TEST_DESIGN_PATH} (if available)
|
||||
- **Tech Spec:** {TECH_SPEC_PATH} (if available)
|
||||
- **Test Results:** {TEST_RESULTS_PATH}
|
||||
- **NFR Assessment:** {NFR_FILE_PATH} (if available)
|
||||
- **Test Files:** {TEST_DIR_PATH}
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**Phase 1 - Traceability Assessment:**
|
||||
|
||||
- Overall Coverage: {OVERALL_PCT}%
|
||||
- P0 Coverage: {P0_PCT}% {P0_STATUS}
|
||||
- P1 Coverage: {P1_PCT}% {P1_STATUS}
|
||||
- Critical Gaps: {CRITICAL_COUNT}
|
||||
- High Priority Gaps: {HIGH_COUNT}
|
||||
|
||||
**Phase 2 - Gate Decision:**
|
||||
|
||||
- **Decision**: {PASS | CONCERNS | FAIL | WAIVED} {STATUS_ICON}
|
||||
- **P0 Evaluation**: {✅ ALL PASS | ❌ ONE OR MORE FAILED}
|
||||
- **P1 Evaluation**: {✅ ALL PASS | ⚠️ SOME CONCERNS | ❌ FAILED}
|
||||
|
||||
**Overall Status:** {STATUS} {STATUS_ICON}
|
||||
|
||||
**Next Steps:**
|
||||
|
||||
- If PASS ✅: Proceed to deployment
|
||||
- If CONCERNS ⚠️: Deploy with monitoring, create remediation backlog
|
||||
- If FAIL ❌: Block deployment, fix critical issues, re-run workflow
|
||||
- If WAIVED 🔓: Deploy with business approval and aggressive monitoring
|
||||
|
||||
**Generated:** {DATE}
|
||||
**Workflow:** testarch-trace v4.0 (Enhanced with Gate Decision)
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
@@ -1,25 +1,145 @@
|
||||
# Test Architect workflow: trace
|
||||
# Test Architect workflow: trace (enhanced with gate decision)
|
||||
name: testarch-trace
|
||||
description: "Trace requirements to implemented automated tests."
|
||||
description: "Generate requirements-to-tests traceability matrix, analyze coverage, and make quality gate decision (PASS/CONCERNS/FAIL/WAIVED)"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/trace"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/trace-template.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Target specification
|
||||
story_file: "" # Path to story markdown (e.g., bmad/output/story-1.3.md)
|
||||
acceptance_criteria: "" # Optional - inline criteria if no story file
|
||||
|
||||
# Test discovery
|
||||
test_dir: "{project-root}/tests"
|
||||
source_dir: "{project-root}/src"
|
||||
auto_discover_tests: true # Automatically find tests related to story
|
||||
|
||||
# Traceability configuration
|
||||
coverage_levels: "e2e,api,component,unit" # Which levels to trace (comma-separated)
|
||||
map_by_test_id: true # Use test IDs (e.g., 1.3-E2E-001) for mapping
|
||||
map_by_describe: true # Use describe blocks for mapping
|
||||
map_by_filename: true # Use file paths for mapping
|
||||
|
||||
# Coverage classification
|
||||
require_explicit_mapping: true # Require tests to explicitly reference criteria
|
||||
flag_unit_only: true # Flag criteria covered only by unit tests
|
||||
flag_integration_only: true # Flag criteria covered only by integration tests
|
||||
flag_partial_coverage: true # Flag criteria with incomplete coverage
|
||||
|
||||
# Gap analysis
|
||||
prioritize_by_risk: true # Use test-priorities (P0/P1/P2/P3) for gap severity
|
||||
suggest_missing_tests: true # Recommend specific tests to add
|
||||
check_duplicate_coverage: true # Warn about same behavior tested at multiple levels
|
||||
|
||||
# Integration with BMad artifacts
|
||||
use_test_design: true # Load test-design.md if exists (risk assessment)
|
||||
use_tech_spec: true # Load tech-spec.md if exists (technical context)
|
||||
use_prd: true # Load PRD.md if exists (requirements context)
|
||||
|
||||
# Output configuration
|
||||
output_file: "{output_folder}/traceability-matrix.md"
|
||||
generate_gate_yaml: true # Create gate YAML snippet with coverage summary
|
||||
generate_coverage_badge: true # Create coverage badge/metric
|
||||
update_story_file: true # Add traceability section to story file
|
||||
|
||||
# Quality gates
|
||||
min_p0_coverage: 100 # Percentage (P0 must be 100% covered)
|
||||
min_p1_coverage: 90 # Percentage
|
||||
min_overall_coverage: 80 # Percentage
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load traceability, risk-governance, test-quality fragments
|
||||
include_code_coverage: false # Integrate with code coverage reports (Istanbul, NYC)
|
||||
check_assertions: true # Verify explicit assertions in tests
|
||||
|
||||
# PHASE 2: Gate Decision Variables (runs after traceability)
|
||||
enable_gate_decision: true # Run gate decision after traceability (Phase 2)
|
||||
|
||||
# Gate target specification
|
||||
gate_type: "story" # story | epic | release | hotfix
|
||||
# story_id, epic_num, release_version inherited from trace context
|
||||
|
||||
# Gate decision configuration
|
||||
decision_mode: "deterministic" # deterministic (rule-based) | manual (team decision)
|
||||
allow_waivers: true # Allow business-approved waivers for FAIL → WAIVED
|
||||
require_evidence: true # Require links to test results, reports, etc.
|
||||
|
||||
# Input sources for gate (auto-discovered from Phase 1 + external)
|
||||
# story_file, test_design_file inherited from trace
|
||||
nfr_file: "" # Path to nfr-assessment.md (optional, recommended for release gates)
|
||||
test_results: "" # Path to test execution results (CI artifacts, reports)
|
||||
|
||||
# Decision criteria thresholds
|
||||
min_p0_pass_rate: 100 # P0 tests must have 100% pass rate
|
||||
min_p1_pass_rate: 95 # P1 tests threshold
|
||||
min_overall_pass_rate: 90 # Overall test pass rate
|
||||
# min_coverage already defined above (min_overall_coverage: 80)
|
||||
max_critical_nfrs_fail: 0 # No critical NFRs can fail
|
||||
max_security_issues: 0 # No unresolved security issues
|
||||
|
||||
# Risk tolerance
|
||||
allow_p2_failures: true # P2 failures don't block release
|
||||
allow_p3_failures: true # P3 failures don't block release
|
||||
escalate_p1_failures: true # P1 failures require escalation approval
|
||||
|
||||
# Gate output configuration
|
||||
gate_output_file: "{output_folder}/gate-decision-{gate_type}-{story_id}{epic_num}{release_version}.md"
|
||||
append_to_history: true # Append to bmm-workflow-status.md gate history
|
||||
notify_stakeholders: true # Generate notification message for team
|
||||
|
||||
# Advanced gate options
|
||||
check_all_workflows_complete: true # Verify test-design, trace, nfr-assess complete
|
||||
validate_evidence_freshness: true # Warn if assessments are >7 days old
|
||||
require_sign_off: false # Require named approver for gate decision
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/traceability-matrix.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read story, test files, BMad artifacts
|
||||
- write_file # Create traceability matrix, gate YAML
|
||||
- list_files # Discover test files
|
||||
- search_repo # Find tests by test ID, describe blocks
|
||||
- glob # Find test files matching patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with acceptance criteria (required for BMad mode)"
|
||||
- test_files: "Test suite for the feature (auto-discovered if not provided)"
|
||||
- test_design: "Test design with risk/priority assessment (required for Phase 2 gate)"
|
||||
- tech_spec: "Technical specification (optional)"
|
||||
- existing_tests: "Current test suite for analysis"
|
||||
- test_results: "CI/CD test execution results (required for Phase 2 gate)"
|
||||
- nfr_assess: "Non-functional requirements validation (recommended for release gates)"
|
||||
- code_coverage: "Code coverage report (optional)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- traceability
|
||||
- test-architect
|
||||
- coverage
|
||||
- requirements
|
||||
- gate
|
||||
- decision
|
||||
- release
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|