feat: migrate test architect entirely to v6
@@ -62,7 +62,7 @@ Extension modules that add specialized capabilities to BMM.
|
||||
|
||||
### 🏗️ `/testarch`
|
||||
|
||||
Test architecture and quality assurance components.
|
||||
Test architecture and quality assurance components. The **[Test Architect (TEA) Guide](./testarch/README.md)** provides comprehensive testing strategy across 9 workflows: framework setup, CI/CD, test design, ATDD, automation, traceability, NFR assessment, quality gates, and test review.
|
||||
|
||||
## Quick Start
|
||||
|
||||
@@ -119,6 +119,7 @@ BMM integrates seamlessly with the BMad Core framework, leveraging:
|
||||
## Related Documentation
|
||||
|
||||
- [BMM Workflows Guide](./workflows/README.md) - **Start here!**
|
||||
- [Test Architect (TEA) Guide](./testarch/README.md) - Quality assurance and testing strategy
|
||||
- [Agent Documentation](./agents/README.md) - Individual agent capabilities
|
||||
- [Team Configurations](./teams/README.md) - Pre-built team setups
|
||||
- [Task Library](./tasks/README.md) - Reusable task components
|
||||
|
||||
@@ -57,3 +57,7 @@ agent:
|
||||
- trigger: gate
|
||||
workflow: "{project-root}/bmad/bmm/workflows/testarch/gate/workflow.yaml"
|
||||
description: Write/update quality gate decision assessment
|
||||
|
||||
- trigger: test-review
|
||||
workflow: "{project-root}/bmad/bmm/workflows/testarch/test-review/workflow.yaml"
|
||||
description: Review test quality using comprehensive knowledge base and best practices
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
---
|
||||
last-redoc-date: 2025-09-30
|
||||
last-redoc-date: 2025-10-14
|
||||
---
|
||||
|
||||
# Test Architect (TEA) Agent Guide
|
||||
@@ -10,6 +10,95 @@ last-redoc-date: 2025-09-30
|
||||
- **Mission:** Deliver actionable quality strategies, automation coverage, and gate decisions that scale with project level and compliance demands.
|
||||
- **Use When:** Project level ≥2, integration risk is non-trivial, brownfield regression risk exists, or compliance/NFR evidence is required.
|
||||
|
||||
## TEA Workflow Lifecycle
|
||||
|
||||
TEA integrates across the entire BMad development lifecycle, providing quality assurance at every phase:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────┐
|
||||
│ BMM Phase 2: PLANNING │
|
||||
│ │
|
||||
│ PM: *plan-project │
|
||||
│ ↓ │
|
||||
│ TEA: *framework ──→ *ci ──→ *test-design │
|
||||
│ └─────────┬─────────────┘ │
|
||||
│ │ (Setup once per project) │
|
||||
└─────────────────┼──────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────┐
|
||||
│ BMM Phase 4: IMPLEMENTATION │
|
||||
│ (Per Story Cycle) │
|
||||
│ │
|
||||
│ ┌─→ SM: *create-story │
|
||||
│ │ ↓ │
|
||||
│ │ TEA: *atdd (optional, before dev) │
|
||||
│ │ ↓ │
|
||||
│ │ DEV: implements story │
|
||||
│ │ ↓ │
|
||||
│ │ TEA: *automate ──→ *test-review (optional) │
|
||||
│ │ ↓ │
|
||||
│ │ TEA: *trace (refresh coverage) │
|
||||
│ │ ↓ │
|
||||
│ └───[next story] │
|
||||
└─────────────────┼──────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────┐
|
||||
│ EPIC/RELEASE GATE │
|
||||
│ │
|
||||
│ TEA: *nfr-assess (if not done earlier) │
|
||||
│ ↓ │
|
||||
│ TEA: *test-review (final audit, optional) │
|
||||
│ ↓ │
|
||||
│ TEA: *gate ──→ PASS | CONCERNS | FAIL | WAIVED │
|
||||
│ │
|
||||
└──────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### TEA Integration with BMad v6 Workflow
|
||||
|
||||
TEA operates **across all four BMad phases**, unlike other agents that are phase-specific:
|
||||
|
||||
<details>
|
||||
<summary><strong>Cross-Phase Integration & Workflow Complexity</strong></summary>
|
||||
|
||||
### Phase-Specific Agents (Standard Pattern)
|
||||
|
||||
- **Phase 1 (Analysis)**: Analyst agent
|
||||
- **Phase 2 (Planning)**: PM agent
|
||||
- **Phase 3 (Solutioning)**: Architect agent
|
||||
- **Phase 4 (Implementation)**: SM, DEV agents
|
||||
|
||||
### TEA: Cross-Phase Quality Agent (Unique Pattern)
|
||||
|
||||
TEA is **the only agent that spans all phases**:
|
||||
|
||||
```
|
||||
Phase 1 (Analysis) → [TEA not typically used]
|
||||
↓
|
||||
Phase 2 (Planning) → TEA: *framework, *ci, *test-design (setup)
|
||||
↓
|
||||
Phase 3 (Solutioning) → [TEA validates architecture testability]
|
||||
↓
|
||||
Phase 4 (Implementation) → TEA: *atdd, *automate, *test-review, *trace (per story)
|
||||
↓
|
||||
Epic/Release Gate → TEA: *nfr-assess, *gate (release decision)
|
||||
```
|
||||
|
||||
### Why TEA Needs 9 Workflows
|
||||
|
||||
**Standard agents**: 1-3 workflows per phase
|
||||
**TEA**: 9 workflows across 3+ phases
|
||||
|
||||
| Phase       | TEA Workflows                                  | Frequency        | Purpose                          |
| ----------- | ---------------------------------------------- | ---------------- | -------------------------------- |
| **Phase 2** | `*framework`, `*ci`, `*test-design`            | Once per project | Establish quality infrastructure |
| **Phase 4** | `*atdd`, `*automate`, `*test-review`, `*trace` | Per story/sprint | Continuous quality validation    |
| **Release** | `*nfr-assess`, `*gate`                         | Per epic/release | Go/no-go decision                |
|
||||
|
||||
This complexity **requires specialized documentation** (this guide), **extensive knowledge base** (19+ fragments), and **unique architecture** (`testarch/` directory).
|
||||
|
||||
</details>
|
||||
|
||||
## Prerequisites and Setup
|
||||
|
||||
1. Run the core planning workflows first:
|
||||
@@ -31,8 +120,8 @@ last-redoc-date: 2025-09-30
|
||||
| Pre-Implementation | Run `*framework` (if harness missing), `*ci`, and `*test-design` | Review risk/design/CI guidance, align backlog | Test scaffold, CI pipeline, risk and coverage strategy |
|
||||
| Story Prep | - | Scrum Master `*create-story`, `*story-context` | Story markdown + context XML |
|
||||
| Implementation | (Optional) Trigger `*atdd` before dev to supply failing tests + checklist | Implement story guided by ATDD checklist | Failing acceptance tests + implementation checklist |
|
||||
| Post-Dev | Execute `*automate`, re-run `*trace` | Address recommendations, update code/tests | Regression specs, refreshed coverage matrix |
|
||||
| Release | Run `*gate` | Confirm Definition of Done, share release notes | Gate YAML + release summary (owners, waivers) |
|
||||
| Post-Dev | Execute `*automate`, (Optional) `*test-review`, re-run `*trace` | Address recommendations, update code/tests | Regression specs, quality report, refreshed coverage matrix |
|
||||
| Release | (Optional) `*test-review` for final audit, Run `*gate` | Confirm Definition of Done, share release notes | Quality audit, Gate YAML + release summary (owners, waivers) |
|
||||
|
||||
<details>
|
||||
<summary>Execution Notes</summary>
|
||||
@@ -40,7 +129,8 @@ last-redoc-date: 2025-09-30
|
||||
- Run `*framework` only once per repo or when modern harness support is missing.
|
||||
- `*framework` followed by `*ci` establishes install + pipeline; `*test-design` then handles risk scoring, mitigations, and scenario planning in one pass.
|
||||
- Use `*atdd` before coding when the team can adopt ATDD; share its checklist with the dev agent.
|
||||
- Post-implementation, keep `*trace` current, expand coverage with `*automate`, and finish with `*gate`.
|
||||
- Post-implementation, keep `*trace` current, expand coverage with `*automate`, optionally review test quality with `*test-review`, and finish with `*gate`.
|
||||
- Use `*test-review` after `*atdd` to validate generated tests, after `*automate` to ensure regression quality, or before `*gate` for final audit.
|
||||
|
||||
</details>
|
||||
|
||||
@@ -51,21 +141,21 @@ last-redoc-date: 2025-09-30
|
||||
2. **Setup:** TEA checks harness via `*framework`, configures `*ci`, and runs `*test-design` to capture risk/coverage plans.
|
||||
3. **Story Prep:** Scrum Master generates the story via `*create-story`; PO validates using `*assess-project-ready`.
|
||||
4. **Implementation:** TEA optionally runs `*atdd`; Dev implements with guidance from failing tests and the plan.
|
||||
5. **Post-Dev and Release:** TEA runs `*automate`, re-runs `*trace`, and finishes with `*gate` to document the decision.
|
||||
5. **Post-Dev and Release:** TEA runs `*automate`, optionally `*test-review` to audit test quality, re-runs `*trace`, and finishes with `*gate` to document the decision.
|
||||
|
||||
</details>
|
||||
|
||||
### Brownfield Feature Enhancement (Level 3–4)
|
||||
|
||||
| Phase | Test Architect | Dev / Team | Outputs |
|
||||
| ----------------- | ------------------------------------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------- |
|
||||
| Refresh Context | - | Analyst/PM/Architect rerun planning workflows | Updated planning artifacts in `{output_folder}` |
|
||||
| Baseline Coverage | Run `*trace` to inventory existing tests | Review matrix, flag hotspots | Coverage matrix + initial gate snippet |
|
||||
| Risk Targeting | Run `*test-design` | Align remediation/backlog priorities | Brownfield risk memo + scenario matrix |
|
||||
| Story Prep | - | Scrum Master `*create-story` | Updated story markdown |
|
||||
| Implementation | (Optional) Run `*atdd` before dev | Implement story, referencing checklist/tests | Failing acceptance tests + implementation checklist |
|
||||
| Post-Dev | Apply `*automate`, re-run `*trace`, trigger `*nfr-assess` if needed | Resolve gaps, update docs/tests | Regression specs, refreshed coverage matrix, NFR report |
|
||||
| Release | Run `*gate` | Product Owner `*assess-project-ready`, share release notes | Gate YAML + release summary |
|
||||
| Phase | Test Architect | Dev / Team | Outputs |
|
||||
| ----------------- | -------------------------------------------------------------------------------------- | ---------------------------------------------------------- | ----------------------------------------------------------------------- |
|
||||
| Refresh Context | - | Analyst/PM/Architect rerun planning workflows | Updated planning artifacts in `{output_folder}` |
|
||||
| Baseline Coverage | Run `*trace` to inventory existing tests | Review matrix, flag hotspots | Coverage matrix + initial gate snippet |
|
||||
| Risk Targeting | Run `*test-design` | Align remediation/backlog priorities | Brownfield risk memo + scenario matrix |
|
||||
| Story Prep | - | Scrum Master `*create-story` | Updated story markdown |
|
||||
| Implementation | (Optional) Run `*atdd` before dev | Implement story, referencing checklist/tests | Failing acceptance tests + implementation checklist |
|
||||
| Post-Dev | Apply `*automate`, (Optional) `*test-review`, re-run `*trace`, `*nfr-assess` if needed | Resolve gaps, update docs/tests | Regression specs, quality report, refreshed coverage matrix, NFR report |
|
||||
| Release | (Optional) `*test-review` for final audit, Run `*gate` | Product Owner `*assess-project-ready`, share release notes | Quality audit, Gate YAML + release summary |
|
||||
|
||||
<details>
|
||||
<summary>Execution Notes</summary>
|
||||
@@ -73,7 +163,8 @@ last-redoc-date: 2025-09-30
|
||||
- Lead with `*trace` so remediation plans target true coverage gaps. Ensure `*framework` and `*ci` are in place early in the engagement; if the brownfield lacks them, run those setup steps immediately after refreshing context.
|
||||
- `*test-design` should highlight regression hotspots, mitigations, and P0 scenarios.
|
||||
- Use `*atdd` when stories benefit from ATDD; otherwise proceed to implementation and rely on post-dev automation.
|
||||
- After development, expand coverage with `*automate`, re-run `*trace`, and close with `*gate`. Run `*nfr-assess` now if non-functional risks weren't addressed earlier.
|
||||
- After development, expand coverage with `*automate`, optionally review test quality with `*test-review`, re-run `*trace`, and close with `*gate`. Run `*nfr-assess` now if non-functional risks weren't addressed earlier.
|
||||
- Use `*test-review` to validate existing brownfield tests or audit new tests before gate.
|
||||
- Product Owner `*assess-project-ready` confirms the team has artifacts before handoff or release.
|
||||
|
||||
</details>
|
||||
@@ -87,26 +178,27 @@ last-redoc-date: 2025-09-30
|
||||
4. **Story Prep:** Scrum Master generates `stories/story-1.1.md` via `*create-story`, automatically pulling updated context.
|
||||
5. **ATDD First:** TEA runs `*atdd`, producing failing Playwright specs under `tests/e2e/payments/` plus an implementation checklist.
|
||||
6. **Implementation:** Dev pairs with the checklist/tests to deliver the story.
|
||||
7. **Post-Implementation:** TEA applies `*automate`, re-runs `*trace`, performs `*nfr-assess` to validate SLAs, and closes with `*gate` marking PASS with follow-ups.
|
||||
7. **Post-Implementation:** TEA applies `*automate`, optionally `*test-review` to audit test quality, re-runs `*trace`, performs `*nfr-assess` to validate SLAs, and closes with `*gate` marking PASS with follow-ups.
|
||||
|
||||
</details>
|
||||
|
||||
### Enterprise / Compliance Program (Level 4)
|
||||
|
||||
| Phase | Test Architect | Dev / Team | Outputs |
|
||||
| ------------------- | ------------------------------------------------ | ---------------------------------------------- | --------------------------------------------------------- |
|
||||
| Strategic Planning | - | Analyst/PM/Architect standard workflows | Enterprise-grade PRD, epics, architecture |
|
||||
| Quality Planning | Run `*framework`, `*test-design`, `*nfr-assess` | Review guidance, align compliance requirements | Harness scaffold, risk + coverage plan, NFR documentation |
|
||||
| Pipeline Enablement | Configure `*ci` | Coordinate secrets, pipeline approvals | `.github/workflows/test.yml`, helper scripts |
|
||||
| Execution | Enforce `*atdd`, `*automate`, `*trace` per story | Implement stories, resolve TEA findings | Tests, fixtures, coverage matrices |
|
||||
| Release | Run `*gate` | Capture sign-offs, archive artifacts | Updated assessments, gate YAML, audit trail |
|
||||
| Phase | Test Architect | Dev / Team | Outputs |
|
||||
| ------------------- | ---------------------------------------------------------------- | ---------------------------------------------- | ---------------------------------------------------------- |
|
||||
| Strategic Planning | - | Analyst/PM/Architect standard workflows | Enterprise-grade PRD, epics, architecture |
|
||||
| Quality Planning | Run `*framework`, `*test-design`, `*nfr-assess` | Review guidance, align compliance requirements | Harness scaffold, risk + coverage plan, NFR documentation |
|
||||
| Pipeline Enablement | Configure `*ci` | Coordinate secrets, pipeline approvals | `.github/workflows/test.yml`, helper scripts |
|
||||
| Execution | Enforce `*atdd`, `*automate`, `*test-review`, `*trace` per story | Implement stories, resolve TEA findings | Tests, fixtures, quality reports, coverage matrices |
|
||||
| Release | (Optional) `*test-review` for final audit, Run `*gate` | Capture sign-offs, archive artifacts | Quality audit, updated assessments, gate YAML, audit trail |
|
||||
|
||||
<details>
|
||||
<summary>Execution Notes</summary>
|
||||
|
||||
- Use `*atdd` for every story when feasible so acceptance tests lead implementation in regulated environments.
|
||||
- `*ci` scaffolds selective testing scripts, burn-in jobs, caching, and notifications for long-running suites.
|
||||
- Prior to release, rerun coverage (`*trace`, `*automate`) and formalize the decision in `*gate`; store everything for audits. Call `*nfr-assess` here if compliance/performance requirements weren't captured during planning.
|
||||
- Enforce `*test-review` per story or sprint to maintain quality standards and ensure compliance with testing best practices.
|
||||
- Prior to release, rerun coverage (`*trace`, `*automate`), perform final quality audit with `*test-review`, and formalize the decision in `*gate`; store everything for audits. Call `*nfr-assess` here if compliance/performance requirements weren't captured during planning.
|
||||
|
||||
</details>
|
||||
|
||||
@@ -116,23 +208,26 @@ last-redoc-date: 2025-09-30
|
||||
1. **Strategic Planning:** Analyst/PM/Architect complete PRD, epics, and architecture using the standard workflows.
|
||||
2. **Quality Planning:** TEA runs `*framework`, `*test-design`, and `*nfr-assess` to establish mitigations, coverage, and NFR targets.
|
||||
3. **Pipeline Setup:** TEA configures CI via `*ci` with selective execution scripts.
|
||||
4. **Execution:** For each story, TEA enforces `*atdd`, `*automate`, and `*trace`; Dev teams iterate on the findings.
|
||||
5. **Release:** TEA re-checks coverage and logs the final gate decision via `*gate`, archiving artifacts for compliance.
|
||||
4. **Execution:** For each story, TEA enforces `*atdd`, `*automate`, `*test-review`, and `*trace`; Dev teams iterate on the findings.
|
||||
5. **Release:** TEA re-checks coverage, performs final quality audit with `*test-review`, and logs the final gate decision via `*gate`, archiving artifacts for compliance.
|
||||
|
||||
</details>
|
||||
|
||||
## Command Catalog
|
||||
|
||||
| Command | Task File | Primary Outputs | Notes |
|
||||
| -------------- | ------------------------------------------------ | ------------------------------------------------------------------- | ------------------------------------------------ |
|
||||
| `*framework` | `workflows/testarch/framework/instructions.md` | Playwright/Cypress scaffold, `.env.example`, `.nvmrc`, sample specs | Use when no production-ready harness exists |
|
||||
| `*atdd` | `workflows/testarch/atdd/instructions.md` | Failing acceptance tests + implementation checklist | Requires approved story + harness |
|
||||
| `*automate` | `workflows/testarch/automate/instructions.md` | Prioritized specs, fixtures, README/script updates, DoD summary | Avoid duplicate coverage (see priority matrix) |
|
||||
| `*ci` | `workflows/testarch/ci/instructions.md` | CI workflow, selective test scripts, secrets checklist | Platform-aware (GitHub Actions default) |
|
||||
| `*test-design` | `workflows/testarch/test-design/instructions.md` | Combined risk assessment, mitigation plan, and coverage strategy | Handles risk scoring and test design in one pass |
|
||||
| `*trace` | `workflows/testarch/trace/instructions.md` | Coverage matrix, recommendations, gate snippet | Requires access to story/tests repositories |
|
||||
| `*nfr-assess` | `workflows/testarch/nfr-assess/instructions.md` | NFR assessment report with actions | Focus on security/performance/reliability |
|
||||
| `*gate` | `workflows/testarch/gate/instructions.md` | Gate YAML + summary (PASS/CONCERNS/FAIL/WAIVED) | Deterministic decision rules + rationale |
|
||||
| Command | Workflow README | Primary Outputs | Notes |
|
||||
| -------------- | ------------------------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------ |
|
||||
| `*framework` | [📖](../workflows/testarch/framework/README.md) | Playwright/Cypress scaffold, `.env.example`, `.nvmrc`, sample specs | Use when no production-ready harness exists |
|
||||
| `*ci` | [📖](../workflows/testarch/ci/README.md) | CI workflow, selective test scripts, secrets checklist | Platform-aware (GitHub Actions default) |
|
||||
| `*test-design` | [📖](../workflows/testarch/test-design/README.md) | Combined risk assessment, mitigation plan, and coverage strategy | Handles risk scoring and test design in one pass |
|
||||
| `*atdd` | [📖](../workflows/testarch/atdd/README.md) | Failing acceptance tests + implementation checklist | Requires approved story + harness |
|
||||
| `*automate` | [📖](../workflows/testarch/automate/README.md) | Prioritized specs, fixtures, README/script updates, DoD summary | Avoid duplicate coverage (see priority matrix) |
|
||||
| `*trace` | [📖](../workflows/testarch/trace/README.md) | Coverage matrix, recommendations, gate snippet | Requires access to story/tests repositories |
|
||||
| `*nfr-assess` | [📖](../workflows/testarch/nfr-assess/README.md) | NFR assessment report with actions | Focus on security/performance/reliability |
|
||||
| `*gate` | [📖](../workflows/testarch/gate/README.md) | Gate YAML + summary (PASS/CONCERNS/FAIL/WAIVED) | Deterministic decision rules + rationale |
|
||||
| `*test-review` | [📖](../workflows/testarch/test-review/README.md) | Test quality review report with 0-100 score, violations, fixes | Reviews tests against knowledge base patterns |
|
||||
|
||||
**📖** = Click to view detailed workflow documentation
|
||||
|
||||
<details>
|
||||
<summary>Command Guidance and Context Loading</summary>
|
||||
@@ -144,19 +239,29 @@ last-redoc-date: 2025-09-30
|
||||
|
||||
</details>
|
||||
|
||||
## Workflow Placement
|
||||
## Why TEA is Architecturally Different
|
||||
|
||||
The TEA stack has three tightly-linked layers:
|
||||
TEA is the only BMM agent with its own top-level module directory (`bmm/testarch/`). This intentional design pattern reflects TEA's unique requirements:
|
||||
|
||||
1. **Agent spec (`agents/tea.md`)** – declares the persona, critical actions, and the `run-workflow` entries for every TEA command. Critical actions instruct the agent to load `tea-index.csv` and then fetch only the fragments it needs from `knowledge/` before giving guidance.
|
||||
2. **Knowledge index (`tea-index.csv`)** – catalogues each fragment with tags and file paths. Workflows call out the IDs they need (e.g., `risk-governance`, `fixture-architecture`) so the agent loads targeted guidance instead of a monolithic brief.
|
||||
3. **Workflows (`workflows/testarch/*`)** – contain the task flows and reference `tea-index.csv` in their `<flow>`/`<notes>` sections to request specific fragments. Keeping all workflows in this directory ensures consistent discovery during planning (`*framework`), implementation (`*atdd`, `*automate`, `*trace`), and release (`*nfr-assess`, `*gate`).
|
||||
<details>
|
||||
<summary><strong>Unique Architecture Pattern & Rationale</strong></summary>
|
||||
|
||||
This separation lets us expand the knowledge base without touching agent wiring and keeps every command remote-controllable via the standard BMAD workflow runner. As navigation improves, we can add lightweight entrypoints or tags in the index without changing where workflows live.
|
||||
### Directory Structure
|
||||
|
||||
## Appendix
|
||||
```
|
||||
src/modules/bmm/
|
||||
├── agents/
|
||||
│ └── tea.agent.yaml # Agent definition (standard location)
|
||||
├── workflows/
|
||||
│ └── testarch/ # TEA workflows (standard location)
|
||||
└── testarch/ # Knowledge base (UNIQUE!)
|
||||
├── knowledge/ # 19+ reusable test pattern fragments
|
||||
├── tea-index.csv # Centralized knowledge lookup
|
||||
└── README.md # This guide
|
||||
```
|
||||
|
||||
- **Supporting Knowledge:**
|
||||
- `tea-index.csv` – Catalog of knowledge fragments with tags and file paths under `knowledge/` for task-specific loading.
|
||||
- `knowledge/*.md` – Focused summaries (fixtures, network, CI, levels, priorities, etc.) distilled from Murat’s external resources.
|
||||
- `test-resources-for-ai-flat.txt` – Raw 347 KB archive retained for manual deep dives when a fragment needs source validation.
|
||||
### Why TEA Gets Special Treatment
|
||||
|
||||
TEA uniquely requires **extensive domain knowledge** (19+ fragments: test patterns, CI/CD, fixtures, quality practices), a **centralized reference system** (`tea-index.csv` for on-demand fragment loading), and **cross-cutting concerns** (domain-specific patterns vs project-specific artifacts like PRDs/stories). Other BMM agents don't require this architecture.
|
||||
|
||||
</details>
|
||||
|
||||
@@ -12,10 +12,14 @@ This directory houses the per-command workflows used by the Test Architect agent
|
||||
- `trace` – maps requirements to implemented automated tests.
|
||||
- `nfr-assess` – evaluates non-functional requirements.
|
||||
- `gate` – records the release decision in the gate file.
|
||||
- `test-review` – reviews test quality using knowledge base patterns and generates quality score.
|
||||
|
||||
Each subdirectory contains:
|
||||
|
||||
- `instructions.md` – the slim workflow instructions.
|
||||
- `workflow.yaml` – metadata consumed by the BMAD workflow runner.
|
||||
- `README.md` – comprehensive workflow documentation with usage, inputs, outputs, and integration notes.
|
||||
- `instructions.md` – detailed workflow steps in pure markdown v4.0 format.
|
||||
- `workflow.yaml` – metadata, variables, and configuration for BMAD workflow runner.
|
||||
- `checklist.md` – validation checklist for quality assurance and completeness verification.
|
||||
- `template.md` – output template for workflow deliverables (where applicable).
|
||||
|
||||
The TEA agent now invokes these workflows via `run-workflow` rather than executing instruction files directly.
|
||||
|
||||
src/modules/bmm/workflows/testarch/atdd/README.md (new file, 533 lines)
@@ -0,0 +1,533 @@
|
||||
# ATDD (Acceptance Test-Driven Development) Workflow
|
||||
|
||||
Generates failing acceptance tests BEFORE implementation following TDD's red-green-refactor cycle. Creates comprehensive test coverage at appropriate levels (E2E, API, Component) with supporting infrastructure (fixtures, factories, mocks) and provides an implementation checklist to guide development toward passing tests.
|
||||
|
||||
**Core Principle**: Tests fail first (red phase), guide development to green, then enable confident refactoring.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *atdd
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- User story is approved with clear acceptance criteria
|
||||
- Development is about to begin (before any implementation code)
|
||||
- Team is practicing Test-Driven Development (TDD)
|
||||
- Need to establish test-first contract with DEV team
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Story markdown** (`{story_file}`): User story with acceptance criteria, functional requirements, and technical constraints
|
||||
- **Framework configuration**: Test framework config (playwright.config.ts or cypress.config.ts) from framework workflow
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `story_file`: Path to story markdown with acceptance criteria (required)
|
||||
- `test_dir`: Directory for test files (default: `{project-root}/tests`)
|
||||
- `test_framework`: Detected from framework workflow (playwright or cypress)
|
||||
- `test_levels`: Which test levels to generate (default: "e2e,api,component")
|
||||
- `primary_level`: Primary test level for acceptance criteria (default: "e2e")
|
||||
- `start_failing`: Tests must fail initially - red phase (default: true)
|
||||
- `use_given_when_then`: BDD-style test structure (default: true)
|
||||
- `network_first`: Route interception before navigation to prevent race conditions (default: true)
|
||||
- `one_assertion_per_test`: Atomic test design (default: true)
|
||||
- `generate_factories`: Create data factory stubs using faker (default: true)
|
||||
- `generate_fixtures`: Create fixture architecture with auto-cleanup (default: true)
|
||||
- `auto_cleanup`: Fixtures clean up their data automatically (default: true)
|
||||
- `include_data_testids`: List required data-testid attributes for DEV (default: true)
|
||||
- `include_mock_requirements`: Document mock/stub needs (default: true)
|
||||
- `auto_load_knowledge`: Load fixture-architecture, data-factories, component-tdd fragments (default: true)
|
||||
- `share_with_dev`: Provide implementation checklist to DEV agent (default: true)
|
||||
- `output_checklist`: Path for implementation checklist (default: `{output_folder}/atdd-checklist-{story_id}.md`)
|
||||
|
||||
**Optional Context:**
|
||||
|
||||
- **Test design document**: For risk/priority context alignment (P0-P3 scenarios)
|
||||
- **Existing fixtures/helpers**: For consistency with established patterns
|
||||
- **Architecture documents**: For understanding system boundaries and integration points
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
- **ATDD Checklist** (`atdd-checklist-{story_id}.md`): Implementation guide containing:
|
||||
- Story summary and acceptance criteria breakdown
|
||||
- Test files created with paths and line counts
|
||||
- Data factories created with patterns
|
||||
- Fixtures created with auto-cleanup logic
|
||||
- Mock requirements for external services
|
||||
- Required data-testid attributes list
|
||||
- Implementation checklist mapping tests to code tasks
|
||||
- Red-green-refactor workflow guidance
|
||||
- Execution commands for running tests
|
||||
|
||||
**Test Files Created:**
|
||||
|
||||
- **E2E tests** (`tests/e2e/{feature-name}.spec.ts`): Full user journey tests for critical paths
|
||||
- **API tests** (`tests/api/{feature-name}.api.spec.ts`): Business logic and service contract tests
|
||||
- **Component tests** (`tests/component/{ComponentName}.test.tsx`): UI component behavior tests
|
||||
|
||||
**Supporting Infrastructure:**
|
||||
|
||||
- **Data factories** (`tests/support/factories/{entity}.factory.ts`): Factory functions using @faker-js/faker for generating test data with overrides support
|
||||
- **Test fixtures** (`tests/support/fixtures/{feature}.fixture.ts`): Playwright fixtures with setup/teardown and auto-cleanup
|
||||
- **Mock/stub documentation**: Requirements for external service mocking (payment gateways, email services, etc.)
|
||||
- **data-testid requirements**: List of required test IDs for stable selectors in UI implementation
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- All tests must fail initially (red phase verified by local test run)
|
||||
- Failure messages are clear and actionable
|
||||
- Tests use Given-When-Then format for readability
|
||||
- Network-first pattern applied (route interception before navigation)
|
||||
- One assertion per test (atomic test design)
|
||||
- No hard waits or sleeps (explicit waits only)
|
||||
|
||||
## Key Features
|
||||
|
||||
### Red-Green-Refactor Cycle
|
||||
|
||||
**RED Phase** (TEA Agent responsibility):
|
||||
|
||||
- Write failing tests first defining expected behavior
|
||||
- Tests fail for the right reason (missing implementation, not test bugs)
|
||||
- All supporting infrastructure (factories, fixtures, mocks) created
|
||||
|
||||
**GREEN Phase** (DEV Agent responsibility):
|
||||
|
||||
- Implement minimal code to pass one test at a time
|
||||
- Use implementation checklist as guide
|
||||
- Run tests frequently to verify progress
|
||||
|
||||
**REFACTOR Phase** (DEV Agent responsibility):
|
||||
|
||||
- Improve code quality with confidence (tests provide safety net)
|
||||
- Extract duplications, optimize performance
|
||||
- Ensure tests still pass after changes
|
||||
|
||||
### Test Level Selection Framework
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys (login, checkout, core workflows)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- Characteristics: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Business logic validation
|
||||
- Service contracts and data transformations
|
||||
- Backend integration without UI
|
||||
- Characteristics: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing (click, hover, keyboard navigation)
|
||||
- Visual regression and state management
|
||||
- Characteristics: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Pure business logic and algorithms
|
||||
- Edge cases and error handling
|
||||
- Minimal dependencies
|
||||
- Characteristics: Fastest, most granular
|
||||
|
||||
**Selection Strategy**: Avoid duplicate coverage. Use E2E for critical happy path, API for business logic variations, component for UI edge cases, unit for pure logic.
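
As an illustration of this split, a sketch that keeps the happy path at the E2E level and pushes input variations down to the API level (endpoints and selectors are assumptions):

```typescript
import { test, expect } from '@playwright/test';

// E2E: one critical happy path only
test('user can log in and reach the dashboard', async ({ page }) => {
  await page.goto('/login');
  await page.fill('[data-testid="email-input"]', 'user@example.com');
  await page.fill('[data-testid="password-input"]', 'correct-password');
  await page.click('[data-testid="login-button"]');
  await expect(page).toHaveURL('/dashboard');
});

// API: business-logic variations are cheaper and more stable here than at E2E
test('login rejects a malformed email', async ({ request }) => {
  const response = await request.post('/api/auth/login', {
    data: { email: 'not-an-email', password: 'irrelevant' },
  });
  expect(response.status()).toBe(400);
});
```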
|
||||
|
||||
### Given-When-Then Structure
|
||||
|
||||
All tests follow BDD format for clarity:
|
||||
|
||||
```typescript
|
||||
test('should display error for invalid credentials', async ({ page }) => {
|
||||
// GIVEN: User is on login page
|
||||
await page.goto('/login');
|
||||
|
||||
// WHEN: User submits invalid credentials
|
||||
await page.fill('[data-testid="email-input"]', 'invalid@example.com');
|
||||
await page.fill('[data-testid="password-input"]', 'wrongpassword');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// THEN: Error message is displayed
|
||||
await expect(page.locator('[data-testid="error-message"]')).toHaveText('Invalid email or password');
|
||||
});
|
||||
```
|
||||
|
||||
### Network-First Testing Pattern
|
||||
|
||||
**Critical pattern to prevent race conditions**:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Intercept BEFORE navigation
|
||||
await page.route('**/api/data', handler);
|
||||
await page.goto('/page');
|
||||
|
||||
// ❌ WRONG: Navigate then intercept (race condition)
|
||||
await page.goto('/page');
|
||||
await page.route('**/api/data', handler); // Too late!
|
||||
```
|
||||
|
||||
Always set up route interception before navigating to pages that make network requests.
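
Building on this, a sketch that combines route interception with the factory pattern described in the next section (the `/api/users` endpoint, response shape, and import path are assumptions):

```typescript
import { test, expect } from '@playwright/test';
import { createUsers } from '../support/factories/user.factory';

test('renders the users returned by the API', async ({ page }) => {
  const users = createUsers(3);

  // Intercept BEFORE navigation so the very first request is already stubbed
  await page.route('**/api/users', (route) => route.fulfill({ status: 200, json: users }));

  await page.goto('/users');
  await expect(page.locator('[data-testid="user-row"]')).toHaveCount(3);
});
```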
|
||||
|
||||
### Data Factory Architecture
|
||||
|
||||
Use faker for all test data generation:
|
||||
|
||||
```typescript
|
||||
// tests/support/factories/user.factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export const createUser = (overrides = {}) => ({
|
||||
id: faker.number.int(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
createdAt: faker.date.recent().toISOString(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
|
||||
```
|
||||
|
||||
**Factory principles:**
|
||||
|
||||
- Use faker for random data (no hardcoded values to prevent collisions)
|
||||
- Support overrides for specific test scenarios
|
||||
- Generate complete valid objects matching API contracts
|
||||
- Include helper functions for bulk creation
|
||||
|
||||
### Fixture Architecture with Auto-Cleanup
|
||||
|
||||
Playwright fixtures with automatic data cleanup:
|
||||
|
||||
```typescript
|
||||
// tests/support/fixtures/auth.fixture.ts
import { test as base } from '@playwright/test';
// createUser comes from the factory shown above; deleteUser is an assumed cleanup
// helper (e.g., an API call) exported alongside it.
import { createUser, deleteUser } from '../factories/user.factory';

export const test = base.extend({
  authenticatedUser: async ({ page }, use) => {
    // Setup: Create and authenticate user
    const user = await createUser();
    await page.goto('/login');
    await page.fill('[data-testid="email"]', user.email);
    await page.fill('[data-testid="password"]', 'password123');
    await page.click('[data-testid="login-button"]');
    await page.waitForURL('/dashboard');

    // Provide to test
    await use(user);

    // Cleanup: Delete user (automatic)
    await deleteUser(user.id);
  },
});
|
||||
```
|
||||
|
||||
**Fixture principles:**
|
||||
|
||||
- Auto-cleanup (always delete created data in teardown)
|
||||
- Composable (fixtures can build on other fixtures, and test objects can be combined via `mergeTests`; see the sketch below)
|
||||
- Isolated (each test gets fresh data)
|
||||
- Type-safe with TypeScript
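
The composability bullet above can be illustrated with Playwright's `mergeTests`, which combines the `test` objects exported by several fixture files into one (a sketch; the second fixture file is an assumption):

```typescript
// tests/support/fixtures/merged.fixture.ts
import { mergeTests } from '@playwright/test';
import { test as authTest } from './auth.fixture';
import { test as dataTest } from './data.fixture'; // hypothetical second fixture file

// Tests that import this `test` get fixtures from both files
export const test = mergeTests(authTest, dataTest);
```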
|
||||
|
||||
### One Assertion Per Test (Atomic Design)
|
||||
|
||||
Each test should verify exactly one behavior:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: One assertion
|
||||
test('should display user name', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Multiple assertions (not atomic)
|
||||
test('should display user info', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
|
||||
});
|
||||
```
|
||||
|
||||
**Why?** If the second assertion fails, you don't know whether the first is still valid. Splitting into separate tests gives clear failure diagnosis.
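
When splitting related checks like this, a shared `beforeEach` keeps the setup in one place so the extra tests stay cheap (a sketch; route and selectors are assumptions):

```typescript
import { test, expect } from '@playwright/test';

test.describe('user profile header', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('/profile');
  });

  test('should display user name', async ({ page }) => {
    await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
  });

  test('should display user email', async ({ page }) => {
    await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
  });
});
```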
|
||||
|
||||
### Implementation Checklist for DEV
|
||||
|
||||
Maps each failing test to concrete implementation tasks:
|
||||
|
||||
```markdown
|
||||
## Implementation Checklist
|
||||
|
||||
### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add `data-testid` attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Implement error handling
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
```
|
||||
|
||||
Provides clear path from red to green for each test.
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before this workflow:**
|
||||
|
||||
- **framework** workflow: Must run first to establish test framework architecture (Playwright or Cypress config, directory structure, base fixtures)
|
||||
- **test-design** workflow: Optional but recommended for P0-P3 priority alignment and risk assessment context
|
||||
|
||||
**After this workflow:**
|
||||
|
||||
- **DEV agent** implements features guided by failing tests and implementation checklist
|
||||
- **test-review** workflow: Review generated test quality before sharing with DEV team
|
||||
- **automate** workflow: After story completion, expand regression suite with additional edge case coverage
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **Story approval process**: ATDD runs after story is approved but before DEV begins implementation
|
||||
- **Quality gates**: Failing tests serve as acceptance criteria for story completion (all tests must pass)
|
||||
|
||||
## Important Notes
|
||||
|
||||
### ATDD is Test-First, Not Test-After
|
||||
|
||||
**Critical timing**: Tests must be written BEFORE any implementation code. This ensures:
|
||||
|
||||
- Tests define the contract (what needs to be built)
|
||||
- Implementation is guided by tests (no over-engineering)
|
||||
- Tests verify behavior, not implementation details
|
||||
- Confidence in refactoring (tests catch regressions)
|
||||
|
||||
### All Tests Must Fail Initially
|
||||
|
||||
**Red phase verification is mandatory**:
|
||||
|
||||
- Run tests locally after creation to confirm RED phase
|
||||
- Failure should be due to missing implementation, not test bugs
|
||||
- Failure messages should be clear and actionable
|
||||
- Document expected failure messages in ATDD checklist
|
||||
|
||||
If a test passes before implementation, it's not testing the right thing.
|
||||
|
||||
### Use data-testid for Stable Selectors
|
||||
|
||||
**Why data-testid?**
|
||||
|
||||
- CSS classes change frequently (styling refactors)
|
||||
- IDs may not be unique or stable
|
||||
- Text content changes with localization
|
||||
- data-testid is an explicit contract between tests and UI
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Stable selector
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// ❌ FRAGILE: Class-based selector
|
||||
await page.click('.btn.btn-primary.login-btn');
|
||||
```
|
||||
|
||||
ATDD checklist includes complete list of required data-testid attributes for DEV team.
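
Playwright also ships a dedicated locator for this convention: `page.getByTestId()` reads the attribute configured as `testIdAttribute` in the Playwright config, which defaults to `data-testid`, so the two calls below target the same element:

```typescript
// Equivalent ways to click the same element
await page.click('[data-testid="login-button"]');
await page.getByTestId('login-button').click();
```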
|
||||
|
||||
### No Hard Waits or Sleeps
|
||||
|
||||
**Use explicit waits only**:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Explicit wait for condition
|
||||
await page.waitForSelector('[data-testid="user-name"]');
|
||||
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();
|
||||
|
||||
// ❌ WRONG: Hard wait (flaky, slow)
|
||||
await page.waitForTimeout(2000);
|
||||
```
|
||||
|
||||
Playwright's auto-waiting is preferred (expect() automatically waits up to timeout).
|
||||
|
||||
### Component Tests for Complex UI Only
|
||||
|
||||
**When to use component tests:**
|
||||
|
||||
- Complex UI interactions (drag-drop, keyboard navigation)
|
||||
- Form validation logic
|
||||
- State management within component
|
||||
- Visual edge cases
|
||||
|
||||
**When NOT to use:**
|
||||
|
||||
- Simple rendering (snapshot tests are sufficient)
|
||||
- Integration with backend (use E2E or API tests)
|
||||
- Full user journeys (use E2E tests)
|
||||
|
||||
Component tests are valuable but should complement, not replace, E2E and API tests.
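
For reference, a sketch of what a component-level test can look like with Playwright Component Testing (`@playwright/experimental-ct-react`, referenced in the knowledge base below); the `LoginForm` component and its behavior are assumptions:

```tsx
// tests/component/LoginForm.test.tsx
import { test, expect } from '@playwright/experimental-ct-react';
import { LoginForm } from '../../src/components/LoginForm'; // hypothetical component

test('enables submit once both fields are filled', async ({ mount }) => {
  const component = await mount(<LoginForm />);

  await component.getByTestId('email-input').fill('user@example.com');
  await component.getByTestId('password-input').fill('secret');

  await expect(component.getByTestId('login-button')).toBeEnabled();
});
```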
|
||||
|
||||
### Auto-Cleanup is Non-Negotiable
|
||||
|
||||
**Every test must clean up its data**:
|
||||
|
||||
- Use fixtures with automatic teardown
|
||||
- Never leave test data in database/storage
|
||||
- Each test should be isolated (no shared state)
|
||||
|
||||
**Cleanup patterns:**
|
||||
|
||||
- Fixtures: Cleanup in teardown function
|
||||
- Factories: Provide deletion helpers
|
||||
- Tests: Use `test.afterEach()` for manual cleanup if needed
|
||||
|
||||
Without auto-cleanup, tests become flaky and depend on execution order.
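
For the rare case where a test creates data outside a fixture, a sketch of manual cleanup with `test.afterEach` (the endpoint is an assumption):

```typescript
import { test } from '@playwright/test';

// Ids of records created directly by tests (outside fixtures)
const createdUserIds: number[] = [];

test.afterEach(async ({ request }) => {
  // Remove anything the test created, regardless of pass or fail
  for (const id of createdUserIds) {
    await request.delete(`/api/users/${id}`); // endpoint assumed
  }
  createdUserIds.length = 0;
});
```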
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's test.extend()
|
||||
- **data-factories.md** - Factory patterns using @faker-js/faker for random test data generation with overrides support
|
||||
- **component-tdd.md** - Component test strategies using Playwright Component Testing (@playwright/experimental-ct-react)
|
||||
- **network-first.md** - Route interception patterns (intercept before navigation to prevent race conditions)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, one assertion per test, determinism, isolation)
|
||||
- **test-levels-framework.md** - Test level selection framework (E2E vs API vs Component vs Unit)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping and additional references.
|
||||
|
||||
## Example Output
|
||||
|
||||
After running this workflow, the ATDD checklist will contain:
|
||||
|
||||
````markdown
|
||||
# ATDD Checklist - Epic 3, Story 5: User Authentication
|
||||
|
||||
## Story Summary
|
||||
|
||||
As a user, I want to log in with email and password so that I can access my personalized dashboard.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
1. User can log in with valid credentials
|
||||
2. User sees error message with invalid credentials
|
||||
3. User is redirected to dashboard after successful login
|
||||
|
||||
## Failing Tests Created (RED Phase)
|
||||
|
||||
### E2E Tests (3 tests)
|
||||
|
||||
- `tests/e2e/user-authentication.spec.ts` (87 lines)
|
||||
- ✅ should log in with valid credentials (RED - missing /login route)
|
||||
- ✅ should display error for invalid credentials (RED - error message not implemented)
|
||||
- ✅ should redirect to dashboard after login (RED - redirect logic missing)
|
||||
|
||||
### API Tests (2 tests)
|
||||
|
||||
- `tests/api/auth.api.spec.ts` (54 lines)
|
||||
- ✅ POST /api/auth/login - should return token for valid credentials (RED - endpoint not implemented)
|
||||
- ✅ POST /api/auth/login - should return 401 for invalid credentials (RED - validation missing)
|
||||
|
||||
## Data Factories Created
|
||||
|
||||
- `tests/support/factories/user.factory.ts` - createUser(), createUsers(count)
|
||||
|
||||
## Fixtures Created
|
||||
|
||||
- `tests/support/fixtures/auth.fixture.ts` - authenticatedUser fixture with auto-cleanup
|
||||
|
||||
## Required data-testid Attributes
|
||||
|
||||
### Login Page
|
||||
|
||||
- `email-input` - Email input field
|
||||
- `password-input` - Password input field
|
||||
- `login-button` - Submit button
|
||||
- `error-message` - Error message container
|
||||
|
||||
### Dashboard Page
|
||||
|
||||
- `user-name` - User name display
|
||||
- `logout-button` - Logout button
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add data-testid attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
### Test: Display Error for Invalid Credentials
|
||||
|
||||
- [ ] Add error state management
|
||||
- [ ] Display error message UI
|
||||
- [ ] Add `data-testid="error-message"`
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
### Test: Redirect to Dashboard After Login
|
||||
|
||||
- [ ] Implement redirect logic after successful auth
|
||||
- [ ] Verify authentication token stored
|
||||
- [ ] Add dashboard route protection
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests
|
||||
npm run test:e2e
|
||||
|
||||
# Run specific test file
|
||||
npm run test:e2e -- user-authentication.spec.ts
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
npm run test:e2e -- --headed
|
||||
|
||||
# Debug specific test
|
||||
npm run test:e2e -- user-authentication.spec.ts --debug
|
||||
```
|
||||
|
||||
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
**RED Phase** (Complete):
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created
|
||||
- ✅ data-testid requirements documented
|
||||
|
||||
**GREEN Phase** (DEV Team - Next Steps):
|
||||
|
||||
1. Pick one failing test from checklist
|
||||
2. Implement minimal code to make it pass
|
||||
3. Run test to verify green
|
||||
4. Check off task in checklist
|
||||
5. Move to next test
|
||||
6. Repeat until all tests pass
|
||||
|
||||
**REFACTOR Phase** (DEV Team - After All Tests Pass):
|
||||
|
||||
1. All tests passing (green)
|
||||
2. Improve code quality (extract functions, optimize)
|
||||
3. Remove duplications
|
||||
4. Ensure tests still pass after each refactor
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Review this checklist with team
|
||||
2. Run failing tests to confirm RED phase: `npm run test:e2e`
|
||||
3. Begin implementation using checklist as guide
|
||||
4. Share progress in daily standup
|
||||
5. When all tests pass, run `bmad sm story-approved` to move story to DONE
|
||||
|
||||
````
|
||||
|
||||
This comprehensive checklist guides the DEV team from red to green with clear tasks and validation steps.
|
||||
@@ -0,0 +1,363 @@
|
||||
# ATDD Checklist - Epic {epic_num}, Story {story_num}: {story_title}
|
||||
|
||||
**Date:** {date}
|
||||
**Author:** {user_name}
|
||||
**Primary Test Level:** {primary_level}
|
||||
|
||||
---
|
||||
|
||||
## Story Summary
|
||||
|
||||
{Brief 2-3 sentence summary of the user story}
|
||||
|
||||
**As a** {user_role}
|
||||
**I want** {feature_description}
|
||||
**So that** {business_value}
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
{List all testable acceptance criteria from the story}
|
||||
|
||||
1. {Acceptance criterion 1}
|
||||
2. {Acceptance criterion 2}
|
||||
3. {Acceptance criterion 3}
|
||||
|
||||
---
|
||||
|
||||
## Failing Tests Created (RED Phase)
|
||||
|
||||
### E2E Tests ({e2e_test_count} tests)
|
||||
|
||||
**File:** `{e2e_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each E2E test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
### API Tests ({api_test_count} tests)
|
||||
|
||||
**File:** `{api_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each API test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
### Component Tests ({component_test_count} tests)
|
||||
|
||||
**File:** `{component_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each component test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
---
|
||||
|
||||
## Data Factories Created
|
||||
|
||||
{List all data factory files created with their exports}
|
||||
|
||||
### {Entity} Factory
|
||||
|
||||
**File:** `tests/support/factories/{entity}.factory.ts`
|
||||
|
||||
**Exports:**
|
||||
|
||||
- `create{Entity}(overrides?)` - Create single entity with optional overrides
|
||||
- `create{Entity}s(count)` - Create array of entities
|
||||
|
||||
**Example Usage:**
|
||||
|
||||
```typescript
|
||||
const user = createUser({ email: 'specific@example.com' });
|
||||
const users = createUsers(5); // Generate 5 random users
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Fixtures Created
|
||||
|
||||
{List all test fixture files created with their fixture names and descriptions}
|
||||
|
||||
### {Feature} Fixtures
|
||||
|
||||
**File:** `tests/support/fixtures/{feature}.fixture.ts`
|
||||
|
||||
**Fixtures:**
|
||||
|
||||
- `{fixtureName}` - {description_of_what_fixture_provides}
|
||||
- **Setup:** {what_setup_does}
|
||||
- **Provides:** {what_test_receives}
|
||||
- **Cleanup:** {what_cleanup_does}
|
||||
|
||||
**Example Usage:**
|
||||
|
||||
```typescript
|
||||
import { test } from './fixtures/{feature}.fixture';
|
||||
|
||||
test('should do something', async ({ {fixtureName} }) => {
|
||||
// {fixtureName} is ready to use with auto-cleanup
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Mock Requirements
|
||||
|
||||
{Document external services that need mocking and their requirements}
|
||||
|
||||
### {Service Name} Mock
|
||||
|
||||
**Endpoint:** `{HTTP_METHOD} {endpoint_url}`
|
||||
|
||||
**Success Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
{success_response_example}
|
||||
}
|
||||
```
|
||||
|
||||
**Failure Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
{failure_response_example}
|
||||
}
|
||||
```
|
||||
|
||||
**Notes:** {any_special_mock_requirements}
|
||||
|
||||
---
|
||||
|
||||
## Required data-testid Attributes
|
||||
|
||||
{List all data-testid attributes required in UI implementation for test stability}
|
||||
|
||||
### {Page or Component Name}
|
||||
|
||||
- `{data-testid-name}` - {description_of_element}
|
||||
- `{data-testid-name}` - {description_of_element}
|
||||
|
||||
**Implementation Example:**
|
||||
|
||||
```tsx
|
||||
<button data-testid="login-button">Log In</button>
|
||||
<input data-testid="email-input" type="email" />
|
||||
<div data-testid="error-message">{errorText}</div>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
{Map each failing test to concrete implementation tasks that will make it pass}
|
||||
|
||||
### Test: {test_name_1}
|
||||
|
||||
**File:** `{test_file_path}`
|
||||
|
||||
**Tasks to make this test pass:**
|
||||
|
||||
- [ ] {Implementation task 1}
|
||||
- [ ] {Implementation task 2}
|
||||
- [ ] {Implementation task 3}
|
||||
- [ ] Add required data-testid attributes: {list_of_testids}
|
||||
- [ ] Run test: `{test_execution_command}`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
**Estimated Effort:** {effort_estimate} hours
|
||||
|
||||
---
|
||||
|
||||
### Test: {test_name_2}
|
||||
|
||||
**File:** `{test_file_path}`
|
||||
|
||||
**Tasks to make this test pass:**
|
||||
|
||||
- [ ] {Implementation task 1}
|
||||
- [ ] {Implementation task 2}
|
||||
- [ ] {Implementation task 3}
|
||||
- [ ] Add required data-testid attributes: {list_of_testids}
|
||||
- [ ] Run test: `{test_execution_command}`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
**Estimated Effort:** {effort_estimate} hours
|
||||
|
||||
---
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests for this story
|
||||
{test_command_all}
|
||||
|
||||
# Run specific test file
|
||||
{test_command_specific_file}
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
{test_command_headed}
|
||||
|
||||
# Debug specific test
|
||||
{test_command_debug}
|
||||
|
||||
# Run tests with coverage
|
||||
{test_command_coverage}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
### RED Phase (Complete) ✅
|
||||
|
||||
**TEA Agent Responsibilities:**
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created with auto-cleanup
|
||||
- ✅ Mock requirements documented
|
||||
- ✅ data-testid requirements listed
|
||||
- ✅ Implementation checklist created
|
||||
|
||||
**Verification:**
|
||||
|
||||
- All tests run and fail as expected
|
||||
- Failure messages are clear and actionable
|
||||
- Tests fail due to missing implementation, not test bugs
|
||||
|
||||
---
|
||||
|
||||
### GREEN Phase (DEV Team - Next Steps)
|
||||
|
||||
**DEV Agent Responsibilities:**
|
||||
|
||||
1. **Pick one failing test** from implementation checklist (start with highest priority)
|
||||
2. **Read the test** to understand expected behavior
|
||||
3. **Implement minimal code** to make that specific test pass
|
||||
4. **Run the test** to verify it now passes (green)
|
||||
5. **Check off the task** in implementation checklist
|
||||
6. **Move to next test** and repeat
|
||||
|
||||
**Key Principles:**
|
||||
|
||||
- One test at a time (don't try to fix all at once)
|
||||
- Minimal implementation (don't over-engineer)
|
||||
- Run tests frequently (immediate feedback)
|
||||
- Use implementation checklist as roadmap
|
||||
|
||||
**Progress Tracking:**
|
||||
|
||||
- Check off tasks as you complete them
|
||||
- Share progress in daily standup
|
||||
- Mark story as IN PROGRESS in `bmm-workflow-status.md`
|
||||
|
||||
---
|
||||
|
||||
### REFACTOR Phase (DEV Team - After All Tests Pass)
|
||||
|
||||
**DEV Agent Responsibilities:**
|
||||
|
||||
1. **Verify all tests pass** (green phase complete)
|
||||
2. **Review code for quality** (readability, maintainability, performance)
|
||||
3. **Extract duplications** (DRY principle)
|
||||
4. **Optimize performance** (if needed)
|
||||
5. **Ensure tests still pass** after each refactor
|
||||
6. **Update documentation** (if API contracts change)
|
||||
|
||||
**Key Principles:**
|
||||
|
||||
- Tests provide safety net (refactor with confidence)
|
||||
- Make small refactors (easier to debug if tests fail)
|
||||
- Run tests after each change
|
||||
- Don't change test behavior (only implementation)
|
||||
|
||||
**Completion:**
|
||||
|
||||
- All tests pass
|
||||
- Code quality meets team standards
|
||||
- No duplications or code smells
|
||||
- Ready for code review and story approval
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Review this checklist** with team in standup or planning
|
||||
2. **Run failing tests** to confirm RED phase: `{test_command_all}`
|
||||
3. **Begin implementation** using implementation checklist as guide
|
||||
4. **Work one test at a time** (red → green for each)
|
||||
5. **Share progress** in daily standup
|
||||
6. **When all tests pass**, refactor code for quality
|
||||
7. **When refactoring complete**, run `bmad sm story-approved` to move story to DONE
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References Applied
|
||||
|
||||
This ATDD workflow consulted the following knowledge fragments:
|
||||
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's `test.extend()`
|
||||
- **data-factories.md** - Factory patterns using `@faker-js/faker` for random test data generation with overrides support
|
||||
- **component-tdd.md** - Component test strategies using Playwright Component Testing
|
||||
- **network-first.md** - Route interception patterns (intercept BEFORE navigation to prevent race conditions)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, one assertion per test, determinism, isolation)
|
||||
- **test-levels-framework.md** - Test level selection framework (E2E vs API vs Component vs Unit)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping.
|
||||
|
||||
---
|
||||
|
||||
## Test Execution Evidence
|
||||
|
||||
### Initial Test Run (RED Phase Verification)
|
||||
|
||||
**Command:** `{test_command_all}`
|
||||
|
||||
**Results:**
|
||||
|
||||
```
|
||||
{paste_test_run_output_showing_all_tests_failing}
|
||||
```
|
||||
|
||||
**Summary:**
|
||||
|
||||
- Total tests: {total_test_count}
|
||||
- Passing: 0 (expected)
|
||||
- Failing: {total_test_count} (expected)
|
||||
- Status: ✅ RED phase verified
|
||||
|
||||
**Expected Failure Messages:**
|
||||
{list_expected_failure_messages_for_each_test}
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
{Any additional notes, context, or special considerations for this story}
|
||||
|
||||
- {Note 1}
|
||||
- {Note 2}
|
||||
- {Note 3}
|
||||
|
||||
---
|
||||
|
||||
## Contact
|
||||
|
||||
**Questions or Issues?**
|
||||
|
||||
- Ask in team standup
|
||||
- Tag @{tea_agent_username} in Slack/Discord
|
||||
- Refer to `testarch/README.md` for workflow documentation
|
||||
- Consult `testarch/knowledge/` for testing best practices
|
||||
|
||||
---
|
||||
|
||||
**Generated by BMad TEA Agent** - {date}
|
||||
373
src/modules/bmm/workflows/testarch/atdd/checklist.md
Normal file
@@ -0,0 +1,373 @@
|
||||
# ATDD Workflow Validation Checklist
|
||||
|
||||
Use this checklist to validate that the ATDD workflow has been executed correctly and all deliverables meet quality standards.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting this workflow, verify:
|
||||
|
||||
- [ ] Story approved with clear acceptance criteria (AC must be testable)
|
||||
- [ ] Development sandbox/environment ready
|
||||
- [ ] Framework scaffolding exists (run `framework` workflow if missing)
|
||||
- [ ] Test framework configuration available (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Package.json has test dependencies installed (Playwright or Cypress)
|
||||
|
||||
**Halt if missing:** Framework scaffolding or story acceptance criteria
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Story Context and Requirements
|
||||
|
||||
- [ ] Story markdown file loaded and parsed successfully
|
||||
- [ ] All acceptance criteria identified and extracted
|
||||
- [ ] Affected systems and components identified
|
||||
- [ ] Technical constraints documented
|
||||
- [ ] Framework configuration loaded (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Test directory structure identified from config
|
||||
- [ ] Existing fixture patterns reviewed for consistency
|
||||
- [ ] Similar test patterns searched and found in `{test_dir}`
|
||||
- [ ] Knowledge base fragments loaded:
|
||||
- [ ] `fixture-architecture.md`
|
||||
- [ ] `data-factories.md`
|
||||
- [ ] `component-tdd.md`
|
||||
- [ ] `network-first.md`
|
||||
- [ ] `test-quality.md`
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Test Level Selection and Strategy
|
||||
|
||||
- [ ] Each acceptance criterion analyzed for appropriate test level
|
||||
- [ ] Test level selection framework applied (E2E vs API vs Component vs Unit)
|
||||
- [ ] E2E tests: Critical user journeys and multi-system integration identified
|
||||
- [ ] API tests: Business logic and service contracts identified
|
||||
- [ ] Component tests: UI component behavior and interactions identified
|
||||
- [ ] Unit tests: Pure logic and edge cases identified (if applicable)
|
||||
- [ ] Duplicate coverage avoided (same behavior not tested at multiple levels unnecessarily)
|
||||
- [ ] Tests prioritized using P0-P3 framework (if test-design document exists)
|
||||
- [ ] Primary test level set in `primary_level` variable (typically E2E or API)
|
||||
- [ ] Test levels documented in ATDD checklist
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Failing Tests Generated
|
||||
|
||||
### Test File Structure Created
|
||||
|
||||
- [ ] Test files organized in appropriate directories:
|
||||
- [ ] `tests/e2e/` for end-to-end tests
|
||||
- [ ] `tests/api/` for API tests
|
||||
- [ ] `tests/component/` for component tests
|
||||
- [ ] `tests/support/` for infrastructure (fixtures, factories, helpers)
|
||||
|
||||
### E2E Tests (If Applicable)
|
||||
|
||||
- [ ] E2E test files created in `tests/e2e/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] Tests use `data-testid` selectors (not CSS classes or fragile selectors)
|
||||
- [ ] One assertion per test (atomic test design)
|
||||
- [ ] No hard waits or sleeps (explicit waits only)
|
||||
- [ ] Network-first pattern applied (route interception BEFORE navigation)
|
||||
- [ ] Tests fail initially (RED phase verified by local test run)
|
||||
- [ ] Failure messages are clear and actionable
|
||||
|
||||
### API Tests (If Applicable)
|
||||
|
||||
- [ ] API test files created in `tests/api/`
|
||||
- [ ] Tests follow Given-When-Then format
|
||||
- [ ] API contracts validated (request/response structure)
|
||||
- [ ] HTTP status codes verified
|
||||
- [ ] Response body validation includes all required fields
|
||||
- [ ] Error cases tested (400, 401, 403, 404, 500)
|
||||
- [ ] Tests fail initially (RED phase verified)
|
||||
|
||||
### Component Tests (If Applicable)
|
||||
|
||||
- [ ] Component test files created in `tests/component/`
|
||||
- [ ] Tests follow Given-When-Then format
|
||||
- [ ] Component mounting works correctly
|
||||
- [ ] Interaction testing covers user actions (click, hover, keyboard)
|
||||
- [ ] State management within component validated
|
||||
- [ ] Props and events tested
|
||||
- [ ] Tests fail initially (RED phase verified)
|
||||
|
||||
### Test Quality Validation
|
||||
|
||||
- [ ] All tests use Given-When-Then structure with clear comments
|
||||
- [ ] All tests have descriptive names explaining what they test
|
||||
- [ ] No duplicate tests (same behavior tested multiple times)
|
||||
- [ ] No flaky patterns (race conditions, timing issues)
|
||||
- [ ] No test interdependencies (tests can run in any order)
|
||||
- [ ] Tests are deterministic (same input always produces same result)
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Data Infrastructure Built
|
||||
|
||||
### Data Factories Created
|
||||
|
||||
- [ ] Factory files created in `tests/support/factories/`
|
||||
- [ ] All factories use `@faker-js/faker` for random data generation (no hardcoded values)
|
||||
- [ ] Factories support overrides for specific test scenarios
|
||||
- [ ] Factories generate complete valid objects matching API contracts
|
||||
- [ ] Helper functions for bulk creation provided (e.g., `createUsers(count)`)
|
||||
- [ ] Factory exports are properly typed (TypeScript)
|
||||
|
||||
### Test Fixtures Created
|
||||
|
||||
- [ ] Fixture files created in `tests/support/fixtures/`
|
||||
- [ ] All fixtures use Playwright's `test.extend()` pattern
|
||||
- [ ] Fixtures have setup phase (arrange test preconditions)
|
||||
- [ ] Fixtures provide data to tests via `await use(data)`
|
||||
- [ ] Fixtures have teardown phase with auto-cleanup (delete created data)
|
||||
- [ ] Fixtures are composable (can use other fixtures if needed)
|
||||
- [ ] Fixtures are isolated (each test gets fresh data)
|
||||
- [ ] Fixtures are type-safe (TypeScript types defined)
|
||||
|
||||
### Mock Requirements Documented
|
||||
|
||||
- [ ] External service mocking requirements identified
|
||||
- [ ] Mock endpoints documented with URLs and methods
|
||||
- [ ] Success response examples provided
|
||||
- [ ] Failure response examples provided
|
||||
- [ ] Mock requirements documented in ATDD checklist for DEV team
|
||||
|
||||
### data-testid Requirements Listed
|
||||
|
||||
- [ ] All required data-testid attributes identified from E2E tests
|
||||
- [ ] data-testid list organized by page or component
|
||||
- [ ] Each data-testid has clear description of element it targets
|
||||
- [ ] data-testid list included in ATDD checklist for DEV team
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Implementation Checklist Created
|
||||
|
||||
- [ ] Implementation checklist created with clear structure
|
||||
- [ ] Each failing test mapped to concrete implementation tasks
|
||||
- [ ] Tasks include:
|
||||
- [ ] Route/component creation
|
||||
- [ ] Business logic implementation
|
||||
- [ ] API integration
|
||||
- [ ] data-testid attribute additions
|
||||
- [ ] Error handling
|
||||
- [ ] Test execution command
|
||||
- [ ] Completion checkbox
|
||||
- [ ] Red-Green-Refactor workflow documented in checklist
|
||||
- [ ] RED phase marked as complete (TEA responsibility)
|
||||
- [ ] GREEN phase tasks listed for DEV team
|
||||
- [ ] REFACTOR phase guidance provided
|
||||
- [ ] Execution commands provided:
|
||||
- [ ] Run all tests: `npm run test:e2e`
|
||||
- [ ] Run specific test file
|
||||
- [ ] Run in headed mode
|
||||
- [ ] Debug specific test
|
||||
- [ ] Estimated effort included (hours or story points)
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Deliverables Generated
|
||||
|
||||
### ATDD Checklist Document Created
|
||||
|
||||
- [ ] Output file created at `{output_folder}/atdd-checklist-{story_id}.md`
|
||||
- [ ] Document follows template structure from `atdd-checklist-template.md`
|
||||
- [ ] Document includes all required sections:
|
||||
- [ ] Story summary
|
||||
- [ ] Acceptance criteria breakdown
|
||||
- [ ] Failing tests created (paths and line counts)
|
||||
- [ ] Data factories created
|
||||
- [ ] Fixtures created
|
||||
- [ ] Mock requirements
|
||||
- [ ] Required data-testid attributes
|
||||
- [ ] Implementation checklist
|
||||
- [ ] Red-green-refactor workflow
|
||||
- [ ] Execution commands
|
||||
- [ ] Next steps for DEV team
|
||||
|
||||
### All Tests Verified to Fail (RED Phase)
|
||||
|
||||
- [ ] Full test suite run locally before finalizing
|
||||
- [ ] All tests fail as expected (RED phase confirmed)
|
||||
- [ ] No tests passing before implementation (if passing, test is invalid)
|
||||
- [ ] Failure messages documented in ATDD checklist
|
||||
- [ ] Failures are due to missing implementation, not test bugs
|
||||
- [ ] Test run output captured for reference
|
||||
|
||||
### Summary Provided
|
||||
|
||||
- [ ] Summary includes:
|
||||
- [ ] Story ID
|
||||
- [ ] Primary test level
|
||||
- [ ] Test counts (E2E, API, Component)
|
||||
- [ ] Test file paths
|
||||
- [ ] Factory count
|
||||
- [ ] Fixture count
|
||||
- [ ] Mock requirements count
|
||||
- [ ] data-testid count
|
||||
- [ ] Implementation task count
|
||||
- [ ] Estimated effort
|
||||
- [ ] Next steps for DEV team
|
||||
- [ ] Output file path
|
||||
- [ ] Knowledge base references applied
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Test Design Quality
|
||||
|
||||
- [ ] Tests are readable (clear Given-When-Then structure)
|
||||
- [ ] Tests are maintainable (use factories and fixtures, not hardcoded data)
|
||||
- [ ] Tests are isolated (no shared state between tests)
|
||||
- [ ] Tests are deterministic (no race conditions or flaky patterns)
|
||||
- [ ] Tests are atomic (one assertion per test)
|
||||
- [ ] Tests are fast (no unnecessary waits or delays)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] fixture-architecture.md patterns applied to all fixtures
|
||||
- [ ] data-factories.md patterns applied to all factories
|
||||
- [ ] network-first.md patterns applied to E2E tests with network requests
|
||||
- [ ] component-tdd.md patterns applied to component tests
|
||||
- [ ] test-quality.md principles applied to all test design
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] All TypeScript types are correct and complete
|
||||
- [ ] No linting errors in generated test files
|
||||
- [ ] Consistent naming conventions followed
|
||||
- [ ] Imports are organized and correct
|
||||
- [ ] Code follows project style guide
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With DEV Agent
|
||||
|
||||
- [ ] ATDD checklist provides clear implementation guidance
|
||||
- [ ] Implementation tasks are granular and actionable
|
||||
- [ ] data-testid requirements are complete and clear
|
||||
- [ ] Mock requirements include all necessary details
|
||||
- [ ] Execution commands work correctly
|
||||
|
||||
### With Story Workflow
|
||||
|
||||
- [ ] Story ID correctly referenced in output files
|
||||
- [ ] Acceptance criteria from story accurately reflected in tests
|
||||
- [ ] Technical constraints from story considered in test design
|
||||
|
||||
### With Framework Workflow
|
||||
|
||||
- [ ] Test framework configuration correctly detected and used
|
||||
- [ ] Directory structure matches framework setup
|
||||
- [ ] Fixtures and helpers follow established patterns
|
||||
- [ ] Naming conventions consistent with framework standards
|
||||
|
||||
### With test-design Workflow (If Available)
|
||||
|
||||
- [ ] P0 scenarios from test-design prioritized in ATDD
|
||||
- [ ] Risk assessment from test-design considered in test coverage
|
||||
- [ ] Coverage strategy from test-design aligned with ATDD tests
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
All of the following must be true before marking this workflow as complete:
|
||||
|
||||
- [ ] **Story acceptance criteria analyzed** and mapped to appropriate test levels
|
||||
- [ ] **Failing tests created** at all appropriate levels (E2E, API, Component)
|
||||
- [ ] **Given-When-Then format** used consistently across all tests
|
||||
- [ ] **RED phase verified** by local test run (all tests failing as expected)
|
||||
- [ ] **Network-first pattern** applied to E2E tests with network requests
|
||||
- [ ] **Data factories created** using faker (no hardcoded test data)
|
||||
- [ ] **Fixtures created** with auto-cleanup in teardown
|
||||
- [ ] **Mock requirements documented** for external services
|
||||
- [ ] **data-testid attributes listed** for DEV team
|
||||
- [ ] **Implementation checklist created** mapping tests to code tasks
|
||||
- [ ] **Red-green-refactor workflow documented** in ATDD checklist
|
||||
- [ ] **Execution commands provided** and verified to work
|
||||
- [ ] **ATDD checklist document created** and saved to correct location
|
||||
- [ ] **Output file formatted correctly** using template structure
|
||||
- [ ] **Knowledge base references applied** and documented in summary
|
||||
- [ ] **No test quality issues** (flaky patterns, race conditions, hardcoded data)
|
||||
|
||||
---
|
||||
|
||||
## Common Issues and Resolutions
|
||||
|
||||
### Issue: Tests pass before implementation
|
||||
|
||||
**Problem:** A test passes even though no implementation code exists yet.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Review test to ensure it's testing actual behavior, not mocked/stubbed behavior
|
||||
- Check if test is accidentally using existing functionality
|
||||
- Verify test assertions are correct and meaningful
|
||||
- Rewrite the test so it fails until the implementation is complete
|
||||
|
||||
### Issue: Network-first pattern not applied
|
||||
|
||||
**Problem:** Route interception happens after navigation, causing race conditions.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Move `await page.route()` calls BEFORE `await page.goto()`
|
||||
- Review `network-first.md` knowledge fragment
|
||||
- Update all E2E tests to follow network-first pattern
|
||||
|
||||
### Issue: Hardcoded test data in tests
|
||||
|
||||
**Problem:** Tests use hardcoded strings/numbers instead of factories.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Replace all hardcoded data with factory function calls
|
||||
- Use `faker` for all random data generation
|
||||
- Update data-factories to support all required test scenarios
|
||||
|
||||
### Issue: Fixtures missing auto-cleanup
|
||||
|
||||
**Problem:** Fixtures create data but don't clean it up in teardown.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Add cleanup logic after `await use(data)` in fixture
|
||||
- Call deletion/cleanup functions in teardown
|
||||
- Verify cleanup works by checking database/storage after test run
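For reference, a minimal sketch of the corrected fixture (assuming the `createUser`/`deleteUser` helpers from the factory examples; the `seededUser` fixture name is illustrative):

```typescript
import { test as base } from '@playwright/test';
import { createUser, deleteUser } from '../factories/user.factory';

export const test = base.extend({
  seededUser: async ({}, use) => {
    const user = await createUser(); // setup: create test data
    await use(user);                 // hand the data to the test
    await deleteUser(user.id);       // teardown restored: auto-cleanup
  },
});
```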
|
||||
|
||||
### Issue: Tests have multiple assertions
|
||||
|
||||
**Problem:** Tests verify multiple behaviors in single test (not atomic).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Split into separate tests (one assertion per test)
|
||||
- Each test should verify exactly one behavior
|
||||
- Use descriptive test names to clarify what each test verifies
|
||||
|
||||
### Issue: Tests depend on execution order
|
||||
|
||||
**Problem:** Tests fail when run in isolation or different order.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove shared state between tests
|
||||
- Each test should create its own test data
|
||||
- Use fixtures for consistent setup across tests
|
||||
- Verify tests can run with `.only` flag
|
||||
|
||||
---
|
||||
|
||||
## Notes for TEA Agent
|
||||
|
||||
- **Preflight halt is critical:** Do not proceed if story has no acceptance criteria or framework is missing
|
||||
- **RED phase verification is mandatory:** Tests must fail before sharing with DEV team
|
||||
- **Network-first pattern:** Route interception BEFORE navigation prevents race conditions
|
||||
- **One assertion per test:** Atomic tests provide clear failure diagnosis
|
||||
- **Auto-cleanup is non-negotiable:** Every fixture must clean up data in teardown
|
||||
- **Use knowledge base:** Load relevant fragments (fixture-architecture, data-factories, network-first, component-tdd, test-quality) for guidance
|
||||
- **Share with DEV agent:** ATDD checklist provides implementation roadmap from red to green
|
||||
@@ -1,43 +1,669 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Acceptance TDD v3.0
|
||||
# Acceptance Test-Driven Development (ATDD)
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/atdd" name="Acceptance Test Driven Development">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Story is approved with clear acceptance criteria.</i>
|
||||
<i>- Development sandbox/environment is ready.</i>
|
||||
<i>- Framework scaffolding exists (run `*framework` if missing).</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Confirm each requirement above; halt if any are missing.</action>
|
||||
</step>
|
||||
<step n="2" title="Author Failing Acceptance Tests">
|
||||
<action>Clarify acceptance criteria and affected systems.</action>
|
||||
<action>Select appropriate test level (E2E/API/Component).</action>
|
||||
<action>Create failing tests using Given-When-Then with network interception before navigation.</action>
|
||||
<action>Build data factories and fixture stubs for required entities.</action>
|
||||
<action>Outline mocks/fixtures infrastructure the dev team must provide.</action>
|
||||
<action>Generate component tests for critical UI logic.</action>
|
||||
<action>Compile an implementation checklist mapping each test to code work.</action>
|
||||
<action>Share failing tests and checklist with the dev agent, maintaining red → green → refactor loop.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Output failing acceptance test files, component test stubs, fixture/mocks skeleton, implementation checklist, and data-testid requirements.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If acceptance criteria are ambiguous or the framework is missing, halt and request clarification/set up.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify ATDD-related fragments (fixture-architecture, data-factories, component-tdd) and load them from `knowledge/`.</i>
|
||||
<i>Start red; one assertion per test; keep setup visible (no hidden shared state).</i>
|
||||
<i>Remind devs to run tests before writing production code; update checklist as tests turn green.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>Failing acceptance/component test suite plus implementation checklist.</i>
|
||||
</output>
|
||||
</task>
|
||||
**Workflow ID**: `bmad/bmm/testarch/atdd`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Generates failing acceptance tests BEFORE implementation following TDD's red-green-refactor cycle. This workflow creates comprehensive test coverage at appropriate levels (E2E, API, Component) with supporting infrastructure (fixtures, factories, mocks) and provides an implementation checklist to guide development.
|
||||
|
||||
**Core Principle**: Tests fail first (red phase), then guide development to green, then enable confident refactoring.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Story approved with clear acceptance criteria
|
||||
- ✅ Development sandbox/environment ready
|
||||
- ✅ Framework scaffolding exists (run `framework` workflow if missing)
|
||||
- ✅ Test framework configuration available (playwright.config.ts or cypress.config.ts)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Load Story Context and Requirements
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Read Story Markdown**
|
||||
- Load story file from `{story_file}` variable
|
||||
- Extract acceptance criteria (all testable requirements)
|
||||
- Identify affected systems and components
|
||||
- Note any technical constraints or dependencies
|
||||
|
||||
2. **Load Framework Configuration**
|
||||
- Read framework config (playwright.config.ts or cypress.config.ts)
|
||||
- Identify test directory structure
|
||||
- Check existing fixture patterns
|
||||
- Note test runner capabilities
|
||||
|
||||
3. **Load Existing Test Patterns**
|
||||
- Search `{test_dir}` for similar tests
|
||||
- Identify reusable fixtures and helpers
|
||||
- Check data factory patterns
|
||||
- Note naming conventions
|
||||
|
||||
4. **Load Knowledge Base Fragments**
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to load:
|
||||
- `fixture-architecture.md` - Test fixture patterns with auto-cleanup
|
||||
- `data-factories.md` - Factory patterns using faker
|
||||
- `component-tdd.md` - Component test strategies
|
||||
- `network-first.md` - Route interception patterns
|
||||
- `test-quality.md` - Test design principles
|
||||
|
||||
**Halt Condition:** If story has no acceptance criteria or framework is missing, HALT with message: "ATDD requires clear acceptance criteria and test framework setup"
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Select Test Levels and Strategy
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Analyze Acceptance Criteria**
|
||||
|
||||
For each acceptance criterion, determine:
|
||||
- Does it require full user journey? → E2E test
|
||||
- Does it test business logic/API contract? → API test
|
||||
- Does it validate UI component behavior? → Component test
|
||||
- Can it be unit tested? → Unit test
|
||||
|
||||
2. **Apply Test Level Selection Framework**
|
||||
|
||||
**Knowledge Base Reference**: `test-levels-framework.md`
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
- Critical user journeys (login, checkout, core workflow)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- **Characteristics**: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
- Business logic validation
|
||||
- Service contracts
|
||||
- Data transformations
|
||||
- **Characteristics**: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing
|
||||
- Visual regression
|
||||
- **Characteristics**: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
- Pure business logic
|
||||
- Edge cases
|
||||
- Error handling
|
||||
- **Characteristics**: Fastest, most granular
|
||||
|
||||
3. **Avoid Duplicate Coverage**
|
||||
|
||||
Don't test same behavior at multiple levels unless necessary:
|
||||
- Use E2E for critical happy path only
|
||||
- Use API tests for complex business logic variations
|
||||
- Use component tests for UI interaction edge cases
|
||||
- Use unit tests for pure logic edge cases
|
||||
|
||||
4. **Prioritize Tests**
|
||||
|
||||
If test-design document exists, align with priority levels:
|
||||
- P0 scenarios → Must cover in failing tests
|
||||
- P1 scenarios → Should cover if time permits
|
||||
- P2/P3 scenarios → Optional for this iteration
|
||||
|
||||
**Decision Point:** Set `primary_level` variable to main test level for this story (typically E2E or API)
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Generate Failing Tests
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Test File Structure**
|
||||
|
||||
```
|
||||
tests/
|
||||
├── e2e/
|
||||
│ └── {feature-name}.spec.ts # E2E acceptance tests
|
||||
├── api/
|
||||
│ └── {feature-name}.api.spec.ts # API contract tests
|
||||
├── component/
|
||||
│ └── {ComponentName}.test.tsx # Component tests
|
||||
└── support/
|
||||
├── fixtures/ # Test fixtures
|
||||
├── factories/ # Data factories
|
||||
└── helpers/ # Utility functions
|
||||
```
|
||||
|
||||
2. **Write Failing E2E Tests (If Applicable)**
|
||||
|
||||
**Use Given-When-Then format:**
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('User Login', () => {
|
||||
test('should display error for invalid credentials', async ({ page }) => {
|
||||
// GIVEN: User is on login page
|
||||
await page.goto('/login');
|
||||
|
||||
// WHEN: User submits invalid credentials
|
||||
await page.fill('[data-testid="email-input"]', 'invalid@example.com');
|
||||
await page.fill('[data-testid="password-input"]', 'wrongpassword');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// THEN: Error message is displayed
|
||||
await expect(page.locator('[data-testid="error-message"]')).toHaveText('Invalid email or password');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Critical patterns:**
|
||||
- One assertion per test (atomic tests)
|
||||
- Explicit waits (no hard waits/sleeps)
|
||||
- Network-first approach (route interception before navigation)
|
||||
- data-testid selectors for stability
|
||||
- Clear Given-When-Then structure
|
||||
|
||||
3. **Apply Network-First Pattern**
|
||||
|
||||
**Knowledge Base Reference**: `network-first.md`
|
||||
|
||||
```typescript
|
||||
test('should load user dashboard after login', async ({ page }) => {
|
||||
// CRITICAL: Intercept routes BEFORE navigation
|
||||
await page.route('**/api/user', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
body: JSON.stringify({ id: 1, name: 'Test User' }),
|
||||
}),
|
||||
);
|
||||
|
||||
// NOW navigate
|
||||
await page.goto('/dashboard');
|
||||
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('Test User');
|
||||
});
|
||||
```
|
||||
|
||||
4. **Write Failing API Tests (If Applicable)**
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('User API', () => {
|
||||
test('POST /api/users - should create new user', async ({ request }) => {
|
||||
// GIVEN: Valid user data
|
||||
const userData = {
|
||||
email: 'newuser@example.com',
|
||||
name: 'New User',
|
||||
};
|
||||
|
||||
// WHEN: Creating user via API
|
||||
const response = await request.post('/api/users', {
|
||||
data: userData,
|
||||
});
|
||||
|
||||
// THEN: User is created successfully
|
||||
expect(response.status()).toBe(201);
|
||||
const body = await response.json();
|
||||
expect(body).toMatchObject({
|
||||
email: userData.email,
|
||||
name: userData.name,
|
||||
id: expect.any(Number),
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
5. **Write Failing Component Tests (If Applicable)**
|
||||
|
||||
**Knowledge Base Reference**: `component-tdd.md`
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/experimental-ct-react';
|
||||
import { LoginForm } from './LoginForm';
|
||||
|
||||
test.describe('LoginForm Component', () => {
|
||||
test('should disable submit button when fields are empty', async ({ mount }) => {
|
||||
// GIVEN: LoginForm is mounted
|
||||
const component = await mount(<LoginForm />);
|
||||
|
||||
// WHEN: Form is initially rendered
|
||||
const submitButton = component.locator('button[type="submit"]');
|
||||
|
||||
// THEN: Submit button is disabled
|
||||
await expect(submitButton).toBeDisabled();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
6. **Verify Tests Fail Initially**
|
||||
|
||||
**Critical verification:**
|
||||
- Run tests locally to confirm they fail
|
||||
- Failure should be due to missing implementation, not test errors
|
||||
- Failure messages should be clear and actionable
|
||||
- All tests must be in RED phase before sharing with DEV
|
||||
|
||||
**Important:** Tests MUST fail initially. If a test passes before implementation, it's not a valid acceptance test.
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Build Data Infrastructure
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Data Factories**
|
||||
|
||||
**Knowledge Base Reference**: `data-factories.md`
|
||||
|
||||
```typescript
|
||||
// tests/support/factories/user.factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export const createUser = (overrides = {}) => ({
|
||||
id: faker.number.int(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
createdAt: faker.date.recent().toISOString(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
|
||||
```
|
||||
|
||||
**Factory principles:**
|
||||
- Use faker for random data (no hardcoded values)
|
||||
- Support overrides for specific scenarios
|
||||
- Generate complete valid objects
|
||||
- Include helper functions for bulk creation
|
||||
|
||||
2. **Create Test Fixtures**
|
||||
|
||||
**Knowledge Base Reference**: `fixture-architecture.md`
|
||||
|
||||
```typescript
|
||||
// tests/support/fixtures/auth.fixture.ts
|
||||
import { test as base } from '@playwright/test';
import { createUser, deleteUser } from '../factories/user.factory';
|
||||
|
||||
export const test = base.extend({
|
||||
authenticatedUser: async ({ page }, use) => {
|
||||
// Setup: Create and authenticate user
|
||||
const user = await createUser();
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', user.email);
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await page.waitForURL('/dashboard');
|
||||
|
||||
// Provide to test
|
||||
await use(user);
|
||||
|
||||
// Cleanup: Delete user
|
||||
await deleteUser(user.id);
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Fixture principles:**
|
||||
- Auto-cleanup (always delete created data)
|
||||
- Composable (fixtures can use other fixtures)
|
||||
- Isolated (each test gets fresh data)
|
||||
- Type-safe
|
||||
|
||||
3. **Document Mock Requirements**
|
||||
|
||||
If external services need mocking, document requirements:
|
||||
|
||||
```markdown
|
||||
### Mock Requirements for DEV Team
|
||||
|
||||
**Payment Gateway Mock**:
|
||||
|
||||
- Endpoint: `POST /api/payments`
|
||||
- Success response: `{ status: 'success', transactionId: '123' }`
|
||||
- Failure response: `{ status: 'failed', error: 'Insufficient funds' }`
|
||||
|
||||
**Email Service Mock**:
|
||||
|
||||
- Should not send real emails in test environment
|
||||
- Log email contents for verification
|
||||
```
|
||||
|
||||
4. **List Required data-testid Attributes**
|
||||
|
||||
```markdown
|
||||
### Required data-testid Attributes
|
||||
|
||||
**Login Page**:
|
||||
|
||||
- `email-input` - Email input field
|
||||
- `password-input` - Password input field
|
||||
- `login-button` - Submit button
|
||||
- `error-message` - Error message container
|
||||
|
||||
**Dashboard Page**:
|
||||
|
||||
- `user-name` - User name display
|
||||
- `logout-button` - Logout button
|
||||
```
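On the implementation side, these attributes end up directly on the corresponding elements. A minimal sketch (hypothetical `LoginForm` markup, shown here as JSX):

```tsx
// Hypothetical login form markup showing where the listed data-testid
// attributes would be attached so the E2E selectors can find them.
export function LoginForm() {
  return (
    <form>
      <input data-testid="email-input" type="email" />
      <input data-testid="password-input" type="password" />
      <button data-testid="login-button" type="submit">
        Log in
      </button>
      <p data-testid="error-message" role="alert" />
    </form>
  );
}
```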
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Create Implementation Checklist
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Map Tests to Implementation Tasks**
|
||||
|
||||
For each failing test, create corresponding implementation task:
|
||||
|
||||
```markdown
|
||||
## Implementation Checklist
|
||||
|
||||
### Epic X - User Authentication
|
||||
|
||||
#### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add `data-testid` attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Implement error handling
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
#### Test: Display Error for Invalid Credentials
|
||||
|
||||
- [ ] Add error state management
|
||||
- [ ] Display error message UI
|
||||
- [ ] Add `data-testid="error-message"`
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
```
|
||||
|
||||
2. **Include Red-Green-Refactor Guidance**
|
||||
|
||||
```markdown
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
**RED Phase** (Complete):
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created
|
||||
- ✅ Mock requirements documented
|
||||
|
||||
**GREEN Phase** (DEV Team):
|
||||
|
||||
1. Pick one failing test
|
||||
2. Implement minimal code to make it pass
|
||||
3. Run test to verify green
|
||||
4. Move to next test
|
||||
5. Repeat until all tests pass
|
||||
|
||||
**REFACTOR Phase** (DEV Team):
|
||||
|
||||
1. All tests passing (green)
|
||||
2. Improve code quality
|
||||
3. Extract duplications
|
||||
4. Optimize performance
|
||||
5. Ensure tests still pass
|
||||
```
|
||||
|
||||
3. **Add Execution Commands**
|
||||
|
||||
````markdown
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests
|
||||
npm run test:e2e
|
||||
|
||||
# Run specific test file
|
||||
npm run test:e2e -- login.spec.ts
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
npm run test:e2e -- --headed
|
||||
|
||||
# Debug specific test
|
||||
npm run test:e2e -- login.spec.ts --debug
|
||||
```
|
||||
````
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Generate Deliverables
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create ATDD Checklist Document**
|
||||
|
||||
Use template structure at `{installed_path}/atdd-checklist-template.md`:
|
||||
- Story summary
|
||||
- Acceptance criteria breakdown
|
||||
- Test files created (with paths)
|
||||
- Data factories created
|
||||
- Fixtures created
|
||||
- Mock requirements
|
||||
- Required data-testid attributes
|
||||
- Implementation checklist
|
||||
- Red-green-refactor workflow
|
||||
- Execution commands
|
||||
|
||||
2. **Verify All Tests Fail**
|
||||
|
||||
Before finalizing:
|
||||
- Run full test suite locally
|
||||
- Confirm all tests in RED phase
|
||||
- Document expected failure messages
|
||||
- Ensure failures are due to missing implementation, not test bugs
|
||||
|
||||
3. **Write to Output File**
|
||||
|
||||
Save to `{output_folder}/atdd-checklist-{story_id}.md`
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Red-Green-Refactor Cycle
|
||||
|
||||
**RED Phase** (TEA responsibility):
|
||||
|
||||
- Write failing tests first
|
||||
- Tests define expected behavior
|
||||
- Tests must fail for the right reason (missing implementation)
|
||||
|
||||
**GREEN Phase** (DEV responsibility):
|
||||
|
||||
- Implement minimal code to pass tests
|
||||
- One test at a time
|
||||
- Don't over-engineer
|
||||
|
||||
**REFACTOR Phase** (DEV responsibility):
|
||||
|
||||
- Improve code quality with confidence
|
||||
- Tests provide safety net
|
||||
- Extract duplications, optimize
|
||||
|
||||
### Given-When-Then Structure
|
||||
|
||||
**GIVEN** (Setup):
|
||||
|
||||
- Arrange test preconditions
|
||||
- Create necessary data
|
||||
- Navigate to starting point
|
||||
|
||||
**WHEN** (Action):
|
||||
|
||||
- Execute the behavior being tested
|
||||
- Single action per test
|
||||
|
||||
**THEN** (Assertion):
|
||||
|
||||
- Verify expected outcome
|
||||
- One assertion per test (atomic)
|
||||
|
||||
### Network-First Testing
|
||||
|
||||
**Critical pattern:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Intercept BEFORE navigation
|
||||
await page.route('**/api/data', handler);
|
||||
await page.goto('/page');
|
||||
|
||||
// ❌ WRONG: Navigate then intercept (race condition)
|
||||
await page.goto('/page');
|
||||
await page.route('**/api/data', handler); // Too late!
|
||||
```
|
||||
|
||||
### Data Factory Best Practices
|
||||
|
||||
**Use faker for all test data:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Random data
|
||||
email: faker.internet.email();
|
||||
|
||||
// ❌ WRONG: Hardcoded data (collisions, maintenance burden)
|
||||
email: 'test@example.com';
|
||||
```
|
||||
|
||||
**Auto-cleanup principle:**
|
||||
|
||||
- Every factory that creates data must provide cleanup
|
||||
- Fixtures automatically cleanup in teardown
|
||||
- No manual cleanup in test code
|
||||
|
||||
### One Assertion Per Test
|
||||
|
||||
**Atomic test design:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: One assertion
|
||||
test('should display user name', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Multiple assertions (not atomic)
|
||||
test('should display user info', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
|
||||
});
|
||||
```
|
||||
|
||||
**Why?** If the second assertion fails, you don't know whether the first one is still valid.
|
||||
|
||||
### Component Test Strategy
|
||||
|
||||
**When to use component tests:**
|
||||
|
||||
- Complex UI interactions (drag-drop, keyboard nav)
|
||||
- Form validation logic
|
||||
- State management within component
|
||||
- Visual edge cases
|
||||
|
||||
**When NOT to use:**
|
||||
|
||||
- Simple rendering (snapshot tests are sufficient)
|
||||
- Integration with backend (use E2E or API tests)
|
||||
- Full user journeys (use E2E tests)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Auto-load enabled:**
|
||||
|
||||
- `fixture-architecture.md` - Fixture patterns
|
||||
- `data-factories.md` - Factory patterns
|
||||
- `component-tdd.md` - Component testing
|
||||
- `network-first.md` - Route interception
|
||||
|
||||
**Manual reference:**
|
||||
|
||||
- Use `tea-index.csv` to find additional fragments
|
||||
- Load `test-levels-framework.md` for level selection
|
||||
- Load `test-quality.md` for test design principles
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## ATDD Complete - Tests in RED Phase
|
||||
|
||||
**Story**: {story_id}
|
||||
**Primary Test Level**: {primary_level}
|
||||
|
||||
**Failing Tests Created**:
|
||||
|
||||
- E2E tests: {e2e_count} tests in {e2e_files}
|
||||
- API tests: {api_count} tests in {api_files}
|
||||
- Component tests: {component_count} tests in {component_files}
|
||||
|
||||
**Supporting Infrastructure**:
|
||||
|
||||
- Data factories: {factory_count} factories created
|
||||
- Fixtures: {fixture_count} fixtures with auto-cleanup
|
||||
- Mock requirements: {mock_count} services documented
|
||||
|
||||
**Implementation Checklist**:
|
||||
|
||||
- Total tasks: {task_count}
|
||||
- Estimated effort: {effort_estimate} hours
|
||||
|
||||
**Required data-testid Attributes**: {data_testid_count} attributes documented
|
||||
|
||||
**Next Steps for DEV Team**:
|
||||
|
||||
1. Run failing tests: `npm run test:e2e`
|
||||
2. Review implementation checklist
|
||||
3. Implement one test at a time (RED → GREEN)
|
||||
4. Refactor with confidence (tests provide safety net)
|
||||
5. Share progress in daily standup
|
||||
|
||||
**Output File**: {output_file}
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture patterns
|
||||
- Data factory patterns with faker
|
||||
- Network-first route interception
|
||||
- Component TDD strategies
|
||||
- Test quality principles
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Story acceptance criteria analyzed and mapped to tests
|
||||
- [ ] Appropriate test levels selected (E2E, API, Component)
|
||||
- [ ] All tests written in Given-When-Then format
|
||||
- [ ] All tests fail initially (RED phase verified)
|
||||
- [ ] Network-first pattern applied (route interception before navigation)
|
||||
- [ ] Data factories created with faker
|
||||
- [ ] Fixtures created with auto-cleanup
|
||||
- [ ] Mock requirements documented for DEV team
|
||||
- [ ] Required data-testid attributes listed
|
||||
- [ ] Implementation checklist created with clear tasks
|
||||
- [ ] Red-green-refactor workflow documented
|
||||
- [ ] Execution commands provided
|
||||
- [ ] Output file created and formatted correctly
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
@@ -1,25 +1,81 @@
|
||||
# Test Architect workflow: atdd
|
||||
name: testarch-atdd
|
||||
description: "Generate failing acceptance tests before implementation."
|
||||
description: "Generate failing acceptance tests before implementation using TDD red-green-refactor cycle"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/atdd"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/atdd-checklist-template.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Story context
|
||||
story_file: "" # Path to story markdown with acceptance criteria
|
||||
test_dir: "{project-root}/tests"
|
||||
test_framework: "" # Detected from framework workflow (playwright, cypress)
|
||||
|
||||
# Test level selection
|
||||
test_levels: "e2e,api,component" # Which levels to generate
|
||||
primary_level: "e2e" # Primary test level for acceptance criteria
|
||||
include_component_tests: true # Generate component tests for UI logic
|
||||
|
||||
# ATDD approach
|
||||
start_failing: true # Tests must fail initially (red phase)
|
||||
use_given_when_then: true # BDD-style test structure
|
||||
network_first: true # Route interception before navigation
|
||||
one_assertion_per_test: true # Atomic test design
|
||||
|
||||
# Data and fixtures
|
||||
generate_factories: true # Create data factory stubs
|
||||
generate_fixtures: true # Create fixture architecture
|
||||
auto_cleanup: true # Fixtures clean up their data
|
||||
|
||||
# Output configuration
|
||||
output_checklist: "{output_folder}/atdd-checklist-{story_id}.md"
|
||||
include_data_testids: true # List required data-testid attributes
|
||||
include_mock_requirements: true # Document mock/stub needs
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load fixture-architecture, data-factories, component-tdd fragments
|
||||
share_with_dev: true # Provide implementation checklist to DEV agent
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/atdd-checklist-{story_id}.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read story markdown, framework config
|
||||
- write_file # Create test files, checklist, factory stubs
|
||||
- create_directory # Create test directories
|
||||
- list_files # Find existing fixtures and helpers
|
||||
- search_repo # Search for similar test patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with acceptance criteria (required)"
|
||||
- framework_config: "Test framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
- existing_fixtures: "Current fixture patterns for consistency"
|
||||
- test_design: "Test design document (optional, for risk/priority context)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- atdd
|
||||
- test-architect
|
||||
- tdd
|
||||
- red-green-refactor
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
594
src/modules/bmm/workflows/testarch/automate/README.md
Normal file
@@ -0,0 +1,594 @@
|
||||
# Automate Workflow
|
||||
|
||||
Expands test automation coverage by generating comprehensive test suites at appropriate levels (E2E, API, Component, Unit) with supporting infrastructure. This workflow operates in **dual mode** - works seamlessly WITH or WITHOUT BMad artifacts.
|
||||
|
||||
**Core Principle**: Generate prioritized, deterministic tests that avoid duplicate coverage and follow testing best practices.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *automate
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- **BMad-Integrated**: After story implementation to expand coverage beyond ATDD tests
|
||||
- **Standalone**: Point at any codebase/feature and generate tests independently ("work out of thin air")
|
||||
- **Auto-discover**: No targets specified - scans codebase for features needing tests
|
||||
|
||||
## Inputs
|
||||
|
||||
**Execution Modes:**
|
||||
|
||||
1. **BMad-Integrated Mode** (story available) - OPTIONAL
|
||||
2. **Standalone Mode** (no BMad artifacts) - Direct code analysis
|
||||
3. **Auto-discover Mode** (no targets) - Scan for coverage gaps
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Framework configuration**: Test framework config (playwright.config.ts or cypress.config.ts) - REQUIRED
|
||||
|
||||
**Optional Context (BMad-Integrated Mode):**
|
||||
|
||||
- **Story markdown** (`{story_file}`): User story with acceptance criteria (enhances coverage targeting but NOT required)
|
||||
- **Tech spec**: Technical specification (provides architectural context)
|
||||
- **Test design**: Risk/priority context (P0-P3 alignment)
|
||||
- **PRD**: Product requirements (business context)
|
||||
|
||||
**Optional Context (Standalone Mode):**
|
||||
|
||||
- **Source code**: Feature implementation to analyze
|
||||
- **Existing tests**: Current test suite for gap analysis
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `standalone_mode`: Can work without BMad artifacts (default: true)
|
||||
- `story_file`: Path to story markdown (optional)
|
||||
- `target_feature`: Feature name or directory to analyze (e.g., "user-authentication" or "src/auth/")
|
||||
- `target_files`: Specific files to analyze (comma-separated paths)
|
||||
- `test_dir`: Directory for test files (default: `{project-root}/tests`)
|
||||
- `source_dir`: Source code directory (default: `{project-root}/src`)
|
||||
- `auto_discover_features`: Automatically find features needing tests (default: true)
|
||||
- `analyze_coverage`: Check existing test coverage gaps (default: true)
|
||||
- `coverage_target`: Coverage strategy - "critical-paths", "comprehensive", "selective" (default: "critical-paths")
|
||||
- `test_levels`: Which levels to generate - "e2e,api,component,unit" (default: all)
|
||||
- `avoid_duplicate_coverage`: Don't test same behavior at multiple levels (default: true)
|
||||
- `include_p0`: Include P0 critical path tests (default: true)
|
||||
- `include_p1`: Include P1 high priority tests (default: true)
|
||||
- `include_p2`: Include P2 medium priority tests (default: true)
|
||||
- `include_p3`: Include P3 low priority tests (default: false)
|
||||
- `use_given_when_then`: BDD-style test structure (default: true)
|
||||
- `one_assertion_per_test`: Atomic test design (default: true)
|
||||
- `network_first`: Route interception before navigation (default: true)
|
||||
- `deterministic_waits`: No hard waits or sleeps (default: true)
|
||||
- `generate_fixtures`: Create/enhance fixture architecture (default: true)
|
||||
- `generate_factories`: Create/enhance data factories (default: true)
|
||||
- `update_helpers`: Add utility functions (default: true)
|
||||
- `use_test_design`: Load test-design.md if exists (default: true)
|
||||
- `use_tech_spec`: Load tech-spec.md if exists (default: true)
|
||||
- `use_prd`: Load PRD.md if exists (default: true)
|
||||
- `update_readme`: Update test README with new specs (default: true)
|
||||
- `update_package_scripts`: Add test execution scripts (default: true)
|
||||
- `output_summary`: Path for automation summary (default: `{output_folder}/automation-summary.md`)
|
||||
- `max_test_duration`: Maximum seconds per test (default: 90)
|
||||
- `max_file_lines`: Maximum lines per test file (default: 300)
|
||||
- `require_self_cleaning`: All tests must clean up data (default: true)
|
||||
- `auto_load_knowledge`: Load relevant knowledge fragments (default: true)
|
||||
- `run_tests_after_generation`: Verify tests pass/fail as expected (default: true)
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
- **Automation Summary** (`automation-summary.md`): Comprehensive report containing:
|
||||
- Execution mode (BMad-Integrated, Standalone, Auto-discover)
|
||||
- Feature analysis (source files analyzed, coverage gaps)
|
||||
- Tests created (E2E, API, Component, Unit) with counts and paths
|
||||
- Infrastructure created (fixtures, factories, helpers)
|
||||
- Test execution instructions
|
||||
- Coverage analysis (P0-P3 breakdown, coverage percentage)
|
||||
- Definition of Done checklist
|
||||
- Next steps and recommendations
|
||||
|
||||
**Test Files Created:**
|
||||
|
||||
- **E2E tests** (`tests/e2e/{feature-name}.spec.ts`): Critical user journeys (P0-P1)
|
||||
- **API tests** (`tests/api/{feature-name}.api.spec.ts`): Business logic and contracts (P1-P2)
|
||||
- **Component tests** (`tests/component/{ComponentName}.test.tsx`): UI behavior (P1-P2)
|
||||
- **Unit tests** (`tests/unit/{module-name}.test.ts`): Pure logic (P2-P3)
|
||||
|
||||
**Supporting Infrastructure:**
|
||||
|
||||
- **Fixtures** (`tests/support/fixtures/{feature}.fixture.ts`): Setup/teardown with auto-cleanup
|
||||
- **Data factories** (`tests/support/factories/{entity}.factory.ts`): Random test data using faker
|
||||
- **Helpers** (`tests/support/helpers/{utility}.ts`): Utility functions (waitFor, retry, etc.)
|
||||
|
||||
**Documentation Updates:**
|
||||
|
||||
- **Test README** (`tests/README.md`): Test suite overview, execution instructions, priority tagging, patterns
|
||||
- **package.json scripts**: Test execution commands (test:e2e, test:e2e:p0, test:api, etc.)
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- All tests follow Given-When-Then format
|
||||
- All tests have priority tags ([P0], [P1], [P2], [P3])
|
||||
- All tests use data-testid selectors (stable, not CSS classes)
|
||||
- All tests are self-cleaning (fixtures with auto-cleanup)
|
||||
- No hard waits or flaky patterns (deterministic)
|
||||
- Test files under 300 lines (lean and focused)
|
||||
- Tests run under 1.5 minutes each (fast feedback)
|
||||
|
||||
## Key Features
|
||||
|
||||
### Dual-Mode Operation
|
||||
|
||||
**BMad-Integrated Mode** (story available):
|
||||
|
||||
- Uses story acceptance criteria for coverage targeting
|
||||
- Aligns with test-design risk/priority assessment
|
||||
- Expands ATDD tests with edge cases and negative paths
|
||||
- Optional - story enhances coverage but not required
|
||||
|
||||
**Standalone Mode** (no story):
|
||||
|
||||
- Analyzes source code independently
|
||||
- Identifies coverage gaps automatically
|
||||
- Generates tests based on code analysis
|
||||
- Works with any project (BMad or non-BMad)
|
||||
|
||||
**Auto-discover Mode** (no targets):
|
||||
|
||||
- Scans codebase for features needing tests
|
||||
- Prioritizes features with no coverage
|
||||
- Generates comprehensive test plan
|
||||
|
||||
### Avoid Duplicate Coverage
|
||||
|
||||
**Critical principle**: Don't test same behavior at multiple levels
|
||||
|
||||
**Good coverage strategy:**
|
||||
|
||||
- **E2E**: User can login → Dashboard loads (critical happy path only)
|
||||
- **API**: POST /auth/login returns correct status codes (variations: 200, 401, 400)
|
||||
- **Component**: LoginForm validates input (UI edge cases: empty fields, invalid format)
|
||||
- **Unit**: validateEmail() logic (pure function edge cases)
|
||||
|
||||
**Bad coverage (duplicate):**
|
||||
|
||||
- E2E: User can login → Dashboard loads
|
||||
- E2E: User can login with different emails → Dashboard loads (unnecessary duplication)
|
||||
- API: POST /auth/login returns 200 (already covered in E2E)
|
||||
|
||||
Use E2E sparingly for critical paths. Use API/Component/Unit for variations and edge cases.
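As an illustration, the 401 variation from the strategy above belongs at the API level rather than as another E2E journey (a sketch assuming the `/auth/login` endpoint used in the examples above):

```typescript
// API-level variation: invalid credentials return 401.
// Complements, rather than duplicates, the single E2E happy-path login test.
import { test, expect } from '@playwright/test';

test('[P1] POST /auth/login - should return 401 for invalid credentials', async ({ request }) => {
  // GIVEN: Credentials that do not match any account
  const credentials = { email: 'unknown@example.com', password: 'wrong-password' };

  // WHEN: Logging in via the API
  const response = await request.post('/auth/login', { data: credentials });

  // THEN: The request is rejected
  expect(response.status()).toBe(401);
});
```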
|
||||
|
||||
### Test Level Selection Framework
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys (login, checkout, core workflows)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- Characteristics: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Business logic validation
|
||||
- Service contracts and data transformations
|
||||
- Backend integration without UI
|
||||
- Characteristics: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing (click, hover, keyboard navigation)
|
||||
- State management within component
|
||||
- Characteristics: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Pure business logic and algorithms
|
||||
- Edge cases and error handling
|
||||
- Minimal dependencies
|
||||
- Characteristics: Fastest, most granular
|
||||
|
||||
### Priority Classification (P0-P3)
|
||||
|
||||
**P0 (Critical - Every commit)**:
|
||||
|
||||
- Critical user paths that must always work
|
||||
- Security-critical functionality (auth, permissions)
|
||||
- Data integrity scenarios
|
||||
- Run in pre-commit hooks or PR checks
|
||||
|
||||
**P1 (High - PR to main)**:
|
||||
|
||||
- Important features with high user impact
|
||||
- Integration points between systems
|
||||
- Error handling for common failures
|
||||
- Run before merging to main branch
|
||||
|
||||
**P2 (Medium - Nightly)**:
|
||||
|
||||
- Edge cases with moderate impact
|
||||
- Less-critical feature variations
|
||||
- Performance/load testing
|
||||
- Run in nightly CI builds
|
||||
|
||||
**P3 (Low - On-demand)**:
|
||||
|
||||
- Nice-to-have validations
|
||||
- Rarely-used features
|
||||
- Exploratory testing scenarios
|
||||
- Run manually or weekly
|
||||
|
||||
**Priority tagging enables selective execution:**
|
||||
|
||||
```bash
|
||||
npm run test:e2e:p0 # Run only P0 tests (critical paths)
|
||||
npm run test:e2e:p1 # Run P0 + P1 tests (pre-merge)
|
||||
```
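Behind these npm scripts, the selective runs can simply filter on the priority tag embedded in each test title, for example with Playwright's `--grep` flag (a sketch; the exact script names are added by `update_package_scripts`):

```bash
# Hypothetical commands behind the priority-scoped scripts above
npx playwright test --grep "\[P0\]"          # critical paths only
npx playwright test --grep "\[P0\]|\[P1\]"   # pre-merge: P0 and P1
```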
|
||||
|
||||
### Given-When-Then Test Structure
|
||||
|
||||
All tests follow BDD format for clarity:
|
||||
|
||||
```typescript
|
||||
test('[P0] should login with valid credentials and load dashboard', async ({ page }) => {
|
||||
// GIVEN: User is on login page
|
||||
await page.goto('/login');
|
||||
|
||||
// WHEN: User submits valid credentials
|
||||
await page.fill('[data-testid="email-input"]', 'user@example.com');
|
||||
await page.fill('[data-testid="password-input"]', 'Password123!');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// THEN: User is redirected to dashboard
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
### One Assertion Per Test (Atomic Design)
|
||||
|
||||
Each test verifies exactly one behavior:
|
||||
|
||||
```typescript
// ✅ CORRECT: One assertion
test('[P0] should display user name', async ({ page }) => {
  await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
});

// ❌ WRONG: Multiple assertions (not atomic)
test('[P0] should display user info', async ({ page }) => {
  await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
  await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
});
```
|
||||
|
||||
**Why?** If the second assertion fails, you can't tell whether the first still holds. Split them into separate tests so a failure points to exactly one behavior.
|
||||
|
||||
### Network-First Testing Pattern
|
||||
|
||||
**Critical pattern to prevent race conditions**:
|
||||
|
||||
```typescript
test('should load user dashboard after login', async ({ page }) => {
  // CRITICAL: Intercept routes BEFORE navigation
  await page.route('**/api/user', (route) =>
    route.fulfill({
      status: 200,
      body: JSON.stringify({ id: 1, name: 'Test User' }),
    }),
  );

  // NOW navigate
  await page.goto('/dashboard');

  await expect(page.locator('[data-testid="user-name"]')).toHaveText('Test User');
});
```
|
||||
|
||||
Always set up route interception before navigating to pages that make network requests.
|
||||
|
||||
### Fixture Architecture with Auto-Cleanup
|
||||
|
||||
Playwright fixtures with automatic data cleanup:
|
||||
|
||||
```typescript
// tests/support/fixtures/auth.fixture.ts
import { test as base } from '@playwright/test';
import { createUser, deleteUser } from '../factories/user.factory';

export const test = base.extend({
  authenticatedUser: async ({ page }, use) => {
    // Setup: Create and authenticate user
    const user = await createUser();
    await page.goto('/login');
    await page.fill('[data-testid="email"]', user.email);
    await page.fill('[data-testid="password"]', user.password);
    await page.click('[data-testid="login-button"]');
    await page.waitForURL('/dashboard');

    // Provide to test
    await use(user);

    // Cleanup: Delete user automatically
    await deleteUser(user.id);
  },
});
```
|
||||
|
||||
**Fixture principles:**
|
||||
|
||||
- Auto-cleanup (always delete created data in teardown)
|
||||
- Composable (fixtures can use other fixtures)
|
||||
- Isolated (each test gets fresh data)
|
||||
- Type-safe with TypeScript
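
Composable means a fixture can declare another fixture as a dependency and inherit its setup and teardown. A minimal sketch building on the `authenticatedUser` fixture above (the file name is illustrative, and it assumes the auth fixture declares its fixture types):

```typescript
// tests/support/fixtures/dashboard.fixture.ts (sketch)
import type { Page } from '@playwright/test';
import { test as authTest } from './auth.fixture';

export const test = authTest.extend<{ dashboardPage: Page }>({
  dashboardPage: async ({ authenticatedUser, page }, use) => {
    // Listing authenticatedUser as a dependency runs its login setup first and its cleanup last
    await page.goto('/dashboard');
    await use(page);
  },
});
```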
|
||||
|
||||
### Data Factory Architecture
|
||||
|
||||
Use faker for all test data generation:
|
||||
|
||||
```typescript
// tests/support/factories/user.factory.ts
import { faker } from '@faker-js/faker';

export const createUser = (overrides = {}) => ({
  id: faker.number.int(),
  email: faker.internet.email(),
  password: faker.internet.password(),
  name: faker.person.fullName(),
  role: 'user',
  createdAt: faker.date.recent().toISOString(),
  ...overrides,
});

export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());

// API helper for cleanup
export const deleteUser = async (userId: number) => {
  await fetch(`/api/users/${userId}`, { method: 'DELETE' });
};
```
|
||||
|
||||
**Factory principles:**
|
||||
|
||||
- Use faker for random data (no hardcoded values to prevent collisions)
|
||||
- Support overrides for specific test scenarios
|
||||
- Generate complete valid objects matching API contracts
|
||||
- Include helper functions for bulk creation and cleanup
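
Because `overrides` is spread last, a test pins only the fields it cares about and lets faker fill the rest; a short usage sketch against the factory above:

```typescript
// Usage sketch for the factory above
const admin = createUser({ role: 'admin' });
const knownUser = createUser({ email: 'qa.login@example.com' });
const [buyer, seller] = createUsers(2);
```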
|
||||
|
||||
### No Page Objects
|
||||
|
||||
**Do NOT create page object classes.** Keep tests simple and direct:
|
||||
|
||||
```typescript
// ✅ CORRECT: Direct test
test('should login', async ({ page }) => {
  await page.goto('/login');
  await page.fill('[data-testid="email"]', 'user@example.com');
  await page.click('[data-testid="login-button"]');
  await expect(page).toHaveURL('/dashboard');
});

// ❌ WRONG: Page object abstraction
class LoginPage {
  async login(email, password) { ... }
}
```
|
||||
|
||||
Use fixtures for setup/teardown, not page objects for actions.
|
||||
|
||||
### Deterministic Tests Only
|
||||
|
||||
**No flaky patterns allowed:**
|
||||
|
||||
```typescript
// ❌ WRONG: Hard wait
await page.waitForTimeout(2000);

// ✅ CORRECT: Explicit wait
await page.waitForSelector('[data-testid="user-name"]');
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();

// ❌ WRONG: Conditional flow
if (await element.isVisible()) {
  await element.click();
}

// ✅ CORRECT: Deterministic assertion
await expect(element).toBeVisible();
await element.click();

// ❌ WRONG: Try-catch for test logic
try {
  await element.click();
} catch (e) {
  // Test shouldn't catch errors
}

// ✅ CORRECT: Let test fail if element not found
await element.click();
```
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before this workflow:**
|
||||
|
||||
- **framework** workflow: Establish test framework architecture (Playwright/Cypress config, directory structure) - REQUIRED
|
||||
- **test-design** workflow: Optional for P0-P3 priority alignment and risk assessment context (BMad-Integrated mode only)
|
||||
- **atdd** workflow: Optional - automate expands beyond ATDD tests with edge cases (BMad-Integrated mode only)
|
||||
|
||||
**After this workflow:**
|
||||
|
||||
- **trace** workflow: Update traceability matrix with new test coverage
|
||||
- **gate** workflow: Quality gate decision using test results
|
||||
- **CI pipeline**: Run tests in burn-in loop to detect flaky patterns
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **DEV agent**: Tests validate implementation correctness
|
||||
- **Story workflow**: Tests cover acceptance criteria (BMad-Integrated mode only)
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Works Out of Thin Air
|
||||
|
||||
**automate does NOT require BMad artifacts:**
|
||||
|
||||
- Can analyze any codebase independently
|
||||
- User can point TEA at a feature: "automate tests for src/auth/"
|
||||
- Works on non-BMad projects
|
||||
- BMad artifacts (story, tech-spec, PRD) are OPTIONAL enhancements, not requirements
|
||||
|
||||
**Similar to:**
|
||||
|
||||
- **framework**: Can scaffold tests on any project
|
||||
- **ci**: Can generate CI config without BMad context
|
||||
|
||||
**Different from:**
|
||||
|
||||
- **atdd**: REQUIRES story with acceptance criteria (halt if missing)
|
||||
- **test-design**: REQUIRES PRD/epic context (halt if missing)
|
||||
- **gate**: REQUIRES test results (halt if missing)
|
||||
|
||||
### File Size Limits
|
||||
|
||||
**Keep test files lean (under 300 lines):**
|
||||
|
||||
- If file exceeds limit, split into multiple files by feature area
|
||||
- Group related tests in describe blocks
|
||||
- Extract common setup to fixtures
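
One way to stay under the limit is to keep each spec file to a single focused `describe` group and move groups into their own files as they grow; a sketch:

```typescript
// tests/e2e/login-validation.spec.ts (sketch): one focused group per file
import { test, expect } from '@playwright/test';

test.describe('Login form validation', () => {
  test('[P1] should disable submit when fields are empty', async ({ page }) => {
    await page.goto('/login');
    await expect(page.locator('[data-testid="login-button"]')).toBeDisabled();
  });

  test('[P1] should enable submit when all fields are valid', async ({ page }) => {
    await page.goto('/login');
    await page.fill('[data-testid="email-input"]', 'user@example.com');
    await page.fill('[data-testid="password-input"]', 'Password123!');
    await expect(page.locator('[data-testid="login-button"]')).toBeEnabled();
  });
});
```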
|
||||
|
||||
### Quality Standards Enforced
|
||||
|
||||
**Every test must:**
|
||||
|
||||
- ✅ Use Given-When-Then format
|
||||
- ✅ Have clear, descriptive name with priority tag
|
||||
- ✅ One assertion per test (atomic)
|
||||
- ✅ No hard waits or sleeps
|
||||
- ✅ Use data-testid selectors (not CSS classes)
|
||||
- ✅ Self-cleaning (fixtures with auto-cleanup)
|
||||
- ✅ Deterministic (no flaky patterns)
|
||||
- ✅ Fast (under 90 seconds)
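
The 90-second budget can be enforced mechanically rather than left as a convention; a minimal sketch assuming Playwright:

```typescript
// playwright.config.ts (sketch): fail any single test that exceeds the 90-second budget
import { defineConfig } from '@playwright/test';

export default defineConfig({
  timeout: 90_000, // per-test timeout in milliseconds
});
```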
|
||||
|
||||
**Forbidden patterns:**
|
||||
|
||||
- ❌ Hard waits: `await page.waitForTimeout(2000)`
|
||||
- ❌ Conditional flow: `if (await element.isVisible()) { ... }`
|
||||
- ❌ Try-catch for test logic
|
||||
- ❌ Hardcoded test data (use factories with faker)
|
||||
- ❌ Page objects
|
||||
- ❌ Shared state between tests
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **test-levels-framework.md** - Test level selection (E2E vs API vs Component vs Unit) with characteristics and use cases
|
||||
- **test-priorities.md** - Priority classification (P0-P3) with execution timing and risk alignment
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's test.extend()
|
||||
- **data-factories.md** - Factory patterns using @faker-js/faker for random test data generation with overrides
|
||||
- **selective-testing.md** - Targeted test execution strategies for CI optimization
|
||||
- **ci-burn-in.md** - Flaky test detection patterns (10 iterations to catch intermittent failures)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, determinism, isolation, atomic assertions)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping.
|
||||
|
||||
## Example Output
|
||||
|
||||
### BMad-Integrated Mode
|
||||
|
||||
````markdown
|
||||
# Automation Summary - User Authentication
|
||||
|
||||
**Date:** 2025-10-14
|
||||
**Story:** Epic 3, Story 5
|
||||
**Coverage Target:** critical-paths
|
||||
|
||||
## Tests Created
|
||||
|
||||
### E2E Tests (2 tests, P0-P1)
|
||||
|
||||
- `tests/e2e/user-authentication.spec.ts` (87 lines)
|
||||
- [P0] Login with valid credentials → Dashboard loads
|
||||
- [P1] Display error for invalid credentials
|
||||
|
||||
### API Tests (3 tests, P1-P2)
|
||||
|
||||
- `tests/api/auth.api.spec.ts` (102 lines)
|
||||
- [P1] POST /auth/login - valid credentials → 200 + token
|
||||
- [P1] POST /auth/login - invalid credentials → 401 + error
|
||||
- [P2] POST /auth/login - missing fields → 400 + validation
|
||||
|
||||
### Component Tests (2 tests, P1)
|
||||
|
||||
- `tests/component/LoginForm.test.tsx` (45 lines)
|
||||
- [P1] Empty fields → submit button disabled
|
||||
- [P1] Valid input → submit button enabled
|
||||
|
||||
## Infrastructure Created
|
||||
|
||||
- Fixtures: `tests/support/fixtures/auth.fixture.ts`
|
||||
- Factories: `tests/support/factories/user.factory.ts`
|
||||
|
||||
## Test Execution
|
||||
|
||||
```bash
npm run test:e2e # Run all tests
npm run test:e2e:p0 # Critical paths only
npm run test:e2e:p1 # P0 + P1 tests
```
|
||||
````
|
||||
|
||||
## Coverage Analysis
|
||||
|
||||
**Total:** 7 tests (P0: 1, P1: 5, P2: 1)
|
||||
**Levels:** E2E: 2, API: 3, Component: 2
|
||||
|
||||
✅ All acceptance criteria covered
|
||||
✅ Happy path (E2E + API)
|
||||
✅ Error cases (API)
|
||||
✅ UI validation (Component)
|
||||
|
||||
````
|
||||
|
||||
### Standalone Mode
|
||||
|
||||
```markdown
|
||||
# Automation Summary - src/auth/
|
||||
|
||||
**Date:** 2025-10-14
|
||||
**Target:** src/auth/ (standalone analysis)
|
||||
**Coverage Target:** critical-paths
|
||||
|
||||
## Feature Analysis
|
||||
|
||||
**Source Files Analyzed:**
|
||||
- `src/auth/login.ts`
|
||||
- `src/auth/session.ts`
|
||||
- `src/auth/validation.ts`
|
||||
|
||||
**Existing Coverage:** 0 tests found
|
||||
|
||||
**Coverage Gaps:**
|
||||
- ❌ No E2E tests for login flow
|
||||
- ❌ No API tests for /auth/login endpoint
|
||||
- ❌ No unit tests for validateEmail()
|
||||
|
||||
## Tests Created
|
||||
|
||||
{Same structure as BMad-Integrated mode}
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. **High Priority (P0-P1):**
|
||||
- Add E2E test for password reset flow
|
||||
- Add API tests for token refresh endpoint
|
||||
|
||||
2. **Medium Priority (P2):**
|
||||
- Add unit tests for session timeout logic
|
||||
````
|
||||
|
||||
Ready to continue?
|
||||
509 src/modules/bmm/workflows/testarch/automate/checklist.md Normal file
@@ -0,0 +1,509 @@
|
||||
# Automate Workflow Validation Checklist
|
||||
|
||||
Use this checklist to validate that the automate workflow has been executed correctly and all deliverables meet quality standards.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting this workflow, verify:
|
||||
|
||||
- [ ] Framework scaffolding configured (playwright.config.ts or cypress.config.ts exists)
|
||||
- [ ] Test directory structure exists (tests/ folder with subdirectories)
|
||||
- [ ] Package.json has test framework dependencies installed
|
||||
|
||||
**Halt only if:** Framework scaffolding is completely missing (run `framework` workflow first)
|
||||
|
||||
**Note:** BMad artifacts (story, tech-spec, PRD) are OPTIONAL - workflow can run without them
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Execution Mode Determination and Context Loading
|
||||
|
||||
### Mode Detection
|
||||
|
||||
- [ ] Execution mode correctly determined:
|
||||
- [ ] BMad-Integrated Mode (story_file variable set) OR
|
||||
- [ ] Standalone Mode (target_feature or target_files set) OR
|
||||
- [ ] Auto-discover Mode (no targets specified)
|
||||
|
||||
### BMad Artifacts (If Available - OPTIONAL)
|
||||
|
||||
- [ ] Story markdown loaded (if `{story_file}` provided)
|
||||
- [ ] Acceptance criteria extracted from story (if available)
|
||||
- [ ] Tech-spec.md loaded (if `{use_tech_spec}` true and file exists)
|
||||
- [ ] Test-design.md loaded (if `{use_test_design}` true and file exists)
|
||||
- [ ] PRD.md loaded (if `{use_prd}` true and file exists)
|
||||
- [ ] **Note**: Absence of BMad artifacts does NOT halt workflow
|
||||
|
||||
### Framework Configuration
|
||||
|
||||
- [ ] Test framework config loaded (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Test directory structure identified from `{test_dir}`
|
||||
- [ ] Existing test patterns reviewed
|
||||
- [ ] Test runner capabilities noted (parallel execution, fixtures, etc.)
|
||||
|
||||
### Coverage Analysis
|
||||
|
||||
- [ ] Existing test files searched in `{test_dir}` (if `{analyze_coverage}` true)
|
||||
- [ ] Tested features vs untested features identified
|
||||
- [ ] Coverage gaps mapped (tests to source files)
|
||||
- [ ] Existing fixture and factory patterns checked
|
||||
|
||||
### Knowledge Base Fragments Loaded
|
||||
|
||||
- [ ] `test-levels-framework.md` - Test level selection
|
||||
- [ ] `test-priorities.md` - Priority classification (P0-P3)
|
||||
- [ ] `fixture-architecture.md` - Fixture patterns with auto-cleanup
|
||||
- [ ] `data-factories.md` - Factory patterns using faker
|
||||
- [ ] `selective-testing.md` - Targeted test execution strategies
|
||||
- [ ] `ci-burn-in.md` - Flaky test detection patterns
|
||||
- [ ] `test-quality.md` - Test design principles
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Automation Targets Identification
|
||||
|
||||
### Target Determination
|
||||
|
||||
**BMad-Integrated Mode (if story available):**
|
||||
|
||||
- [ ] Acceptance criteria mapped to test scenarios
|
||||
- [ ] Features implemented in story identified
|
||||
- [ ] Existing ATDD tests checked (if any)
|
||||
- [ ] Expansion beyond ATDD planned (edge cases, negative paths)
|
||||
|
||||
**Standalone Mode (if no story):**
|
||||
|
||||
- [ ] Specific feature analyzed (if `{target_feature}` specified)
|
||||
- [ ] Specific files analyzed (if `{target_files}` specified)
|
||||
- [ ] Features auto-discovered (if `{auto_discover_features}` true)
|
||||
- [ ] Features prioritized by:
|
||||
- [ ] No test coverage (highest priority)
|
||||
- [ ] Complex business logic
|
||||
- [ ] External integrations (API, database, auth)
|
||||
- [ ] Critical user paths (login, checkout, etc.)
|
||||
|
||||
### Test Level Selection
|
||||
|
||||
- [ ] Test level selection framework applied (from `test-levels-framework.md`)
|
||||
- [ ] E2E tests identified: Critical user journeys, multi-system integration
|
||||
- [ ] API tests identified: Business logic, service contracts, data transformations
|
||||
- [ ] Component tests identified: UI behavior, interactions, state management
|
||||
- [ ] Unit tests identified: Pure logic, edge cases, error handling
|
||||
|
||||
### Duplicate Coverage Avoidance
|
||||
|
||||
- [ ] Same behavior NOT tested at multiple levels unnecessarily
|
||||
- [ ] E2E used for critical happy path only
|
||||
- [ ] API tests used for business logic variations
|
||||
- [ ] Component tests used for UI interaction edge cases
|
||||
- [ ] Unit tests used for pure logic edge cases
|
||||
|
||||
### Priority Assignment
|
||||
|
||||
- [ ] Test priorities assigned using `test-priorities.md` framework
|
||||
- [ ] P0 tests: Critical paths, security-critical, data integrity
|
||||
- [ ] P1 tests: Important features, integration points, error handling
|
||||
- [ ] P2 tests: Edge cases, less-critical variations, performance
|
||||
- [ ] P3 tests: Nice-to-have, rarely-used features, exploratory
|
||||
- [ ] Priority variables respected:
|
||||
- [ ] `{include_p0}` = true (always include)
|
||||
- [ ] `{include_p1}` = true (high priority)
|
||||
- [ ] `{include_p2}` = true (medium priority)
|
||||
- [ ] `{include_p3}` = false (low priority, skip by default)
|
||||
|
||||
### Coverage Plan Created
|
||||
|
||||
- [ ] Test coverage plan documented
|
||||
- [ ] What will be tested at each level listed
|
||||
- [ ] Priorities assigned to each test
|
||||
- [ ] Coverage strategy clear (critical-paths, comprehensive, or selective)
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Test Infrastructure Generated
|
||||
|
||||
### Fixture Architecture
|
||||
|
||||
- [ ] Existing fixtures checked in `tests/support/fixtures/`
|
||||
- [ ] Fixture architecture created/enhanced (if `{generate_fixtures}` true)
|
||||
- [ ] All fixtures use Playwright's `test.extend()` pattern
|
||||
- [ ] All fixtures have auto-cleanup in teardown
|
||||
- [ ] Common fixtures created/enhanced:
|
||||
- [ ] authenticatedUser (with auto-delete)
|
||||
- [ ] apiRequest (authenticated client)
|
||||
- [ ] mockNetwork (external service mocking)
|
||||
- [ ] testDatabase (with auto-cleanup)
|
||||
|
||||
### Data Factories
|
||||
|
||||
- [ ] Existing factories checked in `tests/support/factories/`
|
||||
- [ ] Factory architecture created/enhanced (if `{generate_factories}` true)
|
||||
- [ ] All factories use `@faker-js/faker` for random data (no hardcoded values)
|
||||
- [ ] All factories support overrides for specific scenarios
|
||||
- [ ] Common factories created/enhanced:
|
||||
- [ ] User factory (email, password, name, role)
|
||||
- [ ] Product factory (name, price, SKU)
|
||||
- [ ] Order factory (items, total, status)
|
||||
- [ ] Cleanup helpers provided (e.g., deleteUser(), deleteProduct())
|
||||
|
||||
### Helper Utilities
|
||||
|
||||
- [ ] Existing helpers checked in `tests/support/helpers/` (if `{update_helpers}` true)
|
||||
- [ ] Common utilities created/enhanced:
|
||||
- [ ] waitFor (polling for complex conditions)
|
||||
- [ ] retry (retry helper for flaky operations)
|
||||
- [ ] testData (test data generation)
|
||||
- [ ] assertions (custom assertion helpers)
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Test Files Generated
|
||||
|
||||
### Test File Structure
|
||||
|
||||
- [ ] Test files organized correctly:
|
||||
- [ ] `tests/e2e/` for E2E tests
|
||||
- [ ] `tests/api/` for API tests
|
||||
- [ ] `tests/component/` for component tests
|
||||
- [ ] `tests/unit/` for unit tests
|
||||
- [ ] `tests/support/` for fixtures/factories/helpers
|
||||
|
||||
### E2E Tests (If Applicable)
|
||||
|
||||
- [ ] E2E test files created in `tests/e2e/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags ([P0], [P1], [P2], [P3]) in test name
|
||||
- [ ] All tests use data-testid selectors (not CSS classes)
|
||||
- [ ] One assertion per test (atomic design)
|
||||
- [ ] No hard waits or sleeps (explicit waits only)
|
||||
- [ ] Network-first pattern applied (route interception BEFORE navigation)
|
||||
- [ ] Clear Given-When-Then comments in test code
|
||||
|
||||
### API Tests (If Applicable)
|
||||
|
||||
- [ ] API test files created in `tests/api/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] API contracts validated (request/response structure)
|
||||
- [ ] HTTP status codes verified
|
||||
- [ ] Response body validation includes required fields
|
||||
- [ ] Error cases tested (400, 401, 403, 404, 500)
|
||||
- [ ] JWT token format validated (if auth tests)
|
||||
|
||||
### Component Tests (If Applicable)
|
||||
|
||||
- [ ] Component test files created in `tests/component/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] Component mounting works correctly
|
||||
- [ ] Interaction testing covers user actions (click, hover, keyboard)
|
||||
- [ ] State management validated
|
||||
- [ ] Props and events tested
|
||||
|
||||
### Unit Tests (If Applicable)
|
||||
|
||||
- [ ] Unit test files created in `tests/unit/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] Pure logic tested (no dependencies)
|
||||
- [ ] Edge cases covered
|
||||
- [ ] Error handling tested
|
||||
|
||||
### Quality Standards Enforced
|
||||
|
||||
- [ ] All tests use Given-When-Then format with clear comments
|
||||
- [ ] All tests have descriptive names with priority tags
|
||||
- [ ] No duplicate tests (same behavior tested multiple times)
|
||||
- [ ] No flaky patterns (race conditions, timing issues)
|
||||
- [ ] No test interdependencies (tests can run in any order)
|
||||
- [ ] Tests are deterministic (same input always produces same result)
|
||||
- [ ] All tests use data-testid selectors (E2E tests)
|
||||
- [ ] No hard waits: `await page.waitForTimeout()` (forbidden)
|
||||
- [ ] No conditional flow: `if (await element.isVisible())` (forbidden)
|
||||
- [ ] No try-catch for test logic (only for cleanup)
|
||||
- [ ] No hardcoded test data (use factories with faker)
|
||||
- [ ] No page object classes (tests are direct and simple)
|
||||
- [ ] No shared state between tests
|
||||
|
||||
### Network-First Pattern Applied
|
||||
|
||||
- [ ] Route interception set up BEFORE navigation (E2E tests with network requests)
|
||||
- [ ] `page.route()` called before `page.goto()` to prevent race conditions
|
||||
- [ ] Network-first pattern verified in all E2E tests that make API calls
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Documentation and Scripts Updated
|
||||
|
||||
### Test README Updated
|
||||
|
||||
- [ ] `tests/README.md` created or updated (if `{update_readme}` true)
|
||||
- [ ] Test suite structure overview included
|
||||
- [ ] Test execution instructions provided (all, specific files, by priority)
|
||||
- [ ] Fixture usage examples provided
|
||||
- [ ] Factory usage examples provided
|
||||
- [ ] Priority tagging convention explained ([P0], [P1], [P2], [P3])
|
||||
- [ ] How to write new tests documented
|
||||
- [ ] Common patterns documented
|
||||
- [ ] Anti-patterns documented (what to avoid)
|
||||
|
||||
### package.json Scripts Updated
|
||||
|
||||
- [ ] package.json scripts added/updated (if `{update_package_scripts}` true)
|
||||
- [ ] `test:e2e` script for all E2E tests
|
||||
- [ ] `test:e2e:p0` script for P0 tests only
|
||||
- [ ] `test:e2e:p1` script for P0 + P1 tests
|
||||
- [ ] `test:api` script for API tests
|
||||
- [ ] `test:component` script for component tests
|
||||
- [ ] `test:unit` script for unit tests (if applicable)
|
||||
|
||||
### Test Suite Executed
|
||||
|
||||
- [ ] Test suite run locally (if `{run_tests_after_generation}` true)
|
||||
- [ ] Test results captured (passing/failing counts)
|
||||
- [ ] No flaky patterns detected (tests are deterministic)
|
||||
- [ ] Setup requirements documented (if any)
|
||||
- [ ] Known issues documented (if any)
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Automation Summary Generated
|
||||
|
||||
### Automation Summary Document
|
||||
|
||||
- [ ] Output file created at `{output_summary}`
|
||||
- [ ] Document includes execution mode (BMad-Integrated, Standalone, Auto-discover)
|
||||
- [ ] Feature analysis included (source files, coverage gaps) - Standalone mode
|
||||
- [ ] Tests created listed (E2E, API, Component, Unit) with counts and paths
|
||||
- [ ] Infrastructure created listed (fixtures, factories, helpers)
|
||||
- [ ] Test execution instructions provided
|
||||
- [ ] Coverage analysis included:
|
||||
- [ ] Total test count
|
||||
- [ ] Priority breakdown (P0, P1, P2, P3 counts)
|
||||
- [ ] Test level breakdown (E2E, API, Component, Unit counts)
|
||||
- [ ] Coverage percentage (if calculated)
|
||||
- [ ] Coverage status (acceptance criteria covered, gaps identified)
|
||||
- [ ] Definition of Done checklist included
|
||||
- [ ] Next steps provided
|
||||
- [ ] Recommendations included (if Standalone mode)
|
||||
|
||||
### Summary Provided to User
|
||||
|
||||
- [ ] Concise summary output provided
|
||||
- [ ] Total tests created across test levels
|
||||
- [ ] Priority breakdown (P0, P1, P2, P3 counts)
|
||||
- [ ] Infrastructure counts (fixtures, factories, helpers)
|
||||
- [ ] Test execution command provided
|
||||
- [ ] Output file path provided
|
||||
- [ ] Next steps listed
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Test Design Quality
|
||||
|
||||
- [ ] Tests are readable (clear Given-When-Then structure)
|
||||
- [ ] Tests are maintainable (use factories/fixtures, not hardcoded data)
|
||||
- [ ] Tests are isolated (no shared state between tests)
|
||||
- [ ] Tests are deterministic (no race conditions or flaky patterns)
|
||||
- [ ] Tests are atomic (one assertion per test)
|
||||
- [ ] Tests are fast (no unnecessary waits or delays)
|
||||
- [ ] Tests are lean (files under {max_file_lines} lines)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Test level selection framework applied (from `test-levels-framework.md`)
|
||||
- [ ] Priority classification applied (from `test-priorities.md`)
|
||||
- [ ] Fixture architecture patterns applied (from `fixture-architecture.md`)
|
||||
- [ ] Data factory patterns applied (from `data-factories.md`)
|
||||
- [ ] Selective testing strategies considered (from `selective-testing.md`)
|
||||
- [ ] Flaky test detection patterns considered (from `ci-burn-in.md`)
|
||||
- [ ] Test quality principles applied (from `test-quality.md`)
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] All TypeScript types are correct and complete
|
||||
- [ ] No linting errors in generated test files
|
||||
- [ ] Consistent naming conventions followed
|
||||
- [ ] Imports are organized and correct
|
||||
- [ ] Code follows project style guide
|
||||
- [ ] No console.log or debug statements in test code
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With Framework Workflow
|
||||
|
||||
- [ ] Test framework configuration detected and used
|
||||
- [ ] Directory structure matches framework setup
|
||||
- [ ] Fixtures and helpers follow established patterns
|
||||
- [ ] Naming conventions consistent with framework standards
|
||||
|
||||
### With BMad Workflows (If Available - OPTIONAL)
|
||||
|
||||
**With Story Workflow:**
|
||||
|
||||
- [ ] Story ID correctly referenced in output (if story available)
|
||||
- [ ] Acceptance criteria from story reflected in tests (if story available)
|
||||
- [ ] Technical constraints from story considered (if story available)
|
||||
|
||||
**With test-design Workflow:**
|
||||
|
||||
- [ ] P0 scenarios from test-design prioritized (if test-design available)
|
||||
- [ ] Risk assessment from test-design considered (if test-design available)
|
||||
- [ ] Coverage strategy aligned with test-design (if test-design available)
|
||||
|
||||
**With atdd Workflow:**
|
||||
|
||||
- [ ] Existing ATDD tests checked (if story had ATDD workflow run)
|
||||
- [ ] Expansion beyond ATDD planned (edge cases, negative paths)
|
||||
- [ ] No duplicate coverage with ATDD tests
|
||||
|
||||
### With CI Pipeline
|
||||
|
||||
- [ ] Tests can run in CI environment
|
||||
- [ ] Tests are parallelizable (no shared state)
|
||||
- [ ] Tests have appropriate timeouts
|
||||
- [ ] Tests clean up their data (no CI environment pollution)
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
All of the following must be true before marking this workflow as complete:
|
||||
|
||||
- [ ] **Execution mode determined** (BMad-Integrated, Standalone, or Auto-discover)
|
||||
- [ ] **Framework configuration loaded** and validated
|
||||
- [ ] **Coverage analysis completed** (gaps identified if analyze_coverage true)
|
||||
- [ ] **Automation targets identified** (what needs testing)
|
||||
- [ ] **Test levels selected** appropriately (E2E, API, Component, Unit)
|
||||
- [ ] **Duplicate coverage avoided** (same behavior not tested at multiple levels)
|
||||
- [ ] **Test priorities assigned** (P0, P1, P2, P3)
|
||||
- [ ] **Fixture architecture created/enhanced** with auto-cleanup
|
||||
- [ ] **Data factories created/enhanced** using faker (no hardcoded data)
|
||||
- [ ] **Helper utilities created/enhanced** (if needed)
|
||||
- [ ] **Test files generated** at appropriate levels (E2E, API, Component, Unit)
|
||||
- [ ] **Given-When-Then format used** consistently across all tests
|
||||
- [ ] **Priority tags added** to all test names ([P0], [P1], [P2], [P3])
|
||||
- [ ] **data-testid selectors used** in E2E tests (not CSS classes)
|
||||
- [ ] **Network-first pattern applied** (route interception before navigation)
|
||||
- [ ] **Quality standards enforced** (no hard waits, no flaky patterns, self-cleaning, deterministic)
|
||||
- [ ] **Test README updated** with execution instructions and patterns
|
||||
- [ ] **package.json scripts updated** with test execution commands
|
||||
- [ ] **Test suite run locally** (if run_tests_after_generation true)
|
||||
- [ ] **Automation summary created** and saved to correct location
|
||||
- [ ] **Output file formatted correctly**
|
||||
- [ ] **Knowledge base references applied** and documented
|
||||
- [ ] **No test quality issues** (flaky patterns, race conditions, hardcoded data, page objects)
|
||||
|
||||
---
|
||||
|
||||
## Common Issues and Resolutions
|
||||
|
||||
### Issue: BMad artifacts not found
|
||||
|
||||
**Problem:** Story, tech-spec, or PRD files not found when variables are set.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- **automate does NOT require BMad artifacts** - they are OPTIONAL enhancements
|
||||
- If files not found, switch to Standalone Mode automatically
|
||||
- Analyze source code directly without BMad context
|
||||
- Continue workflow without halting
|
||||
|
||||
### Issue: Framework configuration not found
|
||||
|
||||
**Problem:** No playwright.config.ts or cypress.config.ts found.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- **HALT workflow** - framework is required
|
||||
- Message: "Framework scaffolding required. Run `bmad tea *framework` first."
|
||||
- User must run framework workflow before automate
|
||||
|
||||
### Issue: No automation targets identified
|
||||
|
||||
**Problem:** Neither story, target_feature, nor target_files specified, and auto-discover finds nothing.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Check if source_dir variable is correct
|
||||
- Verify source code exists in project
|
||||
- Ask user to specify target_feature or target_files explicitly
|
||||
- Provide examples: `target_feature: "src/auth/"` or `target_files: "src/auth/login.ts,src/auth/session.ts"`
|
||||
|
||||
### Issue: Duplicate coverage detected
|
||||
|
||||
**Problem:** Same behavior tested at multiple levels (E2E + API + Component).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Review test level selection framework (test-levels-framework.md)
|
||||
- Use E2E for critical happy path ONLY
|
||||
- Use API for business logic variations
|
||||
- Use Component for UI edge cases
|
||||
- Remove redundant tests that duplicate coverage
|
||||
|
||||
### Issue: Tests have hardcoded data
|
||||
|
||||
**Problem:** Tests use hardcoded email addresses, passwords, or other data.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Replace all hardcoded data with factory function calls
|
||||
- Use faker for all random data generation
|
||||
- Update data-factories to support all required test scenarios
|
||||
- Example: `createUser({ email: faker.internet.email() })`
|
||||
|
||||
### Issue: Tests are flaky
|
||||
|
||||
**Problem:** Tests fail intermittently, pass on retry.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove all hard waits (`page.waitForTimeout()`)
|
||||
- Use explicit waits (`page.waitForSelector()`)
|
||||
- Apply network-first pattern (route interception before navigation)
|
||||
- Remove conditional flow (`if (await element.isVisible())`)
|
||||
- Ensure tests are deterministic (no race conditions)
|
||||
- Run burn-in loop (10 iterations) to detect flakiness
|
||||
|
||||
### Issue: Fixtures don't clean up data
|
||||
|
||||
**Problem:** Test data persists after test run, causing test pollution.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Ensure all fixtures have cleanup in teardown phase
|
||||
- Cleanup happens AFTER `await use(data)`
|
||||
- Call deletion/cleanup functions (deleteUser, deleteProduct, etc.)
|
||||
- Verify cleanup works by checking database/storage after test run
|
||||
|
||||
### Issue: Tests too slow
|
||||
|
||||
**Problem:** Tests take longer than 90 seconds (max_test_duration).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove unnecessary waits and delays
|
||||
- Use parallel execution where possible
|
||||
- Mock external services (don't make real API calls)
|
||||
- Use API tests instead of E2E for business logic
|
||||
- Optimize test data creation (use in-memory database, etc.)
|
||||
|
||||
---
|
||||
|
||||
## Notes for TEA Agent
|
||||
|
||||
- **automate is flexible:** Can work with or without BMad artifacts (story, tech-spec, PRD are OPTIONAL)
|
||||
- **Standalone mode is powerful:** Analyze any codebase and generate tests independently
|
||||
- **Auto-discover mode:** Scan codebase for features needing tests when no targets specified
|
||||
- **Framework is the ONLY hard requirement:** HALT if framework config missing, otherwise proceed
|
||||
- **Avoid duplicate coverage:** E2E for critical paths only, API/Component for variations
|
||||
- **Priority tagging enables selective execution:** P0 tests run on every commit, P1 on PR, P2 nightly
|
||||
- **Network-first pattern prevents race conditions:** Route interception BEFORE navigation
|
||||
- **No page objects:** Keep tests simple, direct, and maintainable
|
||||
- **Use knowledge base:** Load relevant fragments (test-levels, test-priorities, fixture-architecture, data-factories) for guidance
|
||||
- **Deterministic tests only:** No hard waits, no conditional flow, no flaky patterns allowed
|
||||
File diff suppressed because it is too large
@@ -1,25 +1,109 @@
|
||||
# Test Architect workflow: automate
|
||||
name: testarch-automate
|
||||
description: "Expand automation coverage after implementation."
|
||||
description: "Expand test automation coverage after implementation or analyze existing codebase to generate comprehensive test suite"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/automate"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: false
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Execution mode
|
||||
standalone_mode: true # Can work without BMad artifacts (true) or integrate with BMad (false)
|
||||
|
||||
# Target specification (flexible - can be story, feature, or directory)
|
||||
story_file: "" # Path to story markdown (optional - only if BMad workflow)
|
||||
target_feature: "" # Feature name or directory to analyze (e.g., "user-authentication" or "src/auth/")
|
||||
target_files: "" # Specific files to analyze (comma-separated paths)
|
||||
|
||||
# Discovery and analysis
|
||||
test_dir: "{project-root}/tests"
|
||||
source_dir: "{project-root}/src"
|
||||
auto_discover_features: true # Automatically find features needing tests
|
||||
analyze_coverage: true # Check existing test coverage gaps
|
||||
|
||||
# Coverage strategy
|
||||
coverage_target: "critical-paths" # critical-paths, comprehensive, selective
|
||||
test_levels: "e2e,api,component,unit" # Which levels to generate (comma-separated)
|
||||
avoid_duplicate_coverage: true # Don't test same behavior at multiple levels
|
||||
|
||||
# Test priorities (from test-priorities.md knowledge fragment)
|
||||
include_p0: true # Critical paths (every commit)
|
||||
include_p1: true # High priority (PR to main)
|
||||
include_p2: true # Medium priority (nightly)
|
||||
include_p3: false # Low priority (on-demand)
|
||||
|
||||
# Test design principles
|
||||
use_given_when_then: true # BDD-style test structure
|
||||
one_assertion_per_test: true # Atomic test design
|
||||
network_first: true # Route interception before navigation
|
||||
deterministic_waits: true # No hard waits or sleeps
|
||||
|
||||
# Infrastructure generation
|
||||
generate_fixtures: true # Create/enhance fixture architecture
|
||||
generate_factories: true # Create/enhance data factories
|
||||
update_helpers: true # Add utility functions
|
||||
|
||||
# Integration with BMad artifacts (when available)
|
||||
use_test_design: true # Load test-design.md if exists
|
||||
use_tech_spec: true # Load tech-spec.md if exists
|
||||
use_prd: true # Load PRD.md if exists
|
||||
|
||||
# Output configuration
|
||||
update_readme: true # Update test README with new specs
|
||||
update_package_scripts: true # Add test execution scripts
|
||||
output_summary: "{output_folder}/automation-summary.md"
|
||||
|
||||
# Quality gates
|
||||
max_test_duration: 90 # seconds (1.5 minutes per test)
|
||||
max_file_lines: 300 # lines (keep tests lean)
|
||||
require_self_cleaning: true # All tests must clean up data
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load test-levels, test-priorities, fixture-architecture, selective-testing, ci-burn-in
|
||||
run_tests_after_generation: true # Verify tests pass/fail as expected
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/automation-summary.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read source code, existing tests, BMad artifacts
|
||||
- write_file # Create test files, fixtures, factories, summaries
|
||||
- create_directory # Create test directories
|
||||
- list_files # Discover features and existing tests
|
||||
- search_repo # Find coverage gaps and patterns
|
||||
- glob # Find test files and source files
|
||||
|
||||
# Recommended inputs (optional - depends on mode)
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with acceptance criteria (optional - BMad mode only)"
|
||||
- tech_spec: "Technical specification (optional - BMad mode only)"
|
||||
- test_design: "Test design document with risk/priority (optional - BMad mode only)"
|
||||
- source_code: "Feature implementation to analyze (required for standalone mode)"
|
||||
- existing_tests: "Current test suite for gap analysis (always helpful)"
|
||||
- framework_config: "Test framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- automation
|
||||
- test-architect
|
||||
- regression
|
||||
- coverage
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
493 src/modules/bmm/workflows/testarch/ci/README.md Normal file
@@ -0,0 +1,493 @@
|
||||
# CI/CD Pipeline Setup Workflow
|
||||
|
||||
Scaffolds a production-ready CI/CD quality pipeline with test execution, burn-in loops for flaky test detection, parallel sharding, and artifact collection. This workflow creates platform-specific CI configuration optimized for fast feedback (< 45 min total) and reliable test execution with 20× speedup over sequential runs.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
bmad tea *ci
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Test framework is configured and tests pass locally
|
||||
- Team is ready to enable continuous integration
|
||||
- Existing CI pipeline needs optimization or modernization
|
||||
- Burn-in loop is needed for flaky test detection
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Framework config** (playwright.config.ts, cypress.config.ts): Determines test commands and configuration
|
||||
- **package.json**: Dependencies and scripts for caching strategy
|
||||
- **.nvmrc**: Node version for CI (optional, defaults to Node 20 LTS)
|
||||
|
||||
**Optional Context Files:**
|
||||
|
||||
- **Existing CI config**: To update rather than create new
|
||||
- **.git/config**: For CI platform auto-detection
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `ci_platform`: Auto-detected (github-actions/gitlab-ci/circle-ci) or explicit
|
||||
- `test_framework`: Detected from framework config (playwright/cypress)
|
||||
- `parallel_jobs`: Number of parallel shards (default: 4)
|
||||
- `burn_in_enabled`: Enable burn-in loop (default: true)
|
||||
- `burn_in_iterations`: Burn-in iterations (default: 10)
|
||||
- `selective_testing_enabled`: Run only changed tests (default: true)
|
||||
- `artifact_retention_days`: Artifact storage duration (default: 30)
|
||||
- `cache_enabled`: Enable dependency caching (default: true)
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverables:**
|
||||
|
||||
1. **CI Configuration File**
|
||||
- `.github/workflows/test.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- Platform-specific optimizations and best practices
|
||||
|
||||
2. **Pipeline Stages**
|
||||
- **Lint**: Code quality checks (<2 min)
|
||||
- **Test**: Parallel execution with 4 shards (<10 min per shard)
|
||||
- **Burn-In**: Flaky test detection with 10 iterations (<30 min)
|
||||
- **Report**: Aggregate results and publish artifacts
|
||||
|
||||
3. **Helper Scripts**
|
||||
- `scripts/test-changed.sh`: Selective testing (run only affected tests)
|
||||
- `scripts/ci-local.sh`: Local CI mirror for debugging
|
||||
- `scripts/burn-in.sh`: Standalone burn-in execution
|
||||
|
||||
4. **Documentation**
|
||||
- `docs/ci.md`: Pipeline guide, debugging, secrets setup
|
||||
- `docs/ci-secrets-checklist.md`: Required secrets and configuration
|
||||
- Inline comments in CI configuration files
|
||||
|
||||
5. **Optimization Features**
|
||||
- Dependency caching (npm + browser binaries): 2-5 min savings
|
||||
- Parallel sharding: 75% time reduction
|
||||
- Retry logic: Handles transient failures (2 retries)
|
||||
- Failure-only artifacts: Cost-effective debugging
|
||||
|
||||
**Performance Targets:**
|
||||
|
||||
- Lint: <2 minutes
|
||||
- Test (per shard): <10 minutes
|
||||
- Burn-in: <30 minutes
|
||||
- **Total: <45 minutes** (20× faster than sequential)
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- ✅ Git repository initialized
|
||||
- ✅ Local tests pass before CI setup
|
||||
- ✅ Framework configuration exists
|
||||
- ✅ CI platform accessible
|
||||
|
||||
## Key Features
|
||||
|
||||
### Burn-In Loop for Flaky Test Detection
|
||||
|
||||
**Critical production pattern:**
|
||||
|
||||
```yaml
burn-in:
  runs-on: ubuntu-latest
  steps:
    - run: |
        for i in {1..10}; do
          echo "🔥 Burn-in iteration $i/10"
          npm run test:e2e || exit 1
        done
```
|
||||
|
||||
**Purpose**: Runs tests 10 times to catch non-deterministic failures before they reach main branch.
|
||||
|
||||
**When to run:**
|
||||
|
||||
- On PRs to main/develop
|
||||
- Weekly on cron schedule
|
||||
- After test infrastructure changes
|
||||
|
||||
**Failure threshold**: Even ONE failure → tests are flaky, must fix before merging.
|
||||
|
||||
### Parallel Sharding
|
||||
|
||||
**Splits tests across 4 jobs:**
|
||||
|
||||
```yaml
strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npm run test:e2e -- --shard=${{ matrix.shard }}/4
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 75% time reduction (40 min → 10 min per shard)
|
||||
- Faster feedback on PRs
|
||||
- Configurable shard count
|
||||
|
||||
### Smart Caching
|
||||
|
||||
**Node modules + browser binaries:**
|
||||
|
||||
```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 2-5 min savings per run
|
||||
- Consistent across builds
|
||||
- Automatic invalidation on dependency changes
|
||||
|
||||
### Selective Testing
|
||||
|
||||
**Run only tests affected by code changes:**
|
||||
|
||||
```bash
# scripts/test-changed.sh
CHANGED_FILES=$(git diff --name-only HEAD~1)
# Derive a grep pattern from the changed file names (simplified illustration)
AFFECTED_TESTS=$(echo "$CHANGED_FILES" | xargs -n1 basename | sed 's/\.[^.]*$//' | paste -sd '|' -)
npm run test:e2e -- --grep="$AFFECTED_TESTS"
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 50-80% time reduction for focused PRs
|
||||
- Faster feedback cycle
|
||||
- Full suite still runs on main branch
|
||||
|
||||
### Failure-Only Artifacts
|
||||
|
||||
**Upload debugging materials only on test failures:**
|
||||
|
||||
- Traces (Playwright): 5-10 MB per test
|
||||
- Screenshots: 100-500 KB each
|
||||
- Videos: 2-5 MB per test
|
||||
- HTML reports: 1-2 MB
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- Reduces storage costs by 90%
|
||||
- Maintains full debugging capability
|
||||
- 30-day retention default
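
On the framework side, failure-only capture pairs with the CI upload step; a sketch of the corresponding Playwright settings (assuming Playwright is the framework in use):

```typescript
// playwright.config.ts (sketch): capture debugging artifacts only when a test fails
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
```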
|
||||
|
||||
### Local CI Mirror
|
||||
|
||||
**Debug CI failures locally:**
|
||||
|
||||
```bash
./scripts/ci-local.sh
# Runs: lint → test → burn-in (3 iterations)
```
|
||||
|
||||
**Mirrors CI environment:**
|
||||
|
||||
- Same Node version
|
||||
- Same commands
|
||||
- Reduced burn-in (3 vs 10 for faster feedback)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
Automatically consults TEA knowledge base:
|
||||
|
||||
- `ci-burn-in.md` - Burn-in loop patterns and iterations
|
||||
- `selective-testing.md` - Changed test detection strategies
|
||||
- `visual-debugging.md` - Artifact collection best practices
|
||||
- `test-quality.md` - CI-specific quality criteria
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before ci:**
|
||||
|
||||
- **framework**: Sets up test infrastructure and configuration
|
||||
- **test-design** (optional): Plans test coverage strategy
|
||||
|
||||
**After ci:**
|
||||
|
||||
- **atdd**: Generate failing tests that run in CI
|
||||
- **automate**: Expand test coverage that CI executes
|
||||
- **gate**: Use CI results for quality gate decisions
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **dev-story**: Tests run in CI after story implementation
|
||||
- **retrospective**: CI metrics inform process improvements
|
||||
|
||||
**Updates:**
|
||||
|
||||
- `bmm-workflow-status.md`: Adds CI setup to Quality & Testing Progress section
|
||||
|
||||
## Important Notes
|
||||
|
||||
### CI Platform Auto-Detection
|
||||
|
||||
**GitHub Actions** (default):
|
||||
|
||||
- Auto-selected if `github.com` in git remote
|
||||
- Free 2000 min/month for private repos
|
||||
- Unlimited for public repos
|
||||
- `.github/workflows/test.yml`
|
||||
|
||||
**GitLab CI**:
|
||||
|
||||
- Auto-selected if `gitlab.com` in git remote
|
||||
- Free 400 min/month
|
||||
- `.gitlab-ci.yml`
|
||||
|
||||
**Circle CI** / **Jenkins**:
|
||||
|
||||
- User must specify explicitly
|
||||
- Templates provided for both
|
||||
|
||||
### Burn-In Strategy
|
||||
|
||||
**Iterations:**
|
||||
|
||||
- **3**: Quick feedback (local development)
|
||||
- **10**: Standard (PR checks) ← recommended
|
||||
- **100**: High-confidence (release branches)
|
||||
|
||||
**When to run:**
|
||||
|
||||
- ✅ On PRs to main/develop
|
||||
- ✅ Weekly scheduled (cron)
|
||||
- ✅ After test infra changes
|
||||
- ❌ Not on every commit (too slow)
|
||||
|
||||
**Cost-benefit:**
|
||||
|
||||
- 30 minutes of CI time → Prevents hours of debugging flaky tests
|
||||
|
||||
### Artifact Collection Strategy
|
||||
|
||||
**Failure-only collection:**
|
||||
|
||||
- Saves 90% storage costs
|
||||
- Maintains debugging capability
|
||||
- Automatic cleanup after retention period
|
||||
|
||||
**What to collect:**
|
||||
|
||||
- Traces: Full execution context (Playwright)
|
||||
- Screenshots: Visual evidence
|
||||
- Videos: Interaction playback
|
||||
- HTML reports: Detailed results
|
||||
- Console logs: Error messages
|
||||
|
||||
**What NOT to collect:**
|
||||
|
||||
- Passing test artifacts (waste of space)
|
||||
- Large binaries
|
||||
- Sensitive data (use secrets instead)
|
||||
|
||||
### Selective Testing Trade-offs
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 50-80% time reduction for focused changes
|
||||
- Faster feedback loop
|
||||
- Lower CI costs
|
||||
|
||||
**Risks:**
|
||||
|
||||
- May miss integration issues
|
||||
- Relies on accurate change detection
|
||||
- False positives if detection is too aggressive
|
||||
|
||||
**Mitigation:**
|
||||
|
||||
- Always run full suite on merge to main
|
||||
- Use burn-in loop on main branch
|
||||
- Monitor for missed issues
|
||||
|
||||
### Parallelism Configuration
|
||||
|
||||
**4 shards** (default):
|
||||
|
||||
- Optimal for 40-80 test files
|
||||
- ~10 min per shard
|
||||
- Balances speed vs resource usage
|
||||
|
||||
**Adjust if:**
|
||||
|
||||
- Tests complete in <5 min → reduce shards
|
||||
- Tests take >15 min → increase shards
|
||||
- CI limits concurrent jobs → reduce shards
|
||||
|
||||
**Formula:**
|
||||
|
||||
```
Total test time / Target shard time = Optimal shards
Example: 40 min / 10 min = 4 shards
```
|
||||
|
||||
### Retry Logic
|
||||
|
||||
**2 retries** (default):
|
||||
|
||||
- Handles transient network issues
|
||||
- Mitigates race conditions
|
||||
- Does NOT mask flaky tests (burn-in catches those)
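
For Playwright projects, this retry policy usually lives in the test config rather than in the CI file; a minimal sketch:

```typescript
// playwright.config.ts (sketch): retry transient failures in CI only
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // two retries in CI, none locally
});
```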
|
||||
|
||||
**When retries trigger:**
|
||||
|
||||
- Network timeouts
|
||||
- Service unavailability
|
||||
- Resource constraints
|
||||
|
||||
**When retries DON'T help:**
|
||||
|
||||
- Assertion failures (logic errors)
|
||||
- Flaky tests (non-deterministic)
|
||||
- Configuration errors
|
||||
|
||||
### Notification Setup (Optional)
|
||||
|
||||
**Supported channels:**
|
||||
|
||||
- Slack: Webhook integration
|
||||
- Email: SMTP configuration
|
||||
- Discord: Webhook integration
|
||||
|
||||
**Configuration:**
|
||||
|
||||
```yaml
notify_on_failure: true
notification_channels: 'slack'
# Requires SLACK_WEBHOOK secret in CI settings
```
|
||||
|
||||
**Best practice:** Enable for main/develop branches only, not PRs.
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
After workflow completion, verify:
|
||||
|
||||
- [ ] CI configuration file created and syntactically valid
|
||||
- [ ] Burn-in loop configured (10 iterations)
|
||||
- [ ] Parallel sharding enabled (4 jobs)
|
||||
- [ ] Caching configured (dependencies + browsers)
|
||||
- [ ] Artifact collection on failure only
|
||||
- [ ] Helper scripts created and executable
|
||||
- [ ] Documentation complete (ci.md, secrets checklist)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
- [ ] First CI run triggered and passes
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
## Example Execution
|
||||
|
||||
**Scenario 1: New GitHub Actions setup**
|
||||
|
||||
```bash
|
||||
bmad tea *ci
|
||||
|
||||
# TEA detects:
|
||||
# - GitHub repository (github.com in git remote)
|
||||
# - Playwright framework
|
||||
# - Node 20 from .nvmrc
|
||||
# - 60 test files
|
||||
|
||||
# TEA scaffolds:
|
||||
# - .github/workflows/test.yml
|
||||
# - 4-shard parallel execution
|
||||
# - Burn-in loop (10 iterations)
|
||||
# - Dependency + browser caching
|
||||
# - Failure artifacts (traces, screenshots)
|
||||
# - Helper scripts
|
||||
# - Documentation
|
||||
|
||||
# Result:
|
||||
# Total CI time: 42 minutes (was 8 hours sequential)
|
||||
# - Lint: 1.5 min
|
||||
# - Test (4 shards): 9 min each
|
||||
# - Burn-in: 28 min
|
||||
```
|
||||
|
||||
**Scenario 2: Update existing GitLab CI**
|
||||
|
||||
```bash
|
||||
bmad tea *ci
|
||||
|
||||
# TEA detects:
|
||||
# - Existing .gitlab-ci.yml
|
||||
# - Cypress framework
|
||||
# - No caching configured
|
||||
|
||||
# TEA asks: "Update existing CI or create new?"
|
||||
# User: "Update"
|
||||
|
||||
# TEA enhances:
|
||||
# - Adds burn-in job
|
||||
# - Configures caching (cache: paths)
|
||||
# - Adds parallel: 4
|
||||
# - Updates artifact collection
|
||||
# - Documents secrets needed
|
||||
|
||||
# Result:
|
||||
# CI time reduced from 45 min → 12 min
|
||||
```
|
||||
|
||||
**Scenario 3: Standalone burn-in setup**
|
||||
|
||||
```bash
|
||||
# User wants only burn-in, no full CI
|
||||
bmad tea *ci
|
||||
# Set burn_in_enabled: true, skip other stages
|
||||
|
||||
# TEA creates:
|
||||
# - Minimal workflow with burn-in only
|
||||
# - scripts/burn-in.sh for local testing
|
||||
# - Documentation for running burn-in
|
||||
|
||||
# Use case:
|
||||
# - Validate test stability before full CI setup
|
||||
# - Debug intermittent failures
|
||||
# - Confidence check before release
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue: "Git repository not found"**
|
||||
|
||||
- **Cause**: No .git/ directory
|
||||
- **Solution**: Run `git init` and `git remote add origin <url>`
|
||||
|
||||
**Issue: "Tests fail locally but should set up CI anyway"**
|
||||
|
||||
- **Cause**: Workflow halts if local tests fail
|
||||
- **Solution**: Fix tests first, or temporarily skip preflight (not recommended)
|
||||
|
||||
**Issue: "CI takes longer than 10 min per shard"**
|
||||
|
||||
- **Cause**: Too many tests per shard
|
||||
- **Solution**: Increase shard count (e.g., 4 → 8)
|
||||
|
||||
**Issue: "Burn-in passes locally but fails in CI"**
|
||||
|
||||
- **Cause**: Environment differences (timing, resources)
|
||||
- **Solution**: Use `scripts/ci-local.sh` to mirror CI environment
|
||||
|
||||
**Issue: "Caching not working"**
|
||||
|
||||
- **Cause**: Cache key mismatch or cache limit exceeded
|
||||
- **Solution**: Check cache key formula, verify platform limits
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **framework**: Set up test infrastructure → [framework/README.md](../framework/README.md)
|
||||
- **atdd**: Generate acceptance tests → [atdd/README.md](../atdd/README.md)
|
||||
- **automate**: Expand test coverage → [automate/README.md](../automate/README.md)
|
||||
- **gate**: Quality gate decisions → [gate/README.md](../gate/README.md)
|
||||
|
||||
## Version History
|
||||
|
||||
- **v4.0 (BMad v6)**: Pure markdown instructions, enhanced workflow.yaml, burn-in loop integration
|
||||
- **v3.x**: XML format instructions, basic CI setup
|
||||
- **v2.x**: Legacy task-based approach
|
||||
246 src/modules/bmm/workflows/testarch/ci/checklist.md Normal file
@@ -0,0 +1,246 @@
|
||||
# CI/CD Pipeline Setup - Validation Checklist
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- [ ] Git repository initialized (`.git/` exists)
|
||||
- [ ] Git remote configured (`git remote -v` shows origin)
|
||||
- [ ] Test framework configured (`playwright.config.*` or `cypress.config.*`)
|
||||
- [ ] Local tests pass (`npm run test:e2e` succeeds)
|
||||
- [ ] Team agrees on CI platform
|
||||
- [ ] Access to CI platform settings (if updating)
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Preflight Checks
|
||||
|
||||
- [ ] Git repository validated
|
||||
- [ ] Framework configuration detected
|
||||
- [ ] Local test execution successful
|
||||
- [ ] CI platform detected or selected
|
||||
- [ ] Node version identified (.nvmrc or default)
|
||||
- [ ] No blocking issues found
|
||||
|
||||
### Step 2: CI Pipeline Configuration
|
||||
|
||||
- [ ] CI configuration file created (`.github/workflows/test.yml` or `.gitlab-ci.yml`)
|
||||
- [ ] File is syntactically valid (no YAML errors)
|
||||
- [ ] Correct framework commands configured
|
||||
- [ ] Node version matches project
|
||||
- [ ] Test directory paths correct
|
||||
|
||||
### Step 3: Parallel Sharding
|
||||
|
||||
- [ ] Matrix strategy configured (4 shards default)
|
||||
- [ ] Shard syntax correct for framework
|
||||
- [ ] fail-fast set to false
|
||||
- [ ] Shard count appropriate for test suite size
|
||||
|
||||
### Step 4: Burn-In Loop
|
||||
|
||||
- [ ] Burn-in job created
|
||||
- [ ] 10 iterations configured
|
||||
- [ ] Proper exit on failure (`|| exit 1`)
|
||||
- [ ] Runs on appropriate triggers (PR, cron)
|
||||
- [ ] Failure artifacts uploaded
|
||||
|
||||
### Step 5: Caching Configuration
|
||||
|
||||
- [ ] Dependency cache configured (npm/yarn)
|
||||
- [ ] Cache key uses lockfile hash
|
||||
- [ ] Browser cache configured (Playwright/Cypress)
|
||||
- [ ] Restore-keys defined for fallback
|
||||
- [ ] Cache paths correct for platform
|
||||
|
||||
### Step 6: Artifact Collection
|
||||
|
||||
- [ ] Artifacts upload on failure only
|
||||
- [ ] Correct artifact paths (test-results/, traces/, etc.)
|
||||
- [ ] Retention days set (30 default)
|
||||
- [ ] Artifact names unique per shard
|
||||
- [ ] No sensitive data in artifacts
|
||||
|
||||
### Step 7: Retry Logic
|
||||
|
||||
- [ ] Retry action/strategy configured
|
||||
- [ ] Max attempts: 2-3
|
||||
- [ ] Timeout appropriate (30 min)
|
||||
- [ ] Retry only on transient errors
|
||||
|
||||
### Step 8: Helper Scripts
|
||||
|
||||
- [ ] `scripts/test-changed.sh` created
|
||||
- [ ] `scripts/ci-local.sh` created
|
||||
- [ ] `scripts/burn-in.sh` created (optional)
|
||||
- [ ] Scripts are executable (`chmod +x`)
|
||||
- [ ] Scripts use correct test commands
|
||||
- [ ] Shebang present (`#!/bin/bash`)
|
||||
|
||||
### Step 9: Documentation
|
||||
|
||||
- [ ] `docs/ci.md` created with pipeline guide
|
||||
- [ ] `docs/ci-secrets-checklist.md` created
|
||||
- [ ] Required secrets documented
|
||||
- [ ] Setup instructions clear
|
||||
- [ ] Troubleshooting section included
|
||||
- [ ] Badge URLs provided (optional)
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Configuration Validation
|
||||
|
||||
- [ ] CI file loads without errors
|
||||
- [ ] All paths resolve correctly
|
||||
- [ ] No hardcoded values (use env vars)
|
||||
- [ ] Triggers configured (push, pull_request, schedule)
|
||||
- [ ] Platform-specific syntax correct
|
||||
|
||||
### Execution Validation
|
||||
|
||||
- [ ] First CI run triggered (push to remote)
|
||||
- [ ] Pipeline starts without errors
|
||||
- [ ] All jobs appear in CI dashboard
|
||||
- [ ] Caching works (check logs for cache hit)
|
||||
- [ ] Tests execute in parallel
|
||||
- [ ] Artifacts collected on failure
|
||||
|
||||
### Performance Validation
|
||||
|
||||
- [ ] Lint stage: <2 minutes
|
||||
- [ ] Test stage (per shard): <10 minutes
|
||||
- [ ] Burn-in stage: <30 minutes
|
||||
- [ ] Total pipeline: <45 minutes
|
||||
- [ ] Cache reduces install time by 2-5 minutes
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Best Practices Compliance
|
||||
|
||||
- [ ] Burn-in loop follows production patterns
|
||||
- [ ] Parallel sharding configured optimally
|
||||
- [ ] Failure-only artifact collection
|
||||
- [ ] Selective testing enabled (optional)
|
||||
- [ ] Retry logic handles transient failures only
|
||||
- [ ] No secrets in configuration files
|
||||
|
||||
### Knowledge Base Alignment
|
||||
|
||||
- [ ] Burn-in pattern matches `ci-burn-in.md`
|
||||
- [ ] Selective testing matches `selective-testing.md`
|
||||
- [ ] Artifact collection matches `visual-debugging.md`
|
||||
- [ ] Test quality matches `test-quality.md`
|
||||
|
||||
### Security Checks
|
||||
|
||||
- [ ] No credentials in CI configuration
|
||||
- [ ] Secrets use platform secret management
|
||||
- [ ] Environment variables for sensitive data
|
||||
- [ ] Artifact retention appropriate (not too long)
|
||||
- [ ] No debug output exposing secrets
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] `bmm-workflow-status.md` exists
|
||||
- [ ] CI setup logged in Quality & Testing Progress section
|
||||
- [ ] Status updated with completion timestamp
|
||||
- [ ] Platform and configuration noted
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Relevant knowledge fragments loaded
|
||||
- [ ] Patterns applied from knowledge base
|
||||
- [ ] Documentation references knowledge base
|
||||
- [ ] Knowledge base references in README
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] `framework` workflow completed first
|
||||
- [ ] Can proceed to `atdd` workflow after CI setup
|
||||
- [ ] Can proceed to `automate` workflow
|
||||
- [ ] CI integrates with `gate` workflow
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All must be true:**
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All process steps completed
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] First CI run successful
|
||||
- [ ] Performance targets met
|
||||
- [ ] Documentation complete
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Commit CI configuration
|
||||
2. [ ] Push to remote repository
|
||||
3. [ ] Configure required secrets in CI platform
|
||||
4. [ ] Open PR to trigger first CI run
|
||||
5. [ ] Monitor and verify pipeline execution
|
||||
6. [ ] Adjust parallelism if needed (based on actual run times)
|
||||
7. [ ] Set up notifications (optional)
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `atdd` workflow for test generation
|
||||
2. [ ] Run `automate` workflow for coverage expansion
|
||||
3. [ ] Run `gate` workflow for quality gates
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails:
|
||||
|
||||
1. [ ] Delete CI configuration file
|
||||
2. [ ] Remove helper scripts directory
|
||||
3. [ ] Remove documentation (docs/ci.md, etc.)
|
||||
4. [ ] Clear CI platform secrets (if added)
|
||||
5. [ ] Review error logs
|
||||
6. [ ] Fix issues and retry workflow
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: CI file syntax errors
|
||||
|
||||
- **Solution**: Validate YAML syntax online or with linter
|
||||
|
||||
**Issue**: Tests fail in CI but pass locally
|
||||
|
||||
- **Solution**: Use `scripts/ci-local.sh` to mirror CI environment
|
||||
|
||||
**Issue**: Caching not working
|
||||
|
||||
- **Solution**: Check cache key formula, verify paths
|
||||
|
||||
**Issue**: Burn-in too slow
|
||||
|
||||
- **Solution**: Reduce iterations or run on cron only
|
||||
|
||||
### Platform-Specific
|
||||
|
||||
**GitHub Actions:**
|
||||
|
||||
- Secrets: Repository Settings → Secrets and variables → Actions
|
||||
- Runners: Ubuntu latest recommended
|
||||
- Concurrency limits: 20 jobs for free tier
|
||||
|
||||
**GitLab CI:**
|
||||
|
||||
- Variables: Project Settings → CI/CD → Variables
|
||||
- Runners: Shared or project-specific
|
||||
- Pipeline quota: 400 minutes/month free tier
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items validated.
|
||||
|
||||
**Completed by:** ______________
|
||||
**Date:** ______________
|
||||
**Platform:** ______________ (GitHub Actions / GitLab CI)
|
||||
**Notes:** ______________________________
|
||||
@@ -0,0 +1,165 @@
|
||||
# GitHub Actions CI/CD Pipeline for Test Execution
|
||||
# Generated by BMad TEA Agent - Test Architect Module
|
||||
# Optimized for: Playwright/Cypress, Parallel Sharding, Burn-In Loop
|
||||
|
||||
name: Test Pipeline
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main, develop]
|
||||
pull_request:
|
||||
branches: [main, develop]
|
||||
schedule:
|
||||
# Weekly burn-in on Sundays at 2 AM UTC
|
||||
- cron: "0 2 * * 0"
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
# Lint stage - Code quality checks
|
||||
lint:
|
||||
name: Lint
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 5
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run linter
|
||||
run: npm run lint
|
||||
|
||||
# Test stage - Parallel execution with sharding
|
||||
test:
|
||||
name: Test (Shard ${{ matrix.shard }})
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 30
|
||||
needs: lint
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
shard: [1, 2, 3, 4]
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
restore-keys: |
|
||||
${{ runner.os }}-playwright-
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Install Playwright browsers
|
||||
run: npx playwright install --with-deps chromium
|
||||
|
||||
- name: Run tests (shard ${{ matrix.shard }}/4)
|
||||
run: npm run test:e2e -- --shard=${{ matrix.shard }}/4
|
||||
|
||||
- name: Upload test results
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-results-${{ matrix.shard }}
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
|
||||
# Burn-in stage - Flaky test detection
|
||||
burn-in:
|
||||
name: Burn-In (Flaky Detection)
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 60
|
||||
needs: test
|
||||
# Only run burn-in on PRs to main/develop or on schedule
|
||||
if: github.event_name == 'pull_request' || github.event_name == 'schedule'
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Install Playwright browsers
|
||||
run: npx playwright install --with-deps chromium
|
||||
|
||||
- name: Run burn-in loop (10 iterations)
|
||||
run: |
|
||||
echo "🔥 Starting burn-in loop - detecting flaky tests"
|
||||
for i in {1..10}; do
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
echo "✅ Burn-in complete - no flaky tests detected"
|
||||
|
||||
- name: Upload burn-in failure artifacts
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: burn-in-failures
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
|
||||
# Report stage - Aggregate and publish results
|
||||
report:
|
||||
name: Test Report
|
||||
runs-on: ubuntu-latest
|
||||
needs: [test, burn-in]
|
||||
if: always()
|
||||
|
||||
steps:
|
||||
- name: Download all artifacts
|
||||
uses: actions/download-artifact@v4
|
||||
with:
|
||||
path: artifacts
|
||||
|
||||
- name: Generate summary
|
||||
run: |
|
||||
echo "## Test Execution Summary" >> $GITHUB_STEP_SUMMARY
|
||||
echo "" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Status**: ${{ needs.test.result }}" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Burn-in**: ${{ needs.burn-in.result }}" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Shards**: 4" >> $GITHUB_STEP_SUMMARY
|
||||
echo "" >> $GITHUB_STEP_SUMMARY
|
||||
|
||||
if [ "${{ needs.burn-in.result }}" == "failure" ]; then
|
||||
echo "⚠️ **Flaky tests detected** - Review burn-in artifacts" >> $GITHUB_STEP_SUMMARY
|
||||
fi
|
||||
128
src/modules/bmm/workflows/testarch/ci/gitlab-ci-template.yaml
Normal file
@@ -0,0 +1,128 @@
|
||||
# GitLab CI/CD Pipeline for Test Execution
|
||||
# Generated by BMad TEA Agent - Test Architect Module
|
||||
# Optimized for: Playwright/Cypress, Parallel Sharding, Burn-In Loop
|
||||
|
||||
stages:
|
||||
- lint
|
||||
- test
|
||||
- burn-in
|
||||
- report
|
||||
|
||||
variables:
|
||||
# Disable git depth for accurate change detection
|
||||
GIT_DEPTH: 0
|
||||
# Use npm ci for faster, deterministic installs
|
||||
npm_config_cache: "$CI_PROJECT_DIR/.npm"
|
||||
# Playwright browser cache
|
||||
PLAYWRIGHT_BROWSERS_PATH: "$CI_PROJECT_DIR/.cache/ms-playwright"
|
||||
|
||||
# Caching configuration
|
||||
cache:
|
||||
key:
|
||||
files:
|
||||
- package-lock.json
|
||||
paths:
|
||||
- .npm/
|
||||
- .cache/ms-playwright/
|
||||
- node_modules/
|
||||
|
||||
# Lint stage - Code quality checks
|
||||
lint:
|
||||
stage: lint
|
||||
image: node:20
|
||||
script:
|
||||
- npm ci
|
||||
- npm run lint
|
||||
timeout: 5 minutes
|
||||
|
||||
# Test stage - Parallel execution with sharding
|
||||
.test-template: &test-template
|
||||
stage: test
|
||||
image: node:20
|
||||
needs:
|
||||
- lint
|
||||
before_script:
|
||||
- npm ci
|
||||
- npx playwright install --with-deps chromium
|
||||
artifacts:
|
||||
when: on_failure
|
||||
paths:
|
||||
- test-results/
|
||||
- playwright-report/
|
||||
expire_in: 30 days
|
||||
timeout: 30 minutes
|
||||
|
||||
test:shard-1:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=1/4
|
||||
|
||||
test:shard-2:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=2/4
|
||||
|
||||
test:shard-3:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=3/4
|
||||
|
||||
test:shard-4:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=4/4
|
||||
|
||||
# Burn-in stage - Flaky test detection
|
||||
burn-in:
|
||||
stage: burn-in
|
||||
image: node:20
|
||||
needs:
|
||||
- test:shard-1
|
||||
- test:shard-2
|
||||
- test:shard-3
|
||||
- test:shard-4
|
||||
# Only run burn-in on merge requests to main/develop or on schedule
|
||||
rules:
|
||||
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
|
||||
- if: '$CI_PIPELINE_SOURCE == "schedule"'
|
||||
before_script:
|
||||
- npm ci
|
||||
- npx playwright install --with-deps chromium
|
||||
script:
|
||||
- |
|
||||
echo "🔥 Starting burn-in loop - detecting flaky tests"
|
||||
for i in {1..10}; do
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
echo "✅ Burn-in complete - no flaky tests detected"
|
||||
artifacts:
|
||||
when: on_failure
|
||||
paths:
|
||||
- test-results/
|
||||
- playwright-report/
|
||||
expire_in: 30 days
|
||||
timeout: 60 minutes
|
||||
|
||||
# Report stage - Aggregate results
|
||||
report:
|
||||
stage: report
|
||||
image: alpine:latest
|
||||
needs:
|
||||
- test:shard-1
|
||||
- test:shard-2
|
||||
- test:shard-3
|
||||
- test:shard-4
|
||||
- burn-in
|
||||
when: always
|
||||
script:
|
||||
- |
|
||||
echo "## Test Execution Summary"
|
||||
echo ""
|
||||
echo "- Pipeline: $CI_PIPELINE_ID"
|
||||
echo "- Shards: 4"
|
||||
echo "- Branch: $CI_COMMIT_REF_NAME"
|
||||
echo ""
|
||||
echo "View detailed results in job artifacts"
|
||||
@@ -1,43 +1,516 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# CI/CD Enablement v3.0
|
||||
# CI/CD Pipeline Setup
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/ci" name="CI/CD Enablement">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Git repository is initialized.</i>
|
||||
<i>- Local test suite passes.</i>
|
||||
<i>- Team agrees on target environments.</i>
|
||||
<i>- Access to CI platform settings/secrets is available.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Confirm all items above; halt if prerequisites are unmet.</action>
|
||||
</step>
|
||||
<step n="2" title="Configure Pipeline">
|
||||
<action>Detect CI platform (default GitHub Actions; ask about GitLab/CircleCI/etc.).</action>
|
||||
<action>Scaffold workflow (e.g., `.github/workflows/test.yml`) with appropriate triggers and caching (Node version from `.nvmrc`, browsers).</action>
|
||||
<action>Stage jobs sequentially (lint → unit → component → e2e) with matrix parallelization (shard by file, not test).</action>
|
||||
<action>Add selective execution script(s) for affected tests plus burn-in job rerunning changed specs 3x to catch flakiness.</action>
|
||||
<action>Attach artifacts on failure (traces/videos/HAR) and configure retries/backoff/concurrency controls.</action>
|
||||
<action>Document required secrets/environment variables and wire Slack/email notifications; provide local mirror script.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Produce workflow file(s), helper scripts (`test-changed`, burn-in), README/ci.md updates, secrets checklist, and any dashboard/badge configuration.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If git repo is absent, tests fail, or CI platform is unspecified, halt and request setup.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Use `{project-root}/bmad/bmm/testarch/tea-index.csv` to load CI-focused fragments (ci-burn-in, selective-testing, visual-debugging) before finalising recommendations.</i>
|
||||
<i>Target ~20× speedups via parallel shards and caching; keep jobs under 10 minutes.</i>
|
||||
<i>Use `wait-on-timeout` ≈120s for app startup; ensure local `npm test` mirrors CI run.</i>
|
||||
<i>Mention alternative platform paths when not on GitHub.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>CI pipeline configuration and guidance ready for team adoption.</i>
|
||||
</output>
|
||||
</task>
```
|
||||
**Workflow ID**: `bmad/bmm/testarch/ci`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Scaffolds a production-ready CI/CD quality pipeline with test execution, burn-in loops for flaky test detection, parallel sharding, artifact collection, and notification configuration. This workflow creates platform-specific CI configuration optimized for fast feedback and reliable test execution.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Git repository is initialized (`.git/` directory exists)
|
||||
- ✅ Local test suite passes (`npm run test:e2e` succeeds)
|
||||
- ✅ Test framework is configured (from `framework` workflow)
|
||||
- ✅ Team agrees on target CI platform (GitHub Actions, GitLab CI, Circle CI, etc.)
|
||||
- ✅ Access to CI platform settings/secrets available (if updating existing pipeline)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Run Preflight Checks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Verify Git Repository**
|
||||
- Check for `.git/` directory
|
||||
- Confirm remote repository configured (`git remote -v`)
|
||||
- If not initialized, HALT with message: "Git repository required for CI/CD setup"
|
||||
|
||||
2. **Validate Test Framework**
|
||||
- Look for `playwright.config.*` or `cypress.config.*`
|
||||
- Read framework configuration to extract:
|
||||
- Test directory location
|
||||
- Test command
|
||||
- Reporter configuration
|
||||
- Timeout settings
|
||||
- If not found, HALT with message: "Run `framework` workflow first to set up test infrastructure"
|
||||
|
||||
3. **Run Local Tests**
|
||||
- Execute `npm run test:e2e` (or equivalent from package.json)
|
||||
- Ensure tests pass before CI setup
|
||||
- If tests fail, HALT with message: "Fix failing tests before setting up CI/CD"
|
||||
|
||||
4. **Detect CI Platform**
|
||||
- Check for existing CI configuration:
|
||||
- `.github/workflows/*.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- `.circleci/config.yml` (Circle CI)
|
||||
- `Jenkinsfile` (Jenkins)
|
||||
- If found, ask user: "Update existing CI configuration or create new?"
|
||||
- If not found, detect platform from the git remote (see the sketch at the end of this step):
|
||||
- `github.com` → GitHub Actions (default)
|
||||
- `gitlab.com` → GitLab CI
|
||||
- Ask user if unable to auto-detect
|
||||
|
||||
5. **Read Environment Configuration**
|
||||
- Check for `.nvmrc` to determine Node version
|
||||
- Default to Node 20 LTS if not found
|
||||
- Read `package.json` to identify dependencies (affects caching strategy)
|
||||
|
||||
**Halt Condition:** If preflight checks fail, stop immediately and report which requirement failed.
|
||||
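As an illustration of the platform detection in item 4, a minimal sketch (the actual detection logic may use additional heuristics):

```bash
# Hypothetical helper: infer the CI platform from the origin remote URL
REMOTE_URL=$(git remote get-url origin 2>/dev/null)

case "$REMOTE_URL" in
  *github.com*) echo "github-actions" ;;
  *gitlab.com*) echo "gitlab-ci" ;;
  *)            echo "unknown - ask the user to choose a platform" ;;
esac
```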
|
||||
---
|
||||
|
||||
## Step 2: Scaffold CI Pipeline
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Select CI Platform Template**
|
||||
|
||||
Based on detection or user preference, use the appropriate template:
|
||||
|
||||
**GitHub Actions** (`.github/workflows/test.yml`):
|
||||
- Most common platform
|
||||
- Excellent caching and matrix support
|
||||
- Free for public repos, generous free tier for private
|
||||
|
||||
**GitLab CI** (`.gitlab-ci.yml`):
|
||||
- Integrated with GitLab
|
||||
- Built-in registry and runners
|
||||
- Powerful pipeline features
|
||||
|
||||
**Circle CI** (`.circleci/config.yml`):
|
||||
- Fast execution with parallelism
|
||||
- Docker-first approach
|
||||
- Enterprise features
|
||||
|
||||
**Jenkins** (`Jenkinsfile`):
|
||||
- Self-hosted option
|
||||
- Maximum customization
|
||||
- Requires infrastructure management
|
||||
|
||||
2. **Generate Pipeline Configuration**
|
||||
|
||||
Use templates from `{installed_path}/` directory:
|
||||
- `github-actions-template.yml`
|
||||
- `gitlab-ci-template.yaml`
|
||||
|
||||
**Key pipeline stages:**
|
||||
|
||||
```yaml
|
||||
stages:
|
||||
- lint # Code quality checks
|
||||
- test # Test execution (parallel shards)
|
||||
- burn-in # Flaky test detection
|
||||
- report # Aggregate results and publish
|
||||
```
|
||||
|
||||
3. **Configure Test Execution**
|
||||
|
||||
**Parallel Sharding:**
|
||||
|
||||
```yaml
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
shard: [1, 2, 3, 4]
|
||||
|
||||
steps:
|
||||
- name: Run tests
|
||||
run: npm run test:e2e -- --shard=${{ matrix.shard }}/${{ strategy.job-total }}
|
||||
```
|
||||
|
||||
**Purpose:** Splits tests into N parallel jobs for faster execution (target: <10 min per shard)
|
||||
|
||||
4. **Add Burn-In Loop**
|
||||
|
||||
**Critical pattern from production systems:**
|
||||
|
||||
```yaml
|
||||
burn-in:
|
||||
name: Flaky Test Detection
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: '.nvmrc'
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run burn-in loop (10 iterations)
|
||||
run: |
|
||||
for i in {1..10}; do
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
|
||||
- name: Upload failure artifacts
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: burn-in-failures
|
||||
path: test-results/
|
||||
retention-days: 30
|
||||
```
|
||||
|
||||
**Purpose:** Runs tests multiple times to catch non-deterministic failures before they reach main branch.
|
||||
|
||||
**When to run:**
|
||||
- On pull requests to main/develop
|
||||
- Weekly on cron schedule
|
||||
- After significant test infrastructure changes
|
||||
|
||||
5. **Configure Caching**
|
||||
|
||||
**Node modules cache:**
|
||||
|
||||
```yaml
|
||||
- name: Cache dependencies
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.npm
|
||||
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
|
||||
restore-keys: |
|
||||
${{ runner.os }}-node-
|
||||
```
|
||||
|
||||
**Browser binaries cache (Playwright):**
|
||||
|
||||
```yaml
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
```
|
||||
|
||||
**Purpose:** Reduces CI execution time by 2-5 minutes per run.
|
||||
|
||||
6. **Configure Artifact Collection**
|
||||
|
||||
**Failure artifacts only:**
|
||||
|
||||
```yaml
|
||||
- name: Upload test results
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-results-${{ matrix.shard }}
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
```
|
||||
|
||||
**Artifacts to collect:**
|
||||
- Traces (Playwright) - full debugging context
|
||||
- Screenshots - visual evidence of failures
|
||||
- Videos - interaction playback
|
||||
- HTML reports - detailed test results
|
||||
- Console logs - error messages and warnings
|
||||
|
||||
7. **Add Retry Logic**
|
||||
|
||||
```yaml
|
||||
- name: Run tests with retries
|
||||
uses: nick-invision/retry@v2
|
||||
with:
|
||||
timeout_minutes: 30
|
||||
max_attempts: 3
|
||||
retry_on: error
|
||||
command: npm run test:e2e
|
||||
```
|
||||
|
||||
**Purpose:** Handles transient failures (network issues, race conditions)
|
||||
|
||||
8. **Configure Notifications** (Optional)
|
||||
|
||||
If `notify_on_failure` is enabled:
|
||||
|
||||
```yaml
|
||||
- name: Notify on failure
|
||||
if: failure()
|
||||
uses: 8398a7/action-slack@v3
|
||||
with:
|
||||
status: ${{ job.status }}
|
||||
text: 'Test failures detected in PR #${{ github.event.pull_request.number }}'
|
||||
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
|
||||
```
|
||||
|
||||
9. **Generate Helper Scripts**
|
||||
|
||||
**Selective testing script** (`scripts/test-changed.sh`):
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Run only tests for changed files
|
||||
|
||||
CHANGED_FILES=$(git diff --name-only HEAD~1)
|
||||
|
||||
if echo "$CHANGED_FILES" | grep -q "src/.*\.ts$"; then
|
||||
echo "Running affected tests..."
|
||||
npm run test:e2e -- --grep="$(echo "$CHANGED_FILES" | grep 'src/.*\.ts$' | sed 's|^src/||; s|\.ts$||' | paste -sd'|' -)"
|
||||
else
|
||||
echo "No test-affecting changes detected"
|
||||
fi
|
||||
```
|
||||
|
||||
**Local mirror script** (`scripts/ci-local.sh`):
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Mirror CI execution locally for debugging
|
||||
|
||||
echo "🔍 Running CI pipeline locally..."
|
||||
|
||||
# Lint
|
||||
npm run lint || exit 1
|
||||
|
||||
# Tests
|
||||
npm run test:e2e || exit 1
|
||||
|
||||
# Burn-in (reduced iterations)
|
||||
for i in {1..3}; do
|
||||
echo "🔥 Burn-in $i/3"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
|
||||
echo "✅ Local CI pipeline passed"
|
||||
```
|
||||
|
||||
10. **Generate Documentation**
|
||||
|
||||
**CI README** (`docs/ci.md`):
|
||||
- Pipeline stages and purpose
|
||||
- How to run locally
|
||||
- Debugging failed CI runs
|
||||
- Secrets and environment variables needed
|
||||
- Notification setup
|
||||
- Badge URLs for README
|
||||
|
||||
**Secrets checklist** (`docs/ci-secrets-checklist.md`):
|
||||
- Required secrets list (SLACK_WEBHOOK, etc.)
|
||||
- Where to configure in CI platform
|
||||
- Security best practices
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Deliverables
|
||||
|
||||
### Primary Artifacts Created
|
||||
|
||||
1. **CI Configuration File**
|
||||
- `.github/workflows/test.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- `.circleci/config.yml` (Circle CI)
|
||||
|
||||
2. **Pipeline Stages**
|
||||
- **Lint**: Code quality checks (ESLint, Prettier)
|
||||
- **Test**: Parallel test execution (4 shards)
|
||||
- **Burn-in**: Flaky test detection (10 iterations)
|
||||
- **Report**: Result aggregation and publishing
|
||||
|
||||
3. **Helper Scripts**
|
||||
- `scripts/test-changed.sh` - Selective testing
|
||||
- `scripts/ci-local.sh` - Local CI mirror
|
||||
- `scripts/burn-in.sh` - Standalone burn-in execution
|
||||
|
||||
4. **Documentation**
|
||||
- `docs/ci.md` - CI pipeline guide
|
||||
- `docs/ci-secrets-checklist.md` - Required secrets
|
||||
- Inline comments in CI configuration
|
||||
|
||||
5. **Optimization Features**
|
||||
- Dependency caching (npm, browser binaries)
|
||||
- Parallel sharding (4 jobs default)
|
||||
- Retry logic (2 retries on failure)
|
||||
- Failure-only artifact upload
|
||||
|
||||
### Performance Targets
|
||||
|
||||
- **Lint stage**: <2 minutes
|
||||
- **Test stage** (per shard): <10 minutes
|
||||
- **Burn-in stage**: <30 minutes (10 iterations)
|
||||
- **Total pipeline**: <45 minutes
|
||||
|
||||
**Speedup:** 20× faster than sequential execution through parallelism and caching.
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify and load relevant knowledge fragments:
|
||||
|
||||
- `ci-burn-in.md` - Burn-in loop patterns and configuration
|
||||
- `selective-testing.md` - Changed test detection strategies
|
||||
- `visual-debugging.md` - Artifact collection best practices
|
||||
- `test-quality.md` - CI-specific test quality criteria
|
||||
|
||||
### CI Platform-Specific Guidance
|
||||
|
||||
**GitHub Actions:**
|
||||
|
||||
- Use `actions/cache` for caching
|
||||
- Matrix strategy for parallelism
|
||||
- Secrets in repository settings
|
||||
- Free 2000 minutes/month for private repos
|
||||
|
||||
**GitLab CI:**
|
||||
|
||||
- Use `.gitlab-ci.yml` in root
|
||||
- `cache:` directive for caching
|
||||
- Parallel execution with `parallel: 4` (see the sketch below)
|
||||
- Variables in project CI/CD settings
|
||||
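A sketch of the `parallel:` form (the bundled `gitlab-ci-template.yaml` defines four explicit shard jobs instead; `CI_NODE_INDEX` and `CI_NODE_TOTAL` are the variables GitLab injects into parallel jobs):

```yaml
test:
  stage: test
  image: node:20
  parallel: 4
  script:
    - npm ci
    - npx playwright install --with-deps chromium
    - npm run test:e2e -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
```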
|
||||
**Circle CI:**
|
||||
|
||||
- Use `.circleci/config.yml`
|
||||
- Docker executors recommended
|
||||
- Parallelism with `parallelism: 4`
|
||||
- Context for shared secrets
|
||||
|
||||
### Burn-In Loop Strategy
|
||||
|
||||
**When to run:**
|
||||
|
||||
- ✅ On PRs to main/develop branches
|
||||
- ✅ Weekly on schedule (cron)
|
||||
- ✅ After test infrastructure changes
|
||||
- ❌ Not on every commit (too slow)
|
||||
|
||||
**Iterations:**
|
||||
|
||||
- **10 iterations** for thorough detection
|
||||
- **3 iterations** for quick feedback
|
||||
- **100 iterations** for high-confidence stability
|
||||
|
||||
**Failure threshold:**
|
||||
|
||||
- Even ONE failure in burn-in → tests are flaky
|
||||
- Must fix before merging
|
||||
|
||||
### Artifact Retention
|
||||
|
||||
**Failure artifacts only:**
|
||||
|
||||
- Saves storage costs
|
||||
- Maintains debugging capability
|
||||
- 30-day retention default
|
||||
|
||||
**Artifact types:**
|
||||
|
||||
- Traces (Playwright) - 5-10 MB per test
|
||||
- Screenshots - 100-500 KB per screenshot
|
||||
- Videos - 2-5 MB per test
|
||||
- HTML reports - 1-2 MB per run
|
||||
|
||||
### Selective Testing
|
||||
|
||||
**Detect changed files:**
|
||||
|
||||
```bash
|
||||
git diff --name-only HEAD~1
|
||||
```
|
||||
|
||||
**Run affected tests only:**
|
||||
|
||||
- Faster feedback for small changes
|
||||
- Full suite still runs on main branch
|
||||
- Reduces CI time by 50-80% for focused PRs
|
||||
|
||||
**Trade-off:**
|
||||
|
||||
- May miss integration issues
|
||||
- Run full suite at least on merge
|
||||
|
||||
### Local CI Mirror
|
||||
|
||||
**Purpose:** Debug CI failures locally
|
||||
|
||||
**Usage:**
|
||||
|
||||
```bash
|
||||
./scripts/ci-local.sh
|
||||
```
|
||||
|
||||
**Mirrors CI environment:**
|
||||
|
||||
- Same Node version
|
||||
- Same test command
|
||||
- Same stages (lint → test → burn-in)
|
||||
- Reduced burn-in iterations (3 vs 10)
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## CI/CD Pipeline Complete
|
||||
|
||||
**Platform**: GitHub Actions (or GitLab CI, etc.)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Pipeline configuration: .github/workflows/test.yml
|
||||
- ✅ Burn-in loop: 10 iterations for flaky detection
|
||||
- ✅ Parallel sharding: 4 jobs for fast execution
|
||||
- ✅ Caching: Dependencies + browser binaries
|
||||
- ✅ Artifact collection: Failure-only traces/screenshots/videos
|
||||
- ✅ Helper scripts: test-changed.sh, ci-local.sh, burn-in.sh
|
||||
- ✅ Documentation: docs/ci.md, docs/ci-secrets-checklist.md
|
||||
|
||||
**Performance:**
|
||||
|
||||
- Lint: <2 min
|
||||
- Test (per shard): <10 min
|
||||
- Burn-in: <30 min
|
||||
- Total: <45 min (20× speedup vs sequential)
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Commit CI configuration: `git add .github/workflows/test.yml && git commit -m "ci: add test pipeline"`
|
||||
2. Push to remote: `git push`
|
||||
3. Configure required secrets in CI platform settings (see docs/ci-secrets-checklist.md)
|
||||
4. Open a PR to trigger first CI run
|
||||
5. Monitor pipeline execution and adjust parallelism if needed
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Burn-in loop pattern (ci-burn-in.md)
|
||||
- Selective testing strategy (selective-testing.md)
|
||||
- Artifact collection (visual-debugging.md)
|
||||
- Test quality criteria (test-quality.md)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] CI configuration file created and syntactically valid
|
||||
- [ ] Burn-in loop configured (10 iterations)
|
||||
- [ ] Parallel sharding enabled (4 jobs)
|
||||
- [ ] Caching configured (dependencies + browsers)
|
||||
- [ ] Artifact collection on failure only
|
||||
- [ ] Helper scripts created and executable (`chmod +x`)
|
||||
- [ ] Documentation complete (ci.md, secrets checklist)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
@@ -1,25 +1,89 @@
|
||||
# Test Architect workflow: ci
|
||||
name: testarch-ci
|
||||
description: "Scaffold or update the CI/CD quality pipeline."
|
||||
description: "Scaffold CI/CD quality pipeline with test execution, burn-in loops, and artifact collection"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/ci"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
ci_platform: "auto" # auto, github-actions, gitlab-ci, circle-ci, jenkins
|
||||
test_framework: "" # Detected from framework workflow (playwright, cypress)
|
||||
test_dir: "{project-root}/tests"
|
||||
config_file: "" # Framework config file path
|
||||
node_version_source: "{project-root}/.nvmrc" # Node version for CI
|
||||
|
||||
# Execution configuration
|
||||
parallel_jobs: 4 # Number of parallel test shards
|
||||
burn_in_enabled: true # Enable burn-in loop for flaky test detection
|
||||
burn_in_iterations: 10 # Number of burn-in iterations
|
||||
selective_testing_enabled: true # Enable changed test detection
|
||||
|
||||
# Artifact configuration
|
||||
artifact_retention_days: 30
|
||||
upload_artifacts_on: "failure" # failure, always, never
|
||||
artifact_types: "traces,screenshots,videos,html-report" # Comma-separated
|
||||
|
||||
# Performance tuning
|
||||
cache_enabled: true # Enable dependency caching
|
||||
browser_cache_enabled: true # Cache browser binaries
|
||||
timeout_minutes: 60 # Overall job timeout
|
||||
test_timeout_minutes: 30 # Individual test run timeout
|
||||
|
||||
# Notification configuration
|
||||
notify_on_failure: false # Enable notifications (requires setup)
|
||||
notification_channels: "" # slack, email, discord
|
||||
|
||||
# Output artifacts
|
||||
generate_ci_readme: true
|
||||
generate_local_mirror_script: true
|
||||
generate_secrets_checklist: true
|
||||
|
||||
# CI-specific optimizations
|
||||
use_matrix_strategy: true # Parallel execution across OS/browsers
|
||||
use_sharding: true # Split tests into shards
|
||||
retry_failed_tests: true
|
||||
retry_count: 2
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{project-root}/.github/workflows/test.yml" # GitHub Actions default
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read .nvmrc, package.json, framework config
|
||||
- write_file # Create CI config, scripts, documentation
|
||||
- create_directory # Create .github/workflows/ or .gitlab-ci/ directories
|
||||
- list_files # Detect existing CI configuration
|
||||
- search_repo # Find test files for selective testing
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- framework_config: "Framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
- package_json: "Project dependencies and scripts"
|
||||
- nvmrc: ".nvmrc for Node version (optional, defaults to LTS)"
|
||||
- existing_ci: "Existing CI configuration to update (optional)"
|
||||
- git_info: "Git repository information for platform detection"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- ci-cd
|
||||
- test-architect
|
||||
- pipeline
|
||||
- automation
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts, auto-detect when possible
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
340
src/modules/bmm/workflows/testarch/framework/README.md
Normal file
@@ -0,0 +1,340 @@
|
||||
# Test Framework Setup Workflow
|
||||
|
||||
Initializes a production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, configuration, and industry best practices. This workflow scaffolds the complete testing infrastructure for modern web applications, providing a robust foundation for test automation.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *framework
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Starting a new project that needs test infrastructure
|
||||
- Migrating from an older testing approach
|
||||
- Setting up testing from scratch
|
||||
- Standardizing test architecture across teams
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **package.json**: Project dependencies and scripts to detect project type and bundler
|
||||
|
||||
**Optional Context Files:**
|
||||
|
||||
- **Architecture docs** (solution-architecture.md, tech-spec.md): Informs framework configuration decisions
|
||||
- **Existing tests**: Detects current framework to avoid conflicts
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `test_framework`: Auto-detected (playwright/cypress) or manually specified
|
||||
- `project_type`: Auto-detected from package.json (react/vue/angular/next/node)
|
||||
- `bundler`: Auto-detected from package.json (vite/webpack/rollup/esbuild)
|
||||
- `test_dir`: Root test directory (default: `{project-root}/tests`)
|
||||
- `use_typescript`: Prefer TypeScript configuration (default: true)
|
||||
- `framework_preference`: Auto-detection or force specific framework (default: "auto")
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverables:**
|
||||
|
||||
1. **Configuration File**
|
||||
- `playwright.config.ts` or `cypress.config.ts` with production-ready settings
|
||||
- Timeouts: action 15s, navigation 30s, test 60s
|
||||
- Reporters: HTML + JUnit XML
|
||||
- Failure-only artifacts (traces, screenshots, videos)
|
||||
|
||||
2. **Directory Structure**
|
||||
|
||||
```
|
||||
tests/
|
||||
├── e2e/ # Test files (organize as needed)
|
||||
├── support/ # Framework infrastructure (key pattern)
|
||||
│ ├── fixtures/ # Test fixtures with auto-cleanup
|
||||
│ │ ├── index.ts # Fixture merging
|
||||
│ │ └── factories/ # Data factories (faker-based)
|
||||
│ ├── helpers/ # Utility functions
|
||||
│ └── page-objects/ # Page object models (optional)
|
||||
└── README.md # Setup and usage guide
|
||||
```
|
||||
|
||||
**Note**: Test organization (e2e/, api/, integration/, etc.) is flexible. The **support/** folder contains reusable fixtures, helpers, and factories - the core framework pattern.
|
||||
|
||||
3. **Environment Configuration**
|
||||
- `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL`, auth credentials
|
||||
- `.nvmrc` with Node version (LTS)
|
||||
|
||||
4. **Test Infrastructure**
|
||||
- Fixture architecture using `mergeTests` pattern
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Sample tests demonstrating best practices
|
||||
- Helper utilities for common operations
|
||||
|
||||
5. **Documentation**
|
||||
- `tests/README.md` with comprehensive setup instructions
|
||||
- Inline comments explaining configuration choices
|
||||
- References to TEA knowledge base
|
||||
|
||||
**Secondary Deliverables:**
|
||||
|
||||
- Updated `package.json` with minimal test script (`test:e2e`)
|
||||
- Sample test demonstrating fixture usage
|
||||
- Network-first testing patterns
|
||||
- Selector strategy guidance (data-testid)
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- ✅ No existing framework detected (prevents conflicts)
|
||||
- ✅ package.json exists and is valid
|
||||
- ✅ Framework auto-detection successful or explicit choice provided
|
||||
- ✅ Sample test runs successfully
|
||||
- ✅ All generated files are syntactically correct
|
||||
|
||||
## Key Features
|
||||
|
||||
### Smart Framework Selection
|
||||
|
||||
- **Auto-detection logic** based on project characteristics:
|
||||
- **Playwright** recommended for: Large repos (100+ files), performance-critical apps, multi-browser support, complex debugging needs
|
||||
- **Cypress** recommended for: Small teams prioritizing DX, component testing focus, real-time test development
|
||||
- Falls back to Playwright as default if uncertain
|
||||
|
||||
### Production-Ready Patterns
|
||||
|
||||
- **Fixture Architecture**: Pure function → fixture → `mergeTests` composition pattern (sketched after this list)
|
||||
- **Auto-Cleanup**: Fixtures automatically clean up test data in teardown
|
||||
- **Network-First**: Route interception before navigation to prevent race conditions
|
||||
- **Failure-Only Artifacts**: Screenshots/videos/traces only captured on failure to reduce storage
|
||||
- **Parallel Execution**: Configured for optimal CI performance
|
||||
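A minimal sketch of the fixture + factory pattern above, assuming Playwright and `@faker-js/faker`; the `UserFactory` class and `/api/users` endpoints are illustrative, not part of the generated output:

```typescript
import { test as base, mergeTests, expect, APIRequestContext } from '@playwright/test';
import { faker } from '@faker-js/faker';

// Data factory: creates realistic users and remembers them for cleanup
class UserFactory {
  private createdIds: string[] = [];

  async create(request: APIRequestContext) {
    const user = { name: faker.person.fullName(), email: faker.internet.email() };
    const response = await request.post('/api/users', { data: user });
    const { id } = await response.json();
    this.createdIds.push(id);
    return { id, ...user };
  }

  async cleanup(request: APIRequestContext) {
    for (const id of this.createdIds) await request.delete(`/api/users/${id}`);
  }
}

// Fixture: exposes the factory and guarantees teardown even when the test fails
const userTest = base.extend<{ userFactory: UserFactory }>({
  userFactory: async ({ request }, use) => {
    const factory = new UserFactory();
    await use(factory);             // test body runs here
    await factory.cleanup(request); // auto-cleanup in teardown
  },
});

// Compose individual fixture files into a single test object
export const test = mergeTests(userTest /*, authTest, networkTest */);
export { expect };
```

Tests then import `test` from `tests/support/fixtures` and receive a ready-to-use `userFactory`, with cleanup handled automatically in teardown.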
|
||||
### Industry Best Practices
|
||||
|
||||
- **Selector Strategy**: Prescriptive guidance on `data-testid` attributes
|
||||
- **Data Factories**: Faker-based factories for realistic test data
|
||||
- **Contract Testing**: Recommends Pact for microservices architectures
|
||||
- **Error Handling**: Comprehensive timeout and retry configuration
|
||||
- **Reporting**: Multiple reporter formats (HTML, JUnit, console)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
Automatically consults TEA knowledge base:
|
||||
|
||||
- `fixture-architecture.md` - Pure function → fixture → mergeTests pattern
|
||||
- `data-factories.md` - Faker-based factories with auto-cleanup
|
||||
- `network-first.md` - Network interception before navigation
|
||||
- `playwright-config.md` - Playwright-specific best practices
|
||||
- `test-config.md` - General configuration guidelines
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before framework:**
|
||||
|
||||
- **plan-project** (Phase 2): Determines project scope and testing needs
|
||||
- **workflow-status**: Verifies project readiness
|
||||
|
||||
**After framework:**
|
||||
|
||||
- **ci**: Scaffold CI/CD pipeline using framework configuration
|
||||
- **test-design**: Plan test coverage strategy for the project
|
||||
- **atdd**: Generate failing acceptance tests using the framework
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **solution-architecture** (Phase 3): Aligns test structure with system architecture
|
||||
- **tech-spec**: Uses technical specifications to inform test configuration
|
||||
|
||||
**Updates:**
|
||||
|
||||
- `bmm-workflow-status.md`: Adds framework initialization to Quality & Testing Progress section
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Preflight Checks
|
||||
|
||||
**Critical requirements** verified before scaffolding:
|
||||
|
||||
- package.json exists in project root
|
||||
- No modern E2E framework already configured
|
||||
- Architecture/stack context available
|
||||
|
||||
If any check fails, workflow **HALTS** and notifies user.
|
||||
|
||||
### Framework-Specific Guidance
|
||||
|
||||
**Playwright Advantages:**
|
||||
|
||||
- Worker parallelism (significantly faster for large suites)
|
||||
- Trace viewer (powerful debugging with screenshots, network, console logs)
|
||||
- Multi-language support (TypeScript, JavaScript, Python, C#, Java)
|
||||
- Built-in API testing capabilities
|
||||
- Better handling of multiple browser contexts
|
||||
|
||||
**Cypress Advantages:**
|
||||
|
||||
- Superior developer experience (real-time reloading)
|
||||
- Excellent for component testing
|
||||
- Simpler setup for small teams
|
||||
- Better suited for watch mode during development
|
||||
|
||||
**Avoid Cypress when:**
|
||||
|
||||
- API chains are heavy and complex
|
||||
- Multi-tab/window scenarios are common
|
||||
- Worker parallelism is critical for CI performance
|
||||
|
||||
### Selector Strategy
|
||||
|
||||
**Always recommend:**
|
||||
|
||||
- `data-testid` attributes for UI elements (framework-agnostic; see the example below)
|
||||
- `data-cy` attributes if Cypress is chosen (Cypress-specific)
|
||||
- Avoid brittle CSS selectors or XPath
|
||||
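A brief Playwright illustration (route and test IDs are hypothetical):

```typescript
import { test, expect } from '@playwright/test';

test('submits checkout via resilient selectors', async ({ page }) => {
  await page.goto('/checkout');
  // Preferred: data-testid survives styling and layout refactors
  await page.getByTestId('checkout-submit').click();
  await expect(page.getByTestId('order-confirmation')).toBeVisible();
  // Avoid brittle selectors such as: page.locator('div.cart > button:nth-child(3)')
});
```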
|
||||
### Standalone Operation
|
||||
|
||||
This workflow operates independently:
|
||||
|
||||
- **No story required**: Can be run at project initialization
|
||||
- **No epic context needed**: Works for greenfield and brownfield projects
|
||||
- **Autonomous**: Auto-detects configuration and proceeds without user input
|
||||
|
||||
### Output Summary Format
|
||||
|
||||
After completion, provides structured summary:
|
||||
|
||||
```markdown
|
||||
## Framework Scaffold Complete
|
||||
|
||||
**Framework Selected**: Playwright (or Cypress)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Configuration file: playwright.config.ts
|
||||
- ✅ Directory structure: tests/e2e/, tests/support/
|
||||
- ✅ Environment config: .env.example
|
||||
- ✅ Node version: .nvmrc
|
||||
- ✅ Fixture architecture: tests/support/fixtures/
|
||||
- ✅ Data factories: tests/support/fixtures/factories/
|
||||
- ✅ Sample tests: tests/e2e/example.spec.ts
|
||||
- ✅ Documentation: tests/README.md
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Copy .env.example to .env and fill in environment variables
|
||||
2. Run npm install to install test dependencies
|
||||
3. Run npm run test:e2e to execute sample tests
|
||||
4. Review tests/README.md for detailed setup instructions
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture pattern (pure functions + mergeTests)
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Network-first testing safeguards
|
||||
- Failure-only artifact capture
|
||||
```
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
After workflow completion, verify:
|
||||
|
||||
- [ ] Configuration file created and syntactically valid
|
||||
- [ ] Directory structure exists with all folders
|
||||
- [ ] Environment configuration generated (.env.example, .nvmrc)
|
||||
- [ ] Sample tests run successfully (npm run test:e2e)
|
||||
- [ ] Documentation complete and accurate (tests/README.md)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
- [ ] package.json scripts updated correctly
|
||||
- [ ] Fixtures and factories follow patterns from knowledge base
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
## Example Execution
|
||||
|
||||
**Scenario 1: New React + Vite project**
|
||||
|
||||
```bash
|
||||
# User runs framework workflow
|
||||
bmad tea *framework
|
||||
|
||||
# TEA detects:
|
||||
# - React project (from package.json)
|
||||
# - Vite bundler
|
||||
# - No existing test framework
|
||||
# - 150+ files (recommends Playwright)
|
||||
|
||||
# TEA scaffolds:
|
||||
# - playwright.config.ts with Vite detection
|
||||
# - Component testing configuration
|
||||
# - React Testing Library helpers
|
||||
# - Sample component + E2E tests
|
||||
```
|
||||
|
||||
**Scenario 2: Existing Node.js API project**
|
||||
|
||||
```bash
|
||||
# User runs framework workflow
|
||||
bmad tea *framework
|
||||
|
||||
# TEA detects:
|
||||
# - Node.js backend (no frontend framework)
|
||||
# - Express framework
|
||||
# - Small project (50 files)
|
||||
# - API endpoints in routes/
|
||||
|
||||
# TEA scaffolds:
|
||||
# - playwright.config.ts focused on API testing
|
||||
# - tests/api/ directory structure
|
||||
# - API helper utilities
|
||||
# - Sample API tests with auth
|
||||
```
|
||||
|
||||
**Scenario 3: Cypress preferred (explicit)**
|
||||
|
||||
```bash
|
||||
# User sets framework preference
|
||||
# (in workflow config: framework_preference: "cypress")
|
||||
|
||||
bmad tea *framework
|
||||
|
||||
# TEA scaffolds:
|
||||
# - cypress.config.ts
|
||||
# - tests/e2e/ with Cypress patterns
|
||||
# - Cypress-specific commands
|
||||
# - data-cy selector strategy
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue: "Existing test framework detected"**
|
||||
|
||||
- **Cause**: `playwright.config.*` or `cypress.config.*` already exists
|
||||
- **Solution**: Use `upgrade-framework` workflow (TBD) or manually remove existing config
|
||||
|
||||
**Issue: "Cannot detect project type"**
|
||||
|
||||
- **Cause**: package.json missing or malformed
|
||||
- **Solution**: Ensure package.json exists and has valid dependencies
|
||||
|
||||
**Issue: "Sample test fails to run"**
|
||||
|
||||
- **Cause**: Missing dependencies or incorrect BASE_URL
|
||||
- **Solution**: Run `npm install` and configure `.env` with correct URLs
|
||||
|
||||
**Issue: "TypeScript compilation errors"**
|
||||
|
||||
- **Cause**: Missing @types packages or tsconfig misconfiguration
|
||||
- **Solution**: Ensure TypeScript and type definitions are installed
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **ci**: Scaffold CI/CD pipeline → [ci/README.md](../ci/README.md)
|
||||
- **test-design**: Plan test coverage → [test-design/README.md](../test-design/README.md)
|
||||
- **atdd**: Generate acceptance tests → [atdd/README.md](../atdd/README.md)
|
||||
- **automate**: Expand regression suite → [automate/README.md](../automate/README.md)
|
||||
|
||||
## Version History
|
||||
|
||||
- **v4.0 (BMad v6)**: Pure markdown instructions, enhanced workflow.yaml, comprehensive README
|
||||
- **v3.x**: XML format instructions
|
||||
- **v2.x**: Legacy task-based approach
|
||||
321
src/modules/bmm/workflows/testarch/framework/checklist.md
Normal file
@@ -0,0 +1,321 @@
|
||||
# Test Framework Setup - Validation Checklist
|
||||
|
||||
This checklist ensures the framework workflow completes successfully and all deliverables meet quality standards.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting the workflow:
|
||||
|
||||
- [ ] Project root contains valid `package.json`
|
||||
- [ ] No existing modern E2E framework detected (`playwright.config.*`, `cypress.config.*`)
|
||||
- [ ] Project type identifiable (React, Vue, Angular, Next.js, Node, etc.)
|
||||
- [ ] Bundler identifiable (Vite, Webpack, Rollup, esbuild) or not applicable
|
||||
- [ ] User has write permissions to create directories and files
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Preflight Checks
|
||||
|
||||
- [ ] package.json successfully read and parsed
|
||||
- [ ] Project type extracted correctly
|
||||
- [ ] Bundler identified (or marked as N/A for backend projects)
|
||||
- [ ] No framework conflicts detected
|
||||
- [ ] Architecture documents located (if available)
|
||||
|
||||
### Step 2: Framework Selection
|
||||
|
||||
- [ ] Framework auto-detection logic executed
|
||||
- [ ] Framework choice justified (Playwright vs Cypress)
|
||||
- [ ] Framework preference respected (if explicitly set)
|
||||
- [ ] User notified of framework selection and rationale
|
||||
|
||||
### Step 3: Directory Structure
|
||||
|
||||
- [ ] `tests/` root directory created
|
||||
- [ ] `tests/e2e/` directory created (or user's preferred structure)
|
||||
- [ ] `tests/support/` directory created (critical pattern)
|
||||
- [ ] `tests/support/fixtures/` directory created
|
||||
- [ ] `tests/support/fixtures/factories/` directory created
|
||||
- [ ] `tests/support/helpers/` directory created
|
||||
- [ ] `tests/support/page-objects/` directory created (if applicable)
|
||||
- [ ] All directories have correct permissions
|
||||
|
||||
**Note**: Test organization is flexible (e2e/, api/, integration/). The **support/** folder is the key pattern.
|
||||
|
||||
### Step 4: Configuration Files
|
||||
|
||||
- [ ] Framework config file created (`playwright.config.ts` or `cypress.config.ts`)
|
||||
- [ ] Config file uses TypeScript (if `use_typescript: true`)
|
||||
- [ ] Timeouts configured correctly (action: 15s, navigation: 30s, test: 60s)
|
||||
- [ ] Base URL configured with environment variable fallback
|
||||
- [ ] Trace/screenshot/video set to retain-on-failure
|
||||
- [ ] Multiple reporters configured (HTML + JUnit + console)
|
||||
- [ ] Parallel execution enabled
|
||||
- [ ] CI-specific settings configured (retries, workers)
|
||||
- [ ] Config file is syntactically valid (no compilation errors)
|
||||
|
||||
### Step 5: Environment Configuration
|
||||
|
||||
- [ ] `.env.example` created in project root
|
||||
- [ ] `TEST_ENV` variable defined
|
||||
- [ ] `BASE_URL` variable defined with default
|
||||
- [ ] `API_URL` variable defined (if applicable)
|
||||
- [ ] Authentication variables defined (if applicable)
|
||||
- [ ] Feature flag variables defined (if applicable)
|
||||
- [ ] `.nvmrc` created with appropriate Node version
|
||||
|
||||
### Step 6: Fixture Architecture
|
||||
|
||||
- [ ] `tests/support/fixtures/index.ts` created
|
||||
- [ ] Base fixture extended from Playwright/Cypress
|
||||
- [ ] Type definitions for fixtures created
|
||||
- [ ] mergeTests pattern implemented (if multiple fixtures)
|
||||
- [ ] Auto-cleanup logic included in fixtures
|
||||
- [ ] Fixture architecture follows knowledge base patterns
|
||||
|
||||
### Step 7: Data Factories
|
||||
|
||||
- [ ] At least one factory created (e.g., UserFactory)
|
||||
- [ ] Factories use @faker-js/faker for realistic data
|
||||
- [ ] Factories track created entities (for cleanup)
|
||||
- [ ] Factories implement `cleanup()` method
|
||||
- [ ] Factories integrate with fixtures
|
||||
- [ ] Factories follow knowledge base patterns
|
||||
|
||||
### Step 8: Sample Tests
|
||||
|
||||
- [ ] Example test file created (`tests/e2e/example.spec.ts`)
|
||||
- [ ] Test uses fixture architecture
|
||||
- [ ] Test demonstrates data factory usage
|
||||
- [ ] Test uses proper selector strategy (data-testid)
|
||||
- [ ] Test follows Given-When-Then structure
|
||||
- [ ] Test includes proper assertions
|
||||
- [ ] Network interception demonstrated (if applicable)
|
||||
|
||||
### Step 9: Helper Utilities
|
||||
|
||||
- [ ] API helper created (if API testing needed)
|
||||
- [ ] Network helper created (if network mocking needed)
|
||||
- [ ] Auth helper created (if authentication needed)
|
||||
- [ ] Helpers follow functional patterns
|
||||
- [ ] Helpers have proper error handling
|
||||
|
||||
### Step 10: Documentation
|
||||
|
||||
- [ ] `tests/README.md` created
|
||||
- [ ] Setup instructions included
|
||||
- [ ] Running tests section included
|
||||
- [ ] Architecture overview section included
|
||||
- [ ] Best practices section included
|
||||
- [ ] CI integration section included
|
||||
- [ ] Knowledge base references included
|
||||
- [ ] Troubleshooting section included
|
||||
|
||||
### Step 11: Package.json Updates
|
||||
|
||||
- [ ] Minimal test script added to package.json: `test:e2e`
|
||||
- [ ] Test framework dependency added (if not already present)
|
||||
- [ ] Type definitions added (if TypeScript)
|
||||
- [ ] Users can extend with additional scripts as needed
|
||||
|
||||
---
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Configuration Validation
|
||||
|
||||
- [ ] Config file loads without errors
|
||||
- [ ] Config file passes linting (if linter configured)
|
||||
- [ ] Config file uses correct syntax for chosen framework
|
||||
- [ ] All paths in config resolve correctly
|
||||
- [ ] Reporter output directories exist or are created on test run
|
||||
|
||||
### Test Execution Validation
|
||||
|
||||
- [ ] Sample test runs successfully
|
||||
- [ ] Test execution produces expected output (pass/fail)
|
||||
- [ ] Test artifacts generated correctly (traces, screenshots, videos)
|
||||
- [ ] Test report generated successfully
|
||||
- [ ] No console errors or warnings during test run
|
||||
|
||||
### Directory Structure Validation
|
||||
|
||||
- [ ] All required directories exist
|
||||
- [ ] Directory structure matches framework conventions
|
||||
- [ ] No duplicate or conflicting directories
|
||||
- [ ] Directories accessible with correct permissions
|
||||
|
||||
### File Integrity Validation
|
||||
|
||||
- [ ] All generated files are syntactically correct
|
||||
- [ ] No placeholder text left in files (e.g., "TODO", "FIXME")
|
||||
- [ ] All imports resolve correctly
|
||||
- [ ] No hardcoded credentials or secrets in files
|
||||
- [ ] All file paths use correct separators for OS
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] Generated code follows project coding standards
|
||||
- [ ] TypeScript types are complete and accurate (no `any` unless necessary)
|
||||
- [ ] No unused imports or variables
|
||||
- [ ] Consistent code formatting (matches project style)
|
||||
- [ ] No linting errors in generated files
|
||||
|
||||
### Best Practices Compliance
|
||||
|
||||
- [ ] Fixture architecture follows pure function → fixture → mergeTests pattern
|
||||
- [ ] Data factories implement auto-cleanup
|
||||
- [ ] Network interception occurs before navigation
|
||||
- [ ] Selectors use data-testid strategy
|
||||
- [ ] Artifacts only captured on failure
|
||||
- [ ] Tests follow Given-When-Then structure
|
||||
- [ ] No hard-coded waits or sleeps
|
||||
|
||||
### Knowledge Base Alignment
|
||||
|
||||
- [ ] Fixture pattern matches `fixture-architecture.md`
|
||||
- [ ] Data factories match `data-factories.md`
|
||||
- [ ] Network handling matches `network-first.md`
|
||||
- [ ] Config follows `playwright-config.md` or `test-config.md`
|
||||
- [ ] Test quality matches `test-quality.md`
|
||||
|
||||
### Security Checks
|
||||
|
||||
- [ ] No credentials in configuration files
|
||||
- [ ] .env.example contains placeholders, not real values
|
||||
- [ ] Sensitive test data handled securely
|
||||
- [ ] API keys and tokens use environment variables
|
||||
- [ ] No secrets committed to version control
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] `bmm-workflow-status.md` exists
|
||||
- [ ] Framework initialization logged in Quality & Testing Progress section
|
||||
- [ ] Status file updated with completion timestamp
|
||||
- [ ] Status file shows framework: Playwright or Cypress
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Relevant knowledge fragments identified from tea-index.csv
|
||||
- [ ] Knowledge fragments successfully loaded
|
||||
- [ ] Patterns from knowledge base applied correctly
|
||||
- [ ] Knowledge base references included in documentation
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] Can proceed to `ci` workflow after completion
|
||||
- [ ] Can proceed to `test-design` workflow after completion
|
||||
- [ ] Can proceed to `atdd` workflow after completion
|
||||
- [ ] Framework setup compatible with downstream workflows
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All of the following must be true:**
|
||||
|
||||
- [ ] All prerequisite checks passed
|
||||
- [ ] All process steps completed without errors
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] Sample test executes successfully
|
||||
- [ ] User can run `npm run test:e2e` without errors
|
||||
- [ ] Documentation is complete and accurate
|
||||
- [ ] No critical issues or blockers identified
|
||||
|
||||
---
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Copy `.env.example` to `.env`
|
||||
2. [ ] Fill in environment-specific values in `.env`
|
||||
3. [ ] Run `npm install` to install test dependencies
|
||||
4. [ ] Run `npm run test:e2e` to verify setup
|
||||
5. [ ] Review `tests/README.md` for project-specific guidance
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `ci` workflow to set up CI/CD pipeline
|
||||
2. [ ] Run `test-design` workflow to plan test coverage
|
||||
3. [ ] Run `atdd` workflow when ready to develop stories
|
||||
|
||||
---
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails and needs to be rolled back:
|
||||
|
||||
1. [ ] Delete `tests/` directory
|
||||
2. [ ] Remove test scripts from package.json
|
||||
3. [ ] Delete `.env.example` (if created)
|
||||
4. [ ] Delete `.nvmrc` (if created)
|
||||
5. [ ] Delete framework config file
|
||||
6. [ ] Remove test dependencies from package.json (if added)
|
||||
7. [ ] Run `npm install` to clean up node_modules
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: Config file has TypeScript errors
|
||||
|
||||
- **Solution**: Ensure `@playwright/test` or `cypress` types are installed
|
||||
|
||||
**Issue**: Sample test fails to run
|
||||
|
||||
- **Solution**: Check BASE_URL in .env, ensure app is running
|
||||
|
||||
**Issue**: Fixture cleanup not working
|
||||
|
||||
- **Solution**: Verify cleanup() is called in fixture teardown
|
||||
|
||||
**Issue**: Network interception not working
|
||||
|
||||
- **Solution**: Ensure route setup occurs before page.goto()
|
||||
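As a minimal Playwright sketch of this fix (the `/api/users` endpoint and `user-row` test id are illustrative, not part of the scaffold):

```typescript
import { test, expect } from '@playwright/test';

test('stubs the users API before navigation', async ({ page }) => {
  // Register the route BEFORE page.goto() so the very first request is intercepted
  await page.route('**/api/users', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([{ id: 1, name: 'Stub User' }]),
    }),
  );

  await page.goto('/users');
  await expect(page.locator('[data-testid="user-row"]')).toHaveCount(1);
});
```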
|
||||
### Framework-Specific Considerations
|
||||
|
||||
**Playwright:**
|
||||
|
||||
- Requires Node.js 18+
|
||||
- Browser binaries installed via `npx playwright install` (run once after installing dependencies)
|
||||
- Trace viewer requires running `npx playwright show-trace`
|
||||
|
||||
**Cypress:**
|
||||
|
||||
- Requires Node.js 18+
|
||||
- Cypress app opens on first run
|
||||
- Component testing requires additional setup
|
||||
|
||||
### Version Compatibility
|
||||
|
||||
- [ ] Node.js version matches .nvmrc
|
||||
- [ ] Framework version compatible with Node.js version
|
||||
- [ ] TypeScript version compatible with framework
|
||||
- [ ] All peer dependencies satisfied
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items checked and validated.
|
||||
|
||||
**Completed by:** ______________________
|
||||
**Date:** ______________________
|
||||
**Framework:** ______________________ (Playwright / Cypress)
|
||||
**Notes:** ________________________________________
|
||||
@@ -1,43 +1,455 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Test Framework Setup v3.0
|
||||
# Test Framework Setup
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/framework" name="Test Framework Setup">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Confirm `package.json` exists.</i>
|
||||
<i>- Verify no modern E2E harness is already configured.</i>
|
||||
<i>- Have architectural/stack context available.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Run Preflight Checks">
|
||||
<action>Validate each preflight requirement; stop immediately if any fail.</action>
|
||||
</step>
|
||||
<step n="2" title="Scaffold Framework">
|
||||
<action>Identify framework stack from `package.json` (React/Vue/Angular/Next.js) and bundler (Vite/Webpack/Rollup/esbuild).</action>
|
||||
<action>Select Playwright for large/perf-critical repos, Cypress for small DX-first teams.</action>
|
||||
<action>Create folders `{framework}/tests/`, `{framework}/support/fixtures/`, `{framework}/support/helpers/`.</action>
|
||||
<action>Configure timeouts (action 15s, navigation 30s, test 60s) and reporters (HTML + JUnit).</action>
|
||||
<action>Generate `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL` plus `.nvmrc`.</action>
|
||||
<action>Implement pure function → fixture → `mergeTests` pattern and faker-based data factories.</action>
|
||||
<action>Enable failure-only screenshots/videos and document setup in README.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Produce Playwright/Cypress scaffold (config + support tree), `.env.example`, `.nvmrc`, seed tests, and README instructions.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If prerequisites fail or an existing harness is detected, halt and notify the user.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify and load the `knowledge/` fragments relevant to this task (fixtures, network, config).</i>
|
||||
<i>Playwright: take advantage of worker parallelism, trace viewer, multi-language support.</i>
|
||||
<i>Cypress: avoid when dependent API chains are heavy; consider component testing (Vitest/Cypress CT).</i>
|
||||
<i>Contract testing: suggest Pact for microservices; always recommend data-cy/data-testid selectors.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>Scaffolded framework assets and summary of what was created.</i>
|
||||
</output>
|
||||
</task>
```
|
||||
**Workflow ID**: `bmad/bmm/testarch/framework`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Initialize a production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, configuration, and best practices. This workflow scaffolds the complete testing infrastructure for modern web applications.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ `package.json` exists in project root
|
||||
- ✅ No modern E2E test harness is already configured (check for existing `playwright.config.*` or `cypress.config.*`)
|
||||
- ✅ Architectural/stack context available (project type, bundler, dependencies)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Run Preflight Checks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Validate package.json**
|
||||
- Read `{project-root}/package.json`
|
||||
- Extract project type (React, Vue, Angular, Next.js, Node, etc.)
|
||||
- Identify bundler (Vite, Webpack, Rollup, esbuild)
|
||||
- Note existing test dependencies
|
||||
|
||||
2. **Check for Existing Framework**
|
||||
- Search for `playwright.config.*`, `cypress.config.*`, `cypress.json`
|
||||
- Check `package.json` for `@playwright/test` or `cypress` dependencies
|
||||
- If found, HALT with message: "Existing test framework detected. Use workflow `upgrade-framework` instead." (a detection sketch follows this list)
|
||||
|
||||
3. **Gather Context**
|
||||
- Look for architecture documents (`solution-architecture.md`, `tech-spec*.md`)
|
||||
- Check for API documentation or endpoint lists
|
||||
- Identify authentication requirements
|
||||
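A minimal sketch of the detection logic described above, as it might run in a Node helper script. The function name is illustrative and not part of the workflow itself:

```typescript
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Returns the framework already configured in the repo, or null if none is found
function detectExistingHarness(projectRoot: string): 'playwright' | 'cypress' | null {
  const pkg = JSON.parse(readFileSync(join(projectRoot, 'package.json'), 'utf-8'));
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };

  const hasPlaywrightConfig = ['playwright.config.ts', 'playwright.config.js'].some((f) =>
    existsSync(join(projectRoot, f)),
  );
  const hasCypressConfig = ['cypress.config.ts', 'cypress.config.js', 'cypress.json'].some((f) =>
    existsSync(join(projectRoot, f)),
  );

  if (hasPlaywrightConfig || deps['@playwright/test']) return 'playwright';
  if (hasCypressConfig || deps['cypress']) return 'cypress';
  return null;
}
```

If this returns a framework, the workflow halts and points the user at `upgrade-framework`.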
|
||||
**Halt Condition:** If preflight checks fail, stop immediately and report which requirement failed.
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Scaffold Framework
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Framework Selection**
|
||||
|
||||
**Default Logic:**
|
||||
- **Playwright** (recommended for):
|
||||
- Large repositories (100+ files)
|
||||
- Performance-critical applications
|
||||
- Multi-browser support needed
|
||||
- Complex user flows requiring video/trace debugging
|
||||
- Projects requiring worker parallelism
|
||||
|
||||
- **Cypress** (recommended for):
|
||||
- Small teams prioritizing developer experience
|
||||
- Component testing focus
|
||||
- Real-time reloading during test development
|
||||
- Simpler setup requirements
|
||||
|
||||
**Detection Strategy:**
|
||||
- Check `package.json` for existing preference
|
||||
- Consider `project_size` variable from workflow config
|
||||
- Use `framework_preference` variable if set
|
||||
- Default to **Playwright** if uncertain
|
||||
|
||||
2. **Create Directory Structure**
|
||||
|
||||
```
|
||||
{project-root}/
|
||||
├── tests/ # Root test directory
|
||||
│ ├── e2e/ # Test files (users organize as needed)
|
||||
│ ├── support/ # Framework infrastructure (key pattern)
|
||||
│ │ ├── fixtures/ # Test fixtures (data, mocks)
|
||||
│ │ ├── helpers/ # Utility functions
|
||||
│ │ └── page-objects/ # Page object models (optional)
|
||||
│ └── README.md # Test suite documentation
|
||||
```
|
||||
|
||||
**Note**: Users organize test files (e2e/, api/, integration/, component/) as needed. The **support/** folder is the critical pattern for fixtures and helpers used across tests.
|
||||
|
||||
3. **Generate Configuration File**
|
||||
|
||||
**For Playwright** (`playwright.config.ts` or `playwright.config.js`):
|
||||
|
||||
```typescript
|
||||
import { defineConfig, devices } from '@playwright/test';
|
||||
|
||||
export default defineConfig({
|
||||
testDir: './tests/e2e',
|
||||
fullyParallel: true,
|
||||
forbidOnly: !!process.env.CI,
|
||||
retries: process.env.CI ? 2 : 0,
|
||||
workers: process.env.CI ? 1 : undefined,
|
||||
|
||||
timeout: 60 * 1000, // Test timeout: 60s
|
||||
expect: {
|
||||
timeout: 15 * 1000, // Assertion timeout: 15s
|
||||
},
|
||||
|
||||
use: {
|
||||
baseURL: process.env.BASE_URL || 'http://localhost:3000',
|
||||
trace: 'retain-on-failure',
|
||||
screenshot: 'only-on-failure',
|
||||
video: 'retain-on-failure',
|
||||
actionTimeout: 15 * 1000, // Action timeout: 15s
|
||||
navigationTimeout: 30 * 1000, // Navigation timeout: 30s
|
||||
},
|
||||
|
||||
reporter: [['html', { outputFolder: 'test-results/html' }], ['junit', { outputFile: 'test-results/junit.xml' }], ['list']],
|
||||
|
||||
projects: [
|
||||
{ name: 'chromium', use: { ...devices['Desktop Chrome'] } },
|
||||
{ name: 'firefox', use: { ...devices['Desktop Firefox'] } },
|
||||
{ name: 'webkit', use: { ...devices['Desktop Safari'] } },
|
||||
],
|
||||
});
|
||||
```
|
||||
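Note: Playwright does not read `.env` files on its own. If the config above relies on `process.env.BASE_URL`, a common approach (assuming the `dotenv` package is added as a dev dependency) is to load the file at the top of the config:

```typescript
// playwright.config.ts (top of file) - load .env so process.env.BASE_URL is populated
import dotenv from 'dotenv';

dotenv.config();
```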
|
||||
**For Cypress** (`cypress.config.ts` or `cypress.config.js`):
|
||||
|
||||
```typescript
|
||||
import { defineConfig } from 'cypress';
|
||||
|
||||
export default defineConfig({
|
||||
e2e: {
|
||||
baseUrl: process.env.BASE_URL || 'http://localhost:3000',
|
||||
specPattern: 'tests/e2e/**/*.cy.{js,jsx,ts,tsx}',
|
||||
supportFile: 'tests/support/e2e.ts',
|
||||
video: false,
|
||||
screenshotOnRunFailure: true,
|
||||
|
||||
setupNodeEvents(on, config) {
|
||||
// implement node event listeners here
|
||||
},
|
||||
},
|
||||
|
||||
retries: {
|
||||
runMode: 2,
|
||||
openMode: 0,
|
||||
},
|
||||
|
||||
defaultCommandTimeout: 15000,
|
||||
requestTimeout: 30000,
|
||||
responseTimeout: 30000,
|
||||
pageLoadTimeout: 60000,
|
||||
});
|
||||
```
|
||||
|
||||
4. **Generate Environment Configuration**
|
||||
|
||||
Create `.env.example`:
|
||||
|
||||
```bash
|
||||
# Test Environment Configuration
|
||||
TEST_ENV=local
|
||||
BASE_URL=http://localhost:3000
|
||||
API_URL=http://localhost:3001/api
|
||||
|
||||
# Authentication (if applicable)
|
||||
TEST_USER_EMAIL=test@example.com
|
||||
TEST_USER_PASSWORD=
|
||||
|
||||
# Feature Flags (if applicable)
|
||||
FEATURE_FLAG_NEW_UI=true
|
||||
|
||||
# API Keys (if applicable)
|
||||
TEST_API_KEY=
|
||||
```
|
||||
|
||||
5. **Generate Node Version File**
|
||||
|
||||
Create `.nvmrc`:
|
||||
|
||||
```
|
||||
20.11.0
|
||||
```
|
||||
|
||||
(Use Node version from existing `.nvmrc` or default to current LTS)
|
||||
|
||||
6. **Implement Fixture Architecture**
|
||||
|
||||
**Knowledge Base Reference**: `testarch/knowledge/fixture-architecture.md`
|
||||
|
||||
Create `tests/support/fixtures/index.ts`:
|
||||
|
||||
```typescript
|
||||
import { test as base } from '@playwright/test';
|
||||
import { UserFactory } from './factories/user-factory';
|
||||
|
||||
type TestFixtures = {
|
||||
userFactory: UserFactory;
|
||||
};
|
||||
|
||||
export const test = base.extend<TestFixtures>({
|
||||
userFactory: async ({}, use) => {
|
||||
const factory = new UserFactory();
|
||||
await use(factory);
|
||||
await factory.cleanup(); // Auto-cleanup
|
||||
},
|
||||
});
|
||||
|
||||
export { expect } from '@playwright/test';
|
||||
```
|
||||
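If the suite later grows to several fixture modules, they can be composed with `mergeTests` instead of chaining `extend` calls. A minimal sketch, building on the fixture above (the `user-fixture` and `api-fixture` module names are illustrative):

```typescript
import { mergeTests } from '@playwright/test';
import { test as userTest } from './user-fixture';
import { test as apiTest } from './api-fixture';

// Compose independent fixture modules into the single `test` that specs import
export const test = mergeTests(userTest, apiTest);
export { expect } from '@playwright/test';
```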
|
||||
7. **Implement Data Factories**
|
||||
|
||||
**Knowledge Base Reference**: `testarch/knowledge/data-factories.md`
|
||||
|
||||
Create `tests/support/fixtures/factories/user-factory.ts`:
|
||||
|
||||
```typescript
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export class UserFactory {
|
||||
private createdUsers: string[] = [];
|
||||
|
||||
async createUser(overrides = {}) {
|
||||
const user = {
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
password: faker.internet.password({ length: 12 }),
|
||||
...overrides,
|
||||
};
|
||||
|
||||
// API call to create user
|
||||
const response = await fetch(`${process.env.API_URL}/users`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify(user),
|
||||
});
|
||||
|
||||
const created = await response.json();
|
||||
this.createdUsers.push(created.id);
|
||||
return created;
|
||||
}
|
||||
|
||||
async cleanup() {
|
||||
// Delete all created users
|
||||
for (const userId of this.createdUsers) {
|
||||
await fetch(`${process.env.API_URL}/users/${userId}`, {
|
||||
method: 'DELETE',
|
||||
});
|
||||
}
|
||||
this.createdUsers = [];
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
8. **Generate Sample Tests**
|
||||
|
||||
Create `tests/e2e/example.spec.ts`:
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '../support/fixtures';
|
||||
|
||||
test.describe('Example Test Suite', () => {
|
||||
test('should load homepage', async ({ page }) => {
|
||||
await page.goto('/');
|
||||
await expect(page).toHaveTitle(/Home/i);
|
||||
});
|
||||
|
||||
test('should create user and login', async ({ page, userFactory }) => {
|
||||
// Create test user
|
||||
const user = await userFactory.createUser();
|
||||
|
||||
// Login
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email-input"]', user.email);
|
||||
await page.fill('[data-testid="password-input"]', user.password);
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// Assert login success
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
9. **Update package.json Scripts**
|
||||
|
||||
Add minimal test script to `package.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"test:e2e": "playwright test"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Note**: Users can add additional scripts as needed (e.g., `--ui`, `--headed`, `--debug`, `show-report`).
|
||||
|
||||
10. **Generate Documentation**
|
||||
|
||||
Create `tests/README.md` with setup instructions (see Step 3 deliverables).
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Deliverables
|
||||
|
||||
### Primary Artifacts Created
|
||||
|
||||
1. **Configuration File**
|
||||
- `playwright.config.ts` or `cypress.config.ts`
|
||||
- Timeouts: action 15s, navigation 30s, test 60s
|
||||
- Reporters: HTML + JUnit XML
|
||||
|
||||
2. **Directory Structure**
|
||||
- `tests/` with `e2e/`, `api/`, `support/` subdirectories
|
||||
- `support/fixtures/` for test fixtures
|
||||
- `support/helpers/` for utility functions (see the helper sketch after this list)
|
||||
|
||||
3. **Environment Configuration**
|
||||
- `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL`
|
||||
- `.nvmrc` with Node version
|
||||
|
||||
4. **Test Infrastructure**
|
||||
- Fixture architecture (`mergeTests` pattern)
|
||||
- Data factories (faker-based, with auto-cleanup)
|
||||
- Sample tests demonstrating patterns
|
||||
|
||||
5. **Documentation**
|
||||
- `tests/README.md` with setup instructions
|
||||
- Comments in config files explaining options
|
||||
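As an illustration of the `support/helpers/` deliverable, a minimal functional API helper might look like the sketch below. The `API_URL` usage mirrors the data factory above; the helper name and options shape are assumptions, not a fixed interface:

```typescript
// tests/support/helpers/api.ts - thin fetch wrapper with basic error handling
export async function apiRequest<T>(
  path: string,
  options: { method?: string; body?: unknown; headers?: Record<string, string> } = {},
): Promise<T> {
  const response = await fetch(`${process.env.API_URL}${path}`, {
    method: options.method ?? 'GET',
    headers: { 'Content-Type': 'application/json', ...options.headers },
    body: options.body !== undefined ? JSON.stringify(options.body) : undefined,
  });

  if (!response.ok) {
    throw new Error(`API request failed: ${options.method ?? 'GET'} ${path} (${response.status})`);
  }
  return (await response.json()) as T;
}
```

A factory or spec can then call `apiRequest('/users', { method: 'POST', body: user })` instead of repeating fetch boilerplate.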
|
||||
### README Contents
|
||||
|
||||
The generated `tests/README.md` should include:
|
||||
|
||||
- **Setup Instructions**: How to install dependencies, configure environment
|
||||
- **Running Tests**: Commands for local execution, headed mode, debug mode
|
||||
- **Architecture Overview**: Fixture pattern, data factories, page objects
|
||||
- **Best Practices**: Selector strategy (data-testid), test isolation, cleanup
|
||||
- **CI Integration**: How tests run in CI/CD pipeline
|
||||
- **Knowledge Base References**: Links to relevant TEA knowledge fragments
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify and load relevant knowledge fragments:
|
||||
|
||||
- `fixture-architecture.md` - Pure function → fixture → `mergeTests` pattern
|
||||
- `data-factories.md` - Faker-based factories with auto-cleanup
|
||||
- `network-first.md` - Network-first testing safeguards
|
||||
- `playwright-config.md` - Playwright-specific configuration best practices
|
||||
- `test-config.md` - General configuration guidelines
|
||||
|
||||
### Framework-Specific Guidance
|
||||
|
||||
**Playwright Advantages:**
|
||||
|
||||
- Worker parallelism (significantly faster for large suites)
|
||||
- Trace viewer (powerful debugging with screenshots, network, console)
|
||||
- Multi-language support (TypeScript, JavaScript, Python, C#, Java)
|
||||
- Built-in API testing capabilities
|
||||
- Better handling of multiple browser contexts
|
||||
|
||||
**Cypress Advantages:**
|
||||
|
||||
- Superior developer experience (real-time reloading)
|
||||
- Excellent for component testing (Cypress CT or use Vitest)
|
||||
- Simpler setup for small teams
|
||||
- Better suited for watch mode during development
|
||||
|
||||
**Avoid Cypress when:**
|
||||
|
||||
- API chains are heavy and complex
|
||||
- Multi-tab/window scenarios are common
|
||||
- Worker parallelism is critical for CI performance
|
||||
|
||||
### Selector Strategy
|
||||
|
||||
**Always recommend**:
|
||||
|
||||
- `data-testid` attributes for UI elements
|
||||
- `data-cy` attributes if Cypress is chosen
|
||||
- Avoid brittle CSS selectors or XPath
|
||||
|
||||
### Contract Testing
|
||||
|
||||
For microservices architectures, **recommend Pact** for consumer-driven contract testing alongside E2E tests.
|
||||
|
||||
### Failure Artifacts
|
||||
|
||||
Configure **failure-only** capture:
|
||||
|
||||
- Screenshots: only on failure
|
||||
- Videos: retain on failure (delete on success)
|
||||
- Traces: retain on failure (Playwright)
|
||||
|
||||
This reduces storage overhead while maintaining debugging capability.
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## Framework Scaffold Complete
|
||||
|
||||
**Framework Selected**: Playwright (or Cypress)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Configuration file: `playwright.config.ts`
|
||||
- ✅ Directory structure: `tests/e2e/`, `tests/support/`
|
||||
- ✅ Environment config: `.env.example`
|
||||
- ✅ Node version: `.nvmrc`
|
||||
- ✅ Fixture architecture: `tests/support/fixtures/`
|
||||
- ✅ Data factories: `tests/support/fixtures/factories/`
|
||||
- ✅ Sample tests: `tests/e2e/example.spec.ts`
|
||||
- ✅ Documentation: `tests/README.md`
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Copy `.env.example` to `.env` and fill in environment variables
|
||||
2. Run `npm install` to install test dependencies
|
||||
3. Run `npm run test:e2e` to execute sample tests
|
||||
4. Review `tests/README.md` for detailed setup instructions
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture pattern (pure functions + mergeTests)
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Network-first testing safeguards
|
||||
- Failure-only artifact capture
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Configuration file created and valid
|
||||
- [ ] Directory structure exists
|
||||
- [ ] Environment configuration generated
|
||||
- [ ] Sample tests run successfully
|
||||
- [ ] Documentation complete and accurate
|
||||
- [ ] No errors or warnings during scaffold
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
@@ -1,25 +1,67 @@
|
||||
# Test Architect workflow: framework
|
||||
name: testarch-framework
|
||||
description: "Initialize or refresh the test framework harness."
|
||||
description: "Initialize production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, and configuration"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/framework"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
test_framework: "" # playwright or cypress - auto-detect from package.json or ask
|
||||
project_type: "" # react, vue, angular, next, node - detected from package.json
|
||||
bundler: "" # vite, webpack, rollup, esbuild - detected from package.json
|
||||
test_dir: "{project-root}/tests" # Root test directory
|
||||
config_file: "" # Will be set to {project-root}/{framework}.config.{ts|js}
|
||||
use_typescript: true # Prefer TypeScript configuration
|
||||
standalone_mode: true # Can run without story context
|
||||
|
||||
# Framework selection criteria
|
||||
framework_preference: "auto" # auto, playwright, cypress
|
||||
project_size: "auto" # auto, small, large - influences framework choice
|
||||
|
||||
# Output artifacts
|
||||
generate_env_example: true
|
||||
generate_nvmrc: true
|
||||
generate_readme: true
|
||||
generate_sample_tests: true
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{test_dir}/README.md" # Main deliverable is test setup README
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read package.json, existing configs
|
||||
- write_file # Create config files, helpers, fixtures, tests
|
||||
- create_directory # Create test directory structure
|
||||
- list_files # Check for existing framework
|
||||
- search_repo # Find architecture docs
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- package_json: "package.json with project dependencies and scripts"
|
||||
- architecture_docs: "Architecture or tech stack documentation (optional)"
|
||||
- existing_tests: "Existing test files to detect current framework (optional)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- setup
|
||||
- test-architect
|
||||
- framework
|
||||
- initialization
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts; auto-detect when possible
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
src/modules/bmm/workflows/testarch/gate/README.md (new file, 493 lines)
@@ -0,0 +1,493 @@
|
||||
# Quality Gate Decision Workflow
|
||||
|
||||
The Quality Gate workflow makes deterministic release decisions (PASS/CONCERNS/FAIL/WAIVED) based on comprehensive quality evidence including test results, risk assessment, traceability, and non-functional requirements validation.
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow is the final checkpoint before deploying a story, epic, or release to production. It evaluates all quality evidence against predefined criteria and makes a transparent, rule-based decision with complete audit trail.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **Deterministic Decision Rules**: Clear, objective criteria eliminate bias
|
||||
- **Four Decision States**: PASS (ready), CONCERNS (deploy with monitoring), FAIL (blocked), WAIVED (business override)
|
||||
- **P0-P3 Risk Framework**: Prioritized evaluation of critical vs nice-to-have features
|
||||
- **Evidence-Based**: Never guess - requires test results, coverage, NFR validation
|
||||
- **Waiver Management**: Business-approved exceptions with remediation plans
|
||||
- **Audit Trail**: Complete history of decisions with rationale
|
||||
- **CI/CD Integration**: Gate YAML snippets for pipeline automation
|
||||
- **Stakeholder Communication**: Auto-generated notifications with decision summary
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *gate
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Story is complete and ready for release (after `*dev story-approved`)
|
||||
- Epic is complete and needs quality validation before deployment
|
||||
- Release candidate needs final go/no-go decision
|
||||
- Hotfix requires expedited quality assessment
|
||||
- User explicitly requests gate decision: `bmad tea *gate`
|
||||
|
||||
**Typical workflow sequence:**
|
||||
|
||||
1. `*test-design` → Risk assessment with P0-P3 prioritization
|
||||
2. `*atdd` → Generate failing tests before implementation
|
||||
3. `*dev story` → Implement feature with tests passing
|
||||
4. `*automate` → Expand regression suite
|
||||
5. `*trace` → Verify requirements-to-tests coverage
|
||||
6. `*nfr-assess` → Validate non-functional requirements
|
||||
7. **`*gate`** → Make final release decision ⬅️ YOU ARE HERE
|
||||
|
||||
---
|
||||
|
||||
## Inputs
|
||||
|
||||
### Required Context Files
|
||||
|
||||
- **Test Results**: CI/CD pipeline results, test framework reports (Playwright HTML, Jest JSON, JUnit XML)
|
||||
- **Story/Epic File**: The feature being gated (e.g., `story-1.3.md`, `epic-2.md`)
|
||||
|
||||
### Recommended Context Files
|
||||
|
||||
- **test-design.md**: Risk assessment with P0/P1/P2/P3 scenario prioritization
|
||||
- **traceability-matrix.md**: Requirements-to-tests coverage analysis with gap identification
|
||||
- **nfr-assessment.md**: Non-functional requirements validation (security, performance, reliability, maintainability)
|
||||
- **Code Coverage Report**: Line/branch/function coverage metrics
|
||||
- **Burn-in Results**: 10-iteration flakiness detection from CI pipeline
|
||||
|
||||
### Workflow Variables
|
||||
|
||||
Key variables that control gate behavior (configured in `workflow.yaml`):
|
||||
|
||||
- **gate_type**: `story` | `epic` | `release` | `hotfix` (default: `story`)
|
||||
- **decision_mode**: `deterministic` | `manual` (default: `deterministic`)
|
||||
- **min_p0_pass_rate**: Threshold for P0 tests (default: `100` - must be perfect)
|
||||
- **min_p1_pass_rate**: Threshold for P1 tests (default: `95%`)
|
||||
- **min_overall_pass_rate**: Overall test threshold (default: `90%`)
|
||||
- **min_coverage**: Code coverage minimum (default: `80%`)
|
||||
- **allow_waivers**: Enable business-approved waivers (default: `true`)
|
||||
- **require_evidence**: Require links to test results/reports (default: `true`)
|
||||
- **validate_evidence_freshness**: Warn if assessments >7 days old (default: `true`)
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### Primary Deliverable
|
||||
|
||||
**Gate Decision Document** (`gate-decision-{type}-{id}.md`):
|
||||
|
||||
- **Decision**: PASS / CONCERNS / FAIL / WAIVED with clear rationale
|
||||
- **Evidence Summary**: Test results, coverage, NFRs, flakiness validation
|
||||
- **Rationale**: Explanation of decision based on criteria
|
||||
- **Residual Risks**: Unresolved issues (for CONCERNS/WAIVED)
|
||||
- **Waiver Details**: Approver, expiry, remediation plan (for WAIVED)
|
||||
- **Critical Issues**: Top blockers with owners and due dates (for FAIL)
|
||||
- **Recommendations**: Next steps for each decision type
|
||||
- **Audit Trail**: Complete history for compliance/review
|
||||
|
||||
### Secondary Outputs
|
||||
|
||||
- **Gate YAML**: Machine-readable snippet for CI/CD integration
|
||||
- **Status Update**: Appends decision to `bmm-workflow-status.md` history
|
||||
- **Stakeholder Notification**: Auto-generated message with decision summary
|
||||
|
||||
### Validation Safeguards
|
||||
|
||||
- ✅ All required evidence sources discovered or explicitly provided
|
||||
- ✅ Evidence freshness validated (warns if >7 days old)
|
||||
- ✅ P0 criteria evaluated first (immediate FAIL if not met)
|
||||
- ✅ Decision rules applied deterministically (no human bias)
|
||||
- ✅ Waivers require business justification and remediation plan
|
||||
- ✅ Audit trail maintained for transparency
|
||||
|
||||
---
|
||||
|
||||
## Decision Logic
|
||||
|
||||
### PASS Decision
|
||||
|
||||
**All criteria met:**
|
||||
|
||||
- ✅ P0 test pass rate = 100%
|
||||
- ✅ P1 test pass rate ≥ 95%
|
||||
- ✅ Overall test pass rate ≥ 90%
|
||||
- ✅ Code coverage ≥ 80%
|
||||
- ✅ Security issues = 0
|
||||
- ✅ Critical NFR failures = 0
|
||||
- ✅ Flaky tests = 0
|
||||
|
||||
**Action:** Deploy to production with standard monitoring
|
||||
|
||||
---
|
||||
|
||||
### CONCERNS Decision
|
||||
|
||||
**P0 criteria met, but P1 criteria degraded:**
|
||||
|
||||
- ✅ P0 test pass rate = 100%
|
||||
- ⚠️ P1 test pass rate 90-94% (below 95% threshold)
|
||||
- ⚠️ Code coverage 75-79% (below 80% threshold)
|
||||
- ✅ No security issues
|
||||
- ✅ No critical NFR failures
|
||||
- ✅ No flaky tests
|
||||
|
||||
**Residual Risks:** Minor P1 issues, edge cases, non-critical gaps
|
||||
|
||||
**Action:** Deploy with enhanced monitoring, create backlog stories for fixes
|
||||
|
||||
---
|
||||
|
||||
### FAIL Decision
|
||||
|
||||
**Any P0 criterion failed:**
|
||||
|
||||
- ❌ P0 test pass rate <100%
|
||||
- OR ❌ Security issues >0
|
||||
- OR ❌ Critical NFR failures >0
|
||||
- OR ❌ Flaky tests detected
|
||||
|
||||
**Critical Blockers:** P0 test failures, security vulnerabilities, critical NFRs
|
||||
|
||||
**Action:** Block deployment, fix critical issues, re-run gate after fixes
|
||||
|
||||
---
|
||||
|
||||
### WAIVED Decision
|
||||
|
||||
**FAIL status + business-approved waiver:**
|
||||
|
||||
- ❌ Original decision: FAIL
|
||||
- 🔓 Waiver approved by: {VP Engineering / CTO / Product Owner}
|
||||
- 📋 Business justification: {regulatory deadline, contractual obligation, etc.}
|
||||
- 📅 Waiver expiry: {date - does NOT apply to future releases}
|
||||
- 🔧 Remediation plan: {fix in next release, due date}
|
||||
|
||||
**Action:** Deploy with business approval, aggressive monitoring, fix ASAP
|
||||
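For illustration, the decision rules above reduce to a small deterministic function. This is a sketch using the default thresholds; the field names are hypothetical, and the real workflow derives these values from the parsed evidence:

```typescript
type GateDecision = 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';

interface GateEvidence {
  p0PassRate: number; // percentage, 0-100
  p1PassRate: number;
  overallPassRate: number;
  coverage: number;
  securityIssues: number;
  criticalNfrFailures: number;
  flakyTests: number;
  approvedWaiver?: boolean; // business-approved waiver on file
}

function decideGate(e: GateEvidence): GateDecision {
  // P0 criteria are evaluated first; any failure blocks the release
  const p0Failed =
    e.p0PassRate < 100 || e.securityIssues > 0 || e.criticalNfrFailures > 0 || e.flakyTests > 0;

  if (p0Failed) {
    // Security issues are never waived
    return e.approvedWaiver && e.securityIssues === 0 ? 'WAIVED' : 'FAIL';
  }

  // P1 criteria degrade the decision to CONCERNS without blocking deployment
  const p1Degraded = e.p1PassRate < 95 || e.overallPassRate < 90 || e.coverage < 80;
  return p1Degraded ? 'CONCERNS' : 'PASS';
}
```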
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Gate
|
||||
|
||||
1. **test-design** (recommended) - Provides P0-P3 risk framework
|
||||
2. **atdd** (recommended) - Ensures acceptance criteria have tests
|
||||
3. **automate** (recommended) - Expands regression suite
|
||||
4. **trace** (recommended) - Verifies requirements coverage
|
||||
5. **nfr-assess** (recommended) - Validates non-functional requirements
|
||||
|
||||
### After Gate
|
||||
|
||||
- **PASS**: Proceed to deployment workflow
|
||||
- **CONCERNS**: Deploy with monitoring, create remediation backlog stories
|
||||
- **FAIL**: Block deployment, fix issues, re-run gate
|
||||
- **WAIVED**: Deploy with business approval, escalate monitoring
|
||||
|
||||
### Coordinates With
|
||||
|
||||
- **bmm-workflow-status.md**: Appends gate decision to history
|
||||
- **CI/CD Pipeline**: Gate YAML used for automated gates
|
||||
- **PM/SM**: Notification of decision and next steps
|
||||
|
||||
---
|
||||
|
||||
## Example Scenarios
|
||||
|
||||
### Scenario 1: Ideal Release (PASS)
|
||||
|
||||
```
|
||||
Evidence:
|
||||
- P0 tests: 15/15 passed (100%) ✅
|
||||
- P1 tests: 28/29 passed (96.5%) ✅
|
||||
- Overall: 98% pass rate ✅
|
||||
- Coverage: 87% ✅
|
||||
- Security: 0 issues ✅
|
||||
- Flakiness: 0 flaky tests ✅
|
||||
|
||||
Decision: ✅ PASS
|
||||
|
||||
Rationale: All criteria exceeded thresholds. Feature ready for production.
|
||||
|
||||
Next Steps:
|
||||
1. Deploy to staging
|
||||
2. Monitor for 24 hours
|
||||
3. Deploy to production
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 2: Minor Issues (CONCERNS)
|
||||
|
||||
```
|
||||
Evidence:
|
||||
- P0 tests: 12/12 passed (100%) ✅
|
||||
- P1 tests: 21/24 passed (87.5%) ⚠️
|
||||
- Overall: 91% pass rate ✅
|
||||
- Coverage: 78% ⚠️
|
||||
- Security: 0 issues ✅
|
||||
- Flakiness: 0 flaky tests ✅
|
||||
|
||||
Decision: ⚠️ CONCERNS
|
||||
|
||||
Rationale: P0 criteria met, but P1 pass rate (87.5%) below threshold (95%).
|
||||
Coverage (78%) slightly below target (80%). Issues are edge cases in
|
||||
international date handling - low probability, workaround exists.
|
||||
|
||||
Residual Risks:
|
||||
- P1: Date formatting edge case for Japan/Korea timezones
|
||||
- Coverage: Missing tests for admin override flow
|
||||
|
||||
Next Steps:
|
||||
1. Deploy with enhanced monitoring on date formatting
|
||||
2. Create backlog story: "Fix date formatting for Asia Pacific"
|
||||
3. Add admin override tests in next sprint
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 3: Critical Blocker (FAIL)
|
||||
|
||||
```
|
||||
Evidence:
|
||||
- P0 tests: 9/12 passed (75%) ❌
|
||||
- Security: 1 SQL injection in search filter ❌
|
||||
- Coverage: 68% ❌
|
||||
|
||||
Decision: ❌ FAIL
|
||||
|
||||
Rationale: CRITICAL BLOCKERS:
|
||||
1. P0 test failures in core search functionality
|
||||
2. Unresolved SQL injection vulnerability (CRITICAL)
|
||||
3. Coverage below minimum threshold
|
||||
|
||||
Critical Issues:
|
||||
| Priority | Issue | Owner | Due Date |
|
||||
|----------|-------|-------|----------|
|
||||
| P0 | Fix SQL injection in search filter | Backend | 2025-10-16 |
|
||||
| P0 | Fix search pagination crash | Backend | 2025-10-16 |
|
||||
| P0 | Fix search timeout for large datasets | Backend | 2025-10-17 |
|
||||
|
||||
Next Steps:
|
||||
1. BLOCK DEPLOYMENT IMMEDIATELY
|
||||
2. Fix P0 issues listed above
|
||||
3. Re-run full test suite
|
||||
4. Re-run gate after fixes verified
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 4: Business Override (WAIVED)
|
||||
|
||||
```
|
||||
Evidence:
|
||||
- P0 tests: 10/11 passed (90.9%) ❌
|
||||
- Issue: Legacy report export fails for Excel 2007
|
||||
|
||||
Original Decision: ❌ FAIL
|
||||
|
||||
Waiver Details:
|
||||
- Approver: Jane Doe, VP Engineering
|
||||
- Reason: GDPR compliance deadline (regulatory requirement, Oct 15)
|
||||
- Expiry: 2025-10-15 (does NOT apply to v2.5.0)
|
||||
- Monitoring: Enhanced error tracking on report export
|
||||
- Remediation: Fix in v2.4.1 hotfix (due Oct 20)
|
||||
|
||||
Decision: 🔓 WAIVED
|
||||
|
||||
Business Justification:
|
||||
Release contains critical GDPR features required by law on Oct 15. Failed
|
||||
test affects legacy Excel 2007 export used by <1% of users. Workaround
|
||||
available (use Excel 2010+). Risk acceptable given regulatory priority.
|
||||
|
||||
Next Steps:
|
||||
1. Deploy v2.4.0 with waiver approval
|
||||
2. Monitor error rates on report export
|
||||
3. Fix Excel 2007 export in v2.4.1 (Oct 20)
|
||||
4. Notify affected users of workaround
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Deterministic vs Manual Mode
|
||||
|
||||
- **Deterministic mode** (recommended): Rule-based decisions using predefined thresholds
|
||||
- Eliminates bias and ensures consistency
|
||||
- Clear audit trail of criteria evaluation
|
||||
- Faster decisions for routine releases
|
||||
|
||||
- **Manual mode**: Human judgment with guidance from criteria
|
||||
- Use for edge cases, unusual situations
|
||||
- Still requires evidence documentation
|
||||
- TEA provides recommendation, user makes final call
|
||||
|
||||
### P0 is Sacred
|
||||
|
||||
**P0 failures ALWAYS result in FAIL** (the only exception is a business-approved waiver):
|
||||
|
||||
- P0 = Critical user journeys, security, data integrity
|
||||
- Cannot deploy with P0 failures - too risky
|
||||
- Waivers require VP/CTO approval + business justification
|
||||
|
||||
### Waivers are Temporary
|
||||
|
||||
- Waiver applies ONLY to specific release
|
||||
- Issue must be fixed in next release
|
||||
- Waiver expiry date enforced
|
||||
- Never waive: security, data corruption, compliance violations
|
||||
|
||||
### Evidence Freshness Matters
|
||||
|
||||
- Assessments >7 days old may be stale
|
||||
- Code changes since assessment may invalidate conclusions
|
||||
- Re-run workflows if evidence is outdated
|
||||
|
||||
### Security Never Compromised
|
||||
|
||||
- Security issues ALWAYS block release
|
||||
- No waivers for security vulnerabilities
|
||||
- Fix security issues immediately, then re-gate
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **risk-governance.md** - Risk-based quality gate criteria and decision framework
|
||||
- **probability-impact.md** - Risk scoring (probability × impact) for residual risks
|
||||
- **test-quality.md** - Definition of Done for tests, quality standards
|
||||
- **test-priorities.md** - P0/P1/P2/P3 priority classification framework
|
||||
- **ci-burn-in.md** - Flakiness detection and burn-in validation patterns
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Problem: No test results found
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Check CI/CD pipeline for test execution
|
||||
- Verify `test_results` variable points to correct path
|
||||
- Run tests locally and provide results explicitly
|
||||
|
||||
---
|
||||
|
||||
### Problem: Assessments are stale (>7 days old)
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Re-run `*test-design` workflow
|
||||
- Re-run `*trace` workflow
|
||||
- Re-run `*nfr-assess` workflow
|
||||
- Update evidence files before gate decision
|
||||
|
||||
---
|
||||
|
||||
### Problem: Unclear decision (edge case)
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Switch to manual mode: `decision_mode: manual`
|
||||
- Document assumptions and rationale clearly
|
||||
- Escalate to tech lead or architect for guidance
|
||||
- Consider waiver if business-critical
|
||||
|
||||
---
|
||||
|
||||
### Problem: Waiver requested but not justified
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Require written business justification from stakeholder
|
||||
- Ensure approver is appropriate authority (VP/CTO/PO)
|
||||
- Verify remediation plan exists with concrete due date
|
||||
- Document monitoring plan for waived risk
|
||||
- Confirm waiver expiry date (must be fixed in next release)
|
||||
|
||||
---
|
||||
|
||||
## Integration with BMad Status File
|
||||
|
||||
This workflow updates `bmm-workflow-status.md` with gate decisions:
|
||||
|
||||
```markdown
|
||||
### Quality & Testing Progress (TEA Agent)
|
||||
|
||||
**Gate Decisions:**
|
||||
|
||||
- [2025-10-14] ✅ PASS - Story 1.3 (User Auth) - All criteria met, 98% pass rate
|
||||
- [2025-10-14] ⚠️ CONCERNS - Epic 2 (Payments) - P1 pass rate 89%, deploy with monitoring
|
||||
- [2025-10-14] ❌ FAIL - Story 3.2 (Export) - SQL injection blocking release
|
||||
- [2025-10-15] 🔓 WAIVED - Release v2.4.0 - GDPR deadline, VP approved
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration Examples
|
||||
|
||||
### Strict Gate (Zero Tolerance)
|
||||
|
||||
```yaml
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 100
|
||||
min_overall_pass_rate: 95
|
||||
min_coverage: 90
|
||||
allow_waivers: false
|
||||
max_security_issues: 0
|
||||
max_critical_nfrs_fail: 0
|
||||
```
|
||||
|
||||
Use for: Financial systems, healthcare, security-critical features
|
||||
|
||||
---
|
||||
|
||||
### Balanced Gate (Production Standard)
|
||||
|
||||
```yaml
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 95
|
||||
min_overall_pass_rate: 90
|
||||
min_coverage: 80
|
||||
allow_waivers: true
|
||||
max_security_issues: 0
|
||||
max_critical_nfrs_fail: 0
|
||||
```
|
||||
|
||||
Use for: Most production releases (default configuration)
|
||||
|
||||
---
|
||||
|
||||
### Relaxed Gate (Early Development)
|
||||
|
||||
```yaml
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 85
|
||||
min_overall_pass_rate: 80
|
||||
min_coverage: 70
|
||||
allow_waivers: true
|
||||
allow_p2_failures: true
|
||||
allow_p3_failures: true
|
||||
```
|
||||
|
||||
Use for: Alpha/beta releases, internal tools, proof-of-concept
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *test-design` - Risk assessment before implementation
|
||||
- `bmad tea *trace` - Verify requirements-to-tests coverage
|
||||
- `bmad tea *nfr-assess` - Validate non-functional requirements
|
||||
- `bmad tea *automate` - Expand regression suite
|
||||
- `bmad sm story-approved` - Mark story as complete (triggers gate)
|
||||
src/modules/bmm/workflows/testarch/gate/checklist.md (new file, 393 lines)
@@ -0,0 +1,393 @@
|
||||
# Quality Gate Decision - Validation Checklist
|
||||
|
||||
Use this checklist to validate that the gate decision workflow completed successfully and all criteria were properly evaluated.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Evidence Gathering
|
||||
|
||||
- [ ] Test execution results obtained (CI/CD pipeline, test framework reports)
|
||||
- [ ] Story/epic/release file identified and read
|
||||
- [ ] Test design document discovered or explicitly provided (if available)
|
||||
- [ ] Traceability matrix discovered or explicitly provided (if available)
|
||||
- [ ] NFR assessment discovered or explicitly provided (if available)
|
||||
- [ ] Code coverage report discovered or explicitly provided (if available)
|
||||
- [ ] Burn-in results discovered or explicitly provided (if available)
|
||||
|
||||
### Evidence Validation
|
||||
|
||||
- [ ] Evidence freshness validated (warn if >7 days old, recommend re-running workflows)
|
||||
- [ ] All required assessments available or user acknowledged gaps
|
||||
- [ ] Test results are complete (not partial or interrupted runs)
|
||||
- [ ] Test results match current codebase (not from outdated branch)
|
||||
|
||||
### Knowledge Base Loading
|
||||
|
||||
- [ ] `risk-governance.md` loaded successfully
|
||||
- [ ] `probability-impact.md` loaded successfully
|
||||
- [ ] `test-quality.md` loaded successfully
|
||||
- [ ] `test-priorities.md` loaded successfully
|
||||
- [ ] `ci-burn-in.md` loaded (if burn-in results available)
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] Gate type identified (story/epic/release/hotfix)
|
||||
- [ ] Target ID extracted (story_id, epic_num, or release_version)
|
||||
- [ ] Decision thresholds loaded from workflow variables
|
||||
- [ ] Risk tolerance configuration loaded
|
||||
- [ ] Waiver policy loaded
|
||||
|
||||
### Step 2: Evidence Parsing
|
||||
|
||||
**Test Results:**
|
||||
|
||||
- [ ] Total test count extracted
|
||||
- [ ] Passed test count extracted
|
||||
- [ ] Failed test count extracted
|
||||
- [ ] Skipped test count extracted
|
||||
- [ ] Test duration extracted
|
||||
- [ ] P0 test pass rate calculated
|
||||
- [ ] P1 test pass rate calculated
|
||||
- [ ] Overall test pass rate calculated
|
||||
|
||||
**Quality Assessments:**
|
||||
|
||||
- [ ] P0/P1/P2/P3 scenarios extracted from test-design.md (if available)
|
||||
- [ ] Risk scores extracted from test-design.md (if available)
|
||||
- [ ] Coverage percentages extracted from traceability-matrix.md (if available)
|
||||
- [ ] Coverage gaps extracted from traceability-matrix.md (if available)
|
||||
- [ ] NFR status extracted from nfr-assessment.md (if available)
|
||||
- [ ] Security issues count extracted from nfr-assessment.md (if available)
|
||||
|
||||
**Code Coverage:**
|
||||
|
||||
- [ ] Line coverage percentage extracted (if available)
|
||||
- [ ] Branch coverage percentage extracted (if available)
|
||||
- [ ] Function coverage percentage extracted (if available)
|
||||
- [ ] Critical path coverage validated (if available)
|
||||
|
||||
**Burn-in Results:**
|
||||
|
||||
- [ ] Burn-in iterations count extracted (if available)
|
||||
- [ ] Flaky tests count extracted (if available)
|
||||
- [ ] Stability score calculated (if available)
|
||||
|
||||
### Step 3: Decision Rules Application
|
||||
|
||||
**P0 Criteria Evaluation:**
|
||||
|
||||
- [ ] P0 test pass rate evaluated (must be 100%)
|
||||
- [ ] P0 acceptance criteria coverage evaluated (must be 100%)
|
||||
- [ ] Security issues count evaluated (must be 0)
|
||||
- [ ] Critical NFR failures evaluated (must be 0)
|
||||
- [ ] Flaky tests evaluated (must be 0 if burn-in enabled)
|
||||
- [ ] P0 decision recorded: PASS or FAIL
|
||||
|
||||
**P1 Criteria Evaluation:**
|
||||
|
||||
- [ ] P1 test pass rate evaluated (threshold: min_p1_pass_rate)
|
||||
- [ ] P1 acceptance criteria coverage evaluated (threshold: 95%)
|
||||
- [ ] Overall test pass rate evaluated (threshold: min_overall_pass_rate)
|
||||
- [ ] Code coverage evaluated (threshold: min_coverage)
|
||||
- [ ] P1 decision recorded: PASS or CONCERNS
|
||||
|
||||
**P2/P3 Criteria Evaluation:**
|
||||
|
||||
- [ ] P2 failures tracked (informational, don't block if allow_p2_failures: true)
|
||||
- [ ] P3 failures tracked (informational, don't block if allow_p3_failures: true)
|
||||
- [ ] Residual risks documented
|
||||
|
||||
**Final Decision:**
|
||||
|
||||
- [ ] Decision determined: PASS / CONCERNS / FAIL / WAIVED
|
||||
- [ ] Decision rationale documented
|
||||
- [ ] Decision is deterministic (follows rules, not arbitrary)
|
||||
|
||||
### Step 4: Documentation
|
||||
|
||||
**Gate Decision Document Created:**
|
||||
|
||||
- [ ] Story/epic/release info section complete (ID, title, description, links)
|
||||
- [ ] Decision clearly stated (PASS / CONCERNS / FAIL / WAIVED)
|
||||
- [ ] Decision date recorded
|
||||
- [ ] Evaluator recorded (user or agent name)
|
||||
|
||||
**Evidence Summary Documented:**
|
||||
|
||||
- [ ] Test results summary complete (total, passed, failed, pass rates)
|
||||
- [ ] Coverage summary complete (P0/P1 criteria, code coverage)
|
||||
- [ ] NFR validation summary complete (security, performance, reliability, maintainability)
|
||||
- [ ] Flakiness summary complete (burn-in iterations, flaky test count)
|
||||
|
||||
**Rationale Documented:**
|
||||
|
||||
- [ ] Decision rationale clearly explained
|
||||
- [ ] Key evidence highlighted
|
||||
- [ ] Assumptions and caveats noted (if any)
|
||||
|
||||
**Residual Risks Documented (if CONCERNS or WAIVED):**
|
||||
|
||||
- [ ] Unresolved P1/P2 issues listed
|
||||
- [ ] Probability × impact estimated for each risk
|
||||
- [ ] Mitigations or workarounds described
|
||||
|
||||
**Waivers Documented (if WAIVED):**
|
||||
|
||||
- [ ] Waiver reason documented (business justification)
|
||||
- [ ] Waiver approver documented (name, role)
|
||||
- [ ] Waiver expiry date documented
|
||||
- [ ] Remediation plan documented (fix in next release, due date)
|
||||
- [ ] Monitoring plan documented
|
||||
|
||||
**Critical Issues Documented (if FAIL or CONCERNS):**
|
||||
|
||||
- [ ] Top 5-10 critical issues listed
|
||||
- [ ] Priority assigned to each issue (P0/P1/P2)
|
||||
- [ ] Owner assigned to each issue
|
||||
- [ ] Due date assigned to each issue
|
||||
|
||||
**Recommendations Documented:**
|
||||
|
||||
- [ ] Next steps clearly stated for decision type
|
||||
- [ ] Deployment recommendation provided
|
||||
- [ ] Monitoring recommendations provided (if applicable)
|
||||
- [ ] Remediation recommendations provided (if applicable)
|
||||
|
||||
### Step 5: Status Updates and Notifications
|
||||
|
||||
**Status File Updated:**
|
||||
|
||||
- [ ] Gate decision appended to bmm-workflow-status.md (if append_to_history: true)
|
||||
- [ ] Format correct: `[DATE] Gate Decision: DECISION - Target {ID} - {rationale}`
|
||||
- [ ] Status file committed or staged for commit
|
||||
|
||||
**Gate YAML Created:**
|
||||
|
||||
- [ ] Gate YAML snippet generated with decision and criteria
|
||||
- [ ] Evidence references included in YAML
|
||||
- [ ] Next steps included in YAML
|
||||
- [ ] YAML file saved to output folder
|
||||
|
||||
**Stakeholder Notification Generated:**
|
||||
|
||||
- [ ] Notification subject line created
|
||||
- [ ] Notification body created with summary
|
||||
- [ ] Recipients identified (PM, SM, DEV lead, stakeholders)
|
||||
- [ ] Notification ready for delivery (if notify_stakeholders: true)
|
||||
|
||||
**Outputs Saved:**
|
||||
|
||||
- [ ] Gate decision document saved to `{output_file}`
|
||||
- [ ] Gate YAML saved to `{output_folder}/gate-decision-{target}.yaml`
|
||||
- [ ] All outputs are valid and readable
|
||||
|
||||
---
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Gate Decision Document
|
||||
|
||||
**Completeness:**
|
||||
|
||||
- [ ] All required sections present (info, decision, evidence, rationale, next steps)
|
||||
- [ ] No placeholder text or TODOs left in document
|
||||
- [ ] All evidence references are accurate and complete
|
||||
- [ ] All links to artifacts are valid
|
||||
|
||||
**Accuracy:**
|
||||
|
||||
- [ ] Decision matches applied criteria rules
|
||||
- [ ] Test results match CI/CD pipeline output
|
||||
- [ ] Coverage percentages match reports
|
||||
- [ ] NFR status matches assessment document
|
||||
- [ ] No contradictions or inconsistencies
|
||||
|
||||
**Clarity:**
|
||||
|
||||
- [ ] Decision rationale is clear and unambiguous
|
||||
- [ ] Technical jargon is explained or avoided
|
||||
- [ ] Stakeholders can understand next steps
|
||||
- [ ] Recommendations are actionable
|
||||
|
||||
### Gate YAML
|
||||
|
||||
**Format:**
|
||||
|
||||
- [ ] YAML is valid (no syntax errors)
|
||||
- [ ] All required fields present (target, decision, date, evaluator, criteria, evidence)
|
||||
- [ ] Field values are correct data types (numbers, strings, dates)
|
||||
|
||||
**Content:**
|
||||
|
||||
- [ ] Criteria values match decision document
|
||||
- [ ] Evidence references are accurate
|
||||
- [ ] Next steps align with decision type
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Decision Integrity
|
||||
|
||||
- [ ] Decision is deterministic (follows rules, not arbitrary)
|
||||
- [ ] P0 failures result in FAIL decision (unless waived)
|
||||
- [ ] Security issues result in FAIL decision (security issues are never waived)
|
||||
- [ ] Waivers have business justification and approver (if WAIVED)
|
||||
- [ ] Residual risks are documented (if CONCERNS or WAIVED)
|
||||
|
||||
### Evidence-Based
|
||||
|
||||
- [ ] Decision is based on actual test results (not guesses)
|
||||
- [ ] All claims are supported by evidence
|
||||
- [ ] No assumptions without documentation
|
||||
- [ ] Evidence sources are cited (CI run IDs, report URLs)
|
||||
|
||||
### Transparency
|
||||
|
||||
- [ ] Decision rationale is transparent and auditable
|
||||
- [ ] Criteria evaluation is documented step-by-step
|
||||
- [ ] Any deviations from standard process are explained
|
||||
- [ ] Waiver justifications are clear (if applicable)
|
||||
|
||||
### Consistency
|
||||
|
||||
- [ ] Decision aligns with risk-governance knowledge fragment
|
||||
- [ ] Priority framework (P0/P1/P2/P3) applied consistently
|
||||
- [ ] Terminology consistent with test-quality knowledge fragment
|
||||
- [ ] Decision matrix followed correctly
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### BMad Workflow Status
|
||||
|
||||
- [ ] Gate decision added to `bmm-workflow-status.md`
|
||||
- [ ] Format matches existing gate history entries
|
||||
- [ ] Timestamp is accurate
|
||||
- [ ] Decision summary is concise (<80 chars)
|
||||
|
||||
### CI/CD Pipeline
|
||||
|
||||
- [ ] Gate YAML is CI/CD-compatible
|
||||
- [ ] YAML can be parsed by pipeline automation
|
||||
- [ ] Decision can be used to block/allow deployments
|
||||
- [ ] Evidence references are accessible to pipeline
|
||||
|
||||
### Stakeholders
|
||||
|
||||
- [ ] Notification message is clear and actionable
|
||||
- [ ] Decision is explained in non-technical terms
|
||||
- [ ] Next steps are specific and time-bound
|
||||
- [ ] Recipients are appropriate for decision type
|
||||
|
||||
---
|
||||
|
||||
## Compliance and Audit
|
||||
|
||||
### Audit Trail
|
||||
|
||||
- [ ] Decision date and time recorded
|
||||
- [ ] Evaluator identified (user or agent)
|
||||
- [ ] All evidence sources cited
|
||||
- [ ] Decision criteria documented
|
||||
- [ ] Rationale clearly explained
|
||||
|
||||
### Traceability
|
||||
|
||||
- [ ] Gate decision traceable to story/epic/release
|
||||
- [ ] Evidence traceable to specific test runs
|
||||
- [ ] Assessments traceable to workflows that created them
|
||||
- [ ] Waiver traceable to approver (if applicable)
|
||||
|
||||
### Compliance
|
||||
|
||||
- [ ] Security requirements validated (no unresolved vulnerabilities)
|
||||
- [ ] Quality standards met or waived with justification
|
||||
- [ ] Regulatory requirements addressed (if applicable)
|
||||
- [ ] Documentation sufficient for external audit
|
||||
|
||||
---
|
||||
|
||||
## Edge Cases and Exceptions
|
||||
|
||||
### Missing Evidence
|
||||
|
||||
- [ ] If test-design.md missing, decision still possible with test results + trace
|
||||
- [ ] If traceability-matrix.md missing, decision still possible with test results
|
||||
- [ ] If nfr-assessment.md missing, NFR validation marked as NOT ASSESSED
|
||||
- [ ] If code coverage missing, coverage criterion marked as NOT ASSESSED
|
||||
- [ ] User acknowledged gaps in evidence or provided alternative proof
|
||||
|
||||
### Stale Evidence
|
||||
|
||||
- [ ] Evidence freshness checked (if validate_evidence_freshness: true)
|
||||
- [ ] Warnings issued for assessments >7 days old
|
||||
- [ ] User acknowledged stale evidence or re-ran workflows
|
||||
- [ ] Decision document notes any stale evidence used
|
||||
|
||||
### Conflicting Evidence
|
||||
|
||||
- [ ] Conflicts between test results and assessments resolved
|
||||
- [ ] Most recent/authoritative source identified
|
||||
- [ ] Conflict resolution documented in decision rationale
|
||||
- [ ] User consulted if conflict cannot be resolved
|
||||
|
||||
### Waiver Scenarios
|
||||
|
||||
- [ ] Waiver only used for FAIL decision (not PASS or CONCERNS)
|
||||
- [ ] Waiver has business justification (not technical convenience)
|
||||
- [ ] Waiver has named approver with authority (VP/CTO/PO)
|
||||
- [ ] Waiver has expiry date (does NOT apply to future releases)
|
||||
- [ ] Waiver has remediation plan with concrete due date
|
||||
- [ ] Security vulnerabilities are NOT waived (enforced)
|
||||
|
||||
---
|
||||
|
||||
## Final Validation
|
||||
|
||||
### Document Review
|
||||
|
||||
- [ ] Gate decision document reviewed for accuracy
|
||||
- [ ] Gate YAML reviewed for correctness
|
||||
- [ ] Notification message reviewed for clarity
|
||||
- [ ] Status file update reviewed for format
|
||||
|
||||
### Stakeholder Communication
|
||||
|
||||
- [ ] Decision communicated to PM (if applicable)
|
||||
- [ ] Decision communicated to SM (if applicable)
|
||||
- [ ] Decision communicated to DEV lead (if applicable)
|
||||
- [ ] Decision communicated to stakeholders (if notify_stakeholders: true)
|
||||
|
||||
### Next Steps Identified
|
||||
|
||||
- [ ] **For PASS**: Deployment steps documented, monitoring plan identified
|
||||
- [ ] **For CONCERNS**: Monitoring plan documented, remediation backlog created
|
||||
- [ ] **For FAIL**: Blockers documented, fix assignments confirmed, re-gate planned
|
||||
- [ ] **For WAIVED**: Business approval confirmed, monitoring escalated, remediation scheduled
|
||||
|
||||
### Workflow Complete
|
||||
|
||||
- [ ] All checklist items completed
|
||||
- [ ] All outputs validated and saved
|
||||
- [ ] All stakeholders notified
|
||||
- [ ] Gate decision is final and documented
|
||||
- [ ] Ready to proceed to next phase (deploy, fix, or escalate)
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
Record any issues, deviations, or important observations during workflow execution:
|
||||
|
||||
- **Evidence Issues**: [Note any missing, stale, or conflicting evidence]
|
||||
- **Decision Rationale**: [Document any nuanced reasoning or edge cases]
|
||||
- **Waiver Details**: [Document waiver negotiations or approvals]
|
||||
- **Follow-up Actions**: [List any actions required after gate decision]
|
||||
src/modules/bmm/workflows/testarch/gate/gate-template.md (new file, 445 lines)
@@ -0,0 +1,445 @@
|
||||
# Gate Decision: {target_id} ({feature_name})
|
||||
|
||||
**Decision:** {PASS | CONCERNS | FAIL | WAIVED}
|
||||
**Date:** {YYYY-MM-DD}
|
||||
**Evaluator:** {user_name or TEA Agent}
|
||||
**Gate Type:** {story | epic | release | hotfix}
|
||||
|
||||
---
|
||||
|
||||
## Story/Epic/Release Information
|
||||
|
||||
- **ID**: {story_id | epic_num | release_version}
|
||||
- **Title**: {feature_name}
|
||||
- **Description**: {brief_description}
|
||||
- **Links**:
|
||||
- Story/Epic File: `{file_path}`
|
||||
- Test Design: `{test_design_file_path}` (if available)
|
||||
- Traceability Matrix: `{trace_file_path}` (if available)
|
||||
- NFR Assessment: `{nfr_file_path}` (if available)
|
||||
|
||||
---
|
||||
|
||||
## Evidence Summary
|
||||
|
||||
### Test Results
|
||||
|
||||
- **Total Tests**: {total_count}
|
||||
- **Passed**: {passed_count} ({pass_percentage}%)
|
||||
- **Failed**: {failed_count} ({fail_percentage}%)
|
||||
- **Skipped**: {skipped_count} ({skip_percentage}%)
|
||||
- **Duration**: {total_duration}
|
||||
|
||||
**Priority Breakdown:**
|
||||
|
||||
- **P0 Tests**: {p0_passed}/{p0_total} passed ({p0_pass_rate}%) {✅ | ❌}
|
||||
- **P1 Tests**: {p1_passed}/{p1_total} passed ({p1_pass_rate}%) {✅ | ⚠️ | ❌}
|
||||
- **P2 Tests**: {p2_passed}/{p2_total} passed ({p2_pass_rate}%) {informational}
|
||||
- **P3 Tests**: {p3_passed}/{p3_total} passed ({p3_pass_rate}%) {informational}
|
||||
|
||||
**Overall Pass Rate**: {overall_pass_rate}% {✅ | ⚠️ | ❌}
|
||||
|
||||
**Test Results Source**: {CI_run_id | test_report_url | local_run}
|
||||
|
||||
---
|
||||
|
||||
### Coverage Summary
|
||||
|
||||
**Requirements Coverage:**
|
||||
|
||||
- **P0 Acceptance Criteria**: {p0_covered}/{p0_total} covered ({p0_coverage}%) {✅ | ❌}
|
||||
- **P1 Acceptance Criteria**: {p1_covered}/{p1_total} covered ({p1_coverage}%) {✅ | ⚠️ | ❌}
|
||||
- **P2 Acceptance Criteria**: {p2_covered}/{p2_total} covered ({p2_coverage}%) {informational}
|
||||
- **Overall Coverage**: {overall_coverage}%
|
||||
|
||||
**Code Coverage** (if available):
|
||||
|
||||
- **Line Coverage**: {line_coverage}% {✅ | ⚠️ | ❌}
|
||||
- **Branch Coverage**: {branch_coverage}% {✅ | ⚠️ | ❌}
|
||||
- **Function Coverage**: {function_coverage}% {✅ | ⚠️ | ❌}
|
||||
|
||||
**Coverage Gaps**: {gap_count} gaps identified
|
||||
|
||||
- {list_of_critical_gaps}
|
||||
|
||||
**Coverage Source**: {coverage_report_url | coverage_file_path}
|
||||
|
||||
---
|
||||
|
||||
### Non-Functional Requirements (NFRs)
|
||||
|
||||
**Security**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
|
||||
|
||||
- Security Issues: {security_issue_count}
|
||||
- {details_if_issues}
|
||||
|
||||
**Performance**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
|
||||
|
||||
- {performance_metrics_summary}
|
||||
|
||||
**Reliability**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
|
||||
|
||||
- {reliability_metrics_summary}
|
||||
|
||||
**Maintainability**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
|
||||
|
||||
- {maintainability_metrics_summary}
|
||||
|
||||
**NFR Source**: {nfr_assessment_file_path | not_assessed}
|
||||
|
||||
---
|
||||
|
||||
### Flakiness Validation
|
||||
|
||||
**Burn-in Results** (if available):
|
||||
|
||||
- **Burn-in Iterations**: {iteration_count} (e.g., 10)
|
||||
- **Flaky Tests Detected**: {flaky_test_count} {✅ if 0 | ❌ if >0}
|
||||
- **Stability Score**: {stability_percentage}%
|
||||
|
||||
**Flaky Tests List** (if any):
|
||||
|
||||
- {flaky_test_1_name} - {failure_rate}
|
||||
- {flaky_test_2_name} - {failure_rate}
|
||||
- {flaky_test_3_name} - {failure_rate}
|
||||
|
||||
**Burn-in Source**: {CI_burn_in_run_id | not_available}
|
||||
|
||||
---
|
||||
|
||||
## Decision Criteria Evaluation
|
||||
|
||||
### P0 Criteria (Must ALL Pass)
|
||||
|
||||
| Criterion             | Threshold | Actual                    | Status               |
| --------------------- | --------- | ------------------------- | -------------------- |
| P0 Test Pass Rate     | 100%      | {p0_pass_rate}%           | {✅ PASS \| ❌ FAIL} |
| P0 Criteria Coverage  | 100%      | {p0_coverage}%            | {✅ PASS \| ❌ FAIL} |
| Security Issues       | 0         | {security_issue_count}    | {✅ PASS \| ❌ FAIL} |
| Critical NFR Failures | 0         | {critical_nfr_fail_count} | {✅ PASS \| ❌ FAIL} |
| Flaky Tests           | 0         | {flaky_test_count}        | {✅ PASS \| ❌ FAIL} |
|
||||
|
||||
**P0 Evaluation**: {✅ ALL PASS | ❌ ONE OR MORE FAILED}
|
||||
|
||||
---
|
||||
|
||||
### P1 Criteria (Required for PASS, May Accept for CONCERNS)
|
||||
|
||||
| Criterion              | Threshold                 | Actual               | Status                              |
| ---------------------- | ------------------------- | -------------------- | ----------------------------------- |
| P1 Test Pass Rate      | ≥{min_p1_pass_rate}%      | {p1_pass_rate}%      | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| P1 Criteria Coverage   | ≥95%                      | {p1_coverage}%       | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| Overall Test Pass Rate | ≥{min_overall_pass_rate}% | {overall_pass_rate}% | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| Code Coverage          | ≥{min_coverage}%          | {code_coverage}%     | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
|
||||
|
||||
**P1 Evaluation**: {✅ ALL PASS | ⚠️ SOME CONCERNS | ❌ FAILED}
|
||||
|
||||
---
|
||||
|
||||
### P2/P3 Criteria (Informational, Don't Block)
|
||||
|
||||
| Criterion | Actual | Notes |
|
||||
| ----------------- | --------------- | ------------------------------------------------------------ |
|
||||
| P2 Test Pass Rate | {p2_pass_rate}% | {allow_p2_failures ? "Tracked, doesn't block" : "Evaluated"} |
|
||||
| P3 Test Pass Rate | {p3_pass_rate}% | {allow_p3_failures ? "Tracked, doesn't block" : "Evaluated"} |
|
||||
|
||||
---
|
||||
|
||||
## Rationale
|
||||
|
||||
{Explain decision based on criteria evaluation}
|
||||
|
||||
{Highlight key evidence that drove decision}
|
||||
|
||||
{Note any assumptions or caveats}
|
||||
|
||||
**Example (PASS):**
|
||||
|
||||
> All P0 criteria met with 100% pass rates across critical tests. All P1 criteria exceeded thresholds with 98% overall pass rate and 87% code coverage. No security issues detected. No flaky tests in burn-in validation. Feature is ready for production deployment with standard monitoring.
|
||||
|
||||
**Example (CONCERNS):**
|
||||
|
||||
> All P0 criteria met, ensuring critical user journeys are protected. However, P1 pass rate (89%) falls below threshold (95%) due to edge cases in international currency handling. Code coverage (78%) is slightly below target (80%) due to missing tests for admin override flow. Issues are non-critical and have acceptable workarounds. Risk is low enough to deploy with enhanced monitoring.
|
||||
|
||||
**Example (FAIL):**
|
||||
|
||||
> CRITICAL BLOCKERS DETECTED:
|
||||
>
|
||||
> 1. P0 test failures (80% pass rate) in core search functionality prevent safe deployment
|
||||
> 2. Unresolved SQL injection vulnerability in search filter poses CRITICAL security risk
|
||||
> 3. Code coverage (68%) significantly below minimum threshold (80%)
|
||||
>
|
||||
> Release MUST BE BLOCKED until P0 issues are resolved. Security vulnerability cannot be waived.
|
||||
|
||||
**Example (WAIVED):**
|
||||
|
||||
> Original decision was FAIL due to P0 test failure in legacy Excel 2007 export module (affects <1% of users). However, release contains critical GDPR compliance features required by regulatory deadline (Oct 15). Business has approved waiver given:
|
||||
>
|
||||
> - Regulatory priority overrides legacy module risk
|
||||
> - Workaround available (use Excel 2010+)
|
||||
> - Issue will be fixed in v2.4.1 hotfix (due Oct 20)
|
||||
> - Enhanced monitoring in place
|
||||
|
||||
---
|
||||
|
||||
## {Section: Delete if not applicable}
|
||||
|
||||
### Residual Risks (For CONCERNS or WAIVED)
|
||||
|
||||
List unresolved P1/P2 issues that don't block release but should be tracked:
|
||||
|
||||
1. **{Risk Description}**
|
||||
- **Priority**: P1 | P2
|
||||
- **Probability**: Low | Medium | High
|
||||
- **Impact**: Low | Medium | High
|
||||
- **Risk Score**: {probability × impact}
|
||||
- **Mitigation**: {workaround or monitoring plan}
|
||||
- **Remediation**: {fix in next sprint/release}
|
||||
|
||||
2. **{Risk Description}**
|
||||
- **Priority**: P1 | P2
|
||||
- **Probability**: Low | Medium | High
|
||||
- **Impact**: Low | Medium | High
|
||||
- **Risk Score**: {probability × impact}
|
||||
- **Mitigation**: {workaround or monitoring plan}
|
||||
- **Remediation**: {fix in next sprint/release}
|
||||
|
||||
**Overall Residual Risk**: {LOW | MEDIUM | HIGH}
|
||||
|
||||
---
|
||||
|
||||
### Waiver Details (For WAIVED only)
|
||||
|
||||
**Original Decision**: ❌ FAIL
|
||||
|
||||
**Reason for Failure**:
|
||||
|
||||
- {list_of_blocking_issues}
|
||||
|
||||
**Waiver Information**:
|
||||
|
||||
- **Waiver Reason**: {business_justification}
|
||||
- **Waiver Approver**: {name}, {role} (e.g., Jane Doe, VP Engineering)
|
||||
- **Approval Date**: {YYYY-MM-DD}
|
||||
- **Waiver Expiry**: {YYYY-MM-DD} (**NOTE**: Does NOT apply to next release)
|
||||
|
||||
**Monitoring Plan**:
|
||||
|
||||
- {enhanced_monitoring_1}
|
||||
- {enhanced_monitoring_2}
|
||||
- {escalation_criteria}
|
||||
|
||||
**Remediation Plan**:
|
||||
|
||||
- **Fix Target**: {next_release_version} (e.g., v2.4.1 hotfix)
|
||||
- **Due Date**: {YYYY-MM-DD}
|
||||
- **Owner**: {team_or_person}
|
||||
- **Verification**: {how_fix_will_be_verified}
|
||||
|
||||
**Business Justification**:
|
||||
{detailed_explanation_of_why_waiver_is_acceptable}
|
||||
|
||||
---
|
||||
|
||||
### Critical Issues (For FAIL or CONCERNS)
|
||||
|
||||
Top blockers requiring immediate attention:
|
||||
|
||||
| Priority | Issue | Description | Owner | Due Date | Status |
|
||||
| -------- | ------------- | ------------------- | ------------ | ------------ | ------------------ |
|
||||
| P0 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
|
||||
| P0 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
|
||||
| P1 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
|
||||
| P1 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
|
||||
|
||||
**Blocking Issues Count**: {p0_blocker_count} P0 blockers, {p1_blocker_count} P1 issues
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
### For PASS Decision
|
||||
|
||||
1. **Proceed to deployment**
|
||||
- Deploy to staging environment
|
||||
- Validate with smoke tests
|
||||
- Monitor key metrics for 24-48 hours
|
||||
- Deploy to production with standard monitoring
|
||||
|
||||
2. **Post-Deployment Monitoring**
|
||||
- {metric_1_to_monitor}
|
||||
- {metric_2_to_monitor}
|
||||
- {alert_thresholds}
|
||||
|
||||
3. **Success Criteria**
|
||||
- {success_criterion_1}
|
||||
- {success_criterion_2}
|
||||
|
||||
---
|
||||
|
||||
### For CONCERNS Decision
|
||||
|
||||
1. **Deploy with Enhanced Monitoring**
|
||||
- Deploy to staging with extended validation period
|
||||
- Enable enhanced logging/monitoring for known risk areas:
|
||||
- {risk_area_1}
|
||||
- {risk_area_2}
|
||||
- Set aggressive alerts for potential issues
|
||||
- Deploy to production with caution
|
||||
|
||||
2. **Create Remediation Backlog**
|
||||
- Create story: "{fix_title_1}" (Priority: {priority})
|
||||
- Create story: "{fix_title_2}" (Priority: {priority})
|
||||
- Target sprint: {next_sprint}
|
||||
|
||||
3. **Post-Deployment Actions**
|
||||
- Monitor {specific_areas} closely for {time_period}
|
||||
- Weekly status updates on remediation progress
|
||||
- Re-assess after fixes deployed
|
||||
|
||||
---
|
||||
|
||||
### For FAIL Decision
|
||||
|
||||
1. **Block Deployment Immediately**
|
||||
- Do NOT deploy to any environment
|
||||
- Notify stakeholders of blocking issues
|
||||
- Escalate to tech lead and PM
|
||||
|
||||
2. **Fix Critical Issues**
|
||||
- Address P0 blockers listed in Critical Issues section
|
||||
- Owner assignments confirmed
|
||||
- Due dates agreed upon
|
||||
- Daily standup on blocker resolution
|
||||
|
||||
3. **Re-Run Gate After Fixes**
|
||||
- Re-run full test suite after fixes
|
||||
- Re-run affected quality workflows:
|
||||
- `bmad tea *trace` (if coverage was issue)
|
||||
- `bmad tea *nfr-assess` (if NFRs were issue)
|
||||
- Re-run gate workflow: `bmad tea *gate`
|
||||
- Verify decision is PASS before deploying
|
||||
|
||||
---
|
||||
|
||||
### For WAIVED Decision
|
||||
|
||||
1. **Deploy with Business Approval**
|
||||
- Confirm waiver approver has signed off
|
||||
- Document waiver in release notes
|
||||
- Notify all stakeholders of waived risks
|
||||
|
||||
2. **Aggressive Monitoring**
|
||||
- {enhanced_monitoring_plan}
|
||||
- {escalation_procedures}
|
||||
- Daily checks on waived risk areas
|
||||
|
||||
3. **Mandatory Remediation**
|
||||
- Fix MUST be completed by {due_date}
|
||||
- Issue CANNOT be waived in next release
|
||||
- Track remediation progress weekly
|
||||
- Verify fix in next gate
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
**Immediate Actions** (next 24-48 hours):
|
||||
|
||||
1. {action_1}
|
||||
2. {action_2}
|
||||
3. {action_3}
|
||||
|
||||
**Follow-up Actions** (next sprint/release):
|
||||
|
||||
1. {action_1}
|
||||
2. {action_2}
|
||||
3. {action_3}
|
||||
|
||||
**Stakeholder Communication**:
|
||||
|
||||
- Notify PM: {decision_summary}
|
||||
- Notify SM: {decision_summary}
|
||||
- Notify DEV lead: {decision_summary}
|
||||
- Notify stakeholders: {decision_summary}
|
||||
|
||||
---
|
||||
|
||||
## Gate Decision YAML (CI/CD Integration)
|
||||
|
||||
```yaml
|
||||
gate_decision:
|
||||
target: '{target_id}'
|
||||
type: '{story | epic | release | hotfix}'
|
||||
decision: '{PASS | CONCERNS | FAIL | WAIVED}'
|
||||
date: '{YYYY-MM-DD}'
|
||||
evaluator: '{user_name or TEA Agent}'
|
||||
|
||||
criteria:
|
||||
p0_pass_rate: { p0_pass_rate }
|
||||
p1_pass_rate: { p1_pass_rate }
|
||||
overall_pass_rate: { overall_pass_rate }
|
||||
code_coverage: { code_coverage }
|
||||
security_issues: { security_issue_count }
|
||||
critical_nfrs_fail: { critical_nfr_fail_count }
|
||||
flaky_tests: { flaky_test_count }
|
||||
|
||||
thresholds:
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: { min_p1_pass_rate }
|
||||
min_overall_pass_rate: { min_overall_pass_rate }
|
||||
min_coverage: { min_coverage }
|
||||
|
||||
evidence:
|
||||
test_results: '{CI_run_id | test_report_url}'
|
||||
traceability: '{trace_file_path}'
|
||||
nfr_assessment: '{nfr_file_path}'
|
||||
code_coverage: '{coverage_report_url}'
|
||||
burn_in: '{burn_in_run_id}'
|
||||
|
||||
next_steps: '{brief_summary_of_recommendations}'
|
||||
|
||||
waiver: # Only if WAIVED
|
||||
reason: '{business_justification}'
|
||||
approver: '{name}, {role}'
|
||||
expiry: '{YYYY-MM-DD}'
|
||||
remediation_due: '{YYYY-MM-DD}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Audit Trail
|
||||
|
||||
**Created**: {YYYY-MM-DD HH:MM:SS}
|
||||
**Modified**: {YYYY-MM-DD HH:MM:SS} (if updated)
|
||||
**Version**: 1.0
|
||||
**Document ID**: gate-decision-{target_id}-{YYYYMMDD}
|
||||
**Workflow**: testarch-gate v4.0
|
||||
|
||||
---
|
||||
|
||||
## Appendices
|
||||
|
||||
### Evidence Files Referenced
|
||||
|
||||
- Story/Epic: `{file_path}`
|
||||
- Test Design: `{test_design_file_path}`
|
||||
- Traceability Matrix: `{trace_file_path}`
|
||||
- NFR Assessment: `{nfr_file_path}`
|
||||
- Test Results: `{test_results_path}`
|
||||
- Code Coverage: `{coverage_report_path}`
|
||||
- Burn-in Results: `{burn_in_results_path}`
|
||||
|
||||
### Knowledge Base Fragments Consulted
|
||||
|
||||
- `risk-governance.md` - Risk-based quality gate criteria
|
||||
- `probability-impact.md` - Risk scoring framework
|
||||
- `test-quality.md` - Definition of Done for tests
|
||||
- `test-priorities.md` - P0/P1/P2/P3 priority framework
|
||||
- `ci-burn-in.md` - Flakiness detection patterns
|
||||
|
||||
### Related Documents
|
||||
|
||||
- PRD: `{prd_file_path}` (if applicable)
|
||||
- Tech Spec: `{tech_spec_file_path}` (if applicable)
|
||||
- Architecture: `{architecture_file_path}` (if applicable)
|
||||
@@ -1,39 +1,494 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
# Quality Gate Decision - Instructions v4.0
|
||||
|
||||
# Quality Gate v3.0
|
||||
**Workflow:** `testarch-gate`
|
||||
**Purpose:** Make a deterministic quality gate decision (PASS/CONCERNS/FAIL/WAIVED) for a story, epic, or release based on test results, risk assessment, and non-functional validation
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Format:** Pure Markdown v4.0 (no XML blocks)
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/gate" name="Quality Gate">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Latest assessments (risk/test design, trace, automation, NFR) are available.</i>
|
||||
<i>- Team has consensus on fixes/mitigations.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Gather required assessments and confirm consensus; halt if information is stale or missing.</action>
|
||||
</step>
|
||||
<step n="2" title="Determine Gate Decision">
|
||||
<action>Assemble story metadata (id, title, links) for the gate file.</action>
|
||||
<action>Apply deterministic rules: PASS (all critical issues resolved), CONCERNS (minor residual risk), FAIL (critical blockers), WAIVED (business-approved waiver).</action>
|
||||
<action>Document rationale, residual risks, owners, due dates, and waiver details where applicable.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Update gate YAML with schema fields (story info, status, rationale, waiver, top issues, risk summary, recommendations, NFR validation, history).</action>
|
||||
<action>Provide summary message for the team highlighting decision and next steps.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If reviews are incomplete or risk data is outdated, halt and request the necessary reruns.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Pull the risk-governance, probability-impact, and test-quality fragments via `{project-root}/bmad/bmm/testarch/tea-index.csv` before issuing a gate decision.</i>
|
||||
<i>FAIL whenever unresolved P0 risks/tests or security issues remain.</i>
|
||||
<i>CONCERNS when mitigations are planned but residual risk exists; WAIVED requires reason, approver, and expiry.</i>
|
||||
<i>Maintain audit trail in the history section.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>Gate YAML entry and communication summary documenting the decision.</i>
|
||||
</output>
|
||||
</task>
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow evaluates all quality evidence (test results, traceability, NFRs, risk assessment) and makes a deterministic gate decision following predefined rules. It ensures that releases meet quality standards and provides an audit trail for decision-making.
|
||||
|
||||
**Key Capabilities:**
|
||||
|
||||
- Deterministic decision rules (PASS/CONCERNS/FAIL/WAIVED)
|
||||
- Evidence-based validation (test results, coverage, NFRs, risks)
|
||||
- P0-P3 risk framework integration
|
||||
- Waiver management (business-approved exceptions)
|
||||
- Audit trail with history tracking
|
||||
- Stakeholder notification generation
|
||||
- Gate YAML output for CI/CD integration
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Test execution results (CI/CD pipeline, local test runs)
|
||||
- Story or epic being gated
|
||||
- Completed quality workflows (at minimum test-design OR trace)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- `test-design.md` - Risk assessment with P0-P3 prioritization
|
||||
- `traceability-matrix.md` - Requirements-to-tests coverage analysis
|
||||
- `nfr-assessment.md` - Non-functional requirements validation
|
||||
- Code coverage report
|
||||
- Burn-in test results (flakiness validation)
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- If critical assessments are missing AND user doesn't waive requirement, halt and request them
|
||||
- If assessments are stale (>7 days old) AND `validate_evidence_freshness: true`, warn user
|
||||
- If test results are unavailable, halt and request test execution
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### Step 1: Load Context and Knowledge Base
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
|
||||
- `risk-governance.md` - Risk-based quality gate criteria
|
||||
- `probability-impact.md` - Risk scoring framework
|
||||
- `test-quality.md` - Definition of Done for tests
|
||||
- `test-priorities.md` - P0/P1/P2/P3 priority framework
|
||||
- `ci-burn-in.md` - Flakiness detection validation
|
||||
|
||||
2. Read gate configuration from workflow variables:
|
||||
- Gate type (story/epic/release/hotfix)
|
||||
- Decision thresholds (pass rates, coverage minimums)
|
||||
- Risk tolerance (allow P2/P3 failures, escalate P1)
|
||||
- Waiver policy
|
||||
|
||||
3. Identify gate target:
|
||||
- Extract story ID, epic number, or release version
|
||||
- Determine scope (single story vs full epic vs release)
|
||||
|
||||
**Output:** Complete understanding of gate criteria and target scope
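A rough sketch of how the fragment lookup could work. The column names (`fragment`, `path`) are assumptions for illustration only, not confirmed against the real `tea-index.csv`; check the actual header row before relying on them:

```python
# Hedged sketch: resolve the knowledge fragments needed for the gate decision
# from tea-index.csv. Column names are assumed, not confirmed.
import csv
from pathlib import Path

WANTED = {
    "risk-governance.md",
    "probability-impact.md",
    "test-quality.md",
    "test-priorities.md",
    "ci-burn-in.md",
}

def load_fragments(index_csv: str) -> dict[str, str]:
    fragments = {}
    with open(index_csv, newline="") as f:
        for row in csv.DictReader(f):
            name = row.get("fragment", "")   # assumed column name
            path = row.get("path", "")       # assumed column name
            if name in WANTED and path:
                fragments[name] = Path(path).read_text()
    missing = WANTED - fragments.keys()
    if missing:
        print(f"warning: fragments not found in index: {sorted(missing)}")
    return fragments
```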
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Gather Quality Evidence
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Auto-discover assessment files** (if not explicitly provided):
|
||||
- Search for `test-design-epic-{epic_num}.md` or `test-design-story-{story_id}.md`
|
||||
- Search for `traceability-matrix-{story_id}.md` or `traceability-matrix-epic-{epic_num}.md`
|
||||
- Search for `nfr-assessment-{story_id}.md` or `nfr-assessment-epic-{epic_num}.md`
|
||||
- Search for story file: `story-{story_id}.md`
|
||||
|
||||
2. **Validate evidence freshness** (if `validate_evidence_freshness: true`):
|
||||
- Check file modification dates
|
||||
- Warn if any assessment is >7 days old
|
||||
- Recommend re-running stale workflows
|
||||
|
||||
3. **Parse test execution results**:
|
||||
- CI/CD pipeline results (GitHub Actions, GitLab CI, Jenkins)
|
||||
- Test framework reports (Playwright HTML report, Jest JSON, JUnit XML)
|
||||
- Extract metrics: total tests, passed, failed, skipped, duration
|
||||
- Extract burn-in results: flaky test count, stability score
|
||||
|
||||
4. **Parse quality assessments**:
|
||||
- **test-design.md**: Extract P0/P1/P2/P3 scenarios, risk scores, mitigation status
|
||||
- **traceability-matrix.md**: Extract coverage percentages, gaps, unmapped criteria
|
||||
- **nfr-assessment.md**: Extract NFR status (PASS/CONCERNS/FAIL per category)
|
||||
|
||||
5. **Parse code coverage** (if available):
|
||||
- Line coverage, branch coverage, function coverage
|
||||
- Coverage by file/directory
|
||||
- Identify uncovered critical paths
|
||||
|
||||
**Output:** Comprehensive evidence package with all quality metrics
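The discovery and freshness logic is simple enough to sketch. A minimal example, assuming the assessments live under the configured output folder and use the file-name patterns listed above; it is illustrative, not the shipped implementation:

```python
# Hedged sketch: auto-discover assessment files and flag stale ones (>7 days).
import glob
import os
import time

PATTERNS = [
    "test-design-story-{story_id}.md",
    "test-design-epic-{epic_num}.md",
    "traceability-matrix-{story_id}.md",
    "nfr-assessment-{story_id}.md",
    "story-{story_id}.md",
]

def discover_assessments(output_folder: str, story_id: str, epic_num: str,
                         max_age_days: int = 7) -> dict[str, str]:
    found, now = {}, time.time()
    for pattern in PATTERNS:
        name = pattern.format(story_id=story_id, epic_num=epic_num)
        for path in glob.glob(os.path.join(output_folder, name)):
            age_days = (now - os.path.getmtime(path)) / 86400
            if age_days > max_age_days:
                print(f"warning: {path} is {age_days:.0f} days old - consider re-running its workflow")
            found[name] = path
    return found
```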
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Apply Decision Rules (Deterministic Mode)
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Evaluate P0 criteria** (must ALL pass for gate to PASS):
|
||||
- ✅ P0 test pass rate = 100%
|
||||
- ✅ P0 acceptance criteria coverage = 100%
|
||||
- ✅ No critical security issues (max_security_issues = 0)
|
||||
- ✅ No critical NFR failures (max_critical_nfrs_fail = 0)
|
||||
- ✅ No flaky tests in burn-in (if burn-in enabled)
|
||||
|
||||
**If ANY P0 criterion fails → Decision = FAIL**
|
||||
|
||||
2. **Evaluate P1 criteria** (required for PASS, may be waived for CONCERNS):
|
||||
- ✅ P1 test pass rate ≥ min_p1_pass_rate (default: 95%)
|
||||
- ✅ P1 acceptance criteria coverage ≥ 95%
|
||||
- ✅ Overall test pass rate ≥ min_overall_pass_rate (default: 90%)
|
||||
- ✅ Code coverage ≥ min_coverage (default: 80%)
|
||||
|
||||
**If ANY P1 criterion fails → Decision = CONCERNS (may escalate to FAIL)**
|
||||
|
||||
3. **Evaluate P2/P3 criteria** (informational, don't block):
|
||||
- P2 failures tracked but don't affect gate decision (if allow_p2_failures: true)
|
||||
- P3 failures tracked but don't affect gate decision (if allow_p3_failures: true)
|
||||
- Document as residual risk
|
||||
|
||||
4. **Determine final decision**:
|
||||
- **PASS**: All P0 criteria met, all P1 criteria met, no critical blockers
|
||||
- **CONCERNS**: All P0 criteria met, some P1 criteria missed, residual risk acceptable
|
||||
- **FAIL**: Any P0 criterion missed, critical blockers present
|
||||
- **WAIVED**: FAIL status with business-approved waiver (if allow_waivers: true)
|
||||
|
||||
**Output:** Gate decision with deterministic justification
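Taken together, the rules above reduce to a small pure function. A sketch of the decision logic under the default thresholds, assuming the metrics were already extracted in Step 2; treat it as an illustration of the rules, not the workflow's implementation:

```python
# Hedged sketch of the deterministic gate decision rules described above.
def gate_decision(m: dict, *, min_p1=95, min_overall=90, min_coverage=80,
                  waiver_approved=False) -> str:
    p0_ok = (
        m["p0_pass_rate"] == 100
        and m["p0_coverage"] == 100
        and m["security_issues"] == 0
        and m["critical_nfr_failures"] == 0
        and m.get("flaky_tests", 0) == 0
    )
    if not p0_ok:
        # Security issues are never waived, even with business approval.
        if waiver_approved and m["security_issues"] == 0:
            return "WAIVED"
        return "FAIL"

    p1_ok = (
        m["p1_pass_rate"] >= min_p1
        and m["p1_coverage"] >= 95
        and m["overall_pass_rate"] >= min_overall
        and m.get("code_coverage", 100) >= min_coverage
    )
    return "PASS" if p1_ok else "CONCERNS"   # P2/P3 failures stay informational


print(gate_decision({
    "p0_pass_rate": 100, "p0_coverage": 100, "security_issues": 0,
    "critical_nfr_failures": 0, "flaky_tests": 0,
    "p1_pass_rate": 98, "p1_coverage": 100, "overall_pass_rate": 96,
    "code_coverage": 85,
}))  # -> PASS
```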
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Document Decision and Evidence
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Create gate decision document** using `gate-template.md`:
|
||||
- **Story/Epic/Release Info**: ID, title, description, links
|
||||
- **Decision**: PASS / CONCERNS / FAIL / WAIVED
|
||||
- **Decision Date**: Timestamp of gate evaluation
|
||||
- **Evaluator**: User or agent who made decision
|
||||
|
||||
2. **Document evidence**:
|
||||
- **Test Results Summary**:
|
||||
- Total tests: X
|
||||
- Passed: Y (Z%)
|
||||
- Failed: N (M%)
|
||||
- P0 pass rate: 100% ✅ / <100% ❌
|
||||
- P1 pass rate: X% ✅ / <95% ⚠️
|
||||
- **Coverage Summary**:
|
||||
- P0 criteria: X/Y covered (Z%)
|
||||
- P1 criteria: X/Y covered (Z%)
|
||||
- Code coverage: X%
|
||||
- **NFR Validation**:
|
||||
- Security: PASS / CONCERNS / FAIL
|
||||
- Performance: PASS / CONCERNS / FAIL
|
||||
- Reliability: PASS / CONCERNS / FAIL
|
||||
- Maintainability: PASS / CONCERNS / FAIL
|
||||
- **Flakiness**:
|
||||
- Burn-in iterations: 10
|
||||
- Flaky tests detected: 0 ✅ / >0 ❌
|
||||
|
||||
3. **Document rationale**:
|
||||
- Explain decision based on criteria
|
||||
- Highlight key evidence that drove decision
|
||||
- Note any assumptions or caveats
|
||||
|
||||
4. **Document residual risks** (if CONCERNS or WAIVED):
|
||||
- List unresolved P1/P2 issues
|
||||
- Estimate probability × impact
|
||||
- Describe mitigations or workarounds
|
||||
|
||||
5. **Document waivers** (if WAIVED):
|
||||
- Waiver reason (business justification)
|
||||
- Waiver approver (name, role)
|
||||
- Waiver expiry date
|
||||
- Remediation plan
|
||||
|
||||
6. **List critical issues** (if FAIL or CONCERNS):
|
||||
- Top 5-10 issues blocking gate
|
||||
- Priority (P0/P1/P2)
|
||||
- Owner
|
||||
- Due date
|
||||
|
||||
7. **Provide recommendations**:
|
||||
- **For PASS**: Proceed to deployment, monitor post-release
|
||||
- **For CONCERNS**: Deploy with monitoring, address issues in next sprint
|
||||
- **For FAIL**: Block deployment, fix critical issues, re-run gate
|
||||
- **For WAIVED**: Deploy with business approval, aggressive monitoring
|
||||
|
||||
**Output:** Complete gate decision document ready for review
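One simple way to fill the gate document is plain placeholder substitution. A sketch assuming the `{placeholder}` names in `gate-template.md` match the keys collected in Steps 2-3; unknown placeholders are left visible so they stand out during review:

```python
# Hedged sketch: fill {placeholders} in gate-template.md with collected values.
import re

def render_gate_document(template_text: str, values: dict) -> str:
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(values.get(key, match.group(0)))  # keep unknown placeholders visible
    return re.sub(r"\{([a-z0-9_]+)\}", substitute, template_text)

example = render_gate_document(
    "**Decision:** {decision}\n**Date:** {date}\n**Evaluator:** {evaluator}",
    {"decision": "PASS", "date": "2025-10-14", "evaluator": "TEA Agent"},
)
print(example)
```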
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Update Status Tracking and Notify
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Append to bmm-workflow-status.md** (if `append_to_history: true`):
|
||||
- Add gate decision to history section
|
||||
- Format: `[DATE] Gate Decision: DECISION - Story/Epic/Release {ID} - {brief rationale}`
|
||||
- Example: `[2025-10-14] Gate Decision: PASS - Story 1.3 - All P0/P1 criteria met, 98% pass rate`
|
||||
|
||||
2. **Generate stakeholder notification** (if `notify_stakeholders: true`):
|
||||
- **Subject**: Gate Decision: DECISION - {Story/Epic/Release ID}
|
||||
- **Body**: Summary of decision, key metrics, next steps
|
||||
- **Recipients**: PM, SM, DEV lead, stakeholders
|
||||
|
||||
3. **Generate gate YAML snippet** for CI/CD integration:
|
||||
|
||||
```yaml
|
||||
gate_decision:
|
||||
target: 'story-1.3'
|
||||
decision: 'PASS' # or CONCERNS / FAIL / WAIVED
|
||||
date: '2025-10-14'
|
||||
evaluator: 'TEA Agent'
|
||||
criteria:
|
||||
p0_pass_rate: 100
|
||||
p1_pass_rate: 98
|
||||
overall_pass_rate: 96
|
||||
code_coverage: 85
|
||||
security_issues: 0
|
||||
critical_nfrs_fail: 0
|
||||
flaky_tests: 0
|
||||
evidence:
|
||||
test_results: 'CI Run #456'
|
||||
traceability: 'traceability-matrix-1.3.md'
|
||||
nfr_assessment: 'nfr-assessment-1.3.md'
|
||||
next_steps: 'Deploy to staging, monitor metrics'
|
||||
```
|
||||
|
||||
4. **Save outputs**:
|
||||
- Write gate decision document to `{output_file}`
|
||||
- Write gate YAML to `{output_folder}/gate-decision-{target}.yaml`
|
||||
- Update status file
|
||||
|
||||
**Output:** Gate decision documented, tracked, and communicated
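Because the gate YAML is machine-readable, a pipeline can consume it directly. A hedged sketch of a check a CI job might run after this workflow; the field names follow the snippet above, and the blocking policy (FAIL and expired waivers block, PASS/CONCERNS proceed) is one reasonable choice rather than a mandated one:

```python
# Hedged sketch: block a CI/CD deployment when the gate decision requires it.
import sys
from datetime import date
import yaml  # PyYAML, assumed available

def deployment_allowed(gate_yaml_path: str) -> bool:
    with open(gate_yaml_path) as f:
        gate = yaml.safe_load(f)["gate_decision"]
    decision = gate["decision"]
    if decision in ("PASS", "CONCERNS"):
        return True
    if decision == "WAIVED":
        expiry = gate.get("waiver", {}).get("expiry")
        return bool(expiry) and date.fromisoformat(str(expiry)) >= date.today()
    return False  # FAIL always blocks

if __name__ == "__main__":
    if not deployment_allowed(sys.argv[1]):
        print("Quality gate blocked this deployment")
        sys.exit(1)
```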
|
||||
|
||||
---
|
||||
|
||||
## Decision Matrix (Quick Reference)
|
||||
|
||||
| Scenario | P0 Pass Rate | P1 Pass Rate | Security Issues | Critical NFRs | Decision | Action |
|
||||
| ----------------- | ------------ | ------------ | --------------- | ------------- | ------------ | ---------------------- |
|
||||
| Ideal | 100% | ≥95% | 0 | 0 | **PASS** | Deploy |
|
||||
| Minor issues | 100% | 90-94% | 0 | 0 | **CONCERNS** | Deploy with monitoring |
|
||||
| P1 degradation | 100% | <90% | 0 | 0 | **CONCERNS** | Fix in next sprint |
|
||||
| P0 failure | <100% | any | any | any | **FAIL** | Block release |
|
||||
| Security issue | any | any | >0 | any | **FAIL** | Fix immediately |
|
||||
| Critical NFR fail | any | any | any | >0 | **FAIL** | Remediate first |
|
||||
| Business waiver | <100% | any | any | any | **WAIVED** | Deploy with approval |
|
||||
|
||||
---
|
||||
|
||||
## Waiver Management
|
||||
|
||||
**When to waive:**
|
||||
|
||||
- Business-critical deadline (e.g., regulatory requirement, contractual obligation)
|
||||
- Issue is low-probability edge case with acceptable risk
|
||||
- Workaround exists for known issue
|
||||
- Fix is in progress but can be deployed post-release
|
||||
|
||||
**Waiver requirements:**
|
||||
|
||||
- Named approver (VP Engineering, CTO, Product Owner)
|
||||
- Business justification documented
|
||||
- Remediation plan with due date
|
||||
- Expiry date (waiver does NOT apply to future releases)
|
||||
- Monitoring plan for waived risk
|
||||
|
||||
**Never waive:**
|
||||
|
||||
- Security vulnerabilities
|
||||
- Data corruption risks
|
||||
- Critical user journey failures
|
||||
- Compliance violations
|
||||
|
||||
---
|
||||
|
||||
## Example Gate Decisions
|
||||
|
||||
### Example 1: PASS Decision
|
||||
|
||||
```markdown
|
||||
# Gate Decision: story-1.3 (User Authentication Flow)
|
||||
|
||||
**Decision:** ✅ PASS
|
||||
**Date:** 2025-10-14
|
||||
**Evaluator:** TEA Agent
|
||||
|
||||
## Evidence Summary
|
||||
|
||||
- **P0 Tests:** 12/12 passed (100%) ✅
|
||||
- **P1 Tests:** 24/25 passed (96%) ✅
|
||||
- **Overall Pass Rate:** 98% ✅
|
||||
- **Code Coverage:** 87% ✅
|
||||
- **Security Issues:** 0 ✅
|
||||
- **Flaky Tests:** 0 ✅
|
||||
|
||||
## Rationale
|
||||
|
||||
All P0 criteria met. All P1 criteria exceeded thresholds. No critical issues detected. Feature is ready for production deployment.
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Deploy to staging environment
|
||||
2. Monitor authentication metrics for 24 hours
|
||||
3. Deploy to production if no issues
|
||||
```
|
||||
|
||||
### Example 2: CONCERNS Decision
|
||||
|
||||
```markdown
|
||||
# Gate Decision: epic-2 (Payment Processing)
|
||||
|
||||
**Decision:** ⚠️ CONCERNS
|
||||
**Date:** 2025-10-14
|
||||
**Evaluator:** TEA Agent
|
||||
|
||||
## Evidence Summary
|
||||
|
||||
- **P0 Tests:** 28/28 passed (100%) ✅
|
||||
- **P1 Tests:** 42/47 passed (89%) ⚠️
|
||||
- **Overall Pass Rate:** 91% ✅
|
||||
- **Code Coverage:** 78% ⚠️
|
||||
- **Security Issues:** 0 ✅
|
||||
- **Flaky Tests:** 0 ✅
|
||||
|
||||
## Rationale
|
||||
|
||||
All P0 criteria met, but P1 pass rate (89%) below threshold (95%). Coverage (78%) slightly below target (80%). Issues are non-critical and can be addressed post-release.
|
||||
|
||||
## Residual Risks
|
||||
|
||||
1. **P1 Issue**: Edge case in refund flow for international currencies (low probability)
|
||||
2. **Coverage Gap**: Missing tests for admin cancel flow (workaround exists)
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Deploy with enhanced monitoring on refund flows
|
||||
2. Create backlog stories for P1 fixes
|
||||
3. Add missing tests in next sprint
|
||||
```
|
||||
|
||||
### Example 3: FAIL Decision
|
||||
|
||||
```markdown
|
||||
# Gate Decision: story-3.2 (Data Export)
|
||||
|
||||
**Decision:** ❌ FAIL
|
||||
**Date:** 2025-10-14
|
||||
**Evaluator:** TEA Agent
|
||||
|
||||
## Evidence Summary
|
||||
|
||||
- **P0 Tests:** 8/10 passed (80%) ❌
|
||||
- **P1 Tests:** 18/22 passed (82%) ❌
|
||||
- **Security Issues:** 1 (SQL injection in export filter) ❌
|
||||
- **Code Coverage:** 65% ❌
|
||||
|
||||
## Rationale
|
||||
|
||||
**CRITICAL BLOCKERS:**
|
||||
|
||||
1. P0 test failures in core export functionality
|
||||
2. Unresolved SQL injection vulnerability (CRITICAL security issue)
|
||||
3. Coverage below minimum threshold
|
||||
|
||||
Release BLOCKED until critical issues are resolved.
|
||||
|
||||
## Critical Issues
|
||||
|
||||
| Priority | Issue | Owner | Due Date |
|
||||
| -------- | ------------------------------------- | ------------ | ---------- |
|
||||
| P0 | Fix SQL injection in export filter | Backend Team | 2025-10-16 |
|
||||
| P0 | Fix export pagination bug | Backend Team | 2025-10-16 |
|
||||
| P0 | Fix export timeout for large datasets | Backend Team | 2025-10-17 |
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Block deployment immediately**
|
||||
2. Fix P0 issues listed above
|
||||
3. Re-run full test suite
|
||||
4. Re-run gate workflow after fixes
|
||||
```
|
||||
|
||||
### Example 4: WAIVED Decision
|
||||
|
||||
```markdown
|
||||
# Gate Decision: release-v2.4.0
|
||||
|
||||
**Decision:** 🔓 WAIVED
|
||||
**Date:** 2025-10-14
|
||||
**Evaluator:** TEA Agent
|
||||
|
||||
## Original Decision: ❌ FAIL
|
||||
|
||||
**Reason for failure:**
|
||||
|
||||
- P0 test failure in legacy reporting module
|
||||
- Issue affects <1% of users (specific browser configuration)
|
||||
|
||||
## Waiver Details
|
||||
|
||||
- **Waiver Reason:** Regulatory deadline for GDPR compliance features (Oct 15)
|
||||
- **Waiver Approver:** Jane Doe, VP Engineering
|
||||
- **Waiver Expiry:** 2025-10-15 (does NOT apply to v2.4.1)
|
||||
- **Monitoring Plan:** Enhanced error tracking on reporting module
|
||||
- **Remediation Plan:** Fix in v2.4.1 hotfix (due Oct 20)
|
||||
|
||||
## Business Justification
|
||||
|
||||
Release contains critical GDPR compliance features required by regulatory deadline. Failed test affects legacy reporting module used by <1% of users in specific edge case (IE11 + Windows 7). Workaround available (use Chrome). Risk acceptable given regulatory priority.
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Deploy v2.4.0 with waiver
|
||||
2. Monitor error rates on reporting module
|
||||
3. Fix legacy module in v2.4.1 (Oct 20)
|
||||
4. Notify affected users of workaround
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with BMad Status File
|
||||
|
||||
This workflow updates `bmm-workflow-status.md` with gate decisions for tracking:
|
||||
|
||||
```markdown
|
||||
### Quality & Testing Progress (TEA Agent)
|
||||
|
||||
**Gate Decisions:**
|
||||
|
||||
- [2025-10-14] ✅ PASS - Story 1.3 (User Auth) - All criteria met
|
||||
- [2025-10-14] ⚠️ CONCERNS - Epic 2 (Payments) - P1 pass rate 89%
|
||||
- [2025-10-14] ❌ FAIL - Story 3.2 (Export) - Security issue blocking
|
||||
- [2025-10-15] 🔓 WAIVED - Release v2.4.0 - GDPR deadline waiver
|
||||
```
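Appending the history entry is a one-liner once the decision is known. A sketch using the line format shown above; the emoji mapping follows the examples and is otherwise illustrative:

```python
# Hedged sketch: append a gate decision line to bmm-workflow-status.md.
from datetime import date

ICONS = {"PASS": "✅", "CONCERNS": "⚠️", "FAIL": "❌", "WAIVED": "🔓"}

def append_gate_history(status_file: str, decision: str, target: str, summary: str) -> None:
    line = f"- [{date.today().isoformat()}] {ICONS[decision]} {decision} - {target} - {summary}\n"
    with open(status_file, "a", encoding="utf-8") as f:
        f.write(line)

append_gate_history("bmm-workflow-status.md", "PASS", "Story 1.3 (User Auth)", "All criteria met")
```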
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Deterministic > Manual**: Use rule-based decisions to reduce bias and ensure consistency
|
||||
2. **Evidence Required**: Never make decisions without test results and assessments
|
||||
3. **P0 is Sacred**: P0 failures always result in FAIL; the only exception is a business-approved waiver, which records the gate as WAIVED, never PASS
|
||||
4. **Waivers are Temporary**: Waiver does NOT apply to future releases - issue must be fixed
|
||||
5. **Security Never Waived**: Security vulnerabilities should never be waived
|
||||
6. **Transparency**: Document rationale clearly for audit trail
|
||||
7. **Freshness Matters**: Stale assessments (>7 days) should be re-run
|
||||
8. **Burn-in Counts**: Flaky tests detected in burn-in should block gate
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Problem: No test results found**
|
||||
|
||||
- Check CI/CD pipeline for test execution
|
||||
- Verify test results path in workflow variables
|
||||
- Run tests locally and provide results
|
||||
|
||||
**Problem: Assessments are stale**
|
||||
|
||||
- Re-run `*test-design`, `*trace`, `*nfr-assess` workflows
|
||||
- Update evidence files before gate decision
|
||||
|
||||
**Problem: Unclear decision (edge case)**
|
||||
|
||||
- Escalate to manual review
|
||||
- Document assumptions and rationale
|
||||
- Consider waiver if business-critical
|
||||
|
||||
**Problem: Waiver requested but not justified**
|
||||
|
||||
- Require business justification from stakeholder
|
||||
- Ensure named approver is appropriate authority
|
||||
- Verify remediation plan exists with due date
|
||||
|
||||
@@ -1,25 +1,94 @@
|
||||
# Test Architect workflow: gate
|
||||
name: testarch-gate
|
||||
description: "Record the quality gate decision for the story."
|
||||
description: "Quality gate decision for story/epic/release with deterministic PASS/CONCERNS/FAIL/WAIVED status"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/gate"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/gate-template.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Gate target
|
||||
gate_type: "story" # story, epic, release, hotfix
|
||||
story_id: "" # e.g., "1.3" for story mode
|
||||
epic_num: "" # e.g., "1" for epic mode
|
||||
release_version: "" # e.g., "v2.4.0" for release mode
|
||||
|
||||
# Gate decision configuration
|
||||
decision_mode: "deterministic" # deterministic (rule-based) or manual (team decision)
|
||||
allow_waivers: true # Allow business-approved waivers for FAIL → WAIVED
|
||||
require_evidence: true # Require links to test results, reports, etc.
|
||||
|
||||
# Input sources (auto-discover if not provided)
|
||||
story_file: "" # Path to story markdown
|
||||
test_design_file: "" # Path to test-design.md (risk assessment)
|
||||
trace_file: "" # Path to traceability-matrix.md (coverage)
|
||||
nfr_file: "" # Path to nfr-assessment.md (non-functional validation)
|
||||
test_results: "" # Path to test execution results (CI artifacts, reports)
|
||||
|
||||
# Decision criteria (thresholds)
|
||||
min_p0_pass_rate: 100 # P0 tests must have 100% pass rate
|
||||
min_p1_pass_rate: 95 # P1 tests threshold
|
||||
min_overall_pass_rate: 90 # Overall test pass rate
|
||||
min_coverage: 80 # Code/requirement coverage minimum
|
||||
max_critical_nfrs_fail: 0 # No critical NFRs can fail
|
||||
max_security_issues: 0 # No unresolved security issues
|
||||
|
||||
# Risk tolerance
|
||||
allow_p2_failures: true # P2 failures don't block release
|
||||
allow_p3_failures: true # P3 failures don't block release
|
||||
escalate_p1_failures: true # P1 failures require escalation approval
|
||||
|
||||
# Output configuration
|
||||
output_file: "{output_folder}/gate-decision-{gate_type}-{story_id}{epic_num}{release_version}.md"
|
||||
append_to_history: true # Append to bmm-workflow-status.md gate history
|
||||
notify_stakeholders: true # Generate notification message for team
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load risk-governance, probability-impact, test-quality fragments
|
||||
check_all_workflows_complete: true # Verify test-design, trace, nfr-assess are complete
|
||||
validate_evidence_freshness: true # Warn if assessments are >7 days old
|
||||
require_sign_off: false # Require named approver for gate decision
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/gate-decision.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read story, assessments, test results
|
||||
- write_file # Create gate decision document
|
||||
- search_repo # Find related artifacts
|
||||
- list_files # Discover assessments
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- story: "Story or epic being gated (required)"
|
||||
- test_design: "Risk assessment with P0-P3 prioritization (required)"
|
||||
- trace: "Requirements-to-tests traceability matrix (required)"
|
||||
- nfr_assess: "Non-functional requirements validation (recommended)"
|
||||
- test_results: "CI/CD test execution results (required)"
|
||||
- code_coverage: "Code coverage report (recommended)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- gate
|
||||
- test-architect
|
||||
- release
|
||||
- decision
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
iterative: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: false # Gate decision is single-pass
|
||||
|
||||
web_bundle: false
|
||||
|
||||
src/modules/bmm/workflows/testarch/nfr-assess/README.md (new file, 469 lines)
@@ -0,0 +1,469 @@
|
||||
# Non-Functional Requirements Assessment Workflow
|
||||
|
||||
**Workflow ID:** `testarch-nfr`
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Command:** `bmad tea *nfr-assess`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The **nfr-assess** workflow performs a comprehensive assessment of non-functional requirements (NFRs) to validate that the implementation meets performance, security, reliability, and maintainability standards before release. It uses evidence-based validation with deterministic PASS/CONCERNS/FAIL rules and provides actionable recommendations for remediation.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- Assess multiple NFR categories (performance, security, reliability, maintainability, custom)
|
||||
- Validate NFRs against defined thresholds from tech specs, PRD, or defaults
|
||||
- Classify status deterministically (PASS/CONCERNS/FAIL) based on evidence
|
||||
- Never guess thresholds - mark as CONCERNS if unknown
|
||||
- Generate CI/CD-ready YAML snippets for quality gates
|
||||
- Provide quick wins and recommended actions for remediation
|
||||
- Create evidence checklists for gaps
|
||||
|
||||
---
|
||||
|
||||
## When to Use This Workflow
|
||||
|
||||
Use `*nfr-assess` when you need to:
|
||||
|
||||
- ✅ Validate non-functional requirements before release
|
||||
- ✅ Assess performance against defined thresholds
|
||||
- ✅ Verify security requirements are met
|
||||
- ✅ Validate reliability and error handling
|
||||
- ✅ Check maintainability standards (coverage, quality, documentation)
|
||||
- ✅ Generate NFR assessment reports for stakeholders
|
||||
- ✅ Create gate-ready metrics for CI/CD pipelines
|
||||
|
||||
**Typical Timing:**
|
||||
|
||||
- Before release (validate all NFRs)
|
||||
- Before PR merge (validate critical NFRs)
|
||||
- During sprint retrospectives (assess maintainability)
|
||||
- After performance testing (validate performance NFRs)
|
||||
- After security audit (validate security NFRs)
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Implementation deployed locally or accessible for evaluation
|
||||
- Evidence sources available (test results, metrics, logs, CI results)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- NFR requirements defined in tech-spec.md, PRD.md, or story
|
||||
- Test results from performance, security, reliability tests
|
||||
- Application metrics (response times, error rates, throughput)
|
||||
- CI/CD pipeline results for burn-in validation
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- NFR targets are undefined and cannot be obtained → Halt and request definition
|
||||
- Implementation is not accessible for evaluation → Halt and request deployment
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Usage (BMad Mode)
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess
|
||||
```
|
||||
|
||||
The workflow will:
|
||||
|
||||
1. Read tech-spec.md for NFR requirements
|
||||
2. Gather evidence from test results, metrics, logs
|
||||
3. Assess each NFR category against thresholds
|
||||
4. Generate NFR assessment report
|
||||
5. Save to `bmad/output/nfr-assessment.md`
|
||||
|
||||
### Standalone Mode (No Tech Spec)
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess --feature-name "User Authentication"
|
||||
```
|
||||
|
||||
### Custom Configuration
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess \
|
||||
--assess-performance true \
|
||||
--assess-security true \
|
||||
--assess-reliability true \
|
||||
--assess-maintainability true \
|
||||
--performance-response-time-ms 500 \
|
||||
--security-score-min 85
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
1. **Load Context** - Read tech spec, PRD, knowledge base fragments
|
||||
2. **Identify NFRs** - Determine categories and thresholds
|
||||
3. **Gather Evidence** - Read test results, metrics, logs, CI results
|
||||
4. **Assess NFRs** - Apply deterministic PASS/CONCERNS/FAIL rules
|
||||
5. **Identify Actions** - Quick wins, recommended actions, monitoring hooks
|
||||
6. **Generate Deliverables** - NFR assessment report, gate YAML, evidence checklist
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### NFR Assessment Report (`nfr-assessment.md`)
|
||||
|
||||
Comprehensive markdown file with:
|
||||
|
||||
- Executive summary (overall status, critical issues)
|
||||
- Assessment by category (performance, security, reliability, maintainability)
|
||||
- Evidence for each NFR (test results, metrics, thresholds)
|
||||
- Status classification (PASS/CONCERNS/FAIL)
|
||||
- Quick wins section
|
||||
- Recommended actions section
|
||||
- Evidence gaps checklist
|
||||
|
||||
### Gate YAML Snippet (Optional)
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
concerns: 1
|
||||
blockers: false
|
||||
```
|
||||
|
||||
### Evidence Checklist (Optional)
|
||||
|
||||
- List of NFRs with missing or incomplete evidence
|
||||
- Owners for evidence collection
|
||||
- Suggested evidence sources
|
||||
- Deadlines for evidence collection
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories
|
||||
|
||||
### Performance
|
||||
|
||||
**Criteria:** Response time, throughput, resource usage, scalability
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Response time p95: 500ms
|
||||
- Throughput: 100 RPS
|
||||
- CPU usage: < 70%
|
||||
- Memory usage: < 80%
|
||||
|
||||
**Evidence Sources:** Load test results, APM data, Lighthouse reports, Playwright traces
|
||||
|
||||
---
|
||||
|
||||
### Security
|
||||
|
||||
**Criteria:** Authentication, authorization, data protection, vulnerability management
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Security score: >= 85/100
|
||||
- Critical vulnerabilities: 0
|
||||
- High vulnerabilities: < 3
|
||||
- MFA enabled
|
||||
|
||||
**Evidence Sources:** SAST results, DAST results, dependency scanning, pentest reports
|
||||
|
||||
---
|
||||
|
||||
### Reliability
|
||||
|
||||
**Criteria:** Availability, error handling, fault tolerance, disaster recovery
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Uptime: >= 99.9%
|
||||
- Error rate: < 0.1%
|
||||
- MTTR: < 15 minutes
|
||||
- CI burn-in: 100 consecutive runs
|
||||
|
||||
**Evidence Sources:** Uptime monitoring, error logs, CI burn-in results, chaos tests
|
||||
|
||||
---
|
||||
|
||||
### Maintainability
|
||||
|
||||
**Criteria:** Code quality, test coverage, documentation, technical debt
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Test coverage: >= 80%
|
||||
- Code quality: >= 85/100
|
||||
- Technical debt: < 5%
|
||||
- Documentation: >= 90%
|
||||
|
||||
**Evidence Sources:** Coverage reports, static analysis, documentation audit, test review
|
||||
|
||||
---
|
||||
|
||||
## Assessment Rules
|
||||
|
||||
### PASS ✅
|
||||
|
||||
- Evidence exists AND meets or exceeds threshold
|
||||
- No concerns flagged in evidence
|
||||
- Quality is acceptable
|
||||
|
||||
### CONCERNS ⚠️
|
||||
|
||||
- Threshold is UNKNOWN (not defined)
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
|
||||
### FAIL ❌
|
||||
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Quality is unacceptable
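These rules can be expressed as a small classifier. A sketch for a numeric "higher is better" metric (coverage %, uptime %, quality score); the "within 10%" band is read here as missing the threshold by less than 10%, which matches the worked examples elsewhere in this guide, and "lower is better" metrics such as response time would need the comparison flipped:

```python
# Hedged sketch of the deterministic PASS/CONCERNS/FAIL rules for a
# "higher is better" NFR metric (e.g., coverage %, uptime %, quality score).
def classify_nfr(value, threshold):
    if threshold is None or value is None:
        return "CONCERNS"              # never guess thresholds; missing evidence is a concern
    if value >= threshold:
        return "PASS"
    if value >= threshold * 0.90:
        return "CONCERNS"              # misses the threshold, but only just (within 10%)
    return "FAIL"

print(classify_nfr(87, 80))    # PASS
print(classify_nfr(78, 80))    # CONCERNS (close to threshold)
print(classify_nfr(65, 80))    # FAIL
print(classify_nfr(92, None))  # CONCERNS (threshold unknown)
```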
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### workflow.yaml Variables
|
||||
|
||||
```yaml
|
||||
variables:
|
||||
# NFR categories to assess
|
||||
assess_performance: true
|
||||
assess_security: true
|
||||
assess_reliability: true
|
||||
assess_maintainability: true
|
||||
|
||||
# Custom NFR categories
|
||||
custom_nfr_categories: '' # e.g., "accessibility,compliance"
|
||||
|
||||
# Evidence sources
|
||||
test_results_dir: '{project-root}/test-results'
|
||||
metrics_dir: '{project-root}/metrics'
|
||||
logs_dir: '{project-root}/logs'
|
||||
include_ci_results: true
|
||||
|
||||
# Thresholds
|
||||
performance_response_time_ms: 500
|
||||
performance_throughput_rps: 100
|
||||
security_score_min: 85
|
||||
reliability_uptime_pct: 99.9
|
||||
maintainability_coverage_pct: 80
|
||||
|
||||
# Assessment configuration
|
||||
use_deterministic_rules: true
|
||||
never_guess_thresholds: true
|
||||
require_evidence: true
|
||||
suggest_monitoring: true
|
||||
|
||||
# Output configuration
|
||||
output_file: '{output_folder}/nfr-assessment.md'
|
||||
generate_gate_yaml: true
|
||||
generate_evidence_checklist: true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base Integration
|
||||
|
||||
This workflow automatically loads relevant knowledge fragments:
|
||||
|
||||
- `nfr-criteria.md` - Non-functional requirements criteria
|
||||
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability
|
||||
- `test-quality.md` - Test quality expectations (maintainability)
|
||||
- `playwright-config.md` - Performance configuration patterns
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Full NFR Assessment Before Release
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
# NFR Assessment - Story 1.3
|
||||
|
||||
**Overall Status:** PASS ✅ (No blockers)
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
- Response Time p95: PASS ✅ (320ms < 500ms threshold)
|
||||
- Throughput: PASS ✅ (250 RPS > 100 RPS threshold)
|
||||
|
||||
## Security Assessment
|
||||
|
||||
- Authentication: PASS ✅ (MFA enforced)
|
||||
- Data Protection: PASS ✅ (AES-256 + TLS 1.3)
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
- Uptime: PASS ✅ (99.95% > 99.9% threshold)
|
||||
- Error Rate: PASS ✅ (0.05% < 0.1% threshold)
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
- Test Coverage: PASS ✅ (87% > 80% threshold)
|
||||
- Code Quality: PASS ✅ (92/100 > 85/100 threshold)
|
||||
|
||||
Gate Status: PASS ✅ - Ready for release
|
||||
```
|
||||
|
||||
### Example 2: NFR Assessment with Concerns
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess --feature-name "User Authentication"
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
# NFR Assessment - User Authentication
|
||||
|
||||
**Overall Status:** CONCERNS ⚠️ (1 HIGH issue)
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** CONCERNS ⚠️
|
||||
- **Threshold:** MFA enabled for all users
|
||||
- **Actual:** MFA optional (not enforced)
|
||||
- **Evidence:** Security audit (security-audit-2025-10-14.md)
|
||||
- **Recommendation:** HIGH - Enforce MFA for all new accounts
|
||||
|
||||
## Quick Wins
|
||||
|
||||
1. **Enforce MFA (Security)** - HIGH - 4 hours
|
||||
- Add configuration flag to enforce MFA
|
||||
- No code changes needed
|
||||
|
||||
Gate Status: CONCERNS ⚠️ - Address HIGH priority issues before release
|
||||
```
|
||||
|
||||
### Example 3: Performance-Only Assessment
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess \
|
||||
--assess-performance true \
|
||||
--assess-security false \
|
||||
--assess-reliability false \
|
||||
--assess-maintainability false
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "NFR thresholds not defined"
|
||||
|
||||
- Check tech-spec.md for NFR requirements
|
||||
- Check PRD.md for product-level SLAs
|
||||
- Check story file for feature-specific requirements
|
||||
- If thresholds truly unknown, mark as CONCERNS and recommend defining them
|
||||
|
||||
### "No evidence found"
|
||||
|
||||
- Check evidence directories (test-results, metrics, logs)
|
||||
- Check CI/CD pipeline for test results
|
||||
- If evidence truly missing, mark NFR as "NO EVIDENCE" and recommend generating it
|
||||
|
||||
### "CONCERNS status but no threshold exceeded"
|
||||
|
||||
- CONCERNS is correct when threshold is UNKNOWN or evidence is MISSING/INCOMPLETE
|
||||
- CONCERNS is also correct when evidence is close to threshold (within 10%)
|
||||
- Document why CONCERNS was assigned in assessment report
|
||||
|
||||
### "FAIL status blocks release"
|
||||
|
||||
- This is intentional - FAIL means critical NFR not met
|
||||
- Recommend remediation actions with specific steps
|
||||
- Re-run assessment after remediation
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
- **testarch-test-design** → `*nfr-assess` - Define NFR requirements, then assess
|
||||
- **testarch-framework** → `*nfr-assess` - Set up frameworks, then validate NFRs
|
||||
- **testarch-ci** → `*nfr-assess` - Configure CI, then assess reliability with burn-in
|
||||
- `*nfr-assess` → **testarch-gate** - Assess NFRs, then apply quality gates
|
||||
- `*nfr-assess` → **testarch-test-review** - Assess maintainability, then review tests
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Never Guess Thresholds**
|
||||
- If threshold is unknown, mark as CONCERNS
|
||||
- Recommend defining threshold in tech-spec.md
|
||||
- Don't infer thresholds from similar features
|
||||
|
||||
2. **Evidence-Based Assessment**
|
||||
- Every assessment must be backed by evidence
|
||||
- Mark NFRs without evidence as "NO EVIDENCE"
|
||||
- Don't assume or infer - require explicit evidence
|
||||
|
||||
3. **Deterministic Rules**
|
||||
- Apply PASS/CONCERNS/FAIL consistently
|
||||
- Document reasoning for each classification
|
||||
- Use same rules across all NFR categories
|
||||
|
||||
4. **Actionable Recommendations**
|
||||
- Provide specific steps, not generic advice
|
||||
- Include priority, effort estimate, owner suggestion
|
||||
- Focus on quick wins first
|
||||
|
||||
5. **Gate Integration**
|
||||
- Enable `generate_gate_yaml` for CI/CD integration
|
||||
- Use YAML snippets in pipeline quality gates
|
||||
- Export metrics for dashboard visualization
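As one way to wire this in, a small pipeline step can parse the generated snippet and fail fast on blockers. The script below is an illustrative sketch, not something BMad ships; the file name `nfr-gate.yaml` and the use of `js-yaml` are assumptions, and the keys mirror the gate YAML examples shown in the instructions.

```typescript
// Illustrative CI gate check: exit non-zero when the NFR snippet reports
// blockers or an overall FAIL, warn on CONCERNS.
import { readFileSync } from 'node:fs';
import { load } from 'js-yaml';

interface GateSnippet {
  nfr_assessment: {
    overall_status: 'PASS' | 'CONCERNS' | 'FAIL';
    blockers: boolean;
    high_priority_issues: number;
  };
}

const gate = load(readFileSync('nfr-gate.yaml', 'utf8')) as GateSnippet;
const { overall_status, blockers, high_priority_issues } = gate.nfr_assessment;

if (blockers || overall_status === 'FAIL') {
  console.error('NFR gate: FAIL - do not release');
  process.exit(1);
}
if (overall_status === 'CONCERNS' || high_priority_issues > 0) {
  console.warn('NFR gate: CONCERNS - address before next release');
}
```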
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
| Status | Criteria | Action |
|
||||
| ----------- | ---------------------------- | --------------------------- |
|
||||
| PASS ✅ | All NFRs have PASS status | Ready for release |
|
||||
| CONCERNS ⚠️ | Any NFR has CONCERNS status | Address before next release |
|
||||
| FAIL ❌ | Critical NFR has FAIL status | Do not release - BLOCKER |
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *test-design` - Define NFR requirements and test plan
|
||||
- `bmad tea *framework` - Set up performance/security testing frameworks
|
||||
- `bmad tea *ci` - Configure CI/CD for NFR validation
|
||||
- `bmad tea *gate` - Apply quality gates using NFR assessment metrics
|
||||
- `bmad tea *test-review` - Review test quality (maintainability NFR)
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- [Instructions](./instructions.md) - Detailed workflow steps
|
||||
- [Checklist](./checklist.md) - Validation checklist
|
||||
- [Template](./nfr-report-template.md) - NFR assessment report template
|
||||
- [Knowledge Base](../../testarch/knowledge/) - NFR criteria and best practices
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
src/modules/bmm/workflows/testarch/nfr-assess/checklist.md (new file, 405 lines)
@@ -0,0 +1,405 @@
|
||||
# Non-Functional Requirements Assessment - Validation Checklist
|
||||
|
||||
**Workflow:** `testarch-nfr`
|
||||
**Purpose:** Ensure comprehensive and evidence-based NFR assessment with actionable recommendations
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites Validation
|
||||
|
||||
- [ ] Implementation is deployed and accessible for evaluation
|
||||
- [ ] Evidence sources are available (test results, metrics, logs, CI results)
|
||||
- [ ] NFR categories are determined (performance, security, reliability, maintainability, custom)
|
||||
- [ ] Evidence directories exist and are accessible (`test_results_dir`, `metrics_dir`, `logs_dir`)
|
||||
- [ ] Knowledge base is loaded (nfr-criteria, ci-burn-in, test-quality)
|
||||
|
||||
---
|
||||
|
||||
## Context Loading
|
||||
|
||||
- [ ] Tech-spec.md loaded successfully (if available)
|
||||
- [ ] PRD.md loaded (if available)
|
||||
- [ ] Story file loaded (if applicable)
|
||||
- [ ] Relevant knowledge fragments loaded from `tea-index.csv`:
|
||||
- [ ] `nfr-criteria.md`
|
||||
- [ ] `ci-burn-in.md`
|
||||
- [ ] `test-quality.md`
|
||||
- [ ] `playwright-config.md` (if using Playwright)
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories and Thresholds
|
||||
|
||||
### Performance
|
||||
|
||||
- [ ] Response time threshold defined or marked as UNKNOWN
|
||||
- [ ] Throughput threshold defined or marked as UNKNOWN
|
||||
- [ ] Resource usage thresholds defined or marked as UNKNOWN
|
||||
- [ ] Scalability requirements defined or marked as UNKNOWN
|
||||
|
||||
### Security
|
||||
|
||||
- [ ] Authentication requirements defined or marked as UNKNOWN
|
||||
- [ ] Authorization requirements defined or marked as UNKNOWN
|
||||
- [ ] Data protection requirements defined or marked as UNKNOWN
|
||||
- [ ] Vulnerability management thresholds defined or marked as UNKNOWN
|
||||
- [ ] Compliance requirements identified (GDPR, HIPAA, PCI-DSS, etc.)
|
||||
|
||||
### Reliability
|
||||
|
||||
- [ ] Availability (uptime) threshold defined or marked as UNKNOWN
|
||||
- [ ] Error rate threshold defined or marked as UNKNOWN
|
||||
- [ ] MTTR (Mean Time To Recovery) threshold defined or marked as UNKNOWN
|
||||
- [ ] Fault tolerance requirements defined or marked as UNKNOWN
|
||||
- [ ] Disaster recovery requirements defined (RTO, RPO) or marked as UNKNOWN
|
||||
|
||||
### Maintainability
|
||||
|
||||
- [ ] Test coverage threshold defined or marked as UNKNOWN
|
||||
- [ ] Code quality threshold defined or marked as UNKNOWN
|
||||
- [ ] Technical debt threshold defined or marked as UNKNOWN
|
||||
- [ ] Documentation completeness threshold defined or marked as UNKNOWN
|
||||
|
||||
### Custom NFR Categories (if applicable)
|
||||
|
||||
- [ ] Custom NFR category 1: Thresholds defined or marked as UNKNOWN
|
||||
- [ ] Custom NFR category 2: Thresholds defined or marked as UNKNOWN
|
||||
- [ ] Custom NFR category 3: Thresholds defined or marked as UNKNOWN
|
||||
|
||||
---
|
||||
|
||||
## Evidence Gathering
|
||||
|
||||
### Performance Evidence
|
||||
|
||||
- [ ] Load test results collected (JMeter, k6, Gatling, etc.)
|
||||
- [ ] Application metrics collected (response times, throughput, resource usage)
|
||||
- [ ] APM data collected (New Relic, Datadog, Dynatrace, etc.)
|
||||
- [ ] Lighthouse reports collected (if web app)
|
||||
- [ ] Playwright performance traces collected (if applicable)
|
||||
|
||||
### Security Evidence
|
||||
|
||||
- [ ] SAST results collected (SonarQube, Checkmarx, Veracode, etc.)
|
||||
- [ ] DAST results collected (OWASP ZAP, Burp Suite, etc.)
|
||||
- [ ] Dependency scanning results collected (Snyk, Dependabot, npm audit)
|
||||
- [ ] Penetration test reports collected (if available)
|
||||
- [ ] Security audit logs collected
|
||||
- [ ] Compliance audit results collected (if applicable)
|
||||
|
||||
### Reliability Evidence
|
||||
|
||||
- [ ] Uptime monitoring data collected (Pingdom, UptimeRobot, StatusCake)
|
||||
- [ ] Error logs collected
|
||||
- [ ] Error rate metrics collected
|
||||
- [ ] CI burn-in results collected (stability over time)
|
||||
- [ ] Chaos engineering test results collected (if available)
|
||||
- [ ] Failover/recovery test results collected (if available)
|
||||
- [ ] Incident reports and postmortems collected (if applicable)
|
||||
|
||||
### Maintainability Evidence
|
||||
|
||||
- [ ] Code coverage reports collected (Istanbul, NYC, c8, JaCoCo)
|
||||
- [ ] Static analysis results collected (ESLint, SonarQube, CodeClimate)
|
||||
- [ ] Technical debt metrics collected
|
||||
- [ ] Documentation audit results collected
|
||||
- [ ] Test review report collected (from test-review workflow, if available)
|
||||
- [ ] Git metrics collected (code churn, commit frequency, etc.)
|
||||
|
||||
---
|
||||
|
||||
## NFR Assessment with Deterministic Rules
|
||||
|
||||
### Performance Assessment
|
||||
|
||||
- [ ] Response time assessed against threshold
|
||||
- [ ] Throughput assessed against threshold
|
||||
- [ ] Resource usage assessed against threshold
|
||||
- [ ] Scalability assessed against requirements
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, metric name)
|
||||
|
||||
### Security Assessment
|
||||
|
||||
- [ ] Authentication strength assessed against requirements
|
||||
- [ ] Authorization controls assessed against requirements
|
||||
- [ ] Data protection assessed against requirements
|
||||
- [ ] Vulnerability management assessed against thresholds
|
||||
- [ ] Compliance assessed against requirements
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, scan result)
|
||||
|
||||
### Reliability Assessment
|
||||
|
||||
- [ ] Availability (uptime) assessed against threshold
|
||||
- [ ] Error rate assessed against threshold
|
||||
- [ ] MTTR assessed against threshold
|
||||
- [ ] Fault tolerance assessed against requirements
|
||||
- [ ] Disaster recovery assessed against requirements (RTO, RPO)
|
||||
- [ ] CI burn-in assessed (stability over time)
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, monitoring data)
|
||||
|
||||
### Maintainability Assessment
|
||||
|
||||
- [ ] Test coverage assessed against threshold
|
||||
- [ ] Code quality assessed against threshold
|
||||
- [ ] Technical debt assessed against threshold
|
||||
- [ ] Documentation completeness assessed against threshold
|
||||
- [ ] Test quality assessed (from test-review, if available)
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, coverage report)
|
||||
|
||||
### Custom NFR Assessment (if applicable)
|
||||
|
||||
- [ ] Custom NFR 1 assessed against threshold with justification
|
||||
- [ ] Custom NFR 2 assessed against threshold with justification
|
||||
- [ ] Custom NFR 3 assessed against threshold with justification
|
||||
|
||||
---
|
||||
|
||||
## Status Classification Validation
|
||||
|
||||
### PASS Criteria Verified
|
||||
|
||||
- [ ] Evidence exists for PASS status
|
||||
- [ ] Evidence meets or exceeds threshold
|
||||
- [ ] No concerns flagged in evidence
|
||||
- [ ] Quality is acceptable
|
||||
|
||||
### CONCERNS Criteria Verified
|
||||
|
||||
- [ ] Threshold is UNKNOWN (documented) OR
|
||||
- [ ] Evidence is MISSING or INCOMPLETE (documented) OR
|
||||
- [ ] Evidence is close to threshold (within 10%, documented) OR
|
||||
- [ ] Evidence shows intermittent issues (documented)
|
||||
|
||||
### FAIL Criteria Verified
|
||||
|
||||
- [ ] Evidence exists BUT does not meet threshold (documented) OR
|
||||
- [ ] Critical evidence is MISSING (documented) OR
|
||||
- [ ] Evidence shows consistent failures (documented) OR
|
||||
- [ ] Quality is unacceptable (documented)
|
||||
|
||||
### No Threshold Guessing
|
||||
|
||||
- [ ] All thresholds are either defined or marked as UNKNOWN
|
||||
- [ ] No thresholds were guessed or inferred
|
||||
- [ ] All UNKNOWN thresholds result in CONCERNS status
|
||||
|
||||
---
|
||||
|
||||
## Quick Wins and Recommended Actions
|
||||
|
||||
### Quick Wins Identified
|
||||
|
||||
- [ ] Low-effort, high-impact improvements identified for CONCERNS/FAIL
|
||||
- [ ] Configuration changes (no code changes) identified
|
||||
- [ ] Optimization opportunities identified (caching, indexing, compression)
|
||||
- [ ] Monitoring additions identified (detect issues before failures)
|
||||
|
||||
### Recommended Actions
|
||||
|
||||
- [ ] Specific remediation steps provided (not generic advice)
|
||||
- [ ] Priority assigned (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- [ ] Estimated effort provided (hours, days)
|
||||
- [ ] Owner suggestions provided (dev, ops, security)
|
||||
|
||||
### Monitoring Hooks
|
||||
|
||||
- [ ] Performance monitoring suggested (APM, synthetic monitoring)
|
||||
- [ ] Error tracking suggested (Sentry, Rollbar, error logs)
|
||||
- [ ] Security monitoring suggested (intrusion detection, audit logs)
|
||||
- [ ] Alerting thresholds suggested (notify before breach)
|
||||
|
||||
### Fail-Fast Mechanisms
|
||||
|
||||
- [ ] Circuit breakers suggested for reliability
|
||||
- [ ] Rate limiting suggested for performance
|
||||
- [ ] Validation gates suggested for security
|
||||
- [ ] Smoke tests suggested for maintainability
|
||||
|
||||
---
|
||||
|
||||
## Deliverables Generated
|
||||
|
||||
### NFR Assessment Report
|
||||
|
||||
- [ ] File created at `{output_folder}/nfr-assessment.md`
|
||||
- [ ] Template from `nfr-report-template.md` used
|
||||
- [ ] Executive summary included (overall status, critical issues)
|
||||
- [ ] Assessment by category included (performance, security, reliability, maintainability)
|
||||
- [ ] Evidence for each NFR documented
|
||||
- [ ] Status classifications documented (PASS/CONCERNS/FAIL)
|
||||
- [ ] Findings summary included (PASS count, CONCERNS count, FAIL count)
|
||||
- [ ] Quick wins section included
|
||||
- [ ] Recommended actions section included
|
||||
- [ ] Evidence gaps checklist included
|
||||
|
||||
### Gate YAML Snippet (if enabled)
|
||||
|
||||
- [ ] YAML snippet generated
|
||||
- [ ] Date included
|
||||
- [ ] Categories status included (performance, security, reliability, maintainability)
|
||||
- [ ] Overall status included (PASS/CONCERNS/FAIL)
|
||||
- [ ] Issue counts included (critical, high, medium, concerns)
|
||||
- [ ] Blockers flag included (true/false)
|
||||
- [ ] Recommendations included
|
||||
|
||||
### Evidence Checklist (if enabled)
|
||||
|
||||
- [ ] All NFRs with MISSING or INCOMPLETE evidence listed
|
||||
- [ ] Owners assigned for evidence collection
|
||||
- [ ] Suggested evidence sources provided
|
||||
- [ ] Deadlines set for evidence collection
|
||||
|
||||
### Updated Story File (if enabled and requested)
|
||||
|
||||
- [ ] "NFR Assessment" section added to story markdown
|
||||
- [ ] Link to NFR assessment report included
|
||||
- [ ] Overall status and critical issues included
|
||||
- [ ] Gate status included
|
||||
|
||||
---
|
||||
|
||||
## Quality Assurance
|
||||
|
||||
### Accuracy Checks
|
||||
|
||||
- [ ] All NFR categories assessed (none skipped)
|
||||
- [ ] All thresholds documented (defined or UNKNOWN)
|
||||
- [ ] All evidence sources documented (file paths, metric names)
|
||||
- [ ] Status classifications are deterministic and consistent
|
||||
- [ ] No false positives (status correctly assigned)
|
||||
- [ ] No false negatives (all issues identified)
|
||||
|
||||
### Completeness Checks
|
||||
|
||||
- [ ] All NFR categories covered (performance, security, reliability, maintainability, custom)
|
||||
- [ ] All evidence sources checked (test results, metrics, logs, CI results)
|
||||
- [ ] All status types used appropriately (PASS, CONCERNS, FAIL)
|
||||
- [ ] All NFRs with CONCERNS/FAIL have recommendations
|
||||
- [ ] All evidence gaps have owners and deadlines
|
||||
|
||||
### Actionability Checks
|
||||
|
||||
- [ ] Recommendations are specific (not generic)
|
||||
- [ ] Remediation steps are clear and actionable
|
||||
- [ ] Priorities are assigned (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- [ ] Effort estimates are provided (hours, days)
|
||||
- [ ] Owners are suggested (dev, ops, security)
|
||||
|
||||
---
|
||||
|
||||
## Integration with BMad Artifacts
|
||||
|
||||
### With tech-spec.md
|
||||
|
||||
- [ ] Tech spec loaded for NFR requirements and thresholds
|
||||
- [ ] Performance targets extracted
|
||||
- [ ] Security requirements extracted
|
||||
- [ ] Reliability SLAs extracted
|
||||
- [ ] Architectural decisions considered
|
||||
|
||||
### With test-design.md
|
||||
|
||||
- [ ] Test design loaded for NFR test plan
|
||||
- [ ] Test priorities referenced (P0/P1/P2/P3)
|
||||
- [ ] Assessment aligned with planned NFR validation
|
||||
|
||||
### With PRD.md
|
||||
|
||||
- [ ] PRD loaded for product-level NFR context
|
||||
- [ ] User experience goals considered
|
||||
- [ ] Unstated requirements checked
|
||||
- [ ] Product-level SLAs referenced
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates Validation
|
||||
|
||||
### Release Blocker (FAIL)
|
||||
|
||||
- [ ] Critical NFR status checked (security, reliability)
|
||||
- [ ] Performance failures assessed for user impact
|
||||
- [ ] Release blocker flagged if critical NFR has FAIL status
|
||||
|
||||
### PR Blocker (HIGH CONCERNS)
|
||||
|
||||
- [ ] High-priority NFR status checked
|
||||
- [ ] Multiple CONCERNS assessed
|
||||
- [ ] PR blocker flagged if HIGH priority issues exist
|
||||
|
||||
### Warning (CONCERNS)
|
||||
|
||||
- [ ] Any NFR with CONCERNS status flagged
|
||||
- [ ] Missing or incomplete evidence documented
|
||||
- [ ] Warning issued to address before next release
|
||||
|
||||
### Pass (PASS)
|
||||
|
||||
- [ ] All NFRs have PASS status
|
||||
- [ ] No blockers or concerns exist
|
||||
- [ ] Ready for release confirmed
|
||||
|
||||
---
|
||||
|
||||
## Non-Prescriptive Validation
|
||||
|
||||
- [ ] NFR categories adapted to team needs
|
||||
- [ ] Thresholds appropriate for project context
|
||||
- [ ] Assessment criteria customized as needed
|
||||
- [ ] Teams can extend with custom NFR categories
|
||||
- [ ] Integration with external tools supported (New Relic, Datadog, SonarQube, JIRA)
|
||||
|
||||
---
|
||||
|
||||
## Documentation and Communication
|
||||
|
||||
- [ ] NFR assessment report is readable and well-formatted
|
||||
- [ ] Tables render correctly in markdown
|
||||
- [ ] Code blocks have proper syntax highlighting
|
||||
- [ ] Links are valid and accessible
|
||||
- [ ] Recommendations are clear and prioritized
|
||||
- [ ] Overall status is prominent and unambiguous
|
||||
- [ ] Executive summary provides quick understanding
|
||||
|
||||
---
|
||||
|
||||
## Final Validation
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All NFR categories assessed with evidence (or gaps documented)
|
||||
- [ ] No thresholds were guessed (all defined or UNKNOWN)
|
||||
- [ ] Status classifications are deterministic and justified
|
||||
- [ ] Quick wins identified for all CONCERNS/FAIL
|
||||
- [ ] Recommended actions are specific and actionable
|
||||
- [ ] Evidence gaps documented with owners and deadlines
|
||||
- [ ] NFR assessment report generated and saved
|
||||
- [ ] Gate YAML snippet generated (if enabled)
|
||||
- [ ] Evidence checklist generated (if enabled)
|
||||
- [ ] Workflow completed successfully
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**NFR Assessment Status:**
|
||||
|
||||
- [ ] ✅ PASS - All NFRs meet requirements, ready for release
|
||||
- [ ] ⚠️ CONCERNS - Some NFRs have concerns, address before next release
|
||||
- [ ] ❌ FAIL - Critical NFRs not met, BLOCKER for release
|
||||
|
||||
**Next Actions:**
|
||||
|
||||
- If PASS ✅: Proceed to `*gate` workflow or release
|
||||
- If CONCERNS ⚠️: Address HIGH/CRITICAL issues, re-run `*nfr-assess`
|
||||
- If FAIL ❌: Resolve FAIL status NFRs, re-run `*nfr-assess`
|
||||
|
||||
**Critical Issues:** {COUNT}
|
||||
**High Priority Issues:** {COUNT}
|
||||
**Concerns:** {COUNT}
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
@@ -1,39 +1,721 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
# Non-Functional Requirements Assessment - Instructions v4.0
|
||||
|
||||
# NFR Assessment v3.0
|
||||
**Workflow:** `testarch-nfr`
|
||||
**Purpose:** Assess non-functional requirements (performance, security, reliability, maintainability) before release with evidence-based validation
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Format:** Pure Markdown v4.0 (no XML blocks)
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/nfr-assess" name="NFR Assessment">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Implementation is deployed locally or accessible for evaluation.</i>
|
||||
<i>- Non-functional goals/SLAs are defined or discoverable.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Confirm prerequisites; halt if targets are unknown and cannot be clarified.</action>
|
||||
</step>
|
||||
<step n="2" title="Assess NFRs">
|
||||
<action>Identify which NFRs to assess (default: Security, Performance, Reliability, Maintainability).</action>
|
||||
<action>Gather thresholds from story/architecture/technical preferences; mark unknown targets.</action>
|
||||
<action>Inspect evidence (tests, telemetry, logs) for each NFR and classify status using deterministic PASS/CONCERNS/FAIL rules.</action>
|
||||
<action>List quick wins and recommended actions for any concerns/failures.</action>
|
||||
</step>
|
||||
<step n="3" title="Deliverables">
|
||||
<action>Produce NFR assessment markdown summarizing evidence, status, and actions; update gate YAML block with NFR findings; compile checklist of evidence gaps and owners.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If NFR targets are undefined and cannot be obtained, halt and request definition.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Load the `nfr-criteria`, `ci-burn-in`, and relevant fragments via `{project-root}/bmad/bmm/testarch/tea-index.csv` to ground the assessment.</i>
|
||||
<i>Unknown thresholds default to CONCERNS—never guess.</i>
|
||||
<i>Ensure every NFR has evidence or call it out explicitly.</i>
|
||||
<i>Suggest monitoring hooks and fail-fast mechanisms when gaps exist.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>NFR assessment report with actionable follow-ups and gate snippet.</i>
|
||||
</output>
|
||||
</task>
```
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow performs a comprehensive assessment of non-functional requirements (NFRs) to validate that the implementation meets performance, security, reliability, and maintainability standards before release. It uses evidence-based validation with deterministic PASS/CONCERNS/FAIL rules and provides actionable recommendations for remediation.
|
||||
|
||||
**Key Capabilities:**
|
||||
|
||||
- Assess multiple NFR categories (performance, security, reliability, maintainability, custom)
|
||||
- Validate NFRs against defined thresholds from tech specs, PRD, or defaults
|
||||
- Classify status deterministically (PASS/CONCERNS/FAIL) based on evidence
|
||||
- Never guess thresholds - mark as CONCERNS if unknown
|
||||
- Generate gate-ready YAML snippets for CI/CD integration
|
||||
- Provide quick wins and recommended actions for remediation
|
||||
- Create evidence checklists for gaps
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Implementation deployed locally or accessible for evaluation
|
||||
- Evidence sources available (test results, metrics, logs, CI results)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- NFR requirements defined in tech-spec.md, PRD.md, or story
|
||||
- Test results from performance, security, reliability tests
|
||||
- Application metrics (response times, error rates, throughput)
|
||||
- CI/CD pipeline results for burn-in validation
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- If NFR targets are undefined and cannot be obtained, halt and request definition
|
||||
- If implementation is not accessible for evaluation, halt and request deployment
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### Step 1: Load Context and Knowledge Base
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
|
||||
- `nfr-criteria.md` - Non-functional requirements criteria and thresholds
|
||||
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability validation
|
||||
- `test-quality.md` - Test quality expectations (related to maintainability)
|
||||
- `playwright-config.md` - Performance configuration patterns (if using Playwright)
|
||||
|
||||
2. Read story file (if provided):
|
||||
- Extract NFR requirements
|
||||
- Identify specific thresholds or SLAs
|
||||
- Note any custom NFR categories
|
||||
|
||||
3. Read related BMad artifacts (if available):
|
||||
- `tech-spec.md` - Technical NFR requirements and targets
|
||||
- `PRD.md` - Product-level NFR context (user expectations)
|
||||
- `test-design.md` - NFR test plan and priorities
|
||||
|
||||
**Output:** Complete understanding of NFR targets, evidence sources, and validation criteria
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Identify NFR Categories and Thresholds
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Determine which NFR categories to assess (default: performance, security, reliability, maintainability):
|
||||
- **Performance**: Response time, throughput, resource usage
|
||||
- **Security**: Authentication, authorization, data protection, vulnerability scanning
|
||||
- **Reliability**: Error handling, recovery, availability, fault tolerance
|
||||
- **Maintainability**: Code quality, test coverage, documentation, technical debt
|
||||
|
||||
2. Add custom NFR categories if specified (e.g., accessibility, internationalization, compliance)
|
||||
|
||||
3. Gather thresholds for each NFR:
|
||||
- From tech-spec.md (primary source)
|
||||
- From PRD.md (product-level SLAs)
|
||||
- From story file (feature-specific requirements)
|
||||
- From workflow variables (default thresholds)
|
||||
- Mark thresholds as UNKNOWN if not defined
|
||||
|
||||
4. Never guess thresholds - if a threshold is unknown, mark the NFR as CONCERNS
|
||||
|
||||
**Output:** Complete list of NFRs to assess with defined (or UNKNOWN) thresholds
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Gather Evidence
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR category, discover evidence sources:
|
||||
|
||||
**Performance Evidence:**
|
||||
- Load test results (JMeter, k6, Lighthouse)
|
||||
- Application metrics (response times, throughput, resource usage)
|
||||
- Performance monitoring data (New Relic, Datadog, APM)
|
||||
- Playwright performance traces (if applicable)
|
||||
|
||||
**Security Evidence:**
|
||||
- Security scan results (SAST, DAST, dependency scanning)
|
||||
- Authentication/authorization test results
|
||||
- Penetration test reports
|
||||
- Vulnerability assessment reports
|
||||
- Compliance audit results
|
||||
|
||||
**Reliability Evidence:**
|
||||
- Error logs and error rates
|
||||
- Uptime monitoring data
|
||||
- Chaos engineering test results
|
||||
- Failover/recovery test results
|
||||
- CI burn-in results (stability over time)
|
||||
|
||||
**Maintainability Evidence:**
|
||||
- Code coverage reports (Istanbul, NYC, c8)
|
||||
- Static analysis results (ESLint, SonarQube)
|
||||
- Technical debt metrics
|
||||
- Documentation completeness
|
||||
- Test quality assessment (from test-review workflow)
|
||||
|
||||
2. Read relevant files from evidence directories:
|
||||
- `{test_results_dir}` for test execution results
|
||||
- `{metrics_dir}` for application metrics
|
||||
- `{logs_dir}` for application logs
|
||||
- CI/CD pipeline results (if `include_ci_results` is true)
|
||||
|
||||
3. Mark NFRs without evidence as "NO EVIDENCE" - never infer or assume
|
||||
|
||||
**Output:** Comprehensive evidence inventory for each NFR
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Assess NFRs with Deterministic Rules
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR, apply deterministic PASS/CONCERNS/FAIL rules:
|
||||
|
||||
**PASS Criteria:**
|
||||
- Evidence exists AND meets defined threshold
|
||||
- No concerns flagged in evidence
|
||||
- Example: Response time is 350ms (threshold: 500ms) → PASS
|
||||
|
||||
**CONCERNS Criteria:**
|
||||
- Threshold is UNKNOWN (not defined)
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
- Example: Response time is 480ms (threshold: 500ms, 96% of threshold) → CONCERNS
|
||||
|
||||
**FAIL Criteria:**
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Example: Response time is 750ms (threshold: 500ms) → FAIL
|
||||
|
||||
2. Document findings for each NFR:
|
||||
- Status (PASS/CONCERNS/FAIL)
|
||||
- Evidence source (file path, test name, metric name)
|
||||
- Actual value vs threshold
|
||||
- Justification for status classification
|
||||
|
||||
3. Classify severity based on category:
|
||||
- **CRITICAL**: Security failures, reliability failures (affect users immediately)
|
||||
- **HIGH**: Performance failures, maintainability failures (affect users soon)
|
||||
- **MEDIUM**: Concerns without failures (may affect users eventually)
|
||||
- **LOW**: Missing evidence for non-critical NFRs
|
||||
|
||||
**Output:** Complete NFR assessment with deterministic status classifications
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Identify Quick Wins and Recommended Actions
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR with CONCERNS or FAIL status, identify quick wins:
|
||||
- Low-effort, high-impact improvements
|
||||
- Configuration changes (no code changes needed)
|
||||
- Optimization opportunities (caching, indexing, compression)
|
||||
- Monitoring additions (detect issues before they become failures)
|
||||
|
||||
2. Provide recommended actions for each issue:
|
||||
- Specific steps to remediate (not generic advice)
|
||||
- Priority (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- Estimated effort (hours, days)
|
||||
- Owner suggestion (dev, ops, security)
|
||||
|
||||
3. Suggest monitoring hooks for gaps:
|
||||
- Add performance monitoring (APM, synthetic monitoring)
|
||||
- Add error tracking (Sentry, Rollbar, error logs)
|
||||
- Add security monitoring (intrusion detection, audit logs)
|
||||
- Add alerting thresholds (notify before thresholds are breached)
|
||||
|
||||
4. Suggest fail-fast mechanisms:
|
||||
- Add circuit breakers for reliability
|
||||
- Add rate limiting for performance
|
||||
- Add validation gates for security
|
||||
- Add smoke tests for maintainability
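To make the first of these concrete: a minimal circuit breaker stops calling a failing dependency after repeated errors and only retries after a cool-down. This is an illustrative sketch only (class name, defaults, and `fetchUserProfile` are hypothetical), not something the workflow generates.

```typescript
// Minimal circuit-breaker sketch: open after maxFailures consecutive errors,
// fail fast while open, allow a single trial call after the cool-down.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(private maxFailures = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit open - failing fast');
      }
      // half-open: fall through and allow one trial call
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures || this.openedAt !== null) {
        this.openedAt = Date.now(); // open (or re-open after a failed trial)
      }
      throw err;
    }
  }
}

// Usage (hypothetical): const user = await breaker.call(() => fetchUserProfile(id));
```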
|
||||
|
||||
**Output:** Actionable remediation plan with prioritized recommendations
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Generate Deliverables
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Create NFR assessment markdown file:
|
||||
- Use template from `nfr-report-template.md`
|
||||
- Include executive summary (overall status, critical issues)
|
||||
- Add NFR-by-NFR assessment (status, evidence, thresholds)
|
||||
- Add findings summary (PASS count, CONCERNS count, FAIL count)
|
||||
- Add quick wins section
|
||||
- Add recommended actions section
|
||||
- Add evidence gaps checklist
|
||||
- Save to `{output_folder}/nfr-assessment.md`
|
||||
|
||||
2. Generate gate YAML snippet (if enabled):
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
concerns: 2
|
||||
blockers: false
|
||||
```
|
||||
|
||||
3. Generate evidence checklist (if enabled):
|
||||
- List all NFRs with MISSING or INCOMPLETE evidence
|
||||
- Assign owners for evidence collection
|
||||
- Suggest evidence sources (tests, metrics, logs)
|
||||
- Set deadlines for evidence collection
|
||||
|
||||
4. Update story file (if enabled and requested):
|
||||
- Add "NFR Assessment" section to story markdown
|
||||
- Link to NFR assessment report
|
||||
- Include overall status and critical issues
|
||||
- Add gate status
|
||||
|
||||
**Output:** Complete NFR assessment documentation ready for review and CI/CD integration
|
||||
|
||||
---
|
||||
|
||||
## Non-Prescriptive Approach
|
||||
|
||||
**Minimal Examples:** This workflow provides principles and patterns, not rigid templates. Teams should adapt NFR categories, thresholds, and assessment criteria to their needs.
|
||||
|
||||
**Key Patterns to Follow:**
|
||||
|
||||
- Use evidence-based validation (no guessing or inference)
|
||||
- Apply deterministic rules (consistent PASS/CONCERNS/FAIL classification)
|
||||
- Never guess thresholds (mark as CONCERNS if unknown)
|
||||
- Provide actionable recommendations (specific steps, not generic advice)
|
||||
- Generate gate-ready artifacts (YAML snippets for CI/CD)
|
||||
|
||||
**Extend as Needed:**
|
||||
|
||||
- Add custom NFR categories (accessibility, internationalization, compliance)
|
||||
- Integrate with external tools (New Relic, Datadog, SonarQube, JIRA)
|
||||
- Add custom thresholds and rules
|
||||
- Link to external assessment systems
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories and Criteria
|
||||
|
||||
### Performance
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Response time (p50, p95, p99 percentiles)
|
||||
- Throughput (requests per second, transactions per second)
|
||||
- Resource usage (CPU, memory, disk, network)
|
||||
- Scalability (horizontal, vertical)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Response time p95: 500ms
|
||||
- Throughput: 100 RPS
|
||||
- CPU usage: < 70% average
|
||||
- Memory usage: < 80% max
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Load test results (JMeter, k6, Gatling)
|
||||
- APM data (New Relic, Datadog, Dynatrace)
|
||||
- Lighthouse reports (for web apps)
|
||||
- Playwright performance traces
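For example, if k6 is the load tool, the default budgets above can be encoded as test thresholds so the load run itself produces pass/fail evidence. This is an illustrative sketch, not part of the workflow; the endpoint is a placeholder, and it assumes a k6 version that runs TypeScript directly (otherwise transpile to JavaScript first).

```typescript
// Illustrative k6 script encoding the default performance budgets above.
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 20,          // adjust to the real load profile
  duration: '1m',
  thresholds: {
    http_req_duration: ['p(95)<500'], // response time p95 under 500ms
    http_req_failed: ['rate<0.001'],  // request failure rate below 0.1%
  },
};

export default function () {
  const res = http.get('https://staging.example.com/api/health'); // placeholder URL
  check(res, { 'status is 200': (r) => r.status === 200 });
}
```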
|
||||
|
||||
---
|
||||
|
||||
### Security
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Authentication (login security, session management)
|
||||
- Authorization (access control, permissions)
|
||||
- Data protection (encryption, PII handling)
|
||||
- Vulnerability management (SAST, DAST, dependency scanning)
|
||||
- Compliance (GDPR, HIPAA, PCI-DSS)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Security score: >= 85/100
|
||||
- Critical vulnerabilities: 0
|
||||
- High vulnerabilities: < 3
|
||||
- Authentication strength: MFA enabled
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- SAST results (SonarQube, Checkmarx, Veracode)
|
||||
- DAST results (OWASP ZAP, Burp Suite)
|
||||
- Dependency scanning (Snyk, Dependabot, npm audit)
|
||||
- Penetration test reports
|
||||
- Security audit logs
|
||||
|
||||
---
|
||||
|
||||
### Reliability
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Availability (uptime percentage)
|
||||
- Error handling (graceful degradation, error recovery)
|
||||
- Fault tolerance (redundancy, failover)
|
||||
- Disaster recovery (backup, restore, RTO/RPO)
|
||||
- Stability (CI burn-in, chaos engineering)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Uptime: >= 99.9% (three nines)
|
||||
- Error rate: < 0.1% (1 in 1000 requests)
|
||||
- MTTR (Mean Time To Recovery): < 15 minutes
|
||||
- CI burn-in: 100 consecutive successful runs
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Uptime monitoring (Pingdom, UptimeRobot, StatusCake)
|
||||
- Error logs and error rates
|
||||
- CI burn-in results (see `ci-burn-in.md`)
|
||||
- Chaos engineering test results (Chaos Monkey, Gremlin)
|
||||
- Incident reports and postmortems
|
||||
|
||||
---
|
||||
|
||||
### Maintainability
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Code quality (complexity, duplication, code smells)
|
||||
- Test coverage (unit, integration, E2E)
|
||||
- Documentation (code comments, README, architecture docs)
|
||||
- Technical debt (debt ratio, code churn)
|
||||
- Test quality (from test-review workflow)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Test coverage: >= 80%
|
||||
- Code quality score: >= 85/100
|
||||
- Technical debt ratio: < 5%
|
||||
- Documentation completeness: >= 90%
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Coverage reports (Istanbul, NYC, c8, JaCoCo)
|
||||
- Static analysis (ESLint, SonarQube, CodeClimate)
|
||||
- Documentation audit (manual or automated)
|
||||
- Test review report (from test-review workflow)
|
||||
- Git metrics (code churn, commit frequency)
|
||||
|
||||
---
|
||||
|
||||
## Deterministic Assessment Rules
|
||||
|
||||
### PASS Rules
|
||||
|
||||
- Evidence exists
|
||||
- Evidence meets or exceeds threshold
|
||||
- No concerns flagged
|
||||
- Quality is acceptable
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 350ms p95
|
||||
Status: PASS ✅
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### CONCERNS Rules
|
||||
|
||||
- Threshold is UNKNOWN
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
- Quality is marginal
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 480ms p95 (96% of threshold)
|
||||
Status: CONCERNS ⚠️
|
||||
Recommendation: Optimize before production - very close to threshold
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### FAIL Rules
|
||||
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Quality is unacceptable
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 750ms p95 (150% of threshold)
|
||||
Status: FAIL ❌
|
||||
Recommendation: BLOCKER - optimize performance before release
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with BMad Artifacts
|
||||
|
||||
### With tech-spec.md
|
||||
|
||||
- Primary source for NFR requirements and thresholds
|
||||
- Load performance targets, security requirements, reliability SLAs
|
||||
- Use architectural decisions to understand NFR trade-offs
|
||||
|
||||
### With test-design.md
|
||||
|
||||
- Understand NFR test plan and priorities
|
||||
- Reference test priorities (P0/P1/P2/P3) for severity classification
|
||||
- Align assessment with planned NFR validation
|
||||
|
||||
### With PRD.md
|
||||
|
||||
- Understand product-level NFR expectations
|
||||
- Verify NFRs align with user experience goals
|
||||
- Check for unstated NFR requirements (implied by product goals)
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
### Release Blocker (FAIL)
|
||||
|
||||
- Critical NFR has FAIL status (security, reliability)
|
||||
- Performance failure affects user experience severely
|
||||
- Do not release until FAIL is resolved
|
||||
|
||||
### PR Blocker (HIGH CONCERNS)
|
||||
|
||||
- High-priority NFR has FAIL status
|
||||
- Multiple CONCERNS exist
|
||||
- Block PR merge until addressed
|
||||
|
||||
### Warning (CONCERNS)
|
||||
|
||||
- Any NFR has CONCERNS status
|
||||
- Evidence is missing or incomplete
|
||||
- Address before next release
|
||||
|
||||
### Pass (PASS)
|
||||
|
||||
- All NFRs have PASS status
|
||||
- No blockers or concerns
|
||||
- Ready for release
|
||||
|
||||
---
|
||||
|
||||
## Example NFR Assessment
|
||||
|
||||
````markdown
|
||||
# NFR Assessment - Story 1.3
|
||||
|
||||
**Feature:** User Authentication
|
||||
**Date:** 2025-10-14
|
||||
**Overall Status:** CONCERNS ⚠️ (1 HIGH issue)
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Assessment:** 3 categories PASS, 1 CONCERNS, 0 FAIL
|
||||
**Blockers:** None
|
||||
**High Priority Issues:** 1 (Security - MFA not enforced)
|
||||
**Recommendation:** Address security concern before release
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
### Response Time (p95)
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 500ms
|
||||
- **Actual:** 320ms (64% of threshold)
|
||||
- **Evidence:** Load test results (test-results/load-2025-10-14.json)
|
||||
- **Findings:** Response time well below threshold across all percentiles
|
||||
|
||||
### Throughput
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 100 RPS
|
||||
- **Actual:** 250 RPS (250% of threshold)
|
||||
- **Evidence:** Load test results (test-results/load-2025-10-14.json)
|
||||
- **Findings:** System handles 2.5x target load without degradation
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** CONCERNS ⚠️
|
||||
- **Threshold:** MFA enabled for all users
|
||||
- **Actual:** MFA optional (not enforced)
|
||||
- **Evidence:** Security audit (security-audit-2025-10-14.md)
|
||||
- **Findings:** MFA is implemented but not enforced by default
|
||||
- **Recommendation:** HIGH - Enforce MFA for all new accounts, provide migration path for existing users
|
||||
|
||||
### Data Protection
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** PII encrypted at rest and in transit
|
||||
- **Actual:** AES-256 at rest, TLS 1.3 in transit
|
||||
- **Evidence:** Security scan (security-scan-2025-10-14.json)
|
||||
- **Findings:** All PII properly encrypted
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
### Uptime
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 99.9% (three nines)
|
||||
- **Actual:** 99.95% over 30 days
|
||||
- **Evidence:** Uptime monitoring (uptime-report-2025-10-14.csv)
|
||||
- **Findings:** Exceeds target with margin
|
||||
|
||||
### Error Rate
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** < 0.1% (1 in 1000)
|
||||
- **Actual:** 0.05% (1 in 2000)
|
||||
- **Evidence:** Error logs (logs/errors-2025-10.log)
|
||||
- **Findings:** Error rate well below threshold
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
### Test Coverage
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** >= 80%
|
||||
- **Actual:** 87%
|
||||
- **Evidence:** Coverage report (coverage/lcov-report/index.html)
|
||||
- **Findings:** Coverage exceeds threshold with good distribution
|
||||
|
||||
### Code Quality
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** >= 85/100
|
||||
- **Actual:** 92/100
|
||||
- **Evidence:** SonarQube analysis (sonarqube-report-2025-10-14.pdf)
|
||||
- **Findings:** High code quality score with low technical debt
|
||||
|
||||
## Quick Wins
|
||||
|
||||
1. **Enforce MFA (Security)** - HIGH - 4 hours
|
||||
- Add configuration flag to enforce MFA for new accounts
|
||||
- No code changes needed, only config adjustment
|
||||
|
||||
## Recommended Actions
|
||||
|
||||
### Immediate (Before Release)
|
||||
|
||||
1. **Enforce MFA for all new accounts** - HIGH - 4 hours - Security Team
|
||||
- Add `ENFORCE_MFA=true` to production config
|
||||
- Update user onboarding flow to require MFA setup
|
||||
- Test MFA enforcement in staging environment
|
||||
|
||||
### Short-term (Next Sprint)
|
||||
|
||||
1. **Migrate existing users to MFA** - MEDIUM - 3 days - Product + Engineering
|
||||
- Design migration UX (prompt, incentives, deadline)
|
||||
- Implement migration flow with grace period
|
||||
- Communicate migration to existing users
|
||||
|
||||
## Evidence Gaps
|
||||
|
||||
- [ ] Chaos engineering test results (reliability)
|
||||
- Owner: DevOps Team
|
||||
- Deadline: 2025-10-21
|
||||
- Suggested evidence: Run chaos monkey tests in staging
|
||||
|
||||
- [ ] Penetration test report (security)
|
||||
- Owner: Security Team
|
||||
- Deadline: 2025-10-28
|
||||
- Suggested evidence: Schedule third-party pentest
|
||||
|
||||
## Gate YAML Snippet
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
story_id: '1.3'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
medium_priority_issues: 0
|
||||
concerns: 1
|
||||
blockers: false
|
||||
recommendations:
|
||||
- 'Enforce MFA for all new accounts (HIGH - 4 hours)'
|
||||
evidence_gaps: 2
|
||||
```
|
||||
````
|
||||
|
||||
## Recommendations Summary
|
||||
|
||||
- **Release Blocker:** None ✅
|
||||
- **High Priority:** 1 (Enforce MFA before release)
|
||||
- **Medium Priority:** 1 (Migrate existing users to MFA)
|
||||
- **Next Steps:** Address HIGH priority item, then proceed to gate workflow
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
Before completing this workflow, verify:
|
||||
|
||||
- ✅ All NFR categories assessed (performance, security, reliability, maintainability, custom)
|
||||
- ✅ Thresholds defined or marked as UNKNOWN
|
||||
- ✅ Evidence gathered for each NFR (or marked as MISSING)
|
||||
- ✅ Status classified deterministically (PASS/CONCERNS/FAIL)
|
||||
- ✅ No thresholds were guessed (marked as CONCERNS if unknown)
|
||||
- ✅ Quick wins identified for CONCERNS/FAIL
|
||||
- ✅ Recommended actions are specific and actionable
|
||||
- ✅ Evidence gaps documented with owners and deadlines
|
||||
- ✅ NFR assessment report generated and saved
|
||||
- ✅ Gate YAML snippet generated (if enabled)
|
||||
- ✅ Evidence checklist generated (if enabled)
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
- **Never Guess Thresholds:** If a threshold is unknown, mark as CONCERNS and recommend defining it
|
||||
- **Evidence-Based:** Every assessment must be backed by evidence (tests, metrics, logs, CI results)
|
||||
- **Deterministic Rules:** Use consistent PASS/CONCERNS/FAIL classification based on evidence
|
||||
- **Actionable Recommendations:** Provide specific steps, not generic advice
|
||||
- **Gate Integration:** Generate YAML snippets that can be consumed by CI/CD pipelines
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "NFR thresholds not defined"
|
||||
- Check tech-spec.md for NFR requirements
|
||||
- Check PRD.md for product-level SLAs
|
||||
- Check story file for feature-specific requirements
|
||||
- If thresholds truly unknown, mark as CONCERNS and recommend defining them
|
||||
|
||||
### "No evidence found"
|
||||
- Check evidence directories (test-results, metrics, logs)
|
||||
- Check CI/CD pipeline for test results
|
||||
- If evidence truly missing, mark NFR as "NO EVIDENCE" and recommend generating it
|
||||
|
||||
### "CONCERNS status but no threshold exceeded"
|
||||
- CONCERNS is correct when threshold is UNKNOWN or evidence is MISSING/INCOMPLETE
|
||||
- CONCERNS is also correct when evidence is close to threshold (within 10%)
|
||||
- Document why CONCERNS was assigned
|
||||
|
||||
### "FAIL status blocks release"
|
||||
- This is intentional - FAIL means critical NFR not met
|
||||
- Recommend remediation actions with specific steps
|
||||
- Re-run assessment after remediation
|
||||
|
||||
---
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **testarch-test-design** - Define NFR requirements and test plan
|
||||
- **testarch-framework** - Set up performance/security testing frameworks
|
||||
- **testarch-ci** - Configure CI/CD for NFR validation
|
||||
- **testarch-gate** - Use NFR assessment as input for quality gate decisions
|
||||
- **testarch-test-review** - Review test quality (maintainability NFR)
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
|
||||
@@ -0,0 +1,443 @@
|
||||
# NFR Assessment - {FEATURE_NAME}
|
||||
|
||||
**Date:** {DATE}
|
||||
**Story:** {STORY_ID} (if applicable)
|
||||
**Overall Status:** {OVERALL_STATUS} {STATUS_ICON}
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Assessment:** {PASS_COUNT} PASS, {CONCERNS_COUNT} CONCERNS, {FAIL_COUNT} FAIL
|
||||
|
||||
**Blockers:** {BLOCKER_COUNT} {BLOCKER_DESCRIPTION}
|
||||
|
||||
**High Priority Issues:** {HIGH_PRIORITY_COUNT} {HIGH_PRIORITY_DESCRIPTION}
|
||||
|
||||
**Recommendation:** {OVERALL_RECOMMENDATION}
|
||||
|
||||
---
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
### Response Time (p95)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Throughput
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Resource Usage
|
||||
|
||||
- **CPU Usage**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
- **Memory Usage**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
### Scalability
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
- **Recommendation:** {RECOMMENDATION} (if CONCERNS or FAIL)
|
||||
|
||||
### Authorization Controls
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Data Protection
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Vulnerability Management
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION} (e.g., "0 critical, <3 high vulnerabilities")
|
||||
- **Actual:** {ACTUAL_DESCRIPTION} (e.g., "0 critical, 1 high, 5 medium vulnerabilities")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Snyk scan results - scan-2025-10-14.json")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Compliance (if applicable)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Standards:** {COMPLIANCE_STANDARDS} (e.g., "GDPR, HIPAA, PCI-DSS")
|
||||
- **Actual:** {ACTUAL_COMPLIANCE_STATUS}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
### Availability (Uptime)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "99.9%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "99.95%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Uptime monitoring - uptime-report-2025-10-14.csv")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Error Rate
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<0.1%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "0.05%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Error logs - logs/errors-2025-10.log")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### MTTR (Mean Time To Recovery)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<15 minutes")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "12 minutes")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Incident reports - incidents/")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Fault Tolerance
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### CI Burn-In (Stability)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "100 consecutive successful runs")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "150 consecutive successful runs")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "CI burn-in results - ci-burn-in-2025-10-14.log")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Disaster Recovery (if applicable)
|
||||
|
||||
- **RTO (Recovery Time Objective)**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
- **RPO (Recovery Point Objective)**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
---
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
### Test Coverage
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=80%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "87%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Coverage report - coverage/lcov-report/index.html")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Code Quality
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=85/100")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "92/100")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "SonarQube analysis - sonarqube-report-2025-10-14.pdf")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Technical Debt
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<5% debt ratio")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "3.2% debt ratio")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "CodeClimate analysis - codeclimate-2025-10-14.json")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Documentation Completeness
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=90%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "95%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Documentation audit - docs-audit-2025-10-14.md")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Test Quality (from test-review, if available)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Test review report - test-review-2025-10-14.md")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Custom NFR Assessments (if applicable)
|
||||
|
||||
### {CUSTOM_NFR_NAME_1}
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### {CUSTOM_NFR_NAME_2}
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Quick Wins
|
||||
|
||||
{QUICK_WIN_COUNT} quick wins identified for immediate implementation:
|
||||
|
||||
1. **{QUICK_WIN_TITLE_1}** ({NFR_CATEGORY}) - {PRIORITY} - {ESTIMATED_EFFORT}
|
||||
- {QUICK_WIN_DESCRIPTION}
|
||||
- No code changes needed / Minimal code changes
|
||||
|
||||
2. **{QUICK_WIN_TITLE_2}** ({NFR_CATEGORY}) - {PRIORITY} - {ESTIMATED_EFFORT}
|
||||
- {QUICK_WIN_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Recommended Actions
|
||||
|
||||
### Immediate (Before Release) - CRITICAL/HIGH Priority
|
||||
|
||||
1. **{ACTION_TITLE_1}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
- {SPECIFIC_STEPS}
|
||||
- {VALIDATION_CRITERIA}
|
||||
|
||||
2. **{ACTION_TITLE_2}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
- {SPECIFIC_STEPS}
|
||||
- {VALIDATION_CRITERIA}
|
||||
|
||||
### Short-term (Next Sprint) - MEDIUM Priority
|
||||
|
||||
1. **{ACTION_TITLE_3}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
2. **{ACTION_TITLE_4}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
### Long-term (Backlog) - LOW Priority
|
||||
|
||||
1. **{ACTION_TITLE_5}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Monitoring Hooks
|
||||
|
||||
{MONITORING_HOOK_COUNT} monitoring hooks recommended to detect issues before failures:
|
||||
|
||||
### Performance Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_1} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
- [ ] {MONITORING_TOOL_2} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Security Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_3} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Reliability Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_4} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Alerting Thresholds
|
||||
|
||||
- [ ] {ALERT_DESCRIPTION} - Notify when {THRESHOLD_CONDITION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
---
|
||||
|
||||
## Fail-Fast Mechanisms
|
||||
|
||||
{FAIL_FAST_COUNT} fail-fast mechanisms recommended to prevent failures:
|
||||
|
||||
### Circuit Breakers (Reliability)
|
||||
|
||||
- [ ] {CIRCUIT_BREAKER_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Rate Limiting (Performance)
|
||||
|
||||
- [ ] {RATE_LIMITING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Validation Gates (Security)
|
||||
|
||||
- [ ] {VALIDATION_GATE_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Smoke Tests (Maintainability)
|
||||
|
||||
- [ ] {SMOKE_TEST_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
---
|
||||
|
||||
## Evidence Gaps
|
||||
|
||||
{EVIDENCE_GAP_COUNT} evidence gaps identified - action required:
|
||||
|
||||
- [ ] **{NFR_NAME_1}** ({NFR_CATEGORY})
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
- **Suggested Evidence:** {SUGGESTED_EVIDENCE_SOURCE}
|
||||
- **Impact:** {IMPACT_DESCRIPTION}
|
||||
|
||||
- [ ] **{NFR_NAME_2}** ({NFR_CATEGORY})
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
- **Suggested Evidence:** {SUGGESTED_EVIDENCE_SOURCE}
|
||||
- **Impact:** {IMPACT_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Findings Summary
|
||||
|
||||
| Category | PASS | CONCERNS | FAIL | Overall Status |
|
||||
| --------------- | ---------------- | -------------------- | ---------------- | ----------------------------------- |
|
||||
| Performance | {P_PASS_COUNT} | {P_CONCERNS_COUNT} | {P_FAIL_COUNT} | {P_STATUS} {P_ICON} |
|
||||
| Security | {S_PASS_COUNT} | {S_CONCERNS_COUNT} | {S_FAIL_COUNT} | {S_STATUS} {S_ICON} |
|
||||
| Reliability | {R_PASS_COUNT} | {R_CONCERNS_COUNT} | {R_FAIL_COUNT} | {R_STATUS} {R_ICON} |
|
||||
| Maintainability | {M_PASS_COUNT} | {M_CONCERNS_COUNT} | {M_FAIL_COUNT} | {M_STATUS} {M_ICON} |
|
||||
| **Total** | **{TOTAL_PASS}** | **{TOTAL_CONCERNS}** | **{TOTAL_FAIL}** | **{OVERALL_STATUS} {OVERALL_ICON}** |
|
||||
|
||||
---
|
||||
|
||||
## Gate YAML Snippet
|
||||
|
||||
```yaml
nfr_assessment:
  date: '{DATE}'
  story_id: '{STORY_ID}'
  feature_name: '{FEATURE_NAME}'
  categories:
    performance: '{PERFORMANCE_STATUS}'
    security: '{SECURITY_STATUS}'
    reliability: '{RELIABILITY_STATUS}'
    maintainability: '{MAINTAINABILITY_STATUS}'
  overall_status: '{OVERALL_STATUS}'
  critical_issues: { CRITICAL_COUNT }
  high_priority_issues: { HIGH_COUNT }
  medium_priority_issues: { MEDIUM_COUNT }
  concerns: { CONCERNS_COUNT }
  blockers: { BLOCKER_BOOLEAN } # true/false
  quick_wins: { QUICK_WIN_COUNT }
  evidence_gaps: { EVIDENCE_GAP_COUNT }
  recommendations:
    - '{RECOMMENDATION_1}'
    - '{RECOMMENDATION_2}'
    - '{RECOMMENDATION_3}'
```
|
||||
|
||||
---
|
||||
|
||||
## Related Artifacts
|
||||
|
||||
- **Story File:** {STORY_FILE_PATH} (if applicable)
|
||||
- **Tech Spec:** {TECH_SPEC_PATH} (if available)
|
||||
- **PRD:** {PRD_PATH} (if available)
|
||||
- **Test Design:** {TEST_DESIGN_PATH} (if available)
|
||||
- **Evidence Sources:**
|
||||
- Test Results: {TEST_RESULTS_DIR}
|
||||
- Metrics: {METRICS_DIR}
|
||||
- Logs: {LOGS_DIR}
|
||||
- CI Results: {CI_RESULTS_PATH}
|
||||
|
||||
---
|
||||
|
||||
## Recommendations Summary
|
||||
|
||||
**Release Blocker:** {RELEASE_BLOCKER_SUMMARY}
|
||||
|
||||
**High Priority:** {HIGH_PRIORITY_SUMMARY}
|
||||
|
||||
**Medium Priority:** {MEDIUM_PRIORITY_SUMMARY}
|
||||
|
||||
**Next Steps:** {NEXT_STEPS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**NFR Assessment:**
|
||||
|
||||
- Overall Status: {OVERALL_STATUS} {OVERALL_ICON}
|
||||
- Critical Issues: {CRITICAL_COUNT}
|
||||
- High Priority Issues: {HIGH_COUNT}
|
||||
- Concerns: {CONCERNS_COUNT}
|
||||
- Evidence Gaps: {EVIDENCE_GAP_COUNT}
|
||||
|
||||
**Gate Status:** {GATE_STATUS} {GATE_ICON}
|
||||
|
||||
**Next Actions:**
|
||||
|
||||
- If PASS ✅: Proceed to `*gate` workflow or release
|
||||
- If CONCERNS ⚠️: Address HIGH/CRITICAL issues, re-run `*nfr-assess`
|
||||
- If FAIL ❌: Resolve FAIL status NFRs, re-run `*nfr-assess`
|
||||
|
||||
**Generated:** {DATE}
|
||||
**Workflow:** testarch-nfr v4.0
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
@@ -1,25 +1,107 @@
|
||||
# Test Architect workflow: nfr-assess
|
||||
name: testarch-nfr
|
||||
description: "Assess non-functional requirements before release."
|
||||
description: "Assess non-functional requirements (performance, security, reliability, maintainability) before release with evidence-based validation"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/nfr-assess"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/nfr-report-template.md"
|
||||
|
||||
template: false
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Target specification
|
||||
story_file: "" # Path to story markdown (optional)
|
||||
feature_name: "" # Feature to assess (if no story file)
|
||||
|
||||
# NFR categories to assess
|
||||
assess_performance: true # Response time, throughput, resource usage
|
||||
assess_security: true # Authentication, authorization, data protection
|
||||
assess_reliability: true # Error handling, recovery, availability
|
||||
assess_maintainability: true # Code quality, test coverage, documentation
|
||||
|
||||
# Custom NFR categories (comma-separated)
|
||||
custom_nfr_categories: "" # e.g., "accessibility,internationalization,compliance"
|
||||
|
||||
# Evidence sources
|
||||
test_results_dir: "{project-root}/test-results"
|
||||
metrics_dir: "{project-root}/metrics"
|
||||
logs_dir: "{project-root}/logs"
|
||||
include_ci_results: true # Analyze CI/CD pipeline results
|
||||
|
||||
# Thresholds (can be overridden)
|
||||
performance_response_time_ms: 500 # Target response time
|
||||
performance_throughput_rps: 100 # Target requests per second
|
||||
security_score_min: 85 # Minimum security score (0-100)
|
||||
reliability_uptime_pct: 99.9 # Target uptime percentage
|
||||
maintainability_coverage_pct: 80 # Minimum test coverage
|
||||
|
||||
# Assessment configuration
|
||||
use_deterministic_rules: true # PASS/CONCERNS/FAIL based on evidence
|
||||
never_guess_thresholds: true # Mark as CONCERNS if threshold unknown
|
||||
require_evidence: true # Every NFR must have evidence or be called out
|
||||
suggest_monitoring: true # Recommend monitoring hooks for gaps
|
||||
|
||||
# Integration with BMad artifacts
|
||||
use_tech_spec: true # Load tech-spec.md for NFR requirements
|
||||
use_prd: true # Load PRD.md for NFR context
|
||||
use_test_design: true # Load test-design.md for NFR test plan
|
||||
|
||||
# Output configuration
|
||||
output_file: "{output_folder}/nfr-assessment.md"
|
||||
generate_gate_yaml: true # Create gate YAML snippet with NFR status
|
||||
generate_evidence_checklist: true # Create checklist of evidence gaps
|
||||
update_story_file: false # Add NFR section to story (optional)
|
||||
|
||||
# Quality gates
|
||||
fail_on_critical_nfr: true # Fail if critical NFR has FAIL status
|
||||
warn_on_concerns: true # Warn if any NFR has CONCERNS status
|
||||
block_release_on_fail: true # Block release if NFR assessment fails
|
||||
|
||||
# Advanced options
|
||||
auto_load_knowledge: true # Load nfr-criteria, ci-burn-in fragments
|
||||
include_quick_wins: true # Suggest quick wins for concerns/failures
|
||||
include_recommended_actions: true # Provide actionable remediation steps
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/nfr-assessment.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read story, test results, metrics, logs, BMad artifacts
|
||||
- write_file # Create NFR assessment, gate YAML, evidence checklist
|
||||
- list_files # Discover test results, metrics, logs
|
||||
- search_repo # Find NFR-related tests and evidence
|
||||
- glob # Find result files matching patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with NFR requirements (optional)"
|
||||
- tech_spec: "Technical specification with NFR targets (recommended)"
|
||||
- test_results: "Test execution results (performance, security, etc.)"
|
||||
- metrics: "Application metrics (response times, error rates, etc.)"
|
||||
- logs: "Application logs for reliability analysis"
|
||||
- ci_results: "CI/CD pipeline results for burn-in validation"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- nfr
|
||||
- test-architect
|
||||
- performance
|
||||
- security
|
||||
- reliability
|
||||
|
||||
execution_hints:
|
||||
interactive: false
|
||||
autonomous: true
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
|
||||
web_bundle: false
|
||||
|
||||
378
src/modules/bmm/workflows/testarch/test-design/README.md
Normal file
@@ -0,0 +1,378 @@
|
||||
# Test Design and Risk Assessment Workflow
|
||||
|
||||
Plans comprehensive test coverage strategy with risk assessment (probability × impact scoring), priority classification (P0-P3), and resource estimation. This workflow generates a test design document that identifies high-risk areas, maps requirements to appropriate test levels, and provides execution ordering for optimal feedback.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
bmad tea *test-design
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Planning test coverage before development starts
|
||||
- Assessing risks for an epic or story
|
||||
- Prioritizing test scenarios by business impact
|
||||
- Estimating testing effort and resources
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Story markdown**: Acceptance criteria and requirements
|
||||
- **PRD or epics.md**: High-level product context
|
||||
- **Architecture docs** (optional): Technical constraints and integration points
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `epic_num`: Epic number for scoped design
|
||||
- `story_path`: Specific story for design (optional)
|
||||
- `design_level`: full/targeted/minimal (default: full)
|
||||
- `risk_threshold`: Score for high-priority flag (default: 6)
|
||||
- `risk_categories`: TECH,SEC,PERF,DATA,BUS,OPS (all enabled)
|
||||
- `priority_levels`: P0,P1,P2,P3 (all enabled)
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
**Test Design Document** (`test-design-epic-{N}.md`):
|
||||
|
||||
1. **Risk Assessment Matrix**
|
||||
- Risk ID, category, description
|
||||
- Probability (1-3) × Impact (1-3) = Score
|
||||
- Scores ≥6 flagged as high-priority
|
||||
- Mitigation plans with owners and timelines
|
||||
|
||||
2. **Coverage Matrix**
|
||||
- Requirement → Test Level (E2E/API/Component/Unit)
|
||||
- Priority assignment (P0-P3)
|
||||
- Risk linkage
|
||||
- Test count estimates
|
||||
|
||||
3. **Execution Order**
|
||||
- Smoke tests (P0 subset, <5 min)
|
||||
- P0 tests (critical paths, <10 min)
|
||||
- P1 tests (important features, <30 min)
|
||||
- P2/P3 tests (full regression, <60 min)
|
||||
|
||||
4. **Resource Estimates**
|
||||
- Hours per priority level
|
||||
- Total effort in days
|
||||
- Tooling and data prerequisites
|
||||
|
||||
5. **Quality Gate Criteria**
|
||||
- P0 pass rate: 100%
|
||||
- P1 pass rate: ≥95%
|
||||
- High-risk mitigations: 100%
|
||||
- Coverage target: ≥80%
|
||||
|
||||
## Key Features
|
||||
|
||||
### Risk Scoring Framework
|
||||
|
||||
**Probability × Impact = Risk Score**
|
||||
|
||||
**Probability** (1-3):
|
||||
|
||||
- 1 (Unlikely): <10% chance
|
||||
- 2 (Possible): 10-50% chance
|
||||
- 3 (Likely): >50% chance
|
||||
|
||||
**Impact** (1-3):
|
||||
|
||||
- 1 (Minor): Cosmetic, workaround exists
|
||||
- 2 (Degraded): Feature impaired, difficult workaround
|
||||
- 3 (Critical): System failure, no workaround
|
||||
|
||||
**Scores**:
|
||||
|
||||
- 1-2: Low risk (monitor)
|
||||
- 3-4: Medium risk (plan mitigation)
|
||||
- **6-9: High risk** (immediate mitigation required)
|
||||
|
||||
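To make the banding above concrete, here is a minimal sketch (illustrative only, not part of the workflow files) of the probability × impact calculation; the `Risk` fields are assumed names, not a BMM schema:

```python
from dataclasses import dataclass

@dataclass
class Risk:
    risk_id: str      # e.g. "R-001"
    category: str     # TECH | SEC | PERF | DATA | BUS | OPS
    probability: int  # 1 = unlikely, 2 = possible, 3 = likely
    impact: int       # 1 = minor, 2 = degraded, 3 = critical

def risk_score(risk: Risk) -> int:
    """Score = probability x impact, both constrained to 1-3."""
    if not (1 <= risk.probability <= 3 and 1 <= risk.impact <= 3):
        raise ValueError("probability and impact must be between 1 and 3")
    return risk.probability * risk.impact

def risk_band(score: int) -> str:
    """Map a score onto the bands used in this guide."""
    if score >= 6:
        return "high"    # immediate mitigation required
    if score >= 3:
        return "medium"  # plan mitigation
    return "low"         # monitor

# Possible (2) x Critical (3) = 6 -> high
print(risk_band(risk_score(Risk("R-001", "SEC", 2, 3))))
```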
### Risk Categories (6 types)
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
|
||||
- Architecture flaws, integration failures
|
||||
- Scalability issues, technical debt
|
||||
|
||||
**SEC** (Security):
|
||||
|
||||
- Missing access controls, auth bypass
|
||||
- Data exposure, injection vulnerabilities
|
||||
|
||||
**PERF** (Performance):
|
||||
|
||||
- SLA violations, response time degradation
|
||||
- Resource exhaustion, scalability limits
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
|
||||
- Data loss/corruption, inconsistent state
|
||||
- Migration failures
|
||||
|
||||
**BUS** (Business Impact):
|
||||
|
||||
- UX degradation, business logic errors
|
||||
- Revenue impact, compliance violations
|
||||
|
||||
**OPS** (Operations):
|
||||
|
||||
- Deployment failures, configuration errors
|
||||
- Monitoring gaps, rollback issues
|
||||
|
||||
### Priority Classification (P0-P3)
|
||||
|
||||
**P0 (Critical)** - Run on every commit:
|
||||
|
||||
- Blocks core user journey
|
||||
- High-risk (score ≥6)
|
||||
- Revenue-impacting or security-critical
|
||||
|
||||
**P1 (High)** - Run on PR to main:
|
||||
|
||||
- Important user features
|
||||
- Medium-risk (score 3-4)
|
||||
- Common workflows
|
||||
|
||||
**P2 (Medium)** - Run nightly/weekly:
|
||||
|
||||
- Secondary features
|
||||
- Low-risk (score 1-2)
|
||||
- Edge cases
|
||||
|
||||
**P3 (Low)** - Run on-demand:
|
||||
|
||||
- Nice-to-have, exploratory
|
||||
- Performance benchmarks
|
||||
|
||||
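As a hedged sketch of how these criteria compose (parameter names are illustrative assumptions, not a BMM API), priority assignment can be expressed as a short rule chain:

```python
from typing import Optional

def assign_priority(blocks_core_journey: bool,
                    risk_score: Optional[int],
                    workaround_exists: bool,
                    important_feature: bool) -> str:
    """Illustrative P0-P3 assignment following the criteria above."""
    score = risk_score or 0
    # P0: blocks the core journey AND high risk (>=6) AND no workaround
    if blocks_core_journey and score >= 6 and not workaround_exists:
        return "P0"
    # P1: important user feature or medium risk (3-4)
    if important_feature or 3 <= score <= 4:
        return "P1"
    # P2: secondary features, low-risk areas, edge cases
    if score >= 1:
        return "P2"
    # P3: nice-to-have, exploratory, benchmarks
    return "P3"

print(assign_priority(True, 6, False, True))   # P0
print(assign_priority(False, 4, True, True))   # P1
print(assign_priority(False, 2, True, False))  # P2
```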
### Test Level Selection
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys
|
||||
- Multi-system integration
|
||||
- Highest confidence, slowest
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Service contracts
|
||||
- Business logic validation
|
||||
- Fast feedback, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior
|
||||
- Visual regression
|
||||
- Fast, isolated
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Business logic, edge cases
|
||||
- Error handling
|
||||
- Fastest, most granular
|
||||
|
||||
**Key principle**: Avoid duplicate coverage - don't test same behavior at multiple levels.
|
||||
|
||||
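A minimal sketch of that selection rule (one behaviour, one level), with assumed parameter names rather than anything defined by the workflow:

```python
def select_test_level(critical_user_journey: bool,
                      service_contract_or_business_logic: bool,
                      ui_component_behaviour: bool) -> str:
    """Pick a single level per behaviour to avoid duplicate coverage."""
    if critical_user_journey:
        return "E2E"        # highest confidence, slowest
    if service_contract_or_business_logic:
        return "API"        # fast feedback, stable
    if ui_component_behaviour:
        return "Component"  # fast, isolated
    return "Unit"           # edge cases, algorithms

print(select_test_level(True, False, False))  # E2E
print(select_test_level(False, True, False))  # API
```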
### Knowledge Base Integration
|
||||
|
||||
Automatically consults TEA knowledge base:
|
||||
|
||||
- `risk-governance.md` - Risk classification framework
|
||||
- `probability-impact.md` - Risk scoring methodology
|
||||
- `test-levels-framework.md` - Test level selection
|
||||
- `test-priorities-matrix.md` - P0-P3 prioritization
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before test-design:**
|
||||
|
||||
- **plan-project** (Phase 2): Creates PRD and epics
|
||||
- **solution-architecture** (Phase 3): Defines technical approach
|
||||
- **tech-spec** (Phase 3): Implementation details
|
||||
|
||||
**After test-design:**
|
||||
|
||||
- **atdd**: Generate failing tests for P0 scenarios
|
||||
- **automate**: Expand coverage for P1/P2 scenarios
|
||||
- **gate**: Use quality gate criteria for release decisions
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **framework**: Test infrastructure must exist
|
||||
- **ci**: Execution order maps to CI stages
|
||||
|
||||
**Updates:**
|
||||
|
||||
- `bmm-workflow-status.md`: Adds test design to Quality & Testing Progress
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Evidence-Based Assessment
|
||||
|
||||
**Critical principle**: Base risk assessment on **evidence**, not speculation.
|
||||
|
||||
**Evidence sources:**
|
||||
|
||||
- PRD and user research
|
||||
- Architecture documentation
|
||||
- Historical bug data
|
||||
- User feedback
|
||||
- Security audit results
|
||||
|
||||
**When uncertain**: Document assumptions, request user clarification.
|
||||
|
||||
**Avoid**:
|
||||
|
||||
- Guessing business impact
|
||||
- Assuming user behavior
|
||||
- Inventing requirements
|
||||
|
||||
### Resource Estimation Formula
|
||||
|
||||
```
P0: 2 hours per test (setup + complex scenarios)
P1: 1 hour per test (standard coverage)
P2: 0.5 hours per test (simple scenarios)
P3: 0.25 hours per test (exploratory)

Total Days = Total Hours / 8
```
|
||||
|
||||
Example:
|
||||
|
||||
- 15 P0 × 2h = 30h
|
||||
- 25 P1 × 1h = 25h
|
||||
- 40 P2 × 0.5h = 20h
|
||||
- **Total: 75 hours (~10 days)**
|
||||
|
||||
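The same arithmetic as a small sketch, assuming the per-test rates above and an 8-hour working day (rounded up to whole days):

```python
import math

# Hours-per-test rates from the formula above.
HOURS_PER_TEST = {"P0": 2.0, "P1": 1.0, "P2": 0.5, "P3": 0.25}

def estimate_effort(counts: dict[str, int]) -> tuple[float, int]:
    """Return (total_hours, total_days) for a {'P0': n, ...} count map."""
    total_hours = sum(HOURS_PER_TEST[priority] * n for priority, n in counts.items())
    return total_hours, math.ceil(total_hours / 8)  # 8 working hours per day

hours, days = estimate_effort({"P0": 15, "P1": 25, "P2": 40})
print(f"{hours:.0f} hours (~{days} days)")  # 75 hours (~10 days)
```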
### Execution Order Strategy
|
||||
|
||||
**Smoke tests** (subset of P0, <5 min):
|
||||
|
||||
- Login successful
|
||||
- Dashboard loads
|
||||
- Core API responds
|
||||
|
||||
**Purpose**: Fast feedback, catch build-breaking issues immediately.
|
||||
|
||||
**P0 tests** (critical paths, <10 min):
|
||||
|
||||
- All scenarios blocking user journeys
|
||||
- Security-critical flows
|
||||
|
||||
**P1 tests** (important features, <30 min):
|
||||
|
||||
- Common workflows
|
||||
- Medium-risk areas
|
||||
|
||||
**P2/P3 tests** (full regression, <60 min):
|
||||
|
||||
- Edge cases
|
||||
- Performance benchmarks
|
||||
|
||||
### Quality Gate Criteria
|
||||
|
||||
**Pass/Fail thresholds:**
|
||||
|
||||
- P0: 100% pass (no exceptions)
|
||||
- P1: ≥95% pass (2-3 failures acceptable with waivers)
|
||||
- P2/P3: ≥90% pass (informational)
|
||||
- High-risk items: All mitigated or have approved waivers
|
||||
|
||||
**Coverage targets:**
|
||||
|
||||
- Critical paths: ≥80%
|
||||
- Security scenarios: 100%
|
||||
- Business logic: ≥70%
|
||||
|
||||
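A hedged sketch of how those thresholds could feed a gate decision; this is a simplification (the real `*gate` workflow also supports WAIVED), and the parameter names are assumptions:

```python
def gate_decision(p0_pass_rate: float,
                  p1_pass_rate: float,
                  unmitigated_high_risks: int,
                  critical_path_coverage: float) -> str:
    """Illustrative PASS/CONCERNS/FAIL decision using the thresholds above."""
    if p0_pass_rate < 1.0 or unmitigated_high_risks > 0:
        return "FAIL"      # P0 failures and open high risks block release
    if p1_pass_rate < 0.95 or critical_path_coverage < 0.80:
        return "CONCERNS"  # releasable only with explicit waivers
    return "PASS"

print(gate_decision(1.0, 0.97, 0, 0.84))  # PASS
print(gate_decision(1.0, 0.92, 0, 0.85))  # CONCERNS
```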
## Validation Checklist
|
||||
|
||||
After workflow completion:
|
||||
|
||||
- [ ] Risk assessment complete (all categories)
|
||||
- [ ] Risks scored (probability × impact)
|
||||
- [ ] High-priority risks (≥6) flagged
|
||||
- [ ] Coverage matrix maps requirements to test levels
|
||||
- [ ] Priorities assigned (P0-P3)
|
||||
- [ ] Execution order defined
|
||||
- [ ] Resource estimates provided
|
||||
- [ ] Quality gate criteria defined
|
||||
- [ ] Output file created
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation.
|
||||
|
||||
## Example Execution
|
||||
|
||||
**Scenario: E-commerce checkout epic**
|
||||
|
||||
```bash
|
||||
bmad tea *test-design
|
||||
# Epic 3: Checkout flow redesign
|
||||
|
||||
# Risk Assessment identifies:
|
||||
- R-001 (SEC): Payment bypass, P=2 × I=3 = 6 (HIGH)
|
||||
- R-002 (PERF): Cart load time, P=3 × I=2 = 6 (HIGH)
|
||||
- R-003 (BUS): Order confirmation email, P=2 × I=2 = 4 (MEDIUM)
|
||||
|
||||
# Coverage Plan:
|
||||
P0 scenarios: 12 tests (payment security, order creation)
|
||||
P1 scenarios: 18 tests (cart management, promo codes)
|
||||
P2 scenarios: 25 tests (edge cases, error handling)
|
||||
|
||||
Total effort: 65 hours (~8 days)
|
||||
|
||||
# Test Levels:
|
||||
- E2E: 8 tests (critical checkout path)
|
||||
- API: 30 tests (business logic, payment processing)
|
||||
- Unit: 17 tests (calculations, validations)
|
||||
|
||||
# Execution Order:
|
||||
1. Smoke: Payment successful, order created (2 min)
|
||||
2. P0: All payment & security flows (8 min)
|
||||
3. P1: Cart & promo codes (20 min)
|
||||
4. P2: Edge cases (40 min)
|
||||
|
||||
# Quality Gates:
|
||||
- P0 pass rate: 100%
|
||||
- P1 pass rate: ≥95%
|
||||
- R-001 mitigated: Add payment validation layer
|
||||
- R-002 mitigated: Implement cart caching
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue: "Unable to score risks - missing context"**
|
||||
|
||||
- **Cause**: Insufficient documentation
|
||||
- **Solution**: Request PRD, architecture docs, or user clarification
|
||||
|
||||
**Issue: "All tests marked as P0"**
|
||||
|
||||
- **Cause**: Over-prioritization
|
||||
- **Solution**: Apply strict P0 criteria (blocks core journey + high risk + no workaround)
|
||||
|
||||
**Issue: "Duplicate coverage at multiple test levels"**
|
||||
|
||||
- **Cause**: Not following test pyramid
|
||||
- **Solution**: Use E2E for critical paths only, API for logic, unit for edge cases
|
||||
|
||||
**Issue: "Resource estimates too high"**
|
||||
|
||||
- **Cause**: Complex test setup or insufficient automation
|
||||
- **Solution**: Invest in fixtures/factories upfront, reduce per-test setup time
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **atdd**: Generate failing tests → [atdd/README.md](../atdd/README.md)
|
||||
- **automate**: Expand regression coverage → [automate/README.md](../automate/README.md)
|
||||
- **gate**: Quality gate decisions → [gate/README.md](../gate/README.md)
|
||||
- **framework**: Test infrastructure → [framework/README.md](../framework/README.md)
|
||||
|
||||
## Version History
|
||||
|
||||
- **v4.0 (BMad v6)**: Pure markdown instructions, risk scoring framework, template-based output
|
||||
- **v3.x**: XML format instructions
|
||||
- **v2.x**: Legacy task-based approach
|
||||
234
src/modules/bmm/workflows/testarch/test-design/checklist.md
Normal file
@@ -0,0 +1,234 @@
|
||||
# Test Design and Risk Assessment - Validation Checklist
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- [ ] Story markdown with clear acceptance criteria exists
|
||||
- [ ] PRD or epic documentation available
|
||||
- [ ] Architecture documents available (optional)
|
||||
- [ ] Requirements are testable and unambiguous
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] PRD.md read and requirements extracted
|
||||
- [ ] Epics.md or specific epic documentation loaded
|
||||
- [ ] Story markdown with acceptance criteria analyzed
|
||||
- [ ] Architecture documents reviewed (if available)
|
||||
- [ ] Existing test coverage analyzed
|
||||
- [ ] Knowledge base fragments loaded (risk-governance, probability-impact, test-levels, test-priorities)
|
||||
|
||||
### Step 2: Risk Assessment
|
||||
|
||||
- [ ] Genuine risks identified (not just features)
|
||||
- [ ] Risks classified by category (TECH/SEC/PERF/DATA/BUS/OPS)
|
||||
- [ ] Probability scored (1-3 for each risk)
|
||||
- [ ] Impact scored (1-3 for each risk)
|
||||
- [ ] Risk scores calculated (probability × impact)
|
||||
- [ ] High-priority risks (score ≥6) flagged
|
||||
- [ ] Mitigation plans defined for high-priority risks
|
||||
- [ ] Owners assigned for each mitigation
|
||||
- [ ] Timelines set for mitigations
|
||||
- [ ] Residual risk documented
|
||||
|
||||
### Step 3: Coverage Design
|
||||
|
||||
- [ ] Acceptance criteria broken into atomic scenarios
|
||||
- [ ] Test levels selected (E2E/API/Component/Unit)
|
||||
- [ ] No duplicate coverage across levels
|
||||
- [ ] Priority levels assigned (P0/P1/P2/P3)
|
||||
- [ ] P0 scenarios meet strict criteria (blocks core + high risk + no workaround)
|
||||
- [ ] Data prerequisites identified
|
||||
- [ ] Tooling requirements documented
|
||||
- [ ] Execution order defined (smoke → P0 → P1 → P2/P3)
|
||||
|
||||
### Step 4: Deliverables Generation
|
||||
|
||||
- [ ] Risk assessment matrix created
|
||||
- [ ] Coverage matrix created
|
||||
- [ ] Execution order documented
|
||||
- [ ] Resource estimates calculated
|
||||
- [ ] Quality gate criteria defined
|
||||
- [ ] Output file written to correct location
|
||||
- [ ] Output file uses template structure
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Risk Assessment Matrix
|
||||
|
||||
- [ ] All risks have unique IDs (R-001, R-002, etc.)
|
||||
- [ ] Each risk has category assigned
|
||||
- [ ] Probability values are 1, 2, or 3
|
||||
- [ ] Impact values are 1, 2, or 3
|
||||
- [ ] Scores calculated correctly (P × I)
|
||||
- [ ] High-priority risks (≥6) clearly marked
|
||||
- [ ] Mitigation strategies specific and actionable
|
||||
|
||||
### Coverage Matrix
|
||||
|
||||
- [ ] All requirements mapped to test levels
|
||||
- [ ] Priorities assigned to all scenarios
|
||||
- [ ] Risk linkage documented
|
||||
- [ ] Test counts realistic
|
||||
- [ ] Owners assigned where applicable
|
||||
- [ ] No duplicate coverage (same behavior at multiple levels)
|
||||
|
||||
### Execution Order
|
||||
|
||||
- [ ] Smoke tests defined (<5 min target)
|
||||
- [ ] P0 tests listed (<10 min target)
|
||||
- [ ] P1 tests listed (<30 min target)
|
||||
- [ ] P2/P3 tests listed (<60 min target)
|
||||
- [ ] Order optimizes for fast feedback
|
||||
|
||||
### Resource Estimates
|
||||
|
||||
- [ ] P0 hours calculated (count × 2 hours)
|
||||
- [ ] P1 hours calculated (count × 1 hour)
|
||||
- [ ] P2 hours calculated (count × 0.5 hours)
|
||||
- [ ] P3 hours calculated (count × 0.25 hours)
|
||||
- [ ] Total hours summed
|
||||
- [ ] Days estimate provided (hours / 8)
|
||||
- [ ] Estimates include setup time
|
||||
|
||||
### Quality Gate Criteria
|
||||
|
||||
- [ ] P0 pass rate threshold defined (should be 100%)
|
||||
- [ ] P1 pass rate threshold defined (typically ≥95%)
|
||||
- [ ] High-risk mitigation completion required
|
||||
- [ ] Coverage targets specified (≥80% recommended)
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Evidence-Based Assessment
|
||||
|
||||
- [ ] Risk assessment based on documented evidence
|
||||
- [ ] No speculation on business impact
|
||||
- [ ] Assumptions clearly documented
|
||||
- [ ] Clarifications requested where needed
|
||||
- [ ] Historical data referenced where available
|
||||
|
||||
### Risk Classification Accuracy
|
||||
|
||||
- [ ] TECH risks are architecture/integration issues
|
||||
- [ ] SEC risks are security vulnerabilities
|
||||
- [ ] PERF risks are performance/scalability concerns
|
||||
- [ ] DATA risks are data integrity issues
|
||||
- [ ] BUS risks are business/revenue impacts
|
||||
- [ ] OPS risks are deployment/operational issues
|
||||
|
||||
### Priority Assignment Accuracy
|
||||
|
||||
- [ ] P0: Truly blocks core functionality
|
||||
- [ ] P0: High-risk (score ≥6)
|
||||
- [ ] P0: No workaround exists
|
||||
- [ ] P1: Important but not blocking
|
||||
- [ ] P2/P3: Nice-to-have or edge cases
|
||||
|
||||
### Test Level Selection
|
||||
|
||||
- [ ] E2E used only for critical paths
|
||||
- [ ] API tests cover complex business logic
|
||||
- [ ] Component tests for UI interactions
|
||||
- [ ] Unit tests for edge cases and algorithms
|
||||
- [ ] No redundant coverage
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] risk-governance.md consulted
|
||||
- [ ] probability-impact.md applied
|
||||
- [ ] test-levels-framework.md referenced
|
||||
- [ ] test-priorities-matrix.md used
|
||||
- [ ] Additional fragments loaded as needed
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] bmm-workflow-status.md exists
|
||||
- [ ] Test design logged in Quality & Testing Progress
|
||||
- [ ] Epic number and scope documented
|
||||
- [ ] Completion timestamp recorded
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] Can proceed to `atdd` workflow with P0 scenarios
|
||||
- [ ] Can proceed to `automate` workflow with full coverage plan
|
||||
- [ ] Risk assessment informs `gate` workflow criteria
|
||||
- [ ] Integrates with `ci` workflow execution order
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All must be true:**
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All process steps completed
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] Output file complete and well-formatted
|
||||
- [ ] Team review scheduled (if required)
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Review risk assessment with team
|
||||
2. [ ] Prioritize mitigation for high-priority risks (score ≥6)
|
||||
3. [ ] Allocate resources per estimates
|
||||
4. [ ] Run `atdd` workflow to generate P0 tests
|
||||
5. [ ] Set up test data factories and fixtures
|
||||
6. [ ] Schedule team review of test design document
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `atdd` workflow for P0 test generation
|
||||
2. [ ] Run `framework` workflow if not already done
|
||||
3. [ ] Run `ci` workflow to configure pipeline stages
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails:
|
||||
|
||||
1. [ ] Delete output file
|
||||
2. [ ] Review error logs
|
||||
3. [ ] Fix missing context (PRD, architecture docs)
|
||||
4. [ ] Clarify ambiguous requirements
|
||||
5. [ ] Retry workflow
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: Too many P0 tests
|
||||
|
||||
- **Solution**: Apply strict P0 criteria - must block core AND high risk AND no workaround
|
||||
|
||||
**Issue**: Risk scores all high
|
||||
|
||||
- **Solution**: Differentiate between critical (3) and degraded (2) impact ratings
|
||||
|
||||
**Issue**: Duplicate coverage across levels
|
||||
|
||||
- **Solution**: Use test pyramid - E2E for critical paths only
|
||||
|
||||
**Issue**: Resource estimates too high
|
||||
|
||||
- **Solution**: Invest in fixtures/factories to reduce per-test setup time
|
||||
|
||||
### Best Practices
|
||||
|
||||
- Base risk assessment on evidence, not assumptions
|
||||
- High-priority risks (≥6) require immediate mitigation
|
||||
- P0 tests should cover <10% of total scenarios
|
||||
- Avoid testing same behavior at multiple levels
|
||||
- Include smoke tests (P0 subset) for fast feedback
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items validated.
|
||||
|
||||
**Completed by:** **\*\***\_\_\_**\*\***
|
||||
**Date:** **\*\***\_\_\_**\*\***
|
||||
**Epic:** **\*\***\_\_\_**\*\***
|
||||
**Notes:** **********\*\***********\_\_\_**********\*\***********
|
||||
@@ -1,44 +1,504 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Risk and Test Design v3.1
|
||||
# Test Design and Risk Assessment
|
||||
|
||||
```xml
|
||||
<task id="bmad/bmm/testarch/test-design" name="Risk and Test Design">
|
||||
<llm critical="true">
|
||||
<i>Preflight requirements:</i>
|
||||
<i>- Story markdown, acceptance criteria, PRD/architecture context are available.</i>
|
||||
</llm>
|
||||
<flow>
|
||||
<step n="1" title="Preflight">
|
||||
<action>Confirm inputs; halt if any are missing or unclear.</action>
|
||||
</step>
|
||||
<step n="2" title="Assess Risks">
|
||||
<action>Use `{project-root}/bmad/bmm/testarch/tea-index.csv` to load the `risk-governance`, `probability-impact`, and `test-levels` fragments before scoring.</action>
|
||||
<action>Filter requirements to isolate genuine risks; review PRD/architecture/story for unresolved gaps.</action>
|
||||
<action>Classify risks across TECH, SEC, PERF, DATA, BUS, OPS; request clarification when evidence is missing.</action>
|
||||
<action>Score probability (1 unlikely, 2 possible, 3 likely) and impact (1 minor, 2 degraded, 3 critical); compute totals and highlight scores ≥6.</action>
|
||||
<action>Plan mitigations with owners, timelines, and update residual risk expectations.</action>
|
||||
</step>
|
||||
<step n="3" title="Design Coverage">
|
||||
<action>Break acceptance criteria into atomic scenarios tied to mitigations.</action>
|
||||
<action>Load the `test-levels` fragment (knowledge/test-levels-framework.md) to select appropriate levels and avoid duplicate coverage.</action>
|
||||
<action>Load the `test-priorities` fragment (knowledge/test-priorities-matrix.md) to assign P0–P3 priorities and outline data/tooling prerequisites.</action>
|
||||
</step>
|
||||
<step n="4" title="Deliverables">
|
||||
<action>Create risk assessment markdown (category/probability/impact/score) with mitigation matrix and gate snippet totals.</action>
|
||||
<action>Produce coverage matrix (requirement/level/priority/mitigation) plus recommended execution order.</action>
|
||||
</step>
|
||||
</flow>
|
||||
<halt>
|
||||
<i>If story data or criteria are missing, halt and request them.</i>
|
||||
</halt>
|
||||
<notes>
|
||||
<i>Category definitions: TECH=architecture flaws; SEC=missing controls; PERF=SLA risk; DATA=loss/corruption; BUS=user/business harm; OPS=deployment/run failures.</i>
|
||||
<i>Leverage `tea-index.csv` tags to find supporting evidence (e.g., fixture-architecture, selective-testing) without loading unnecessary files.</i>
|
||||
<i>Rely on evidence, not speculation; tie scenarios back to mitigations; keep scenarios independent and maintainable.</i>
|
||||
</notes>
|
||||
<output>
|
||||
<i>Unified risk assessment and coverage strategy ready for implementation.</i>
|
||||
</output>
|
||||
</task>
```
|
||||
**Workflow ID**: `bmad/bmm/testarch/test-design`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Plans comprehensive test coverage strategy with risk assessment, priority classification, and execution ordering. This workflow generates a test design document that identifies high-risk areas, maps requirements to test levels, prioritizes scenarios (P0-P3), and provides resource estimates for the testing effort.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Story markdown with acceptance criteria available
|
||||
- ✅ PRD or epic documentation exists for context
|
||||
- ✅ Architecture documents available (optional but recommended)
|
||||
- ✅ Requirements are clear and testable
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Load Context and Requirements
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Read Requirements Documentation**
|
||||
- Load PRD.md for high-level product requirements
|
||||
- Read epics.md or specific epic for feature scope
|
||||
- Read story markdown for detailed acceptance criteria
|
||||
- Identify all testable requirements
|
||||
|
||||
2. **Load Architecture Context**
|
||||
- Read solution-architecture.md for system design
|
||||
- Read tech-spec for implementation details
|
||||
- Identify technical constraints and dependencies
|
||||
- Note integration points and external systems
|
||||
|
||||
3. **Analyze Existing Test Coverage**
|
||||
- Search for existing test files in `{test_dir}`
|
||||
- Identify coverage gaps
|
||||
- Note areas with insufficient testing
|
||||
- Check for flaky or outdated tests
|
||||
|
||||
4. **Load Knowledge Base Fragments**
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to load:
|
||||
- `risk-governance.md` - Risk classification framework
|
||||
- `probability-impact.md` - Risk scoring methodology
|
||||
- `test-levels-framework.md` - Test level selection guidance
|
||||
- `test-priorities-matrix.md` - P0-P3 prioritization criteria
|
||||
|
||||
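The exact schema of `tea-index.csv` is not specified here; as a hedged illustration only, assuming the index exposes at least `tag` and `path` columns, fragment lookup could be sketched as:

```python
import csv
from pathlib import Path

def find_fragments(index_path: Path, wanted_tags: set[str]) -> list[Path]:
    """Return knowledge-fragment paths whose tag matches one of wanted_tags.

    Assumes tea-index.csv exposes at least 'tag' and 'path' columns; adjust
    the field names to the real index schema.
    """
    matches: list[Path] = []
    with index_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            if row.get("tag") in wanted_tags:
                matches.append(index_path.parent / row["path"])
    return matches

# e.g. find_fragments(Path("bmad/bmm/testarch/tea-index.csv"),
#                     {"risk-governance", "probability-impact"})
```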
**Halt Condition:** If story data or acceptance criteria are missing, HALT with message: "Test design requires clear requirements and acceptance criteria"
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Assess and Classify Risks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Identify Genuine Risks**
|
||||
|
||||
Filter requirements to isolate actual risks (not just features):
|
||||
- Unresolved technical gaps
|
||||
- Security vulnerabilities
|
||||
- Performance bottlenecks
|
||||
- Data loss or corruption potential
|
||||
- Business impact failures
|
||||
- Operational deployment issues
|
||||
|
||||
2. **Classify Risks by Category**
|
||||
|
||||
Use these standard risk categories:
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
- Architecture flaws
|
||||
- Integration failures
|
||||
- Scalability issues
|
||||
- Technical debt
|
||||
|
||||
**SEC** (Security):
|
||||
- Missing access controls
|
||||
- Authentication bypass
|
||||
- Data exposure
|
||||
- Injection vulnerabilities
|
||||
|
||||
**PERF** (Performance):
|
||||
- SLA violations
|
||||
- Response time degradation
|
||||
- Resource exhaustion
|
||||
- Scalability limits
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
- Data loss
|
||||
- Data corruption
|
||||
- Inconsistent state
|
||||
- Migration failures
|
||||
|
||||
**BUS** (Business Impact):
|
||||
- User experience degradation
|
||||
- Business logic errors
|
||||
- Revenue impact
|
||||
- Compliance violations
|
||||
|
||||
**OPS** (Operations):
|
||||
- Deployment failures
|
||||
- Configuration errors
|
||||
- Monitoring gaps
|
||||
- Rollback issues
|
||||
|
||||
3. **Score Risk Probability**
|
||||
|
||||
Rate likelihood (1-3):
|
||||
- **1 (Unlikely)**: <10% chance, edge case
|
||||
- **2 (Possible)**: 10-50% chance, known scenario
|
||||
- **3 (Likely)**: >50% chance, common occurrence
|
||||
|
||||
4. **Score Risk Impact**
|
||||
|
||||
Rate severity (1-3):
|
||||
- **1 (Minor)**: Cosmetic, workaround exists, limited users
|
||||
- **2 (Degraded)**: Feature impaired, workaround difficult, affects many users
|
||||
- **3 (Critical)**: System failure, data loss, no workaround, blocks usage
|
||||
|
||||
5. **Calculate Risk Score**
|
||||
|
||||
```
Risk Score = Probability × Impact

Scores:
1-2: Low risk (monitor)
3-4: Medium risk (plan mitigation)
6-9: High risk (immediate mitigation required)
```
|
||||
|
||||
6. **Highlight High-Priority Risks**
|
||||
|
||||
Flag all risks with score ≥6 for immediate attention.
|
||||
|
||||
7. **Request Clarification**
|
||||
|
||||
If evidence is missing or assumptions required:
|
||||
- Document assumptions clearly
|
||||
- Request user clarification
|
||||
- Do NOT speculate on business impact
|
||||
|
||||
8. **Plan Mitigations**
|
||||
|
||||
For each high-priority risk:
|
||||
- Define mitigation strategy
|
||||
- Assign owner (dev, QA, ops)
|
||||
- Set timeline
|
||||
- Update residual risk expectation
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Design Test Coverage
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Break Down Acceptance Criteria**
|
||||
|
||||
Convert each acceptance criterion into atomic test scenarios:
|
||||
- One scenario per testable behavior
|
||||
- Scenarios are independent
|
||||
- Scenarios are repeatable
|
||||
- Scenarios tie back to risk mitigations
|
||||
|
||||
2. **Select Appropriate Test Levels**
|
||||
|
||||
**Knowledge Base Reference**: `test-levels-framework.md`
|
||||
|
||||
Map requirements to optimal test levels (avoid duplication):
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
- Critical user journeys
|
||||
- Multi-system integration
|
||||
- Production-like environment
|
||||
- Highest confidence, slowest execution
|
||||
|
||||
**API (Integration)**:
|
||||
- Service contracts
|
||||
- Business logic validation
|
||||
- Fast feedback
|
||||
- Good for complex scenarios
|
||||
|
||||
**Component**:
|
||||
- UI component behavior
|
||||
- Interaction testing
|
||||
- Visual regression
|
||||
- Fast, isolated
|
||||
|
||||
**Unit**:
|
||||
- Business logic
|
||||
- Edge cases
|
||||
- Error handling
|
||||
- Fastest, most granular
|
||||
|
||||
**Avoid duplicate coverage**: Don't test same behavior at multiple levels unless necessary.
|
||||
|
||||
3. **Assign Priority Levels**
|
||||
|
||||
**Knowledge Base Reference**: `test-priorities-matrix.md`
|
||||
|
||||
**P0 (Critical)**:
|
||||
- Blocks core user journey
|
||||
- High-risk areas (score ≥6)
|
||||
- Revenue-impacting
|
||||
- Security-critical
|
||||
- **Run on every commit**
|
||||
|
||||
**P1 (High)**:
|
||||
- Important user features
|
||||
- Medium-risk areas (score 3-4)
|
||||
- Common workflows
|
||||
- **Run on PR to main**
|
||||
|
||||
**P2 (Medium)**:
|
||||
- Secondary features
|
||||
- Low-risk areas (score 1-2)
|
||||
- Edge cases
|
||||
- **Run nightly or weekly**
|
||||
|
||||
**P3 (Low)**:
|
||||
- Nice-to-have
|
||||
- Exploratory
|
||||
- Performance benchmarks
|
||||
- **Run on-demand**
|
||||
|
||||
4. **Outline Data and Tooling Prerequisites**
|
||||
|
||||
For each test scenario, identify:
|
||||
- Test data requirements (factories, fixtures)
|
||||
- External services (mocks, stubs)
|
||||
- Environment setup
|
||||
- Tools and dependencies
|
||||
|
||||
5. **Define Execution Order**
|
||||
|
||||
Recommend test execution sequence:
|
||||
1. **Smoke tests** (P0 subset, <5 min)
|
||||
2. **P0 tests** (critical paths, <10 min)
|
||||
3. **P1 tests** (important features, <30 min)
|
||||
4. **P2/P3 tests** (full regression, <60 min)
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Generate Deliverables
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Risk Assessment Matrix**
|
||||
|
||||
Use template structure:
|
||||
|
||||
```markdown
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation |
|
||||
| ------- | -------- | ----------- | ----------- | ------ | ----- | --------------- |
|
||||
| R-001 | SEC | Auth bypass | 2 | 3 | 6 | Add authz check |
|
||||
```
|
||||
|
||||
2. **Create Coverage Matrix**
|
||||
|
||||
```markdown
|
||||
| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
|
||||
| ----------- | ---------- | -------- | --------- | ---------- | ----- |
|
||||
| Login flow | E2E | P0 | R-001 | 3 | QA |
|
||||
```
|
||||
|
||||
3. **Document Execution Order**
|
||||
|
||||
```markdown
|
||||
### Smoke Tests (<5 min)
|
||||
|
||||
- Login successful
|
||||
- Dashboard loads
|
||||
|
||||
### P0 Tests (<10 min)
|
||||
|
||||
- [Full P0 list]
|
||||
|
||||
### P1 Tests (<30 min)
|
||||
|
||||
- [Full P1 list]
|
||||
```
|
||||
|
||||
4. **Include Resource Estimates**
|
||||
|
||||
```markdown
|
||||
### Test Effort Estimates
|
||||
|
||||
- P0 scenarios: 15 tests × 2 hours = 30 hours
|
||||
- P1 scenarios: 25 tests × 1 hour = 25 hours
|
||||
- P2 scenarios: 40 tests × 0.5 hour = 20 hours
|
||||
- **Total:** 75 hours (~10 days)
|
||||
```
|
||||
|
||||
5. **Add Gate Criteria**
|
||||
|
||||
```markdown
|
||||
### Quality Gate Criteria
|
||||
|
||||
- All P0 tests pass (100%)
|
||||
- P1 tests pass rate ≥95%
|
||||
- No high-risk (score ≥6) items unmitigated
|
||||
- Test coverage ≥80% for critical paths
|
||||
```
|
||||
|
||||
6. **Write to Output File**
|
||||
|
||||
Save to `{output_folder}/test-design-epic-{epic_num}.md` using template structure.
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Risk Category Definitions
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
|
||||
- Architecture flaws or technical debt
|
||||
- Integration complexity
|
||||
- Scalability concerns
|
||||
|
||||
**SEC** (Security):
|
||||
|
||||
- Missing security controls
|
||||
- Authentication/authorization gaps
|
||||
- Data exposure risks
|
||||
|
||||
**PERF** (Performance):
|
||||
|
||||
- SLA risk or performance degradation
|
||||
- Resource constraints
|
||||
- Scalability bottlenecks
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
|
||||
- Data loss or corruption potential
|
||||
- State consistency issues
|
||||
- Migration risks
|
||||
|
||||
**BUS** (Business Impact):
|
||||
|
||||
- User experience harm
|
||||
- Business logic errors
|
||||
- Revenue or compliance impact
|
||||
|
||||
**OPS** (Operations):
|
||||
|
||||
- Deployment or runtime failures
|
||||
- Configuration issues
|
||||
- Monitoring/observability gaps
|
||||
|
||||
### Risk Scoring Methodology
|
||||
|
||||
**Probability × Impact = Risk Score**
|
||||
|
||||
Examples:
|
||||
|
||||
- High likelihood (3) × Critical impact (3) = **Score 9** (highest priority)
|
||||
- Possible (2) × Critical (3) = **Score 6** (high priority threshold)
|
||||
- Unlikely (1) × Minor (1) = **Score 1** (low priority)
|
||||
|
||||
**Threshold**: Scores ≥6 require immediate mitigation.
|
||||
|
||||
### Test Level Selection Strategy
|
||||
|
||||
**Avoid duplication:**
|
||||
|
||||
- Don't test same behavior at E2E and API level
|
||||
- Use E2E for critical paths only
|
||||
- Use API tests for complex business logic
|
||||
- Use unit tests for edge cases
|
||||
|
||||
**Tradeoffs:**
|
||||
|
||||
- E2E: High confidence, slow execution, brittle
|
||||
- API: Good balance, fast, stable
|
||||
- Unit: Fastest feedback, narrow scope
|
||||
|
||||
### Priority Assignment Guidelines
|
||||
|
||||
**P0 criteria** (all must be true):
|
||||
|
||||
- Blocks core functionality
|
||||
- High-risk (score ≥6)
|
||||
- No workaround exists
|
||||
- Affects majority of users
|
||||
|
||||
**P1 criteria**:
|
||||
|
||||
- Important feature
|
||||
- Medium risk (score 3-4)
|
||||
- Workaround exists but difficult
|
||||
|
||||
**P2/P3**: Everything else, prioritized by value
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Auto-load enabled:**
|
||||
|
||||
- `risk-governance.md` - Risk framework
|
||||
- `probability-impact.md` - Scoring guide
|
||||
- `test-levels-framework.md` - Level selection
|
||||
- `test-priorities-matrix.md` - Priority assignment
|
||||
|
||||
**Manual reference:**
|
||||
|
||||
- Use `tea-index.csv` to find additional fragments
|
||||
- Load `selective-testing.md` for execution strategy
|
||||
- Load `fixture-architecture.md` for data setup patterns
|
||||
|
||||
### Evidence-Based Assessment
|
||||
|
||||
**Critical principle:** Base risk assessment on evidence, not speculation.
|
||||
|
||||
**Evidence sources:**
|
||||
|
||||
- PRD and user research
|
||||
- Architecture documentation
|
||||
- Historical bug data
|
||||
- User feedback
|
||||
- Security audit results
|
||||
|
||||
**Avoid:**
|
||||
|
||||
- Guessing business impact
|
||||
- Assuming user behavior
|
||||
- Inventing requirements
|
||||
|
||||
**When uncertain:** Document assumptions and request clarification from user.
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## Test Design Complete
|
||||
|
||||
**Epic**: {epic_num}
|
||||
**Scope**: {design_level}
|
||||
|
||||
**Risk Assessment**:
|
||||
|
||||
- Total risks identified: {count}
|
||||
- High-priority risks (≥6): {high_count}
|
||||
- Categories: {categories}
|
||||
|
||||
**Coverage Plan**:
|
||||
|
||||
- P0 scenarios: {p0_count} ({p0_hours} hours)
|
||||
- P1 scenarios: {p1_count} ({p1_hours} hours)
|
||||
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
|
||||
- **Total effort**: {total_hours} hours (~{total_days} days)
|
||||
|
||||
**Test Levels**:
|
||||
|
||||
- E2E: {e2e_count}
|
||||
- API: {api_count}
|
||||
- Component: {component_count}
|
||||
- Unit: {unit_count}
|
||||
|
||||
**Quality Gate Criteria**:
|
||||
|
||||
- P0 pass rate: 100%
|
||||
- P1 pass rate: ≥95%
|
||||
- High-risk mitigations: 100%
|
||||
- Coverage: ≥80%
|
||||
|
||||
**Output File**: {output_file}
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Review risk assessment with team
|
||||
2. Prioritize mitigation for high-risk items (score ≥6)
|
||||
3. Run `atdd` workflow to generate failing tests for P0 scenarios
|
||||
4. Allocate resources per effort estimates
|
||||
5. Set up test data factories and fixtures
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Risk assessment complete with all categories
|
||||
- [ ] All risks scored (probability × impact)
|
||||
- [ ] High-priority risks (≥6) flagged
|
||||
- [ ] Coverage matrix maps requirements to test levels
|
||||
- [ ] Priority levels assigned (P0-P3)
|
||||
- [ ] Execution order defined
|
||||
- [ ] Resource estimates provided
|
||||
- [ ] Quality gate criteria defined
|
||||
- [ ] Output file created and formatted correctly
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
@@ -0,0 +1,285 @@
|
||||
# Test Design: Epic {epic_num} - {epic_title}
|
||||
|
||||
**Date:** {date}
|
||||
**Author:** {user_name}
|
||||
**Status:** Draft / Approved
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Scope:** {design_level} test design for Epic {epic_num}
|
||||
|
||||
**Risk Summary:**
|
||||
|
||||
- Total risks identified: {total_risks}
|
||||
- High-priority risks (≥6): {high_priority_count}
|
||||
- Critical categories: {top_categories}
|
||||
|
||||
**Coverage Summary:**
|
||||
|
||||
- P0 scenarios: {p0_count} ({p0_hours} hours)
|
||||
- P1 scenarios: {p1_count} ({p1_hours} hours)
|
||||
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
|
||||
- **Total effort**: {total_hours} hours (~{total_days} days)
|
||||
|
||||
---
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
### High-Priority Risks (Score ≥6)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner | Timeline |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------------ | ------- | -------- |
|
||||
| R-001 | SEC | {description} | 2 | 3 | 6 | {mitigation} | {owner} | {date} |
|
||||
| R-002 | PERF | {description} | 3 | 2 | 6 | {mitigation} | {owner} | {date} |
|
||||
|
||||
### Medium-Priority Risks (Score 3-4)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------------ | ------- |
|
||||
| R-003 | TECH | {description} | 2 | 2 | 4 | {mitigation} | {owner} |
|
||||
| R-004 | DATA | {description} | 1 | 3 | 3 | {mitigation} | {owner} |
|
||||
|
||||
### Low-Priority Risks (Score 1-2)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Action |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------- |
|
||||
| R-005 | OPS | {description} | 1 | 2 | 2 | Monitor |
|
||||
| R-006 | BUS | {description} | 1 | 1 | 1 | Monitor |
|
||||
|
||||
### Risk Category Legend
|
||||
|
||||
- **TECH**: Technical/Architecture (flaws, integration, scalability)
|
||||
- **SEC**: Security (access controls, auth, data exposure)
|
||||
- **PERF**: Performance (SLA violations, degradation, resource limits)
|
||||
- **DATA**: Data Integrity (loss, corruption, inconsistency)
|
||||
- **BUS**: Business Impact (UX harm, logic errors, revenue)
|
||||
- **OPS**: Operations (deployment, config, monitoring)
|
||||
|
||||
---
|
||||
|
||||
## Test Coverage Plan
|
||||
|
||||
### P0 (Critical) - Run on every commit
|
||||
|
||||
**Criteria**: Blocks core journey + High risk (≥6) + No workaround
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | E2E | R-001 | 3 | QA | {notes} |
|
||||
| {requirement} | API | R-002 | 5 | QA | {notes} |
|
||||
|
||||
**Total P0**: {p0_count} tests, {p0_hours} hours
|
||||
|
||||
### P1 (High) - Run on PR to main
|
||||
|
||||
**Criteria**: Important features + Medium risk (3-4) + Common workflows
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | API | R-003 | 4 | QA | {notes} |
|
||||
| {requirement} | Component | - | 6 | DEV | {notes} |
|
||||
|
||||
**Total P1**: {p1_count} tests, {p1_hours} hours
|
||||
|
||||
### P2 (Medium) - Run nightly/weekly
|
||||
|
||||
**Criteria**: Secondary features + Low risk (1-2) + Edge cases
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | API | R-004 | 8 | QA | {notes} |
|
||||
| {requirement} | Unit | - | 15 | DEV | {notes} |
|
||||
|
||||
**Total P2**: {p2_count} tests, {p2_hours} hours
|
||||
|
||||
### P3 (Low) - Run on-demand
|
||||
|
||||
**Criteria**: Nice-to-have + Exploratory + Performance benchmarks
|
||||
|
||||
| Requirement | Test Level | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | ---------- | ----- | ------- |
|
||||
| {requirement} | E2E | 2 | QA | {notes} |
|
||||
| {requirement} | Unit | 8 | DEV | {notes} |
|
||||
|
||||
**Total P3**: {p3_count} tests, {p3_hours} hours
|
||||
|
||||
---
|
||||
|
||||
## Execution Order
|
||||
|
||||
### Smoke Tests (<5 min)
|
||||
|
||||
**Purpose**: Fast feedback, catch build-breaking issues
|
||||
|
||||
- [ ] {scenario} (30s)
|
||||
- [ ] {scenario} (45s)
|
||||
- [ ] {scenario} (1min)
|
||||
|
||||
**Total**: {smoke_count} scenarios
|
||||
|
||||
### P0 Tests (<10 min)
|
||||
|
||||
**Purpose**: Critical path validation
|
||||
|
||||
- [ ] {scenario} (E2E)
|
||||
- [ ] {scenario} (API)
|
||||
- [ ] {scenario} (API)
|
||||
|
||||
**Total**: {p0_count} scenarios
|
||||
|
||||
### P1 Tests (<30 min)
|
||||
|
||||
**Purpose**: Important feature coverage
|
||||
|
||||
- [ ] {scenario} (API)
|
||||
- [ ] {scenario} (Component)
|
||||
|
||||
**Total**: {p1_count} scenarios
|
||||
|
||||
### P2/P3 Tests (<60 min)
|
||||
|
||||
**Purpose**: Full regression coverage
|
||||
|
||||
- [ ] {scenario} (Unit)
|
||||
- [ ] {scenario} (API)
|
||||
|
||||
**Total**: {p2p3_count} scenarios
|
||||
|
||||
---
## Resource Estimates

### Test Development Effort

| Priority  | Count             | Hours/Test | Total Hours       | Notes                   |
| --------- | ----------------- | ---------- | ----------------- | ----------------------- |
| P0        | {p0_count}        | 2.0        | {p0_hours}        | Complex setup, security |
| P1        | {p1_count}        | 1.0        | {p1_hours}        | Standard coverage       |
| P2        | {p2_count}        | 0.5        | {p2_hours}        | Simple scenarios        |
| P3        | {p3_count}        | 0.25       | {p3_hours}        | Exploratory             |
| **Total** | **{total_count}** | **-**      | **{total_hours}** | **~{total_days} days**  |

### Prerequisites

**Test Data:**

- {factory_name} factory (faker-based, auto-cleanup)
- {fixture_name} fixture (setup/teardown)

**Tooling:**

- {tool} for {purpose}
- {tool} for {purpose}

**Environment:**

- {env_requirement}
- {env_requirement}

---
## Quality Gate Criteria

### Pass/Fail Thresholds

- **P0 pass rate**: 100% (no exceptions)
- **P1 pass rate**: ≥95% (waivers required for failures)
- **P2/P3 pass rate**: ≥90% (informational)
- **High-risk mitigations**: 100% complete or approved waivers

### Coverage Targets

- **Critical paths**: ≥80%
- **Security scenarios**: 100%
- **Business logic**: ≥70%
- **Edge cases**: ≥50%

### Non-Negotiable Requirements

- [ ] All P0 tests pass
- [ ] No high-risk (≥6) items unmitigated
- [ ] Security tests (SEC category) pass 100%
- [ ] Performance targets met (PERF category)

---
## Mitigation Plans

### R-001: {Risk Description} (Score: 6)

**Mitigation Strategy:** {detailed_mitigation}
**Owner:** {owner}
**Timeline:** {date}
**Status:** Planned / In Progress / Complete
**Verification:** {how_to_verify}

### R-002: {Risk Description} (Score: 6)

**Mitigation Strategy:** {detailed_mitigation}
**Owner:** {owner}
**Timeline:** {date}
**Status:** Planned / In Progress / Complete
**Verification:** {how_to_verify}

---

## Assumptions and Dependencies

### Assumptions

1. {assumption}
2. {assumption}
3. {assumption}

### Dependencies

1. {dependency} - Required by {date}
2. {dependency} - Required by {date}

### Risks to Plan

- **Risk**: {risk_to_plan}
- **Impact**: {impact}
- **Contingency**: {contingency}

---

## Approval

**Test Design Approved By:**

- [ ] Product Manager: _______________ Date: _______________
- [ ] Tech Lead: _______________ Date: _______________
- [ ] QA Lead: _______________ Date: _______________

**Comments:**

---

## Appendix

### Knowledge Base References

- `risk-governance.md` - Risk classification framework
- `probability-impact.md` - Risk scoring methodology
- `test-levels-framework.md` - Test level selection
- `test-priorities-matrix.md` - P0-P3 prioritization

### Related Documents

- PRD: {prd_link}
- Epic: {epic_link}
- Architecture: {arch_link}
- Tech Spec: {tech_spec_link}

---

**Generated by**: BMad TEA Agent - Test Architect Module
**Workflow**: `bmad/bmm/testarch/test-design`
**Version**: 4.0 (BMad v6)

@@ -1,25 +1,79 @@
# Test Architect workflow: test-design
name: testarch-plan
description: "Plan risk mitigation and test coverage before development."
name: testarch-test-design
description: "Plan risk mitigation and test coverage strategy before development with risk assessment and prioritization"
author: "BMad"

# Critical variables from config
config_source: "{project-root}/bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
date: system-generated

# Workflow components
installed_path: "{project-root}/bmad/bmm/workflows/testarch/test-design"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/test-design-template.md"
template: false

# Variables and inputs
variables:
  # Target scope
  epic_num: "" # Epic number for scoped design
  story_path: "" # Specific story for design (optional)
  design_level: "full" # full, targeted, minimal

  # Risk assessment configuration
  risk_assessment_enabled: true
  risk_threshold: 6 # Scores >= 6 are high-priority (probability × impact)
  risk_categories: "TECH,SEC,PERF,DATA,BUS,OPS" # Comma-separated

  # Coverage planning
  priority_levels: "P0,P1,P2,P3" # Test priorities
  test_levels: "e2e,api,integration,unit,component" # Test levels to consider
  selective_testing_strategy: "risk-based" # risk-based, coverage-based, hybrid

  # Output configuration
  output_file: "{output_folder}/test-design-epic-{epic_num}.md"
  include_risk_matrix: true
  include_coverage_matrix: true
  include_execution_order: true
  include_resource_estimates: true

  # Advanced options
  auto_load_knowledge: true # Load relevant knowledge fragments
  include_mitigation_plan: true
  include_gate_criteria: true
  standalone_mode: false # Can run without epic context

# Output configuration
default_output_file: "{output_folder}/test-design-epic-{epic_num}.md"

# Required tools
required_tools:
  - read_file # Read PRD, epics, stories, architecture docs
  - write_file # Create test design document
  - list_files # Find related documentation
  - search_repo # Search for existing tests and patterns

# Recommended inputs
recommended_inputs:
  - prd: "Product Requirements Document for context"
  - epics: "Epic documentation (epics.md or specific epic)"
  - story: "Story markdown with acceptance criteria"
  - architecture: "Architecture documents (solution-architecture.md, tech-spec)"
  - existing_tests: "Current test coverage for gap analysis"

tags:
  - qa
  - planning
  - test-architect
  - risk-assessment
  - coverage

execution_hints:
  interactive: false
  autonomous: true
  interactive: false # Minimize prompts
  autonomous: true # Proceed without user input unless blocked
  iterative: true

web_bundle: false
775 src/modules/bmm/workflows/testarch/test-review/README.md Normal file
@@ -0,0 +1,775 @@
|
||||
# Test Quality Review Workflow
|
||||
|
||||
The Test Quality Review workflow performs comprehensive quality validation of test code using TEA's knowledge base of best practices. It detects flaky patterns, validates structure, and provides actionable feedback to improve test maintainability and reliability.
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow reviews test quality against proven patterns from TEA's knowledge base including fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. It generates a quality score (0-100) with detailed feedback on violations and recommendations.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **Knowledge-Based Review**: Applies patterns from 19+ knowledge fragments in tea-index.csv
|
||||
- **Quality Scoring**: 0-100 score with letter grade (A+ to F) based on violations
|
||||
- **Multi-Scope Review**: Single file, directory, or entire test suite
|
||||
- **Pattern Detection**: Identifies hard waits, race conditions, shared state, conditionals
|
||||
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions, test length
|
||||
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
|
||||
- **Code Examples**: Every issue includes recommended fix with code snippets
|
||||
- **Integration**: Works with story files, test-design, acceptance criteria context
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *test-review
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- After `*atdd` workflow → validate generated acceptance tests
|
||||
- After `*automate` workflow → ensure regression suite quality
|
||||
- After developer writes tests → provide quality feedback
|
||||
- Before `*gate` workflow → confirm test quality before release
|
||||
- User explicitly requests review: `bmad tea *test-review`
|
||||
- Periodic quality audits of existing test suite
|
||||
|
||||
**Typical workflow sequence:**
|
||||
|
||||
1. `*atdd` → Generate failing acceptance tests
|
||||
2. **`*test-review`** → Validate test quality ⬅️ YOU ARE HERE (option 1)
|
||||
3. `*dev story` → Implement feature with tests passing
|
||||
4. **`*test-review`** → Review implementation tests ⬅️ YOU ARE HERE (option 2)
|
||||
5. `*automate` → Expand regression suite
|
||||
6. **`*test-review`** → Validate new regression tests ⬅️ YOU ARE HERE (option 3)
|
||||
7. `*gate` → Final quality gate decision
|
||||
|
||||
---
|
||||
|
||||
## Inputs
|
||||
|
||||
### Required Context Files
|
||||
|
||||
- **Test File(s)**: One or more test files to review (auto-discovered or explicitly provided)
|
||||
- **Test Framework Config**: playwright.config.ts, jest.config.js, etc. (for context)
|
||||
|
||||
### Recommended Context Files
|
||||
|
||||
- **Story File**: Acceptance criteria for context (e.g., `story-1.3.md`)
|
||||
- **Test Design**: Priority context (P0/P1/P2/P3) from test-design.md
|
||||
- **Knowledge Base**: tea-index.csv with best practice fragments (required for thorough review)
|
||||
|
||||
### Workflow Variables
|
||||
|
||||
Key variables that control review behavior (configured in `workflow.yaml`):
|
||||
|
||||
- **review_scope**: `single` | `directory` | `suite` (default: `single`)
|
||||
- `single`: Review one test file
|
||||
- `directory`: Review all tests in a directory
|
||||
- `suite`: Review entire test suite
|
||||
|
||||
- **quality_score_enabled**: Enable 0-100 quality scoring (default: `true`)
|
||||
- **append_to_file**: Add inline comments to test files (default: `false`)
|
||||
- **check_against_knowledge**: Use tea-index.csv fragments (default: `true`)
|
||||
- **strict_mode**: Fail on any violation vs advisory only (default: `false`)
|
||||
|
||||
**Quality Criteria Flags** (all default to `true`):
|
||||
|
||||
- `check_given_when_then`: BDD format validation
|
||||
- `check_test_ids`: Test ID conventions
|
||||
- `check_priority_markers`: P0/P1/P2/P3 classification
|
||||
- `check_hard_waits`: Detect sleep(), wait(X)
|
||||
- `check_determinism`: No conditionals/try-catch abuse
|
||||
- `check_isolation`: Tests clean up, no shared state
|
||||
- `check_fixture_patterns`: Pure function → Fixture → mergeTests
|
||||
- `check_data_factories`: Factory usage vs hardcoded data
|
||||
- `check_network_first`: Route intercept before navigate
|
||||
- `check_assertions`: Explicit assertions present
|
||||
- `check_test_length`: Warn if >300 lines
|
||||
- `check_test_duration`: Warn if >1.5 min
|
||||
- `check_flakiness_patterns`: Common flaky patterns
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### Primary Deliverable
|
||||
|
||||
**Test Quality Review Report** (`test-review-{filename}.md`):
|
||||
|
||||
- **Executive Summary**: Overall assessment, key strengths/weaknesses, recommendation
|
||||
- **Quality Score**: 0-100 score with letter grade (A+ to F)
|
||||
- **Quality Criteria Assessment**: Table with all criteria evaluated (PASS/WARN/FAIL)
|
||||
- **Critical Issues**: P0/P1 violations that must be fixed
|
||||
- **Recommendations**: P2/P3 violations that should be fixed
|
||||
- **Best Practices Examples**: Good patterns found in tests
|
||||
- **Knowledge Base References**: Links to detailed guidance
|
||||
|
||||
Each issue includes:
|
||||
|
||||
- Code location (file:line)
|
||||
- Explanation of problem
|
||||
- Recommended fix with code example
|
||||
- Knowledge base fragment reference
|
||||
|
||||
### Secondary Outputs
|
||||
|
||||
- **Inline Comments**: TODO comments in test files at violation locations (if enabled)
|
||||
- **Quality Badge**: Badge with score (e.g., "Test Quality: 87/100 (A)")
|
||||
- **Story Update**: Test quality section appended to story file (if enabled)
|
||||
|
||||
### Validation Safeguards
|
||||
|
||||
- ✅ All knowledge base fragments loaded successfully
|
||||
- ✅ Test files parsed and structure analyzed
|
||||
- ✅ All enabled quality criteria evaluated
|
||||
- ✅ Violations categorized by severity (P0/P1/P2/P3)
|
||||
- ✅ Quality score calculated with breakdown
|
||||
- ✅ Actionable feedback with code examples provided
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Explained
|
||||
|
||||
### 1. BDD Format (Given-When-Then)
|
||||
|
||||
**PASS**: Tests use clear Given-When-Then structure
|
||||
|
||||
```typescript
|
||||
// Given: User is logged in
|
||||
const user = await createTestUser();
|
||||
await loginPage.login(user.email, user.password);
|
||||
|
||||
// When: User navigates to dashboard
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Then: User sees welcome message
|
||||
await expect(page.locator('[data-testid="welcome"]')).toContainText(user.name);
|
||||
```
|
||||
|
||||
**FAIL**: Tests lack structure, hard to understand intent
|
||||
|
||||
```typescript
|
||||
await page.goto('/dashboard');
|
||||
await page.click('.button');
|
||||
await expect(page.locator('.text')).toBeVisible();
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, tdd-cycles.md
|
||||
|
||||
---
|
||||
|
||||
### 2. Test IDs
|
||||
|
||||
**PASS**: All tests have IDs following convention
|
||||
|
||||
```typescript
|
||||
test.describe('1.3-E2E-001: User Login Flow', () => {
|
||||
test('should log in successfully with valid credentials', async ({ page }) => {
|
||||
// Test implementation
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**FAIL**: No test IDs, can't trace to requirements
|
||||
|
||||
```typescript
|
||||
test.describe('Login', () => {
|
||||
test('login works', async ({ page }) => {
|
||||
// Test implementation
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Knowledge**: traceability.md, test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 3. Priority Markers
|
||||
|
||||
**PASS**: Tests classified as P0/P1/P2/P3
|
||||
|
||||
```typescript
|
||||
test.describe('P0: Critical User Journey - Checkout', () => {
|
||||
// Critical tests
|
||||
});
|
||||
|
||||
test.describe('P2: Edge Case - International Addresses', () => {
|
||||
// Nice-to-have tests
|
||||
});
|
||||
```
|
||||
|
||||
**Knowledge**: test-priorities.md, risk-governance.md
|
||||
|
||||
---
|
||||
|
||||
### 4. No Hard Waits
|
||||
|
||||
**PASS**: No sleep(), wait(), hardcoded delays
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Explicit wait for condition
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
|
||||
```
|
||||
|
||||
**FAIL**: Hard waits introduce flakiness
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Hard wait
|
||||
await page.waitForTimeout(2000);
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, network-first.md
|
||||
|
||||
---
|
||||
|
||||
### 5. Determinism
|
||||
|
||||
**PASS**: Tests work deterministically, no conditionals
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Deterministic test
|
||||
await expect(page.locator('[data-testid="status"]')).toHaveText('Active');
|
||||
```
|
||||
|
||||
**FAIL**: Conditionals make tests unpredictable
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Conditional logic
|
||||
const status = await page.locator('[data-testid="status"]').textContent();
|
||||
if (status === 'Active') {
|
||||
await page.click('[data-testid="deactivate"]');
|
||||
} else {
|
||||
await page.click('[data-testid="activate"]');
|
||||
}
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 6. Isolation
|
||||
|
||||
**PASS**: Tests clean up, no shared state
|
||||
|
||||
```typescript
|
||||
test.afterEach(async ({ page, testUser }) => {
|
||||
// Cleanup: Delete test user
|
||||
await api.deleteUser(testUser.id);
|
||||
});
|
||||
```
|
||||
|
||||
**FAIL**: Shared state, tests depend on order
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Shared global variable
|
||||
let userId: string;
|
||||
|
||||
test('create user', async () => {
|
||||
userId = await createUser(); // Sets global
|
||||
});
|
||||
|
||||
test('update user', async () => {
|
||||
await updateUser(userId); // Depends on previous test
|
||||
});
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 7. Fixture Patterns
|
||||
|
||||
**PASS**: Pure function → Fixture → mergeTests
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Pure function fixture
|
||||
const createAuthenticatedPage = async (page: Page, user: User) => {
|
||||
await loginPage.login(user.email, user.password);
|
||||
return page;
|
||||
};
|
||||
|
||||
const test = base.extend({
|
||||
authenticatedPage: async ({ page }, use) => {
|
||||
const user = createTestUser();
|
||||
const authedPage = await createAuthenticatedPage(page, user);
|
||||
await use(authedPage);
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**FAIL**: No fixtures, repeated setup
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Repeated setup in every test
|
||||
test('test 1', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
await page.fill('[name="email"]', 'test@example.com');
|
||||
await page.fill('[name="password"]', 'password123');
|
||||
await page.click('[type="submit"]');
|
||||
// Test logic
|
||||
});
|
||||
```
|
||||
|
||||
**Knowledge**: fixture-architecture.md
|
||||
|
||||
---
|
||||
|
||||
### 8. Data Factories
|
||||
|
||||
**PASS**: Factory functions with overrides
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Factory function
|
||||
import { createTestUser } from './factories/user-factory';
|
||||
|
||||
test('user can update profile', async ({ page }) => {
|
||||
const user = createTestUser({ role: 'admin' });
|
||||
await api.createUser(user); // API-first setup
|
||||
// Test UI interaction
|
||||
});
|
||||
```
|
||||
|
||||
**FAIL**: Hardcoded test data
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Magic strings
|
||||
await page.fill('[name="email"]', 'test@example.com');
|
||||
await page.fill('[name="phone"]', '555-1234');
|
||||
```
|
||||
|
||||
**Knowledge**: data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 9. Network-First Pattern
|
||||
|
||||
**PASS**: Route intercept before navigate
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Intercept before navigation
|
||||
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers }));
|
||||
await page.goto('/users'); // Navigate after route setup
|
||||
```
|
||||
|
||||
**FAIL**: Race condition risk
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Navigate before intercept
|
||||
await page.goto('/users');
|
||||
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers })); // Too late!
|
||||
```
|
||||
|
||||
**Knowledge**: network-first.md
|
||||
|
||||
---
|
||||
|
||||
### 10. Explicit Assertions
|
||||
|
||||
**PASS**: Clear, specific assertions
|
||||
|
||||
```typescript
|
||||
await expect(page.locator('[data-testid="username"]')).toHaveText('John Doe');
|
||||
await expect(page.locator('[data-testid="status"]')).toHaveClass(/active/);
|
||||
```
|
||||
|
||||
**FAIL**: Missing or vague assertions
|
||||
|
||||
```typescript
|
||||
await page.locator('[data-testid="username"]').isVisible(); // No assertion!
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 11. Test Length
|
||||
|
||||
**PASS**: ≤300 lines per file (ideal: ≤200)
|
||||
**WARN**: 301-500 lines (consider splitting)
|
||||
**FAIL**: >500 lines (too large)
|
||||
|
||||
**Knowledge**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 12. Test Duration
|
||||
|
||||
**PASS**: ≤1.5 minutes per test (target: <30 seconds)
|
||||
**WARN**: 1.5-3 minutes (consider optimization)
|
||||
**FAIL**: >3 minutes (too slow)
|
||||
|
||||
**Knowledge**: test-quality.md, selective-testing.md
|
||||
|
||||
---
|
||||
|
||||
### 13. Flakiness Patterns
|
||||
|
||||
Common flaky patterns detected:
|
||||
|
||||
- Tight timeouts (e.g., `{ timeout: 1000 }`)
|
||||
- Race conditions (navigation before route interception)
|
||||
- Timing-dependent assertions
|
||||
- Retry logic hiding flakiness
|
||||
- Environment-dependent assumptions
|
||||
|
||||
**Knowledge**: test-quality.md, network-first.md, ci-burn-in.md
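
The first two patterns account for most intermittent failures. A minimal Playwright sketch contrasting them with the stable alternative (the selector and the `/api/orders` endpoint are illustrative, not taken from a real suite):

```typescript
import { test, expect } from '@playwright/test';

test('orders list renders', async ({ page }) => {
  // ❌ Flaky: a tight timeout fails on a slow CI runner even though the app is fine
  // await expect(page.locator('[data-testid="orders"]')).toBeVisible({ timeout: 1000 });

  // ✅ Stable: intercept before navigating, then assert with a realistic timeout
  await page.route('**/api/orders', (route) => route.fulfill({ json: [] }));
  await page.goto('/orders');
  await expect(page.locator('[data-testid="orders"]')).toBeVisible({ timeout: 10000 });
});
```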
---

## Quality Scoring

### Score Calculation

```
Starting Score: 100

Deductions:
- Critical Violations (P0): -10 points each
- High Violations (P1): -5 points each
- Medium Violations (P2): -2 points each
- Low Violations (P3): -1 point each

Bonus Points (max +30):
+ Excellent BDD structure: +5
+ Comprehensive fixtures: +5
+ Comprehensive data factories: +5
+ Network-first pattern consistently used: +5
+ Perfect isolation (all tests clean up): +5
+ All test IDs present and correct: +5

Final Score: max(0, min(100, Starting Score - Violations + Bonus))
```

### Quality Grades

- **90-100** (A+): Excellent - Production-ready, best practices followed
- **80-89** (A): Good - Minor improvements recommended
- **70-79** (B): Acceptable - Some issues to address
- **60-69** (C): Needs Improvement - Several issues detected
- **<60** (F): Critical Issues - Significant problems, not production-ready

---
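
The same arithmetic can be expressed in a few lines. A hedged TypeScript sketch of the formula above (type and function names are illustrative; the workflow performs this calculation itself):

```typescript
interface ViolationCounts {
  critical: number; // P0
  high: number; // P1
  medium: number; // P2
  low: number; // P3
}

// bonuses = number of earned bonus criteria (0-6), each worth +5, capped at +30
function qualityScore(v: ViolationCounts, bonuses: number): { score: number; grade: string } {
  const deductions = v.critical * 10 + v.high * 5 + v.medium * 2 + v.low * 1;
  const bonus = Math.min(bonuses * 5, 30);
  const score = Math.max(0, Math.min(100, 100 - deductions + bonus));
  const grade = score >= 90 ? 'A+' : score >= 80 ? 'A' : score >= 70 ? 'B' : score >= 60 ? 'C' : 'F';
  return { score, grade };
}

// Example: 1 critical, 2 high, 3 medium violations with 2 bonus criteria → 84 (A)
// qualityScore({ critical: 1, high: 2, medium: 3, low: 0 }, 2)
```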
|
||||
|
||||
## Example Scenarios
|
||||
|
||||
### Scenario 1: Excellent Quality (Score: 95)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: checkout-flow.spec.ts
|
||||
|
||||
**Quality Score**: 95/100 (A+ - Excellent)
|
||||
**Recommendation**: Approve - Production Ready
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Excellent test quality with comprehensive coverage and best practices throughout.
|
||||
Tests demonstrate expert-level patterns including fixture architecture, data
|
||||
factories, network-first approach, and perfect isolation.
|
||||
|
||||
**Strengths:**
|
||||
✅ Clear Given-When-Then structure in all tests
|
||||
✅ Comprehensive fixtures for authenticated states
|
||||
✅ Data factories with faker.js for realistic test data
|
||||
✅ Network-first pattern prevents race conditions
|
||||
✅ Perfect test isolation with cleanup
|
||||
✅ All test IDs present (1.2-E2E-001 through 1.2-E2E-005)
|
||||
|
||||
**Minor Recommendations:**
|
||||
⚠️ One test slightly verbose (245 lines) - consider extracting helper function
|
||||
|
||||
**Recommendation**: Approve without changes. Use as reference for other tests.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 2: Good Quality (Score: 82)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: user-profile.spec.ts
|
||||
|
||||
**Quality Score**: 82/100 (A - Good)
|
||||
**Recommendation**: Approve with Comments
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Solid test quality with good structure and coverage. A few improvements would
|
||||
enhance maintainability and reduce flakiness risk.
|
||||
|
||||
**Strengths:**
|
||||
✅ Good BDD structure
|
||||
✅ Test IDs present
|
||||
✅ Explicit assertions
|
||||
|
||||
**Issues to Address:**
|
||||
⚠️ 2 hard waits detected (lines 34, 67) - use explicit waits instead
|
||||
⚠️ Hardcoded test data (line 23) - use factory functions
|
||||
⚠️ Missing cleanup in one test (line 89) - add afterEach hook
|
||||
|
||||
**Recommendation**: Address hard waits before merging. Other improvements
|
||||
can be addressed in follow-up PR.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 3: Needs Improvement (Score: 68)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: legacy-report.spec.ts
|
||||
|
||||
**Quality Score**: 68/100 (C - Needs Improvement)
|
||||
**Recommendation**: Request Changes
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Test has several quality issues that should be addressed before merging.
|
||||
Primarily concerns around flakiness risk and maintainability.
|
||||
|
||||
**Critical Issues:**
|
||||
❌ 5 hard waits detected (flakiness risk)
|
||||
❌ Race condition: navigation before route interception (line 45)
|
||||
❌ Shared global state between tests (line 12)
|
||||
❌ Missing test IDs (can't trace to requirements)
|
||||
|
||||
**Recommendations:**
|
||||
⚠️ Test file is 487 lines - consider splitting
|
||||
⚠️ Hardcoded data throughout - use factories
|
||||
⚠️ Missing cleanup in afterEach
|
||||
|
||||
**Recommendation**: Address all critical issues (❌) before re-review.
|
||||
Significant refactoring needed.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 4: Critical Issues (Score: 42)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: data-export.spec.ts
|
||||
|
||||
**Quality Score**: 42/100 (F - Critical Issues)
|
||||
**Recommendation**: Block - Not Production Ready
|
||||
|
||||
## Executive Summary
|
||||
|
||||
CRITICAL: Test has severe quality issues that make it unsuitable for
|
||||
production. Significant refactoring required.
|
||||
|
||||
**Critical Issues:**
|
||||
❌ 12 hard waits (page.waitForTimeout) throughout
|
||||
❌ No test IDs or structure
|
||||
❌ Try/catch blocks swallowing errors (lines 23, 45, 67, 89)
|
||||
❌ No cleanup - tests leave data in database
|
||||
❌ Conditional logic (if/else) throughout tests
|
||||
❌ No assertions in 3 tests (tests do nothing!)
|
||||
❌ 687 lines - far too large
|
||||
❌ Multiple race conditions
|
||||
❌ Hardcoded credentials in plain text (SECURITY ISSUE)
|
||||
|
||||
**Recommendation**: BLOCK MERGE. Complete rewrite recommended following
|
||||
TEA knowledge base patterns. Suggest pairing session with QA engineer.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Test Review
|
||||
|
||||
1. **atdd** - Generates acceptance tests → TEA reviews for quality
|
||||
2. **dev story** - Developer implements tests → TEA provides feedback
|
||||
3. **automate** - Expands regression suite → TEA validates new tests
|
||||
|
||||
### After Test Review
|
||||
|
||||
1. **Developer** - Addresses critical issues, improves based on recommendations
|
||||
2. **gate** - Test quality feeds into release decision (high-quality tests increase confidence)
|
||||
|
||||
### Coordinates With
|
||||
|
||||
- **Story File**: Review links to acceptance criteria for context
|
||||
- **Test Design**: Review validates tests align with P0/P1/P2/P3 prioritization
|
||||
- **Knowledge Base**: All feedback references tea-index.csv fragments
|
||||
|
||||
---
|
||||
|
||||
## Review Scopes
|
||||
|
||||
### Single File Review
|
||||
|
||||
```bash
|
||||
# Review specific test file
|
||||
bmad tea *test-review
|
||||
# Provide test_file_path when prompted: tests/auth/login.spec.ts
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Reviewing tests just written
|
||||
- PR review of specific test file
|
||||
- Debugging flaky test
|
||||
- Learning test quality patterns
|
||||
|
||||
---
|
||||
|
||||
### Directory Review
|
||||
|
||||
```bash
|
||||
# Review all tests in directory
|
||||
bmad tea *test-review
|
||||
# Provide review_scope: directory
|
||||
# Provide test_dir: tests/auth/
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Feature branch has multiple test files
|
||||
- Reviewing entire feature test suite
|
||||
- Auditing test quality for module
|
||||
|
||||
---
|
||||
|
||||
### Suite Review
|
||||
|
||||
```bash
|
||||
# Review entire test suite
|
||||
bmad tea *test-review
|
||||
# Provide review_scope: suite
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Periodic quality audit (monthly/quarterly)
|
||||
- Before major release
|
||||
- Identifying patterns across codebase
|
||||
- Establishing quality baseline
|
||||
|
||||
---
|
||||
|
||||
## Configuration Examples
|
||||
|
||||
### Strict Review (Fail on Violations)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
quality_score_enabled: true
|
||||
strict_mode: true # Fail if score <70
|
||||
check_against_knowledge: true
|
||||
# All check_* flags: true
|
||||
```
|
||||
|
||||
Use for: PR gates, production releases
|
||||
|
||||
---
|
||||
|
||||
### Balanced Review (Advisory)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
quality_score_enabled: true
|
||||
strict_mode: false # Advisory only
|
||||
check_against_knowledge: true
|
||||
# All check_* flags: true
|
||||
```
|
||||
|
||||
Use for: Most development workflows (default)
|
||||
|
||||
---
|
||||
|
||||
### Focused Review (Specific Criteria)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
check_hard_waits: true
|
||||
check_flakiness_patterns: true
|
||||
check_network_first: true
|
||||
# Other checks: false
|
||||
```
|
||||
|
||||
Use for: Debugging flaky tests, targeted improvements
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
|
||||
2. **Context Matters**: Some violations may be justified (document with comments)
|
||||
3. **Knowledge-Based**: All feedback grounded in proven patterns
|
||||
4. **Actionable**: Every issue includes recommended fix with code example
|
||||
5. **Quality Score**: Use as indicator, not absolute measure
|
||||
6. **Continuous Improvement**: Review tests periodically as patterns evolve
|
||||
7. **Learning Tool**: Use reviews to learn best practices, not just find bugs
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **test-quality.md** - Definition of Done (no hard waits, <300 lines, <1.5 min, self-cleaning)
|
||||
- **fixture-architecture.md** - Pure function → Fixture → mergeTests pattern
|
||||
- **network-first.md** - Route intercept before navigate (race condition prevention)
|
||||
- **data-factories.md** - Factory functions with overrides, API-first setup
|
||||
- **test-levels-framework.md** - E2E vs API vs Component vs Unit appropriateness
|
||||
- **playwright-config.md** - Environment-based configuration patterns
|
||||
- **tdd-cycles.md** - Red-Green-Refactor patterns
|
||||
- **selective-testing.md** - Duplicate coverage detection
|
||||
- **ci-burn-in.md** - Flakiness detection patterns
|
||||
- **test-priorities.md** - P0/P1/P2/P3 classification framework
|
||||
- **traceability.md** - Requirements-to-tests mapping
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Problem: Quality score seems too low
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Review violation breakdown - focus on critical issues first
|
||||
- Consider project context - some patterns may be justified
|
||||
- Check if criteria are appropriate for project type
|
||||
- Score is indicator, not absolute - focus on actionable feedback
|
||||
|
||||
---
|
||||
|
||||
### Problem: No test files found
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Verify test_dir path is correct
|
||||
- Check test file extensions (*.spec.ts, *.test.js, etc.)
|
||||
- Use glob pattern to discover: `tests/**/*.spec.ts`
|
||||
|
||||
---
|
||||
|
||||
### Problem: Knowledge fragments not loading
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Verify tea-index.csv exists in testarch/ directory
|
||||
- Check fragment file paths are correct in tea-index.csv
|
||||
- Ensure auto_load_knowledge: true in workflow variables
|
||||
|
||||
---
|
||||
|
||||
### Problem: Too many false positives
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Add justification comments in code for legitimate violations
|
||||
- Adjust check\_\* flags to disable specific criteria
|
||||
- Use strict_mode: false for advisory-only feedback
|
||||
- Context matters - document why pattern is appropriate
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *atdd` - Generate acceptance tests (review after generation)
|
||||
- `bmad tea *automate` - Expand regression suite (review new tests)
|
||||
- `bmad tea *gate` - Quality gate decision (test quality feeds into decision)
|
||||
- `bmad dev story` - Implement story (review tests after implementation)
|
470 src/modules/bmm/workflows/testarch/test-review/checklist.md Normal file
@@ -0,0 +1,470 @@
|
||||
# Test Quality Review - Validation Checklist
|
||||
|
||||
Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Test File Discovery
|
||||
|
||||
- [ ] Test file(s) identified for review (single/directory/suite scope)
|
||||
- [ ] Test files exist and are readable
|
||||
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
|
||||
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)
|
||||
|
||||
### Knowledge Base Loading
|
||||
|
||||
- [ ] tea-index.csv loaded successfully
|
||||
- [ ] `test-quality.md` loaded (Definition of Done)
|
||||
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
|
||||
- [ ] `network-first.md` loaded (Route intercept before navigate)
|
||||
- [ ] `data-factories.md` loaded (Factory patterns)
|
||||
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
|
||||
- [ ] All other enabled fragments loaded successfully
|
||||
|
||||
### Context Gathering
|
||||
|
||||
- [ ] Story file discovered or explicitly provided (if available)
|
||||
- [ ] Test design document discovered or explicitly provided (if available)
|
||||
- [ ] Acceptance criteria extracted from story (if available)
|
||||
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] Review scope determined (single/directory/suite)
|
||||
- [ ] Test file paths collected
|
||||
- [ ] Related artifacts discovered (story, test-design)
|
||||
- [ ] Knowledge base fragments loaded successfully
|
||||
- [ ] Quality criteria flags read from workflow variables
|
||||
|
||||
### Step 2: Test File Parsing
|
||||
|
||||
**For Each Test File:**
|
||||
|
||||
- [ ] File read successfully
|
||||
- [ ] File size measured (lines, KB)
|
||||
- [ ] File structure parsed (describe blocks, it blocks)
|
||||
- [ ] Test IDs extracted (if present)
|
||||
- [ ] Priority markers extracted (if present)
|
||||
- [ ] Imports analyzed
|
||||
- [ ] Dependencies identified
|
||||
|
||||
**Test Structure Analysis:**
|
||||
|
||||
- [ ] Describe block count calculated
|
||||
- [ ] It/test block count calculated
|
||||
- [ ] BDD structure identified (Given-When-Then)
|
||||
- [ ] Fixture usage detected
|
||||
- [ ] Data factory usage detected
|
||||
- [ ] Network interception patterns identified
|
||||
- [ ] Assertions counted
|
||||
- [ ] Waits and timeouts cataloged
|
||||
- [ ] Conditionals (if/else) detected
|
||||
- [ ] Try/catch blocks detected
|
||||
- [ ] Shared state or globals detected
|
||||
|
||||
### Step 3: Quality Criteria Validation
|
||||
|
||||
**For Each Enabled Criterion:**
|
||||
|
||||
#### BDD Format (if `check_given_when_then: true`)
|
||||
|
||||
- [ ] Given-When-Then structure evaluated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with line numbers
|
||||
- [ ] Examples of good/bad patterns noted
|
||||
|
||||
#### Test IDs (if `check_test_ids: true`)
|
||||
|
||||
- [ ] Test ID presence validated
|
||||
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Missing IDs cataloged
|
||||
|
||||
#### Priority Markers (if `check_priority_markers: true`)
|
||||
|
||||
- [ ] P0/P1/P2/P3 classification validated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Missing priorities cataloged
|
||||
|
||||
#### Hard Waits (if `check_hard_waits: true`)
|
||||
|
||||
- [ ] sleep(), waitForTimeout(), hardcoded delays detected
|
||||
- [ ] Justification comments checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with line numbers and recommended fixes
|
||||
|
||||
#### Determinism (if `check_determinism: true`)
|
||||
|
||||
- [ ] Conditionals (if/else/switch) detected
|
||||
- [ ] Try/catch abuse detected
|
||||
- [ ] Random values (Math.random, Date.now) detected
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Isolation (if `check_isolation: true`)
|
||||
|
||||
- [ ] Cleanup hooks (afterEach/afterAll) validated
|
||||
- [ ] Shared state detected
|
||||
- [ ] Global variable mutations detected
|
||||
- [ ] Resource cleanup verified
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Fixture Patterns (if `check_fixture_patterns: true`)
|
||||
|
||||
- [ ] Fixtures detected (test.extend)
|
||||
- [ ] Pure functions validated
|
||||
- [ ] mergeTests usage checked
|
||||
- [ ] beforeEach complexity analyzed
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Data Factories (if `check_data_factories: true`)
|
||||
|
||||
- [ ] Factory functions detected
|
||||
- [ ] Hardcoded data (magic strings/numbers) detected
|
||||
- [ ] Faker.js or similar usage validated
|
||||
- [ ] API-first setup pattern checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Network-First (if `check_network_first: true`)
|
||||
|
||||
- [ ] page.route() before page.goto() validated
|
||||
- [ ] Race conditions detected (route after navigate)
|
||||
- [ ] waitForResponse patterns checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Assertions (if `check_assertions: true`)
|
||||
|
||||
- [ ] Explicit assertions counted
|
||||
- [ ] Implicit waits without assertions detected
|
||||
- [ ] Assertion specificity validated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Test Length (if `check_test_length: true`)
|
||||
|
||||
- [ ] File line count calculated
|
||||
- [ ] Threshold comparison (≤300 lines ideal)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Splitting recommendations generated (if >300 lines)
|
||||
|
||||
#### Test Duration (if `check_test_duration: true`)
|
||||
|
||||
- [ ] Test complexity analyzed (as proxy for duration if no execution data)
|
||||
- [ ] Threshold comparison (≤1.5 min target)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Optimization recommendations generated
|
||||
|
||||
#### Flakiness Patterns (if `check_flakiness_patterns: true`)
|
||||
|
||||
- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
|
||||
- [ ] Race conditions detected
|
||||
- [ ] Timing-dependent assertions detected
|
||||
- [ ] Retry logic detected
|
||||
- [ ] Environment-dependent assumptions detected
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Quality Score Calculation
|
||||
|
||||
**Violation Counting:**
|
||||
|
||||
- [ ] Critical (P0) violations counted
|
||||
- [ ] High (P1) violations counted
|
||||
- [ ] Medium (P2) violations counted
|
||||
- [ ] Low (P3) violations counted
|
||||
- [ ] Violation breakdown by criterion recorded
|
||||
|
||||
**Score Calculation:**
|
||||
|
||||
- [ ] Starting score: 100
|
||||
- [ ] Critical violations deducted (-10 each)
|
||||
- [ ] High violations deducted (-5 each)
|
||||
- [ ] Medium violations deducted (-2 each)
|
||||
- [ ] Low violations deducted (-1 each)
|
||||
- [ ] Bonus points added (max +30):
|
||||
- [ ] Excellent BDD structure (+5 if applicable)
|
||||
- [ ] Comprehensive fixtures (+5 if applicable)
|
||||
- [ ] Comprehensive data factories (+5 if applicable)
|
||||
- [ ] Network-first pattern (+5 if applicable)
|
||||
- [ ] Perfect isolation (+5 if applicable)
|
||||
- [ ] All test IDs present (+5 if applicable)
|
||||
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))
|
||||
|
||||
**Quality Grade:**
|
||||
|
||||
- [ ] Grade assigned based on score:
|
||||
- 90-100: A+ (Excellent)
|
||||
- 80-89: A (Good)
|
||||
- 70-79: B (Acceptable)
|
||||
- 60-69: C (Needs Improvement)
|
||||
- <60: F (Critical Issues)
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Review Report Generation
|
||||
|
||||
**Report Sections Created:**
|
||||
|
||||
- [ ] **Header Section**:
|
||||
- [ ] Test file(s) reviewed listed
|
||||
- [ ] Review date recorded
|
||||
- [ ] Review scope noted (single/directory/suite)
|
||||
- [ ] Quality score and grade displayed
|
||||
|
||||
- [ ] **Executive Summary**:
|
||||
- [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
|
||||
- [ ] Key strengths listed (3-5 bullet points)
|
||||
- [ ] Key weaknesses listed (3-5 bullet points)
|
||||
- [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
|
||||
|
||||
- [ ] **Quality Criteria Assessment**:
|
||||
- [ ] Table with all criteria evaluated
|
||||
- [ ] Status for each criterion (PASS/WARN/FAIL)
|
||||
- [ ] Violation count per criterion
|
||||
|
||||
- [ ] **Critical Issues (Must Fix)**:
|
||||
- [ ] P0/P1 violations listed
|
||||
- [ ] Code location provided for each (file:line)
|
||||
- [ ] Issue explanation clear
|
||||
- [ ] Recommended fix provided with code example
|
||||
- [ ] Knowledge base reference provided
|
||||
|
||||
- [ ] **Recommendations (Should Fix)**:
|
||||
- [ ] P2/P3 violations listed
|
||||
- [ ] Code location provided for each (file:line)
|
||||
- [ ] Issue explanation clear
|
||||
- [ ] Recommended improvement provided with code example
|
||||
- [ ] Knowledge base reference provided
|
||||
|
||||
- [ ] **Best Practices Examples** (if good patterns found):
|
||||
- [ ] Good patterns highlighted from tests
|
||||
- [ ] Knowledge base fragments referenced
|
||||
- [ ] Examples provided for others to follow
|
||||
|
||||
- [ ] **Knowledge Base References**:
|
||||
- [ ] All fragments consulted listed
|
||||
- [ ] Links to detailed guidance provided
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Optional Outputs Generation
|
||||
|
||||
**Inline Comments** (if `generate_inline_comments: true`):
|
||||
|
||||
- [ ] Inline comments generated at violation locations
|
||||
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
|
||||
- [ ] Comments added to test files (no logic changes)
|
||||
- [ ] Test files remain valid and executable
|
||||
|
||||
**Quality Badge** (if `generate_quality_badge: true`):
|
||||
|
||||
- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
|
||||
- [ ] Badge format suitable for README or documentation
|
||||
- [ ] Badge saved to output folder
|
||||
|
||||
**Story Update** (if `append_to_story: true` and story file exists):
|
||||
|
||||
- [ ] "Test Quality Review" section created
|
||||
- [ ] Quality score included
|
||||
- [ ] Critical issues summarized
|
||||
- [ ] Link to full review report provided
|
||||
- [ ] Story file updated successfully
|
||||
|
||||
---
|
||||
|
||||
### Step 7: Save and Notify
|
||||
|
||||
**Outputs Saved:**
|
||||
|
||||
- [ ] Review report saved to `{output_file}`
|
||||
- [ ] Inline comments written to test files (if enabled)
|
||||
- [ ] Quality badge saved (if enabled)
|
||||
- [ ] Story file updated (if enabled)
|
||||
- [ ] All outputs are valid and readable
|
||||
|
||||
**Summary Message Generated:**
|
||||
|
||||
- [ ] Quality score and grade included
|
||||
- [ ] Critical issue count stated
|
||||
- [ ] Recommendation provided (Approve/Request changes/Block)
|
||||
- [ ] Next steps clarified
|
||||
- [ ] Message displayed to user
|
||||
|
||||
---
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Review Report Completeness
|
||||
|
||||
- [ ] All required sections present
|
||||
- [ ] No placeholder text or TODOs in report
|
||||
- [ ] All code locations are accurate (file:line)
|
||||
- [ ] All code examples are valid and demonstrate fix
|
||||
- [ ] All knowledge base references are correct
|
||||
|
||||
### Review Report Accuracy
|
||||
|
||||
- [ ] Quality score matches violation breakdown
|
||||
- [ ] Grade matches score range
|
||||
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
|
||||
- [ ] Violations correctly attributed to quality criteria
|
||||
- [ ] No false positives (violations are legitimate issues)
|
||||
- [ ] No false negatives (critical issues not missed)
|
||||
|
||||
### Review Report Clarity
|
||||
|
||||
- [ ] Executive summary is clear and actionable
|
||||
- [ ] Issue explanations are understandable
|
||||
- [ ] Recommended fixes are implementable
|
||||
- [ ] Code examples are correct and runnable
|
||||
- [ ] Recommendation (Approve/Request changes) is clear
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Knowledge-Based Validation
|
||||
|
||||
- [ ] All feedback grounded in knowledge base fragments
|
||||
- [ ] Recommendations follow proven patterns
|
||||
- [ ] No arbitrary or opinion-based feedback
|
||||
- [ ] Knowledge fragment references accurate and relevant
|
||||
|
||||
### Actionable Feedback
|
||||
|
||||
- [ ] Every issue includes recommended fix
|
||||
- [ ] Every fix includes code example
|
||||
- [ ] Code examples demonstrate correct pattern
|
||||
- [ ] Fixes reference knowledge base for more detail
|
||||
|
||||
### Severity Classification
|
||||
|
||||
- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
|
||||
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
|
||||
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
|
||||
- [ ] Low (P3) issues are minor style/preference (verbose tests)
|
||||
|
||||
### Context Awareness
|
||||
|
||||
- [ ] Review considers project context (some patterns may be justified)
|
||||
- [ ] Violations with justification comments noted as acceptable
|
||||
- [ ] Edge cases acknowledged
|
||||
- [ ] Recommendations are pragmatic, not dogmatic
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Story File Integration
|
||||
|
||||
- [ ] Story file discovered correctly (if available)
|
||||
- [ ] Acceptance criteria extracted and used for context
|
||||
- [ ] Test quality section appended to story (if enabled)
|
||||
- [ ] Link to review report added to story
|
||||
|
||||
### Test Design Integration
|
||||
|
||||
- [ ] Test design document discovered correctly (if available)
|
||||
- [ ] Priority context (P0/P1/P2/P3) extracted and used
|
||||
- [ ] Review validates tests align with prioritization
|
||||
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] tea-index.csv loaded successfully
|
||||
- [ ] All required fragments loaded
|
||||
- [ ] Fragments applied correctly to validation
|
||||
- [ ] Fragment references in report are accurate
|
||||
|
||||
---
|
||||
|
||||
## Edge Cases and Special Situations
|
||||
|
||||
### Empty or Minimal Tests
|
||||
|
||||
- [ ] If test file is empty, report notes "No tests found"
|
||||
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
|
||||
- [ ] Score reflects lack of content appropriately
|
||||
|
||||
### Legacy Tests
|
||||
|
||||
- [ ] Legacy tests acknowledged in context
|
||||
- [ ] Review provides practical recommendations for improvement
|
||||
- [ ] Recognizes that complete refactor may not be feasible
|
||||
- [ ] Prioritizes critical issues (flakiness) over style
|
||||
|
||||
### Test Framework Variations
|
||||
|
||||
- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
|
||||
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
|
||||
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
|
||||
- [ ] Knowledge fragments applied appropriately for framework
|
||||
|
||||
### Justified Violations
|
||||
|
||||
- [ ] Violations with justification comments in code noted as acceptable
|
||||
- [ ] Justifications evaluated for legitimacy
|
||||
- [ ] Report acknowledges justified patterns
|
||||
- [ ] Score not penalized for justified violations
|
||||
|
||||
---
|
||||
|
||||
## Final Validation
|
||||
|
||||
### Review Completeness
|
||||
|
||||
- [ ] All enabled quality criteria evaluated
|
||||
- [ ] All test files in scope reviewed
|
||||
- [ ] All violations cataloged
|
||||
- [ ] All recommendations provided
|
||||
- [ ] Review report is comprehensive
|
||||
|
||||
### Review Accuracy
|
||||
|
||||
- [ ] Quality score is accurate
|
||||
- [ ] Violations are correct (no false positives)
|
||||
- [ ] Critical issues not missed (no false negatives)
|
||||
- [ ] Code locations are correct
|
||||
- [ ] Knowledge base references are accurate
|
||||
|
||||
### Review Usefulness
|
||||
|
||||
- [ ] Feedback is actionable
|
||||
- [ ] Recommendations are implementable
|
||||
- [ ] Code examples are correct
|
||||
- [ ] Review helps developer improve tests
|
||||
- [ ] Review educates on best practices
|
||||
|
||||
### Workflow Complete
|
||||
|
||||
- [ ] All checklist items completed
|
||||
- [ ] All outputs validated and saved
|
||||
- [ ] User notified with summary
|
||||
- [ ] Review ready for developer consumption
|
||||
- [ ] Follow-up actions identified (if any)
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
Record any issues, observations, or important context during workflow execution:
|
||||
|
||||
- **Test Framework**: [Playwright, Jest, Cypress, etc.]
|
||||
- **Review Scope**: [single file, directory, full suite]
|
||||
- **Quality Score**: [0-100 score, letter grade]
|
||||
- **Critical Issues**: [Count of P0/P1 violations]
|
||||
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
|
||||
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
|
||||
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]
|
604 src/modules/bmm/workflows/testarch/test-review/instructions.md Normal file
@@ -0,0 +1,604 @@
|
||||
# Test Quality Review - Instructions v4.0
|
||||
|
||||
**Workflow:** `testarch-test-review`
|
||||
**Purpose:** Review test quality using TEA's comprehensive knowledge base and validate against best practices for maintainability, determinism, isolation, and flakiness prevention
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Format:** Pure Markdown v4.0 (no XML blocks)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow performs comprehensive test quality reviews using TEA's knowledge base of best practices. It validates tests against proven patterns for fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. The review generates actionable feedback with quality scoring.
|
||||
|
||||
**Key Capabilities:**
|
||||
|
||||
- **Knowledge-Based Review**: Applies patterns from tea-index.csv fragments
|
||||
- **Quality Scoring**: 0-100 score based on violations and best practices
|
||||
- **Multi-Scope**: Review single file, directory, or entire test suite
|
||||
- **Pattern Detection**: Identifies flaky patterns, hard waits, race conditions
|
||||
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions
|
||||
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
|
||||
- **Integration**: Works with story files, test-design, acceptance criteria
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Test file(s) to review (auto-discovered or explicitly provided)
|
||||
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- Story file with acceptance criteria (for context)
|
||||
- Test design document (for priority context)
|
||||
- Knowledge base fragments available in tea-index.csv
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- If test file path is invalid or file doesn't exist, halt and request correction
|
||||
- If test_dir is empty (no tests found), halt and notify user
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### Step 1: Load Context and Knowledge Base
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
|
||||
- `test-quality.md` - Definition of Done (no hard waits, <300 lines, <1.5 min, self-cleaning)
|
||||
- `fixture-architecture.md` - Pure function → Fixture → mergeTests pattern
|
||||
- `network-first.md` - Route intercept before navigate (race condition prevention)
|
||||
- `data-factories.md` - Factory functions with overrides, API-first setup
|
||||
- `test-levels-framework.md` - E2E vs API vs Component vs Unit appropriateness
|
||||
- `playwright-config.md` - Environment-based configuration (if Playwright detected)
|
||||
- `tdd-cycles.md` - Red-Green-Refactor patterns
|
||||
- `selective-testing.md` - Duplicate coverage detection
|
||||
|
||||
2. Determine review scope:
|
||||
- **single**: Review one test file (`test_file_path` provided)
|
||||
- **directory**: Review all tests in directory (`test_dir` provided)
|
||||
- **suite**: Review entire test suite (discover all test files)
|
||||
|
||||
3. Auto-discover related artifacts (if `auto_discover_story: true`):
|
||||
- Extract test ID from filename (e.g., `1.3-E2E-001.spec.ts` → story 1.3)
|
||||
- Search for story file (`story-1.3.md`)
|
||||
- Search for test design (`test-design-story-1.3.md` or `test-design-epic-1.md`)
|
||||
|
||||
4. Read story file for context (if available):
|
||||
- Extract acceptance criteria
|
||||
- Extract priority classification
|
||||
- Extract expected test IDs
|
||||
|
||||
**Output:** Complete knowledge base loaded, review scope determined, context gathered

---
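
A minimal sketch of the fragment-loading part of this step, assuming `tea-index.csv` is a plain comma-separated index whose rows pair a fragment name with a relative path (the real column layout may differ):

```typescript
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';

// Load the fragments named in step 1, resolving paths relative to the testarch folder.
function loadFragments(testarchDir: string, wanted: string[]): Map<string, string> {
  const index = readFileSync(resolve(testarchDir, 'tea-index.csv'), 'utf8');
  const fragments = new Map<string, string>();
  for (const line of index.split('\n').slice(1)) {
    // skip the assumed header row
    const [name, relPath] = line.split(',').map((part) => part.trim());
    if (name && relPath && wanted.includes(name)) {
      fragments.set(name, readFileSync(resolve(testarchDir, relPath), 'utf8'));
    }
  }
  return fragments;
}

// loadFragments('bmad/bmm/testarch', ['test-quality.md', 'network-first.md'])
```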
|
||||
|
||||
### Step 2: Discover and Parse Test Files
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Discover test files** based on scope:
|
||||
- **single**: Use `test_file_path` variable
|
||||
- **directory**: Use `glob` to find all test files in `test_dir` (e.g., `*.spec.ts`, `*.test.js`)
|
||||
- **suite**: Use `glob` to find all test files recursively from project root
|
||||
|
||||
2. **Parse test file metadata**:
|
||||
- File path and name
|
||||
- File size (warn if >15 KB or >300 lines)
|
||||
- Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
|
||||
- Imports and dependencies
|
||||
- Test structure (describe/context/it blocks)
|
||||
|
||||
3. **Extract test structure**:
|
||||
- Count of describe blocks (test suites)
|
||||
- Count of it/test blocks (individual tests)
|
||||
- Test IDs (if present, e.g., `test.describe('1.3-E2E-001')`)
|
||||
- Priority markers (if present, e.g., `test.describe.only` for P0)
|
||||
- BDD structure (Given-When-Then comments or steps)
|
||||
|
||||
4. **Identify test patterns**:
|
||||
- Fixtures used
|
||||
- Data factories used
|
||||
- Network interception patterns
|
||||
- Assertions used (expect, assert, toHaveText, etc.)
|
||||
- Waits and timeouts (page.waitFor, sleep, hardcoded delays)
|
||||
- Conditionals (if/else, switch, ternary)
|
||||
- Try/catch blocks
|
||||
- Shared state or globals
|
||||
|
||||
**Output:** Complete test file inventory with structure and pattern analysis

---
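
For the directory and suite scopes, discovery can be as simple as a recursive listing filtered by extension. A sketch under that assumption (the extension list should match the framework detected in this step):

```typescript
import { readdirSync } from 'node:fs';
import { join } from 'node:path';

// Recursively list files under rootDir and keep only test files.
function discoverTests(rootDir: string): string[] {
  const extensions = ['.spec.ts', '.spec.js', '.test.ts', '.test.js'];
  const entries = readdirSync(rootDir, { recursive: true }) as string[];
  return entries
    .map((entry) => join(rootDir, entry))
    .filter((filePath) => extensions.some((ext) => filePath.endsWith(ext)));
}

// discoverTests('tests')      → suite scope
// discoverTests('tests/auth') → directory scope
```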
|
||||
|
||||
### Step 3: Validate Against Quality Criteria
|
||||
|
||||
**Actions:**
|
||||
|
||||
For each test file, validate against quality criteria (configurable via workflow variables):
|
||||
|
||||
#### 1. BDD Format Validation (if `check_given_when_then: true`)
|
||||
|
||||
- ✅ **PASS**: Tests use Given-When-Then structure (comments or step organization)
|
||||
- ⚠️ **WARN**: Tests have some structure but not explicit GWT
|
||||
- ❌ **FAIL**: Tests lack clear structure, hard to understand intent
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, tdd-cycles.md
|
||||
|
||||
---
|
||||
|
||||
#### 2. Test ID Conventions (if `check_test_ids: true`)
|
||||
|
||||
- ✅ **PASS**: Test IDs present and follow convention (e.g., `1.3-E2E-001`, `2.1-API-005`)
|
||||
- ⚠️ **WARN**: Some test IDs missing or inconsistent
|
||||
- ❌ **FAIL**: No test IDs, can't trace tests to requirements
|
||||
|
||||
**Knowledge Fragment**: traceability.md, test-quality.md
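One way to satisfy this criterion is to carry the `{story}-{LEVEL}-{sequence}` ID in describe/test titles so tests can be grepped and traced back to requirements; the story number and checkout flow below are hypothetical:

```typescript
import { test, expect } from '@playwright/test';

// {story}-{LEVEL}-{sequence} in titles keeps every test traceable to story 1.3
test.describe('1.3-E2E: checkout', () => {
  test('1.3-E2E-001: guest can complete checkout', async ({ page }) => {
    await page.goto('/checkout');
    await expect(page.getByRole('heading', { name: 'Checkout' })).toBeVisible();
  });

  test('1.3-E2E-002: saved card is preselected for returning users', async ({ page }) => {
    await page.goto('/checkout');
    await expect(page.getByTestId('saved-card')).toBeChecked();
  });
});
```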
|
||||
|
||||
---
|
||||
|
||||
#### 3. Priority Markers (if `check_priority_markers: true`)
|
||||
|
||||
- ✅ **PASS**: Tests classified as P0/P1/P2/P3 (via markers or test-design reference)
|
||||
- ⚠️ **WARN**: Some priority classifications missing
|
||||
- ❌ **FAIL**: No priority classification, can't determine criticality
|
||||
|
||||
**Knowledge Fragment**: test-priorities.md, risk-governance.md
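Tagging conventions vary by framework and team; a low-tech option is a priority tag in the test title that CI can filter on. Test names, routes, and test ids below are illustrative:

```typescript
import { test, expect } from '@playwright/test';

// Priority carried as a title tag; run critical tests with: npx playwright test --grep @P0
test('1.3-E2E-001 @P0: payment is captured on order submit', async ({ page }) => {
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Place order' }).click();
  await expect(page.getByTestId('order-confirmation')).toBeVisible();
});

test('1.3-E2E-010 @P2: promo banner can be dismissed', async ({ page }) => {
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Dismiss' }).click();
  await expect(page.getByTestId('promo-banner')).toBeHidden();
});
```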
|
||||
|
||||
---
|
||||
|
||||
#### 4. Hard Waits Detection (if `check_hard_waits: true`)
|
||||
|
||||
- ✅ **PASS**: No hard waits detected (no `sleep()`, `wait(5000)`, hardcoded delays)
|
||||
- ⚠️ **WARN**: Some hard waits used but with justification comments
|
||||
- ❌ **FAIL**: Hard waits detected without justification (flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- `sleep(1000)`, `setTimeout()`, `delay()`
|
||||
- `page.waitForTimeout(5000)` without explicit reason
|
||||
- `await new Promise(resolve => setTimeout(resolve, 3000))`
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, network-first.md
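A short sketch of the difference, assuming Playwright and a hypothetical `user-menu` test id:

```typescript
import { test, expect } from '@playwright/test';

test('dashboard appears after login', async ({ page }) => {
  await page.goto('/login');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // ❌ Flaky: a fixed delay fails on slow runs and wastes time on fast ones
  // await page.waitForTimeout(2000);

  // ✅ Deterministic: wait on the condition the test actually cares about
  await expect(page.getByTestId('user-menu')).toBeVisible({ timeout: 10_000 });
});
```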
|
||||
|
||||
---
|
||||
|
||||
#### 5. Determinism Check (if `check_determinism: true`)
|
||||
|
||||
- ✅ **PASS**: Tests are deterministic (no conditionals, no try/catch abuse, no random values)
|
||||
- ⚠️ **WARN**: Some conditionals but with clear justification
|
||||
- ❌ **FAIL**: Tests use if/else, switch, or try/catch to control flow (flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- `if (condition) { test logic }` - tests should work deterministically
|
||||
- `try { test } catch { fallback }` - tests shouldn't swallow errors
|
||||
- `Math.random()`, `Date.now()` without factory abstraction
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, data-factories.md
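A minimal illustration with a made-up invoice preview page: the fix is to feed the test one fixed input instead of a random value, so there is a single expected output and no branching:

```typescript
import { test, expect } from '@playwright/test';

test('invoice total is formatted as currency', async ({ page }) => {
  // ❌ Non-deterministic: the expected value would change on every run
  // const amount = Math.random() * 100;

  // ✅ Deterministic: one fixed input, one expected output, no conditionals
  const amount = 42.5;

  await page.goto(`/invoices/preview?amount=${amount}`);
  await expect(page.getByTestId('total')).toHaveText('$42.50');
});
```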
|
||||
|
||||
---
|
||||
|
||||
#### 6. Isolation Validation (if `check_isolation: true`)
|
||||
|
||||
- ✅ **PASS**: Tests clean up resources, no shared state, can run in any order
|
||||
- ⚠️ **WARN**: Some cleanup missing but isolated enough
|
||||
- ❌ **FAIL**: Tests share state, depend on execution order, leave resources
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- afterEach/afterAll cleanup hooks present
|
||||
- No global variables mutated
|
||||
- Database/API state cleaned up after tests
|
||||
- Test data deleted or marked inactive
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, data-factories.md
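A self-cleaning sketch using API setup and teardown; the `/api/projects` endpoint and test ids are assumptions for illustration:

```typescript
import { test, expect } from '@playwright/test';

test.describe('project archiving', () => {
  let projectId: string;

  test.beforeEach(async ({ request }) => {
    // Each test creates its own project, so nothing is shared between tests
    const res = await request.post('/api/projects', { data: { name: 'archiving-spec-project' } });
    projectId = (await res.json()).id;
  });

  test.afterEach(async ({ request }) => {
    // Clean up so later tests (and re-runs) never see leftover data
    await request.delete(`/api/projects/${projectId}`);
  });

  test('archived project disappears from the list', async ({ page }) => {
    await page.goto('/projects');
    await page.getByTestId(`archive-${projectId}`).click();
    await expect(page.getByTestId(`project-${projectId}`)).toBeHidden();
  });
});
```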
|
||||
|
||||
---
|
||||
|
||||
#### 7. Fixture Patterns (if `check_fixture_patterns: true`)
|
||||
|
||||
- ✅ **PASS**: Uses pure function → Fixture → mergeTests pattern
|
||||
- ⚠️ **WARN**: Some fixtures used but not consistently
|
||||
- ❌ **FAIL**: No fixtures, tests repeat setup code (maintainability risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Fixtures defined (e.g., `test.extend({ customFixture: async ({}, use) => { ... }})`)
|
||||
- Pure functions used for fixture logic
|
||||
- mergeTests used to combine fixtures
|
||||
- No beforeEach with complex setup (should be in fixtures)
|
||||
|
||||
**Knowledge Fragment**: fixture-architecture.md
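A compact sketch of the pattern with Playwright's `test.extend` and `mergeTests`; the login flow, credentials, and fixture names are hypothetical:

```typescript
import { test as base, mergeTests, expect, type Page } from '@playwright/test';

// Pure function: plain reusable logic, no fixture wiring
async function loginAs(page: Page, email: string, password: string) {
  await page.goto('/login');
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill(password);
  await page.getByRole('button', { name: 'Sign in' }).click();
}

// Fixture: wraps the pure function so tests receive ready-made state
const authTest = base.extend<{ adminPage: Page }>({
  adminPage: async ({ page }, use) => {
    await loginAs(page, 'admin@example.com', 'admin-password');
    await use(page);
  },
});

// A second, independent fixture set (illustrative)
const apiTest = base.extend<{ apiBase: string }>({
  apiBase: async ({}, use) => {
    await use(process.env.API_BASE ?? 'http://localhost:3000');
  },
});

// mergeTests: compose fixture sets instead of piling setup into beforeEach
const test = mergeTests(authTest, apiTest);

test('admin sees the settings panel', async ({ adminPage }) => {
  await adminPage.goto('/settings');
  await expect(adminPage.getByTestId('settings-panel')).toBeVisible();
});
```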
|
||||
|
||||
---
|
||||
|
||||
#### 8. Data Factories (if `check_data_factories: true`)
|
||||
|
||||
- ✅ **PASS**: Uses factory functions with overrides, API-first setup
|
||||
- ⚠️ **WARN**: Some factories used but also hardcoded data
|
||||
- ❌ **FAIL**: Hardcoded test data, magic strings/numbers (maintainability risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Factory functions defined (e.g., `createUser()`, `generateInvoice()`)
|
||||
- Factories use faker.js or similar for realistic data
|
||||
- Factories accept overrides (e.g., `createUser({ email: 'custom@example.com' })`)
|
||||
- API-first setup (create via API, test via UI)
|
||||
|
||||
**Knowledge Fragment**: data-factories.md
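A sketch of a factory with overrides and API-first setup, assuming `@faker-js/faker` is available and a hypothetical `/api/test-users` seeding endpoint:

```typescript
import { test, expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

interface TestUser {
  email: string;
  password: string;
  role: 'admin' | 'member';
}

// Factory: realistic defaults via faker, explicit overrides for what the test cares about
function createUser(overrides: Partial<TestUser> = {}): TestUser {
  return {
    email: faker.internet.email(),
    password: faker.internet.password({ length: 16 }),
    role: 'member',
    ...overrides,
  };
}

test('admin can open the audit log', async ({ page, request }) => {
  const admin = createUser({ role: 'admin' });

  // API-first setup: seed the user through the backend, not the UI
  await request.post('/api/test-users', { data: admin });

  await page.goto('/login');
  await page.getByLabel('Email').fill(admin.email);
  await page.getByLabel('Password').fill(admin.password);
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByTestId('audit-log')).toBeVisible();
});
```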
|
||||
|
||||
---
|
||||
|
||||
#### 9. Network-First Pattern (if `check_network_first: true`)
|
||||
|
||||
- ✅ **PASS**: Route interception set up BEFORE navigation (race condition prevention)
|
||||
- ⚠️ **WARN**: Some routes intercepted correctly, others after navigation
|
||||
- ❌ **FAIL**: Route interception after navigation (race condition risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- `page.route()` called before `page.goto()`
|
||||
- `page.waitForResponse()` used with explicit URL pattern
|
||||
- No navigation followed immediately by route setup
|
||||
|
||||
**Knowledge Fragment**: network-first.md
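A minimal Playwright sketch; the `/api/metrics` route and `active-users` test id are illustrative:

```typescript
import { test, expect } from '@playwright/test';

test('dashboard renders stubbed metrics', async ({ page }) => {
  // ✅ Register the route BEFORE navigating so the very first request is intercepted
  await page.route('**/api/metrics', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ activeUsers: 12 }),
    }),
  );

  // Arm the wait before navigation, await it after, to avoid the race entirely
  const metrics = page.waitForResponse('**/api/metrics');
  await page.goto('/dashboard');
  await metrics;

  await expect(page.getByTestId('active-users')).toHaveText('12');
});
```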
|
||||
|
||||
---
|
||||
|
||||
#### 10. Assertions (if `check_assertions: true`)
|
||||
|
||||
- ✅ **PASS**: Explicit assertions present (expect, assert, toHaveText)
|
||||
- ⚠️ **WARN**: Some tests rely on implicit waits instead of assertions
|
||||
- ❌ **FAIL**: Missing assertions, tests don't verify behavior
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Each test has at least one assertion
|
||||
- Assertions are specific (not just truthy checks)
|
||||
- Assertions use framework-provided matchers (toHaveText, toBeVisible)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md
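A short contrast between a truthy check and specific, auto-retrying matchers; the page and selectors are hypothetical:

```typescript
import { test, expect } from '@playwright/test';

test('profile update is confirmed to the user', async ({ page }) => {
  await page.goto('/profile');
  await page.getByLabel('Display name').fill('Ada');
  await page.getByRole('button', { name: 'Save' }).click();

  // ❌ Weak: only proves an element exists, not what the user actually sees
  // expect(await page.locator('.toast').count()).toBeTruthy();

  // ✅ Specific, auto-retrying matchers describe the expected behaviour
  await expect(page.getByRole('status')).toHaveText('Profile updated');
  await expect(page.getByLabel('Display name')).toHaveValue('Ada');
});
```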
|
||||
|
||||
---
|
||||
|
||||
#### 11. Test Length (if `check_test_length: true`)
|
||||
|
||||
- ✅ **PASS**: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
|
||||
- ⚠️ **WARN**: Test file 301-500 lines (consider splitting)
|
||||
- ❌ **FAIL**: Test file >500 lines (too large, maintainability risk)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
#### 12. Test Duration (if `check_test_duration: true`)
|
||||
|
||||
- ✅ **PASS**: Individual tests ≤1.5 minutes (target: <30 seconds)
|
||||
- ⚠️ **WARN**: Some tests 1.5-3 minutes (consider optimization)
|
||||
- ❌ **FAIL**: Tests >3 minutes (too slow, impacts CI/CD)
|
||||
|
||||
**Note:** Duration estimation based on complexity analysis if execution data unavailable
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, selective-testing.md
|
||||
|
||||
---
|
||||
|
||||
#### 13. Flakiness Patterns (if `check_flakiness_patterns: true`)
|
||||
|
||||
- ✅ **PASS**: No known flaky patterns detected
|
||||
- ⚠️ **WARN**: Some potential flaky patterns (e.g., tight timeouts, race conditions)
|
||||
- ❌ **FAIL**: Multiple flaky patterns detected (high flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- Tight timeouts (e.g., `{ timeout: 1000 }`)
|
||||
- Race conditions (navigation before route interception)
|
||||
- Timing-dependent assertions (e.g., checking timestamps)
|
||||
- Retry logic in tests (hides flakiness)
|
||||
- Environment-dependent assumptions (hardcoded URLs, ports)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, network-first.md, ci-burn-in.md
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Calculate Quality Score
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Count violations** by severity:
|
||||
- **Critical (P0)**: Hard waits without justification, no assertions, race conditions, shared state
|
||||
- **High (P1)**: Missing test IDs, no BDD structure, hardcoded data, missing fixtures
|
||||
- **Medium (P2)**: Long test files (>300 lines), missing priorities, some conditionals
|
||||
- **Low (P3)**: Minor style issues, incomplete cleanup, verbose tests
|
||||
|
||||
2. **Calculate quality score** (if `quality_score_enabled: true`):
|
||||
|
||||
```
|
||||
Starting Score: 100
|
||||
|
||||
Critical Violations: -10 points each
|
||||
High Violations: -5 points each
|
||||
Medium Violations: -2 points each
|
||||
Low Violations: -1 point each
|
||||
|
||||
Bonus Points:
|
||||
+ Excellent BDD structure: +5
|
||||
+ Comprehensive fixtures: +5
|
||||
+ Comprehensive data factories: +5
|
||||
+ Network-first pattern: +5
|
||||
+ Perfect isolation: +5
|
||||
+ All test IDs present: +5
|
||||
|
||||
Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
|
||||
```
|
||||
|
||||
3. **Quality Grade**:
|
||||
- **90-100**: Excellent (A+)
|
||||
- **80-89**: Good (A)
|
||||
- **70-79**: Acceptable (B)
|
||||
- **60-69**: Needs Improvement (C)
|
||||
- **<60**: Critical Issues (F)
|
||||
|
||||
**Output:** Quality score calculated with violation breakdown
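For clarity, the arithmetic above can be expressed as a small function; the violation-count input shape is an assumption, while the weights mirror the table:

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

const PENALTY: Record<Severity, number> = { critical: 10, high: 5, medium: 2, low: 1 };
const BONUS_PER_PATTERN = 5; // BDD, fixtures, factories, network-first, isolation, test IDs

function qualityScore(violations: Record<Severity, number>, bonusPatterns: number): number {
  const deductions = (Object.keys(PENALTY) as Severity[]).reduce(
    (sum, severity) => sum + PENALTY[severity] * violations[severity],
    0,
  );
  return Math.max(0, Math.min(100, 100 - deductions + bonusPatterns * BONUS_PER_PATTERN));
}

// 1 critical, 2 high, 1 low violation with 2 bonus patterns → 89 (Good, A)
console.log(qualityScore({ critical: 1, high: 2, medium: 0, low: 1 }, 2));
```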
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Generate Review Report
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Create review report** using `test-review-template.md`:
|
||||
|
||||
**Header Section:**
|
||||
- Test file(s) reviewed
|
||||
- Review date
|
||||
- Review scope (single/directory/suite)
|
||||
- Quality score and grade
|
||||
|
||||
**Executive Summary:**
|
||||
- Overall assessment (Excellent/Good/Needs Improvement/Critical)
|
||||
- Key strengths
|
||||
- Key weaknesses
|
||||
- Recommendation (Approve/Approve with comments/Request changes)
|
||||
|
||||
**Quality Criteria Assessment:**
|
||||
- Table with all criteria evaluated
|
||||
- Status for each (PASS/WARN/FAIL)
|
||||
- Violation count per criterion
|
||||
|
||||
**Critical Issues (Must Fix):**
|
||||
- Priority P0/P1 violations
|
||||
- Code location (file:line)
|
||||
- Explanation of issue
|
||||
- Recommended fix
|
||||
- Knowledge base reference
|
||||
|
||||
**Recommendations (Should Fix):**
|
||||
- Priority P2/P3 violations
|
||||
- Code location (file:line)
|
||||
- Explanation of issue
|
||||
- Recommended improvement
|
||||
- Knowledge base reference
|
||||
|
||||
**Best Practices Examples:**
|
||||
- Highlight good patterns found in tests
|
||||
- Reference knowledge base fragments
|
||||
- Provide examples for others to follow
|
||||
|
||||
**Knowledge Base References:**
|
||||
- List all fragments consulted
|
||||
- Provide links to detailed guidance
|
||||
|
||||
2. **Generate inline comments** (if `generate_inline_comments: true`):
|
||||
- Add TODO comments in test files at violation locations
|
||||
- Format: `// TODO (TEA Review): [Issue description] - See test-review-{filename}.md`
|
||||
- Never modify test logic, only add comments
|
||||
|
||||
3. **Generate quality badge** (if `generate_quality_badge: true`):
|
||||
- Create badge with quality score (e.g., "Test Quality: 87/100 (A)")
|
||||
- Format for inclusion in README or documentation
|
||||
|
||||
4. **Append to story file** (if `append_to_story: true` and story file exists):
|
||||
- Add "Test Quality Review" section to story
|
||||
- Include quality score and critical issues
|
||||
- Link to full review report
|
||||
|
||||
**Output:** Comprehensive review report with actionable feedback
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Save Outputs and Notify
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Save review report** to `{output_file}`
|
||||
2. **Save inline comments** to test files (if enabled)
|
||||
3. **Save quality badge** to output folder (if enabled)
|
||||
4. **Update story file** (if enabled)
|
||||
5. **Generate summary message** for user:
|
||||
- Quality score and grade
|
||||
- Critical issue count
|
||||
- Recommendation
|
||||
|
||||
**Output:** All review artifacts saved and user notified
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Decision Matrix
|
||||
|
||||
| Criterion | PASS | WARN | FAIL | Knowledge Fragment |
|
||||
| ------------------ | ------------------------- | -------------- | ------------------- | ----------------------- |
|
||||
| BDD Format | Given-When-Then present | Some structure | No structure | test-quality.md |
|
||||
| Test IDs | All tests have IDs | Some missing | No IDs | traceability.md |
|
||||
| Priority Markers | All classified | Some missing | No classification | test-priorities.md |
|
||||
| Hard Waits | No hard waits | Some justified | Hard waits present | test-quality.md |
|
||||
| Determinism | No conditionals/random | Some justified | Conditionals/random | test-quality.md |
|
||||
| Isolation | Clean up, no shared state | Some gaps | Shared state | test-quality.md |
|
||||
| Fixture Patterns | Pure fn → Fixture | Some fixtures | No fixtures | fixture-architecture.md |
|
||||
| Data Factories | Factory functions | Some factories | Hardcoded data | data-factories.md |
|
||||
| Network-First | Intercept before navigate | Some correct | Race conditions | network-first.md |
|
||||
| Assertions | Explicit assertions | Some implicit | Missing assertions | test-quality.md |
|
||||
| Test Length | ≤300 lines | 301-500 lines | >500 lines | test-quality.md |
|
||||
| Test Duration | ≤1.5 min | 1.5-3 min | >3 min | test-quality.md |
|
||||
| Flakiness Patterns | No flaky patterns | Some potential | Multiple patterns | ci-burn-in.md |
|
||||
|
||||
---
|
||||
|
||||
## Example Review Summary
|
||||
|
||||
````markdown
|
||||
# Test Quality Review: auth-login.spec.ts
|
||||
|
||||
**Quality Score**: 89/100 (A - Good)
|
||||
**Review Date**: 2025-10-14
|
||||
**Recommendation**: Approve with Comments
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Overall, the test demonstrates good structure and coverage of the login flow. However, there are several areas for improvement to enhance maintainability and prevent flakiness.
|
||||
|
||||
**Strengths:**
|
||||
|
||||
- Excellent BDD structure with clear Given-When-Then comments
|
||||
- Good use of test IDs (1.3-E2E-001, 1.3-E2E-002)
|
||||
- Comprehensive assertions on authentication state
|
||||
|
||||
**Weaknesses:**
|
||||
|
||||
- Hard wait detected (page.waitForTimeout(2000)) - flakiness risk
|
||||
- Hardcoded test data (email: 'test@example.com') - use factories instead
|
||||
- Missing fixture for common login setup - DRY violation
|
||||
|
||||
**Recommendation**: Address critical issue (hard wait) before merging. Other improvements can be addressed in follow-up PR.
|
||||
|
||||
## Critical Issues (Must Fix)
|
||||
|
||||
### 1. Hard Wait Detected (Line 45)
|
||||
|
||||
**Severity**: P0 (Critical)
|
||||
**Issue**: `await page.waitForTimeout(2000)` introduces flakiness
|
||||
**Fix**: Use explicit wait for element or network request instead
|
||||
**Knowledge**: See test-quality.md, network-first.md
|
||||
|
||||
```typescript
|
||||
// ❌ Bad (current)
|
||||
await page.waitForTimeout(2000);
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
|
||||
// ✅ Good (recommended)
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
|
||||
```
|
||||
|
||||
|
||||
## Recommendations (Should Fix)
|
||||
|
||||
### 1. Use Data Factory for Test User (Lines 23, 32, 41)
|
||||
|
||||
**Severity**: P1 (High)
|
||||
**Issue**: Hardcoded email 'test@example.com' - maintainability risk
|
||||
**Fix**: Create factory function for test users
|
||||
**Knowledge**: See data-factories.md
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended)
|
||||
import { createTestUser } from './factories/user-factory';
|
||||
|
||||
const testUser = createTestUser({ role: 'admin' });
|
||||
await loginPage.login(testUser.email, testUser.password);
|
||||
```
|
||||
|
||||
### 2. Extract Login Setup to Fixture (Lines 18-28)
|
||||
|
||||
**Severity**: P1 (High)
|
||||
**Issue**: Login setup repeated across tests - DRY violation
|
||||
**Fix**: Create fixture for authenticated state
|
||||
**Knowledge**: See fixture-architecture.md
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended)
|
||||
const test = base.extend({
|
||||
authenticatedPage: async ({ page }, use) => {
|
||||
const user = createTestUser();
|
||||
await loginPage.login(user.email, user.password);
|
||||
await use(page);
|
||||
},
|
||||
});
|
||||
|
||||
test('user can access dashboard', async ({ authenticatedPage }) => {
|
||||
// Test starts already logged in
|
||||
});
|
||||
```
|
||||
|
||||
## Quality Score Breakdown
|
||||
|
||||
- Starting Score: 100
|
||||
- Critical Violations (1 × -10): -10
|
||||
- High Violations (2 × -5): -10
|
||||
- Medium Violations (0 × -2): 0
|
||||
- Low Violations (1 × -1): -1
|
||||
- Bonus (BDD +5, Test IDs +5): +10
|
||||
- **Final Score**: 89/100 (A)
|
||||
|
||||
````
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Test Review
|
||||
|
||||
- **atdd**: Generate acceptance tests (TEA reviews them for quality)
|
||||
- **automate**: Expand regression suite (TEA reviews new tests)
|
||||
- **dev story**: Developer writes implementation tests (TEA reviews them)
|
||||
|
||||
### After Test Review
|
||||
|
||||
- **Developer**: Addresses critical issues, improves based on recommendations
|
||||
- **gate**: Test quality review feeds into gate decision (high-quality tests increase confidence)
|
||||
|
||||
### Coordinates With
|
||||
|
||||
- **Story File**: Review links to acceptance criteria context
|
||||
- **Test Design**: Review validates tests align with prioritization
|
||||
- **Knowledge Base**: Review references fragments for detailed guidance
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
|
||||
2. **Context Matters**: Some violations may be justified for specific scenarios
|
||||
3. **Knowledge-Based**: All feedback grounded in proven patterns from tea-index.csv
|
||||
4. **Actionable**: Every issue includes recommended fix with code examples
|
||||
5. **Quality Score**: Use as indicator, not absolute measure
|
||||
6. **Continuous Improvement**: Review same tests periodically as patterns evolve
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Problem: No test files found**
|
||||
- Verify test_dir path is correct
|
||||
- Check test file extensions match glob pattern
|
||||
- Ensure test files exist in expected location
|
||||
|
||||
**Problem: Quality score seems too low/high**
|
||||
- Review violation counts - may need to adjust thresholds
|
||||
- Consider context - some projects have different standards
|
||||
- Focus on critical issues first, not just score
|
||||
|
||||
**Problem: Inline comments not generated**
|
||||
- Check generate_inline_comments: true in variables
|
||||
- Verify write permissions on test files
|
||||
- Check `append_to_file`: when false, feedback goes to the separate report rather than inline in test files
|
||||
|
||||
**Problem: Knowledge fragments not loading**
|
||||
- Verify tea-index.csv exists in testarch/ directory
|
||||
- Check fragment file paths are correct
|
||||
- Ensure auto_load_knowledge: true in variables
|
||||
|
||||
@@ -0,0 +1,388 @@
|
||||
# Test Quality Review: {test_filename}
|
||||
|
||||
**Quality Score**: {score}/100 ({grade} - {assessment})
|
||||
**Review Date**: {YYYY-MM-DD}
|
||||
**Review Scope**: {single | directory | suite}
|
||||
**Reviewer**: {user_name or TEA Agent}
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Overall Assessment**: {Excellent | Good | Acceptable | Needs Improvement | Critical Issues}
|
||||
|
||||
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
|
||||
|
||||
### Key Strengths
|
||||
|
||||
✅ {strength_1}
|
||||
✅ {strength_2}
|
||||
✅ {strength_3}
|
||||
|
||||
### Key Weaknesses
|
||||
|
||||
❌ {weakness_1}
|
||||
❌ {weakness_2}
|
||||
❌ {weakness_3}
|
||||
|
||||
### Summary
|
||||
|
||||
{1-2 paragraph summary of overall test quality, highlighting major findings and recommendation rationale}
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Assessment
|
||||
|
||||
| Criterion | Status | Violations | Notes |
|
||||
| ------------------------------------ | ------------------------------- | ---------- | ------------ |
|
||||
| BDD Format (Given-When-Then) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Test IDs | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Priority Markers (P0/P1/P2/P3) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Hard Waits (sleep, waitForTimeout) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Determinism (no conditionals) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Isolation (cleanup, no shared state) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Fixture Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Data Factories | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Network-First Pattern | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Explicit Assertions | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Test Length (≤300 lines) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {lines} | {brief_note} |
|
||||
| Test Duration (≤1.5 min) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {duration} | {brief_note} |
|
||||
| Flakiness Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
|
||||
**Total Violations**: {critical_count} Critical, {high_count} High, {medium_count} Medium, {low_count} Low
|
||||
|
||||
---
|
||||
|
||||
## Quality Score Breakdown
|
||||
|
||||
```
|
||||
Starting Score: 100
|
||||
Critical Violations: -{critical_count} × 10 = -{critical_deduction}
|
||||
High Violations: -{high_count} × 5 = -{high_deduction}
|
||||
Medium Violations: -{medium_count} × 2 = -{medium_deduction}
|
||||
Low Violations: -{low_count} × 1 = -{low_deduction}
|
||||
|
||||
Bonus Points:
|
||||
Excellent BDD: +{0|5}
|
||||
Comprehensive Fixtures: +{0|5}
|
||||
Data Factories: +{0|5}
|
||||
Network-First: +{0|5}
|
||||
Perfect Isolation: +{0|5}
|
||||
All Test IDs: +{0|5}
|
||||
--------
|
||||
Total Bonus: +{bonus_total}
|
||||
|
||||
Final Score: {final_score}/100
|
||||
Grade: {grade}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Critical Issues (Must Fix)
|
||||
|
||||
{If no critical issues: "No critical issues detected. ✅"}
|
||||
|
||||
{For each critical issue:}
|
||||
|
||||
### {issue_number}. {Issue Title}
|
||||
|
||||
**Severity**: P0 (Critical)
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Criterion**: {criterion_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Issue Description**:
|
||||
{Detailed explanation of what the problem is and why it's critical}
|
||||
|
||||
**Current Code**:
|
||||
|
||||
```typescript
|
||||
// ❌ Bad (current implementation)
|
||||
{
|
||||
code_snippet_showing_problem;
|
||||
}
|
||||
```
|
||||
|
||||
**Recommended Fix**:
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended approach)
|
||||
{
|
||||
code_snippet_showing_solution;
|
||||
}
|
||||
```
|
||||
|
||||
**Why This Matters**:
|
||||
{Explanation of impact - flakiness risk, maintainability, reliability}
|
||||
|
||||
**Related Violations**:
|
||||
{If similar issue appears elsewhere, note line numbers}
|
||||
|
||||
---
|
||||
|
||||
## Recommendations (Should Fix)
|
||||
|
||||
{If no recommendations: "No additional recommendations. Test quality is excellent. ✅"}
|
||||
|
||||
{For each recommendation:}
|
||||
|
||||
### {rec_number}. {Recommendation Title}
|
||||
|
||||
**Severity**: {P1 (High) | P2 (Medium) | P3 (Low)}
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Criterion**: {criterion_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Issue Description**:
|
||||
{Detailed explanation of what could be improved and why}
|
||||
|
||||
**Current Code**:
|
||||
|
||||
```typescript
|
||||
// ⚠️ Could be improved (current implementation)
|
||||
{
|
||||
code_snippet_showing_current_approach;
|
||||
}
|
||||
```
|
||||
|
||||
**Recommended Improvement**:
|
||||
|
||||
```typescript
|
||||
// ✅ Better approach (recommended)
|
||||
{
|
||||
code_snippet_showing_improvement;
|
||||
}
|
||||
```
|
||||
|
||||
**Benefits**:
|
||||
{Explanation of benefits - maintainability, readability, reusability}
|
||||
|
||||
**Priority**:
|
||||
{Why this is P1/P2/P3 - urgency and impact}
|
||||
|
||||
---
|
||||
|
||||
## Best Practices Found
|
||||
|
||||
{If good patterns found, highlight them}
|
||||
|
||||
{For each best practice:}
|
||||
|
||||
### {practice_number}. {Best Practice Title}
|
||||
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Pattern**: {pattern_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Why This Is Good**:
|
||||
{Explanation of why this pattern is excellent}
|
||||
|
||||
**Code Example**:
|
||||
|
||||
```typescript
|
||||
// ✅ Excellent pattern demonstrated in this test
|
||||
{
|
||||
code_snippet_showing_best_practice;
|
||||
}
|
||||
```
|
||||
|
||||
**Use as Reference**:
|
||||
{Encourage using this pattern in other tests}
|
||||
|
||||
---
|
||||
|
||||
## Test File Analysis
|
||||
|
||||
### File Metadata
|
||||
|
||||
- **File Path**: `{relative_path_from_project_root}`
|
||||
- **File Size**: {line_count} lines, {kb_size} KB
|
||||
- **Test Framework**: {Playwright | Jest | Cypress | Vitest | Other}
|
||||
- **Language**: {TypeScript | JavaScript}
|
||||
|
||||
### Test Structure
|
||||
|
||||
- **Describe Blocks**: {describe_count}
|
||||
- **Test Cases (it/test)**: {test_count}
|
||||
- **Average Test Length**: {avg_lines_per_test} lines per test
|
||||
- **Fixtures Used**: {fixture_count} ({fixture_names})
|
||||
- **Data Factories Used**: {factory_count} ({factory_names})
|
||||
|
||||
### Test Coverage Scope
|
||||
|
||||
- **Test IDs**: {test_id_list}
|
||||
- **Priority Distribution**:
|
||||
- P0 (Critical): {p0_count} tests
|
||||
- P1 (High): {p1_count} tests
|
||||
- P2 (Medium): {p2_count} tests
|
||||
- P3 (Low): {p3_count} tests
|
||||
- Unknown: {unknown_count} tests
|
||||
|
||||
### Assertions Analysis
|
||||
|
||||
- **Total Assertions**: {assertion_count}
|
||||
- **Assertions per Test**: {avg_assertions_per_test} (avg)
|
||||
- **Assertion Types**: {assertion_types_used}
|
||||
|
||||
---
|
||||
|
||||
## Context and Integration
|
||||
|
||||
### Related Artifacts
|
||||
|
||||
{If story file found:}
|
||||
|
||||
- **Story File**: [{story_filename}]({story_path})
|
||||
- **Acceptance Criteria Mapped**: {ac_mapped}/{ac_total} ({ac_coverage}%)
|
||||
|
||||
{If test-design found:}
|
||||
|
||||
- **Test Design**: [{test_design_filename}]({test_design_path})
|
||||
- **Risk Assessment**: {risk_level}
|
||||
- **Priority Framework**: P0-P3 applied
|
||||
|
||||
### Acceptance Criteria Validation
|
||||
|
||||
{If story file available, map tests to ACs:}
|
||||
|
||||
| Acceptance Criterion | Test ID | Status | Notes |
|
||||
| -------------------- | --------- | -------------------------- | ------- |
|
||||
| {AC_1} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
| {AC_2} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
| {AC_3} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
|
||||
**Coverage**: {covered_count}/{total_count} criteria covered ({coverage_percentage}%)
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This review consulted the following knowledge base fragments:
|
||||
|
||||
- **[test-quality.md](../../../testarch/knowledge/test-quality.md)** - Definition of Done for tests (no hard waits, <300 lines, <1.5 min, self-cleaning)
|
||||
- **[fixture-architecture.md](../../../testarch/knowledge/fixture-architecture.md)** - Pure function → Fixture → mergeTests pattern
|
||||
- **[network-first.md](../../../testarch/knowledge/network-first.md)** - Route intercept before navigate (race condition prevention)
|
||||
- **[data-factories.md](../../../testarch/knowledge/data-factories.md)** - Factory functions with overrides, API-first setup
|
||||
- **[test-levels-framework.md](../../../testarch/knowledge/test-levels-framework.md)** - E2E vs API vs Component vs Unit appropriateness
|
||||
- **[tdd-cycles.md](../../../testarch/knowledge/tdd-cycles.md)** - Red-Green-Refactor patterns
|
||||
- **[selective-testing.md](../../../testarch/knowledge/selective-testing.md)** - Duplicate coverage detection
|
||||
- **[ci-burn-in.md](../../../testarch/knowledge/ci-burn-in.md)** - Flakiness detection patterns (10-iteration loop)
|
||||
- **[test-priorities.md](../../../testarch/knowledge/test-priorities.md)** - P0/P1/P2/P3 classification framework
|
||||
- **[traceability.md](../../../testarch/knowledge/traceability.md)** - Requirements-to-tests mapping
|
||||
|
||||
See [tea-index.csv](../../../testarch/tea-index.csv) for complete knowledge base.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
### Immediate Actions (Before Merge)
|
||||
|
||||
1. **{action_1}** - {description}
|
||||
- Priority: {P0 | P1 | P2}
|
||||
- Owner: {team_or_person}
|
||||
- Estimated Effort: {time_estimate}
|
||||
|
||||
2. **{action_2}** - {description}
|
||||
- Priority: {P0 | P1 | P2}
|
||||
- Owner: {team_or_person}
|
||||
- Estimated Effort: {time_estimate}
|
||||
|
||||
### Follow-up Actions (Future PRs)
|
||||
|
||||
1. **{action_1}** - {description}
|
||||
- Priority: {P2 | P3}
|
||||
- Target: {next_sprint | backlog}
|
||||
|
||||
2. **{action_2}** - {description}
|
||||
- Priority: {P2 | P3}
|
||||
- Target: {next_sprint | backlog}
|
||||
|
||||
### Re-Review Needed?
|
||||
|
||||
{✅ No re-review needed - approve as-is}
|
||||
{⚠️ Re-review after critical fixes - request changes, then re-review}
|
||||
{❌ Major refactor required - block merge, pair programming recommended}
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
|
||||
|
||||
**Rationale**:
|
||||
{1-2 paragraph explanation of recommendation based on findings}
|
||||
|
||||
**For Approve**:
|
||||
|
||||
> Test quality is excellent/good with {score}/100 score. {Minor issues noted can be addressed in follow-up PRs.} Tests are production-ready and follow best practices.
|
||||
|
||||
**For Approve with Comments**:
|
||||
|
||||
> Test quality is acceptable with {score}/100 score. {High-priority recommendations should be addressed but don't block merge.} Critical issues resolved, but improvements would enhance maintainability.
|
||||
|
||||
**For Request Changes**:
|
||||
|
||||
> Test quality needs improvement with {score}/100 score. {Critical issues must be fixed before merge.} {X} critical violations detected that pose flakiness/maintainability risks.
|
||||
|
||||
**For Block**:
|
||||
|
||||
> Test quality is insufficient with {score}/100 score. {Multiple critical issues make tests unsuitable for production.} Recommend pairing session with QA engineer to apply patterns from knowledge base.
|
||||
|
||||
---
|
||||
|
||||
## Appendix
|
||||
|
||||
### Violation Summary by Location
|
||||
|
||||
{Table of all violations sorted by line number:}
|
||||
|
||||
| Line | Severity | Criterion | Issue | Fix |
|
||||
| ------ | ------------- | ----------- | ------------- | ----------- |
|
||||
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
|
||||
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
|
||||
|
||||
### Quality Trends
|
||||
|
||||
{If reviewing same file multiple times, show trend:}
|
||||
|
||||
| Review Date | Score | Grade | Critical Issues | Trend |
|
||||
| ------------ | ------------- | --------- | --------------- | ----------- |
|
||||
| {YYYY-MM-DD} | {score_1}/100 | {grade_1} | {count_1} | ⬆️ Improved |
|
||||
| {YYYY-MM-DD} | {score_2}/100 | {grade_2} | {count_2} | ⬇️ Declined |
|
||||
| {YYYY-MM-DD} | {score_3}/100 | {grade_3} | {count_3} | ➡️ Stable |
|
||||
|
||||
### Related Reviews
|
||||
|
||||
{If reviewing multiple files in directory/suite:}
|
||||
|
||||
| File | Score | Grade | Critical | Status |
|
||||
| -------- | ----------- | ------- | -------- | ------------------ |
|
||||
| {file_1} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
| {file_2} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
| {file_3} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
|
||||
**Suite Average**: {avg_score}/100 ({avg_grade})
|
||||
|
||||
---
|
||||
|
||||
## Review Metadata
|
||||
|
||||
**Generated By**: BMad TEA Agent (Test Architect)
|
||||
**Workflow**: testarch-test-review v4.0
|
||||
**Review ID**: test-review-{filename}-{YYYYMMDD}
|
||||
**Timestamp**: {YYYY-MM-DD HH:MM:SS}
|
||||
**Version**: 1.0
|
||||
|
||||
---
|
||||
|
||||
## Feedback on This Review
|
||||
|
||||
If you have questions or feedback on this review:
|
||||
|
||||
1. Review patterns in knowledge base: `testarch/knowledge/`
|
||||
2. Consult tea-index.csv for detailed guidance
|
||||
3. Request clarification on specific violations
|
||||
4. Pair with QA engineer to apply patterns
|
||||
|
||||
This review is guidance, not rigid rules. Context matters - if a pattern is justified, document it with a comment.
|
||||
src/modules/bmm/workflows/testarch/test-review/workflow.yaml (new file, 99 lines)
@@ -0,0 +1,99 @@
|
||||
# Test Architect workflow: test-review
|
||||
name: testarch-test-review
|
||||
description: "Review test quality using comprehensive knowledge base and best practices validation"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/test-review"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/test-review-template.md"
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Review target
|
||||
test_file_path: "" # Explicit test file to review (if not provided, auto-discover)
|
||||
test_dir: "{project-root}/tests"
|
||||
review_scope: "single" # single (one file), directory (folder), suite (all tests)
|
||||
|
||||
# Review configuration
|
||||
quality_score_enabled: true # Calculate 0-100 quality score
|
||||
append_to_file: false # true = inline comments, false = separate report
|
||||
check_against_knowledge: true # Use tea-index.csv fragments for validation
|
||||
strict_mode: false # Strict = fail on any violation, Relaxed = advisory only
|
||||
|
||||
# Quality criteria to check
|
||||
check_given_when_then: true # BDD format validation
|
||||
check_test_ids: true # Test ID conventions (e.g., 1.3-E2E-001)
|
||||
check_priority_markers: true # P0/P1/P2/P3 classification
|
||||
check_hard_waits: true # Detect sleep(), wait(X), hardcoded delays
|
||||
check_determinism: true # No conditionals (if/else), no try/catch abuse
|
||||
check_isolation: true # Tests clean up, no shared state
|
||||
check_fixture_patterns: true # Pure function → Fixture → mergeTests
|
||||
check_data_factories: true # Factory usage vs hardcoded data
|
||||
check_network_first: true # Route intercept before navigate
|
||||
check_assertions: true # Explicit assertions, not implicit waits
|
||||
check_test_length: true # Warn if >300 lines per file
|
||||
check_test_duration: true # Warn if individual test >1.5 min
|
||||
check_flakiness_patterns: true # Common flaky patterns (race conditions, timing)
|
||||
|
||||
# Integration with BMad artifacts
|
||||
use_story_file: true # Load story for context (acceptance criteria)
|
||||
use_test_design: true # Load test-design for priority context
|
||||
auto_discover_story: true # Find related story by test ID
|
||||
|
||||
# Output configuration
|
||||
output_file: "{output_folder}/test-review-{filename}.md"
|
||||
generate_inline_comments: false # Add TODO comments in test files
|
||||
generate_quality_badge: true # Create quality badge/score
|
||||
append_to_story: false # Add review section to story file
|
||||
|
||||
# Knowledge base fragments to load
|
||||
knowledge_fragments:
|
||||
- test-quality.md # Definition of Done for tests
|
||||
- fixture-architecture.md # Pure function → Fixture patterns
|
||||
- network-first.md # Route interception before navigation
|
||||
- data-factories.md # Factory patterns and best practices
|
||||
- test-levels-framework.md # E2E vs API vs Component vs Unit
|
||||
- playwright-config.md # Configuration patterns (if Playwright)
|
||||
- tdd-cycles.md # Red-Green-Refactor patterns
|
||||
- selective-testing.md # Duplicate coverage detection
|
||||
|
||||
# Default output file
|
||||
default_output_file: "{output_folder}/test-review.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read test files, story, test-design
|
||||
- write_file # Create review report
|
||||
- list_files # Discover test files in directory
|
||||
- search_repo # Find tests by patterns
|
||||
- glob # Find test files matching patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- test_file: "Test file to review (single file mode)"
|
||||
- test_dir: "Directory of tests to review (directory mode)"
|
||||
- story: "Related story for acceptance criteria context (optional)"
|
||||
- test_design: "Test design for priority context (optional)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- test-architect
|
||||
- code-review
|
||||
- quality
|
||||
- best-practices
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true # Can review multiple files
|
||||
|
||||
web_bundle: false
|
||||
src/modules/bmm/workflows/testarch/trace/README.md (new file, 375 lines)
@@ -0,0 +1,375 @@
|
||||
# Requirements Traceability Workflow
|
||||
|
||||
**Workflow ID:** `testarch-trace`
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Command:** `bmad tea *trace`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The **trace** workflow generates a comprehensive requirements-to-tests traceability matrix that maps acceptance criteria to implemented tests, identifies coverage gaps, and provides actionable recommendations for improving test coverage.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- Maps acceptance criteria to specific test cases across all levels (E2E, API, Component, Unit)
|
||||
- Classifies coverage status (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
|
||||
- Prioritizes gaps by risk level (P0/P1/P2/P3)
|
||||
- Generates CI/CD-ready YAML snippets for quality gates
|
||||
- Detects duplicate coverage across test levels
|
||||
- Verifies test quality (assertions, structure, performance)
|
||||
|
||||
---
|
||||
|
||||
## When to Use This Workflow
|
||||
|
||||
Use `*trace` when you need to:
|
||||
|
||||
- ✅ Validate that all acceptance criteria have test coverage
|
||||
- ✅ Identify coverage gaps before release or PR merge
|
||||
- ✅ Generate traceability documentation for compliance or audits
|
||||
- ✅ Ensure critical paths (P0/P1) are fully tested
|
||||
- ✅ Detect duplicate coverage across test levels
|
||||
- ✅ Assess test quality across your suite
|
||||
- ✅ Create gate-ready metrics for CI/CD pipelines
|
||||
|
||||
**Typical Timing:**
|
||||
|
||||
- After tests are implemented (post-ATDD or post-development)
|
||||
- Before merging a PR (validate P0/P1 coverage)
|
||||
- Before release (validate full coverage)
|
||||
- During sprint retrospectives (assess test quality)
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Acceptance criteria (from story file OR inline)
|
||||
- Implemented test suite (or acknowledged gaps)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- `test-design.md` - Risk assessment and test priorities
|
||||
- `tech-spec.md` - Technical implementation details
|
||||
- Test framework configuration (playwright.config.ts, jest.config.js)
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- Story lacks any tests AND gaps are not acknowledged → Run `*atdd` first
|
||||
- Acceptance criteria are completely missing → Provide criteria or story file
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Usage (BMad Mode)
|
||||
|
||||
```bash
|
||||
bmad tea *trace
|
||||
```
|
||||
|
||||
The workflow will:
|
||||
|
||||
1. Read story file from `bmad/output/story-X.X.md`
|
||||
2. Extract acceptance criteria
|
||||
3. Auto-discover tests for this story
|
||||
4. Generate traceability matrix
|
||||
5. Save to `bmad/output/traceability-matrix.md`
|
||||
|
||||
### Standalone Mode (No Story File)
|
||||
|
||||
```bash
|
||||
bmad tea *trace --acceptance-criteria "AC-1: User can login with email..."
|
||||
```
|
||||
|
||||
### Custom Configuration
|
||||
|
||||
```bash
|
||||
bmad tea *trace \
|
||||
--story-file "bmad/output/story-1.3.md" \
|
||||
--output-file "docs/qa/trace-1.3.md" \
|
||||
--min-p0-coverage 100 \
|
||||
--min-p1-coverage 90
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
1. **Load Context** - Read story, test design, tech spec, knowledge base
|
||||
2. **Discover Tests** - Auto-find tests related to story (by ID, describe blocks, file paths)
|
||||
3. **Map Criteria** - Link acceptance criteria to specific test cases
|
||||
4. **Analyze Gaps** - Identify missing coverage and prioritize by risk
|
||||
5. **Verify Quality** - Check test quality (assertions, structure, performance)
|
||||
6. **Generate Deliverables** - Create traceability matrix, gate YAML, coverage badge
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### Traceability Matrix (`traceability-matrix.md`)
|
||||
|
||||
Comprehensive markdown file with:
|
||||
|
||||
- Coverage summary table (by priority)
|
||||
- Detailed criterion-to-test mapping
|
||||
- Gap analysis with recommendations
|
||||
- Quality assessment for each test
|
||||
- Gate YAML snippet
|
||||
|
||||
### Gate YAML Snippet
|
||||
|
||||
```yaml
|
||||
traceability:
|
||||
story_id: '1.3'
|
||||
coverage:
|
||||
overall: 85%
|
||||
p0: 100%
|
||||
p1: 90%
|
||||
gaps:
|
||||
critical: 0
|
||||
high: 1
|
||||
status: 'PASS'
|
||||
```
|
||||
|
||||
### Updated Story File (Optional)
|
||||
|
||||
Adds "Traceability" section to story markdown with:
|
||||
|
||||
- Link to traceability matrix
|
||||
- Coverage summary
|
||||
- Gate status
|
||||
|
||||
---
|
||||
|
||||
## Coverage Classifications
|
||||
|
||||
- **FULL** ✅ - All scenarios validated at appropriate level(s)
|
||||
- **PARTIAL** ⚠️ - Some coverage but missing edge cases or levels
|
||||
- **NONE** ❌ - No test coverage at any level
|
||||
- **UNIT-ONLY** ⚠️ - Only unit tests (missing integration/E2E validation)
|
||||
- **INTEGRATION-ONLY** ⚠️ - Only API/Component tests (missing unit confidence)
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
| Priority | Coverage Requirement | Severity | Action |
|
||||
| -------- | -------------------- | -------- | ------------------ |
|
||||
| P0 | 100% | BLOCKER | Do not release |
|
||||
| P1 | 90% | HIGH | Block PR merge |
|
||||
| P2 | 80% (recommended) | MEDIUM | Address in nightly |
|
||||
| P3 | No requirement | LOW | Optional |
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### workflow.yaml Variables
|
||||
|
||||
```yaml
|
||||
variables:
|
||||
# Target specification
|
||||
story_file: '' # Path to story markdown
|
||||
acceptance_criteria: '' # Inline criteria if no story
|
||||
|
||||
# Test discovery
|
||||
test_dir: '{project-root}/tests'
|
||||
auto_discover_tests: true
|
||||
|
||||
# Traceability configuration
|
||||
coverage_levels: 'e2e,api,component,unit'
|
||||
map_by_test_id: true
|
||||
map_by_describe: true
|
||||
map_by_filename: true
|
||||
|
||||
# Gap analysis
|
||||
prioritize_by_risk: true
|
||||
suggest_missing_tests: true
|
||||
check_duplicate_coverage: true
|
||||
|
||||
# Output configuration
|
||||
output_file: '{output_folder}/traceability-matrix.md'
|
||||
generate_gate_yaml: true
|
||||
generate_coverage_badge: true
|
||||
update_story_file: true
|
||||
|
||||
# Quality gates
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 90
|
||||
min_overall_coverage: 80
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base Integration
|
||||
|
||||
This workflow automatically loads relevant knowledge fragments:
|
||||
|
||||
- `traceability.md` - Requirements mapping patterns
|
||||
- `test-priorities.md` - P0/P1/P2/P3 risk framework
|
||||
- `risk-governance.md` - Risk-based testing approach
|
||||
- `test-quality.md` - Definition of Done for tests
|
||||
- `selective-testing.md` - Duplicate coverage patterns
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Full Coverage Validation
|
||||
|
||||
```bash
|
||||
# Validate P0/P1 coverage before PR merge
|
||||
bmad tea *trace --story-file "bmad/output/story-1.3.md"
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
# Traceability Matrix - Story 1.3
|
||||
|
||||
## Coverage Summary
|
||||
|
||||
| Priority | Total | FULL | Coverage % | Status |
|
||||
| -------- | ----- | ---- | ---------- | ------- |
|
||||
| P0 | 3 | 3 | 100% | ✅ PASS |
|
||||
| P1 | 5 | 5 | 100% | ✅ PASS |
|
||||
|
||||
Gate Status: PASS ✅
|
||||
```
|
||||
|
||||
### Example 2: Gap Identification
|
||||
|
||||
```bash
|
||||
# Find coverage gaps for existing feature
|
||||
bmad tea *trace --target-feature "user-authentication"
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
## Gap Analysis
|
||||
|
||||
### Critical Gaps (BLOCKER)
|
||||
|
||||
- None ✅
|
||||
|
||||
### High Priority Gaps (PR BLOCKER)
|
||||
|
||||
1. **AC-3: Password reset email edge cases**
|
||||
- Recommend: Add 1.3-API-001 (email service integration)
|
||||
- Impact: Users may not recover accounts in error scenarios
|
||||
```
|
||||
|
||||
### Example 3: Duplicate Coverage Detection
|
||||
|
||||
```bash
|
||||
# Check for redundant tests
|
||||
bmad tea *trace --check-duplicate-coverage true
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
## Duplicate Coverage Detected
|
||||
|
||||
⚠️ AC-1 (login validation) is tested at multiple levels:
|
||||
|
||||
- 1.3-E2E-001 (full user journey) ✅ Appropriate
|
||||
- 1.3-UNIT-001 (business logic) ✅ Appropriate
|
||||
- 1.3-COMPONENT-001 (form validation) ⚠️ Redundant with UNIT-001
|
||||
|
||||
Recommendation: Remove 1.3-COMPONENT-001 or consolidate with UNIT-001
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "No tests found for this story"
|
||||
|
||||
- Run `*atdd` workflow first to generate failing acceptance tests
|
||||
- Check test file naming conventions (may not match story ID pattern)
|
||||
- Verify test directory path is correct (`test_dir` variable)
|
||||
|
||||
### "Cannot determine coverage status"
|
||||
|
||||
- Tests may lack explicit mapping (no test IDs, unclear describe blocks)
|
||||
- Add test IDs: `{STORY_ID}-{LEVEL}-{SEQ}` (e.g., `1.3-E2E-001`)
|
||||
- Use Given-When-Then narrative in test descriptions
|
||||
|
||||
### "P0 coverage below 100%"
|
||||
|
||||
- This is a **BLOCKER** - do not release
|
||||
- Identify missing P0 tests in gap analysis
|
||||
- Run `*atdd` workflow to generate missing tests
|
||||
- Verify P0 classification is correct with stakeholders
|
||||
|
||||
### "Duplicate coverage detected"
|
||||
|
||||
- Review `selective-testing.md` knowledge fragment
|
||||
- Determine if overlap is acceptable (defense in depth) or wasteful
|
||||
- Consolidate tests at appropriate level (logic → unit, journey → E2E)
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
- **testarch-test-design** → `*trace` - Define priorities, then trace coverage
|
||||
- **testarch-atdd** → `*trace` - Generate tests, then validate coverage
|
||||
- `*trace` → **testarch-automate** - Identify gaps, then automate regression
|
||||
- `*trace` → **testarch-gate** - Generate metrics, then apply quality gates
|
||||
- `*trace` → **testarch-test-review** - Flag quality issues, then review tests
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Run Trace After Test Implementation**
|
||||
- Don't run `*trace` before tests exist (run `*atdd` first)
|
||||
- Trace is most valuable after initial test suite is written
|
||||
|
||||
2. **Prioritize by Risk**
|
||||
- P0 gaps are BLOCKERS (must fix before release)
|
||||
- P1 gaps are HIGH priority (block PR merge)
|
||||
- P3 gaps are acceptable (fix if time permits)
|
||||
|
||||
3. **Explicit Mapping**
|
||||
- Use test IDs (`1.3-E2E-001`) for clear traceability
|
||||
- Reference criteria in describe blocks
|
||||
- Use Given-When-Then narrative
|
||||
|
||||
4. **Avoid Duplicate Coverage**
|
||||
- Test each behavior at appropriate level only
|
||||
- Unit tests for logic, E2E for journeys
|
||||
- Only overlap for defense in depth on critical paths
|
||||
|
||||
5. **Generate Gate-Ready Artifacts**
|
||||
- Enable `generate_gate_yaml` for CI/CD integration
|
||||
- Use YAML snippets in pipeline quality gates
|
||||
- Export metrics for dashboard visualization
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *test-design` - Define test priorities and risk assessment
|
||||
- `bmad tea *atdd` - Generate failing acceptance tests for gaps
|
||||
- `bmad tea *automate` - Expand regression suite based on gaps
|
||||
- `bmad tea *gate` - Apply quality gates using traceability metrics
|
||||
- `bmad tea *test-review` - Review test quality issues flagged by trace
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- [Instructions](./instructions.md) - Detailed workflow steps
|
||||
- [Checklist](./checklist.md) - Validation checklist
|
||||
- [Template](./trace-template.md) - Traceability matrix template
|
||||
- [Knowledge Base](../../testarch/knowledge/) - Testing best practices
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
src/modules/bmm/workflows/testarch/trace/checklist.md (new file, 267 lines)
@@ -0,0 +1,267 @@
|
||||
# Requirements Traceability - Validation Checklist
|
||||
|
||||
**Workflow:** `testarch-trace`
|
||||
**Purpose:** Ensure complete and accurate traceability matrix with actionable gap analysis
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites Validation
|
||||
|
||||
- [ ] Acceptance criteria are available (from story file OR inline)
|
||||
- [ ] Test suite exists (or gaps are acknowledged and documented)
|
||||
- [ ] Test directory path is correct (`test_dir` variable)
|
||||
- [ ] Story file is accessible (if using BMad mode)
|
||||
- [ ] Knowledge base is loaded (test-priorities, traceability, risk-governance)
|
||||
|
||||
---
|
||||
|
||||
## Context Loading
|
||||
|
||||
- [ ] Story file read successfully (if applicable)
|
||||
- [ ] Acceptance criteria extracted correctly
|
||||
- [ ] Story ID identified (e.g., 1.3)
|
||||
- [ ] `test-design.md` loaded (if available)
|
||||
- [ ] `tech-spec.md` loaded (if available)
|
||||
- [ ] `PRD.md` loaded (if available)
|
||||
- [ ] Relevant knowledge fragments loaded from `tea-index.csv`
|
||||
|
||||
---
|
||||
|
||||
## Test Discovery and Cataloging
|
||||
|
||||
- [ ] Tests auto-discovered using multiple strategies (test IDs, describe blocks, file paths)
|
||||
- [ ] Tests categorized by level (E2E, API, Component, Unit)
|
||||
- [ ] Test metadata extracted:
|
||||
- [ ] Test IDs (e.g., 1.3-E2E-001)
|
||||
- [ ] Describe/context blocks
|
||||
- [ ] It blocks (individual test cases)
|
||||
- [ ] Given-When-Then structure (if BDD)
|
||||
- [ ] Priority markers (P0/P1/P2/P3)
|
||||
- [ ] All relevant test files found (no tests missed due to naming conventions)
|
||||
|
||||
---
|
||||
|
||||
## Criteria-to-Test Mapping
|
||||
|
||||
- [ ] Each acceptance criterion mapped to tests (or marked as NONE)
|
||||
- [ ] Explicit references found (test IDs, describe blocks mentioning criterion)
|
||||
- [ ] Test level documented (E2E, API, Component, Unit)
|
||||
- [ ] Given-When-Then narrative verified for alignment
|
||||
- [ ] Traceability matrix table generated:
|
||||
- [ ] Criterion ID
|
||||
- [ ] Description
|
||||
- [ ] Test ID
|
||||
- [ ] Test File
|
||||
- [ ] Test Level
|
||||
- [ ] Coverage Status
|
||||
|
||||
---
|
||||
|
||||
## Coverage Classification
|
||||
|
||||
- [ ] Coverage status classified for each criterion:
|
||||
- [ ] **FULL** - All scenarios validated at appropriate level(s)
|
||||
- [ ] **PARTIAL** - Some coverage but missing edge cases or levels
|
||||
- [ ] **NONE** - No test coverage at any level
|
||||
- [ ] **UNIT-ONLY** - Only unit tests (missing integration/E2E validation)
|
||||
- [ ] **INTEGRATION-ONLY** - Only API/Component tests (missing unit confidence)
|
||||
- [ ] Classification justifications provided
|
||||
- [ ] Edge cases considered in FULL vs PARTIAL determination
|
||||
|
||||
---
|
||||
|
||||
## Duplicate Coverage Detection
|
||||
|
||||
- [ ] Duplicate coverage checked across test levels
|
||||
- [ ] Acceptable overlap identified (defense in depth for critical paths)
|
||||
- [ ] Unacceptable duplication flagged (same validation at multiple levels)
|
||||
- [ ] Recommendations provided for consolidation
|
||||
- [ ] Selective testing principles applied
|
||||
|
||||
---
|
||||
|
||||
## Gap Analysis
|
||||
|
||||
- [ ] Coverage gaps identified:
|
||||
- [ ] Criteria with NONE status
|
||||
- [ ] Criteria with PARTIAL status
|
||||
- [ ] Criteria with UNIT-ONLY status
|
||||
- [ ] Criteria with INTEGRATION-ONLY status
|
||||
- [ ] Gaps prioritized by risk level using test-priorities framework:
|
||||
- [ ] **CRITICAL** - P0 criteria without FULL coverage (BLOCKER)
|
||||
- [ ] **HIGH** - P1 criteria without FULL coverage (PR blocker)
|
||||
- [ ] **MEDIUM** - P2 criteria without FULL coverage (nightly gap)
|
||||
- [ ] **LOW** - P3 criteria without FULL coverage (acceptable)
|
||||
- [ ] Specific test recommendations provided for each gap:
|
||||
- [ ] Suggested test level (E2E, API, Component, Unit)
|
||||
- [ ] Test description (Given-When-Then)
|
||||
- [ ] Recommended test ID (e.g., 1.3-E2E-004)
|
||||
- [ ] Explanation of why test is needed
|
||||
|
||||
---
|
||||
|
||||
## Coverage Metrics
|
||||
|
||||
- [ ] Overall coverage percentage calculated (FULL coverage / total criteria)
|
||||
- [ ] P0 coverage percentage calculated
|
||||
- [ ] P1 coverage percentage calculated
|
||||
- [ ] P2 coverage percentage calculated (if applicable)
|
||||
- [ ] Coverage by level calculated:
|
||||
- [ ] E2E coverage %
|
||||
- [ ] API coverage %
|
||||
- [ ] Component coverage %
|
||||
- [ ] Unit coverage %
|
||||
|
||||
---
|
||||
|
||||
## Quality Gate Validation
|
||||
|
||||
- [ ] P0 coverage >= 100% (required) ✅ or BLOCKER documented ❌
|
||||
- [ ] P1 coverage >= 90% (recommended) ✅ or HIGH priority gap documented ⚠️
|
||||
- [ ] Overall coverage >= 80% (recommended) ✅ or MEDIUM priority gap documented ⚠️
|
||||
- [ ] Gate status determined: PASS / WARN / FAIL
|
||||
|
||||
---

## Test Quality Verification

For each mapped test, verify:

- [ ] Explicit assertions are present (not hidden in helpers)
- [ ] Test follows Given-When-Then structure
- [ ] No hard waits or sleeps (deterministic waiting only)
- [ ] Self-cleaning (test cleans up its data)
- [ ] File size < 300 lines
- [ ] Test duration < 90 seconds

Quality issues flagged:

- [ ] **BLOCKER** issues identified (missing assertions, hard waits, flaky patterns)
- [ ] **WARNING** issues identified (large files, slow tests, unclear structure)
- [ ] **INFO** issues identified (style inconsistencies, missing documentation)

Knowledge fragments referenced:

- [ ] `test-quality.md` for Definition of Done
- [ ] `fixture-architecture.md` for self-cleaning patterns
- [ ] `network-first.md` for Playwright best practices
- [ ] `data-factories.md` for test data patterns

---

## Deliverables Generated

### Traceability Matrix Markdown

- [ ] File created at `{output_folder}/traceability-matrix.md`
- [ ] Template from `trace-template.md` used
- [ ] Full mapping table included
- [ ] Coverage status section included
- [ ] Gap analysis section included
- [ ] Quality assessment section included
- [ ] Recommendations section included

### Gate YAML Snippet (if enabled)

- [ ] YAML snippet generated
- [ ] Story ID included
- [ ] Coverage metrics included (overall, p0, p1, p2)
- [ ] Gap counts included (critical, high, medium, low)
- [ ] Status included (PASS / WARN / FAIL)
- [ ] Recommendations included

### Coverage Badge/Metric (if enabled)

- [ ] Badge markdown generated
- [ ] Metrics exported to JSON for CI/CD integration

### Updated Story File (if enabled)

- [ ] "Traceability" section added to story markdown
- [ ] Link to traceability matrix included
- [ ] Coverage summary included
- [ ] Gate status included

---

## Quality Assurance

### Accuracy Checks

- [ ] All acceptance criteria accounted for (none skipped)
- [ ] Test IDs correctly formatted (e.g., 1.3-E2E-001)
- [ ] File paths are correct and accessible
- [ ] Coverage percentages calculated correctly
- [ ] No false positives (tests incorrectly mapped to criteria)
- [ ] No false negatives (existing tests missed in mapping)

### Completeness Checks

- [ ] All test levels considered (E2E, API, Component, Unit)
- [ ] All priorities considered (P0, P1, P2, P3)
- [ ] All coverage statuses used appropriately (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
- [ ] All gaps have recommendations
- [ ] All quality issues have severity and remediation guidance

### Actionability Checks

- [ ] Recommendations are specific (not generic)
- [ ] Test IDs suggested for new tests
- [ ] Given-When-Then provided for recommended tests
- [ ] Impact explained for each gap
- [ ] Priorities clear (CRITICAL, HIGH, MEDIUM, LOW)

---

## Non-Prescriptive Validation

- [ ] Traceability format adapted to team needs (not rigid template)
- [ ] Examples are minimal and focused on patterns
- [ ] Teams can extend with custom classifications
- [ ] Integration with external systems supported (JIRA, Azure DevOps)
- [ ] Compliance requirements considered (if applicable)

---

## Documentation and Communication

- [ ] Traceability matrix is readable and well-formatted
- [ ] Tables render correctly in markdown
- [ ] Code blocks have proper syntax highlighting
- [ ] Links are valid and accessible
- [ ] Recommendations are clear and prioritized
- [ ] Gate status is prominent and unambiguous

---

## Final Validation

- [ ] All prerequisites met
- [ ] All acceptance criteria mapped or gaps documented
- [ ] P0 coverage is 100% OR documented as BLOCKER
- [ ] Gap analysis is complete and prioritized
- [ ] Test quality issues identified and flagged
- [ ] Deliverables generated and saved
- [ ] Gate YAML ready for CI/CD integration (if enabled)
- [ ] Story file updated (if enabled)
- [ ] Workflow completed successfully

---

## Sign-Off

**Traceability Status:**

- [ ] ✅ PASS - All quality gates met, no critical gaps
- [ ] ⚠️ WARN - P1 gaps exist, address before PR merge
- [ ] ❌ FAIL - P0 gaps exist, BLOCKER for release

**Next Actions:**

- If PASS: Proceed to `*gate` workflow or PR merge
- If WARN: Address HIGH priority gaps, re-run `*trace`
- If FAIL: Run `*atdd` to generate missing P0 tests, re-run `*trace`

---

<!-- Powered by BMAD-CORE™ -->
@@ -1,39 +1,558 @@
<!-- Powered by BMAD-CORE™ -->

# Requirements Traceability - Instructions v4.0

# Requirements Traceability v3.0

**Workflow:** `testarch-trace`
**Purpose:** Generate requirements-to-tests traceability matrix with coverage analysis and gap identification
**Agent:** Test Architect (TEA)
**Format:** Pure Markdown v4.0 (no XML blocks)

```xml
<task id="bmad/bmm/testarch/trace" name="Requirements Traceability">
  <llm critical="true">
    <i>Preflight requirements:</i>
    <i>- Story has implemented tests (or acknowledge gaps).</i>
    <i>- Access to source code and specifications is available.</i>
  </llm>
  <flow>
    <step n="1" title="Preflight">
      <action>Confirm prerequisites; halt if tests or specs are unavailable.</action>
    </step>
    <step n="2" title="Trace Coverage">
      <action>Gather acceptance criteria and implemented tests.</action>
      <action>Map each criterion to concrete tests (file + describe/it) using Given-When-Then narrative.</action>
      <action>Classify coverage status as FULL, PARTIAL, NONE, UNIT-ONLY, or INTEGRATION-ONLY.</action>
      <action>Flag severity based on priority (P0 gaps are critical) and recommend additional tests or refactors.</action>
      <action>Build gate YAML coverage summary reflecting totals and gaps.</action>
    </step>
    <step n="3" title="Deliverables">
      <action>Generate traceability report under `docs/qa/assessments`, a coverage matrix per criterion, and gate YAML snippet capturing totals/gaps.</action>
    </step>
  </flow>
  <halt>
    <i>If story lacks implemented tests, pause and advise running `*atdd` or writing tests before tracing.</i>
  </halt>
  <notes>
    <i>Use `{project-root}/bmad/bmm/testarch/tea-index.csv` to load traceability-relevant fragments (risk-governance, selective-testing, test-quality) as needed.</i>
    <i>Coverage definitions: FULL=all scenarios validated, PARTIAL=some coverage, NONE=no validation, UNIT-ONLY=missing higher-level validation, INTEGRATION-ONLY=lacks lower-level confidence.</i>
    <i>Ensure assertions stay explicit and avoid duplicate coverage.</i>
  </notes>
  <output>
    <i>Traceability matrix and gate snippet ready for review.</i>
  </output>
</task>
```

---

## Overview

This workflow creates a comprehensive traceability matrix that maps acceptance criteria to implemented tests, identifies coverage gaps, and provides actionable recommendations for improving test coverage. It supports both BMad-integrated mode (with story files and test design) and standalone mode (with inline acceptance criteria).

**Key Capabilities:**

- Map acceptance criteria to specific test cases across all levels (E2E, API, Component, Unit)
- Classify coverage status (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
- Prioritize gaps by risk level (P0/P1/P2/P3) using test-priorities framework
- Generate gate-ready YAML snippets for CI/CD integration
- Detect duplicate coverage across test levels
- Verify explicit assertions in test cases

---

## Prerequisites

**Required:**

- Acceptance criteria (from story file OR provided inline)
- Implemented test suite (or acknowledge gaps to be addressed)

**Recommended:**

- `test-design.md` (for risk assessment and priority context)
- `tech-spec.md` (for technical implementation context)
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)

**Halt Conditions:**

- If story lacks any implemented tests AND no gaps are acknowledged, recommend running `*atdd` workflow first
- If acceptance criteria are completely missing, halt and request them

---

## Workflow Steps

### Step 1: Load Context and Knowledge Base

**Actions:**

1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
   - `traceability.md` - Requirements mapping patterns
   - `test-priorities.md` - P0/P1/P2/P3 risk framework
   - `risk-governance.md` - Risk-based testing approach
   - `test-quality.md` - Definition of Done for tests
   - `selective-testing.md` - Duplicate coverage patterns

2. Read story file (if provided):
   - Extract acceptance criteria
   - Identify story ID (e.g., 1.3)
   - Note any existing test design or priority information

3. Read related BMad artifacts (if available):
   - `test-design.md` - Risk assessment and test priorities
   - `tech-spec.md` - Technical implementation details
   - `PRD.md` - Product requirements context

**Output:** Complete understanding of requirements, priorities, and existing context

---

### Step 2: Discover and Catalog Tests

**Actions:**

1. Auto-discover test files related to the story (see the sketch after this step):
   - Search for test IDs (e.g., `1.3-E2E-001`, `1.3-UNIT-005`)
   - Search for describe blocks mentioning feature name
   - Search for file paths matching feature directory
   - Use `glob` to find test files in `{test_dir}`

2. Categorize tests by level:
   - **E2E Tests**: Full user journeys through UI
   - **API Tests**: HTTP contract and integration tests
   - **Component Tests**: UI component behavior in isolation
   - **Unit Tests**: Business logic and pure functions

3. Extract test metadata:
   - Test ID (if present)
   - Describe/context blocks
   - It blocks (individual test cases)
   - Given-When-Then structure (if BDD)
   - Assertions used
   - Priority markers (P0/P1/P2/P3)

**Output:** Complete catalog of all tests for this feature
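
A minimal discovery sketch in TypeScript, assuming the `glob` npm package and the `{STORY_ID}-{LEVEL}-{SEQ}` naming convention; the real workflow uses its `glob`/`search_repo` tools, so this is illustrative only:

```ts
// Hypothetical test-catalog builder for story 1.3.
import { readFileSync } from 'node:fs';
import { globSync } from 'glob';

const STORY_ID = '1.3';
// Matches IDs like 1.3-E2E-001 anywhere in a spec file.
const TEST_ID = new RegExp(`${STORY_ID.replace('.', '\\.')}-(E2E|API|COMPONENT|UNIT)-\\d{3}`, 'g');

const catalog = globSync('tests/**/*.spec.ts').flatMap((file) => {
  const ids = readFileSync(file, 'utf8').match(TEST_ID) ?? [];
  return ids.map((id) => ({ id, file, level: id.split('-')[1] }));
});

// e.g., [{ id: '1.3-E2E-001', file: 'tests/e2e/auth.spec.ts', level: 'E2E' }, ...]
console.log(catalog);
```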
---

### Step 3: Map Criteria to Tests

**Actions:**

1. For each acceptance criterion:
   - Search for explicit references (test IDs, describe blocks mentioning criterion; see the example after this step)
   - Map to specific test files and it blocks
   - Use Given-When-Then narrative to verify alignment
   - Document test level (E2E, API, Component, Unit)

2. Build traceability matrix:

   ```
   | Criterion ID | Description | Test ID | Test File | Test Level | Coverage Status |
   |--------------|-------------|---------|-----------|------------|-----------------|
   | AC-1 | User can... | 1.3-E2E-001 | e2e/auth.spec.ts | E2E | FULL |
   ```

3. Classify coverage status for each criterion:
   - **FULL**: All scenarios validated at appropriate level(s)
   - **PARTIAL**: Some coverage but missing edge cases or levels
   - **NONE**: No test coverage at any level
   - **UNIT-ONLY**: Only unit tests (missing integration/E2E validation)
   - **INTEGRATION-ONLY**: Only API/Component tests (missing unit confidence)

4. Check for duplicate coverage:
   - Same behavior tested at multiple levels unnecessarily
   - Flag violations of selective testing principles
   - Recommend consolidation where appropriate

**Output:** Complete traceability matrix with coverage classifications
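
To make step 1 explicit rather than inferred, tests can carry the criterion and test ID in their titles. A hedged Playwright example of that convention (the file path, selectors, and URLs here are illustrative assumptions):

```ts
// tests/e2e/auth.spec.ts (illustrative)
import { test, expect } from '@playwright/test';

// Criterion AC-1, priority P0; the ID in the title is what the mapping step searches for.
test.describe('1.3-E2E-001: AC-1 login with email and password', () => {
  test('redirects a valid user to the dashboard', async ({ page }) => {
    // Given: user has valid credentials
    await page.goto('/login');
    // When: user submits the login form
    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('correct-horse-battery');
    await page.getByRole('button', { name: 'Sign in' }).click();
    // Then: user is redirected to the dashboard
    await expect(page).toHaveURL(/\/dashboard/);
  });
});
```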
---

### Step 4: Analyze Gaps and Prioritize

**Actions:**

1. Identify coverage gaps:
   - List criteria with NONE, PARTIAL, UNIT-ONLY, or INTEGRATION-ONLY status
   - Assign severity based on test-priorities framework:
     - **CRITICAL**: P0 criteria without FULL coverage (blocks release)
     - **HIGH**: P1 criteria without FULL coverage (PR blocker)
     - **MEDIUM**: P2 criteria without FULL coverage (nightly test gap)
     - **LOW**: P3 criteria without FULL coverage (acceptable gap)

2. Recommend specific tests to add:
   - Suggest test level (E2E, API, Component, Unit)
   - Provide test description (Given-When-Then)
   - Recommend test ID (e.g., `1.3-E2E-004`)
   - Explain why this test is needed

3. Calculate coverage metrics (see the sketch after this step):
   - Overall coverage percentage (criteria with FULL coverage / total criteria)
   - P0 coverage percentage (critical paths)
   - P1 coverage percentage (high priority)
   - Coverage by level (E2E%, API%, Component%, Unit%)

4. Check against quality gates:
   - P0 coverage >= 100% (required)
   - P1 coverage >= 90% (recommended)
   - Overall coverage >= 80% (recommended)

**Output:** Prioritized gap analysis with actionable recommendations
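
A minimal sketch of the metric calculation over the matrix rows from Step 3, in TypeScript; only FULL rows count toward a percentage, and the row shape is an assumption for illustration:

```ts
interface MatrixRow {
  criterionId: string;
  priority: 'P0' | 'P1' | 'P2' | 'P3';
  coverage: 'FULL' | 'PARTIAL' | 'NONE' | 'UNIT-ONLY' | 'INTEGRATION-ONLY';
}

function coveragePct(rows: MatrixRow[], priority?: MatrixRow['priority']): number {
  const scoped = priority ? rows.filter((r) => r.priority === priority) : rows;
  if (scoped.length === 0) return 100; // nothing to cover at this priority
  const full = scoped.filter((r) => r.coverage === 'FULL').length;
  return Math.round((full / scoped.length) * 100);
}

const rows: MatrixRow[] = [
  { criterionId: 'AC-1', priority: 'P0', coverage: 'FULL' },
  { criterionId: 'AC-3', priority: 'P1', coverage: 'PARTIAL' },
];

console.log(coveragePct(rows));       // 50 (overall)
console.log(coveragePct(rows, 'P0')); // 100
```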
---

### Step 5: Verify Test Quality

**Actions:**

1. For each mapped test, verify (a passing example follows this step):
   - Explicit assertions are present (not hidden in helpers)
   - Test follows Given-When-Then structure
   - No hard waits or sleeps
   - Self-cleaning (test cleans up its data)
   - File size < 300 lines
   - Test duration < 90 seconds

2. Flag quality issues:
   - **BLOCKER**: Missing assertions, hard waits, flaky patterns
   - **WARNING**: Large files, slow tests, unclear structure
   - **INFO**: Style inconsistencies, missing documentation

3. Reference knowledge fragments:
   - `test-quality.md` for Definition of Done
   - `fixture-architecture.md` for self-cleaning patterns
   - `network-first.md` for Playwright best practices
   - `data-factories.md` for test data patterns

**Output:** Quality assessment for each test with improvement recommendations
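
A hedged Playwright sketch of a test that would pass these checks: explicit assertions in the test body, awaiting the response instead of a hard sleep, and self-cleaning data. The factory helpers and endpoint are assumptions, not part of this workflow:

```ts
import { test, expect } from '@playwright/test';
// Assumed data-factory helpers in the spirit of data-factories.md.
import { createUser, deleteUser } from '../support/factories';

test('1.3-API-002: profile update persists', async ({ request }) => {
  // Given: a user created by this test (no shared state)
  const user = await createUser(request);
  try {
    // When: the profile is updated; await the response, never waitForTimeout()
    const response = await request.patch(`/api/users/${user.id}`, {
      data: { displayName: 'Trace Example' },
    });
    // Then: assert explicitly here, not inside a helper
    expect(response.ok()).toBe(true);
    expect((await response.json()).displayName).toBe('Trace Example');
  } finally {
    await deleteUser(request, user.id); // clean up own data
  }
});
```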
---

### Step 6: Generate Deliverables

**Actions:**

1. Create traceability matrix markdown file:
   - Use template from `trace-template.md`
   - Include full mapping table
   - Add coverage status section
   - Add gap analysis section
   - Add quality assessment section
   - Add recommendations section
   - Save to `{output_folder}/traceability-matrix.md`

2. Generate gate YAML snippet (if enabled):

   ```yaml
   traceability:
     story_id: '1.3'
     coverage:
       overall: 85%
       p0: 100%
       p1: 90%
       p2: 75%
     gaps:
       critical: 0
       high: 1
       medium: 2
     status: 'PASS' # or "FAIL" if P0 < 100%
   ```

3. Create coverage badge/metric (if enabled):
   - Generate badge markdown: ``
   - Export metrics to JSON for CI/CD integration (see the sketch after this step)

4. Update story file (if enabled):
   - Add "Traceability" section to story markdown
   - Link to traceability matrix
   - Include coverage summary
   - Add gate status

**Output:** Complete traceability documentation ready for review and CI/CD integration
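
For the JSON metrics export in item 3, a minimal sketch of what a CI job might consume; the file name and shape are assumptions, not a fixed contract:

```ts
// Hypothetical exporter for the Step 4 metrics.
import { writeFileSync } from 'node:fs';

const metrics = {
  story_id: '1.3',
  coverage: { overall: 79, p0: 100, p1: 80, p2: 75, p3: 50 },
  gaps: { critical: 0, high: 1, medium: 1, low: 1 },
  status: 'WARN',
};

writeFileSync('traceability-metrics.json', JSON.stringify(metrics, null, 2));
// A CI step can then fail the build whenever metrics.coverage.p0 < 100.
```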
---

## Non-Prescriptive Approach

**Minimal Examples:** This workflow provides principles and patterns, not rigid templates. Teams should adapt the traceability format to their needs.

**Key Patterns to Follow:**

- Map criteria to tests explicitly (don't rely on inference alone)
- Prioritize by risk (P0 gaps are critical, P3 gaps are acceptable)
- Check coverage at appropriate levels (E2E for journeys, Unit for logic)
- Verify test quality (explicit assertions, no flakiness)
- Generate gate-ready artifacts (YAML snippets for CI/CD)

**Extend as Needed:**

- Add custom coverage classifications
- Integrate with code coverage tools (Istanbul, NYC)
- Link to external traceability systems (JIRA, Azure DevOps)
- Add compliance or regulatory requirements

---

## Coverage Classification Details

### FULL Coverage

- All scenarios validated at appropriate test level(s)
- Edge cases considered
- Both happy path and error paths tested
- Assertions are explicit and complete

### PARTIAL Coverage

- Some scenarios validated but missing edge cases
- Only happy path tested (missing error paths)
- Assertions present but incomplete
- Coverage exists but needs enhancement

### NONE Coverage

- No tests found for this criterion
- Complete gap requiring new tests
- Critical if P0/P1, acceptable if P3

### UNIT-ONLY Coverage

- Only unit tests exist (business logic validated)
- Missing integration or E2E validation
- Risk: Implementation may not work end-to-end
- Recommendation: Add integration or E2E tests for critical paths

### INTEGRATION-ONLY Coverage

- Only API or Component tests exist
- Missing unit test confidence for business logic
- Risk: Logic errors may not be caught quickly
- Recommendation: Add unit tests for complex algorithms or state machines

---

## Duplicate Coverage Detection

Use selective testing principles from `selective-testing.md`:

**Acceptable Overlap:**

- Unit tests for business logic + E2E tests for user journey (different aspects)
- API tests for contract + E2E tests for full workflow (defense in depth for critical paths)

**Unacceptable Duplication:**

- Same validation at multiple levels (e.g., E2E testing math logic better suited for unit tests)
- Multiple E2E tests covering identical user path
- Component tests duplicating unit test logic

**Recommendation Pattern:**

- Test logic at unit level
- Test integration at API/Component level
- Test user experience at E2E level
- Avoid testing framework behavior at any level

---

## Integration with BMad Artifacts

### With test-design.md

- Use risk assessment to prioritize gap remediation
- Reference test priorities (P0/P1/P2/P3) for severity classification
- Align traceability with originally planned test coverage

### With tech-spec.md

- Understand technical implementation details
- Map criteria to specific code modules
- Verify tests cover technical edge cases

### With PRD.md

- Understand full product context
- Verify acceptance criteria align with product goals
- Check for unstated requirements that need coverage

---

## Quality Gates

### P0 Coverage (Critical Paths)

- **Requirement:** 100% FULL coverage
- **Severity:** BLOCKER if not met
- **Action:** Do not release until P0 coverage is complete

### P1 Coverage (High Priority)

- **Requirement:** 90% FULL coverage
- **Severity:** HIGH if not met
- **Action:** Block PR merge until addressed

### P2 Coverage (Medium Priority)

- **Requirement:** No strict requirement (recommended 80%)
- **Severity:** MEDIUM if gaps exist
- **Action:** Address in nightly test improvements

### P3 Coverage (Low Priority)

- **Requirement:** No requirement
- **Severity:** LOW if gaps exist
- **Action:** Optional - add if time permits

---

## Example Traceability Matrix

````markdown
# Traceability Matrix - Story 1.3

**Story:** User Authentication
**Date:** 2025-10-14
**Status:** 79% Coverage (1 HIGH gap)

## Coverage Summary

| Priority | Total Criteria | FULL Coverage | Coverage % | Status |
| --------- | -------------- | ------------- | ---------- | ------- |
| P0 | 3 | 3 | 100% | ✅ PASS |
| P1 | 5 | 4 | 80% | ⚠️ WARN |
| P2 | 4 | 3 | 75% | ✅ PASS |
| P3 | 2 | 1 | 50% | ✅ PASS |
| **Total** | **14** | **11** | **79%** | ⚠️ WARN |

## Detailed Mapping

### AC-1: User can login with email and password (P0)

- **Coverage:** FULL ✅
- **Tests:**
  - `1.3-E2E-001` - tests/e2e/auth.spec.ts:12
    - Given: User has valid credentials
    - When: User submits login form
    - Then: User is redirected to dashboard
  - `1.3-UNIT-001` - tests/unit/auth-service.spec.ts:8
    - Given: Valid email and password hash
    - When: validateCredentials is called
    - Then: Returns user object

### AC-2: User sees error for invalid credentials (P0)

- **Coverage:** FULL ✅
- **Tests:**
  - `1.3-E2E-002` - tests/e2e/auth.spec.ts:28
    - Given: User has invalid password
    - When: User submits login form
    - Then: Error message is displayed
  - `1.3-UNIT-002` - tests/unit/auth-service.spec.ts:18
    - Given: Invalid password hash
    - When: validateCredentials is called
    - Then: Throws AuthenticationError

### AC-3: User can reset password via email (P1)

- **Coverage:** PARTIAL ⚠️
- **Tests:**
  - `1.3-E2E-003` - tests/e2e/auth.spec.ts:44
    - Given: User requests password reset
    - When: User clicks reset link
    - Then: User can set new password
- **Gaps:**
  - Missing: Email delivery validation
  - Missing: Expired token handling
  - Missing: Unit test for token generation
- **Recommendation:** Add `1.3-API-001` for email service integration and `1.3-UNIT-003` for token logic

## Gap Analysis

### Critical Gaps (BLOCKER)

- None ✅

### High Priority Gaps (PR BLOCKER)

1. **AC-3: Password reset email edge cases**
   - Missing tests for expired tokens, invalid tokens, email failures
   - Recommend: `1.3-API-001` (email service integration) and `1.3-E2E-004` (error paths)
   - Impact: Users may not be able to recover accounts in error scenarios

### Medium Priority Gaps (Nightly)

1. **AC-7: Session timeout handling** - UNIT-ONLY coverage (missing E2E validation)

## Quality Assessment

### Tests with Issues

- `1.3-E2E-001` ⚠️ - 145 seconds (exceeds 90s target) - Optimize fixture setup
- `1.3-UNIT-005` ⚠️ - 320 lines (exceeds 300 line limit) - Split into multiple test files

### Tests Passing Quality Gates

- 11/13 tests (85%) meet all quality criteria ✅

## Gate YAML Snippet

```yaml
traceability:
  story_id: '1.3'
  coverage:
    overall: 79%
    p0: 100%
    p1: 80%
    p2: 75%
    p3: 50%
  gaps:
    critical: 0
    high: 1
    medium: 1
    low: 1
  status: 'WARN' # P1 coverage below 90% threshold
  recommendations:
    - 'Add 1.3-API-001 for email service integration'
    - 'Add 1.3-E2E-004 for password reset error paths'
    - 'Optimize 1.3-E2E-001 performance (145s → <90s)'
```
````

## Recommendations

1. **Address High Priority Gap:** Add password reset edge case tests before PR merge
2. **Optimize Slow Test:** Refactor `1.3-E2E-001` to use faster fixture setup
3. **Split Large Test:** Break `1.3-UNIT-005` into focused test files
4. **Enhance P2 Coverage:** Add E2E validation for session timeout (currently UNIT-ONLY)
---

## Validation Checklist

Before completing this workflow, verify:

- ✅ All acceptance criteria are mapped to tests (or gaps are documented)
- ✅ Coverage status is classified (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
- ✅ Gaps are prioritized by risk level (P0/P1/P2/P3)
- ✅ P0 coverage is 100% or blockers are documented
- ✅ Duplicate coverage is identified and flagged
- ✅ Test quality is assessed (assertions, structure, performance)
- ✅ Traceability matrix is generated and saved
- ✅ Gate YAML snippet is generated (if enabled)
- ✅ Story file is updated with traceability section (if enabled)
- ✅ Recommendations are actionable and specific

---

## Notes

- **Explicit Mapping:** Require tests to reference criteria explicitly (test IDs, describe blocks) for maintainability
- **Risk-Based Prioritization:** Use test-priorities framework (P0/P1/P2/P3) to determine gap severity
- **Quality Over Quantity:** Better to have fewer high-quality tests with FULL coverage than many low-quality tests with PARTIAL coverage
- **Selective Testing:** Avoid duplicate coverage - test each behavior at the appropriate level only
- **Gate Integration:** Generate YAML snippets that can be consumed by CI/CD pipelines for automated quality gates

---

## Troubleshooting

### "No tests found for this story"

- Run `*atdd` workflow first to generate failing acceptance tests
- Check test file naming conventions (may not match story ID pattern)
- Verify test directory path is correct

### "Cannot determine coverage status"

- Tests may lack explicit mapping to criteria (no test IDs, unclear describe blocks)
- Review test structure and add Given-When-Then narrative
- Add test IDs in format: `{STORY_ID}-{LEVEL}-{SEQ}` (e.g., 1.3-E2E-001); a regex sketch of the format follows below
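
A small sketch of that convention as a validation regex (the COMPONENT level name is an assumption based on the levels this workflow traces):

```ts
// Matches IDs like 1.3-E2E-001 or 1.3-UNIT-012.
const TEST_ID = /^\d+\.\d+-(E2E|API|COMPONENT|UNIT)-\d{3}$/;

console.log(TEST_ID.test('1.3-E2E-001')); // true
console.log(TEST_ID.test('1.3-e2e-1'));   // false: wrong case and sequence width
```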
### "P0 coverage below 100%"
|
||||
- This is a **BLOCKER** - do not release
|
||||
- Identify missing P0 tests in gap analysis
|
||||
- Run `*atdd` workflow to generate missing tests
|
||||
- Verify with stakeholders that P0 classification is correct
|
||||
|
||||
### "Duplicate coverage detected"
|
||||
- Review selective testing principles in `selective-testing.md`
|
||||
- Determine if overlap is acceptable (defense in depth) or wasteful (same validation at multiple levels)
|
||||
- Consolidate tests at appropriate level (logic → unit, integration → API, journey → E2E)
|
||||
|
||||
---
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **testarch-test-design** - Define test priorities (P0/P1/P2/P3) before tracing
|
||||
- **testarch-atdd** - Generate failing acceptance tests for gaps identified
|
||||
- **testarch-automate** - Expand regression suite based on traceability findings
|
||||
- **testarch-gate** - Use traceability matrix as input for quality gate decisions
|
||||
- **testarch-test-review** - Review test quality issues flagged in traceability
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
307 src/modules/bmm/workflows/testarch/trace/trace-template.md Normal file
@@ -0,0 +1,307 @@

# Traceability Matrix - Story {STORY_ID}

**Story:** {STORY_TITLE}
**Date:** {DATE}
**Status:** {OVERALL_COVERAGE}% Coverage ({GAP_COUNT} {GAP_SEVERITY} gap{s})

---

## Coverage Summary

| Priority | Total Criteria | FULL Coverage | Coverage % | Status |
| --------- | -------------- | ------------- | ---------- | ------------ |
| P0 | {P0_TOTAL} | {P0_FULL} | {P0_PCT}% | {P0_STATUS} |
| P1 | {P1_TOTAL} | {P1_FULL} | {P1_PCT}% | {P1_STATUS} |
| P2 | {P2_TOTAL} | {P2_FULL} | {P2_PCT}% | {P2_STATUS} |
| P3 | {P3_TOTAL} | {P3_FULL} | {P3_PCT}% | {P3_STATUS} |
| **Total** | **{TOTAL}** | **{FULL}** | **{PCT}%** | **{STATUS}** |

**Legend:**

- ✅ PASS - Coverage meets quality gate threshold
- ⚠️ WARN - Coverage below threshold but not critical
- ❌ FAIL - Coverage below minimum threshold (blocker)

---

## Detailed Mapping

### {CRITERION_ID}: {CRITERION_DESCRIPTION} ({PRIORITY})

- **Coverage:** {COVERAGE_STATUS} {STATUS_ICON}
- **Tests:**
  - `{TEST_ID}` - {TEST_FILE}:{LINE}
    - **Given:** {GIVEN}
    - **When:** {WHEN}
    - **Then:** {THEN}
  - `{TEST_ID_2}` - {TEST_FILE_2}:{LINE}
    - **Given:** {GIVEN_2}
    - **When:** {WHEN_2}
    - **Then:** {THEN_2}
- **Gaps:** (if PARTIAL or UNIT-ONLY or INTEGRATION-ONLY)
  - Missing: {MISSING_SCENARIO_1}
  - Missing: {MISSING_SCENARIO_2}
- **Recommendation:** {RECOMMENDATION_TEXT}

---

### Example: AC-1: User can login with email and password (P0)

- **Coverage:** FULL ✅
- **Tests:**
  - `1.3-E2E-001` - tests/e2e/auth.spec.ts:12
    - **Given:** User has valid credentials
    - **When:** User submits login form
    - **Then:** User is redirected to dashboard
  - `1.3-UNIT-001` - tests/unit/auth-service.spec.ts:8
    - **Given:** Valid email and password hash
    - **When:** validateCredentials is called
    - **Then:** Returns user object

---

### Example: AC-3: User can reset password via email (P1)

- **Coverage:** PARTIAL ⚠️
- **Tests:**
  - `1.3-E2E-003` - tests/e2e/auth.spec.ts:44
    - **Given:** User requests password reset
    - **When:** User clicks reset link in email
    - **Then:** User can set new password
- **Gaps:**
  - Missing: Email delivery validation
  - Missing: Expired token handling (error path)
  - Missing: Invalid token handling (security test)
  - Missing: Unit test for token generation logic
- **Recommendation:** Add `1.3-API-001` for email service integration testing and `1.3-UNIT-003` for token generation logic. Add `1.3-E2E-004` for error path validation (expired/invalid tokens).

---

### Example: AC-7: Session timeout handling (P2)

- **Coverage:** UNIT-ONLY ⚠️
- **Tests:**
  - `1.3-UNIT-006` - tests/unit/session-manager.spec.ts:42
    - **Given:** Session has expired timestamp
    - **When:** isSessionValid is called
    - **Then:** Returns false
- **Gaps:**
  - Missing: E2E validation of timeout behavior in UI
  - Missing: API test for session refresh flow
- **Recommendation:** Add `1.3-E2E-005` to validate that user sees timeout message and is redirected to login. Add `1.3-API-002` to validate session refresh endpoint behavior.

---

## Gap Analysis

### Critical Gaps (BLOCKER) ❌

{CRITICAL_GAP_COUNT} gaps found. **Do not release until resolved.**

1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P0)
   - Current Coverage: {COVERAGE_STATUS}
   - Missing Tests: {MISSING_TEST_DESCRIPTION}
   - Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
   - Impact: {IMPACT_DESCRIPTION}

---

### High Priority Gaps (PR BLOCKER) ⚠️

{HIGH_GAP_COUNT} gaps found. **Address before PR merge.**

1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P1)
   - Current Coverage: {COVERAGE_STATUS}
   - Missing Tests: {MISSING_TEST_DESCRIPTION}
   - Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
   - Impact: {IMPACT_DESCRIPTION}

---

### Medium Priority Gaps (Nightly) ⚠️

{MEDIUM_GAP_COUNT} gaps found. **Address in nightly test improvements.**

1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P2)
   - Current Coverage: {COVERAGE_STATUS}
   - Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})

---

### Low Priority Gaps (Optional) ℹ️

{LOW_GAP_COUNT} gaps found. **Optional - add if time permits.**

1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P3)
   - Current Coverage: {COVERAGE_STATUS}

---

## Quality Assessment

### Tests with Issues

**BLOCKER Issues** ❌

- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}

**WARNING Issues** ⚠️

- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}

**INFO Issues** ℹ️

- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}

---

### Example Quality Issues

**WARNING Issues** ⚠️

- `1.3-E2E-001` - 145 seconds (exceeds 90s target) - Optimize fixture setup to reduce test duration
- `1.3-UNIT-005` - 320 lines (exceeds 300 line limit) - Split into multiple focused test files

**INFO Issues** ℹ️

- `1.3-E2E-002` - Missing Given-When-Then structure - Refactor describe block to use BDD format

---

### Tests Passing Quality Gates

**{PASSING_TEST_COUNT}/{TOTAL_TEST_COUNT} tests ({PASSING_PCT}%) meet all quality criteria** ✅

---

## Duplicate Coverage Analysis

### Acceptable Overlap (Defense in Depth)

- {CRITERION_ID}: Tested at unit (business logic) and E2E (user journey) ✅

### Unacceptable Duplication ⚠️

- {CRITERION_ID}: Same validation at E2E and Component level
  - Recommendation: Remove {TEST_ID} or consolidate with {OTHER_TEST_ID}

---

## Coverage by Test Level

| Test Level | Tests | Criteria Covered | Coverage % |
| ---------- | ----------------- | -------------------- | ---------------- |
| E2E | {E2E_COUNT} | {E2E_CRITERIA} | {E2E_PCT}% |
| API | {API_COUNT} | {API_CRITERIA} | {API_PCT}% |
| Component | {COMP_COUNT} | {COMP_CRITERIA} | {COMP_PCT}% |
| Unit | {UNIT_COUNT} | {UNIT_CRITERIA} | {UNIT_PCT}% |
| **Total** | **{TOTAL_TESTS}** | **{TOTAL_CRITERIA}** | **{TOTAL_PCT}%** |

---

## Gate YAML Snippet

```yaml
traceability:
  story_id: "{STORY_ID}"
  date: "{DATE}"
  coverage:
    overall: {OVERALL_PCT}%
    p0: {P0_PCT}%
    p1: {P1_PCT}%
    p2: {P2_PCT}%
    p3: {P3_PCT}%
  gaps:
    critical: {CRITICAL_COUNT}
    high: {HIGH_COUNT}
    medium: {MEDIUM_COUNT}
    low: {LOW_COUNT}
  quality:
    passing_tests: {PASSING_COUNT}
    total_tests: {TOTAL_TESTS}
    blocker_issues: {BLOCKER_COUNT}
    warning_issues: {WARNING_COUNT}
  status: "{STATUS}" # PASS / WARN / FAIL
  recommendations:
    - "{RECOMMENDATION_1}"
    - "{RECOMMENDATION_2}"
    - "{RECOMMENDATION_3}"
```

---

## Recommendations

### Immediate Actions (Before PR Merge)

1. **{ACTION_1}** - {DESCRIPTION}
2. **{ACTION_2}** - {DESCRIPTION}

### Short-term Actions (This Sprint)

1. **{ACTION_1}** - {DESCRIPTION}
2. **{ACTION_2}** - {DESCRIPTION}

### Long-term Actions (Backlog)

1. **{ACTION_1}** - {DESCRIPTION}

---

### Example Recommendations

### Immediate Actions (Before PR Merge)

1. **Add P1 Password Reset Tests** - Implement `1.3-API-001` for email service integration and `1.3-E2E-004` for error path validation. P1 coverage currently at 80%, target is 90%.
2. **Optimize Slow E2E Test** - Refactor `1.3-E2E-001` to use faster fixture setup. Currently 145s, target is <90s.

### Short-term Actions (This Sprint)

1. **Enhance P2 Coverage** - Add E2E validation for session timeout (`1.3-E2E-005`). Currently UNIT-ONLY coverage.
2. **Split Large Test File** - Break `1.3-UNIT-005` (320 lines) into multiple focused test files (<300 lines each).

### Long-term Actions (Backlog)

1. **Enrich P3 Coverage** - Add tests for edge cases in P3 criteria if time permits.

---

## Related Artifacts

- **Story File:** {STORY_FILE_PATH}
- **Test Design:** {TEST_DESIGN_PATH} (if available)
- **Tech Spec:** {TECH_SPEC_PATH} (if available)
- **Test Files:** {TEST_DIR_PATH}

---

## Sign-Off

**Traceability Assessment:**

- Overall Coverage: {OVERALL_PCT}%
- P0 Coverage: {P0_PCT}% {P0_STATUS}
- P1 Coverage: {P1_PCT}% {P1_STATUS}
- Critical Gaps: {CRITICAL_COUNT}
- High Priority Gaps: {HIGH_COUNT}

**Gate Status:** {STATUS} {STATUS_ICON}

**Next Steps:**

- If PASS ✅: Proceed to `*gate` workflow or PR merge
- If WARN ⚠️: Address HIGH priority gaps, re-run `*trace`
- If FAIL ❌: Run `*atdd` to generate missing P0 tests, re-run `*trace`

**Generated:** {DATE}
**Workflow:** testarch-trace v4.0

---

<!-- Powered by BMAD-CORE™ -->
@@ -1,25 +1,99 @@
# Test Architect workflow: trace
name: testarch-trace
description: "Trace requirements to implemented automated tests."
description: "Generate requirements-to-tests traceability matrix with coverage analysis and gap identification"
author: "BMad"

# Critical variables from config
config_source: "{project-root}/bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
date: system-generated

# Workflow components
installed_path: "{project-root}/bmad/bmm/workflows/testarch/trace"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/trace-template.md"

template: false
# Variables and inputs
variables:
  # Target specification
  story_file: "" # Path to story markdown (e.g., bmad/output/story-1.3.md)
  acceptance_criteria: "" # Optional - inline criteria if no story file

  # Test discovery
  test_dir: "{project-root}/tests"
  source_dir: "{project-root}/src"
  auto_discover_tests: true # Automatically find tests related to story

  # Traceability configuration
  coverage_levels: "e2e,api,component,unit" # Which levels to trace (comma-separated)
  map_by_test_id: true # Use test IDs (e.g., 1.3-E2E-001) for mapping
  map_by_describe: true # Use describe blocks for mapping
  map_by_filename: true # Use file paths for mapping

  # Coverage classification
  require_explicit_mapping: true # Require tests to explicitly reference criteria
  flag_unit_only: true # Flag criteria covered only by unit tests
  flag_integration_only: true # Flag criteria covered only by integration tests
  flag_partial_coverage: true # Flag criteria with incomplete coverage

  # Gap analysis
  prioritize_by_risk: true # Use test-priorities (P0/P1/P2/P3) for gap severity
  suggest_missing_tests: true # Recommend specific tests to add
  check_duplicate_coverage: true # Warn about same behavior tested at multiple levels

  # Integration with BMad artifacts
  use_test_design: true # Load test-design.md if exists (risk assessment)
  use_tech_spec: true # Load tech-spec.md if exists (technical context)
  use_prd: true # Load PRD.md if exists (requirements context)

  # Output configuration
  output_file: "{output_folder}/traceability-matrix.md"
  generate_gate_yaml: true # Create gate YAML snippet with coverage summary
  generate_coverage_badge: true # Create coverage badge/metric
  update_story_file: true # Add traceability section to story file

  # Quality gates
  min_p0_coverage: 100 # Percentage (P0 must be 100% covered)
  min_p1_coverage: 90 # Percentage
  min_overall_coverage: 80 # Percentage

  # Advanced options
  auto_load_knowledge: true # Load traceability, risk-governance, test-quality fragments
  include_code_coverage: false # Integrate with code coverage reports (Istanbul, NYC)
  check_assertions: true # Verify explicit assertions in tests

# Output configuration
default_output_file: "{output_folder}/traceability-matrix.md"

# Required tools
required_tools:
  - read_file # Read story, test files, BMad artifacts
  - write_file # Create traceability matrix, gate YAML
  - list_files # Discover test files
  - search_repo # Find tests by test ID, describe blocks
  - glob # Find test files matching patterns

# Recommended inputs
recommended_inputs:
  - story: "Story markdown with acceptance criteria (required for BMad mode)"
  - test_files: "Test suite for the feature (auto-discovered if not provided)"
  - test_design: "Test design with risk/priority assessment (optional)"
  - tech_spec: "Technical specification (optional)"
  - existing_tests: "Current test suite for analysis"

tags:
  - qa
  - traceability
  - test-architect
  - coverage
  - requirements

execution_hints:
  interactive: false
  autonomous: true
  interactive: false # Minimize prompts
  autonomous: true # Proceed without user input unless blocked
  iterative: true

web_bundle: false