docs: update test-design workflow to generate two documents for system-level mode

murat
2026-01-22 12:21:00 -06:00
parent 9b9f43fcb9
commit 9e9387991d
15 changed files with 1270 additions and 101 deletions

View File

@@ -160,7 +160,7 @@ graph TB
**TEA workflows:** `*framework` and `*ci` run once in Phase 3 after architecture. `*test-design` is **dual-mode**:
- **System-level (Phase 3):** Run immediately after architecture/ADR drafting to produce TWO documents: `test-design-architecture.md` (for Architecture/Dev teams: testability gaps, ASRs, NFR requirements) + `test-design-qa.md` (for QA team: test execution recipe, coverage plan, Sprint 0 setup). Feeds the implementation-readiness gate.
- **Epic-level (Phase 4):** Run per-epic to produce `test-design-epic-N.md` (risk, priorities, coverage plan).
The Quick Flow track skips Phases 1 and 3.

View File

@@ -114,10 +114,9 @@ Focus areas:
- Performance requirements (SLA: P99 <200ms)
- Compliance (HIPAA PHI handling, audit logging)
Output: TWO documents (system-level):
- `test-design-architecture.md`: Security gaps, compliance requirements, performance SLOs for Architecture team
- `test-design-qa.md`: Security testing strategy, compliance test mapping, performance testing plan, audit logging validation for QA team
```

View File

@@ -55,20 +55,44 @@ For epic-level:
### 5. Review the Output
TEA generates test design document(s) based on mode.
## What You Get
**System-Level Output (TWO Documents):**
TEA produces two focused documents for system-level mode:
1. **`test-design-architecture.md`** (for Architecture/Dev teams)
- Purpose: Architectural concerns, testability gaps, NFR requirements
- Quick Guide with 🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY
- Risk assessment (high/medium/low-priority with scoring)
- Testability concerns and architectural gaps
- Risk mitigation plans for high-priority risks (≥6)
- Assumptions and dependencies
2. **`test-design-qa.md`** (for QA team)
- Purpose: Test execution recipe, coverage plan, Sprint 0 setup
- Quick Reference for QA (Before You Start, Execution Order, Need Help)
- System architecture summary
- Test environment requirements (covered early in the doc)
- Testability assessment (prerequisites checklist)
- Test levels strategy (unit/integration/E2E split)
- Test coverage plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
- Sprint 0 setup requirements (blockers, infrastructure, environments)
- NFR readiness summary
**Why Two Documents?**
- **Architecture teams** can scan blockers in <5 min (Quick Guide format)
- **QA teams** have actionable test recipes (step-by-step with checklists)
- **No redundancy** between documents (cross-references instead of duplication)
- **Clear separation** of concerns (what to deliver vs how to test)
**Epic-Level Output (ONE Document):**
**`test-design-epic-N.md`** (combined risk assessment + test plan)
- Risk assessment for the epic
- Test priorities (P0-P3)
- Coverage plan
- Regression hotspots (for brownfield)
- Integration risks
@@ -82,12 +106,25 @@ TEA generates a comprehensive test design document.
| **Brownfield** | System-level + existing test baseline | Regression hotspots, integration risks |
| **Enterprise** | Compliance-aware testability | Security/performance/compliance focus |
## Examples
**System-Level (Two Documents):**
- `cluster-search/cluster-search-test-design-architecture.md` - Architecture doc with Quick Guide
- `cluster-search/cluster-search-test-design-qa.md` - QA doc with test scenarios
**Key Pattern:**
- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
- No duplication, just cross-references
## Tips
- **Run system-level right after architecture:** Early testability review
- **Run epic-level at the start of each epic:** Targeted test planning
- **Update if ADRs change:** Keep test design aligned
- **Use output to guide other workflows:** Feeds into `*atdd` and `*automate`
- **Architecture teams review Architecture doc:** Focus on blockers and mitigation plans
- **QA teams use QA doc as implementation guide:** Follow test scenarios and Sprint 0 checklist
## Next Steps

View File

@@ -72,17 +72,39 @@ Quick reference for all 8 TEA (Test Architect) workflows. For detailed step-by-s
**Frequency:** Once (system), per epic (epic-level)
**Modes:**
- **System-level:** Architecture testability review (TWO documents)
- **Epic-level:** Per-epic risk assessment (ONE document)
**Key Inputs:**
- System-level: Architecture, PRD, ADRs
- Epic-level: Epic, stories, acceptance criteria
**Key Outputs:**
**System-Level (TWO Documents):**
- `test-design-architecture.md` - For Architecture/Dev teams
- Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
- Risk assessment with scoring
- Testability concerns and gaps
- Mitigation plans
- `test-design-qa.md` - For QA team
- Test execution recipe
- Coverage plan (P0/P1/P2/P3 with checkboxes)
- Sprint 0 setup requirements
- NFR readiness summary
**Epic-Level (ONE Document):**
- `test-design-epic-N.md`
- Risk assessment (probability × impact scores)
- Test priorities (P0-P3)
- Coverage strategy
- Mitigation plans
**Why Two Documents for System-Level?**
- Architecture teams scan blockers in <5 min
- QA teams have actionable test recipes
- No redundancy (cross-references instead)
- Clear separation (what to deliver vs how to test)
**MCP Enhancement:** Exploratory mode (live browser UI discovery)

View File

@@ -197,7 +197,7 @@ output_folder: _bmad-output
```
**TEA Output Files:**
- `test-design-architecture.md` + `test-design-qa.md` (from *test-design system-level - TWO documents)
- `test-design-epic-N.md` (from *test-design epic-level)
- `test-review.md` (from *test-review)
- `traceability-matrix.md` (from *trace Phase 1)

View File

@@ -15,7 +15,7 @@ By the end of this 30-minute tutorial, you'll have:
:::note[Prerequisites]
- Node.js installed (v20 or later)
- 30 minutes of focused time
- We'll use TodoMVC (<https://todomvc.com/examples/react/dist/>) as our demo app
:::
:::tip[Quick Path]

View File

@@ -0,0 +1,350 @@
# ADR Quality Readiness Checklist
**Purpose:** Standardized 8-category, 29-criteria framework for evaluating system testability and NFR compliance during architecture review (Phase 3) and NFR assessment.
**When to Use:**
- System-level test design (Phase 3): Identify testability gaps in architecture
- NFR assessment workflow: Structured evaluation with evidence
- Gate decisions: Quantifiable criteria (X/29 met = PASS/CONCERNS/FAIL)
**How to Use:**
1. For each criterion, assess status: ✅ Covered / ⚠️ Gap / ⬜ Not Assessed
2. Document gap description if ⚠️
3. Describe risk if criterion unmet
4. Map to test scenarios (what tests validate this criterion)
---
## 1. Testability & Automation
**Question:** Can we verify this effectively without manual toil?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| 1.1 | **Isolation:** Can the service be tested with all downstream dependencies (DBs, APIs, Queues) mocked or stubbed? | Flaky tests; inability to test in isolation | P1: Service runs with mocked DB, P1: Service runs with mocked API, P2: Integration tests with real deps |
| 1.2 | **Headless Interaction:** Is 100% of the business logic accessible via API (REST/gRPC) to bypass the UI for testing? | Slow, brittle UI-based automation | P0: All core logic callable via API, P1: No UI dependency for critical paths |
| 1.3 | **State Control:** Do we have "Seeding APIs" or scripts to inject specific data states (e.g., "User with expired subscription") instantly? | Long setup times; inability to test edge cases | P0: Seed baseline data, P0: Inject edge case data states, P1: Cleanup after tests |
| 1.4 | **Sample Requests:** Are there valid and invalid cURL/JSON sample requests provided in the design doc for QA to build upon? | Ambiguity on how to consume the service | P1: Valid request succeeds, P1: Invalid request fails with clear error |
**Common Gaps:**
- No mock endpoints for external services (Athena, Milvus, third-party APIs)
- Business logic tightly coupled to UI (requires E2E tests for everything)
- No seeding APIs (manual database setup required)
- ADR has architecture diagrams but no sample API requests
**Mitigation Examples:**
- 1.1 (Isolation): Provide mock endpoints, dependency injection, interface abstractions
- 1.2 (Headless): Expose all business logic via REST/GraphQL APIs
- 1.3 (State Control): Implement `/api/test-data` seeding endpoints (dev/staging only)
- 1.4 (Sample Requests): Add "Example API Calls" section to ADR with cURL commands
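A minimal sketch of criterion 1.3 in practice, assuming a TypeScript/Playwright stack; the `/api/test-data` routes, payload shape, and banner text are illustrative placeholders, not TEA conventions:

```typescript
import { test, expect } from '@playwright/test';

// Assumes baseURL is set in playwright.config; the seeding routes below are hypothetical.
test('user with expired subscription sees renewal banner', async ({ request, page }) => {
  // Inject the exact state the scenario needs instead of clicking through setup flows
  const seeded = await request.post('/api/test-data/users', {
    data: { subscription: 'expired', plan: 'pro' },
  });
  expect(seeded.ok()).toBeTruthy();
  const { userId } = await seeded.json();

  await page.goto(`/account/${userId}`);
  await expect(page.getByRole('alert')).toContainText('renew');

  // Self-cleaning: remove the seeded user so parallel workers stay isolated
  await request.delete(`/api/test-data/users/${userId}`);
});
```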
---
## 2. Test Data Strategy
**Question:** How do we fuel our tests safely?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| 2.1 | **Segregation:** Does the design support multi-tenancy or specific headers (e.g., x-test-user) to keep test data out of prod metrics? | Skewed business analytics; data pollution | P0: Multi-tenant isolation (customer A ≠ customer B), P1: Test data excluded from prod metrics |
| 2.2 | **Generation:** Can we use synthetic data, or do we rely on scrubbing production data (GDPR/PII risk)? | Privacy violations; dependency on stale data | P0: Faker-based synthetic data, P1: No production data in tests |
| 2.3 | **Teardown:** Is there a mechanism to "reset" the environment or clean up data after destructive tests? | Environment rot; subsequent test failures | P0: Automated cleanup after tests, P2: Environment reset script |
**Common Gaps:**
- No `customer_id` scoping in queries (cross-tenant data leakage risk)
- Reliance on production data dumps (GDPR/PII violations)
- No cleanup mechanism (tests leave data behind, polluting environment)
**Mitigation Examples:**
- 2.1 (Segregation): Enforce `customer_id` in all queries, add test-specific headers
- 2.2 (Generation): Use Faker library, create synthetic data generators, prohibit prod dumps
- 2.3 (Teardown): Auto-cleanup hooks in test framework, isolated test customer IDs
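A sketch combining criteria 2.2 and 2.3, assuming `@faker-js/faker` and a Playwright fixture; the `/api/test-data/customers` endpoints and response shape are assumptions:

```typescript
import { test as base, expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

type Customer = { id: string; name: string; email: string };

// Fixture seeds a synthetic customer before the test and deletes it afterwards,
// so no production data is used and nothing is left behind.
const test = base.extend<{ customer: Customer }>({
  customer: async ({ request }, use) => {
    const created = await request.post('/api/test-data/customers', {
      data: { name: faker.company.name(), email: faker.internet.email() },
    });
    const customer = (await created.json()) as Customer;

    await use(customer); // test body runs here

    await request.delete(`/api/test-data/customers/${customer.id}`); // teardown
  },
});

test('transactions are scoped to the seeded customer', async ({ request, customer }) => {
  const res = await request.get(`/api/customers/${customer.id}/transactions`);
  expect(res.ok()).toBeTruthy();
});
```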
---
## 3. Scalability & Availability
**Question:** Can it grow, and will it stay up?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| 3.1 | **Statelessness:** Is the service stateless? If not, how is session state replicated across instances? | Inability to auto-scale horizontally | P1: Service restart mid-request → no data loss, P2: Horizontal scaling under load |
| 3.2 | **Bottlenecks:** Have we identified the weakest link (e.g., database connections, API rate limits) under load? | System crash during peak traffic | P2: Load test identifies bottleneck, P2: Connection pool exhaustion handled |
| 3.3 | **SLA Definitions:** What is the target Availability (e.g., 99.9%) and does the architecture support redundancy to meet it? | Breach of contract; customer churn | P1: Availability target defined, P2: Redundancy validated (multi-region/zone) |
| 3.4 | **Circuit Breakers:** If a dependency fails, does this service fail fast or hang? | Cascading failures taking down the whole platform | P1: Circuit breaker opens on 5 failures, P1: Auto-reset after recovery, P2: Timeout prevents hanging |
**Common Gaps:**
- Stateful session management (can't scale horizontally)
- No load testing, bottlenecks unknown
- SLA undefined or unrealistic (99.99% without redundancy)
- No circuit breakers (cascading failures)
**Mitigation Examples:**
- 3.1 (Statelessness): Externalize session to Redis/JWT, design for horizontal scaling
- 3.2 (Bottlenecks): Load test with k6, monitor connection pools, identify weak links
- 3.3 (SLA): Define realistic SLA (99.9% = 43 min/month downtime), add redundancy
- 3.4 (Circuit Breakers): Implement circuit breakers (Hystrix pattern), fail fast on errors
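A minimal circuit-breaker sketch for criterion 3.4 (thresholds are illustrative; production code would more likely use a library such as opossum):

```typescript
// Opens after N consecutive failures, fails fast while open, allows a probe after a cooldown.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error('circuit open: failing fast');
    try {
      const result = await fn();
      this.failures = 0; // success resets the counter
      return result;
    } catch (err) {
      if (++this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }

  private isOpen(): boolean {
    if (this.failures < this.maxFailures) return false;
    if (Date.now() - this.openedAt > this.cooldownMs) {
      this.failures = 0; // half-open: let a probe call through
      return false;
    }
    return true;
  }
}
```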
---
## 4. Disaster Recovery (DR)
**Question:** What happens when the worst-case scenario occurs?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------- | ----------------------------------------------------------------------- |
| 4.1 | **RTO/RPO:** What is the Recovery Time Objective (how long to restore) and Recovery Point Objective (max data loss)? | Extended outages; data loss liability | P2: RTO defined and tested, P2: RPO validated (backup frequency) |
| 4.2 | **Failover:** Is region/zone failover automated or manual? Has it been practiced? | "Heroics" required during outages; human error | P2: Automated failover works, P2: Manual failover documented and tested |
| 4.3 | **Backups:** Are backups immutable and tested for restoration integrity? | Ransomware vulnerability; corrupted backups | P2: Backup restore succeeds, P2: Backup immutability validated |
**Common Gaps:**
- RTO/RPO undefined (no recovery plan)
- Failover never tested (manual process, prone to errors)
- Backups exist but restoration never validated (untested backups = no backups)
**Mitigation Examples:**
- 4.1 (RTO/RPO): Define RTO (e.g., 4 hours) and RPO (e.g., 1 hour), document recovery procedures
- 4.2 (Failover): Automate multi-region failover, practice failover drills quarterly
- 4.3 (Backups): Implement immutable backups (S3 versioning), test restore monthly
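One slice of criterion 4.3 that can be automated cheaply, assuming backups land in S3; the bucket name is a placeholder and restore drills remain a scheduled manual exercise:

```typescript
import { test, expect } from '@playwright/test';
import { S3Client, GetBucketVersioningCommand } from '@aws-sdk/client-s3';

// P2: backup bucket must be versioned (a prerequisite for immutability).
test('backup bucket has versioning enabled', async () => {
  const s3 = new S3Client({});
  const res = await s3.send(new GetBucketVersioningCommand({ Bucket: 'acme-db-backups' }));
  expect(res.Status).toBe('Enabled');
});
```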
---
## 5. Security
**Question:** Is the design safe by default?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ---------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| 5.1 | **AuthN/AuthZ:** Does it implement standard protocols (OAuth2/OIDC)? Are permissions granular (Least Privilege)? | Unauthorized access; data leaks | P0: OAuth flow works, P0: Expired token rejected, P0: Insufficient permissions return 403, P1: Scope enforcement |
| 5.2 | **Encryption:** Is data encrypted at rest (DB) and in transit (TLS)? | Compliance violations; data theft | P1: Milvus data-at-rest encrypted, P1: TLS 1.2+ enforced, P2: Certificate rotation works |
| 5.3 | **Secrets:** Are API keys/passwords stored in a Vault (not in code or config files)? | Credentials leaked in git history | P1: No hardcoded secrets in code, P1: Secrets loaded from AWS Secrets Manager |
| 5.4 | **Input Validation:** Are inputs sanitized against Injection attacks (SQLi, XSS)? | System compromise via malicious payloads | P1: SQL injection sanitized, P1: XSS escaped, P2: Command injection prevented |
**Common Gaps:**
- Weak authentication (no OAuth, hardcoded API keys)
- No encryption at rest (plaintext in database)
- Secrets in git (API keys, passwords in config files)
- No input validation (vulnerable to SQLi, XSS, command injection)
**Mitigation Examples:**
- 5.1 (AuthN/AuthZ): Implement OAuth 2.1/OIDC, enforce least privilege, validate scopes
- 5.2 (Encryption): Enable TDE (Transparent Data Encryption), enforce TLS 1.2+
- 5.3 (Secrets): Migrate to AWS Secrets Manager/Vault, scan git history for leaks
- 5.4 (Input Validation): Sanitize all inputs, use parameterized queries, escape outputs
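Illustrative P0 checks for criterion 5.1, assuming tokens are supplied via environment variables and the routes are placeholders:

```typescript
import { test, expect } from '@playwright/test';

test('expired token is rejected with 401', async ({ request }) => {
  const res = await request.get('/api/reports', {
    headers: { Authorization: `Bearer ${process.env.EXPIRED_TOKEN}` },
  });
  expect(res.status()).toBe(401);
});

test('token without the required scope gets 403', async ({ request }) => {
  // Read-only token attempting a destructive call: least privilege must hold
  const res = await request.delete('/api/reports/123', {
    headers: { Authorization: `Bearer ${process.env.READ_ONLY_TOKEN}` },
  });
  expect(res.status()).toBe(403);
});
```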
---
## 6. Monitorability, Debuggability & Manageability
**Question:** Can we operate and fix this in production?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| 6.1 | **Tracing:** Does the service propagate W3C Trace Context / Correlation IDs for distributed tracing? | Impossible to debug errors across microservices | P2: W3C Trace Context propagated (EventBridge → Lambda → Service), P2: Correlation ID in all logs |
| 6.2 | **Logs:** Can log levels (INFO vs DEBUG) be toggled dynamically without a redeploy? | Inability to diagnose issues in real-time | P2: Log level toggle works without redeploy, P2: Logs structured (JSON format) |
| 6.3 | **Metrics:** Does it expose RED metrics (Rate, Errors, Duration) for Prometheus/Datadog? | Flying blind regarding system health | P2: /metrics endpoint exposes RED metrics, P2: Prometheus/Datadog scrapes successfully |
| 6.4 | **Config:** Is configuration externalized? Can we change behavior without a code build? | Rigid system; full deploys needed for minor tweaks | P2: Config change without code build, P2: Feature flags toggle behavior |
**Common Gaps:**
- No distributed tracing (can't debug across microservices)
- Static log levels (requires redeploy to enable DEBUG)
- No metrics endpoint (blind to system health)
- Configuration hardcoded (requires full deploy for minor changes)
**Mitigation Examples:**
- 6.1 (Tracing): Implement W3C Trace Context, add correlation IDs to all logs
- 6.2 (Logs): Use dynamic log levels (environment variable), structured logging (JSON)
- 6.3 (Metrics): Expose /metrics endpoint, track RED metrics (Rate, Errors, Duration)
- 6.4 (Config): Externalize config (AWS SSM/AppConfig), use feature flags (LaunchDarkly)
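A smoke-level sketch for criterion 6.3; the metric names follow common Prometheus conventions and are assumptions, not the project's actual series:

```typescript
import { test, expect } from '@playwright/test';

test('/metrics exposes RED series for scraping', async ({ request }) => {
  const res = await request.get('/metrics');
  expect(res.ok()).toBeTruthy();

  const body = await res.text();
  expect(body).toContain('http_requests_total');            // Rate
  expect(body).toContain('http_request_errors_total');      // Errors
  expect(body).toContain('http_request_duration_seconds');  // Duration
});
```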
---
## 7. QoS (Quality of Service) & QoE (Quality of Experience)
**Question:** How does it perform, and how does it feel?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------- |
| 7.1 | **Latency (QoS):** What are the P95 and P99 latency targets? | Slow API responses affecting throughput | P3: P95 latency <Xs (load test), P3: P99 latency <Ys (load test) |
| 7.2 | **Throttling (QoS):** Is there Rate Limiting to prevent "noisy neighbors" or DDoS? | Service degradation for all users due to one bad actor | P2: Rate limiting enforced, P2: 429 returned when limit exceeded |
| 7.3 | **Perceived Performance (QoE):** Does the UI show optimistic updates or skeletons while loading? | App feels sluggish to the user | P2: Skeleton/spinner shown while loading (E2E), P2: Optimistic updates (E2E) |
| 7.4 | **Degradation (QoE):** If the service is slow, does it show a friendly message or a raw stack trace? | Poor user trust; frustration | P2: Friendly error message shown (not stack trace), P1: Error boundary catches exceptions (E2E) |
**Common Gaps:**
- Latency targets undefined (no SLOs)
- No rate limiting (vulnerable to DDoS, noisy neighbors)
- Poor perceived performance (blank screen while loading)
- Raw error messages (stack traces exposed to users)
**Mitigation Examples:**
- 7.1 (Latency): Define SLOs (P95 <2s, P99 <5s), load test to validate
- 7.2 (Throttling): Implement rate limiting (per-user, per-IP), return 429 with Retry-After
- 7.3 (Perceived Performance): Add skeleton screens, optimistic updates, progressive loading
- 7.4 (Degradation): Implement error boundaries, show friendly messages, log stack traces server-side
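A sketch for criterion 7.2, assuming a placeholder endpoint and a limit in the low hundreds of requests per minute:

```typescript
import { test, expect } from '@playwright/test';

test('rate limit returns 429 with Retry-After once exceeded', async ({ request }) => {
  let limited = false;
  for (let i = 0; i < 150 && !limited; i++) {
    const res = await request.get('/api/search?q=test');
    if (res.status() === 429) {
      // Compare header names case-insensitively
      const names = res.headersArray().map((h) => h.name.toLowerCase());
      expect(names).toContain('retry-after');
      limited = true;
    }
  }
  expect(limited).toBeTruthy();
});
```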
---
## 8. Deployability
**Question:** How easily can we ship this?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ------------------------------------------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------------------------------------ |
| 8.1 | **Zero Downtime:** Does the design support Blue/Green or Canary deployments? | Maintenance windows required (downtime) | P2: Blue/Green deployment works, P2: Canary deployment gradual rollout |
| 8.2 | **Backward Compatibility:** Can we deploy the DB changes separately from the Code changes? | "Lock-step" deployments; high risk of breaking changes | P2: DB migration before code deploy, P2: Code handles old and new schema |
| 8.3 | **Rollback:** Is there an automated rollback trigger if Health Checks fail post-deploy? | Prolonged outages after a bad deploy | P2: Health check failure → automated rollback, P2: Rollback completes within RTO |
**Common Gaps:**
- No zero-downtime strategy (requires maintenance window)
- Tight coupling between DB and code (lock-step deployments)
- No automated rollback (manual intervention required)
**Mitigation Examples:**
- 8.1 (Zero Downtime): Implement Blue/Green or Canary deployments, use feature flags
- 8.2 (Backward Compatibility): Separate DB migrations from code deploys, support N-1 schema
- 8.3 (Rollback): Automate rollback on health check failures, test rollback procedures
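A post-deploy smoke gate sketch supporting criterion 8.3, assuming a conventional `/health` route; the pipeline, not the test, performs the actual rollback when this fails:

```typescript
import { test, expect } from '@playwright/test';

test('deployed service reports healthy', async ({ request }) => {
  const res = await request.get('/health');
  expect(res.status()).toBe(200);

  const body = await res.json();
  expect(body.status).toBe('ok'); // assumed payload shape, e.g. { status: 'ok', version: '1.2.3' }
});
```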
---
## Usage in Test Design Workflow
**System-Level Mode (Phase 3):**
**In test-design-architecture.md:**
- Add "NFR Testability Requirements" section after ASRs
- Use 8 categories with checkboxes (29 criteria)
- For each criterion: Status (⬜ Not Assessed, ⚠️ Gap, ✅ Covered), Gap description, Risk if unmet
- Example:
```markdown
## NFR Testability Requirements
**Based on ADR Quality Readiness Checklist**
### 1. Testability & Automation
Can we verify this effectively without manual toil?
| Criterion | Status | Gap/Requirement | Risk if Unmet |
| --------------------------------------------------------------- | -------------- | ------------------------------------ | --------------------------------------- |
| ⬜ Isolation: Can service be tested with downstream deps mocked? | ⚠️ Gap | No mock endpoints for Athena queries | Flaky tests; can't test in isolation |
| ⬜ Headless: 100% business logic accessible via API? | ✅ Covered | All MCP tools are REST APIs | N/A |
| ⬜ State Control: Seeding APIs to inject data states? | ⚠️ Gap | Need `/api/test-data` endpoints | Long setup times; can't test edge cases |
| ⬜ Sample Requests: Valid/invalid cURL/JSON samples provided? | ⬜ Not Assessed | Pending ADR Tool schemas finalized | Ambiguity on how to consume service |
**Actions Required:**
- [ ] Backend: Implement mock endpoints for Athena (R-002 blocker)
- [ ] Backend: Implement `/api/test-data` seeding APIs (R-002 blocker)
- [ ] PM: Finalize ADR Tool schemas with sample requests (Q4)
```
**In test-design-qa.md:**
- Map each criterion to test scenarios
- Add "NFR Test Coverage Plan" section with P0/P1/P2 priority for each category
- Reference Architecture doc gaps
- Example:
```markdown
## NFR Test Coverage Plan
**Based on ADR Quality Readiness Checklist**
### 1. Testability & Automation (4 criteria)
**Prerequisites from Architecture doc:**
- [ ] R-002: Test data seeding APIs implemented (blocker)
- [ ] Mock endpoints available for Athena queries
| Criterion | Test Scenarios | Priority | Test Count | Owner |
| ------------------------------- | -------------------------------------------------------------------- | -------- | ---------- | ---------------- |
| Isolation: Mock downstream deps | Mock Athena queries, Mock Milvus, Service runs isolated | P1 | 3 | Backend Dev + QA |
| Headless: API-accessible logic | All MCP tools callable via REST, No UI dependency for business logic | P0 | 5 | QA |
| State Control: Seeding APIs | Create test customer, Seed 1000 transactions, Inject edge cases | P0 | 4 | QA |
| Sample Requests: cURL examples | Valid request succeeds, Invalid request fails with clear error | P1 | 2 | QA |
**Detailed Test Scenarios:**
- [ ] Isolation: Service runs with Athena mocked (returns fixture data)
- [ ] Isolation: Service runs with Milvus mocked (returns ANN fixture)
- [ ] State Control: Seed test customer with 1000 baseline transactions
- [ ] State Control: Inject edge case (expired subscription user)
```
---
## Usage in NFR Assessment Workflow
**Output Structure:**
```markdown
# NFR Assessment: {Feature Name}
**Based on ADR Quality Readiness Checklist (8 categories, 29 criteria)**
## Assessment Summary
| Category | Status | Criteria Met | Evidence | Next Action |
| ----------------------------- | ---------- | ------------ | -------------------------------------- | -------------------- |
| 1. Testability & Automation | ⚠️ CONCERNS | 2/4 | Mock endpoints missing | Implement R-002 |
| 2. Test Data Strategy | ✅ PASS | 3/3 | Faker + auto-cleanup | None |
| 3. Scalability & Availability | ⚠️ CONCERNS | 1/4 | SLA undefined | Define SLA |
| 4. Disaster Recovery | ⚠️ CONCERNS | 0/3 | No RTO/RPO defined | Define recovery plan |
| 5. Security | ✅ PASS | 4/4 | OAuth 2.1 + TLS + Vault + Sanitization | None |
| 6. Monitorability | ⚠️ CONCERNS | 2/4 | No metrics endpoint | Add /metrics |
| 7. QoS & QoE | ⚠️ CONCERNS | 1/4 | Latency targets undefined | Define SLOs |
| 8. Deployability | ✅ PASS | 3/3 | Blue/Green + DB migrations + Rollback | None |
**Overall:** 14/29 criteria met (48%) → ⚠️ CONCERNS
**Gate Decision:** CONCERNS (requires mitigation plan before GA)
---
## Detailed Assessment
### 1. Testability & Automation (2/4 criteria met)
**Question:** Can we verify this effectively without manual toil?
| Criterion | Status | Evidence | Gap/Action |
| --------------------------- | ------ | ------------------------ | ------------------------ |
| ⬜ Isolation: Mock deps | ⚠️ | No Athena mock | Implement mock endpoints |
| ⬜ Headless: API-accessible | ✅ | All MCP tools are REST | N/A |
| ⬜ State Control: Seeding | ⚠️ | `/api/test-data` pending | Sprint 0 blocker |
| ⬜ Sample Requests: Examples | ⬜ | Pending schemas | Finalize ADR Tools |
**Overall Status:** ⚠️ CONCERNS (2/4 criteria met)
**Next Actions:**
- [ ] Backend: Implement Athena mock endpoints (Sprint 0)
- [ ] Backend: Implement `/api/test-data` (Sprint 0)
- [ ] PM: Finalize sample requests (Sprint 1)
{Repeat for all 8 categories}
```
---
## Benefits
**For test-design workflow:**
- Standard NFR structure (same 8 categories every project)
- Clear testability requirements for Architecture team
- Direct mapping: criterion → requirement → test scenario
- Comprehensive coverage (29 criteria = no blind spots)
**For nfr-assess workflow:**
- Structured assessment (not ad-hoc)
- Quantifiable (X/29 criteria met)
- Evidence-based (each criterion has evidence field)
- Actionable (gaps → next actions with owners)
**For Architecture teams:**
- Clear checklist (29 yes/no questions)
- Risk-aware (each criterion has "risk if unmet")
- Scoped work (only implement what's needed, not everything)
**For QA teams:**
- Comprehensive test coverage (29 criteria → test scenarios)
- Clear priorities (P0 for security/isolation, P1 for monitoring, etc.)
- No ambiguity (each criterion has specific test scenarios)

View File

@@ -32,3 +32,4 @@ burn-in,Burn-in Runner,"Smart test selection, git diff for CI optimization","ci,
network-error-monitor,Network Error Monitor,"HTTP 4xx/5xx detection for UI tests","monitoring,playwright-utils,ui",knowledge/network-error-monitor.md
fixtures-composition,Fixtures Composition,"mergeTests composition patterns for combining utilities","fixtures,playwright-utils",knowledge/fixtures-composition.md
api-testing-patterns,API Testing Patterns,"Pure API test patterns without browser: service testing, microservices, GraphQL","api,backend,service-testing,api-testing,microservices,graphql,no-browser",knowledge/api-testing-patterns.md
adr-quality-readiness-checklist,ADR Quality Readiness Checklist,"8-category 29-criteria framework for ADR testability and NFR assessment","nfr,testability,adr,quality,assessment,checklist",knowledge/adr-quality-readiness-checklist.md

View File

@@ -51,7 +51,7 @@ This workflow performs a comprehensive assessment of non-functional requirements
**Actions:**
1. Load relevant knowledge fragments from `{project-root}/_bmad/bmm/testarch/tea-index.csv`:
- `nfr-criteria.md` - Non-functional requirements criteria and thresholds (security, performance, reliability, maintainability with code examples, 658 lines, 4 examples)
- `adr-quality-readiness-checklist.md` - 8-category 29-criteria NFR framework (testability, test data, scalability, DR, security, monitorability, QoS/QoE, deployability, ~450 lines)
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability validation (10-iteration detection, sharding, selective execution, 678 lines, 4 examples)
- `test-quality.md` - Test quality expectations for maintainability (deterministic, isolated, explicit assertions, length/time limits, 658 lines, 5 examples)
- `playwright-config.md` - Performance configuration patterns: parallelization, timeout standards, artifact output (722 lines, 5 examples)
@@ -75,13 +75,17 @@ This workflow performs a comprehensive assessment of non-functional requirements
**Actions:**
1. Determine which NFR categories to assess using ADR Quality Readiness Checklist (8 standard categories):
- **1. Testability & Automation**: Isolation, headless interaction, state control, sample requests (4 criteria)
- **2. Test Data Strategy**: Segregation, generation, teardown (3 criteria)
- **3. Scalability & Availability**: Statelessness, bottlenecks, SLA definitions, circuit breakers (4 criteria)
- **4. Disaster Recovery**: RTO/RPO, failover, backups (3 criteria)
- **5. Security**: AuthN/AuthZ, encryption, secrets, input validation (4 criteria)
- **6. Monitorability, Debuggability & Manageability**: Tracing, logs, metrics, config (4 criteria)
- **7. QoS & QoE**: Latency, throttling, perceived performance, degradation (4 criteria)
- **8. Deployability**: Zero downtime, backward compatibility, rollback (3 criteria)
2. Add custom NFR categories if specified (e.g., accessibility, internationalization, compliance) beyond the 8 standard categories
3. Gather thresholds for each NFR:
- From tech-spec.md (primary source)

View File

@@ -355,13 +355,24 @@ Note: This assessment summarizes existing evidence; it does not run tests or CI
## Findings Summary
**Based on ADR Quality Readiness Checklist (8 categories, 29 criteria)**
| Category | Criteria Met | PASS | CONCERNS | FAIL | Overall Status |
|----------|--------------|------|----------|------|----------------|
| 1. Testability & Automation | {T_MET}/4 | {T_PASS} | {T_CONCERNS} | {T_FAIL} | {T_STATUS} {T_ICON} |
| 2. Test Data Strategy | {TD_MET}/3 | {TD_PASS} | {TD_CONCERNS} | {TD_FAIL} | {TD_STATUS} {TD_ICON} |
| 3. Scalability & Availability | {SA_MET}/4 | {SA_PASS} | {SA_CONCERNS} | {SA_FAIL} | {SA_STATUS} {SA_ICON} |
| 4. Disaster Recovery | {DR_MET}/3 | {DR_PASS} | {DR_CONCERNS} | {DR_FAIL} | {DR_STATUS} {DR_ICON} |
| 5. Security | {SEC_MET}/4 | {SEC_PASS} | {SEC_CONCERNS} | {SEC_FAIL} | {SEC_STATUS} {SEC_ICON} |
| 6. Monitorability, Debuggability & Manageability | {MON_MET}/4 | {MON_PASS} | {MON_CONCERNS} | {MON_FAIL} | {MON_STATUS} {MON_ICON} |
| 7. QoS & QoE | {QOS_MET}/4 | {QOS_PASS} | {QOS_CONCERNS} | {QOS_FAIL} | {QOS_STATUS} {QOS_ICON} |
| 8. Deployability | {DEP_MET}/3 | {DEP_PASS} | {DEP_CONCERNS} | {DEP_FAIL} | {DEP_STATUS} {DEP_ICON} |
| **Total** | **{TOTAL_MET}/29** | **{TOTAL_PASS}** | **{TOTAL_CONCERNS}** | **{TOTAL_FAIL}** | **{OVERALL_STATUS} {OVERALL_ICON}** |
**Criteria Met Scoring:**
- ≥26/29 (≈90%) = Strong foundation
- 20-25/29 (69-86%) = Room for improvement
- <20/29 (<69%) = Significant gaps
---
@@ -372,11 +383,16 @@ nfr_assessment:
date: '{DATE}'
story_id: '{STORY_ID}'
feature_name: '{FEATURE_NAME}'
adr_checklist_score: '{TOTAL_MET}/29' # ADR Quality Readiness Checklist
categories:
testability_automation: '{T_STATUS}'
test_data_strategy: '{TD_STATUS}'
scalability_availability: '{SA_STATUS}'
disaster_recovery: '{DR_STATUS}'
security: '{SEC_STATUS}'
monitorability: '{MON_STATUS}'
qos_qoe: '{QOS_STATUS}'
deployability: '{DEP_STATUS}'
overall_status: '{OVERALL_STATUS}'
critical_issues: { CRITICAL_COUNT }
high_priority_issues: { HIGH_COUNT }

View File

@@ -1,10 +1,17 @@
# Test Design and Risk Assessment - Validation Checklist
## Prerequisites (Mode-Dependent)
**System-Level Mode (Phase 3):**
- [ ] PRD exists with functional and non-functional requirements
- [ ] ADR (Architecture Decision Record) exists
- [ ] Architecture document available (architecture.md or tech-spec)
- [ ] Requirements are testable and unambiguous
**Epic-Level Mode (Phase 4):**
- [ ] Story markdown with clear acceptance criteria exists
- [ ] PRD or epic documentation available
- [ ] Architecture documents available (test-design-architecture.md + test-design-qa.md from Phase 3, if they exist)
- [ ] Requirements are testable and unambiguous
## Process Steps
@@ -157,6 +164,80 @@
- [ ] Risk assessment informs `gate` workflow criteria
- [ ] Integrates with `ci` workflow execution order
## System-Level Mode: Two-Document Validation
**When in system-level mode (PRD + ADR input), validate BOTH documents:**
### test-design-architecture.md
- [ ] **Purpose statement** at top (serves as contract with Architecture team)
- [ ] **Executive Summary** with scope, business context, architecture decisions, risk summary
- [ ] **Quick Guide** section with three tiers:
- [ ] 🚨 BLOCKERS - Team Must Decide (Sprint 0 critical path items)
  - [ ] ⚠️ HIGH PRIORITY - Team Should Validate (recommendations for approval)
- [ ] 📋 INFO ONLY - Solutions Provided (no decisions needed)
- [ ] **Risk Assessment** section
- [ ] Total risks identified count
  - [ ] High-priority risks table (score ≥6) with all columns: Risk ID, Category, Description, Probability, Impact, Score, Mitigation, Owner, Timeline
- [ ] Medium and low-priority risks tables
- [ ] Risk category legend included
- [ ] **Testability Concerns** section (if system has architectural constraints)
- [ ] Blockers to fast feedback table
- [ ] Explanation of why standard CI/CD may not apply (if applicable)
- [ ] Tiered testing strategy table (if forced by architecture)
- [ ] Architectural improvements needed (or acknowledgment system supports testing well)
- [ ] **Risk Mitigation Plans** for all high-priority risks (≥6)
- [ ] Each plan has: Strategy (numbered steps), Owner, Timeline, Status, Verification
- [ ] **Assumptions and Dependencies** section
- [ ] Assumptions list (numbered)
- [ ] Dependencies list with required dates
- [ ] Risks to plan with impact and contingency
- [ ] **NO test implementation code** (long examples belong in QA doc)
- [ ] **NO test scenario checklists** (belong in QA doc)
- [ ] **Cross-references to QA doc** where appropriate
### test-design-qa.md
- [ ] **Purpose statement** at top (execution recipe for QA team)
- [ ] **Quick Reference for QA** section
- [ ] Before You Start checklist
- [ ] Test Execution Order
- [ ] Need Help? guidance
- [ ] **System Architecture Summary** (brief overview of services and data flow)
- [ ] **Test Environment Requirements** appear early (sections 1-3, NOT buried at end)
- [ ] Table with Local/Dev/Staging environments
- [ ] Key principles listed (shared DB, randomization, parallel-safe, self-cleaning, shift-left)
- [ ] Code example provided
- [ ] **Testability Assessment** with prerequisites checklist
- [ ] References Architecture doc blockers (not duplication)
- [ ] **Test Levels Strategy** with unit/integration/E2E split
- [ ] System type identified
- [ ] Recommended split percentages with rationale
- [ ] Test count summary (P0/P1/P2/P3 totals)
- [ ] **Test Coverage Plan** with P0/P1/P2/P3 sections
- [ ] Each priority has: Execution details, Purpose, Criteria, Test Count
- [ ] Detailed test scenarios WITH CHECKBOXES
- [ ] Coverage table with columns: Requirement | Test Level | Risk Link | Test Count | Owner | Notes
- [ ] **Sprint 0 Setup Requirements**
- [ ] Architecture/Backend blockers listed with cross-references to Architecture doc
- [ ] QA Test Infrastructure section (factories, fixtures)
- [ ] Test Environments section (Local, CI/CD, Staging, Production)
- [ ] Sprint 0 NFR Gates checklist
- [ ] Sprint 1 Items clearly separated
- [ ] **NFR Readiness Summary** (reference to Architecture doc, not duplication)
- [ ] Table with NFR categories, status, evidence, blocker, next action
- [ ] **Cross-references to Architecture doc** (not duplication)
- [ ] **NO architectural theory** (just reference Architecture doc)
### Cross-Document Consistency
- [ ] Both documents reference same risks by ID (R-001, R-002, etc.)
- [ ] Both documents use consistent priority levels (P0, P1, P2, P3)
- [ ] Both documents reference same Sprint 0 blockers
- [ ] No duplicate content (cross-reference instead)
- [ ] Dates and authors match across documents
- [ ] ADR and PRD references consistent
## Completion Criteria
**All must be true:**
@@ -166,7 +247,9 @@
- [ ] All output validations passed
- [ ] All quality checks passed
- [ ] All integration points verified
- [ ] Output file(s) complete and well-formatted
- [ ] **System-level mode:** Both documents validated (if applicable)
- [ ] **Epic-level mode:** Single document validated (if applicable)
- [ ] Team review scheduled (if required)
## Post-Workflow Actions

View File

@@ -22,28 +22,61 @@ The workflow auto-detects which mode to use based on project phase.
**Critical:** Determine mode before proceeding.
### Mode Detection (Flexible for Standalone Use)
TEA test-design workflow supports TWO modes, detected automatically:
1. **Check User Intent Explicitly (Priority 1)**
- Did user provide PRD + ADR? → **System-Level Mode**
- Did user provide Epic + Stories? → **Epic-Level Mode**
- If user intent is clear, use that mode regardless of file structure
2. **Fallback to File-Based Detection (Priority 2 - BMad-Integrated)**
- Check for `{implementation_artifacts}/sprint-status.yaml`
   - If exists → **Epic-Level Mode** (Phase 4, single document output)
- If NOT exists → **System-Level Mode** (Phase 3, TWO document outputs)
3. **If Ambiguous, ASK USER (Priority 3)**
- "I see you have [PRD/ADR/Epic/Stories]. Should I create:
- (A) System-level test design (PRD + ADR → Architecture doc + QA doc)?
- (B) Epic-level test design (Epic → Single test plan)?"
**Mode Descriptions:**
**System-Level Mode (PRD + ADR Input)**
- **When to use:** Early in project (Phase 3 Solutioning), architecture being designed
- **Input:** PRD, ADR, architecture.md (optional)
- **Output:** TWO documents
- `test-design-architecture.md` (for Architecture/Dev teams)
- `test-design-qa.md` (for QA team)
- **Focus:** Testability assessment, ASRs, NFR requirements, Sprint 0 setup
**Epic-Level Mode (Epic + Stories Input)**
- **When to use:** During implementation (Phase 4), per-epic planning
- **Input:** Epic, Stories, tech-specs (optional)
- **Output:** ONE document
- `test-design-epic-{N}.md` (combined risk assessment + test plan)
- **Focus:** Risk assessment, coverage plan, execution order, quality gates
**Key Insight: TEA Works Standalone OR Integrated**
**Standalone (No BMad artifacts):**
- User provides PRD + ADR → System-Level Mode
- User provides Epic description → Epic-Level Mode
- TEA doesn't mandate full BMad workflow
**BMad-Integrated (Full workflow):**
- BMad creates `sprint-status.yaml` → Automatic Epic-Level detection
- BMad creates PRD, ADR, architecture.md → Automatic System-Level detection
- TEA leverages BMad artifacts for richer context
**Message to User:**
> You don't need to follow full BMad methodology to use TEA test-design.
> Just provide PRD + ADR for system-level, or Epic for epic-level.
> TEA will auto-detect and produce appropriate documents.
**Halt Condition:** If mode cannot be determined AND user intent unclear AND required files missing, HALT and notify user:
- "Please provide either: (A) PRD + ADR for system-level test design, OR (B) Epic + Stories for epic-level test design"
---
@@ -70,7 +103,7 @@ The workflow auto-detects which mode to use based on project phase.
3. **Load Knowledge Base Fragments (System-Level)**
**Critical:** Consult `{project-root}/_bmad/bmm/testarch/tea-index.csv` to load:
- `nfr-criteria.md` - NFR validation approach (security, performance, reliability, maintainability)
- `adr-quality-readiness-checklist.md` - 8-category 29-criteria NFR framework (testability, security, scalability, DR, QoS, deployability, etc.)
- `test-levels-framework.md` - Test levels strategy guidance
- `risk-governance.md` - Testability risk identification
- `test-quality.md` - Quality standards and Definition of Done
@@ -91,7 +124,7 @@ The workflow auto-detects which mode to use based on project phase.
2. **Load Architecture Context**
- Read architecture.md for system design
- Read tech-spec for implementation details
- Read test-design-architecture.md and test-design-qa.md (if exist from Phase 3 system-level test design)
- Identify technical constraints and dependencies
- Note integration points and external systems
@@ -173,50 +206,128 @@ The workflow auto-detects which mode to use based on project phase.
**Critical:** If testability concerns are blockers (e.g., "Architecture makes performance testing impossible"), document as CONCERNS or FAIL recommendation for gate check.
6. **Output System-Level Test Design (TWO Documents)**
**IMPORTANT:** System-level mode produces TWO documents instead of one:
**Document 1: test-design-architecture.md** (for Architecture/Dev teams)
- Purpose: Architectural concerns, testability gaps, NFR requirements
- Audience: Architects, Backend Devs, Frontend Devs, DevOps, Security Engineers
- Focus: What architecture must deliver for testability
- Template: `test-design-architecture-template.md`
**Document 2: test-design-qa.md** (for QA team)
- Purpose: Test execution recipe, coverage plan, Sprint 0 setup
- Audience: QA Engineers, Test Automation Engineers, QA Leads
- Focus: How QA will execute tests
- Template: `test-design-qa-template.md`
**Standard Structures (REQUIRED):**
**test-design-architecture.md sections (in this order):**
1. Executive Summary (scope, business context, architecture, risk summary)
2. Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
3. Risk Assessment (high/medium/low-priority risks with scoring)
4. Testability Concerns and Architectural Gaps (if system has constraints)
5. Risk Mitigation Plans (detailed for high-priority risks ≥6)
6. Assumptions and Dependencies
**test-design-qa.md sections (in this order):**
1. Quick Reference for QA (Before You Start, Execution Order, Need Help)
2. System Architecture Summary (brief overview)
3. Test Environment Requirements (placed early as section 3, NOT buried at end)
4. Testability Assessment (lightweight prerequisites checklist)
5. Test Levels Strategy (unit/integration/E2E split with rationale)
6. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
7. Sprint 0 Setup Requirements (blockers, infrastructure, environments)
8. NFR Readiness Summary (reference to Architecture doc)
**Content Guidelines:**
**Architecture doc (DO):**
- ✅ Risk scoring visible (Probability × Impact = Score)
- ✅ Clear ownership (each blocker/ASR has owner + timeline)
- ✅ Testability requirements (what architecture must support)
- ✅ Mitigation plans (for each high-risk item ≥6)
- ✅ Short code examples (5-10 lines max showing what to support)
**Architecture doc (DON'T):**
- ❌ NO long test code examples (belongs in QA doc)
- ❌ NO test scenario checklists (belongs in QA doc)
- ❌ NO implementation details (how QA will test)
**QA doc (DO):**
- ✅ Test scenario recipes (clear P0/P1/P2/P3 with checkboxes)
- ✅ Environment setup (Sprint 0 checklist with blockers)
- ✅ Tool setup (factories, fixtures, frameworks)
- ✅ Cross-references to Architecture doc (not duplication)
**QA doc (DON'T):**
- ❌ NO architectural theory (just reference Architecture doc)
- ❌ NO ASR explanations (link to Architecture doc instead)
- ❌ NO duplicate risk assessments (reference Architecture doc)
**Anti-Patterns to Avoid (Cross-Document Redundancy):**
**DON'T duplicate OAuth requirements:**
- Architecture doc: Explain OAuth 2.1 flow in detail
- QA doc: Re-explain why OAuth 2.1 is required
**DO cross-reference instead:**
- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
**Markdown Cross-Reference Syntax Examples:**
```markdown
# In test-design-architecture.md
### 🚨 R-001: Multi-Tenant Isolation (Score: 9)
**Test Coverage:** 8 P0 tests (see [QA doc - Multi-Tenant Isolation](test-design-qa.md#multi-tenant-isolation-8-tests---security-critical) for detailed scenarios)
---
# In test-design-qa.md
## Testability Assessment
**Prerequisites from Architecture Doc:**
- [ ] R-001: Multi-tenant isolation validated (see [Architecture doc R-001](test-design-architecture.md#-r-001-multi-tenant-isolation-score-9) for mitigation plan)
- [ ] R-002: Test customer provisioned (see [Architecture doc 🚨 BLOCKERS](test-design-architecture.md#-blockers---team-must-decide-cant-proceed-without))
## Sprint 0 Setup Requirements
**Source:** See [Architecture doc "Quick Guide"](test-design-architecture.md#quick-guide) for detailed mitigation plans
```
**Key Points:**
- Use relative links: `[Link Text](test-design-qa.md#section-anchor)`
- Anchor format: lowercase, hyphens for spaces, remove emojis/special chars
- Example anchor: `### 🚨 R-001: Title` → `#-r-001-title`
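If a helper is needed to build these anchors consistently, a hypothetical slugifier matching the rules above could look like this (not part of the workflow tooling):

```typescript
// Lowercase, strip emojis/punctuation, collapse spaces to hyphens, keep existing hyphens.
function headingToAnchor(heading: string): string {
  const text = heading.replace(/^#+\s*/, ''); // drop the markdown heading marker
  const cleaned = text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s-]/gu, '') // remove emojis and punctuation
    .replace(/\s+/g, '-');             // spaces -> hyphens
  return `#${cleaned}`;
}

// headingToAnchor('### 🚨 R-001: Multi-Tenant Isolation (Score: 9)')
// => '#-r-001-multi-tenant-isolation-score-9'
```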
❌ **DON'T put long code examples in Architecture doc:**
- Example: 50+ lines of test implementation
✅ **DO keep examples SHORT in Architecture doc:**
- Example: 5-10 lines max showing what architecture must support
- Full implementation goes in QA doc
❌ **DON'T repeat same note 10+ times:**
- Example: "Pessimistic timing until R-005 fixed" on every P0/P1/P2 section
✅ **DO consolidate repeated notes:**
- Single timing note at top
- Reference briefly throughout: "(pessimistic)"
**Write Both Documents:**
- Use `test-design-architecture-template.md` for Architecture doc
- Use `test-design-qa-template.md` for QA doc
- Follow standard structures defined above
- Cross-reference between docs (no duplication)
- Validate against checklist.md (System-Level Mode section)
**After System-Level Mode:** Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode.
---

View File

@@ -0,0 +1,216 @@
# Test Design for Architecture: {Feature Name}
**Purpose:** Architectural concerns, testability gaps, and NFR requirements for review by Architecture/Dev teams. Serves as a contract between QA and Engineering on what must be addressed before test development begins.
**Date:** {date}
**Author:** {author}
**Status:** Architecture Review Pending
**Project:** {project_name}
**PRD Reference:** {prd_link}
**ADR Reference:** {adr_link}
---
## Executive Summary
**Scope:** {Brief description of feature scope}
**Business Context** (from PRD):
- **Revenue/Impact:** {Business metrics if applicable}
- **Problem:** {Problem being solved}
- **GA Launch:** {Target date or timeline}
**Architecture** (from ADR {adr_number}):
- **Key Decision 1:** {e.g., OAuth 2.1 authentication}
- **Key Decision 2:** {e.g., Centralized MCP Server pattern}
- **Key Decision 3:** {e.g., Stack: TypeScript, SDK v1.x}
**Expected Scale** (from ADR):
- {RPS, volume, users, etc.}
**Risk Summary:**
- **Total risks**: {N}
- **High-priority (≥6)**: {N} risks requiring immediate mitigation
- **Test effort**: ~{N} tests (~{X} weeks for 1 QA, ~{Y} weeks for 2 QAs)
---
## Quick Guide
### 🚨 BLOCKERS - Team Must Decide (Can't Proceed Without)
**Sprint 0 Critical Path** - These MUST be completed before QA can write integration tests:
1. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
2. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
3. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
**What we need from team:** Complete these {N} items in Sprint 0 or test development is blocked.
---
### ⚠️ HIGH PRIORITY - Team Should Validate (We Provide Recommendation, You Approve)
1. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
2. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
3. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
**What we need from team:** Review recommendations and approve (or suggest changes).
---
### 📋 INFO ONLY - Solutions Provided (Review, No Decisions Needed)
1. **Test strategy**: {Test level split} ({Rationale})
2. **Tooling**: {Test frameworks and utilities}
3. **Tiered CI/CD**: {Execution tiers with timing}
4. **Coverage**: ~{N} test scenarios prioritized P0-P3 with risk-based classification
5. **Quality gates**: {Pass criteria}
**What we need from team:** Just review and acknowledge (we already have the solution).
---
## For Architects and Devs - Open Topics 👷
### Risk Assessment
**Total risks identified**: {N} ({X} high-priority score ≥6, {Y} medium, {Z} low)
#### High-Priority Risks (Score ≥6) - IMMEDIATE ATTENTION
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner | Timeline |
|---------|----------|-------------|-------------|--------|-------|------------|-------|----------|
| **{R-ID}** | **{CAT}** | {Description} | {1-3} | {1-3} | **{Score}** | {Mitigation strategy} | {Owner} | {Date} |
#### Medium-Priority Risks (Score 3-4)
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner |
|---------|----------|-------------|-------------|--------|-------|------------|-------|
| {R-ID} | {CAT} | {Description} | {1-3} | {1-3} | {Score} | {Mitigation} | {Owner} |
#### Low-Priority Risks (Score 1-2)
| Risk ID | Category | Description | Probability | Impact | Score | Action |
|---------|----------|-------------|-------------|--------|-------|--------|
| {R-ID} | {CAT} | {Description} | {1-3} | {1-3} | {Score} | Monitor |
#### Risk Category Legend
- **TECH**: Technical/Architecture (flaws, integration, scalability)
- **SEC**: Security (access controls, auth, data exposure)
- **PERF**: Performance (SLA violations, degradation, resource limits)
- **DATA**: Data Integrity (loss, corruption, inconsistency)
- **BUS**: Business Impact (UX harm, logic errors, revenue)
- **OPS**: Operations (deployment, config, monitoring)
---
### Testability Concerns and Architectural Gaps
**IMPORTANT**: {If system has constraints, explain them. If standard CI/CD achievable, state that.}
#### Blockers to Fast Feedback
| Blocker | Impact | Current Mitigation | Ideal Solution |
|---------|--------|-------------------|----------------|
| **{Blocker name}** | {Impact description} | {How we're working around it} | {What architecture should provide} |
#### Why This Matters
**Standard CI/CD expectations:**
- Full test suite on every commit (~5-15 min feedback)
- Parallel test execution (isolated test data per worker)
- Ephemeral test environments (spin up → test → tear down)
- Fast feedback loop (devs stay in flow state)
**Current reality for {Feature}:**
- {Actual situation - what's different from standard}
#### Tiered Testing Strategy
{If forced by architecture, explain. If standard approach works, state that.}
| Tier | When | Duration | Coverage | Why Not Full Suite? |
|------|------|----------|----------|---------------------|
| **Smoke** | Every commit | <5 min | {N} tests | Fast feedback, catch build-breaking changes |
| **P0** | Every commit | ~{X} min | ~{N} tests | Critical paths, security-critical flows |
| **P1** | PR to main | ~{X} min | ~{N} tests | Important features, algorithm accuracy |
| **P2/P3** | Nightly | ~{X} min | ~{N} tests | Edge cases, performance, NFR |
**Note**: {Any timing assumptions or constraints}
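One way to wire these tiers into CI is tag-based selection. A minimal sketch, assuming tests are tagged `@smoke`/`@p0`/`@p1`/`@p2` and Playwright is the runner (`TEST_TIER` is a hypothetical variable set by the pipeline; adapt to your framework):
```typescript
// playwright.config.ts -- illustrative tier selection via tags
import { defineConfig } from "@playwright/test";

export default defineConfig({
  grep: process.env.TEST_TIER
    ? new RegExp(`@${process.env.TEST_TIER}\\b`) // e.g. TEST_TIER=p0 runs only @p0-tagged tests
    : undefined,                                 // no tier set: run the full suite (nightly job)
  workers: process.env.CI ? 4 : undefined,       // parallel workers in CI
});
```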
#### Architectural Improvements Needed
{If system has technical debt affecting testing, list improvements. If architecture supports testing well, acknowledge that.}
1. **{Improvement name}**
- {What to change}
- **Impact**: {How it improves testing}
#### Acceptance of Trade-offs
For {Feature} Phase 1, the team accepts:
- **{Trade-off 1}** ({Reasoning})
- **{Trade-off 2}** ({Reasoning})
- **{Known limitation}** ({Why acceptable for now})
This is {**technical debt** OR **acceptable for Phase 1**} that should be {revisited post-GA OR maintained as-is}.
---
### Risk Mitigation Plans (High-Priority Risks ≥6)
**Purpose**: Detailed mitigation strategies for all {N} high-priority risks (score ≥6). These risks MUST be addressed before {GA launch date or milestone}.
#### {R-ID}: {Risk Description} (Score: {Score}) - {CRITICALITY LEVEL}
**Mitigation Strategy:**
1. {Step 1}
2. {Step 2}
3. {Step 3}
**Owner:** {Owner}
**Timeline:** {Sprint or date}
**Status:** Planned / In Progress / Complete
**Verification:** {How to verify mitigation is effective}
---
{Repeat for all high-priority risks}
---
### Assumptions and Dependencies
#### Assumptions
1. {Assumption about architecture or requirements}
2. {Assumption about team or timeline}
3. {Assumption about scope or constraints}
#### Dependencies
1. {Dependency} - Required by {date/sprint}
2. {Dependency} - Required by {date/sprint}
#### Risks to Plan
- **Risk**: {Risk to the test plan itself}
- **Impact**: {How it affects testing}
- **Contingency**: {Backup plan}
---
**End of Architecture Document**
**Next Steps for Architecture Team:**
1. Review Quick Guide (🚨/⚠️/📋) and prioritize blockers
2. Assign owners and timelines for high-priority risks (≥6)
3. Validate assumptions and dependencies
4. Provide feedback to QA on testability gaps
**Next Steps for QA Team:**
1. Wait for Sprint 0 blockers to be resolved
2. Refer to companion QA doc (test-design-qa.md) for test scenarios
3. Begin test infrastructure setup (factories, fixtures, environments)

View File

@@ -0,0 +1,315 @@
# Test Design for QA: {Feature Name}
**Purpose:** Test execution recipe for QA team. Defines test scenarios, coverage plan, tooling, and Sprint 0 setup requirements. Use this as your implementation guide after architectural blockers are resolved.
**Date:** {date}
**Author:** {author}
**Status:** Draft / Ready for Implementation
**Project:** {project_name}
**PRD Reference:** {prd_link}
**ADR Reference:** {adr_link}
---
## Quick Reference for QA
**Before You Start:**
- [ ] Review Architecture doc (test-design-architecture.md) - understand blockers and risks
- [ ] Verify Sprint 0 blockers resolved (see Sprint 0 section below)
- [ ] Confirm test infrastructure ready (factories, fixtures, environments)
**Test Execution Order:**
1. **Smoke tests** (<5 min) - Fast feedback on critical paths
2. **P0 tests** (~{X} min) - Critical paths, security-critical flows
3. **P1 tests** (~{X} min) - Important features, algorithm accuracy
4. **P2/P3 tests** (~{X} min) - Edge cases, performance, NFR
**Need Help?**
- Blockers: See Architecture doc "Quick Guide" for mitigation plans
- Test scenarios: See "Test Coverage Plan" section below
- Sprint 0 setup: See "Sprint 0 Setup Requirements" section
---
## System Architecture Summary
**Data Pipeline:**
{Brief description of system flow}
**Key Services:**
- **{Service 1}**: {Purpose and key responsibilities}
- **{Service 2}**: {Purpose and key responsibilities}
- **{Service 3}**: {Purpose and key responsibilities}
**Data Stores:**
- **{Database 1}**: {What it stores}
- **{Database 2}**: {What it stores}
**Expected Scale** (from ADR):
- {Key metrics: RPS, volume, users, etc.}
---
## Test Environment Requirements
**{Company} Standard:** Shared DB per Environment with Randomization (Shift-Left)
| Environment | Database | Test Data Strategy | Purpose |
|-------------|----------|-------------------|---------|
| **Local** | {DB} (shared) | Randomized (faker), auto-cleanup | Local development |
| **Dev (CI)** | {DB} (shared) | Randomized (faker), auto-cleanup | PR validation |
| **Staging** | {DB} (shared) | Randomized (faker), auto-cleanup | Pre-production, E2E |
**Key Principles:**
- **Shared database per environment** (no ephemeral)
- **Randomization for isolation** (faker-based unique IDs)
- **Parallel-safe** (concurrent test runs don't conflict)
- **Self-cleaning** (tests delete their own data)
- **Shift-left** (test against real DBs early)
**Example** (minimal sketch; `apiRequest` is assumed to be a custom fixture exported from the project's fixture file, path illustrative):
```typescript
import { faker } from "@faker-js/faker";
import { test } from "./fixtures"; // extends Playwright's test with the custom `apiRequest` helper

test("example with randomized test data @p0", async ({ apiRequest }) => {
  const testData = {
    id: `test-${faker.string.uuid()}`,
    customerId: `test-customer-${faker.string.alphanumeric(8)}`,
    // ... unique test data
  };
  // Seed the randomized data, exercise the endpoint via apiRequest, then clean up what this test created
});
```
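To make the self-cleaning principle concrete, seeding and cleanup can live in a fixture so teardown runs even when the test fails. A minimal sketch, assuming Playwright's built-in `request` context and a hypothetical `/api/customers` endpoint:
```typescript
// Self-cleaning fixture sketch -- endpoint and entity names are illustrative.
import { test as base } from "@playwright/test";
import { faker } from "@faker-js/faker";

export const test = base.extend<{ seededCustomer: { id: string } }>({
  seededCustomer: async ({ request }, use) => {
    const id = `test-customer-${faker.string.alphanumeric(8)}`;
    await request.post("/api/customers", { data: { id } }); // seed unique test data
    await use({ id });                                      // hand it to the test body
    await request.delete(`/api/customers/${id}`);           // cleanup runs even if the test failed
  },
});
```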
---
## Testability Assessment
**Prerequisites from Architecture Doc:**
Verify these blockers are resolved before test development:
- [ ] {Blocker 1} (see Architecture doc Quick Guide 🚨 BLOCKERS)
- [ ] {Blocker 2}
- [ ] {Blocker 3}
**If Prerequisites Not Met:** Coordinate with Architecture team (see Architecture doc for mitigation plans and owner assignments)
---
## Test Levels Strategy
**System Type:** {API-heavy / UI-heavy / Mixed backend system}
**Recommended Split:**
- **Unit Tests: {X}%** - {What to unit test}
- **Integration/API Tests: {X}%** - **PRIMARY FOCUS** - {What to integration test}
- **E2E Tests: {X}%** - {What to E2E test}
**Rationale:** {Why this split makes sense for this system}
**Test Count Summary:**
- P0: ~{N} tests - Critical paths, run on every commit
- P1: ~{N} tests - Important features, run on PR to main
- P2: ~{N} tests - Edge cases, run nightly/weekly
- P3: ~{N} tests - Exploratory, run on-demand
- **Total: ~{N} tests** (~{X} weeks for 1 QA, ~{Y} weeks for 2 QAs)
---
## Test Coverage Plan
**Repository Note:** {Where tests live - backend repo, admin panel repo, etc. - and how CI pipelines are organized}
### P0 (Critical) - Run on every commit (~{X} min)
**Execution:** CI/CD on every commit, parallel workers, smoke tests first (<5 min)
**Purpose:** Critical path validation - catch build-breaking changes and security violations immediately
**Criteria:** Blocks core functionality OR High risk (≥6) OR No workaround
**Key Smoke Tests** (subset of P0, run first for fast feedback):
- {Smoke test 1} - {Duration}
- {Smoke test 2} - {Duration}
- {Smoke test 3} - {Duration}
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|-------------|------------|-----------|------------|-------|-------|
| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
**Total P0:** ~{N} tests (~{X} weeks)
#### P0 Test Scenarios (Detailed)
**1. {Test Category} ({N} tests) - {CRITICALITY if applicable}**
- [ ] {Scenario 1 with checkbox}
- [ ] {Scenario 2}
- [ ] {Scenario 3}
**2. {Test Category 2} ({N} tests)**
- [ ] {Scenario 1}
- [ ] {Scenario 2}
{Continue for all P0 categories}
---
### P1 (High) - Run on PR to main (~{X} min additional)
**Execution:** CI/CD on pull requests to main branch, runs after P0 passes, parallel workers
**Purpose:** Important feature coverage - algorithm accuracy, complex workflows, Admin Panel interactions
**Criteria:** Important features OR Medium risk (3-4) OR Common workflows
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|-------------|------------|-----------|------------|-------|-------|
| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
**Total P1:** ~{N} tests (~{X} weeks)
#### P1 Test Scenarios (Detailed)
**1. {Test Category} ({N} tests)**
- [ ] {Scenario 1}
- [ ] {Scenario 2}
{Continue for all P1 categories}
---
### P2 (Medium) - Run nightly/weekly (~{X} min)
**Execution:** Scheduled nightly run (or weekly for P3), full infrastructure, sequential execution acceptable
**Purpose:** Edge case coverage, error handling, data integrity validation - slow feedback acceptable
**Criteria:** Secondary features OR Low risk (1-2) OR Edge cases
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|-------------|------------|-----------|------------|-------|-------|
| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
**Total P2:** ~{N} tests (~{X} weeks)
---
### P3 (Low) - Run on-demand (exploratory)
**Execution:** Manual trigger or weekly scheduled run, performance testing
**Purpose:** Full regression, performance benchmarks, accessibility validation - no time pressure
**Criteria:** Nice-to-have OR Exploratory OR Performance benchmarks
| Requirement | Test Level | Test Count | Owner | Notes |
|-------------|------------|------------|-------|-------|
| {Requirement 1} | {Level} | {N} | QA | {Notes} |
| {Requirement 2} | {Level} | {N} | QA | {Notes} |
**Total P3:** ~{N} tests (~{X} days)
---
### Coverage Matrix (Requirements → Tests)
| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
|-------------|------------|----------|-----------|------------|-------|
| {Requirement 1} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} |
| {Requirement 2} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} |
---
## Sprint 0 Setup Requirements
**IMPORTANT:** These items **BLOCK test development**. Complete in Sprint 0 before QA can write tests.
### Architecture/Backend Blockers (from Architecture doc)
**Source:** See Architecture doc "Quick Guide" for detailed mitigation plans
1. **{Blocker 1}** 🚨 **BLOCKER** - {Owner}
- {What needs to be provided}
- **Details:** Architecture doc {Risk-ID} mitigation plan
2. **{Blocker 2}** 🚨 **BLOCKER** - {Owner}
- {What needs to be provided}
- **Details:** Architecture doc {Risk-ID} mitigation plan
### QA Test Infrastructure
1. **{Factory/Fixture Name}** - QA
- Faker-based generator: `{function_signature}`
- Auto-cleanup after tests
2. **{Entity} Fixtures** - QA
- Seed scripts for {states/scenarios}
- Isolated {id_pattern} per test
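A minimal sketch of what a faker-based factory for one of the items above might look like (entity and field names are placeholders, not part of the workflow output):
```typescript
// Illustrative factory -- adapt entity and fields to your domain.
import { faker } from "@faker-js/faker";

export interface TestCustomer {
  id: string;
  name: string;
  email: string;
}

export function createTestCustomer(overrides: Partial<TestCustomer> = {}): TestCustomer {
  return {
    id: `test-${faker.string.uuid()}`, // "test-" prefix keeps cleanup queries simple
    name: faker.person.fullName(),
    email: faker.internet.email(),
    ...overrides,
  };
}
```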
### Test Environments
**Local:** {Setup details - Docker, LocalStack, etc.}
**CI/CD:** {Setup details - shared infrastructure, parallel workers, artifacts}
**Staging:** {Setup details - shared multi-tenant, nightly E2E}
**Production:** {Setup details - feature flags, canary transactions}
**Sprint 0 NFR Gates** (MUST complete before integration testing):
- [ ] {Gate 1}: {Description} (Owner) 🚨
- [ ] {Gate 2}: {Description} (Owner) 🚨
- [ ] {Gate 3}: {Description} (Owner) 🚨
### Sprint 1 Items (Not Sprint 0)
- **{Item 1}** ({Owner}): {Description}
- **{Item 2}** ({Owner}): {Description}
**Sprint 1 NFR Gates** (MUST complete before GA):
- [ ] {Gate 1}: {Description} (Owner)
- [ ] {Gate 2}: {Description} (Owner)
---
## NFR Readiness Summary
**Based on Architecture Doc Risk Assessment**
| NFR Category | Status | Evidence Status | Blocker | Next Action |
|--------------|--------|-----------------|---------|-------------|
| **Security** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Performance** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Reliability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Data Integrity** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Scalability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Disaster Recovery** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Monitorability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Deployability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Maintainability** | PASS | Test design complete (~{N} scenarios) | None | Proceed with implementation |
**Total:** {N} PASS, {N} CONCERNS across {N} categories
---
**End of QA Document**
**Next Steps for QA Team:**
1. Verify Sprint 0 blockers resolved (coordinate with Architecture team if not)
2. Set up test infrastructure (factories, fixtures, environments)
3. Begin test implementation following priority order (P0 → P1 → P2 → P3)
4. Run smoke tests first for fast feedback
5. Track progress using test scenario checklists above
**Next Steps for Architecture Team:**
1. Monitor Sprint 0 blocker resolution
2. Provide support for QA infrastructure setup if needed
3. Review test results and address any newly discovered testability gaps

View File

@@ -15,6 +15,9 @@ date: system-generated
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/test-design"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
# Note: Template selection is mode-based (see instructions.md Step 1.5):
# - System-level: test-design-architecture-template.md + test-design-qa-template.md
# - Epic-level: test-design-template.md (unchanged)
template: "{installed_path}/test-design-template.md"
# Variables and inputs
@@ -26,13 +29,25 @@ variables:
# Note: Actual output file determined dynamically based on mode detection
# Declared outputs for new workflow format
outputs:
- id: system-level
description: "System-level testability review (Phase 3)"
path: "{output_folder}/test-design-system.md"
# System-Level Mode (Phase 3) - TWO documents
- id: test-design-architecture
description: "System-level test architecture: Architectural concerns, testability gaps, NFR requirements for Architecture/Dev teams"
path: "{output_folder}/test-design-architecture.md"
mode: system-level
audience: architecture
- id: test-design-qa
description: "System-level test design: Test execution recipe, coverage plan, Sprint 0 setup for QA team"
path: "{output_folder}/test-design-qa.md"
mode: system-level
audience: qa
# Epic-Level Mode (Phase 4) - ONE document (unchanged)
- id: epic-level
description: "Epic-level test plan (Phase 4)"
path: "{output_folder}/test-design-epic-{epic_num}.md"
default_output_file: "{output_folder}/test-design-epic-{epic_num}.md"
mode: epic-level
# Note: No default_output_file - mode detection determines which outputs to write
# Required tools
required_tools: