docs: update test-design workflow to generate two documents for system-level mode

murat
2026-01-22 12:21:00 -06:00
parent 9b9f43fcb9
commit 9e9387991d
15 changed files with 1270 additions and 101 deletions

View File

@@ -160,7 +160,7 @@ graph TB
**TEA workflows:** `*framework` and `*ci` run once in Phase 3 after architecture. `*test-design` is **dual-mode**:
- **System-level (Phase 3):** Run immediately after architecture/ADR drafting to produce TWO documents: `test-design-architecture.md` (for Architecture/Dev teams: testability gaps, ASRs, NFR requirements) + `test-design-qa.md` (for QA team: test execution recipe, coverage plan, Sprint 0 setup). Feeds the implementation-readiness gate.
- **Epic-level (Phase 4):** Run per-epic to produce `test-design-epic-N.md` (risk, priorities, coverage plan).
The Quick Flow track skips Phases 1 and 3.

View File

@@ -114,10 +114,9 @@ Focus areas:
- Performance requirements (SLA: P99 <200ms)
- Compliance (HIPAA PHI handling, audit logging)
Output: TWO documents (system-level):
- `test-design-architecture.md`: Security gaps, compliance requirements, performance SLOs for Architecture team
- `test-design-qa.md`: Security testing strategy, compliance test mapping, performance testing plan, audit logging validation for QA team
```

View File

@@ -55,20 +55,44 @@ For epic-level:
### 5. Review the Output
TEA generates test design document(s) based on mode.
## What You Get
**System-Level Output (TWO Documents):**
TEA produces two focused documents for system-level mode:
1. **`test-design-architecture.md`** (for Architecture/Dev teams)
- Purpose: Architectural concerns, testability gaps, NFR requirements
- Quick Guide with 🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY
- Risk assessment (high/medium/low-priority with scoring)
- Testability concerns and architectural gaps
- Risk mitigation plans for high-priority risks (≥6)
- Assumptions and dependencies
2. **`test-design-qa.md`** (for QA team)
- Purpose: Test execution recipe, coverage plan, Sprint 0 setup
- Quick Reference for QA (Before You Start, Execution Order, Need Help)
- System architecture summary
- Test environment requirements (covered early in the doc)
- Testability assessment (prerequisites checklist)
- Test levels strategy (unit/integration/E2E split)
- Test coverage plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
- Sprint 0 setup requirements (blockers, infrastructure, environments)
- NFR readiness summary
**Why Two Documents?**
- **Architecture teams** can scan blockers in <5 min (Quick Guide format)
- **QA teams** have actionable test recipes (step-by-step with checklists)
- **No redundancy** between documents (cross-references instead of duplication)
- **Clear separation** of concerns (what to deliver vs how to test)
**Epic-Level Output (ONE Document):**
**`test-design-epic-N.md`** (combined risk assessment + test plan)
- Risk assessment for the epic
- Test priorities (P0-P3)
- Coverage plan
- Regression hotspots (for brownfield)
- Integration risks
@@ -82,12 +106,25 @@ TEA generates a comprehensive test design document.
| **Brownfield** | System-level + existing test baseline | Regression hotspots, integration risks |
| **Enterprise** | Compliance-aware testability | Security/performance/compliance focus |
## Examples
**System-Level (Two Documents):**
- `cluster-search/cluster-search-test-design-architecture.md` - Architecture doc with Quick Guide
- `cluster-search/cluster-search-test-design-qa.md` - QA doc with test scenarios
**Key Pattern:**
- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
- No duplication, just cross-references
## Tips
- **Run system-level right after architecture:** Early testability review
- **Run epic-level at the start of each epic:** Targeted test planning
- **Update if ADRs change:** Keep test design aligned
- **Use output to guide other workflows:** Feeds into `*atdd` and `*automate`
- **Architecture teams review Architecture doc:** Focus on blockers and mitigation plans
- **QA teams use QA doc as implementation guide:** Follow test scenarios and Sprint 0 checklist
## Next Steps

View File

@@ -72,17 +72,39 @@ Quick reference for all 8 TEA (Test Architect) workflows. For detailed step-by-s
**Frequency:** Once (system), per epic (epic-level)
**Modes:**
- **System-level:** Architecture testability review (TWO documents)
- **Epic-level:** Per-epic risk assessment (ONE document)
**Key Inputs:**
- System-level: Architecture, PRD, ADRs
- Epic-level: Epic, stories, acceptance criteria
**Key Outputs:**
**System-Level (TWO Documents):**
- `test-design-architecture.md` - For Architecture/Dev teams
- Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
- Risk assessment with scoring
- Testability concerns and gaps
- Mitigation plans
- `test-design-qa.md` - For QA team
- Test execution recipe
- Coverage plan (P0/P1/P2/P3 with checkboxes)
- Sprint 0 setup requirements
- NFR readiness summary
**Epic-Level (ONE Document):**
- `test-design-epic-N.md`
- Risk assessment (probability × impact scores)
- Test priorities (P0-P3)
- Coverage strategy
- Mitigation plans
**Why Two Documents for System-Level?**
- Architecture teams scan blockers in <5 min
- QA teams have actionable test recipes
- No redundancy (cross-references instead)
- Clear separation (what to deliver vs how to test)
**MCP Enhancement:** Exploratory mode (live browser UI discovery)

View File

@@ -197,7 +197,7 @@ output_folder: _bmad-output
```
**TEA Output Files:**
- `test-design-architecture.md` + `test-design-qa.md` (from *test-design system-level - TWO documents)
- `test-design-epic-N.md` (from *test-design epic-level)
- `test-review.md` (from *test-review)
- `traceability-matrix.md` (from *trace Phase 1)

View File

@@ -15,7 +15,7 @@ By the end of this 30-minute tutorial, you'll have:
:::note[Prerequisites]
- Node.js installed (v20 or later)
- 30 minutes of focused time
- We'll use TodoMVC (<https://todomvc.com/examples/react/dist/>) as our demo app
:::
:::tip[Quick Path]

View File

@@ -0,0 +1,350 @@
# ADR Quality Readiness Checklist
**Purpose:** Standardized 8-category, 29-criteria framework for evaluating system testability and NFR compliance during architecture review (Phase 3) and NFR assessment.
**When to Use:**
- System-level test design (Phase 3): Identify testability gaps in architecture
- NFR assessment workflow: Structured evaluation with evidence
- Gate decisions: Quantifiable criteria (X/29 met = PASS/CONCERNS/FAIL)
**How to Use:**
1. For each criterion, assess status: ✅ Covered / ⚠️ Gap / ⬜ Not Assessed
2. Document gap description if ⚠️
3. Describe risk if criterion unmet
4. Map to test scenarios (what tests validate this criterion)
---
## 1. Testability & Automation
**Question:** Can we verify this effectively without manual toil?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| 1.1 | **Isolation:** Can the service be tested with all downstream dependencies (DBs, APIs, Queues) mocked or stubbed? | Flaky tests; inability to test in isolation | P1: Service runs with mocked DB, P1: Service runs with mocked API, P2: Integration tests with real deps |
| 1.2 | **Headless Interaction:** Is 100% of the business logic accessible via API (REST/gRPC) to bypass the UI for testing? | Slow, brittle UI-based automation | P0: All core logic callable via API, P1: No UI dependency for critical paths |
| 1.3 | **State Control:** Do we have "Seeding APIs" or scripts to inject specific data states (e.g., "User with expired subscription") instantly? | Long setup times; inability to test edge cases | P0: Seed baseline data, P0: Inject edge case data states, P1: Cleanup after tests |
| 1.4 | **Sample Requests:** Are there valid and invalid cURL/JSON sample requests provided in the design doc for QA to build upon? | Ambiguity on how to consume the service | P1: Valid request succeeds, P1: Invalid request fails with clear error |
**Common Gaps:**
- No mock endpoints for external services (Athena, Milvus, third-party APIs)
- Business logic tightly coupled to UI (requires E2E tests for everything)
- No seeding APIs (manual database setup required)
- ADR has architecture diagrams but no sample API requests
**Mitigation Examples:**
- 1.1 (Isolation): Provide mock endpoints, dependency injection, interface abstractions
- 1.2 (Headless): Expose all business logic via REST/GraphQL APIs
- 1.3 (State Control): Implement `/api/test-data` seeding endpoints (dev/staging only)
- 1.4 (Sample Requests): Add "Example API Calls" section to ADR with cURL commands
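A minimal sketch of criterion 1.3 in practice, assuming a TypeScript/Playwright stack; the `/api/test-data` routes, payload shape, and banner text are illustrative placeholders, not TEA conventions:

```typescript
import { test, expect } from '@playwright/test';

// Assumes baseURL is set in playwright.config; the seeding routes below are hypothetical.
test('user with expired subscription sees renewal banner', async ({ request, page }) => {
  // Inject the exact state the scenario needs instead of clicking through setup flows
  const seeded = await request.post('/api/test-data/users', {
    data: { subscription: 'expired', plan: 'pro' },
  });
  expect(seeded.ok()).toBeTruthy();
  const { userId } = await seeded.json();

  await page.goto(`/account/${userId}`);
  await expect(page.getByRole('alert')).toContainText('renew');

  // Self-cleaning: remove the seeded user so parallel workers stay isolated
  await request.delete(`/api/test-data/users/${userId}`);
});
```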
---
## 2. Test Data Strategy
**Question:** How do we fuel our tests safely?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| 2.1 | **Segregation:** Does the design support multi-tenancy or specific headers (e.g., x-test-user) to keep test data out of prod metrics? | Skewed business analytics; data pollution | P0: Multi-tenant isolation (customer A ≠ customer B), P1: Test data excluded from prod metrics |
| 2.2 | **Generation:** Can we use synthetic data, or do we rely on scrubbing production data (GDPR/PII risk)? | Privacy violations; dependency on stale data | P0: Faker-based synthetic data, P1: No production data in tests |
| 2.3 | **Teardown:** Is there a mechanism to "reset" the environment or clean up data after destructive tests? | Environment rot; subsequent test failures | P0: Automated cleanup after tests, P2: Environment reset script |
**Common Gaps:**
- No `customer_id` scoping in queries (cross-tenant data leakage risk)
- Reliance on production data dumps (GDPR/PII violations)
- No cleanup mechanism (tests leave data behind, polluting environment)
**Mitigation Examples:**
- 2.1 (Segregation): Enforce `customer_id` in all queries, add test-specific headers
- 2.2 (Generation): Use Faker library, create synthetic data generators, prohibit prod dumps
- 2.3 (Teardown): Auto-cleanup hooks in test framework, isolated test customer IDs
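A sketch combining criteria 2.2 and 2.3, assuming `@faker-js/faker` and a Playwright fixture; the `/api/test-data/customers` endpoints and response shape are assumptions:

```typescript
import { test as base, expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

type Customer = { id: string; name: string; email: string };

// Fixture seeds a synthetic customer before the test and deletes it afterwards,
// so no production data is used and nothing is left behind.
const test = base.extend<{ customer: Customer }>({
  customer: async ({ request }, use) => {
    const created = await request.post('/api/test-data/customers', {
      data: { name: faker.company.name(), email: faker.internet.email() },
    });
    const customer = (await created.json()) as Customer;

    await use(customer); // test body runs here

    await request.delete(`/api/test-data/customers/${customer.id}`); // teardown
  },
});

test('transactions are scoped to the seeded customer', async ({ request, customer }) => {
  const res = await request.get(`/api/customers/${customer.id}/transactions`);
  expect(res.ok()).toBeTruthy();
});
```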
---
## 3. Scalability & Availability
**Question:** Can it grow, and will it stay up?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| 3.1 | **Statelessness:** Is the service stateless? If not, how is session state replicated across instances? | Inability to auto-scale horizontally | P1: Service restart mid-request → no data loss, P2: Horizontal scaling under load |
| 3.2 | **Bottlenecks:** Have we identified the weakest link (e.g., database connections, API rate limits) under load? | System crash during peak traffic | P2: Load test identifies bottleneck, P2: Connection pool exhaustion handled |
| 3.3 | **SLA Definitions:** What is the target Availability (e.g., 99.9%) and does the architecture support redundancy to meet it? | Breach of contract; customer churn | P1: Availability target defined, P2: Redundancy validated (multi-region/zone) |
| 3.4 | **Circuit Breakers:** If a dependency fails, does this service fail fast or hang? | Cascading failures taking down the whole platform | P1: Circuit breaker opens on 5 failures, P1: Auto-reset after recovery, P2: Timeout prevents hanging |
**Common Gaps:**
- Stateful session management (can't scale horizontally)
- No load testing, bottlenecks unknown
- SLA undefined or unrealistic (99.99% without redundancy)
- No circuit breakers (cascading failures)
**Mitigation Examples:**
- 3.1 (Statelessness): Externalize session to Redis/JWT, design for horizontal scaling
- 3.2 (Bottlenecks): Load test with k6, monitor connection pools, identify weak links
- 3.3 (SLA): Define realistic SLA (99.9% = 43 min/month downtime), add redundancy
- 3.4 (Circuit Breakers): Implement circuit breakers (Hystrix pattern), fail fast on errors
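A minimal circuit-breaker sketch for criterion 3.4 (thresholds are illustrative; production code would more likely use a library such as opossum):

```typescript
// Opens after N consecutive failures, fails fast while open, allows a probe after a cooldown.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error('circuit open: failing fast');
    try {
      const result = await fn();
      this.failures = 0; // success resets the counter
      return result;
    } catch (err) {
      if (++this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }

  private isOpen(): boolean {
    if (this.failures < this.maxFailures) return false;
    if (Date.now() - this.openedAt > this.cooldownMs) {
      this.failures = 0; // half-open: let a probe call through
      return false;
    }
    return true;
  }
}
```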
---
## 4. Disaster Recovery (DR)
**Question:** What happens when the worst-case scenario occurs?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------- | ----------------------------------------------------------------------- |
| 4.1 | **RTO/RPO:** What is the Recovery Time Objective (how long to restore) and Recovery Point Objective (max data loss)? | Extended outages; data loss liability | P2: RTO defined and tested, P2: RPO validated (backup frequency) |
| 4.2 | **Failover:** Is region/zone failover automated or manual? Has it been practiced? | "Heroics" required during outages; human error | P2: Automated failover works, P2: Manual failover documented and tested |
| 4.3 | **Backups:** Are backups immutable and tested for restoration integrity? | Ransomware vulnerability; corrupted backups | P2: Backup restore succeeds, P2: Backup immutability validated |
**Common Gaps:**
- RTO/RPO undefined (no recovery plan)
- Failover never tested (manual process, prone to errors)
- Backups exist but restoration never validated (untested backups = no backups)
**Mitigation Examples:**
- 4.1 (RTO/RPO): Define RTO (e.g., 4 hours) and RPO (e.g., 1 hour), document recovery procedures
- 4.2 (Failover): Automate multi-region failover, practice failover drills quarterly
- 4.3 (Backups): Implement immutable backups (S3 versioning), test restore monthly
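One slice of criterion 4.3 that can be automated cheaply, assuming backups land in S3; the bucket name is a placeholder and restore drills remain a scheduled manual exercise:

```typescript
import { test, expect } from '@playwright/test';
import { S3Client, GetBucketVersioningCommand } from '@aws-sdk/client-s3';

// P2: backup bucket must be versioned (a prerequisite for immutability).
test('backup bucket has versioning enabled', async () => {
  const s3 = new S3Client({});
  const res = await s3.send(new GetBucketVersioningCommand({ Bucket: 'acme-db-backups' }));
  expect(res.Status).toBe('Enabled');
});
```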
---
## 5. Security
**Question:** Is the design safe by default?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ---------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| 5.1 | **AuthN/AuthZ:** Does it implement standard protocols (OAuth2/OIDC)? Are permissions granular (Least Privilege)? | Unauthorized access; data leaks | P0: OAuth flow works, P0: Expired token rejected, P0: Insufficient permissions return 403, P1: Scope enforcement |
| 5.2 | **Encryption:** Is data encrypted at rest (DB) and in transit (TLS)? | Compliance violations; data theft | P1: Milvus data-at-rest encrypted, P1: TLS 1.2+ enforced, P2: Certificate rotation works |
| 5.3 | **Secrets:** Are API keys/passwords stored in a Vault (not in code or config files)? | Credentials leaked in git history | P1: No hardcoded secrets in code, P1: Secrets loaded from AWS Secrets Manager |
| 5.4 | **Input Validation:** Are inputs sanitized against Injection attacks (SQLi, XSS)? | System compromise via malicious payloads | P1: SQL injection sanitized, P1: XSS escaped, P2: Command injection prevented |
**Common Gaps:**
- Weak authentication (no OAuth, hardcoded API keys)
- No encryption at rest (plaintext in database)
- Secrets in git (API keys, passwords in config files)
- No input validation (vulnerable to SQLi, XSS, command injection)
**Mitigation Examples:**
- 5.1 (AuthN/AuthZ): Implement OAuth 2.1/OIDC, enforce least privilege, validate scopes
- 5.2 (Encryption): Enable TDE (Transparent Data Encryption), enforce TLS 1.2+
- 5.3 (Secrets): Migrate to AWS Secrets Manager/Vault, scan git history for leaks
- 5.4 (Input Validation): Sanitize all inputs, use parameterized queries, escape outputs
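Illustrative P0 checks for criterion 5.1, assuming tokens are supplied via environment variables and the routes are placeholders:

```typescript
import { test, expect } from '@playwright/test';

test('expired token is rejected with 401', async ({ request }) => {
  const res = await request.get('/api/reports', {
    headers: { Authorization: `Bearer ${process.env.EXPIRED_TOKEN}` },
  });
  expect(res.status()).toBe(401);
});

test('token without the required scope gets 403', async ({ request }) => {
  // Read-only token attempting a destructive call: least privilege must hold
  const res = await request.delete('/api/reports/123', {
    headers: { Authorization: `Bearer ${process.env.READ_ONLY_TOKEN}` },
  });
  expect(res.status()).toBe(403);
});
```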
---
## 6. Monitorability, Debuggability & Manageability
**Question:** Can we operate and fix this in production?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| 6.1 | **Tracing:** Does the service propagate W3C Trace Context / Correlation IDs for distributed tracing? | Impossible to debug errors across microservices | P2: W3C Trace Context propagated (EventBridge → Lambda → Service), P2: Correlation ID in all logs |
| 6.2 | **Logs:** Can log levels (INFO vs DEBUG) be toggled dynamically without a redeploy? | Inability to diagnose issues in real-time | P2: Log level toggle works without redeploy, P2: Logs structured (JSON format) |
| 6.3 | **Metrics:** Does it expose RED metrics (Rate, Errors, Duration) for Prometheus/Datadog? | Flying blind regarding system health | P2: /metrics endpoint exposes RED metrics, P2: Prometheus/Datadog scrapes successfully |
| 6.4 | **Config:** Is configuration externalized? Can we change behavior without a code build? | Rigid system; full deploys needed for minor tweaks | P2: Config change without code build, P2: Feature flags toggle behavior |
**Common Gaps:**
- No distributed tracing (can't debug across microservices)
- Static log levels (requires redeploy to enable DEBUG)
- No metrics endpoint (blind to system health)
- Configuration hardcoded (requires full deploy for minor changes)
**Mitigation Examples:**
- 6.1 (Tracing): Implement W3C Trace Context, add correlation IDs to all logs
- 6.2 (Logs): Use dynamic log levels (environment variable), structured logging (JSON)
- 6.3 (Metrics): Expose /metrics endpoint, track RED metrics (Rate, Errors, Duration)
- 6.4 (Config): Externalize config (AWS SSM/AppConfig), use feature flags (LaunchDarkly)
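A smoke-level sketch for criterion 6.3; the metric names follow common Prometheus conventions and are assumptions, not the project's actual series:

```typescript
import { test, expect } from '@playwright/test';

test('/metrics exposes RED series for scraping', async ({ request }) => {
  const res = await request.get('/metrics');
  expect(res.ok()).toBeTruthy();

  const body = await res.text();
  expect(body).toContain('http_requests_total');            // Rate
  expect(body).toContain('http_request_errors_total');      // Errors
  expect(body).toContain('http_request_duration_seconds');  // Duration
});
```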
---
## 7. QoS (Quality of Service) & QoE (Quality of Experience)
**Question:** How does it perform, and how does it feel?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------- |
| 7.1 | **Latency (QoS):** What are the P95 and P99 latency targets? | Slow API responses affecting throughput | P3: P95 latency <Xs (load test), P3: P99 latency <Ys (load test) |
| 7.2 | **Throttling (QoS):** Is there Rate Limiting to prevent "noisy neighbors" or DDoS? | Service degradation for all users due to one bad actor | P2: Rate limiting enforced, P2: 429 returned when limit exceeded |
| 7.3 | **Perceived Performance (QoE):** Does the UI show optimistic updates or skeletons while loading? | App feels sluggish to the user | P2: Skeleton/spinner shown while loading (E2E), P2: Optimistic updates (E2E) |
| 7.4 | **Degradation (QoE):** If the service is slow, does it show a friendly message or a raw stack trace? | Poor user trust; frustration | P2: Friendly error message shown (not stack trace), P1: Error boundary catches exceptions (E2E) |
**Common Gaps:**
- Latency targets undefined (no SLOs)
- No rate limiting (vulnerable to DDoS, noisy neighbors)
- Poor perceived performance (blank screen while loading)
- Raw error messages (stack traces exposed to users)
**Mitigation Examples:**
- 7.1 (Latency): Define SLOs (P95 <2s, P99 <5s), load test to validate
- 7.2 (Throttling): Implement rate limiting (per-user, per-IP), return 429 with Retry-After
- 7.3 (Perceived Performance): Add skeleton screens, optimistic updates, progressive loading
- 7.4 (Degradation): Implement error boundaries, show friendly messages, log stack traces server-side
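A sketch for criterion 7.2, assuming a placeholder endpoint and a limit in the low hundreds of requests per minute:

```typescript
import { test, expect } from '@playwright/test';

test('rate limit returns 429 with Retry-After once exceeded', async ({ request }) => {
  let limited = false;
  for (let i = 0; i < 150 && !limited; i++) {
    const res = await request.get('/api/search?q=test');
    if (res.status() === 429) {
      // Compare header names case-insensitively
      const names = res.headersArray().map((h) => h.name.toLowerCase());
      expect(names).toContain('retry-after');
      limited = true;
    }
  }
  expect(limited).toBeTruthy();
});
```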
---
## 8. Deployability
**Question:** How easily can we ship this?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
| --- | ------------------------------------------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------------------------------------ |
| 8.1 | **Zero Downtime:** Does the design support Blue/Green or Canary deployments? | Maintenance windows required (downtime) | P2: Blue/Green deployment works, P2: Canary deployment gradual rollout |
| 8.2 | **Backward Compatibility:** Can we deploy the DB changes separately from the Code changes? | "Lock-step" deployments; high risk of breaking changes | P2: DB migration before code deploy, P2: Code handles old and new schema |
| 8.3 | **Rollback:** Is there an automated rollback trigger if Health Checks fail post-deploy? | Prolonged outages after a bad deploy | P2: Health check failure → automated rollback, P2: Rollback completes within RTO |
**Common Gaps:**
- No zero-downtime strategy (requires maintenance window)
- Tight coupling between DB and code (lock-step deployments)
- No automated rollback (manual intervention required)
**Mitigation Examples:**
- 8.1 (Zero Downtime): Implement Blue/Green or Canary deployments, use feature flags
- 8.2 (Backward Compatibility): Separate DB migrations from code deploys, support N-1 schema
- 8.3 (Rollback): Automate rollback on health check failures, test rollback procedures
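A post-deploy smoke gate sketch supporting criterion 8.3, assuming a conventional `/health` route; the pipeline, not the test, performs the actual rollback when this fails:

```typescript
import { test, expect } from '@playwright/test';

test('deployed service reports healthy', async ({ request }) => {
  const res = await request.get('/health');
  expect(res.status()).toBe(200);

  const body = await res.json();
  expect(body.status).toBe('ok'); // assumed payload shape, e.g. { status: 'ok', version: '1.2.3' }
});
```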
---
## Usage in Test Design Workflow
**System-Level Mode (Phase 3):**
**In test-design-architecture.md:**
- Add "NFR Testability Requirements" section after ASRs
- Use 8 categories with checkboxes (29 criteria)
- For each criterion: Status (⬜ Not Assessed, ⚠️ Gap, ✅ Covered), Gap description, Risk if unmet
- Example:
```markdown
## NFR Testability Requirements
**Based on ADR Quality Readiness Checklist**
### 1. Testability & Automation
Can we verify this effectively without manual toil?
| Criterion | Status | Gap/Requirement | Risk if Unmet |
| --------------------------------------------------------------- | -------------- | ------------------------------------ | --------------------------------------- |
| ⬜ Isolation: Can service be tested with downstream deps mocked? | ⚠️ Gap | No mock endpoints for Athena queries | Flaky tests; can't test in isolation |
| ⬜ Headless: 100% business logic accessible via API? | ✅ Covered | All MCP tools are REST APIs | N/A |
| ⬜ State Control: Seeding APIs to inject data states? | ⚠️ Gap | Need `/api/test-data` endpoints | Long setup times; can't test edge cases |
| ⬜ Sample Requests: Valid/invalid cURL/JSON samples provided? | ⬜ Not Assessed | Pending ADR Tool schemas finalized | Ambiguity on how to consume service |
**Actions Required:**
- [ ] Backend: Implement mock endpoints for Athena (R-002 blocker)
- [ ] Backend: Implement `/api/test-data` seeding APIs (R-002 blocker)
- [ ] PM: Finalize ADR Tool schemas with sample requests (Q4)
```
**In test-design-qa.md:**
- Map each criterion to test scenarios
- Add "NFR Test Coverage Plan" section with P0/P1/P2 priority for each category
- Reference Architecture doc gaps
- Example:
```markdown
## NFR Test Coverage Plan
**Based on ADR Quality Readiness Checklist**
### 1. Testability & Automation (4 criteria)
**Prerequisites from Architecture doc:**
- [ ] R-002: Test data seeding APIs implemented (blocker)
- [ ] Mock endpoints available for Athena queries
| Criterion | Test Scenarios | Priority | Test Count | Owner |
| ------------------------------- | -------------------------------------------------------------------- | -------- | ---------- | ---------------- |
| Isolation: Mock downstream deps | Mock Athena queries, Mock Milvus, Service runs isolated | P1 | 3 | Backend Dev + QA |
| Headless: API-accessible logic | All MCP tools callable via REST, No UI dependency for business logic | P0 | 5 | QA |
| State Control: Seeding APIs | Create test customer, Seed 1000 transactions, Inject edge cases | P0 | 4 | QA |
| Sample Requests: cURL examples | Valid request succeeds, Invalid request fails with clear error | P1 | 2 | QA |
**Detailed Test Scenarios:**
- [ ] Isolation: Service runs with Athena mocked (returns fixture data)
- [ ] Isolation: Service runs with Milvus mocked (returns ANN fixture)
- [ ] State Control: Seed test customer with 1000 baseline transactions
- [ ] State Control: Inject edge case (expired subscription user)
```
---
## Usage in NFR Assessment Workflow
**Output Structure:**
```markdown
# NFR Assessment: {Feature Name}
**Based on ADR Quality Readiness Checklist (8 categories, 29 criteria)**
## Assessment Summary
| Category | Status | Criteria Met | Evidence | Next Action |
| ----------------------------- | ---------- | ------------ | -------------------------------------- | -------------------- |
| 1. Testability & Automation | ⚠️ CONCERNS | 2/4 | Mock endpoints missing | Implement R-002 |
| 2. Test Data Strategy | ✅ PASS | 3/3 | Faker + auto-cleanup | None |
| 3. Scalability & Availability | ⚠️ CONCERNS | 1/4 | SLA undefined | Define SLA |
| 4. Disaster Recovery | ⚠️ CONCERNS | 0/3 | No RTO/RPO defined | Define recovery plan |
| 5. Security | ✅ PASS | 4/4 | OAuth 2.1 + TLS + Vault + Sanitization | None |
| 6. Monitorability | ⚠️ CONCERNS | 2/4 | No metrics endpoint | Add /metrics |
| 7. QoS & QoE | ⚠️ CONCERNS | 1/4 | Latency targets undefined | Define SLOs |
| 8. Deployability | ✅ PASS | 3/3 | Blue/Green + DB migrations + Rollback | None |
**Overall:** 14/29 criteria met (48%) → ⚠️ CONCERNS
**Gate Decision:** CONCERNS (requires mitigation plan before GA)
---
## Detailed Assessment
### 1. Testability & Automation (2/4 criteria met)
**Question:** Can we verify this effectively without manual toil?
| Criterion | Status | Evidence | Gap/Action |
| --------------------------- | ------ | ------------------------ | ------------------------ |
| ⬜ Isolation: Mock deps | ⚠️ | No Athena mock | Implement mock endpoints |
| ⬜ Headless: API-accessible | ✅ | All MCP tools are REST | N/A |
| ⬜ State Control: Seeding | ⚠️ | `/api/test-data` pending | Sprint 0 blocker |
| ⬜ Sample Requests: Examples | ⬜ | Pending schemas | Finalize ADR Tools |
**Overall Status:** ⚠️ CONCERNS (2/4 criteria met)
**Next Actions:**
- [ ] Backend: Implement Athena mock endpoints (Sprint 0)
- [ ] Backend: Implement `/api/test-data` (Sprint 0)
- [ ] PM: Finalize sample requests (Sprint 1)
{Repeat for all 8 categories}
```
---
## Benefits
**For test-design workflow:**
- Standard NFR structure (same 8 categories every project)
- Clear testability requirements for Architecture team
- Direct mapping: criterion → requirement → test scenario
- Comprehensive coverage (29 criteria = no blind spots)
**For nfr-assess workflow:**
- Structured assessment (not ad-hoc)
- Quantifiable (X/29 criteria met)
- Evidence-based (each criterion has evidence field)
- Actionable (gaps → next actions with owners)
**For Architecture teams:**
- Clear checklist (29 yes/no questions)
- Risk-aware (each criterion has "risk if unmet")
- Scoped work (only implement what's needed, not everything)
**For QA teams:**
- Comprehensive test coverage (29 criteria → test scenarios)
- Clear priorities (P0 for security/isolation, P1 for monitoring, etc.)
- No ambiguity (each criterion has specific test scenarios)

View File

@@ -32,3 +32,4 @@ burn-in,Burn-in Runner,"Smart test selection, git diff for CI optimization","ci,
network-error-monitor,Network Error Monitor,"HTTP 4xx/5xx detection for UI tests","monitoring,playwright-utils,ui",knowledge/network-error-monitor.md
fixtures-composition,Fixtures Composition,"mergeTests composition patterns for combining utilities","fixtures,playwright-utils",knowledge/fixtures-composition.md
api-testing-patterns,API Testing Patterns,"Pure API test patterns without browser: service testing, microservices, GraphQL","api,backend,service-testing,api-testing,microservices,graphql,no-browser",knowledge/api-testing-patterns.md
adr-quality-readiness-checklist,ADR Quality Readiness Checklist,"8-category 29-criteria framework for ADR testability and NFR assessment","nfr,testability,adr,quality,assessment,checklist",knowledge/adr-quality-readiness-checklist.md

View File

@@ -51,7 +51,7 @@ This workflow performs a comprehensive assessment of non-functional requirements
**Actions:**
1. Load relevant knowledge fragments from `{project-root}/_bmad/bmm/testarch/tea-index.csv`:
- `nfr-criteria.md` - Non-functional requirements criteria and thresholds (security, performance, reliability, maintainability with code examples, 658 lines, 4 examples)
- `adr-quality-readiness-checklist.md` - 8-category 29-criteria NFR framework (testability, test data, scalability, DR, security, monitorability, QoS/QoE, deployability, ~450 lines)
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability validation (10-iteration detection, sharding, selective execution, 678 lines, 4 examples)
- `test-quality.md` - Test quality expectations for maintainability (deterministic, isolated, explicit assertions, length/time limits, 658 lines, 5 examples)
- `playwright-config.md` - Performance configuration patterns: parallelization, timeout standards, artifact output (722 lines, 5 examples)
@@ -75,13 +75,17 @@ This workflow performs a comprehensive assessment of non-functional requirements
**Actions:**
1. Determine which NFR categories to assess using ADR Quality Readiness Checklist (8 standard categories):
- **1. Testability & Automation**: Isolation, headless interaction, state control, sample requests (4 criteria)
- **2. Test Data Strategy**: Segregation, generation, teardown (3 criteria)
- **3. Scalability & Availability**: Statelessness, bottlenecks, SLA definitions, circuit breakers (4 criteria)
- **4. Disaster Recovery**: RTO/RPO, failover, backups (3 criteria)
- **5. Security**: AuthN/AuthZ, encryption, secrets, input validation (4 criteria)
- **6. Monitorability, Debuggability & Manageability**: Tracing, logs, metrics, config (4 criteria)
- **7. QoS & QoE**: Latency, throttling, perceived performance, degradation (4 criteria)
- **8. Deployability**: Zero downtime, backward compatibility, rollback (3 criteria)
2. Add custom NFR categories if specified (e.g., accessibility, internationalization, compliance) beyond the 8 standard categories
3. Gather thresholds for each NFR:
- From tech-spec.md (primary source)

View File

@@ -355,13 +355,24 @@ Note: This assessment summarizes existing evidence; it does not run tests or CI
## Findings Summary
**Based on ADR Quality Readiness Checklist (8 categories, 29 criteria)**
| Category | Criteria Met | PASS | CONCERNS | FAIL | Overall Status |
|----------|--------------|------|----------|------|----------------|
| 1. Testability & Automation | {T_MET}/4 | {T_PASS} | {T_CONCERNS} | {T_FAIL} | {T_STATUS} {T_ICON} |
| 2. Test Data Strategy | {TD_MET}/3 | {TD_PASS} | {TD_CONCERNS} | {TD_FAIL} | {TD_STATUS} {TD_ICON} |
| 3. Scalability & Availability | {SA_MET}/4 | {SA_PASS} | {SA_CONCERNS} | {SA_FAIL} | {SA_STATUS} {SA_ICON} |
| 4. Disaster Recovery | {DR_MET}/3 | {DR_PASS} | {DR_CONCERNS} | {DR_FAIL} | {DR_STATUS} {DR_ICON} |
| 5. Security | {SEC_MET}/4 | {SEC_PASS} | {SEC_CONCERNS} | {SEC_FAIL} | {SEC_STATUS} {SEC_ICON} |
| 6. Monitorability, Debuggability & Manageability | {MON_MET}/4 | {MON_PASS} | {MON_CONCERNS} | {MON_FAIL} | {MON_STATUS} {MON_ICON} |
| 7. QoS & QoE | {QOS_MET}/4 | {QOS_PASS} | {QOS_CONCERNS} | {QOS_FAIL} | {QOS_STATUS} {QOS_ICON} |
| 8. Deployability | {DEP_MET}/3 | {DEP_PASS} | {DEP_CONCERNS} | {DEP_FAIL} | {DEP_STATUS} {DEP_ICON} |
| **Total** | **{TOTAL_MET}/29** | **{TOTAL_PASS}** | **{TOTAL_CONCERNS}** | **{TOTAL_FAIL}** | **{OVERALL_STATUS} {OVERALL_ICON}** |
**Criteria Met Scoring:**
- ≥26/29 (≈90%) = Strong foundation
- 20-25/29 (69-86%) = Room for improvement
- <20/29 (<69%) = Significant gaps
---
@@ -372,11 +383,16 @@ nfr_assessment:
date: '{DATE}'
story_id: '{STORY_ID}'
feature_name: '{FEATURE_NAME}'
adr_checklist_score: '{TOTAL_MET}/29' # ADR Quality Readiness Checklist
categories:
testability_automation: '{T_STATUS}'
test_data_strategy: '{TD_STATUS}'
scalability_availability: '{SA_STATUS}'
disaster_recovery: '{DR_STATUS}'
security: '{SEC_STATUS}'
monitorability: '{MON_STATUS}'
qos_qoe: '{QOS_STATUS}'
deployability: '{DEP_STATUS}'
overall_status: '{OVERALL_STATUS}'
critical_issues: { CRITICAL_COUNT }
high_priority_issues: { HIGH_COUNT }

View File

@@ -1,10 +1,17 @@
# Test Design and Risk Assessment - Validation Checklist
## Prerequisites (Mode-Dependent)
**System-Level Mode (Phase 3):**
- [ ] PRD exists with functional and non-functional requirements
- [ ] ADR (Architecture Decision Record) exists
- [ ] Architecture document available (architecture.md or tech-spec)
- [ ] Requirements are testable and unambiguous
**Epic-Level Mode (Phase 4):**
- [ ] Story markdown with clear acceptance criteria exists
- [ ] PRD or epic documentation available
- [ ] Architecture documents available (test-design-architecture.md + test-design-qa.md from Phase 3, if they exist)
- [ ] Requirements are testable and unambiguous
## Process Steps
@@ -157,6 +164,80 @@
- [ ] Risk assessment informs `gate` workflow criteria
- [ ] Integrates with `ci` workflow execution order
## System-Level Mode: Two-Document Validation
**When in system-level mode (PRD + ADR input), validate BOTH documents:**
### test-design-architecture.md
- [ ] **Purpose statement** at top (serves as contract with Architecture team)
- [ ] **Executive Summary** with scope, business context, architecture decisions, risk summary
- [ ] **Quick Guide** section with three tiers:
- [ ] 🚨 BLOCKERS - Team Must Decide (Sprint 0 critical path items)
  - [ ] ⚠️ HIGH PRIORITY - Team Should Validate (recommendations for approval)
- [ ] 📋 INFO ONLY - Solutions Provided (no decisions needed)
- [ ] **Risk Assessment** section
- [ ] Total risks identified count
  - [ ] High-priority risks table (score ≥6) with all columns: Risk ID, Category, Description, Probability, Impact, Score, Mitigation, Owner, Timeline
- [ ] Medium and low-priority risks tables
- [ ] Risk category legend included
- [ ] **Testability Concerns** section (if system has architectural constraints)
- [ ] Blockers to fast feedback table
- [ ] Explanation of why standard CI/CD may not apply (if applicable)
- [ ] Tiered testing strategy table (if forced by architecture)
- [ ] Architectural improvements needed (or acknowledgment system supports testing well)
- [ ] **Risk Mitigation Plans** for all high-priority risks (≥6)
- [ ] Each plan has: Strategy (numbered steps), Owner, Timeline, Status, Verification
- [ ] **Assumptions and Dependencies** section
- [ ] Assumptions list (numbered)
- [ ] Dependencies list with required dates
- [ ] Risks to plan with impact and contingency
- [ ] **NO test implementation code** (long examples belong in QA doc)
- [ ] **NO test scenario checklists** (belong in QA doc)
- [ ] **Cross-references to QA doc** where appropriate
### test-design-qa.md
- [ ] **Purpose statement** at top (execution recipe for QA team)
- [ ] **Quick Reference for QA** section
- [ ] Before You Start checklist
- [ ] Test Execution Order
- [ ] Need Help? guidance
- [ ] **System Architecture Summary** (brief overview of services and data flow)
- [ ] **Test Environment Requirements** appear early (sections 1-3, NOT buried at end)
- [ ] Table with Local/Dev/Staging environments
- [ ] Key principles listed (shared DB, randomization, parallel-safe, self-cleaning, shift-left)
- [ ] Code example provided
- [ ] **Testability Assessment** with prerequisites checklist
- [ ] References Architecture doc blockers (not duplication)
- [ ] **Test Levels Strategy** with unit/integration/E2E split
- [ ] System type identified
- [ ] Recommended split percentages with rationale
- [ ] Test count summary (P0/P1/P2/P3 totals)
- [ ] **Test Coverage Plan** with P0/P1/P2/P3 sections
- [ ] Each priority has: Execution details, Purpose, Criteria, Test Count
- [ ] Detailed test scenarios WITH CHECKBOXES
- [ ] Coverage table with columns: Requirement | Test Level | Risk Link | Test Count | Owner | Notes
- [ ] **Sprint 0 Setup Requirements**
- [ ] Architecture/Backend blockers listed with cross-references to Architecture doc
- [ ] QA Test Infrastructure section (factories, fixtures)
- [ ] Test Environments section (Local, CI/CD, Staging, Production)
- [ ] Sprint 0 NFR Gates checklist
- [ ] Sprint 1 Items clearly separated
- [ ] **NFR Readiness Summary** (reference to Architecture doc, not duplication)
- [ ] Table with NFR categories, status, evidence, blocker, next action
- [ ] **Cross-references to Architecture doc** (not duplication)
- [ ] **NO architectural theory** (just reference Architecture doc)
### Cross-Document Consistency
- [ ] Both documents reference same risks by ID (R-001, R-002, etc.)
- [ ] Both documents use consistent priority levels (P0, P1, P2, P3)
- [ ] Both documents reference same Sprint 0 blockers
- [ ] No duplicate content (cross-reference instead)
- [ ] Dates and authors match across documents
- [ ] ADR and PRD references consistent
## Completion Criteria
**All must be true:**
@@ -166,7 +247,9 @@
- [ ] All output validations passed
- [ ] All quality checks passed
- [ ] All integration points verified
- [ ] Output file(s) complete and well-formatted
- [ ] **System-level mode:** Both documents validated (if applicable)
- [ ] **Epic-level mode:** Single document validated (if applicable)
- [ ] Team review scheduled (if required)
## Post-Workflow Actions

View File

@@ -22,28 +22,61 @@ The workflow auto-detects which mode to use based on project phase.
**Critical:** Determine mode before proceeding.
### Mode Detection (Flexible for Standalone Use)
TEA test-design workflow supports TWO modes, detected automatically:
1. **Check User Intent Explicitly (Priority 1)**
- Did user provide PRD + ADR? → **System-Level Mode**
- Did user provide Epic + Stories? → **Epic-Level Mode**
- If user intent is clear, use that mode regardless of file structure
2. **Fallback to File-Based Detection (Priority 2 - BMad-Integrated)**
- Check for `{implementation_artifacts}/sprint-status.yaml`
   - If exists → **Epic-Level Mode** (Phase 4, single document output)
- If NOT exists → **System-Level Mode** (Phase 3, TWO document outputs)
3. **If Ambiguous, ASK USER (Priority 3)**
- "I see you have [PRD/ADR/Epic/Stories]. Should I create:
- (A) System-level test design (PRD + ADR → Architecture doc + QA doc)?
- (B) Epic-level test design (Epic → Single test plan)?"
**Mode Descriptions:**
**System-Level Mode (PRD + ADR Input)**
- **When to use:** Early in project (Phase 3 Solutioning), architecture being designed
- **Input:** PRD, ADR, architecture.md (optional)
- **Output:** TWO documents
- `test-design-architecture.md` (for Architecture/Dev teams)
- `test-design-qa.md` (for QA team)
- **Focus:** Testability assessment, ASRs, NFR requirements, Sprint 0 setup
**Epic-Level Mode (Epic + Stories Input)**
- **When to use:** During implementation (Phase 4), per-epic planning
- **Input:** Epic, Stories, tech-specs (optional)
- **Output:** ONE document
- `test-design-epic-{N}.md` (combined risk assessment + test plan)
- **Focus:** Risk assessment, coverage plan, execution order, quality gates
**Key Insight: TEA Works Standalone OR Integrated**
**Standalone (No BMad artifacts):**
- User provides PRD + ADR → System-Level Mode
- User provides Epic description → Epic-Level Mode
- TEA doesn't mandate full BMad workflow
**BMad-Integrated (Full workflow):**
- BMad creates `sprint-status.yaml` → Automatic Epic-Level detection
- BMad creates PRD, ADR, architecture.md → Automatic System-Level detection
- TEA leverages BMad artifacts for richer context
**Message to User:**
> You don't need to follow full BMad methodology to use TEA test-design.
> Just provide PRD + ADR for system-level, or Epic for epic-level.
> TEA will auto-detect and produce appropriate documents.
**Halt Condition:** If mode cannot be determined AND user intent unclear AND required files missing, HALT and notify user:
- "Please provide either: (A) PRD + ADR for system-level test design, OR (B) Epic + Stories for epic-level test design"
---
@@ -70,7 +103,7 @@ The workflow auto-detects which mode to use based on project phase.
3. **Load Knowledge Base Fragments (System-Level)**
**Critical:** Consult `{project-root}/_bmad/bmm/testarch/tea-index.csv` to load:
- `nfr-criteria.md` - NFR validation approach (security, performance, reliability, maintainability)
- `adr-quality-readiness-checklist.md` - 8-category 29-criteria NFR framework (testability, security, scalability, DR, QoS, deployability, etc.)
- `test-levels-framework.md` - Test levels strategy guidance
- `risk-governance.md` - Testability risk identification
- `test-quality.md` - Quality standards and Definition of Done
@@ -91,7 +124,7 @@ The workflow auto-detects which mode to use based on project phase.
2. **Load Architecture Context**
- Read architecture.md for system design
- Read tech-spec for implementation details
- Read test-design-architecture.md and test-design-qa.md (if exist from Phase 3 system-level test design)
- Identify technical constraints and dependencies
- Note integration points and external systems
@@ -173,50 +206,128 @@ The workflow auto-detects which mode to use based on project phase.
**Critical:** If testability concerns are blockers (e.g., "Architecture makes performance testing impossible"), document as CONCERNS or FAIL recommendation for gate check.
6. **Output System-Level Test Design (TWO Documents)**
**IMPORTANT:** System-level mode produces TWO documents instead of one:
**Document 1: test-design-architecture.md** (for Architecture/Dev teams)
- Purpose: Architectural concerns, testability gaps, NFR requirements
- Audience: Architects, Backend Devs, Frontend Devs, DevOps, Security Engineers
- Focus: What architecture must deliver for testability
- Template: `test-design-architecture-template.md`
**Document 2: test-design-qa.md** (for QA team)
- Purpose: Test execution recipe, coverage plan, Sprint 0 setup
- Audience: QA Engineers, Test Automation Engineers, QA Leads
- Focus: How QA will execute tests
- Template: `test-design-qa-template.md`
**Standard Structures (REQUIRED):**
**test-design-architecture.md sections (in this order):**
1. Executive Summary (scope, business context, architecture, risk summary)
2. Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
3. Risk Assessment (high/medium/low-priority risks with scoring)
4. Testability Concerns and Architectural Gaps (if system has constraints)
5. Risk Mitigation Plans (detailed for high-priority risks ≥6)
6. Assumptions and Dependencies
**test-design-qa.md sections (in this order):**
1. Quick Reference for QA (Before You Start, Execution Order, Need Help)
2. System Architecture Summary (brief overview)
3. Test Environment Requirements (placed early as section 3, NOT buried at end)
4. Testability Assessment (lightweight prerequisites checklist)
5. Test Levels Strategy (unit/integration/E2E split with rationale)
6. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
7. Sprint 0 Setup Requirements (blockers, infrastructure, environments)
8. NFR Readiness Summary (reference to Architecture doc)
**Content Guidelines:**
**Architecture doc (DO):**
- ✅ Risk scoring visible (Probability × Impact = Score)
- ✅ Clear ownership (each blocker/ASR has owner + timeline)
- ✅ Testability requirements (what architecture must support)
- ✅ Mitigation plans (for each high-risk item ≥6)
- ✅ Short code examples (5-10 lines max showing what to support)
**Architecture doc (DON'T):**
- ❌ NO long test code examples (belongs in QA doc)
- ❌ NO test scenario checklists (belongs in QA doc)
- ❌ NO implementation details (how QA will test)
**QA doc (DO):**
- ✅ Test scenario recipes (clear P0/P1/P2/P3 with checkboxes)
- ✅ Environment setup (Sprint 0 checklist with blockers)
- ✅ Tool setup (factories, fixtures, frameworks)
- ✅ Cross-references to Architecture doc (not duplication)
**QA doc (DON'T):**
- ❌ NO architectural theory (just reference Architecture doc)
- ❌ NO ASR explanations (link to Architecture doc instead)
- ❌ NO duplicate risk assessments (reference Architecture doc)
**Anti-Patterns to Avoid (Cross-Document Redundancy):**
**DON'T duplicate OAuth requirements:**
- Architecture doc: Explain OAuth 2.1 flow in detail
- QA doc: Re-explain why OAuth 2.1 is required
**DO cross-reference instead:**
- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
**Markdown Cross-Reference Syntax Examples:**
```markdown
# In test-design-architecture.md
### 🚨 R-001: Multi-Tenant Isolation (Score: 9)
**Test Coverage:** 8 P0 tests (see [QA doc - Multi-Tenant Isolation](test-design-qa.md#multi-tenant-isolation-8-tests---security-critical) for detailed scenarios)
---
# In test-design-qa.md
## Testability Assessment
**Prerequisites from Architecture Doc:**
- [ ] R-001: Multi-tenant isolation validated (see [Architecture doc R-001](test-design-architecture.md#-r-001-multi-tenant-isolation-score-9) for mitigation plan)
- [ ] R-002: Test customer provisioned (see [Architecture doc 🚨 BLOCKERS](test-design-architecture.md#-blockers---team-must-decide-cant-proceed-without))
## Sprint 0 Setup Requirements
**Source:** See [Architecture doc "Quick Guide"](test-design-architecture.md#quick-guide) for detailed mitigation plans
```
**Key Points:**
- Use relative links: `[Link Text](test-design-qa.md#section-anchor)`
- Anchor format: lowercase, hyphens for spaces, remove emojis/special chars
- Example anchor: `### 🚨 R-001: Title` → `#-r-001-title`
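If a helper is needed to build these anchors consistently, a hypothetical slugifier matching the rules above could look like this (not part of the workflow tooling):

```typescript
// Lowercase, strip emojis/punctuation, collapse spaces to hyphens, keep existing hyphens.
function headingToAnchor(heading: string): string {
  const text = heading.replace(/^#+\s*/, ''); // drop the markdown heading marker
  const cleaned = text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s-]/gu, '') // remove emojis and punctuation
    .replace(/\s+/g, '-');             // spaces -> hyphens
  return `#${cleaned}`;
}

// headingToAnchor('### 🚨 R-001: Multi-Tenant Isolation (Score: 9)')
// => '#-r-001-multi-tenant-isolation-score-9'
```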
❌ **DON'T put long code examples in Architecture doc:**
- Example: 50+ lines of test implementation
✅ **DO keep examples SHORT in Architecture doc:**
- Example: 5-10 lines max showing what architecture must support
- Full implementation goes in QA doc
❌ **DON'T repeat same note 10+ times:**
- Example: "Pessimistic timing until R-005 fixed" on every P0/P1/P2 section
✅ **DO consolidate repeated notes:**
- Single timing note at top
- Reference briefly throughout: "(pessimistic)"
**Write Both Documents:**
- Use `test-design-architecture-template.md` for Architecture doc
- Use `test-design-qa-template.md` for QA doc
- Follow standard structures defined above
- Cross-reference between docs (no duplication)
- Validate against checklist.md (System-Level Mode section)
**After System-Level Mode:** Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode.
---

View File

@@ -0,0 +1,216 @@
# Test Design for Architecture: {Feature Name}
**Purpose:** Architectural concerns, testability gaps, and NFR requirements for review by Architecture/Dev teams. Serves as a contract between QA and Engineering on what must be addressed before test development begins.
**Date:** {date}
**Author:** {author}
**Status:** Architecture Review Pending
**Project:** {project_name}
**PRD Reference:** {prd_link}
**ADR Reference:** {adr_link}
---
## Executive Summary
**Scope:** {Brief description of feature scope}
**Business Context** (from PRD):
- **Revenue/Impact:** {Business metrics if applicable}
- **Problem:** {Problem being solved}
- **GA Launch:** {Target date or timeline}
**Architecture** (from ADR {adr_number}):
- **Key Decision 1:** {e.g., OAuth 2.1 authentication}
- **Key Decision 2:** {e.g., Centralized MCP Server pattern}
- **Key Decision 3:** {e.g., Stack: TypeScript, SDK v1.x}
**Expected Scale** (from ADR):
- {RPS, volume, users, etc.}
**Risk Summary:**
- **Total risks**: {N}
- **High-priority (≥6)**: {N} risks requiring immediate mitigation
- **Test effort**: ~{N} tests (~{X} weeks for 1 QA, ~{Y} weeks for 2 QAs)
---
## Quick Guide
### 🚨 BLOCKERS - Team Must Decide (Can't Proceed Without)
**Sprint 0 Critical Path** - These MUST be completed before QA can write integration tests:
1. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
2. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
3. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
**What we need from team:** Complete these {N} items in Sprint 0 or test development is blocked.
---
### ⚠️ HIGH PRIORITY - Team Should Validate (We Provide Recommendation, You Approve)
1. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
2. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
3. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
**What we need from team:** Review recommendations and approve (or suggest changes).
---
### 📋 INFO ONLY - Solutions Provided (Review, No Decisions Needed)
1. **Test strategy**: {Test level split} ({Rationale})
2. **Tooling**: {Test frameworks and utilities}
3. **Tiered CI/CD**: {Execution tiers with timing}
4. **Coverage**: ~{N} test scenarios prioritized P0-P3 with risk-based classification
5. **Quality gates**: {Pass criteria}
**What we need from team:** Just review and acknowledge (we already have the solution).
---
## For Architects and Devs - Open Topics 👷
### Risk Assessment
**Total risks identified**: {N} ({X} high-priority score ≥6, {Y} medium, {Z} low)
#### High-Priority Risks (Score ≥6) - IMMEDIATE ATTENTION
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner | Timeline |
|---------|----------|-------------|-------------|--------|-------|------------|-------|----------|
| **{R-ID}** | **{CAT}** | {Description} | {1-3} | {1-3} | **{Score}** | {Mitigation strategy} | {Owner} | {Date} |
#### Medium-Priority Risks (Score 3-4)
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner |
|---------|----------|-------------|-------------|--------|-------|------------|-------|
| {R-ID} | {CAT} | {Description} | {1-3} | {1-3} | {Score} | {Mitigation} | {Owner} |
#### Low-Priority Risks (Score 1-2)
| Risk ID | Category | Description | Probability | Impact | Score | Action |
|---------|----------|-------------|-------------|--------|-------|--------|
| {R-ID} | {CAT} | {Description} | {1-3} | {1-3} | {Score} | Monitor |
#### Risk Category Legend
- **TECH**: Technical/Architecture (flaws, integration, scalability)
- **SEC**: Security (access controls, auth, data exposure)
- **PERF**: Performance (SLA violations, degradation, resource limits)
- **DATA**: Data Integrity (loss, corruption, inconsistency)
- **BUS**: Business Impact (UX harm, logic errors, revenue)
- **OPS**: Operations (deployment, config, monitoring)
---
### Testability Concerns and Architectural Gaps
**IMPORTANT**: {If system has constraints, explain them. If standard CI/CD achievable, state that.}
#### Blockers to Fast Feedback
| Blocker | Impact | Current Mitigation | Ideal Solution |
|---------|--------|-------------------|----------------|
| **{Blocker name}** | {Impact description} | {How we're working around it} | {What architecture should provide} |
#### Why This Matters
**Standard CI/CD expectations:**
- Full test suite on every commit (~5-15 min feedback)
- Parallel test execution (isolated test data per worker)
- Ephemeral test environments (spin up → test → tear down)
- Fast feedback loop (devs stay in flow state)
**Current reality for {Feature}:**
- {Actual situation - what's different from standard}
#### Tiered Testing Strategy
{If forced by architecture, explain. If standard approach works, state that.}
| Tier | When | Duration | Coverage | Why Not Full Suite? |
|------|------|----------|----------|---------------------|
| **Smoke** | Every commit | <5 min | {N} tests | Fast feedback, catch build-breaking changes |
| **P0** | Every commit | ~{X} min | ~{N} tests | Critical paths, security-critical flows |
| **P1** | PR to main | ~{X} min | ~{N} tests | Important features, algorithm accuracy |
| **P2/P3** | Nightly | ~{X} min | ~{N} tests | Edge cases, performance, NFR |
**Note**: {Any timing assumptions or constraints}
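One way to wire these tiers into CI is tag-based selection. A minimal sketch, assuming tests are tagged `@smoke`/`@p0`/`@p1`/`@p2` and Playwright is the runner (`TEST_TIER` is a hypothetical variable set by the pipeline; adapt to your framework):
```typescript
// playwright.config.ts -- illustrative tier selection via tags
import { defineConfig } from "@playwright/test";

export default defineConfig({
  grep: process.env.TEST_TIER
    ? new RegExp(`@${process.env.TEST_TIER}\\b`) // e.g. TEST_TIER=p0 runs only @p0-tagged tests
    : undefined,                                 // no tier set: run the full suite (nightly job)
  workers: process.env.CI ? 4 : undefined,       // parallel workers in CI
});
```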
#### Architectural Improvements Needed
{If system has technical debt affecting testing, list improvements. If architecture supports testing well, acknowledge that.}
1. **{Improvement name}**
- {What to change}
- **Impact**: {How it improves testing}
#### Acceptance of Trade-offs
For {Feature} Phase 1, the team accepts:
- **{Trade-off 1}** ({Reasoning})
- **{Trade-off 2}** ({Reasoning})
- **{Known limitation}** ({Why acceptable for now})
This is {**technical debt** OR **acceptable for Phase 1**} that should be {revisited post-GA OR maintained as-is}.
---
### Risk Mitigation Plans (High-Priority Risks ≥6)
**Purpose**: Detailed mitigation strategies for all {N} high-priority risks (score ≥6). These risks MUST be addressed before {GA launch date or milestone}.
#### {R-ID}: {Risk Description} (Score: {Score}) - {CRITICALITY LEVEL}
**Mitigation Strategy:**
1. {Step 1}
2. {Step 2}
3. {Step 3}
**Owner:** {Owner}
**Timeline:** {Sprint or date}
**Status:** Planned / In Progress / Complete
**Verification:** {How to verify mitigation is effective}
---
{Repeat for all high-priority risks}
---
### Assumptions and Dependencies
#### Assumptions
1. {Assumption about architecture or requirements}
2. {Assumption about team or timeline}
3. {Assumption about scope or constraints}
#### Dependencies
1. {Dependency} - Required by {date/sprint}
2. {Dependency} - Required by {date/sprint}
#### Risks to Plan
- **Risk**: {Risk to the test plan itself}
- **Impact**: {How it affects testing}
- **Contingency**: {Backup plan}
---
**End of Architecture Document**
**Next Steps for Architecture Team:**
1. Review Quick Guide (🚨/⚠️/📋) and prioritize blockers
2. Assign owners and timelines for high-priority risks (≥6)
3. Validate assumptions and dependencies
4. Provide feedback to QA on testability gaps
**Next Steps for QA Team:**
1. Wait for Sprint 0 blockers to be resolved
2. Refer to companion QA doc (test-design-qa.md) for test scenarios
3. Begin test infrastructure setup (factories, fixtures, environments)

View File

@@ -0,0 +1,315 @@
# Test Design for QA: {Feature Name}
**Purpose:** Test execution recipe for QA team. Defines test scenarios, coverage plan, tooling, and Sprint 0 setup requirements. Use this as your implementation guide after architectural blockers are resolved.
**Date:** {date}
**Author:** {author}
**Status:** Draft / Ready for Implementation
**Project:** {project_name}
**PRD Reference:** {prd_link}
**ADR Reference:** {adr_link}
---
## Quick Reference for QA
**Before You Start:**
- [ ] Review Architecture doc (test-design-architecture.md) - understand blockers and risks
- [ ] Verify Sprint 0 blockers resolved (see Sprint 0 section below)
- [ ] Confirm test infrastructure ready (factories, fixtures, environments)
**Test Execution Order:**
1. **Smoke tests** (<5 min) - Fast feedback on critical paths
2. **P0 tests** (~{X} min) - Critical paths, security-critical flows
3. **P1 tests** (~{X} min) - Important features, algorithm accuracy
4. **P2/P3 tests** (~{X} min) - Edge cases, performance, NFR
**Need Help?**
- Blockers: See Architecture doc "Quick Guide" for mitigation plans
- Test scenarios: See "Test Coverage Plan" section below
- Sprint 0 setup: See "Sprint 0 Setup Requirements" section
---
## System Architecture Summary
**Data Pipeline:**
{Brief description of system flow}
**Key Services:**
- **{Service 1}**: {Purpose and key responsibilities}
- **{Service 2}**: {Purpose and key responsibilities}
- **{Service 3}**: {Purpose and key responsibilities}
**Data Stores:**
- **{Database 1}**: {What it stores}
- **{Database 2}**: {What it stores}
**Expected Scale** (from ADR):
- {Key metrics: RPS, volume, users, etc.}
---
## Test Environment Requirements
**{Company} Standard:** Shared DB per Environment with Randomization (Shift-Left)
| Environment | Database | Test Data Strategy | Purpose |
|-------------|----------|-------------------|---------|
| **Local** | {DB} (shared) | Randomized (faker), auto-cleanup | Local development |
| **Dev (CI)** | {DB} (shared) | Randomized (faker), auto-cleanup | PR validation |
| **Staging** | {DB} (shared) | Randomized (faker), auto-cleanup | Pre-production, E2E |
**Key Principles:**
- **Shared database per environment** (no ephemeral)
- **Randomization for isolation** (faker-based unique IDs)
- **Parallel-safe** (concurrent test runs don't conflict)
- **Self-cleaning** (tests delete their own data)
- **Shift-left** (test against real DBs early)
**Example** (minimal sketch; `apiRequest` is assumed to be a custom fixture exported from the project's fixture file, path illustrative):
```typescript
import { faker } from "@faker-js/faker";
import { test } from "./fixtures"; // extends Playwright's test with the custom `apiRequest` helper

test("example with randomized test data @p0", async ({ apiRequest }) => {
  const testData = {
    id: `test-${faker.string.uuid()}`,
    customerId: `test-customer-${faker.string.alphanumeric(8)}`,
    // ... unique test data
  };
  // Seed the randomized data, exercise the endpoint via apiRequest, then clean up what this test created
});
```
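To make the self-cleaning principle concrete, seeding and cleanup can live in a fixture so teardown runs even when the test fails. A minimal sketch, assuming Playwright's built-in `request` context and a hypothetical `/api/customers` endpoint:
```typescript
// Self-cleaning fixture sketch -- endpoint and entity names are illustrative.
import { test as base } from "@playwright/test";
import { faker } from "@faker-js/faker";

export const test = base.extend<{ seededCustomer: { id: string } }>({
  seededCustomer: async ({ request }, use) => {
    const id = `test-customer-${faker.string.alphanumeric(8)}`;
    await request.post("/api/customers", { data: { id } }); // seed unique test data
    await use({ id });                                      // hand it to the test body
    await request.delete(`/api/customers/${id}`);           // cleanup runs even if the test failed
  },
});
```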
---
## Testability Assessment
**Prerequisites from Architecture Doc:**
Verify these blockers are resolved before test development:
- [ ] {Blocker 1} (see Architecture doc Quick Guide 🚨 BLOCKERS)
- [ ] {Blocker 2}
- [ ] {Blocker 3}
**If Prerequisites Not Met:** Coordinate with Architecture team (see Architecture doc for mitigation plans and owner assignments)
---
## Test Levels Strategy
**System Type:** {API-heavy / UI-heavy / Mixed backend system}
**Recommended Split:**
- **Unit Tests: {X}%** - {What to unit test}
- **Integration/API Tests: {X}%** - **PRIMARY FOCUS** - {What to integration test}
- **E2E Tests: {X}%** - {What to E2E test}
**Rationale:** {Why this split makes sense for this system}
**Test Count Summary:**
- P0: ~{N} tests - Critical paths, run on every commit
- P1: ~{N} tests - Important features, run on PR to main
- P2: ~{N} tests - Edge cases, run nightly/weekly
- P3: ~{N} tests - Exploratory, run on-demand
- **Total: ~{N} tests** (~{X} weeks for 1 QA, ~{Y} weeks for 2 QAs)
---
## Test Coverage Plan
**Repository Note:** {Where tests live - backend repo, admin panel repo, etc. - and how CI pipelines are organized}
### P0 (Critical) - Run on every commit (~{X} min)
**Execution:** CI/CD on every commit, parallel workers, smoke tests first (<5 min)
**Purpose:** Critical path validation - catch build-breaking changes and security violations immediately
**Criteria:** Blocks core functionality OR High risk (≥6) OR No workaround
**Key Smoke Tests** (subset of P0, run first for fast feedback):
- {Smoke test 1} - {Duration}
- {Smoke test 2} - {Duration}
- {Smoke test 3} - {Duration}
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|-------------|------------|-----------|------------|-------|-------|
| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
**Total P0:** ~{N} tests (~{X} weeks)
#### P0 Test Scenarios (Detailed)
**1. {Test Category} ({N} tests) - {CRITICALITY if applicable}**
- [ ] {Scenario 1 with checkbox}
- [ ] {Scenario 2}
- [ ] {Scenario 3}
**2. {Test Category 2} ({N} tests)**
- [ ] {Scenario 1}
- [ ] {Scenario 2}
{Continue for all P0 categories}
---
### P1 (High) - Run on PR to main (~{X} min additional)
**Execution:** CI/CD on pull requests to main branch, runs after P0 passes, parallel workers
**Purpose:** Important feature coverage - algorithm accuracy, complex workflows, Admin Panel interactions
**Criteria:** Important features OR Medium risk (3-4) OR Common workflows
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|-------------|------------|-----------|------------|-------|-------|
| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
**Total P1:** ~{N} tests (~{X} weeks)
#### P1 Test Scenarios (Detailed)
**1. {Test Category} ({N} tests)**
- [ ] {Scenario 1}
- [ ] {Scenario 2}
{Continue for all P1 categories}
---
### P2 (Medium) - Run nightly/weekly (~{X} min)
**Execution:** Scheduled nightly run (or weekly for P3), full infrastructure, sequential execution acceptable
**Purpose:** Edge case coverage, error handling, data integrity validation - slow feedback acceptable
**Criteria:** Secondary features OR Low risk (1-2) OR Edge cases
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|-------------|------------|-----------|------------|-------|-------|
| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
**Total P2:** ~{N} tests (~{X} weeks)
---
### P3 (Low) - Run on-demand (exploratory)
**Execution:** Manual trigger or weekly scheduled run, performance testing
**Purpose:** Full regression, performance benchmarks, accessibility validation - no time pressure
**Criteria:** Nice-to-have OR Exploratory OR Performance benchmarks
| Requirement | Test Level | Test Count | Owner | Notes |
|-------------|------------|------------|-------|-------|
| {Requirement 1} | {Level} | {N} | QA | {Notes} |
| {Requirement 2} | {Level} | {N} | QA | {Notes} |
**Total P3:** ~{N} tests (~{X} days)
---
### Coverage Matrix (Requirements → Tests)
| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
|-------------|------------|----------|-----------|------------|-------|
| {Requirement 1} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} |
| {Requirement 2} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} |
---
## Sprint 0 Setup Requirements
**IMPORTANT:** These items **BLOCK test development**. Complete in Sprint 0 before QA can write tests.
### Architecture/Backend Blockers (from Architecture doc)
**Source:** See Architecture doc "Quick Guide" for detailed mitigation plans
1. **{Blocker 1}** 🚨 **BLOCKER** - {Owner}
- {What needs to be provided}
- **Details:** Architecture doc {Risk-ID} mitigation plan
2. **{Blocker 2}** 🚨 **BLOCKER** - {Owner}
- {What needs to be provided}
- **Details:** Architecture doc {Risk-ID} mitigation plan
### QA Test Infrastructure
1. **{Factory/Fixture Name}** - QA
- Faker-based generator: `{function_signature}`
- Auto-cleanup after tests
2. **{Entity} Fixtures** - QA
- Seed scripts for {states/scenarios}
- Isolated {id_pattern} per test
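A minimal sketch of what a faker-based factory for one of the items above might look like (entity and field names are placeholders, not part of the workflow output):
```typescript
// Illustrative factory -- adapt entity and fields to your domain.
import { faker } from "@faker-js/faker";

export interface TestCustomer {
  id: string;
  name: string;
  email: string;
}

export function createTestCustomer(overrides: Partial<TestCustomer> = {}): TestCustomer {
  return {
    id: `test-${faker.string.uuid()}`, // "test-" prefix keeps cleanup queries simple
    name: faker.person.fullName(),
    email: faker.internet.email(),
    ...overrides,
  };
}
```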
### Test Environments
**Local:** {Setup details - Docker, LocalStack, etc.}
**CI/CD:** {Setup details - shared infrastructure, parallel workers, artifacts}
**Staging:** {Setup details - shared multi-tenant, nightly E2E}
**Production:** {Setup details - feature flags, canary transactions}
**Sprint 0 NFR Gates** (MUST complete before integration testing):
- [ ] {Gate 1}: {Description} (Owner) 🚨
- [ ] {Gate 2}: {Description} (Owner) 🚨
- [ ] {Gate 3}: {Description} (Owner) 🚨
### Sprint 1 Items (Not Sprint 0)
- **{Item 1}** ({Owner}): {Description}
- **{Item 2}** ({Owner}): {Description}
**Sprint 1 NFR Gates** (MUST complete before GA):
- [ ] {Gate 1}: {Description} (Owner)
- [ ] {Gate 2}: {Description} (Owner)
---
## NFR Readiness Summary
**Based on Architecture Doc Risk Assessment**
| NFR Category | Status | Evidence Status | Blocker | Next Action |
|--------------|--------|-----------------|---------|-------------|
| **Security** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Performance** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Reliability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Data Integrity** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Scalability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Disaster Recovery** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Monitorability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Deployability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Maintainability** | PASS | Test design complete (~{N} scenarios) | None | Proceed with implementation |
**Total:** {N} PASS, {N} CONCERNS across {N} categories
---
**End of QA Document**
**Next Steps for QA Team:**
1. Verify Sprint 0 blockers resolved (coordinate with Architecture team if not)
2. Set up test infrastructure (factories, fixtures, environments)
3. Begin test implementation following priority order (P0 → P1 → P2 → P3)
4. Run smoke tests first for fast feedback
5. Track progress using test scenario checklists above
**Next Steps for Architecture Team:**
1. Monitor Sprint 0 blocker resolution
2. Provide support for QA infrastructure setup if needed
3. Review test results and address any newly discovered testability gaps

View File

@@ -15,6 +15,9 @@ date: system-generated
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/test-design"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
# Note: Template selection is mode-based (see instructions.md Step 1.5):
# - System-level: test-design-architecture-template.md + test-design-qa-template.md
# - Epic-level: test-design-template.md (unchanged)
template: "{installed_path}/test-design-template.md"
# Variables and inputs
@@ -26,13 +29,25 @@ variables:
# Note: Actual output file determined dynamically based on mode detection
# Declared outputs for new workflow format
outputs:
- id: system-level
description: "System-level testability review (Phase 3)"
path: "{output_folder}/test-design-system.md"
# System-Level Mode (Phase 3) - TWO documents
- id: test-design-architecture
description: "System-level test architecture: Architectural concerns, testability gaps, NFR requirements for Architecture/Dev teams"
path: "{output_folder}/test-design-architecture.md"
mode: system-level
audience: architecture
- id: test-design-qa
description: "System-level test design: Test execution recipe, coverage plan, Sprint 0 setup for QA team"
path: "{output_folder}/test-design-qa.md"
mode: system-level
audience: qa
# Epic-Level Mode (Phase 4) - ONE document (unchanged)
- id: epic-level
description: "Epic-level test plan (Phase 4)"
path: "{output_folder}/test-design-epic-{epic_num}.md"
default_output_file: "{output_folder}/test-design-epic-{epic_num}.md"
mode: epic-level
# Note: No default_output_file - mode detection determines which outputs to write
# Required tools
required_tools: