| 1 |
command |
title |
when_to_use |
preflight |
flow_cues |
deliverables |
halt_rules |
notes |
knowledge_tags |
| 2 |
command: *framework
title: Initialize test architecture
when_to_use: Run once per repo or when no production-ready harness exists
preflight: package.json present|no existing E2E framework detected|architectural context available
flow_cues: Identify stack from package.json (React/Vue/Angular/Next.js); detect bundler (Vite/Webpack/Rollup/esbuild); match test language to source (JS/TS frontend -> JS/TS tests); choose Playwright for large or performance-critical repos, Cypress for small DX-first teams; create {framework}/tests/ and {framework}/support/fixtures/ and {framework}/support/helpers/; configure config files with timeouts (action 15s, navigation 30s, test 60s) and reporters (HTML + JUnit) (config sketch below); create .env.example with TEST_ENV, BASE_URL, API_URL; implement pure function -> fixture -> mergeTests pattern and faker-based data factories; enable failure-only screenshots/videos and ensure .nvmrc is recorded
deliverables: playwright/ or cypress/ folder with config + support tree; .env.example; .nvmrc; example tests; README with setup instructions
halt_rules: If package.json missing OR framework already configured, halt and instruct manual review
notes: Playwright: worker parallelism, trace viewer, multi-language support; Cypress: avoid if many dependent API calls; Component testing: Vitest (large) or Cypress CT (small); Contract testing: Pact for microservices; always use data-cy/data-testid selectors
knowledge_tags: philosophy/core|patterns/fixtures|patterns/selectors
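
Config sketch for the *framework flow above: a minimal playwright.config.ts that applies the timeouts, reporters, and failure-only artifacts named in the flow cues, assuming Playwright is the chosen framework (the testDir path and results/ output file are illustrative):

```ts
// playwright.config.ts: illustrative sketch only; values mirror the flow cues above
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './playwright/tests',
  timeout: 60_000,                       // per-test budget (test 60s)
  use: {
    baseURL: process.env.BASE_URL,       // populated from .env (TEST_ENV, BASE_URL, API_URL)
    actionTimeout: 15_000,               // action 15s
    navigationTimeout: 30_000,           // navigation 30s
    screenshot: 'only-on-failure',       // failure-only artifacts
    video: 'retain-on-failure',
    trace: 'retain-on-failure',
  },
  reporter: [['html'], ['junit', { outputFile: 'results/junit.xml' }]],
});
```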
| 3 |
command: *tdd
title: Acceptance Test Driven Development
when_to_use: Before implementation when team commits to TDD
preflight: story approved with acceptance criteria|dev sandbox ready|framework scaffolding in place
flow_cues: Clarify acceptance criteria and affected systems; pick appropriate test level (E2E/API/Component); write failing acceptance tests using Given-When-Then with network interception first then navigation (example below); create data factories and fixture stubs for required entities; outline mocks/fixtures infrastructure the dev team must supply; generate component tests for critical UI logic; compile implementation checklist mapping each test to source work; share failing tests with dev agent and maintain red -> green -> refactor loop
deliverables: Failing acceptance test files; component test stubs; fixture/mocks skeleton; implementation checklist with test-to-code mapping; documented data-testid requirements
halt_rules: If criteria ambiguous or framework missing, halt for clarification
notes: Start red; one assertion per test; use beforeEach for visible setup (no shared state); remind devs to run tests before writing production code; update checklist as each test goes green
knowledge_tags: philosophy/core|patterns/test-structure
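
Example for the *tdd flow above: a red-first acceptance test sketch in Playwright style, with interception set up before navigation and a single assertion; the route, selector, factory, and expected total are hypothetical:

```ts
// checkout-discount.spec.ts: illustrative ATDD sketch; names, routes, and values are hypothetical
import { test, expect } from '@playwright/test';
import { createOrder } from '../support/fixtures/factories'; // assumed faker-based factory

test.describe('checkout discount', () => {
  test.beforeEach(async ({ page }) => {
    // Given: the API returns a discounted order (interception set up BEFORE navigation)
    const order = createOrder({ discountPercent: 10 });
    await page.route('**/api/orders/current', (route) => route.fulfill({ json: order }));
    await page.goto('/checkout');
  });

  test('applies the discount to the displayed total', async ({ page }) => {
    // When the shopper reviews the order, Then the total reflects the discount
    // (one assertion; stays red until the feature is implemented)
    await expect(page.getByTestId('order-total')).toHaveText('$90.00');
  });
});
```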
| 4 |
command: *automate
title: Automation expansion
when_to_use: After implementation or when reforging coverage
preflight: all acceptance criteria satisfied|code builds locally|framework configured
flow_cues: Review story source/diff to confirm automation target; ensure fixture architecture exists (mergeTests for Playwright, commands for Cypress; composition sketch below) and implement apiRequest/network/auth/log fixtures if missing; map acceptance criteria with test-levels-framework.md guidance and avoid duplicate coverage; assign priorities using test-priorities-matrix.md; generate unit/integration/E2E specs with naming convention feature-name.spec.ts, covering happy, negative, and edge paths; enforce deterministic waits, self-cleaning factories, and <=1.5 minute execution per test; run suite and capture Definition of Done results; update package.json scripts and README instructions
deliverables: New or enhanced spec files grouped by level; fixture modules under support/; data factory utilities; updated package.json scripts and README notes; DoD summary with remaining gaps; gate-ready coverage summary
halt_rules: If automation target unclear or framework missing, halt and request clarification
notes: Never create page objects; keep tests <300 lines and stateless; forbid hard waits and conditional flow in tests; co-locate tests near source; flag flaky patterns immediately
knowledge_tags: philosophy/core|patterns/helpers|patterns/waits|patterns/dod
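
Composition sketch for the *automate flow above, assuming Playwright's mergeTests; the apiRequest and log fixture bodies and the factory fields are illustrative stand-ins for the real apiRequest/network/auth/log fixtures:

```ts
// support/fixtures/index.ts: illustrative pure function -> fixture -> mergeTests sketch
import { test as base, mergeTests, type APIRequestContext } from '@playwright/test';
import { faker } from '@faker-js/faker';

// Pure function first: data factories stay plain so any fixture or spec can reuse them
export const createUser = (overrides: Partial<{ email: string; name: string }> = {}) => ({
  email: faker.internet.email(),
  name: faker.person.fullName(),
  ...overrides,
});

const apiTest = base.extend<{ apiRequest: APIRequestContext }>({
  apiRequest: async ({ request }, use) => {
    await use(request); // thin wrapper; a real version adds auth headers and self-cleanup
  },
});

const logTest = base.extend<{ log: (msg: string) => void }>({
  log: async ({}, use) => {
    await use((msg) => console.log(`[test] ${msg}`));
  },
});

// Specs import { test, expect } from here instead of '@playwright/test'
export const test = mergeTests(apiTest, logTest);
export { expect } from '@playwright/test';
```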
| 5 |
command: *ci
title: CI/CD quality pipeline
when_to_use: Once automation suite exists or needs optimization
preflight: git repository initialized|tests pass locally|team agrees on target environments|access to CI platform settings
flow_cues: Detect CI platform (default GitHub Actions, ask if GitLab/CircleCI/etc); scaffold workflow (.github/workflows/test.yml or platform equivalent) with triggers; set Node.js version from .nvmrc and cache node_modules + browsers; stage jobs lint -> unit -> component -> e2e with matrix parallelization (shard by file, not test); add selective execution script for affected tests; create burn-in job that reruns changed specs 3x to catch flakiness (sketch below); attach artifacts on failure (traces/videos/HAR); configure retries/backoff and concurrency controls; document required secrets and environment variables; add Slack/email notifications and local script mirroring CI
deliverables: .github/workflows/test.yml (or platform equivalent); scripts/test-changed.sh; scripts/burn-in-changed.sh; updated README/ci.md instructions; secrets checklist; dashboard or badge configuration
halt_rules: If git repo absent, test framework missing, or CI platform unspecified, halt and request setup
notes: Target 20x speedups via parallel shards + caching; shard by file; keep jobs under 10 minutes; wait-on-timeout 120s for app startup; ensure npm test locally matches CI run; mention alternative platform paths when not on GitHub
knowledge_tags: philosophy/core|ci-strategy
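
Burn-in sketch for the *ci flow above. The deliverable names scripts/burn-in-changed.sh; this Node/TypeScript version shows the same idea using Playwright's --repeat-each flag (the diff base ref and spec filename pattern are assumptions):

```ts
// burn-in-changed.ts: illustrative sketch of the burn-in job; the real deliverable is a shell script
import { execSync } from 'node:child_process';

// Specs changed relative to the default branch; the base ref is an assumption
const changed = execSync('git diff --name-only origin/main...HEAD', { encoding: 'utf8' })
  .split('\n')
  .filter((file) => file.endsWith('.spec.ts'));

if (changed.length === 0) {
  console.log('No changed specs; skipping burn-in.');
  process.exit(0);
}

// Rerun only the changed specs three times to surface flakiness before merge
execSync(`npx playwright test ${changed.join(' ')} --repeat-each=3`, { stdio: 'inherit' });
```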
| 6 |
command: *risk-profile
title: Risk profile analysis
when_to_use: After story approval, before development
preflight: story markdown present|acceptance criteria clear|architecture/PRD accessible
flow_cues: Filter requirements so only genuine risks remain; review PRD/architecture/story for unresolved gaps; classify risks across TECH, SEC, PERF, DATA, BUS, OPS with category definitions; request clarification when evidence missing; score probability (1 unlikely, 2 possible, 3 likely) and impact (1 minor, 2 degraded, 3 critical) then compute totals (scoring sketch below); highlight risks >=6 and plan mitigations with owners and timelines; prepare gate summary with residual risk
deliverables: Risk assessment markdown in docs/qa/assessments; table of category/probability/impact/score; gate YAML snippet summarizing totals; mitigation matrix with owners and due dates
halt_rules: If story missing or criteria unclear, halt for clarification
notes: Category definitions: TECH=unmitigated architecture flaws, SEC=missing controls/vulnerabilities, PERF=SLA-breaking performance, DATA=loss/corruption scenarios, BUS=user/business harm, OPS=deployment/run failures; rely on evidence, not speculation; score 9 -> FAIL, 6-8 -> CONCERNS; most stories should have 0-1 high risks
knowledge_tags: philosophy/core|risk-model
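
Scoring sketch for the *risk-profile flow above, assuming the usual probability x impact product (the 1-9 totals and the 9 / 6-8 thresholds in the notes only work out with multiplication):

```ts
// risk-score.ts: sketch of the scoring rule; the type names are illustrative
type Probability = 1 | 2 | 3; // 1 unlikely, 2 possible, 3 likely
type Impact = 1 | 2 | 3;      // 1 minor, 2 degraded, 3 critical

export const riskScore = (probability: Probability, impact: Impact): number =>
  probability * impact; // totals range 1..9

export function gateSignal(score: number): 'FAIL' | 'CONCERNS' | 'OK' {
  if (score === 9) return 'FAIL';    // score 9 -> FAIL
  if (score >= 6) return 'CONCERNS'; // 6-8 -> CONCERNS; also the mitigation-planning threshold
  return 'OK';
}

// Example: a likely (3) SLA-breaking (3) performance risk scores 9 -> FAIL
```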
| 7 |
command: *test-design
title: Test design playbook
when_to_use: After risk profile, before coding
preflight: risk assessment completed|story acceptance criteria available
flow_cues: Break acceptance criteria into atomic scenarios; reference test-levels-framework.md to pick unit/integration/E2E/component levels; avoid duplicate coverage and prefer lower levels when possible; assign priorities using test-priorities-matrix.md (P0 revenue/security, P1 core journeys, P2 secondary, P3 nice-to-have); map scenarios to risk mitigations and required data/tooling; follow naming {epic}.{story}-LEVEL-SEQ and plan execution order (scenario example below)
deliverables: Test-design markdown saved to docs/qa/assessments; scenario table with requirement/level/priority/mitigation; gate YAML block summarizing scenario counts and coverage; recommended execution order
halt_rules: If risk profile missing or acceptance criteria unclear, request it and halt
notes: Shift left: unit first, escalate only when needed; tie scenarios back to risk mitigations; keep scenarios independent and maintainable
knowledge_tags: philosophy/core|patterns/test-structure
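
Scenario example for the *test-design flow above: a sketch of the scenario record and the {epic}.{story}-LEVEL-SEQ naming; the epic/story numbers, criteria, and risk id are hypothetical:

```ts
// test-design scenario shape: illustrative; fields mirror the scenario table columns
type Level = 'UNIT' | 'INT' | 'E2E' | 'COMPONENT';
type Priority = 'P0' | 'P1' | 'P2' | 'P3'; // P0 revenue/security ... P3 nice-to-have

interface Scenario {
  id: string;          // {epic}.{story}-LEVEL-SEQ
  requirement: string; // acceptance criterion covered
  level: Level;
  priority: Priority;
  mitigates?: string;  // risk id from the risk profile, if any
}

// Hypothetical entries for epic 2, story 4
const scenarios: Scenario[] = [
  { id: '2.4-UNIT-001', requirement: 'discount is calculated correctly', level: 'UNIT', priority: 'P0', mitigates: 'BUS-001' },
  { id: '2.4-E2E-001', requirement: 'shopper sees discounted total at checkout', level: 'E2E', priority: 'P1' },
];
```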
| 8 |
command: *trace
title: Requirements traceability
when_to_use: Mid-development checkpoint or before review
preflight: tests exist for story|access to source + specs
flow_cues: Gather acceptance criteria and implemented tests; map each criterion to concrete tests (file + describe/it) using Given-When-Then narrative; classify coverage status as FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY (example entry below); flag severity based on priority (P0 gaps critical); recommend additional tests or refactors; generate gate YAML coverage summary
deliverables: Traceability report saved under docs/qa/assessments; coverage matrix with status per criterion; gate YAML snippet for coverage totals and gaps
halt_rules: If story lacks implemented tests, pause and advise running *tdd or writing tests
notes: Definitions: FULL=all scenarios validated, PARTIAL=some coverage exists, NONE=no validation, UNIT-ONLY=missing higher-level coverage, INTEGRATION-ONLY=lacks lower-level confidence; ensure assertions are explicit and avoid duplicate coverage
knowledge_tags: philosophy/core|patterns/assertions
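
Example entry for the *trace flow above: one coverage-matrix record using the statuses defined in the notes; the criterion and file names are hypothetical:

```ts
// traceability entry shape: illustrative; statuses come from the definitions above
type Coverage = 'FULL' | 'PARTIAL' | 'NONE' | 'UNIT-ONLY' | 'INTEGRATION-ONLY';

interface TraceEntry {
  criterion: string;                       // acceptance criterion
  tests: { file: string; name: string }[]; // concrete test mapping (file + describe/it)
  coverage: Coverage;
  severity?: 'critical' | 'minor';         // P0 gaps are critical
}

// Hypothetical record
const matrix: TraceEntry[] = [
  {
    criterion: 'AC2: discounted total is shown at checkout',
    tests: [{ file: 'checkout-discount.spec.ts', name: 'applies the discount to the displayed total' }],
    coverage: 'FULL',
  },
];
```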
| 9 |
command: *nfr-assess
title: NFR validation
when_to_use: Late development or pre-review for critical stories
preflight: implementation deployed locally|non-functional goals defined or discoverable
flow_cues: Ask which NFRs to assess; default to core four (security, performance, reliability, maintainability); gather thresholds from story/architecture/technical-preferences and mark unknown targets; inspect evidence (tests, telemetry, logs) for each NFR; classify status using deterministic pass/concerns/fail rules (rule sketch below) and list quick wins; produce gate block and assessment doc with recommended actions
deliverables: NFR assessment markdown with findings; gate YAML block capturing statuses and notes; checklist of evidence gaps and follow-up owners
halt_rules: If NFR targets undefined and no guidance available, request definition and halt
notes: Unknown thresholds -> CONCERNS, never guess; ensure each NFR has evidence or call it out; suggest monitoring hooks and fail-fast mechanisms when gaps exist
knowledge_tags: philosophy/core|nfr
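
Rule sketch for the *nfr-assess flow above: one way the deterministic pass/concerns/fail classification could look, following the notes (unknown threshold or missing evidence -> CONCERNS); the input shape and the FAIL condition are assumptions:

```ts
// nfr-status.ts: illustrative sketch; the evidence shape is an assumption
type NfrStatus = 'PASS' | 'CONCERNS' | 'FAIL';

interface NfrEvidence {
  thresholdDefined: boolean; // target found in story/architecture/technical-preferences?
  evidenceFound: boolean;    // tests, telemetry, or logs that exercise this NFR
  meetsThreshold?: boolean;  // only meaningful when threshold and evidence both exist
}

export function classify(nfr: NfrEvidence): NfrStatus {
  if (!nfr.thresholdDefined) return 'CONCERNS'; // unknown thresholds -> CONCERNS, never guess
  if (!nfr.evidenceFound) return 'CONCERNS';    // missing evidence is called out, not assumed
  return nfr.meetsThreshold ? 'PASS' : 'FAIL';
}
```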
| 10 |
command: *gate
title: Quality gate decision
when_to_use: After review or mitigation updates
preflight: latest assessments gathered|team consensus on fixes
flow_cues: Assemble story metadata (id, title); choose gate status using deterministic rules (PASS = all critical issues resolved, CONCERNS = minor residual risk, FAIL = critical blockers, WAIVED = approved by business); update the YAML schema with sections metadata, waiver status, top_issues, risk_summary totals, recommendations (must_fix, monitor), nfr_validation statuses, history (field shape sketched below); capture rationale, owners, due dates, and a summary comment back to the story
deliverables: docs/qa/gates/{story}.yml updated with schema fields (schema, story, story_title, gate, status_reason, reviewer, updated, waiver, top_issues, risk_summary, recommendations, nfr_validation, history); summary message for team
halt_rules: If review incomplete or risk data outdated, halt and request rerun
notes: FAIL whenever unresolved P0 risks/tests or security holes remain; CONCERNS when mitigations planned but residual risk exists; WAIVED requires reason, approver, and expiry; maintain audit trail in history
knowledge_tags: philosophy/core|risk-model
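
Field-shape sketch for the *gate deliverable above: the gate file's top-level keys rendered as a TypeScript interface for readability; the key names come from the deliverables list, while the nested shapes (waiver, top_issues, history entries) are assumptions:

```ts
// docs/qa/gates/{story}.yml sketched as a type; nested detail is illustrative
type GateStatus = 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';

interface QualityGate {
  schema: number;
  story: string;
  story_title: string;
  gate: GateStatus;
  status_reason: string;
  reviewer: string;
  updated: string; // ISO timestamp
  waiver: { active: boolean; reason?: string; approver?: string; expires?: string };
  top_issues: { severity: string; finding: string; suggested_action: string }[];
  risk_summary: { totals: Record<string, number> };
  recommendations: { must_fix: string[]; monitor: string[] };
  nfr_validation: Record<string, { status: GateStatus; notes?: string }>;
  history: { at: string; gate: GateStatus; note?: string }[];
}
```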