From a6be9bea316508b3e609db0cfb8c788344875b2c Mon Sep 17 00:00:00 2001
From: "den (work)" <53200638+localden@users.noreply.github.com>
Date: Sun, 5 Oct 2025 22:55:34 -0700
Subject: [PATCH] Update checklist.md

---
 templates/commands/checklist.md | 264 +++++++++++++++++++++-----------
 1 file changed, 175 insertions(+), 89 deletions(-)

diff --git a/templates/commands/checklist.md b/templates/commands/checklist.md
index acce2e06..b9290907 100644
--- a/templates/commands/checklist.md
+++ b/templates/commands/checklist.md
@@ -5,23 +5,24 @@ scripts:
   ps: scripts/powershell/check-prerequisites.ps1 -Json
 ---

-## Checklist Purpose
+## Checklist Purpose: "Unit Tests for English"

-**CRITICAL CLARIFICATION**: Checklists generated by this command are for **requirements validation**, NOT:
-- ❌ Verifying code execution or functionality
-- ❌ Testing whether code matches the specification
-- ❌ Checking implementation correctness
-- ❌ Code review or quality assurance
+**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain.

-**What checklists ARE for**:
-- ✅ Ensuring requirements are clearly captured and complete
-- ✅ Identifying ambiguities in specifications or plans
-- ✅ Verifying proper scenario coverage across the spec and plan
-- ✅ Confirming acceptance criteria are well-defined and measurable
-- ✅ Detecting gaps, conflicts, or missing edge cases in requirements
-- ✅ Validating that the problem domain is properly understood before implementation
+**NOT for verification/testing**:
+- ❌ NOT "Verify the button clicks correctly"
+- ❌ NOT "Test error handling works"
+- ❌ NOT "Confirm the API returns 200"
+- ❌ NOT checking if code/implementation matches the spec

-Think of checklists as a **pre-implementation review** to ensure the spec and plan are solid, not a post-implementation verification tool.
+**FOR requirements quality validation**:
+- ✅ "Are visual hierarchy requirements defined for all card types?" (completeness)
+- ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity)
+- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
+- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
+- ✅ "Does the spec define what happens when the logo image fails to load?" (edge cases)
+
+**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.

 ## User Input

@@ -85,69 +86,122 @@ You **MUST** consider the user input before proceeding (if not empty).
   - Use progressive disclosure: add follow-on retrieval only if gaps detected
   - If source docs are large, generate interim summary items instead of embedding raw text

-5. **Generate checklist**:
+5. **Generate checklist** - Create "Unit Tests for Requirements":
   - Create `FEATURE_DIR/checklists/` directory if it doesn't exist
   - Generate unique checklist filename:
-     - Use short, descriptive name based on checklist type
-     - Format: `[type].md` (e.g., `ux.md`, `test.md`, `security.md`, `deploy.md`)
-     - If file exists, append to existing file (e.g., use the same UX checklist)
+     - Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`)
+     - Format: `[domain].md`
+     - If a checklist for the domain already exists, append to it
   - Number items sequentially starting from CHK001
   - Each `/checklist` run creates a NEW file unless appending to an existing domain checklist (never overwrites existing checklists)

-   **Category Structure** - Group items ONLY using this controlled set:
-   - Primary Flows
-   - Alternate Flows
-   - Exception / Error Flows
-   - Recovery & Resilience
-   - Non-Functional Domains (sub-grouped or prefixed: Performance, Reliability, Security & Privacy, Accessibility, Observability, Scalability, Data Lifecycle)
-   - Traceability & Coverage
-   - Ambiguities & Conflicts
-   - Assumptions & Dependencies
+   **CORE PRINCIPLE - Test the Requirements, Not the Implementation**:
+   Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for:
+   - **Completeness**: Are all necessary requirements present?
+   - **Clarity**: Are requirements unambiguous and specific?
+   - **Consistency**: Do requirements align with each other?
+   - **Measurability**: Can requirements be objectively verified?
+   - **Coverage**: Are all scenarios/edge cases addressed?

-   Do NOT invent ad-hoc categories; merge sparse categories (<2 items) into the closest higher-signal category.
+   **Category Structure** - Group items by requirement quality dimensions:
+   - **Requirement Completeness** (Are all necessary requirements documented?)
+   - **Requirement Clarity** (Are requirements specific and unambiguous?)
+   - **Requirement Consistency** (Do requirements align without conflicts?)
+   - **Acceptance Criteria Quality** (Are success criteria measurable?)
+   - **Scenario Coverage** (Are all flows/cases addressed?)
+   - **Edge Case Coverage** (Are boundary conditions defined?)
+   - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?)
+   - **Dependencies & Assumptions** (Are they documented and validated?)
+   - **Ambiguities & Conflicts** (What needs clarification?)
+
+   **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**:
+
+   ❌ **WRONG** (Testing implementation):
+   - "Verify landing page displays 3 episode cards"
+   - "Test hover states work on desktop"
+   - "Confirm logo click navigates home"
+
+   ✅ **CORRECT** (Testing requirements quality):
+   - "Are the exact number and layout of featured episodes specified?" [Completeness]
+   - "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity]
+   - "Are hover state requirements consistent across all interactive elements?" [Consistency]
+   - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage]
+   - "Is the fallback behavior specified when the logo image fails to load?" [Edge Cases]
+   - "Are loading states defined for asynchronous episode data?" [Completeness]
+   - "Does the spec define visual hierarchy for competing UI elements?" [Clarity]
+
+   **ITEM STRUCTURE**:
+   Each item should follow this pattern (see the example after this list):
+   - Question format asking about requirement quality
+   - Focus on what's WRITTEN (or not written) in the spec/plan
+   - Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.]
+   - Reference spec section `[Spec §X.Y]` when checking existing requirements
+   - Use `[Gap]` marker when checking for missing requirements
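+
+   For example, two items in this shape, using the `- [ ] CHK###` format from the Structure Reference below (the IDs and spec reference are illustrative):
+
+   ```markdown
+   - [ ] CHK001 - Is "fast loading" quantified with specific timing thresholds? [Clarity, Spec §NFR-2]
+   - [ ] CHK002 - Are error handling requirements defined for all API failure modes? [Gap]
+   ```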
+
+   **EXAMPLES BY QUALITY DIMENSION**:
+
+   Completeness:
+   - "Are error handling requirements defined for all API failure modes? [Gap]"
+   - "Are accessibility requirements specified for all interactive elements? [Completeness]"
+   - "Are mobile breakpoint requirements defined for responsive layouts? [Gap]"
+
+   Clarity:
+   - "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]"
+   - "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]"
+   - "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]"
+
+   Consistency:
+   - "Do navigation requirements align across all pages? [Consistency, Spec §FR-10]"
+   - "Are card component requirements consistent between landing and detail pages? [Consistency]"
+
+   Coverage:
+   - "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]"
+   - "Are concurrent user interaction scenarios addressed? [Coverage, Gap]"
+   - "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]"
+
+   Measurability:
+   - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]"
+   - "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]"

-   **Scenario Classification & Coverage** (Requirements Validation Focus):
-   - Classify scenarios into: Primary, Alternate, Exception/Error, Recovery/Resilience, Non-Functional
-   - At least one item per present scenario class; if intentionally absent add: `Confirm intentional absence of scenarios`
-   - Include resilience/rollback coverage when state mutation or migrations occur (partial write, degraded mode, backward compatibility, rollback preconditions)
-   - If a major scenario lacks acceptance criteria, add an item to define measurable criteria
-   - **Focus on requirements validation**: Are scenarios clearly defined? Are acceptance criteria measurable? Are edge cases identified in the spec?
+   **Scenario Classification & Coverage** (Requirements Quality Focus):
+   - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
+   - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?"
+   - If a scenario class is missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]"
+   - Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]"

    **Traceability Requirements**:
    - MINIMUM: ≥80% of items MUST include at least one traceability reference
-   - Each item should include ≥1 of: scenario class tag, spec ref `[Spec §X.Y]`, acceptance criterion `[AC-##]`, or marker `(Assumption)/(Dependency)/(Ambiguity)/(Conflict)`
-   - If no ID system exists, create an item: `Establish requirement & acceptance criteria ID scheme before proceeding`
+   - Each item should reference a spec section `[Spec §X.Y]` or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]`
+   - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]"
[Traceability]" - **Surface & Resolve Issues** (Pre-Implementation Validation): - - Cluster and create one resolution item per cluster for: - - Ambiguities (vague terms in spec: "fast", "robust", "secure" - these need quantification) - - Conflicts (contradictory statements in requirements) - - Assumptions (unvalidated premises in the spec or plan) - - Dependencies (external systems, feature flags, migrations, upstream APIs - are they documented?) - - Items should focus on "Is this requirement clear enough to implement?" not "Does the code work?" + **Surface & Resolve Issues** (Requirements Quality Problems): + Ask questions about the requirements themselves: + - Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]" + - Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]" + - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]" + - Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]" + - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]" **Content Consolidation**: - - Soft cap: If raw candidate items > 40, prioritize by risk/impact and add: `Consolidate remaining low-impact scenarios (see source docs) after priority review` - - Merge near-duplicates when: same scenario class + same spec section + overlapping acceptance intent - - If >5 low-impact edge cases, cluster into a single aggregated item - - Do not repeat identical spec or acceptance refs in >3 items unless covering distinct scenario classes - - Treat context budget as finite: do not restate already-tagged requirements verbatim across multiple items + - Soft cap: If raw candidate items > 40, prioritize by risk/impact + - Merge near-duplicates checking the same requirement aspect + - If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? 
[Coverage]" - **🚫 PROHIBITED CONTENT** (Requirements Focus ONLY): - - Focus on requirements & scenario coverage quality, NOT implementation - - NEVER include: specific tests ("unit test", "integration test"), code symbols, frameworks, algorithmic prescriptions, deployment steps, test plan details, implementation strategy - - Rephrase any such user input into requirement clarity or coverage validation - - Optional brief rationale ONLY if it clarifies requirement intent or risk - - **✅ HOW TO PHRASE CHECKLIST ITEMS** (Requirements Validation): - - Good: "Verify error handling scenarios are defined for network failures" - - Bad: "Test error handling for network failures" - - Good: "Confirm acceptance criteria are measurable for performance requirements" - - Bad: "Run performance tests to verify requirements" - - Good: "Identify edge cases for concurrent user access in spec" - - Bad: "Implement thread-safe concurrent access" - - Good: "Clarify ambiguous term 'fast response' with specific timing requirements" - - Bad: "Verify response time is under 100ms" + **🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test: + - ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior + - ❌ References to code execution, user actions, system behavior + - ❌ "Displays correctly", "works properly", "functions as expected" + - ❌ "Click", "navigate", "render", "load", "execute" + - ❌ Test cases, test plans, QA procedures + - ❌ Implementation details (frameworks, APIs, algorithms) + + **✅ REQUIRED PATTERNS** - These test requirements quality: + - ✅ "Are [requirement type] defined/specified/documented for [scenario]?" + - ✅ "Is [vague term] quantified/clarified with specific criteria?" + - ✅ "Are requirements consistent between [section A] and [section B]?" + - ✅ "Can [requirement] be objectively measured/verified?" + - ✅ "Are [edge cases/scenarios] addressed in requirements?" + - ✅ "Does the spec define [missing aspect]?" 6. **Structure Reference**: Generate the checklist following the canonical template in `templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### ` lines with globally incrementing IDs starting at CHK001. @@ -165,39 +219,71 @@ You **MUST** consider the user input before proceeding (if not empty). To avoid clutter, use descriptive types and clean up obsolete checklists when done. -## Example Checklist Types +## Example Checklist Types & Sample Items -**Specification Review:** `spec-review.md` +**UX Requirements Quality:** `ux.md` -- Requirement completeness and clarity -- User scenarios and edge cases coverage -- Acceptance criteria definition -- Domain-specific considerations +Sample items (testing the requirements, NOT the implementation): +- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]" +- "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]" +- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]" +- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]" +- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]" +- "Can 'prominent display' be objectively measured? 
[Measurability, Spec §FR-4]" -**Requirements Quality:** `requirements.md` +**API Requirements Quality:** `api.md` -- Testable and measurable outcomes -- Stakeholder alignment verification -- Assumptions and constraints documentation -- Success metrics definition +Sample items: +- "Are error response formats specified for all failure scenarios? [Completeness]" +- "Are rate limiting requirements quantified with specific thresholds? [Clarity]" +- "Are authentication requirements consistent across all endpoints? [Consistency]" +- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]" +- "Is versioning strategy documented in requirements? [Gap]" -**UX/Accessibility Scenarios:** `ux.md` or `a11y.md` +**Performance Requirements Quality:** `performance.md` -- User journey completeness -- Accessibility requirement coverage -- Responsive design considerations -- Internationalization needs +Sample items: +- "Are performance requirements quantified with specific metrics? [Clarity]" +- "Are performance targets defined for all critical user journeys? [Coverage]" +- "Are performance requirements under different load conditions specified? [Completeness]" +- "Can performance requirements be objectively measured? [Measurability]" +- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]" -**Security Requirements:** `security.md` +**Security Requirements Quality:** `security.md` -- Threat model coverage -- Authentication/authorization requirements -- Data protection requirements -- Compliance and regulatory needs +Sample items: +- "Are authentication requirements specified for all protected resources? [Coverage]" +- "Are data protection requirements defined for sensitive information? [Completeness]" +- "Is the threat model documented and requirements aligned to it? [Traceability]" +- "Are security requirements consistent with compliance obligations? [Consistency]" +- "Are security failure/breach response requirements defined? [Gap, Exception Flow]" -**API/Integration Scenarios:** `api.md` +## Anti-Examples: What NOT To Do -- Contract completeness -- Error handling scenarios -- Backward compatibility considerations -- Integration touchpoint coverage +**❌ WRONG - These test implementation, not requirements:** + +```markdown +- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001] +- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003] +- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010] +- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005] +``` + +**✅ CORRECT - These test requirements quality:** + +```markdown +- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001] +- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003] +- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010] +- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005] +- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap] +- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? 
+```
+
+**Key Differences:**
+- Wrong: Tests if the system works correctly
+- Correct: Tests if the requirements are written correctly
+- Wrong: Verification of behavior
+- Correct: Validation of requirement quality
+- Wrong: "Does it do X?"
+- Correct: "Is X clearly specified?"