feat: add 5 gstack-inspired lifecycle commands (critique, review, qa, ship, retro)

Add 5 new core command templates inspired by Garry Tan's GStack to complete
the spec-driven development lifecycle:

- /speckit.critique: Dual-lens product + engineering review before implementation
- /speckit.review: Staff-level code review (correctness, security, performance)
- /speckit.qa: Systematic QA testing (browser-driven and CLI modes)
- /speckit.ship: Release automation (pre-flight, changelog, CI, PR creation)
- /speckit.retro: Sprint retrospective with metrics and improvement suggestions

Each command includes:
- Command template in templates/commands/
- Output report template in templates/
- Extension hook support (before_*/after_*)
- YAML frontmatter with prerequisite scripts

Updated README.md workflow from 6 to 11 steps and added CHANGELOG entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Author: Arun Kumar Thiagarajan
Date: 2026-04-01 11:22:57 +05:30
Parent: 3899dcc0d4
Commit: 6eb15a7a3e
12 changed files with 1664 additions and 2 deletions


@@ -2,6 +2,17 @@
<!-- insert new changelog below this comment -->
## [Unreleased]
### Added
- feat: Add `/speckit.critique` command — dual-lens strategic and technical review (product + engineering perspectives) before implementation
- feat: Add `/speckit.review` command — staff-level code review focused on correctness, security, performance, and spec compliance
- feat: Add `/speckit.qa` command — systematic QA testing with browser-driven and CLI-based modes, validating acceptance criteria
- feat: Add `/speckit.ship` command — release engineering automation (pre-flight checks, branch sync, changelog, CI verification, PR creation)
- feat: Add `/speckit.retro` command — sprint retrospective with metrics, learnings, and improvement suggestions
- feat: Add output templates for review, QA, ship, retro, and critique reports
## [0.4.3] - 2026-03-26
### Changed


@@ -134,7 +134,15 @@ Use the **`/speckit.plan`** command to provide your tech stack and architecture
/speckit.plan The application uses Vite with minimal number of libraries. Use vanilla HTML, CSS, and JavaScript as much as possible. Images are not uploaded anywhere and metadata is stored in a local SQLite database.
```
### 5. Challenge the plan
Use **`/speckit.critique`** to critically evaluate your spec and plan from both product strategy and engineering risk perspectives before committing to implementation.
```bash
/speckit.critique
```
### 6. Break down into tasks
Use **`/speckit.tasks`** to create an actionable task list from your implementation plan.
@@ -142,7 +150,7 @@ Use **`/speckit.tasks`** to create an actionable task list from your implementat
/speckit.tasks
```
### 7. Execute implementation
Use **`/speckit.implement`** to execute all tasks and build your feature according to the plan.
@@ -150,6 +158,38 @@ Use **`/speckit.implement`** to execute all tasks and build your feature accordi
/speckit.implement
```
### 8. Review the code
Use **`/speckit.review`** to perform a staff-level code review focused on correctness, security, performance, and spec compliance.
```bash
/speckit.review
```
### 9. Run QA testing
Use **`/speckit.qa`** to systematically test the implemented feature against acceptance criteria, using browser-driven or CLI-based testing.
```bash
/speckit.qa
```
### 10. Ship it
Use **`/speckit.ship`** to automate the release pipeline — pre-flight checks, branch sync, changelog generation, CI verification, and PR creation.
```bash
/speckit.ship
```
### 11. Retrospective
Use **`/speckit.retro`** to reflect on the completed development cycle with metrics, learnings, and improvement suggestions for the next iteration.
```bash
/speckit.retro
```
For detailed step-by-step instructions, see our [comprehensive guide](./spec-driven.md).
## 📽️ Video Overview


@@ -0,0 +1,238 @@
---
description: Perform a dual-lens critical review of the specification and plan from both product strategy and engineering risk perspectives before implementation.
scripts:
sh: scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks
ps: scripts/powershell/check-prerequisites.ps1 -Json -RequireTasks -IncludeTasks
---
## User Input
```text
$ARGUMENTS
```
You **MUST** consider the user input before proceeding (if not empty).
## Pre-Execution Checks
**Check for extension hooks (before critique)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.before_critique` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Pre-Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Pre-Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
Wait for the result of the hook command before proceeding to the Outline.
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently
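For reference, a hypothetical `.specify/extensions.yml` showing the fields the checks above read (`enabled`, `optional`, `condition`, plus the `{extension}`, `{command}`, `{description}`, and `{prompt}` placeholders). The extension names and overall layout are illustrative assumptions, not a fixed schema:

```yaml
hooks:
  before_critique:
    - extension: security-review          # hypothetical extension name
      command: security.scan              # surfaced to the user as /security.scan
      description: Audit dependencies before critique
      prompt: Review the dependency audit output for critical findings
      optional: true                      # announced as an optional pre-hook
      # no `enabled` field -> treated as enabled by default
    - extension: lint-gate                # hypothetical
      command: lint.check
      description: Run the linter gate
      optional: false                     # mandatory: emitted as EXECUTE_COMMAND
      enabled: true
      condition: "files_changed > 0"      # not evaluated here; left to HookExecutor
```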
## Goal
Challenge the specification and implementation plan through two distinct expert lenses BEFORE committing to implementation. The **Product Lens** evaluates whether the right problem is being solved in the right way for users. The **Engineering Lens** evaluates whether the technical approach is sound, scalable, and free of hidden risks. This dual review prevents costly mid-implementation pivots and catches strategic and technical blind spots early.
## Operating Constraints
**STRICTLY READ-ONLY**: Do **not** modify `spec.md`, `plan.md`, or any other files. Output a structured critique report. Offer to apply approved changes only after the user reviews findings.
**CONSTRUCTIVE CHALLENGE**: The goal is to strengthen the spec and plan, not to block progress. Every critique item must include a constructive suggestion for improvement.
**Constitution Authority**: The project constitution (`/memory/constitution.md`) defines non-negotiable principles. Any spec/plan element conflicting with the constitution is automatically a 🎯 Must-Address item.
## Outline
1. Run `{SCRIPT}` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax, e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
2. **Load Critique Context**:
- **REQUIRED**: Read `spec.md` for requirements, user stories, and acceptance criteria
- **REQUIRED**: Read `plan.md` for architecture, tech stack, and implementation phases
- **IF EXISTS**: Read `/memory/constitution.md` for governing principles
- **IF EXISTS**: Read `tasks.md` for task breakdown (if already generated)
- **IF EXISTS**: Read previous critique reports in FEATURE_DIR/critiques/ for context
3. **Product Lens Review** (CEO/Product Lead Perspective):
Adopt the mindset of an experienced product leader who cares deeply about user value, market fit, and business impact. Evaluate:
#### 3a. Problem Validation
- Is the problem statement clear and well-defined?
- Is this solving a real user pain point, or is it a solution looking for a problem?
- What evidence supports the need for this feature? (user research, data, customer requests)
- Is the scope appropriate — not too broad (trying to do everything) or too narrow (missing the core value)?
#### 3b. User Value Assessment
- Does every user story deliver tangible user value?
- Are the acceptance criteria written from the user's perspective (outcomes, not implementation)?
- Is the user journey complete — or are there gaps where users would get stuck?
- What's the simplest version that would deliver 80% of the value? (MVP analysis)
- Are there unnecessary features that add complexity without proportional value?
#### 3c. Alternative Approaches
- Could a simpler solution achieve the same outcome?
- Are there existing tools, libraries, or services that could replace custom implementation?
- What would a competitor's approach look like?
- What would happen if this feature were NOT built? What's the cost of inaction?
#### 3d. Edge Cases & User Experience
- What happens when things go wrong? (error states, empty states, loading states)
- How does this feature interact with existing functionality?
- Are accessibility considerations addressed?
- Is the feature discoverable and intuitive?
- What are the onboarding/migration implications for existing users?
#### 3e. Success Measurement
- Are the success criteria measurable and time-bound?
- How will you know if this feature is successful after launch?
- What metrics should be tracked?
- What would trigger a rollback decision?
4. **Engineering Lens Review** (Staff Engineer Perspective):
Adopt the mindset of a senior staff engineer who has seen projects fail due to hidden technical risks. Evaluate:
#### 4a. Architecture Soundness
- Does the architecture follow established patterns for this type of system?
- Are boundaries and interfaces well-defined (separation of concerns)?
- Is the architecture testable at each layer?
- Are there circular dependencies or tight coupling risks?
- Does the architecture support future evolution without major refactoring?
#### 4b. Failure Mode Analysis
- What are the most likely failure modes? (network failures, data corruption, resource exhaustion)
- How does the system degrade gracefully under each failure mode?
- What happens under peak load? Is there a scaling bottleneck?
- What are the blast radius implications — can a failure in this feature affect other parts of the system?
- Are retry, timeout, and circuit-breaker strategies defined?
#### 4c. Security & Privacy Review
- What is the threat model? What attack vectors does this feature introduce?
- Are trust boundaries clearly defined (user input, API responses, third-party data)?
- Is sensitive data handled appropriately (encryption, access control, retention)?
- Are there compliance implications (GDPR, SOC2, HIPAA)?
- Is the principle of least privilege followed?
#### 4d. Performance & Scalability
- Are there potential bottlenecks in the data flow?
- What are the expected data volumes? Will the design handle 10x growth?
- Are caching strategies appropriate and cache invalidation well-defined?
- Are database queries optimized (indexing, pagination, query complexity)?
- Are there resource-intensive operations that should be async or batched?
#### 4e. Testing Strategy
- Is the testing plan comprehensive (unit, integration, E2E)?
- Are the critical paths identified for priority testing?
- Is the test data strategy realistic?
- Are there testability concerns (hard-to-mock dependencies, race conditions)?
- Is the test coverage target appropriate for the risk level?
#### 4f. Operational Readiness
- Is observability planned (logging, metrics, tracing)?
- Are alerting thresholds defined?
- Is there a rollback strategy?
- Are database migrations reversible?
- Is the deployment strategy clear (blue-green, canary, feature flags)?
#### 4g. Dependencies & Integration Risks
- Are third-party dependencies well-understood (stability, licensing, maintenance)?
- Are integration points with existing systems well-defined?
- What happens if an external service is unavailable?
- Are API versioning and backward compatibility considered?
5. **Cross-Lens Synthesis**:
Identify items where both lenses converge (these are highest priority):
- Product simplification that also reduces engineering risk
- Engineering constraints that affect user experience
- Scope adjustments that improve both value delivery and technical feasibility
6. **Severity Classification**:
Classify each finding:
- 🎯 **Must-Address**: Blocks proceeding to implementation. Critical product gap, security vulnerability, architecture flaw, or constitution violation. Must be resolved before `/speckit.tasks`.
- 💡 **Recommendation**: Strongly suggested improvement that would significantly improve quality, value, or risk profile. Should be addressed but won't block progress.
- 🤔 **Question**: Ambiguity or assumption that needs stakeholder input. Cannot be resolved by the development team alone.
7. **Generate Critique Report**:
Create the critique report at `FEATURE_DIR/critiques/critique-{timestamp}.md` using the critique report template. The report must include:
- **Executive Summary**: Overall assessment and readiness to proceed
- **Product Lens Findings**: Organized by subcategory (3a-3e)
- **Engineering Lens Findings**: Organized by subcategory (4a-4g)
- **Cross-Lens Insights**: Items where both perspectives converge
- **Findings Summary Table**: All items with ID, lens, severity, summary, suggestion
**Findings Table Format**:
| ID | Lens | Severity | Category | Finding | Suggestion |
|----|------|----------|----------|---------|------------|
| P1 | Product | 🎯 | Problem Validation | No evidence of user need | Conduct 5 user interviews or reference support tickets |
| E1 | Engineering | 💡 | Failure Modes | No retry strategy for API calls | Add exponential backoff with circuit breaker |
| X1 | Both | 🎯 | Scope × Risk | Feature X adds complexity with unclear value | Defer to v2; reduces both scope and technical risk |
8. **Provide Verdict**:
Based on findings, provide one of:
- ✅ **PROCEED**: No must-address items. Spec and plan are solid. Run `/speckit.tasks` to proceed.
- ⚠️ **PROCEED WITH UPDATES**: Must-address items found but are resolvable. Offer to apply fixes to spec/plan, then proceed.
- 🛑 **RETHINK**: Fundamental product or architecture concerns. Recommend revisiting the spec with `/speckit.specify` or the plan with `/speckit.plan`.
9. **Offer Remediation**:
For each must-address item and recommendation:
- Provide a specific suggested edit to `spec.md` or `plan.md`
- Ask: "Would you like me to apply these changes? (all / select / none)"
- If user approves, apply changes to the relevant files
- After applying changes, recommend re-running `/speckit.critique` to verify
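The report path in step 7 might be derived as follows — a minimal POSIX shell sketch, assuming `FEATURE_DIR` was already parsed from the step 1 script output; the fallback value and timestamp format are illustrative assumptions, not mandated by the template:

```shell
#!/bin/sh
# Derive a timestamped critique report path under FEATURE_DIR.
FEATURE_DIR="${FEATURE_DIR:-specs/001-example}"   # hypothetical fallback for illustration
mkdir -p "$FEATURE_DIR/critiques"
TS=$(date +%Y-%m-%d-%H%M%S)                       # assumed timestamp format
REPORT="$FEATURE_DIR/critiques/critique-$TS.md"
printf 'Writing critique report to %s\n' "$REPORT"
```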
## Post-Critique Actions
Suggest next steps based on verdict:
- If PROCEED: "Run `/speckit.tasks` to break the plan into actionable tasks"
- If PROCEED WITH UPDATES: "Review the suggested changes, then run `/speckit.tasks`"
- If RETHINK: "Consider running `/speckit.specify` to refine the spec or `/speckit.plan` to revise the architecture"
**Check for extension hooks (after critique)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.after_critique` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently

templates/commands/qa.md (new file, 199 lines)

@@ -0,0 +1,199 @@
---
description: Run systematic QA testing against the implemented feature, validating acceptance criteria through browser-driven or CLI-based testing.
scripts:
sh: scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks
ps: scripts/powershell/check-prerequisites.ps1 -Json -RequireTasks -IncludeTasks
---
## User Input
```text
$ARGUMENTS
```
You **MUST** consider the user input before proceeding (if not empty).
## Pre-Execution Checks
**Check for extension hooks (before QA)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.before_qa` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Pre-Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Pre-Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
Wait for the result of the hook command before proceeding to the Outline.
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently
## Goal
Perform systematic quality assurance testing of the implemented feature by validating acceptance criteria from the specification against actual application behavior. Supports two modes: **Browser QA** for web applications (using Playwright or similar browser automation) and **CLI QA** for non-web applications (using test runners, API calls, and command-line validation).
## Operating Constraints
**NON-DESTRUCTIVE**: QA testing should not corrupt production data or leave the application in a broken state. Use test databases, test accounts, and cleanup procedures where applicable.
**Evidence-Based**: Every pass/fail determination must include evidence (screenshots, response payloads, console output, or test results).
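One way to honor the non-destructive constraint in a CLI QA run is to sandbox side effects in a throwaway working directory with guaranteed cleanup — a sketch assuming a POSIX shell; the `fixtures/` path is a hypothetical example:

```shell
#!/bin/sh
# Run QA commands inside a throwaway sandbox so failures never touch real data.
SANDBOX=$(mktemp -d)
trap 'rm -rf "$SANDBOX"' EXIT                       # cleanup even if a test aborts
cp -R fixtures/. "$SANDBOX"/ 2>/dev/null || true    # hypothetical test fixtures
( cd "$SANDBOX" && printf 'sandbox ready at %s\n' "$PWD" )
```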
## Outline
1. Run `{SCRIPT}` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax, e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
2. **Load QA Context**:
- **REQUIRED**: Read `spec.md` for acceptance criteria, user stories, and success criteria
- **REQUIRED**: Read `tasks.md` to identify implemented features and affected areas
- **IF EXISTS**: Read `plan.md` for technical details, routes, and API endpoints
- **IF EXISTS**: Read review reports in FEATURE_DIR/reviews/ for known issues to verify
- **IF EXISTS**: Read `/memory/constitution.md` for quality standards
3. **Extract Test Scenarios**:
From the loaded artifacts, build a structured test plan:
- Map each user story to one or more test scenarios
- Map each acceptance criterion to a verifiable test case
- Identify happy paths, error paths, and edge cases
- Prioritize scenarios: critical user flows → error handling → edge cases → performance
Output the test plan as a numbered list:
```
QA Test Plan:
TC-001: [User Story X] - [Scenario description] - [Expected outcome]
TC-002: [User Story Y] - [Scenario description] - [Expected outcome]
...
```
4. **Detect QA Mode**:
Determine the appropriate testing approach based on the project:
**Browser QA Mode** (for web applications):
- Detect if the project is a web application (check for: package.json with dev/start scripts, index.html, web framework in plan.md)
- Check for browser automation tools: Playwright, Puppeteer, Cypress, Selenium
- If available, use browser automation for UI testing
- If not available but project is a web app, use `curl`/`fetch` for API-level testing
**CLI QA Mode** (for non-web applications):
- Use the project's existing test runner (npm test, pytest, go test, cargo test, etc.)
- Execute CLI commands and validate output
- Use API calls for service validation
- Check database state for data integrity
5. **Environment Setup**:
- Attempt to start the application if it's not already running:
- Check for common start commands: `npm run dev`, `npm start`, `python manage.py runserver`, `go run .`, `cargo run`, etc.
- Use the dev/start command from `plan.md` if specified
- Wait for the application to be responsive (health check endpoint or port availability)
- If the application cannot be started, fall back to running the existing test suite
- Create the QA output directories:
- `FEATURE_DIR/qa/` for reports
- `FEATURE_DIR/qa/screenshots/` for visual evidence (browser mode)
- `FEATURE_DIR/qa/responses/` for API response captures (CLI mode)
6. **Execute Test Scenarios — Browser QA Mode**:
For each test scenario in the plan:
- Navigate to the relevant route/page
- Perform the user actions described in the scenario
- Capture a screenshot at each key state transition
- Validate the expected outcome:
- UI element presence/absence
- Text content verification
- Form submission results
- Navigation behavior
- Error message display
- Record the result: ✅ PASS, ❌ FAIL, ⚠️ PARTIAL, ⏭️ SKIPPED
- For failures: capture the screenshot, console errors, and network errors
- For partial passes: document what worked and what didn't
7. **Execute Test Scenarios — CLI QA Mode**:
For each test scenario in the plan:
- Run the appropriate command or API call
- Capture stdout, stderr, and exit codes
- Validate the expected outcome:
- Command output matches expected patterns
- Exit codes are correct (0 for success, non-zero for expected errors)
- API responses match expected schemas and status codes
- Database state reflects expected changes
- File system changes are correct
- Record the result: ✅ PASS, ❌ FAIL, ⚠️ PARTIAL, ⏭️ SKIPPED
- For failures: capture full output, error messages, and stack traces
8. **Run Existing Test Suites**:
In addition to scenario-based testing, run the project's existing test suites:
- Detect test runner: `npm test`, `pytest`, `go test ./...`, `cargo test`, `dotnet test`, `mvn test`, etc.
- Run the full test suite and capture results
- Report: total tests, passed, failed, skipped, coverage percentage (if available)
- Flag any pre-existing test failures vs. new failures from implementation changes
9. **Generate QA Report**:
Create the QA report at `FEATURE_DIR/qa/qa-{timestamp}.md` using the QA report template. The report must include:
- **QA Summary**: Overall verdict (✅ ALL PASSED / ⚠️ PARTIAL PASS / ❌ FAILURES FOUND)
- **Test Results Table**: Each scenario with ID, description, mode, result, evidence link
- **Acceptance Criteria Coverage**: Matrix of criteria vs. test status
- **Test Suite Results**: Existing test suite pass/fail summary
- **Failures Detail**: For each failed scenario — steps to reproduce, expected vs. actual, evidence
- **Environment Info**: OS, browser (if applicable), runtime versions, application URL
- **Metrics**: Total scenarios, passed, failed, partial, skipped, coverage percentage
10. **Provide QA Verdict**:
Based on results, provide one of:
- ✅ **QA PASSED**: All critical scenarios pass, no blockers. Safe to proceed to `/speckit.ship`
- ⚠️ **QA PASSED WITH NOTES**: Critical paths pass but some edge cases or non-critical scenarios failed. List items.
- ❌ **QA FAILED**: Critical user flows or acceptance criteria are not met. Must fix and re-test.
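The test-runner detection in steps 4 and 8 can be sketched as a marker-file probe — a heuristic only; real projects may need the dev command recorded in `plan.md` instead, and the marker-to-command mapping here is an assumption:

```shell
#!/bin/sh
# Pick a test command from common project marker files (first match wins).
detect_test_runner() {
  dir=$1
  if   [ -f "$dir/package.json" ]; then echo "npm test"
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/pytest.ini" ]; then echo "pytest"
  elif [ -f "$dir/go.mod" ]; then echo "go test ./..."
  elif [ -f "$dir/Cargo.toml" ]; then echo "cargo test"
  else echo "unknown"
  fi
}
detect_test_runner .
```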
## Post-QA Actions
Suggest next steps based on verdict:
- If QA PASSED: "Run `/speckit.ship` to prepare the release"
- If QA PASSED WITH NOTES: "Address noted items if possible, then run `/speckit.ship`"
- If QA FAILED: "Fix failing scenarios, then run `/speckit.qa` again to re-test"
**Check for extension hooks (after QA)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.after_qa` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently

templates/commands/retro.md (new file, 245 lines)

@@ -0,0 +1,245 @@
---
description: Conduct a structured retrospective analysis of the completed development cycle with metrics, learnings, and improvement suggestions.
scripts:
sh: scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks
ps: scripts/powershell/check-prerequisites.ps1 -Json -RequireTasks -IncludeTasks
---
## User Input
```text
$ARGUMENTS
```
You **MUST** consider the user input before proceeding (if not empty).
## Pre-Execution Checks
**Check for extension hooks (before retro)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.before_retro` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Pre-Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Pre-Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
Wait for the result of the hook command before proceeding to the Outline.
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently
## Goal
Conduct a structured retrospective analysis of the completed development cycle — from specification through shipping. Analyze what went well, what didn't, and generate actionable improvement suggestions for future iterations. Track metrics over time to identify trends and continuously improve the spec-driven development process.
## Operating Constraints
**CONSTRUCTIVE FOCUS**: The retrospective should be balanced — celebrating successes alongside identifying improvements. Avoid blame; focus on process improvements.
**DATA-DRIVEN**: Base analysis on actual artifacts, git history, and measurable outcomes rather than subjective impressions.
**OPTIONAL WRITES**: The retro report is always written. Updates to `constitution.md` with new learnings are offered but require explicit user approval.
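As a concrete instance of the data-driven principle, the spec accuracy score defined in step 4 of the Outline reduces to simple arithmetic — a sketch assuming the fulfilled, partial, and total counts have already been tallied:

```shell
#!/bin/sh
# Spec accuracy: (fulfilled + 0.5 * partial) / total * 100
spec_accuracy() {
  awk -v f="$1" -v p="$2" -v t="$3" 'BEGIN { printf "%.1f\n", (f + 0.5 * p) / t * 100 }'
}
spec_accuracy 8 2 12   # 8 fulfilled, 2 partial, 12 total -> 75.0
```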
## Outline
1. Run `{SCRIPT}` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax, e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
2. **Gather Retrospective Data**:
Load all available artifacts from the development cycle:
- **REQUIRED**: Read `spec.md` — original specification and requirements
- **REQUIRED**: Read `tasks.md` — task breakdown and completion status
- **IF EXISTS**: Read `plan.md` — technical plan and architecture decisions
- **IF EXISTS**: Read review reports in FEATURE_DIR/reviews/ — code review findings
- **IF EXISTS**: Read QA reports in FEATURE_DIR/qa/ — testing results
- **IF EXISTS**: Read release artifacts in FEATURE_DIR/releases/ — shipping data
- **IF EXISTS**: Read critique reports in FEATURE_DIR/critiques/ — pre-implementation review
- **IF EXISTS**: Read previous retros in FEATURE_DIR/retros/ — historical context
- **IF EXISTS**: Read `/memory/constitution.md` — project principles
3. **Collect Git Metrics**:
Gather quantitative data from the git history:
```bash
# Commit count for the feature
git rev-list --count origin/{target_branch}..HEAD
# Files changed
git diff --stat origin/{target_branch}..HEAD
# Lines added/removed
git diff --shortstat origin/{target_branch}..HEAD
# Number of authors
git log origin/{target_branch}..HEAD --format='%an' | sort -u | wc -l
# Date range: git log lists newest first, so tail -1 is the first (oldest) commit
git log origin/{target_branch}..HEAD --format='%ai' | tail -1   # first commit date
git log origin/{target_branch}..HEAD --format='%ai' | head -1   # latest commit date
```
If git data is not available (e.g., already merged), use artifact timestamps and content analysis as fallback.
4. **Specification Accuracy Analysis**:
Compare the original spec against what was actually built:
- **Requirements fulfilled**: Count of spec requirements that were fully implemented
- **Requirements partially fulfilled**: Requirements that were implemented with deviations
- **Requirements not implemented**: Spec items that were deferred or dropped
- **Unplanned additions**: Features implemented that were NOT in the original spec (scope creep)
- **Surprises**: Requirements that turned out to be much harder or easier than expected
- **Accuracy score**: (fulfilled + partial×0.5) / total requirements × 100%
5. **Plan Effectiveness Analysis**:
Evaluate how well the technical plan guided implementation:
- **Architecture decisions validated**: Did the chosen patterns/stack work as planned?
- **Architecture decisions revised**: Were any plan decisions changed during implementation?
- **Task scoping accuracy**: Were tasks well-sized? Any tasks that were much larger/smaller than expected?
- **Missing tasks**: Were any tasks added during implementation that weren't in the original breakdown?
- **Task ordering issues**: Were there dependency problems or tasks that should have been reordered?
- **Plan score**: Qualitative assessment (EXCELLENT / GOOD / ADEQUATE / NEEDS IMPROVEMENT)
6. **Implementation Quality Analysis**:
Analyze the quality of the implementation based on review and QA data:
- **Review findings summary**: Total findings by severity from review reports
- **Blocker resolution**: Were all blockers resolved before shipping?
- **QA results summary**: Pass/fail rates from QA testing
- **Test coverage**: Test suite results and coverage metrics
- **Code quality indicators**: Lines of code, test-to-code ratio, cyclomatic complexity (if available)
- **Quality score**: Based on review verdict and QA pass rate
7. **Process Metrics Dashboard**:
Compile a metrics summary:
```
📊 Development Cycle Metrics
━━━━━━━━━━━━━━━━━━━━━━━━━━
Feature: {feature_name}
Duration: {first_commit} → {last_commit}
📝 Specification
Requirements: {total} total, {fulfilled} fulfilled, {partial} partial
Spec Accuracy: {accuracy}%
📋 Planning
Tasks: {total_tasks} total, {completed} completed
Added during impl: {unplanned_tasks}
Plan Score: {plan_score}
💻 Implementation
Commits: {commit_count}
Files changed: {files_changed}
Lines: +{additions} / -{deletions}
Test/Code ratio: {test_ratio}
🔍 Quality
Review findings: 🔴{blockers} 🟡{warnings} 🟢{suggestions}
QA pass rate: {qa_pass_rate}%
Quality Score: {quality_score}
```
8. **What Went Well** (Keep Doing):
Identify and celebrate successes:
- Aspects of the spec that were clear and led to smooth implementation
- Architecture decisions that proved effective
- Tasks that were well-scoped and completed without issues
- Quality practices that caught real issues
- Any particularly efficient or elegant solutions
9. **What Could Improve** (Start/Stop Doing):
Identify areas for improvement:
- Spec gaps that caused confusion or rework during implementation
- Plan decisions that needed revision
- Tasks that were poorly scoped or had missing dependencies
- Quality issues that slipped through review/QA
- Process friction points (tool issues, unclear workflows)
10. **Actionable Improvement Suggestions**:
Generate specific, actionable suggestions:
- Rank by impact (HIGH / MEDIUM / LOW)
- Each suggestion should be concrete and implementable
- Group by category: Specification, Planning, Implementation, Quality, Process
Example format:
```
IMP-001 [HIGH] Add data model validation to spec template
→ The spec lacked entity relationship details, causing 3 unplanned tasks during implementation.
→ Suggestion: Add a "Data Model" section to the spec template with entity, attribute, and relationship requirements.
IMP-002 [MEDIUM] Include browser compatibility in QA checklist
→ QA missed a CSS rendering issue in Safari that was caught post-merge.
→ Suggestion: Add cross-browser testing scenarios to the QA test plan.
```
11. **Historical Trend Analysis** (if previous retros exist):
If FEATURE_DIR/retros/ contains previous retrospective reports:
- Compare key metrics across cycles (spec accuracy, QA pass rate, review findings)
- Identify improving trends (celebrate!) and declining trends (flag for attention)
- Check if previous improvement suggestions were adopted and whether they helped
- Output a trend summary table
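One way to derive the trend arrows for the summary table, assuming the arrows encode "improving vs. declining" rather than raw direction (so fewer blockers trends up):

```python
def trend(previous: float, current: float, higher_is_better: bool = True) -> str:
    """Map a metric change across cycles to a trend arrow."""
    if current == previous:
        return "➡️"
    improved = (current > previous) == higher_is_better
    return "📈" if improved else "📉"

print(trend(80.0, 92.0))                    # QA pass rate rose -> 📈
print(trend(3, 1, higher_is_better=False))  # fewer review blockers -> 📈
```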
12. **Generate Retrospective Report**:
Create the retro report at `FEATURE_DIR/retros/retro-{timestamp}.md` using the retrospective report template.
13. **Offer Constitution Update**:
Based on the retrospective findings, offer to update `/memory/constitution.md` with new learnings:
- "Based on this retrospective, I suggest adding the following principles to your constitution:"
- List specific principle additions or modifications
- **Wait for explicit user approval** before making any changes
- If approved, append new principles with a "Learned from: {feature_name} retro" annotation
14. **Suggest Next Actions**:
- If this was a successful cycle: "Great work! Consider starting your next feature with `/speckit.specify`"
- If improvements were identified: List the top 3 most impactful improvements to adopt
- If trends are declining: Recommend a process review or team discussion
**Check for extension hooks (after retro)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.after_retro` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently
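The filtering rules above (enabled by default, condition evaluation deferred to the HookExecutor) can be sketched as follows; the hook entries are invented for illustration and do not reflect a real `extensions.yml` schema:

```python
# Hypothetical parsed hook entries from hooks.after_retro:
hooks = [
    {"extension": "lint-ext", "command": "speckit.lint"},                      # no 'enabled' key -> enabled
    {"extension": "audit-ext", "command": "speckit.audit", "enabled": False},  # explicitly disabled -> filtered out
    {"extension": "notify-ext", "command": "speckit.notify",
     "condition": "qa_passed"},                                               # non-empty condition -> deferred

]

# Keep hooks unless 'enabled' is explicitly False.
enabled = [h for h in hooks if h.get("enabled") is not False]
# Only hooks with no condition (or a null/empty one) are executable here.
executable = [h for h in enabled if not h.get("condition")]
print([h["extension"] for h in executable])  # -> ['lint-ext']
```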

---
description: Perform a staff-level code review of implementation changes, focused on correctness, security, performance, and spec compliance.
scripts:
sh: scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks
ps: scripts/powershell/check-prerequisites.ps1 -Json -RequireTasks -IncludeTasks
---
## User Input
```text
$ARGUMENTS
```
You **MUST** consider the user input before proceeding (if not empty).
## Pre-Execution Checks
**Check for extension hooks (before review)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.before_review` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Pre-Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Pre-Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
Wait for the result of the hook command before proceeding to the Outline.
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently
## Goal
Conduct a thorough, staff-engineer-level code review of all changes made during the implementation phase. This review acts as a quality gate before shipping, catching bugs, security issues, performance regressions, and deviations from the specification. The review is **read-only** — it produces a structured report but does NOT modify code.
## Operating Constraints
**STRICTLY READ-ONLY**: Do **not** modify any source files. Output a structured review report to the reviews directory. If critical issues are found, recommend specific fixes but do NOT apply them.
**Constitution Authority**: The project constitution (`/memory/constitution.md`) defines non-negotiable quality standards. Any violation is automatically a 🔴 Blocker.
## Outline
1. Run `{SCRIPT}` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax, e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
2. **Gather Review Context**:
- **REQUIRED**: Read `spec.md` for functional requirements and acceptance criteria
- **REQUIRED**: Read `plan.md` for architecture decisions and technical constraints
- **REQUIRED**: Read `tasks.md` for the full task list and implementation scope
- **IF EXISTS**: Read `/memory/constitution.md` for quality standards and principles
- **IF EXISTS**: Read any existing review reports in FEATURE_DIR/reviews/ for context
3. **Identify Changes to Review**:
- Run `git diff main --stat` (or appropriate base branch) to identify all changed files
- If no git diff is available, use the file paths referenced in `tasks.md` as the review scope
- Group changes by category: source code, tests, configuration, documentation
- Prioritize review order: security-sensitive files → core logic → API surfaces → tests → config → docs
4. **Review Pass 1 — Correctness & Logic**:
- Verify each implementation matches its corresponding requirement in `spec.md`
- Check for off-by-one errors, null/undefined handling, boundary conditions
- Validate error handling: are errors caught, logged, and surfaced appropriately?
- Check for race conditions in concurrent/async code
- Verify data validation at trust boundaries (user input, API responses, file reads)
- Ensure state management is consistent (no orphaned state, proper cleanup)
5. **Review Pass 2 — Security**:
- Check for injection vulnerabilities (SQL, XSS, command injection, path traversal)
- Verify authentication and authorization checks on all protected endpoints
- Look for hardcoded secrets, credentials, or API keys
- Validate input sanitization and output encoding
- Check for insecure dependencies or known vulnerability patterns
- Verify CORS, CSP, and other security header configurations
- Check for sensitive data exposure in logs, error messages, or responses
6. **Review Pass 3 — Performance & Scalability**:
- Identify N+1 query patterns or unnecessary database calls
- Check for unbounded loops, missing pagination, or large in-memory collections
- Look for blocking operations in async contexts
- Verify caching strategies are appropriate and cache invalidation is correct
- Check for resource leaks (file handles, connections, event listeners)
- Validate that performance-critical paths noted in `plan.md` are optimized
7. **Review Pass 4 — Spec Compliance & Architecture**:
- Cross-reference each functional requirement (FR-###) against implementation
- Verify architecture decisions from `plan.md` are followed (patterns, layers, boundaries)
- Check API contracts match specification (request/response shapes, status codes)
- Validate that acceptance criteria from user stories are testable and tested
- Flag any implemented functionality NOT in the spec (scope creep)
- Flag any spec requirements NOT implemented (missing coverage)
8. **Review Pass 5 — Test Quality**:
- Verify test coverage for critical paths and edge cases
- Check test assertions are meaningful (not just "doesn't throw")
- Validate test isolation (no shared mutable state between tests)
- Verify mock/stub usage is appropriate and doesn't hide real bugs
- Check that acceptance criteria have corresponding test cases
- Flag untested error paths and boundary conditions
9. **Severity Classification**:
Apply the following severity levels to each finding:
- 🔴 **Blocker**: Security vulnerability, data corruption risk, crashes, constitution violation, missing core functionality. **Must fix before shipping.**
- 🟡 **Warning**: Performance issue, incomplete error handling, missing edge case coverage, test gap, architectural deviation. **Should fix before shipping.**
- 🟢 **Suggestion**: Code clarity improvement, refactoring opportunity, documentation gap, minor style inconsistency. **Nice to fix but non-blocking.**
10. **Generate Review Report**:
Create the review report at `FEATURE_DIR/reviews/review-{timestamp}.md` using the review report template. The report must include:
- **Executive Summary**: Overall assessment (APPROVED / APPROVED WITH CONDITIONS / CHANGES REQUIRED)
- **Findings Table**: All findings with ID, severity, file, line(s), description, recommendation
- **Spec Coverage Matrix**: Requirements vs implementation status
- **Test Coverage Assessment**: Coverage gaps relative to acceptance criteria
- **Metrics**: Total findings by severity, files reviewed, spec coverage percentage
- **Recommended Actions**: Prioritized list of fixes, grouped by severity
11. **Provide Verdict**:
Based on findings, provide one of:
- ✅ **APPROVED**: No blockers, minimal warnings. Safe to proceed to `/speckit.ship`
- ⚠️ **APPROVED WITH CONDITIONS**: No blockers, but warnings should be addressed. List conditions.
- ❌ **CHANGES REQUIRED**: Blockers found. Must fix and re-review before shipping. List blocking items.
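The three verdicts reduce to a simple rule over the finding counts; this is a sketch, and the "minimal warnings" judgment for a clean APPROVED remains with the reviewer:

```python
def review_verdict(blockers: int, warnings: int) -> str:
    """Blockers force CHANGES REQUIRED; warnings alone downgrade to conditional approval."""
    if blockers > 0:
        return "❌ CHANGES REQUIRED"
    if warnings > 0:
        return "⚠️ APPROVED WITH CONDITIONS"
    return "✅ APPROVED"

print(review_verdict(blockers=0, warnings=2))  # -> ⚠️ APPROVED WITH CONDITIONS
```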
## Post-Review Actions
Suggest next steps based on verdict:
- If APPROVED: "Run `/speckit.ship` to prepare the release"
- If APPROVED WITH CONDITIONS: "Address warnings, then run `/speckit.ship`"
- If CHANGES REQUIRED: "Fix blocker issues, then run `/speckit.review` again"
**Check for extension hooks (after review)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.after_review` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently

templates/commands/ship.md
---
description: Automate the release pipeline including pre-flight checks, branch sync, changelog generation, CI verification, and pull request creation.
scripts:
sh: scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks
ps: scripts/powershell/check-prerequisites.ps1 -Json -RequireTasks -IncludeTasks
---
## User Input
```text
$ARGUMENTS
```
You **MUST** consider the user input before proceeding (if not empty).
## Pre-Execution Checks
**Check for extension hooks (before ship)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.before_ship` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Pre-Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Pre-Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
Wait for the result of the hook command before proceeding to the Outline.
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently
## Goal
Automate the complete release engineering workflow: verify readiness, synchronize branches, generate a changelog, verify CI status, create a well-structured pull request, and archive release artifacts. This command transforms the implemented feature into a shippable, reviewable PR with full traceability back to the original specification.
## Operating Constraints
**SAFE BY DEFAULT**: Every destructive operation (force push, branch delete, PR creation) requires explicit user confirmation. Default to dry-run mode for destructive git operations.
**TRACEABILITY**: The PR description and changelog must link back to spec, plan, tasks, review, and QA artifacts for full audit trail.
## Outline
1. Run `{SCRIPT}` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax, e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
2. **Pre-Flight Readiness Checks**:
Run a comprehensive readiness assessment before proceeding:
**Task Completion**:
- Read `tasks.md` and count total tasks vs. completed tasks (marked `[X]` or `[x]`)
- If any tasks are incomplete: **STOP** and warn. Ask user to confirm proceeding or run `/speckit.implement` first.
**Review Status** (if FEATURE_DIR/reviews/ exists):
- Read the most recent review report
- If verdict is ❌ CHANGES REQUIRED: **STOP** and warn. Recommend running `/speckit.review` after fixes.
- If verdict is ⚠️ APPROVED WITH CONDITIONS: Warn but allow proceeding with confirmation.
**QA Status** (if FEATURE_DIR/qa/ exists):
- Read the most recent QA report
- If verdict is ❌ QA FAILED: **STOP** and warn. Recommend running `/speckit.qa` after fixes.
- If verdict is ⚠️ QA PASSED WITH NOTES: Warn but allow proceeding with confirmation.
**Working Tree**:
- Run `git status` to check for uncommitted changes
- If uncommitted changes exist: prompt user to commit or stash before proceeding
Display a readiness summary:
```
Ship Readiness Check:
✅ Tasks: 12/12 complete
✅ Review: APPROVED
⚠️ QA: PASSED WITH NOTES (2 non-critical items)
✅ Working tree: Clean
Overall: READY TO SHIP (with notes)
Proceed? (yes/no)
```
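The task-completion check above amounts to a checkbox scan of `tasks.md`; the sample content here is illustrative, and in real use the string would be the file's contents:

```python
import re

# Stand-in for the contents of tasks.md:
tasks_md = """\
- [x] T001 Set up project scaffolding
- [X] T002 Implement data model
- [ ] T003 Write integration tests
"""

# Count all checkbox items, then only those marked [x] or [X].
total = len(re.findall(r"^\s*[-*]\s*\[[ xX]\]", tasks_md, re.MULTILINE))
done = len(re.findall(r"^\s*[-*]\s*\[[xX]\]", tasks_md, re.MULTILINE))
print(f"Tasks: {done}/{total} complete")  # -> Tasks: 2/3 complete
```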
3. **Determine Shipping Configuration**:
- Detect the current feature branch: `git branch --show-current`
- Determine the target branch (default: `main`; override via user input or `.specify/config.yml`)
- Detect remote name (default: `origin`; check `git remote -v`)
- Check if GitHub CLI (`gh`) is available for PR creation
- If `gh` is not available, generate the PR description as a markdown file for manual creation
4. **Branch Synchronization**:
- Fetch latest from remote: `git fetch origin`
- Check if feature branch is behind target branch:
```bash
git rev-list --count HEAD..origin/{target_branch}
```
- If behind, offer to rebase or merge:
- **Rebase** (recommended for clean history): `git rebase origin/{target_branch}`
- **Merge**: `git merge origin/{target_branch}`
- If conflicts arise: **STOP** and provide conflict resolution guidance
- After sync, push the updated feature branch: `git push origin {feature_branch}`
5. **Changelog Generation**:
- Collect changelog inputs:
- Feature summary from `spec.md` (overview section)
- Implementation highlights from completed tasks in `tasks.md`
- Git commit messages: `git log origin/{target_branch}..HEAD --oneline`
- Generate a structured changelog entry:
```markdown
## [Feature Name] - {date}
### Added
- [List of new features/capabilities from spec]
### Changed
- [List of modifications to existing behavior]
### Fixed
- [List of bug fixes discovered during implementation]
### Technical Notes
- [Key architecture decisions from plan.md]
```
- If a CHANGELOG.md exists at repo root: prepend the new entry (ask for confirmation)
- If no CHANGELOG.md exists: create one with the entry
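Prepending the entry can honor an insertion-marker comment when the changelog uses one (this repository's CHANGELOG does; the marker text below is that convention and may differ elsewhere), falling back to top-of-file otherwise:

```python
marker = "<!-- insert new changelog below this comment -->"
changelog = "# Changelog\n\n" + marker + "\n\n## [0.4.3] - 2026-03-26\n"
entry = "## [Unreleased]\n\n### Added\n- New feature\n"

if marker in changelog:
    # Insert the new entry directly under the marker, keeping older entries below.
    changelog = changelog.replace(marker, marker + "\n\n" + entry.rstrip(), 1)
else:
    # No marker: prepend the entry at the top of the file.
    changelog = entry + "\n" + changelog

print(changelog.splitlines()[4])  # -> ## [Unreleased]
```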
6. **CI Verification**:
- If GitHub CLI (`gh`) is available:
- Check CI status: `gh run list --branch {feature_branch} --limit 5`
- If CI is running: wait and report status
- If CI failed: **STOP** and display failure details. Recommend fixing before shipping.
- If CI passed: record the passing run ID for the PR
- If `gh` is not available:
- Remind the user to verify CI status manually before merging
- Check for local test commands and run them: `npm test`, `pytest`, `go test ./...`, etc.
7. **Generate PR Description**:
Compose a comprehensive PR description from `.specify/` artifacts:
```markdown
## Summary
[One-paragraph summary from spec.md overview]
## Specification
[Link to or summary of key requirements from spec.md]
## Implementation
[Key implementation decisions from plan.md]
[Summary of completed tasks from tasks.md]
## Testing
[QA results summary if qa/ reports exist]
[Test coverage information]
## Review Notes
[Summary of review findings if reviews/ reports exist]
[Any conditions or known limitations]
## Checklist
- [ ] All tasks completed
- [ ] Code review passed
- [ ] QA testing passed
- [ ] CI pipeline green
- [ ] Changelog updated
- [ ] Documentation updated (if applicable)
---
*Generated by `/speckit.ship` from spec-driven development artifacts.*
```
8. **Create Pull Request**:
- If GitHub CLI (`gh`) is available:
```bash
gh pr create --base {target_branch} --head {feature_branch} --title "{PR title}" --body-file {pr_description_file}
```
- If `gh` is not available:
- Save the PR description to `FEATURE_DIR/releases/pr-description-{timestamp}.md`
- Provide instructions for manual PR creation
- Output the PR title and description for copy-paste
**PR Title Format**: `feat: {feature_name} — {one-line summary from spec}`
9. **Archive Release Artifacts**:
- Create `FEATURE_DIR/releases/release-{timestamp}.md` with:
- PR URL (if created via `gh`)
- Changelog entry
- Readiness check results
- Links to review and QA reports
- Git commit range: `{base_sha}..{head_sha}`
10. **Post-Ship Summary**:
Display a completion summary:
```
🚀 Ship Complete!
PR: #{pr_number} - {pr_title} ({pr_url})
Branch: {feature_branch} → {target_branch}
Commits: {commit_count}
Files changed: {files_changed}
Changelog: Updated
Next steps:
- Review the PR at {pr_url}
- After merge, run `/speckit.retro` for a retrospective
```
**Check for extension hooks (after ship)**:
- Check if `.specify/extensions.yml` exists in the project root.
- If it exists, read it and look for entries under the `hooks.after_ship` key
- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default.
- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions:
- If the hook has no `condition` field, or it is null/empty, treat the hook as executable
- If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation
- For each executable hook, output the following based on its `optional` flag:
- **Optional hook** (`optional: true`):
```
## Extension Hooks
**Optional Hook**: {extension}
Command: `/{command}`
Description: {description}
Prompt: {prompt}
To execute: `/{command}`
```
- **Mandatory hook** (`optional: false`):
```
## Extension Hooks
**Automatic Hook**: {extension}
Executing: `/{command}`
EXECUTE_COMMAND: {command}
```
- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently

# Critique Report: [FEATURE NAME]
**Date**: [DATE]
**Feature**: [Link to spec.md]
**Plan**: [Link to plan.md]
**Verdict**: [✅ PROCEED / ⚠️ PROCEED WITH UPDATES / 🛑 RETHINK]
---
## Executive Summary
[One-paragraph assessment of the spec and plan quality, key strengths, and primary concerns from both product and engineering perspectives.]
---
## Product Lens Findings 🎯
### Problem Validation
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| P1 | 🎯/💡/🤔 | [finding] | [suggestion] |
### User Value Assessment
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| P2 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Alternative Approaches
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| P3 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Edge Cases & UX
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| P4 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Success Measurement
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| P5 | 🎯/💡/🤔 | [finding] | [suggestion] |
---
## Engineering Lens Findings 🔧
### Architecture Soundness
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| E1 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Failure Mode Analysis
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| E2 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Security & Privacy
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| E3 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Performance & Scalability
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| E4 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Testing Strategy
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| E5 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Operational Readiness
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| E6 | 🎯/💡/🤔 | [finding] | [suggestion] |
### Dependencies & Integration
| ID | Severity | Finding | Suggestion |
|----|----------|---------|------------|
| E7 | 🎯/💡/🤔 | [finding] | [suggestion] |
---
## Cross-Lens Insights 🔗
Items where both product and engineering perspectives converge:
| ID | Finding | Product Impact | Engineering Impact | Suggestion |
|----|---------|---------------|-------------------|------------|
| X1 | [finding] | [product concern] | [engineering concern] | [unified suggestion] |
---
## Findings Summary
| Metric | Count |
|--------|-------|
| 🎯 Must-Address | [count] |
| 💡 Recommendations | [count] |
| 🤔 Questions | [count] |
| Product findings | [count] |
| Engineering findings | [count] |
| Cross-lens findings | [count] |
---
## Recommended Actions
### 🎯 Must-Address (Before Proceeding)
1. **[ID]**: [Action with specific file and change description]
### 💡 Recommendations (Strongly Suggested)
1. **[ID]**: [Action]
### 🤔 Questions (Need Stakeholder Input)
1. **[ID]**: [Question requiring clarification]
---
**Severity Legend**:
- 🎯 **Must-Address**: Blocks proceeding to implementation
- 💡 **Recommendation**: Strongly suggested improvement
- 🤔 **Question**: Needs stakeholder input to resolve
---
*Generated by `/speckit.critique` — Dual-lens strategic and technical review for spec-driven development.*

templates/qa-template.md
# QA Report: [FEATURE NAME]
**QA Mode**: [Browser QA / CLI QA / Hybrid]
**Date**: [DATE]
**Feature**: [Link to spec.md]
**Environment**: [OS, browser, runtime versions, application URL]
**Verdict**: [✅ ALL PASSED / ⚠️ PARTIAL PASS / ❌ FAILURES FOUND]
---
## QA Summary
[One-paragraph overview of testing scope, approach, and overall results.]
---
## Test Results
| ID | User Story | Scenario | Mode | Result | Evidence |
|----|-----------|----------|------|--------|----------|
| TC-001 | [Story] | [Scenario description] | Browser/CLI | ✅/❌/⚠️/⏭️ | [link to screenshot or output] |
| TC-002 | [Story] | [Scenario description] | Browser/CLI | ✅/❌/⚠️/⏭️ | [link to screenshot or output] |
**Legend**: ✅ Pass | ❌ Fail | ⚠️ Partial | ⏭️ Skipped
---
## Acceptance Criteria Coverage
| Criterion | Test ID(s) | Status | Notes |
|-----------|-----------|--------|-------|
| [AC from spec.md] | TC-001, TC-003 | ✅ Met | |
| [AC from spec.md] | TC-002 | ❌ Not Met | [what failed] |
| [AC from spec.md] | — | ⏭️ Not Tested | [reason] |
**Coverage**: [X]/[Y] acceptance criteria validated ([Z]%)
---
## Test Suite Results
| Test Suite | Total | Passed | Failed | Skipped | Coverage |
|-----------|-------|--------|--------|---------|----------|
| [suite name] | [n] | [n] | [n] | [n] | [%] |
---
## Failure Details
### TC-[ID]: [Scenario Name]
**Status**: ❌ FAIL
**Steps to Reproduce**:
1. [Step 1]
2. [Step 2]
3. [Step 3]
**Expected**: [Expected outcome from spec]
**Actual**: [What actually happened]
**Evidence**: [Screenshot path or output capture]
**Severity**: [Critical / High / Medium / Low]
---
## Environment Info
| Property | Value |
|----------|-------|
| Operating System | [OS version] |
| Browser | [Browser and version, if applicable] |
| Runtime | [Node.js/Python/etc. version] |
| Application URL | [URL, if applicable] |
| Test Runner | [Tool used] |
---
## Metrics Summary
| Metric | Value |
|--------|-------|
| Total scenarios | [count] |
| ✅ Passed | [count] |
| ❌ Failed | [count] |
| ⚠️ Partial | [count] |
| ⏭️ Skipped | [count] |
| Pass rate | [%] |
| Acceptance criteria coverage | [%] |
---
*Generated by `/speckit.qa` — Systematic QA testing for spec-driven development.*

templates/retro-template.md
# Retrospective: [FEATURE NAME]
**Date**: [DATE]
**Feature**: [Link to spec.md]
**Cycle**: [first_commit_date] → [last_commit_date]
**Overall Assessment**: [🌟 Excellent / ✅ Good / ⚠️ Adequate / 🔧 Needs Improvement]
---
## Metrics Dashboard
```
📊 Development Cycle Metrics
━━━━━━━━━━━━━━━━━━━━━━━━━━
📝 Specification
Requirements: [total] total, [fulfilled] fulfilled, [partial] partial
Spec Accuracy: [accuracy]%
📋 Planning
Tasks: [total] total, [completed] completed
Added during impl: [unplanned_count]
Plan Score: [EXCELLENT / GOOD / ADEQUATE / NEEDS IMPROVEMENT]
💻 Implementation
Commits: [count]
Files changed: [count]
Lines: +[additions] / -[deletions]
Test/Code ratio: [ratio]
🔍 Quality
Review findings: 🔴[blockers] 🟡[warnings] 🟢[suggestions]
QA pass rate: [qa_pass_rate]%
Quality Score: [score]
```
---
## Specification Accuracy
| Requirement | Status | Notes |
|-------------|--------|-------|
| [FR-001] | ✅ Fulfilled | |
| [FR-002] | ⚠️ Partial | [deviation notes] |
| [FR-003] | ❌ Not Implemented | [reason] |
| [Unplanned] | Added | [why it was needed] |
**Accuracy Score**: [X]% ([fulfilled + partial×0.5] / [total] requirements)
---
## Plan Effectiveness
| Aspect | Assessment | Details |
|--------|-----------|---------|
| Architecture decisions | ✅/⚠️/❌ | [which decisions worked/didn't] |
| Task scoping | ✅/⚠️/❌ | [well-sized / too large / too small] |
| Dependency ordering | ✅/⚠️/❌ | [any ordering issues] |
| Missing tasks | [count] added | [what was missed] |
**Plan Score**: [EXCELLENT / GOOD / ADEQUATE / NEEDS IMPROVEMENT]
---
## What Went Well 🎉
1. **[Success area]**: [Description of what worked and why]
2. **[Success area]**: [Description]
3. **[Success area]**: [Description]
---
## What Could Improve 🔧
1. **[Improvement area]**: [Description of the issue and its impact]
2. **[Improvement area]**: [Description]
3. **[Improvement area]**: [Description]
---
## Improvement Suggestions
| ID | Impact | Category | Suggestion | Rationale |
|----|--------|----------|------------|-----------|
| IMP-001 | HIGH | [cat] | [specific action] | [why this matters] |
| IMP-002 | MEDIUM | [cat] | [specific action] | [why this matters] |
| IMP-003 | LOW | [cat] | [specific action] | [why this matters] |
**Categories**: Specification, Planning, Implementation, Quality, Process
---
## Historical Trends
<!-- Only populated if previous retros exist -->
| Metric | Previous | Current | Trend |
|--------|----------|---------|-------|
| Spec accuracy | [%] | [%] | 📈/📉/➡️ |
| QA pass rate | [%] | [%] | 📈/📉/➡️ |
| Review blockers | [n] | [n] | 📈/📉/➡️ |
| Unplanned tasks | [n] | [n] | 📈/📉/➡️ |
---
## Suggested Constitution Updates
Based on this retrospective, consider adding these principles:
1. **[Principle name]**: [Principle description]
_Learned from: [specific experience in this cycle]_
---
*Generated by `/speckit.retro` — Sprint retrospective for spec-driven development.*

# Code Review Report: [FEATURE NAME]
**Reviewer**: AI Agent (Staff Engineer Perspective)
**Date**: [DATE]
**Feature**: [Link to spec.md]
**Branch**: [Feature branch name]
**Verdict**: [✅ APPROVED / ⚠️ APPROVED WITH CONDITIONS / ❌ CHANGES REQUIRED]
---
## Executive Summary
[One-paragraph overall assessment of the implementation quality, key strengths, and primary concerns.]
---
## Review Findings
| ID | Severity | File | Line(s) | Category | Finding | Recommendation |
|----|----------|------|---------|----------|---------|----------------|
| R001 | 🔴 Blocker | [file] | [lines] | [category] | [description] | [fix suggestion] |
| R002 | 🟡 Warning | [file] | [lines] | [category] | [description] | [fix suggestion] |
| R003 | 🟢 Suggestion | [file] | [lines] | [category] | [description] | [fix suggestion] |
**Categories**: Correctness, Security, Performance, Spec Compliance, Error Handling, Test Quality, Architecture
---
## Spec Coverage Matrix
| Requirement | Status | Implementation Notes |
|-------------|--------|---------------------|
| FR-001: [requirement] | ✅ Implemented | [notes] |
| FR-002: [requirement] | ⚠️ Partial | [what's missing] |
| FR-003: [requirement] | ❌ Missing | [recommendation] |
**Coverage**: [X]/[Y] requirements implemented ([Z]%)
---
## Test Coverage Assessment
| Area | Tests Exist? | Coverage | Gaps |
|------|-------------|----------|------|
| [Module/Feature] | ✅/❌ | [%] | [untested paths] |
---
## Metrics Summary
| Metric | Value |
|--------|-------|
| Files reviewed | [count] |
| 🔴 Blockers | [count] |
| 🟡 Warnings | [count] |
| 🟢 Suggestions | [count] |
| Spec coverage | [%] |
| Test coverage | [%] |
---
## Recommended Actions
### Must Fix (Blockers)
1. [Action item with specific file and fix description]
### Should Fix (Warnings)
1. [Action item]
### Nice to Fix (Suggestions)
1. [Action item]
---
*Generated by `/speckit.review` — Staff-level code review for spec-driven development.*

# Release: [FEATURE NAME]
**Date**: [DATE]
**Branch**: [feature_branch] → [target_branch]
**PR**: [#number — title](URL)
**Status**: [🚀 Shipped / ⏳ Pending Review / ❌ Blocked]
---
## Summary
[One-paragraph summary of the shipped feature, derived from spec.md overview.]
---
## Changelog Entry
### Added
- [New feature or capability from spec]
### Changed
- [Modification to existing behavior]
### Fixed
- [Bug fix discovered during implementation]
### Technical Notes
- [Key architecture decisions from plan.md]
---
## Readiness Check Results
| Check | Status | Details |
|-------|--------|---------|
| Tasks complete | ✅/❌ | [X]/[Y] tasks completed |
| Code review | ✅/⚠️/❌ | [Review verdict] |
| QA testing | ✅/⚠️/❌ | [QA verdict] |
| CI pipeline | ✅/❌ | [CI run ID or status] |
| Working tree | ✅/❌ | [Clean/dirty] |
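The working-tree check in the table above amounts to asking git for a porcelain status listing and treating empty output as clean. A sketch of one way to implement it (the helper names are hypothetical; `git status --porcelain` is the standard machine-readable form):

```python
import subprocess

def is_clean(porcelain: str) -> bool:
    """An empty `git status --porcelain` listing means a clean working tree."""
    return porcelain.strip() == ""

def working_tree_clean() -> bool:
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    return is_clean(out)
```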
---
## PR Description
### Summary
[Feature summary from spec.md]
### Specification
[Key requirements summary]
### Implementation
[Architecture decisions and completed tasks summary]
### Testing
[QA and test coverage summary]
### Review Notes
[Review findings summary and conditions]
### Checklist
- [ ] All tasks completed
- [ ] Code review passed
- [ ] QA testing passed
- [ ] CI pipeline green
- [ ] Changelog updated
- [ ] Documentation updated (if applicable)
---
## Artifacts
| Artifact | Path |
|----------|------|
| Specification | [FEATURE_DIR/spec.md] |
| Plan | [FEATURE_DIR/plan.md] |
| Tasks | [FEATURE_DIR/tasks.md] |
| Review | [FEATURE_DIR/reviews/review-{timestamp}.md] |
| QA Report | [FEATURE_DIR/qa/qa-{timestamp}.md] |
| This Release | [FEATURE_DIR/releases/release-{timestamp}.md] |
---
## Git Info
| Property | Value |
|----------|-------|
| Feature branch | [branch name] |
| Target branch | [main/develop] |
| Commit range | [base_sha..head_sha] |
| Commits | [count] |
| Files changed | [count] |
| Lines | +[additions] / -[deletions] |
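The file and line counts above can be read off `git diff --shortstat base_sha..head_sha`, whose single-line output looks like ` 12 files changed, 340 insertions(+), 25 deletions(-)`. A sketch of a parser for that line (hypothetical helper; handles git's singular/plural forms and omitted fields):

```python
import re

def parse_shortstat(line: str) -> tuple[int, int, int]:
    """Parse `git diff --shortstat` output into (files, additions, deletions)."""
    files = re.search(r"(\d+) files? changed", line)
    adds = re.search(r"(\d+) insertions?\(\+\)", line)
    dels = re.search(r"(\d+) deletions?\(-\)", line)
    return tuple(int(m.group(1)) if m else 0 for m in (files, adds, dels))

print(parse_shortstat(" 12 files changed, 340 insertions(+), 25 deletions(-)"))
```

Commit count, similarly, would come from `git rev-list --count base_sha..head_sha`.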
---
*Generated by `/speckit.ship` — Release automation for spec-driven development.*