From 8e23fee0944ca0820b4499077a11ff58be89edc8 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Sun, 25 Jan 2026 08:01:30 +0100 Subject: [PATCH 01/10] fix: Prevent mock data implementations with infrastructure features Problem: The coding agent can implement in-memory storage (e.g., `dev-store.ts` with `globalThis`) instead of a real database. These implementations pass all tests because data persists during runtime, but data is lost on server restart. This is a root cause for #68 - agent "passes" features that don't actually work. Solution: 1. Add 5 mandatory Infrastructure Features (indices 0-4) that run first: - Feature 0: Database connection established - Feature 1: Database schema applied correctly - Feature 2: Data persists across server restart (CRITICAL) - Feature 3: No mock data patterns in codebase - Feature 4: Backend API queries real database 2. Add STEP 5.7: Server Restart Persistence Test to coding prompt: - Create test data, stop server, restart, verify data still exists 3. 
Extend grep patterns for mock detection in STEP 5.6: - globalThis., devStore, dev-store, mockData, fakeData - TODO.*real, STUB, MOCK, new Map() as data stores Changes: - .claude/templates/initializer_prompt.template.md - Infrastructure features - .claude/templates/coding_prompt.template.md - STEP 5.6/5.7 enhancements - .claude/commands/create-spec.md - Phase 3b database question Backwards Compatible: - Works with YOLO mode (uses bash/grep, not browser automation) - Stateless apps can skip database features via create-spec question Co-Authored-By: Claude Opus 4.5 --- .claude/commands/create-spec.md | 43 +++++- .claude/templates/coding_prompt.template.md | 73 ++++++++- .../templates/initializer_prompt.template.md | 139 +++++++++++++++--- 3 files changed, 223 insertions(+), 32 deletions(-) diff --git a/.claude/commands/create-spec.md b/.claude/commands/create-spec.md index f8cae28..f8a1b96 100644 --- a/.claude/commands/create-spec.md +++ b/.claude/commands/create-spec.md @@ -95,6 +95,27 @@ Ask the user about their involvement preference: **For Detailed Mode users**, ask specific tech questions about frontend, backend, database, etc. +### Phase 3b: Database Requirements (MANDATORY) + +**Always ask this question regardless of mode:** + +> "One foundational question about data storage: +> +> **Does this application need to store user data persistently?** +> +> 1. **Yes, needs a database** - Users create, save, and retrieve data (most apps) +> 2. **No, stateless** - Pure frontend, no data storage needed (calculators, static sites) +> 3. **Not sure** - Let me describe what I need and you decide" + +**Branching logic:** + +- **If "Yes" or "Not sure"**: Continue normally. The spec will include database in tech stack and the initializer will create 5 mandatory Infrastructure features (indices 0-4) to verify database connectivity and persistence. + +- **If "No, stateless"**: Note this in the spec. Skip database from tech stack. 
Infrastructure features will be simplified (no database persistence tests). Mark this clearly: + ```xml + none - stateless application + ``` + ## Phase 4: Features (THE MAIN PHASE) This is where you spend most of your time. Ask questions in plain language that anyone can answer. @@ -207,12 +228,23 @@ After gathering all features, **you** (the agent) should tally up the testable f **Typical ranges for reference:** -- **Simple apps** (todo list, calculator, notes): ~20-50 features -- **Medium apps** (blog, task manager with auth): ~100 features -- **Advanced apps** (e-commerce, CRM, full SaaS): ~150-200 features +- **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure) +- **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure) +- **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure) These are just reference points - your actual count should come from the requirements discussed. +**MANDATORY: Infrastructure Features** + +If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4): +1. Database connection established +2. Database schema applied correctly +3. Data persists across server restart +4. No mock data patterns in codebase +5. Backend API queries real database + +These features ensure the coding agent implements a real database, not mock data or in-memory storage. + **How to count features:** For each feature area discussed, estimate the number of discrete, testable behaviors: @@ -225,17 +257,20 @@ For each feature area discussed, estimate the number of discrete, testable behav > "Based on what we discussed, here's my feature breakdown: > +> - **Infrastructure (required)**: 5 features (database setup, persistence verification) > - [Category 1]: ~X features > - [Category 2]: ~Y features > - [Category 3]: ~Z features > - ... 
> -> **Total: ~N features** +> **Total: ~N features** (including 5 infrastructure) > > Does this seem right, or should I adjust?" Let the user confirm or adjust. This becomes your `feature_count` for the spec. +**Important:** The first 5 features (indices 0-4) created by the initializer MUST be the Infrastructure category with no dependencies. All other features depend on these. + ## Phase 5: Technical Details (DERIVED OR DISCUSSED) **For Quick Mode users:** diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md index bce9a14..5d3ecb2 100644 --- a/.claude/templates/coding_prompt.template.md +++ b/.claude/templates/coding_prompt.template.md @@ -156,6 +156,9 @@ Use browser automation tools: - [ ] Deleted the test data - verified it's gone everywhere - [ ] NO unexplained data appeared (would indicate mock data) - [ ] Dashboard/counts reflect real numbers after my changes +- [ ] **Ran extended mock data grep (STEP 5.6) - no hits in src/ (excluding tests)** +- [ ] **Verified no globalThis, devStore, or dev-store patterns** +- [ ] **Server restart test passed (STEP 5.7) - data persists across restart** #### Navigation Verification @@ -174,10 +177,72 @@ Use browser automation tools: ### STEP 5.6: MOCK DATA DETECTION (Before marking passing) -1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"` -2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone -3. **Check database:** All displayed data must come from real DB queries -4. If unexplained data appears, it's mock data - fix before marking passing. +**Run ALL these grep checks. Any hits in src/ (excluding test files) require investigation:** + +```bash +# 1. In-memory storage patterns (CRITICAL - catches dev-store) +grep -r "globalThis\." 
--include="*.ts" --include="*.tsx" --include="*.js" src/ +grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" src/ + +# 2. Mock data variables +grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" src/ + +# 3. TODO/incomplete markers +grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" src/ + +# 4. Development-only conditionals +grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" src/ + +# 5. In-memory collections as data stores (check lib/store/data directories) +grep -r "new Map()\|new Set()" --include="*.ts" --include="*.tsx" src/lib/ src/store/ src/data/ 2>/dev/null +``` + +**Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing. + +**Runtime verification:** +1. Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone +2. Check database directly - all displayed data must come from real DB queries +3. If unexplained data appears, it's mock data - fix before marking passing. + +### STEP 5.7: SERVER RESTART PERSISTENCE TEST (MANDATORY for data features) + +**When required:** Any feature involving CRUD operations or data persistence. + +**This test is NON-NEGOTIABLE. It catches in-memory storage implementations that pass all other tests.** + +**Steps:** + +1. Create unique test data via UI or API (e.g., item named "RESTART_TEST_12345") +2. Verify data appears in UI and API response + +3. **STOP the server completely:** + ```bash + pkill -f "node" || pkill -f "npm" || pkill -f "next" + sleep 5 + # Verify server is stopped + pgrep -f "node" && echo "ERROR: Server still running!" && exit 1 + ``` + +4. **RESTART the server:** + ```bash + ./init.sh & + sleep 15 # Allow server to fully start + ``` + +5. 
**Query for test data - it MUST still exist** + - Via UI: Navigate to data location, verify data appears + - Via API: `curl http://localhost:PORT/api/items` - verify data in response + +6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL + - Search for: `grep -r "globalThis\|devStore\|dev-store" src/` + - You MUST fix the mock data implementation before proceeding + - Replace in-memory storage with real database queries + +7. **Clean up test data** after successful verification + +**Why this test exists:** In-memory stores like `globalThis.devStore` pass all other tests because data persists during a single server run. Only a full server restart reveals this bug. Skipping this step WILL allow dev-store implementations to slip through. + +**YOLO Mode Note:** Even in YOLO mode, this verification is MANDATORY for data features. Use curl instead of browser automation. ### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!) diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md index c6ee081..4594169 100644 --- a/.claude/templates/initializer_prompt.template.md +++ b/.claude/templates/initializer_prompt.template.md @@ -36,9 +36,9 @@ Use the feature_create_bulk tool to add all features at once. You can create fea - Feature count must match the `feature_count` specified in app_spec.txt - Reference tiers for other projects: - - **Simple apps**: ~150 tests - - **Medium apps**: ~250 tests - - **Complex apps**: ~400+ tests + - **Simple apps**: ~155 tests (includes 5 infrastructure) + - **Medium apps**: ~255 tests (includes 5 infrastructure) + - **Complex apps**: ~405+ tests (includes 5 infrastructure) - Both "functional" and "style" categories - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps) - At least 25 tests MUST have 10+ steps each (more for complex apps) @@ -60,8 +60,9 @@ Dependencies enable **parallel execution** of independent features. When specifi 2. 
**Can only depend on EARLIER features** (index must be less than current position) 3. **No circular dependencies** allowed 4. **Maximum 20 dependencies** per feature -5. **Foundation features (index 0-9)** should have NO dependencies -6. **60% of features after index 10** should have at least one dependency +5. **Infrastructure features (indices 0-4)** have NO dependencies - they run FIRST +6. **ALL features after index 4** MUST depend on `[0, 1, 2, 3, 4]` (infrastructure) +7. **60% of features after index 10** should have additional dependencies beyond infrastructure ### Dependency Types @@ -82,30 +83,107 @@ Create WIDE dependency graphs, not linear chains: ```json [ - // FOUNDATION TIER (indices 0-2, no dependencies) - run first - { "name": "App loads without errors", "category": "functional" }, - { "name": "Navigation bar displays", "category": "style" }, - { "name": "Homepage renders correctly", "category": "functional" }, + // INFRASTRUCTURE TIER (indices 0-4, no dependencies) - MUST run first + { "name": "Database connection established", "category": "functional" }, + { "name": "Database schema applied correctly", "category": "functional" }, + { "name": "Data persists across server restart", "category": "functional" }, + { "name": "No mock data patterns in codebase", "category": "functional" }, + { "name": "Backend API queries real database", "category": "functional" }, - // AUTH TIER (indices 3-5, depend on foundation) - run in parallel - { "name": "User can register", "depends_on_indices": [0] }, - { "name": "User can login", "depends_on_indices": [0, 3] }, - { "name": "User can logout", "depends_on_indices": [4] }, + // FOUNDATION TIER (indices 5-7, depend on infrastructure) + { "name": "App loads without errors", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] }, + { "name": "Navigation bar displays", "category": "style", "depends_on_indices": [0, 1, 2, 3, 4] }, + { "name": "Homepage renders correctly", "category": "functional", 
"depends_on_indices": [0, 1, 2, 3, 4] }, - // CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login - // All 4 start as soon as login passes! - { "name": "User can create todo", "depends_on_indices": [4] }, - { "name": "User can view todos", "depends_on_indices": [4] }, - { "name": "User can edit todo", "depends_on_indices": [4, 6] }, - { "name": "User can delete todo", "depends_on_indices": [4, 6] }, + // AUTH TIER (indices 8-10, depend on foundation + infrastructure) + { "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] }, + { "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] }, + { "name": "User can logout", "depends_on_indices": [9] }, - // ADVANCED TIER (indices 10-11) - both depend on view, not each other - { "name": "User can filter todos", "depends_on_indices": [7] }, - { "name": "User can search todos", "depends_on_indices": [7] } + // CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login + { "name": "User can create todo", "depends_on_indices": [9] }, + { "name": "User can view todos", "depends_on_indices": [9] }, + { "name": "User can edit todo", "depends_on_indices": [9, 11] }, + { "name": "User can delete todo", "depends_on_indices": [9, 11] }, + + // ADVANCED TIER (indices 15-16) - both depend on view, not each other + { "name": "User can filter todos", "depends_on_indices": [12] }, + { "name": "User can search todos", "depends_on_indices": [12] } ] ``` -**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles. +**Result:** With 3 parallel agents, this project completes efficiently with proper database validation first. + +--- + +## MANDATORY INFRASTRUCTURE FEATURES (Indices 0-4) + +**CRITICAL:** Create these FIRST, before any functional features. These features ensure the application uses a real database, not mock data or in-memory storage. 
+ +| Index | Name | Test Steps | +|-------|------|------------| +| 0 | Database connection established | Start server → check logs for DB connection → health endpoint returns DB status | +| 1 | Database schema applied correctly | Connect to DB directly → list tables → verify schema matches spec | +| 2 | Data persists across server restart | Create via API → STOP server completely → START server → query API → data still exists | +| 3 | No mock data patterns in codebase | Run grep for prohibited patterns → must return empty | +| 4 | Backend API queries real database | Check server logs → SQL/DB queries appear for API calls | + +**ALL other features MUST depend on indices [0, 1, 2, 3, 4].** + +### Infrastructure Feature Descriptions + +**Feature 0 - Database connection established:** +``` +Steps: +1. Start the development server +2. Check server logs for database connection message +3. Call health endpoint (e.g., GET /api/health) +4. Verify response includes database status: connected +``` + +**Feature 1 - Database schema applied correctly:** +``` +Steps: +1. Connect to database directly (sqlite3, psql, etc.) +2. List all tables in the database +3. Verify tables match what's defined in app_spec.txt +4. Verify key columns exist on each table +``` + +**Feature 2 - Data persists across server restart (CRITICAL):** +``` +Steps: +1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345") +2. Verify data appears in API response (GET /api/items) +3. STOP the server completely: pkill -f "node" && sleep 5 +4. Verify server is stopped: pgrep -f "node" returns nothing +5. RESTART the server: ./init.sh & sleep 15 +6. Query API again: GET /api/items +7. Verify "RESTART_TEST_12345" still exists +8. If data is GONE → CRITICAL FAILURE (in-memory storage detected) +9. Clean up test data +``` + +**Feature 3 - No mock data patterns in codebase:** +``` +Steps: +1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" src/ +2. 
Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" src/ +3. Run: grep -r "mockData\|fakeData\|sampleData\|dummyData" --include="*.ts" src/ +4. Run: grep -r "new Map()\|new Set()" --include="*.ts" src/lib/ src/store/ src/data/ +5. ALL grep commands must return empty (exit code 1) +6. If any returns results → investigate and fix before passing +``` + +**Feature 4 - Backend API queries real database:** +``` +Steps: +1. Start server with verbose logging +2. Make API call (e.g., GET /api/items) +3. Check server logs +4. Verify SQL query appears (SELECT, INSERT, etc.) or ORM query log +5. If no DB queries in logs → implementation is using mock data +``` --- @@ -117,6 +195,7 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou | Category | Simple | Medium | Complex | | -------------------------------- | ------- | ------- | -------- | +| **0. Infrastructure (REQUIRED)** | 5 | 5 | 5 | | A. Security & Access Control | 5 | 20 | 40 | | B. Navigation Integrity | 15 | 25 | 40 | | C. Real Data Verification | 20 | 30 | 50 | @@ -137,12 +216,14 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou | R. Concurrency & Race Conditions | 5 | 8 | 15 | | S. Export/Import | 5 | 6 | 10 | | T. Performance | 5 | 5 | 10 | -| **TOTAL** | **150** | **250** | **400+** | +| **TOTAL** | **155** | **255** | **405+** | --- ### Category Descriptions +**0. Infrastructure (REQUIRED - Priority 0)** - Database connectivity, schema existence, data persistence across server restart, absence of mock patterns. These features MUST pass before any functional features can begin. All tiers require exactly 5 infrastructure features (indices 0-4). + **A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users. **B. 
Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects. @@ -205,6 +286,16 @@ The feature_list.json must include tests that **actively verify real data** and - `setTimeout` simulating API delays with static data - Static returns instead of database queries +**Additional prohibited patterns (in-memory stores):** + +- `globalThis.` (in-memory storage pattern) +- `dev-store`, `devStore`, `DevStore` (development stores) +- `json-server`, `mirage`, `msw` (mock backends) +- `Map()` or `Set()` used as primary data store +- Environment checks like `if (process.env.NODE_ENV === 'development')` for data routing + +**Why this matters:** In-memory stores (like `globalThis.devStore`) will pass simple tests because data persists during a single server run. But data is LOST on server restart, which is unacceptable for production. The Infrastructure features (0-4) specifically test for this by requiring data to survive a full server restart. 
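Reviewer note on the "Why this matters" paragraph: the failure mode can be reproduced in miniature. The sketch below is illustrative only and assumes nothing about the real project; `ItemStore`, `InMemoryStore`, `DurableStore`, and `survivesRestart` are hypothetical names, and the shared `databaseRows` array merely stands in for storage that outlives the server process. Each call to `bootServer()` models one fresh server run, which is roughly what the restart test in Feature 2 checks against a real server.

```typescript
// Illustrative sketch only: every name here is hypothetical and does not
// appear in the templates touched by this patch.

interface ItemStore {
  add(name: string): void;
  list(): string[];
}

// Anti-pattern: state lives inside the server process and dies with it.
class InMemoryStore implements ItemStore {
  private items: string[] = [];
  add(name: string): void {
    this.items.push(name);
  }
  list(): string[] {
    return this.items.slice();
  }
}

// Stand-in for a real database: rows are held outside the server instance.
class DurableStore implements ItemStore {
  private rows: string[];
  constructor(rows: string[]) {
    this.rows = rows;
  }
  add(name: string): void {
    this.rows.push(name);
  }
  list(): string[] {
    return this.rows.slice();
  }
}

// Each call to bootServer() models a fresh server process.
function survivesRestart(bootServer: () => ItemStore): boolean {
  const marker = "RESTART_TEST_12345";
  const firstRun = bootServer();
  firstRun.add(marker);
  // Within a single run both stores look identical to a test.
  if (firstRun.list().indexOf(marker) === -1) {
    throw new Error("write was not visible within a single run");
  }
  const secondRun = bootServer(); // simulated stop + restart
  return secondRun.list().indexOf(marker) !== -1;
}

const databaseRows: string[] = []; // plays the role of on-disk storage
const inMemoryResult = survivesRestart(() => new InMemoryStore());
const durableResult = survivesRestart(() => new DurableStore(databaseRows));
console.log("in-memory survives restart:", inMemoryResult);
console.log("durable survives restart:", durableResult);
```

Both stores pass the check made during the first run; only the second boot separates them, which is why the restart step cannot be replaced by a same-process test.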
+ --- **CRITICAL INSTRUCTION:** From dae16c3cca64dc2edd234eff61c23ed65cc1d671 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Sun, 25 Jan 2026 11:43:54 +0100 Subject: [PATCH 02/10] fix: Address CodeRabbit review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix math error in category totals (155→165, 255→265) - Fix example JSON to include [0,1,2,3,4] dependencies for all features - Add more robust server shutdown (SIGTERM then SIGKILL) - Add health check after server restart - Align grep patterns between templates (add .js, testData, TODO/STUB/MOCK) - Add package.json check for mock backend libraries - Reference STEP 5.6 instead of duplicating grep commands Co-Authored-By: Claude Opus 4.5 --- .claude/templates/coding_prompt.template.md | 16 ++++++--- .../templates/initializer_prompt.template.md | 35 ++++++++++--------- 2 files changed, 31 insertions(+), 20 deletions(-) diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md index 5d3ecb2..45af425 100644 --- a/.claude/templates/coding_prompt.template.md +++ b/.claude/templates/coding_prompt.template.md @@ -217,16 +217,24 @@ grep -r "new Map()\|new Set()" --include="*.ts" --include="*.tsx" src/lib/ src/s 3. **STOP the server completely:** ```bash + # Send SIGTERM first, then SIGKILL if needed pkill -f "node" || pkill -f "npm" || pkill -f "next" - sleep 5 + sleep 3 + pkill -9 -f "node" 2>/dev/null || true + sleep 2 # Verify server is stopped - pgrep -f "node" && echo "ERROR: Server still running!" && exit 1 + if pgrep -f "node" > /dev/null; then + echo "ERROR: Server still running!" + exit 1 + fi ``` 4. **RESTART the server:** ```bash ./init.sh & sleep 15 # Allow server to fully start + # Verify server is responding + curl -f http://localhost:3000/api/health || curl -f http://localhost:3000 || echo "WARNING: Health check failed" ``` 5. 
**Query for test data - it MUST still exist** @@ -234,8 +242,8 @@ grep -r "new Map()\|new Set()" --include="*.ts" --include="*.tsx" src/lib/ src/s - Via API: `curl http://localhost:PORT/api/items` - verify data in response 6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL - - Search for: `grep -r "globalThis\|devStore\|dev-store" src/` - - You MUST fix the mock data implementation before proceeding + - Run all grep commands from STEP 5.6 to identify the mock pattern + - You MUST fix the in-memory storage implementation before proceeding - Replace in-memory storage with real database queries 7. **Clean up test data** after successful verification diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md index 4594169..44140b6 100644 --- a/.claude/templates/initializer_prompt.template.md +++ b/.claude/templates/initializer_prompt.template.md @@ -36,8 +36,8 @@ Use the feature_create_bulk tool to add all features at once. 
You can create fea - Feature count must match the `feature_count` specified in app_spec.txt - Reference tiers for other projects: - - **Simple apps**: ~155 tests (includes 5 infrastructure) - - **Medium apps**: ~255 tests (includes 5 infrastructure) + - **Simple apps**: ~165 tests (includes 5 infrastructure) + - **Medium apps**: ~265 tests (includes 5 infrastructure) - **Complex apps**: ~405+ tests (includes 5 infrastructure) - Both "functional" and "style" categories - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps) @@ -98,17 +98,17 @@ Create WIDE dependency graphs, not linear chains: // AUTH TIER (indices 8-10, depend on foundation + infrastructure) { "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] }, { "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] }, - { "name": "User can logout", "depends_on_indices": [9] }, + { "name": "User can logout", "depends_on_indices": [0, 1, 2, 3, 4, 9] }, // CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login - { "name": "User can create todo", "depends_on_indices": [9] }, - { "name": "User can view todos", "depends_on_indices": [9] }, - { "name": "User can edit todo", "depends_on_indices": [9, 11] }, - { "name": "User can delete todo", "depends_on_indices": [9, 11] }, + { "name": "User can create todo", "depends_on_indices": [0, 1, 2, 3, 4, 9] }, + { "name": "User can view todos", "depends_on_indices": [0, 1, 2, 3, 4, 9] }, + { "name": "User can edit todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] }, + { "name": "User can delete todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] }, // ADVANCED TIER (indices 15-16) - both depend on view, not each other - { "name": "User can filter todos", "depends_on_indices": [12] }, - { "name": "User can search todos", "depends_on_indices": [12] } + { "name": "User can filter todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] }, + { "name": "User can search todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] } ] ``` 
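Reviewer note on the example JSON above: the dependency rules stated in the template (infrastructure indices 0-4 have no dependencies, every later feature lists `[0, 1, 2, 3, 4]`, dependencies point only at earlier indices, at most 20 per feature) are mechanical enough to check automatically. The following is a minimal sketch under those assumptions; `FeatureSpec` and `validateDependencies` are hypothetical names and not part of the feature tooling.

```typescript
// Hypothetical checker for the dependency rules described in the template.
// It is a sketch, not the project's actual validation logic.

interface FeatureSpec {
  name: string;
  depends_on_indices?: number[];
}

const INFRA_INDICES = [0, 1, 2, 3, 4];
const MAX_DEPENDENCIES = 20;

function validateDependencies(features: FeatureSpec[]): string[] {
  const errors: string[] = [];
  features.forEach((feature, index) => {
    const deps = feature.depends_on_indices || [];

    if (deps.length > MAX_DEPENDENCIES) {
      errors.push(`#${index}: more than ${MAX_DEPENDENCIES} dependencies`);
    }
    // Earlier-only dependencies also rule out cycles.
    for (const dep of deps) {
      if (dep < 0 || dep >= index) {
        errors.push(`#${index}: dependency ${dep} is not an earlier feature`);
      }
    }
    if (index <= 4) {
      if (deps.length > 0) {
        errors.push(`#${index}: infrastructure features must have no dependencies`);
      }
    } else {
      for (const infra of INFRA_INDICES) {
        if (deps.indexOf(infra) === -1) {
          errors.push(`#${index}: missing infrastructure dependency ${infra}`);
        }
      }
    }
  });
  return errors;
}

const infrastructure: FeatureSpec[] = [
  { name: "Database connection established" },
  { name: "Database schema applied correctly" },
  { name: "Data persists across server restart" },
  { name: "No mock data patterns in codebase" },
  { name: "Backend API queries real database" },
];

const validFeatures: FeatureSpec[] = infrastructure.concat([
  { name: "App loads without errors", depends_on_indices: [0, 1, 2, 3, 4] },
  { name: "User can register", depends_on_indices: [0, 1, 2, 3, 4, 5] },
]);

// Same list, but the last feature forgets the infrastructure indices,
// as "User can logout" did before this patch.
const brokenFeatures: FeatureSpec[] = infrastructure.concat([
  { name: "App loads without errors", depends_on_indices: [0, 1, 2, 3, 4] },
  { name: "User can logout", depends_on_indices: [5] },
]);

const validErrors = validateDependencies(validFeatures);
const brokenErrors = validateDependencies(brokenFeatures);
console.log(validErrors.length, brokenErrors);
```

Run against the pre-fix example, a check like this would have flagged each feature that listed only `[9]`, `[9, 11]`, or `[12]`.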
@@ -167,12 +167,15 @@ Steps: **Feature 3 - No mock data patterns in codebase:** ``` Steps: -1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" src/ -2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" src/ -3. Run: grep -r "mockData\|fakeData\|sampleData\|dummyData" --include="*.ts" src/ -4. Run: grep -r "new Map()\|new Set()" --include="*.ts" src/lib/ src/store/ src/data/ -5. ALL grep commands must return empty (exit code 1) -6. If any returns results → investigate and fix before passing +1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/ +2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" src/ +3. Run: grep -r "mockData\|testData\|fakeData\|sampleData\|dummyData" --include="*.ts" --include="*.tsx" src/ +4. Run: grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" src/ +5. Run: grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" src/ +6. Run: grep -r "new Map()\|new Set()" --include="*.ts" --include="*.tsx" src/lib/ src/store/ src/data/ 2>/dev/null +7. Run: grep -E "json-server|miragejs|msw" package.json +8. ALL grep commands must return empty (exit code 1) +9. If any returns results → investigate and fix before passing ``` **Feature 4 - Backend API queries real database:** @@ -216,7 +219,7 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou | R. Concurrency & Race Conditions | 5 | 8 | 15 | | S. Export/Import | 5 | 6 | 10 | | T. 
Performance | 5 | 5 | 10 | -| **TOTAL** | **155** | **255** | **405+** | +| **TOTAL** | **165** | **265** | **405+** | --- From e7564865154ba9f301fdb68fd5c67ea9606860c1 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Sun, 25 Jan 2026 11:50:35 +0100 Subject: [PATCH 03/10] fix: Address remaining CodeRabbit feedback - Escape parentheses in grep patterns: new Map\(\) and new Set\(\) - Add --include="*.js" to all grep commands for complete coverage Co-Authored-By: Claude Opus 4.5 --- .claude/templates/coding_prompt.template.md | 10 +++++----- .claude/templates/initializer_prompt.template.md | 10 +++++----- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md index 45af425..4c83ea1 100644 --- a/.claude/templates/coding_prompt.template.md +++ b/.claude/templates/coding_prompt.template.md @@ -182,19 +182,19 @@ Use browser automation tools: ```bash # 1. In-memory storage patterns (CRITICAL - catches dev-store) grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/ -grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" src/ +grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" src/ # 2. Mock data variables -grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" src/ +grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" --include="*.js" src/ # 3. TODO/incomplete markers -grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" src/ +grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" src/ # 4. 
Development-only conditionals -grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" src/ +grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/ # 5. In-memory collections as data stores (check lib/store/data directories) -grep -r "new Map()\|new Set()" --include="*.ts" --include="*.tsx" src/lib/ src/store/ src/data/ 2>/dev/null +grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/lib/ src/store/ src/data/ 2>/dev/null ``` **Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing. diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md index 44140b6..7d8db3e 100644 --- a/.claude/templates/initializer_prompt.template.md +++ b/.claude/templates/initializer_prompt.template.md @@ -168,11 +168,11 @@ Steps: ``` Steps: 1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/ -2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" src/ -3. Run: grep -r "mockData\|testData\|fakeData\|sampleData\|dummyData" --include="*.ts" --include="*.tsx" src/ -4. Run: grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" src/ -5. Run: grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" src/ -6. Run: grep -r "new Map()\|new Set()" --include="*.ts" --include="*.tsx" src/lib/ src/store/ src/data/ 2>/dev/null +2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" src/ +3. Run: grep -r "mockData\|testData\|fakeData\|sampleData\|dummyData" --include="*.ts" --include="*.tsx" --include="*.js" src/ +4. 
Run: grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" src/ +5. Run: grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/ +6. Run: grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/lib/ src/store/ src/data/ 2>/dev/null 7. Run: grep -E "json-server|miragejs|msw" package.json 8. ALL grep commands must return empty (exit code 1) 9. If any returns results → investigate and fix before passing From 95b0dfac83e7440d2ac35da6dd32902047ebecb9 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Sun, 25 Jan 2026 20:11:06 +0100 Subject: [PATCH 04/10] fix: Health check now fails script on server startup failure Changed from warning-only to proper error handling: - if server doesn't respond after restart, exit with error - prevents false negatives when server fails to start Co-Authored-By: Claude Opus 4.5 --- .claude/templates/coding_prompt.template.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md index 4c83ea1..e7f1c2f 100644 --- a/.claude/templates/coding_prompt.template.md +++ b/.claude/templates/coding_prompt.template.md @@ -234,7 +234,10 @@ grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include= ./init.sh & sleep 15 # Allow server to fully start # Verify server is responding - curl -f http://localhost:3000/api/health || curl -f http://localhost:3000 || echo "WARNING: Health check failed" + if ! curl -f http://localhost:3000/api/health && ! curl -f http://localhost:3000; then + echo "ERROR: Server failed to start after restart" + exit 1 + fi ``` 5. 
**Query for test data - it MUST still exist** From cd9f5b76cfbb2fcc1f7d3eee0310fef617b6e760 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Mon, 26 Jan 2026 20:49:35 +0100 Subject: [PATCH 05/10] fix: Address Leon's review - safer process killing and cross-platform support Changes: - Replace pkill -f "node" with port-based killing (lsof -ti :PORT) - Safer: only kills dev server, not VS Code/Claude Code/other Node apps - More specific: targets exact port instead of all Node processes - Add Windows alternative commands (commented, for reference) - Use ${PORT:-3000} variable instead of hardcoded port 3000 - Update health check and API verification to use PORT variable Co-Authored-By: Claude Opus 4.5 --- .claude/templates/coding_prompt.template.md | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md index e7f1c2f..360c210 100644 --- a/.claude/templates/coding_prompt.template.md +++ b/.claude/templates/coding_prompt.template.md @@ -217,14 +217,20 @@ grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include= 3. **STOP the server completely:** ```bash - # Send SIGTERM first, then SIGKILL if needed - pkill -f "node" || pkill -f "npm" || pkill -f "next" + # Kill by port (safer - only kills the dev server, not VS Code/Claude Code/etc.) + # Unix/macOS: + lsof -ti :${PORT:-3000} | xargs kill -TERM 2>/dev/null || true sleep 3 - pkill -9 -f "node" 2>/dev/null || true + lsof -ti :${PORT:-3000} | xargs kill -9 2>/dev/null || true sleep 2 + + # Windows alternative (use if lsof not available): + # netstat -ano | findstr :${PORT:-3000} | findstr LISTENING + # taskkill /F /PID 2>nul + # Verify server is stopped - if pgrep -f "node" > /dev/null; then - echo "ERROR: Server still running!" + if lsof -ti :${PORT:-3000} > /dev/null 2>&1; then + echo "ERROR: Server still running on port ${PORT:-3000}!" 
exit 1 fi ``` @@ -234,7 +240,7 @@ grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include= ./init.sh & sleep 15 # Allow server to fully start # Verify server is responding - if ! curl -f http://localhost:3000/api/health && ! curl -f http://localhost:3000; then + if ! curl -f http://localhost:${PORT:-3000}/api/health && ! curl -f http://localhost:${PORT:-3000}; then echo "ERROR: Server failed to start after restart" exit 1 fi @@ -242,7 +248,7 @@ grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include= 5. **Query for test data - it MUST still exist** - Via UI: Navigate to data location, verify data appears - - Via API: `curl http://localhost:PORT/api/items` - verify data in response + - Via API: `curl http://localhost:${PORT:-3000}/api/items` - verify data in response 6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL - Run all grep commands from STEP 5.6 to identify the mock pattern From d1233ad104de137bd2698f52b22d3e4d0148b301 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Mon, 26 Jan 2026 21:29:24 +0100 Subject: [PATCH 06/10] fix: Expand Map/Set grep search to entire src/ directory - Changed grep for "new Map()/new Set()" to search all of src/ - Previously only searched src/lib/, src/store/, src/data/ - Now consistent with other grep patterns that search entire src/ - Applied to both coding_prompt and initializer_prompt templates Co-Authored-By: Claude Opus 4.5 --- .claude/templates/coding_prompt.template.md | 4 ++-- .claude/templates/initializer_prompt.template.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md index 360c210..181be94 100644 --- a/.claude/templates/coding_prompt.template.md +++ b/.claude/templates/coding_prompt.template.md @@ -193,8 +193,8 @@ grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --i # 4. 
Development-only conditionals grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/ -# 5. In-memory collections as data stores (check lib/store/data directories) -grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/lib/ src/store/ src/data/ 2>/dev/null +# 5. In-memory collections as data stores +grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/ 2>/dev/null ``` **Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing. diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md index 7d8db3e..d2ded79 100644 --- a/.claude/templates/initializer_prompt.template.md +++ b/.claude/templates/initializer_prompt.template.md @@ -172,7 +172,7 @@ Steps: 3. Run: grep -r "mockData\|testData\|fakeData\|sampleData\|dummyData" --include="*.ts" --include="*.tsx" --include="*.js" src/ 4. Run: grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" src/ 5. Run: grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/ -6. Run: grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/lib/ src/store/ src/data/ 2>/dev/null +6. Run: grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/ 2>/dev/null 7. Run: grep -E "json-server|miragejs|msw" package.json 8. ALL grep commands must return empty (exit code 1) 9. If any returns results → investigate and fix before passing From 03504b3c1a8974c6bfce7bfbbb4ec6cc57f7fe34 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Mon, 26 Jan 2026 22:26:24 +0100 Subject: [PATCH 07/10] fix: use port-based process killing for cross-platform safety Addresses reviewer feedback: 1. 
Windows Compatibility: Added Windows alternative using netstat/taskkill 2. Safer Process Killing: Changed from `pkill -f "node"` to port-based killing (`lsof -ti :$PORT`) to avoid killing unrelated Node processes like VS Code, Claude Code, or other development tools Co-Authored-By: Claude Opus 4.5 --- .claude/templates/initializer_prompt.template.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md index d2ded79..ccf1f98 100644 --- a/.claude/templates/initializer_prompt.template.md +++ b/.claude/templates/initializer_prompt.template.md @@ -155,8 +155,11 @@ Steps: Steps: 1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345") 2. Verify data appears in API response (GET /api/items) -3. STOP the server completely: pkill -f "node" && sleep 5 -4. Verify server is stopped: pgrep -f "node" returns nothing +3. STOP the server completely (kill by port to avoid killing unrelated Node processes): + - Unix/macOS: lsof -ti :$PORT | xargs kill -9 2>/dev/null || true && sleep 5 + - Windows: FOR /F "tokens=5" %a IN ('netstat -aon ^| find ":$PORT"') DO taskkill /F /PID %a 2>nul + - Note: Replace $PORT with actual port (e.g., 3000) +4. Verify server is stopped: lsof -ti :$PORT returns nothing (or netstat on Windows) 5. RESTART the server: ./init.sh & sleep 15 6. Query API again: GET /api/items 7. Verify "RESTART_TEST_12345" still exists From d652b1858769f8dba63f2932209d7fedc420fed6 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Tue, 27 Jan 2026 07:00:30 +0100 Subject: [PATCH 08/10] fix: add language tags to fenced code blocks per CodeRabbit/markdownlint Added 'text' language identifier to all fenced code blocks in the Infrastructure Feature Descriptions section to satisfy MD040. 
Co-Authored-By: Claude Opus 4.5 --- .claude/templates/initializer_prompt.template.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md index ccf1f98..08274ec 100644 --- a/.claude/templates/initializer_prompt.template.md +++ b/.claude/templates/initializer_prompt.template.md @@ -133,7 +133,7 @@ Create WIDE dependency graphs, not linear chains: ### Infrastructure Feature Descriptions **Feature 0 - Database connection established:** -``` +```text Steps: 1. Start the development server 2. Check server logs for database connection message @@ -142,7 +142,7 @@ Steps: ``` **Feature 1 - Database schema applied correctly:** -``` +```text Steps: 1. Connect to database directly (sqlite3, psql, etc.) 2. List all tables in the database @@ -151,7 +151,7 @@ Steps: ``` **Feature 2 - Data persists across server restart (CRITICAL):** -``` +```text Steps: 1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345") 2. Verify data appears in API response (GET /api/items) @@ -168,7 +168,7 @@ Steps: ``` **Feature 3 - No mock data patterns in codebase:** -``` +```text Steps: 1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/ 2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" src/ @@ -182,7 +182,7 @@ Steps: ``` **Feature 4 - Backend API queries real database:** -``` +```text Steps: 1. Start server with verbose logging 2. Make API call (e.g., GET /api/items) From 11cefec85be1bcf3fa61ca402cc6a43bfba85762 Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Wed, 28 Jan 2026 06:40:34 +0100 Subject: [PATCH 09/10] fix: add test file exclusions to mock data grep checks The comment said "excluding test files" but the grep commands didn't actually exclude them. Added common test file exclusion patterns. 
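A minimal sketch of the exclusion idea, not the exact commands from the templates: the function name `scan_mock_patterns` and the `__tests__` directory name are illustrative assumptions. It assumes a grep that supports `--include`, `--exclude` and `--exclude-dir` (GNU grep does). The patterns are quoted inline so the shell cannot glob-expand them against the current directory, and directory names are handled with `--exclude-dir` on the assumption that `--exclude` globs are matched against file names rather than directory paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of this patch): look for mock-data identifiers
# under a source tree while skipping test files and mock/test directories.
scan_mock_patterns() {
  src_dir="${1:-src}"
  # -r recurse, -n show line numbers, -E extended regex (plain | alternation).
  # Exclusions are listed after the inclusions so that, for a file such as
  # foo.test.ts which matches both, the exclusion is the one that applies.
  grep -rnE 'mockData|fakeData|sampleData|dummyData|testData' \
    --include='*.ts' --include='*.tsx' --include='*.js' \
    --exclude='*.test.*' --exclude='*.spec.*' \
    --exclude-dir='__tests__' --exclude-dir='__mocks__' \
    "$src_dir"
}
```

The function returns grep's own exit status, so a clean tree yields a non-zero status and no output, which matches the "ALL grep commands must return empty (exit code 1)" rule used by the infrastructure features. A call such as `scan_mock_patterns src/` would print any remaining hits in production code.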
Co-Authored-By: Claude Opus 4.5 --- .claude/templates/coding_prompt.template.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md index 181be94..70178a0 100644 --- a/.claude/templates/coding_prompt.template.md +++ b/.claude/templates/coding_prompt.template.md @@ -180,21 +180,24 @@ Use browser automation tools: **Run ALL these grep checks. Any hits in src/ (excluding test files) require investigation:** ```bash +# Common exclusions for test files +EXCLUDE="--exclude=*.test.* --exclude=*.spec.* --exclude=*__test__* --exclude=*__mocks__*" + # 1. In-memory storage patterns (CRITICAL - catches dev-store) -grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/ -grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" src/ +grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ +grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ # 2. Mock data variables -grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" --include="*.js" src/ +grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ # 3. TODO/incomplete markers -grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" src/ +grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ # 4. Development-only conditionals -grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/ +grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ # 5. 
In-memory collections as data stores -grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/ 2>/dev/null +grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ 2>/dev/null ``` **Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing. From 4cec4e63a49b9a869004bb10793c77c99129f05e Mon Sep 17 00:00:00 2001 From: cabana8471 Date: Thu, 29 Jan 2026 08:35:17 +0100 Subject: [PATCH 10/10] fix: standardize tier naming from 'Complex' to 'Advanced' for consistency Per CodeRabbit review - aligns with create-spec.md terminology. --- .claude/templates/initializer_prompt.template.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md index 08274ec..bb914e2 100644 --- a/.claude/templates/initializer_prompt.template.md +++ b/.claude/templates/initializer_prompt.template.md @@ -38,7 +38,7 @@ Use the feature_create_bulk tool to add all features at once. You can create fea - Reference tiers for other projects: - **Simple apps**: ~165 tests (includes 5 infrastructure) - **Medium apps**: ~265 tests (includes 5 infrastructure) - - **Complex apps**: ~405+ tests (includes 5 infrastructure) + - **Advanced apps**: ~405+ tests (includes 5 infrastructure) - Both "functional" and "style" categories - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps) - At least 25 tests MUST have 10+ steps each (more for complex apps) @@ -199,7 +199,7 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou ### Category Distribution by Complexity Tier -| Category | Simple | Medium | Complex | +| Category | Simple | Medium | Advanced | | -------------------------------- | ------- | ------- | -------- | | **0. Infrastructure (REQUIRED)** | 5 | 5 | 5 | | A. Security & Access Control | 5 | 20 | 40 |