fix: Prevent mock data implementations with infrastructure features

Problem:
The coding agent can implement in-memory storage (e.g., `dev-store.ts` with `globalThis`) instead of a real database. These implementations pass all tests because data persists during runtime, but data is lost on server restart. This is a root cause for #68 - agent "passes" features that don't actually work.

Solution:
1. Add 5 mandatory Infrastructure Features (indices 0-4) that run first:
   - Feature 0: Database connection established
   - Feature 1: Database schema applied correctly
   - Feature 2: Data persists across server restart (CRITICAL)
   - Feature 3: No mock data patterns in codebase
   - Feature 4: Backend API queries real database
2. Add STEP 5.7: Server Restart Persistence Test to coding prompt:
   - Create test data, stop server, restart, verify data still exists
3. Extend grep patterns for mock detection in STEP 5.6:
   - globalThis., devStore, dev-store, mockData, fakeData
   - TODO.*real, STUB, MOCK, new Map() as data stores

Changes:
- .claude/templates/initializer_prompt.template.md - Infrastructure features
- .claude/templates/coding_prompt.template.md - STEP 5.6/5.7 enhancements
- .claude/commands/create-spec.md - Phase 3b database question

Backwards Compatible:
- Works with YOLO mode (uses bash/grep, not browser automation)
- Stateless apps can skip database features via create-spec question

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 15:23:37 +00:00 · 2026-01-25 08:01:30 +01:00
parent 486979c3d9
commit 8e23fee094
3 changed files with 223 additions and 32 deletions
--- a/.claude/templates/initializer_prompt.template.md
+++ b/.claude/templates/initializer_prompt.template.md
@@ -36,9 +36,9 @@ Use the feature_create_bulk tool to add all features at once. You can create fea

 - Feature count must match the `feature_count` specified in app_spec.txt
 - Reference tiers for other projects:
-  - **Simple apps**: ~150 tests
-  - **Medium apps**: ~250 tests
-  - **Complex apps**: ~400+ tests
+  - **Simple apps**: ~155 tests (includes 5 infrastructure)
+  - **Medium apps**: ~255 tests (includes 5 infrastructure)
+  - **Complex apps**: ~405+ tests (includes 5 infrastructure)
 - Both "functional" and "style" categories
 - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
 - At least 25 tests MUST have 10+ steps each (more for complex apps)
@@ -60,8 +60,9 @@ Dependencies enable **parallel execution** of independent features. When specifi
 2. **Can only depend on EARLIER features** (index must be less than current position)
 3. **No circular dependencies** allowed
 4. **Maximum 20 dependencies** per feature
-5. **Foundation features (index 0-9)** should have NO dependencies
-6. **60% of features after index 10** should have at least one dependency
+5. **Infrastructure features (indices 0-4)** have NO dependencies - they run FIRST
+6. **ALL features after index 4** MUST depend on `[0, 1, 2, 3, 4]` (infrastructure)
+7. **60% of features after index 10** should have additional dependencies beyond infrastructure

 ### Dependency Types

@@ -82,30 +83,107 @@ Create WIDE dependency graphs, not linear chains:

 ```json
 [
-  // FOUNDATION TIER (indices 0-2, no dependencies) - run first
-  { "name": "App loads without errors", "category": "functional" },
-  { "name": "Navigation bar displays", "category": "style" },
-  { "name": "Homepage renders correctly", "category": "functional" },
+  // INFRASTRUCTURE TIER (indices 0-4, no dependencies) - MUST run first
+  { "name": "Database connection established", "category": "functional" },
+  { "name": "Database schema applied correctly", "category": "functional" },
+  { "name": "Data persists across server restart", "category": "functional" },
+  { "name": "No mock data patterns in codebase", "category": "functional" },
+  { "name": "Backend API queries real database", "category": "functional" },

-  // AUTH TIER (indices 3-5, depend on foundation) - run in parallel
-  { "name": "User can register", "depends_on_indices": [0] },
-  { "name": "User can login", "depends_on_indices": [0, 3] },
-  { "name": "User can logout", "depends_on_indices": [4] },
+  // FOUNDATION TIER (indices 5-7, depend on infrastructure)
+  { "name": "App loads without errors", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
+  { "name": "Navigation bar displays", "category": "style", "depends_on_indices": [0, 1, 2, 3, 4] },
+  { "name": "Homepage renders correctly", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },

-  // CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login
-  // All 4 start as soon as login passes!
-  { "name": "User can create todo", "depends_on_indices": [4] },
-  { "name": "User can view todos", "depends_on_indices": [4] },
-  { "name": "User can edit todo", "depends_on_indices": [4, 6] },
-  { "name": "User can delete todo", "depends_on_indices": [4, 6] },
+  // AUTH TIER (indices 8-10, depend on foundation + infrastructure)
+  { "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] },
+  { "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] },
+  { "name": "User can logout", "depends_on_indices": [9] },

-  // ADVANCED TIER (indices 10-11) - both depend on view, not each other
-  { "name": "User can filter todos", "depends_on_indices": [7] },
-  { "name": "User can search todos", "depends_on_indices": [7] }
+  // CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login
+  { "name": "User can create todo", "depends_on_indices": [9] },
+  { "name": "User can view todos", "depends_on_indices": [9] },
+  { "name": "User can edit todo", "depends_on_indices": [9, 11] },
+  { "name": "User can delete todo", "depends_on_indices": [9, 11] },
+
+  // ADVANCED TIER (indices 15-16) - both depend on view, not each other
+  { "name": "User can filter todos", "depends_on_indices": [12] },
+  { "name": "User can search todos", "depends_on_indices": [12] }
 ]
 ```

-**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
+**Result:** With 3 parallel agents, this project completes efficiently with proper database validation first.
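The dependency rules above can be checked mechanically before a feature list is submitted. The following is a minimal sketch, not part of this change: `Feature`, `validateDependencies`, `INFRA_COUNT`, and `MAX_DEPS` are hypothetical names, and only `name`, `category`, and `depends_on_indices` come from the example. It assumes rule 6 is meant transitively, because the later tiers in the example reach indices 0-4 only through earlier features (for instance, "User can logout" lists just `[9]`).

```typescript
// Hypothetical shape; only these three field names appear in the example above.
interface Feature {
  name: string;
  category?: string;
  depends_on_indices?: number[];
}

const INFRA_COUNT = 5; // infrastructure features occupy indices 0-4
const MAX_DEPS = 20;   // "Maximum 20 dependencies per feature"

// Returns a list of human-readable problems; an empty list means the rules hold.
// Cycles cannot occur once every dependency points at an earlier index.
function validateDependencies(features: Feature[]): string[] {
  const errors: string[] = [];
  // reach[i][k] is true when feature i depends on infrastructure index k,
  // directly or through its own dependencies (an assumed reading of rule 6).
  const reach: boolean[][] = [];

  features.forEach((feature, index) => {
    const deps = feature.depends_on_indices || [];
    const label = `#${index} "${feature.name}"`;
    const covered: boolean[] = [];
    for (let k = 0; k < INFRA_COUNT; k++) covered.push(false);

    if (deps.length > MAX_DEPS) {
      errors.push(`${label}: more than ${MAX_DEPS} dependencies`);
    }
    deps.forEach((dep) => {
      if (Math.floor(dep) !== dep || dep < 0 || dep >= index) {
        errors.push(`${label}: dependency ${dep} is not an earlier feature`);
        return;
      }
      if (dep < INFRA_COUNT) covered[dep] = true;
      reach[dep].forEach((hit, k) => {
        if (hit) covered[k] = true;
      });
    });

    if (index < INFRA_COUNT) {
      if (deps.length > 0) {
        errors.push(`${label}: infrastructure features must have no dependencies`);
      }
    } else if (!covered.every((hit) => hit)) {
      errors.push(`${label}: does not reach all of [0, 1, 2, 3, 4]`);
    }
    reach.push(covered);
  });

  return errors;
}

const sample: Feature[] = [
  { name: "Database connection established" },
  { name: "Database schema applied correctly" },
  { name: "Data persists across server restart" },
  { name: "No mock data patterns in codebase" },
  { name: "Backend API queries real database" },
  { name: "App loads without errors", depends_on_indices: [0, 1, 2, 3, 4] },
  { name: "User can register", depends_on_indices: [5] }, // reaches 0-4 via 5
  { name: "Orphan feature" },                             // no path to 0-4
];

console.log(validateDependencies(sample)); // reports one problem, for index 7
```

If rule 6 is instead meant literally (every later feature lists all five indices itself), the `reach[dep]` propagation would be dropped, and several entries in the example above would then be flagged.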
+
+---
+
+## MANDATORY INFRASTRUCTURE FEATURES (Indices 0-4)
+
+**CRITICAL:** Create these FIRST, before any functional features. These features ensure the application uses a real database, not mock data or in-memory storage.
+
+| Index | Name | Test Steps |
+|-------|------|------------|
+| 0 | Database connection established | Start server → check logs for DB connection → health endpoint returns DB status |
+| 1 | Database schema applied correctly | Connect to DB directly → list tables → verify schema matches spec |
+| 2 | Data persists across server restart | Create via API → STOP server completely → START server → query API → data still exists |
+| 3 | No mock data patterns in codebase | Run grep for prohibited patterns → must return empty |
+| 4 | Backend API queries real database | Check server logs → SQL/DB queries appear for API calls |
+
+**ALL other features MUST depend on indices [0, 1, 2, 3, 4].**
+
+### Infrastructure Feature Descriptions
+
+**Feature 0 - Database connection established:**
+```
+Steps:
+1. Start the development server
+2. Check server logs for database connection message
+3. Call health endpoint (e.g., GET /api/health)
+4. Verify response includes database status: connected
+```
+
+**Feature 1 - Database schema applied correctly:**
+```
+Steps:
+1. Connect to database directly (sqlite3, psql, etc.)
+2. List all tables in the database
+3. Verify tables match what's defined in app_spec.txt
+4. Verify key columns exist on each table
+```
+
+**Feature 2 - Data persists across server restart (CRITICAL):**
+```
+Steps:
+1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345")
+2. Verify data appears in API response (GET /api/items)
+3. STOP the server completely: pkill -f "node" && sleep 5
+4. Verify server is stopped: pgrep -f "node" returns nothing
+5. RESTART the server: ./init.sh & sleep 15
+6. Query API again: GET /api/items
+7. Verify "RESTART_TEST_12345" still exists
+8. If data is GONE → CRITICAL FAILURE (in-memory storage detected)
+9. Clean up test data
+```
+
+**Feature 3 - No mock data patterns in codebase:**
+```
+Steps:
+1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" src/
+2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" src/
+3. Run: grep -r "mockData\|fakeData\|sampleData\|dummyData" --include="*.ts" src/
+4. Run: grep -r "new Map()\|new Set()" --include="*.ts" src/lib/ src/store/ src/data/
+5. ALL grep commands must return empty (exit code 1)
+6. If any returns results → investigate and fix before passing
+```
+
+**Feature 4 - Backend API queries real database:**
+```
+Steps:
+1. Start server with verbose logging
+2. Make API call (e.g., GET /api/items)
+3. Check server logs
+4. Verify SQL query appears (SELECT, INSERT, etc.) or ORM query log
+5. If no DB queries in logs → implementation is using mock data
+```
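The Feature 3 grep steps above can also be expressed as a single in-process check, which may be convenient where a test runner is preferred over shell commands. This is a minimal sketch under stated assumptions: `scanForMockPatterns`, `Finding`, and the pattern labels are hypothetical, the file contents are passed in as strings rather than read from disk, and every pattern is applied to every file (the grep version restricts `new Map()` and `new Set()` to `src/lib/`, `src/store/`, and `src/data/`). The regular expressions are transcribed from the grep patterns, with the parentheses escaped for JavaScript syntax.

```typescript
// Patterns transcribed from the Feature 3 grep steps; labels are illustrative.
const PROHIBITED: { label: string; pattern: RegExp }[] = [
  { label: "globalThis store", pattern: /globalThis\./ },
  { label: "dev store", pattern: /dev-store|devStore|DevStore|mock-db|mockDb/ },
  { label: "mock data", pattern: /mockData|fakeData|sampleData|dummyData/ },
  { label: "in-memory collection", pattern: /new Map\(\)|new Set\(\)/ },
];

interface Finding {
  file: string;
  line: number; // 1-based, like grep -n
  label: string;
  text: string;
}

// `files` maps a path to its source text. An empty result corresponds to
// every grep command returning no matches.
function scanForMockPatterns(files: { [path: string]: string }): Finding[] {
  const findings: Finding[] = [];
  Object.keys(files).forEach((file) => {
    files[file].split("\n").forEach((text, i) => {
      PROHIBITED.forEach(({ label, pattern }) => {
        if (pattern.test(text)) {
          findings.push({ file, line: i + 1, label, text: text.trim() });
        }
      });
    });
  });
  return findings;
}

const findings = scanForMockPatterns({
  "src/lib/dev-store.ts": "export const store = globalThis.devStore ?? new Map();",
  "src/lib/db.ts": "export const db = openDatabase(url);",
});

// Three findings, all on line 1 of dev-store.ts; db.ts is clean.
findings.forEach((f) => console.log(`${f.file}:${f.line} [${f.label}] ${f.text}`));
```

As with the grep steps, a match is a prompt to investigate rather than proof of a mock: a `new Map()` used as a short-lived cache would be reported the same way as one used as the primary data store.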

 ---

@@ -117,6 +195,7 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou

 | Category                         | Simple  | Medium  | Complex  |
 | -------------------------------- | ------- | ------- | -------- |
+| **0. Infrastructure (REQUIRED)** | 5       | 5       | 5        |
 | A. Security & Access Control     | 5       | 20      | 40       |
 | B. Navigation Integrity          | 15      | 25      | 40       |
 | C. Real Data Verification        | 20      | 30      | 50       |
@@ -137,12 +216,14 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
 | R. Concurrency & Race Conditions | 5       | 8       | 15       |
 | S. Export/Import                 | 5       | 6       | 10       |
 | T. Performance                   | 5       | 5       | 10       |
-| **TOTAL**                        | **150** | **250** | **400+** |
+| **TOTAL**                        | **155** | **255** | **405+** |

 ---

 ### Category Descriptions

+**0. Infrastructure (REQUIRED - Priority 0)** - Database connectivity, schema existence, data persistence across server restart, absence of mock patterns. These features MUST pass before any functional features can begin. All tiers require exactly 5 infrastructure features (indices 0-4).
+
 **A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.

 **B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.
@@ -205,6 +286,16 @@ The feature_list.json must include tests that **actively verify real data** and
 - `setTimeout` simulating API delays with static data
 - Static returns instead of database queries

+**Additional prohibited patterns (in-memory stores):**
+
+- `globalThis.` (in-memory storage pattern)
+- `dev-store`, `devStore`, `DevStore` (development stores)
+- `json-server`, `mirage`, `msw` (mock backends)
+- `Map()` or `Set()` used as primary data store
+- Environment checks like `if (process.env.NODE_ENV === 'development')` for data routing
+
+**Why this matters:** In-memory stores (like `globalThis.devStore`) will pass simple tests because data persists during a single server run. But data is LOST on server restart, which is unacceptable for production. The Infrastructure features (0-4) specifically test for this by requiring data to survive a full server restart.
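The reasoning above can be illustrated with a toy model. This is a sketch for illustration only, not the actual STEP 5.7 procedure: `Server`, `InMemoryServer`, `DiskBackedServer`, and `survivesRestart` are hypothetical names, a restart is modeled by resetting in-process state in `start()`, and a plain array that outlives the server object stands in for a database file.

```typescript
// Hypothetical interface standing in for "start server, call API, stop server".
interface Server {
  start(): void;
  stop(): void;
  create(name: string): void;
  list(): string[];
}

// Models a globalThis-style store: a fresh process starts with empty memory.
class InMemoryServer implements Server {
  private items: string[] = [];
  start(): void { this.items = []; }
  stop(): void {}
  create(name: string): void { this.items.push(name); }
  list(): string[] { return this.items.slice(); }
}

// Models a real database: the data lives outside the server process.
class DiskBackedServer implements Server {
  private disk: string[];
  constructor(disk: string[]) { this.disk = disk; }
  start(): void {}
  stop(): void {}
  create(name: string): void { this.disk.push(name); }
  list(): string[] { return this.disk.slice(); }
}

// Mirrors the Feature 2 steps: create, verify, stop, start, verify again.
function survivesRestart(server: Server, marker: string): boolean {
  server.start();
  server.create(marker);
  if (server.list().indexOf(marker) === -1) return false; // never stored at all
  server.stop();
  server.start();
  return server.list().indexOf(marker) !== -1;
}

console.log(survivesRestart(new InMemoryServer(), "RESTART_TEST_12345"));     // false
console.log(survivesRestart(new DiskBackedServer([]), "RESTART_TEST_12345")); // true
```

Both servers pass the check before the restart, which is why tests that run within a single server session cannot tell them apart.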
+
 ---

 **CRITICAL INSTRUCTION:**