fix: Prevent mock data implementations with infrastructure features

Problem: The coding agent can implement in-memory storage (e.g., `dev-store.ts` with `globalThis`) instead of a real database. These implementations pass all tests because data persists during runtime, but data is lost on server restart. This is a root cause for #68 - agent "passes" features that don't actually work. Solution: 1. Add 5 mandatory Infrastructure Features (indices 0-4) that run first: - Feature 0: Database connection established - Feature 1: Database schema applied correctly - Feature 2: Data persists across server restart (CRITICAL) - Feature 3: No mock data patterns in codebase - Feature 4: Backend API queries real database 2. Add STEP 5.7: Server Restart Persistence Test to coding prompt: - Create test data, stop server, restart, verify data still exists 3. Extend grep patterns for mock detection in STEP 5.6: - globalThis., devStore, dev-store, mockData, fakeData - TODO.*real, STUB, MOCK, new Map() as data stores Changes: - .claude/templates/initializer_prompt.template.md - Infrastructure features - .claude/templates/coding_prompt.template.md - STEP 5.6/5.7 enhancements - .claude/commands/create-spec.md - Phase 3b database question Backwards Compatible: - Works with YOLO mode (uses bash/grep, not browser automation) - Stateless apps can skip database features via create-spec question Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-18 03:13:08 +00:00 · 2026-01-25 08:01:30 +01:00
parent 486979c3d9
commit 8e23fee094
3 changed files with 223 additions and 32 deletions
--- a/.claude/commands/create-spec.md
+++ b/.claude/commands/create-spec.md
@@ -95,6 +95,27 @@ Ask the user about their involvement preference:

 **For Detailed Mode users**, ask specific tech questions about frontend, backend, database, etc.

+### Phase 3b: Database Requirements (MANDATORY)
+
+**Always ask this question regardless of mode:**
+
+> "One foundational question about data storage:
+>
+> **Does this application need to store user data persistently?**
+>
+> 1. **Yes, needs a database** - Users create, save, and retrieve data (most apps)
+> 2. **No, stateless** - Pure frontend, no data storage needed (calculators, static sites)
+> 3. **Not sure** - Let me describe what I need and you decide"
+
+**Branching logic:**
+
+- **If "Yes" or "Not sure"**: Continue normally. The spec will include database in tech stack and the initializer will create 5 mandatory Infrastructure features (indices 0-4) to verify database connectivity and persistence.
+
+- **If "No, stateless"**: Note this in the spec. Skip database from tech stack. Infrastructure features will be simplified (no database persistence tests). Mark this clearly:
+  ```xml
+  <database>none - stateless application</database>
+  ```
+
 ## Phase 4: Features (THE MAIN PHASE)

 This is where you spend most of your time. Ask questions in plain language that anyone can answer.
@@ -207,12 +228,23 @@ After gathering all features, **you** (the agent) should tally up the testable f

 **Typical ranges for reference:**

- **Simple apps** (todo list, calculator, notes): ~20-50 features
- **Medium apps** (blog, task manager with auth): ~100 features
- **Advanced apps** (e-commerce, CRM, full SaaS): ~150-200 features
+- **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure)
+- **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure)
+- **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure)

 These are just reference points - your actual count should come from the requirements discussed.

+**MANDATORY: Infrastructure Features**
+
+If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4):
+1. Database connection established
+2. Database schema applied correctly
+3. Data persists across server restart
+4. No mock data patterns in codebase
+5. Backend API queries real database
+
+These features ensure the coding agent implements a real database, not mock data or in-memory storage.
+
 **How to count features:**
 For each feature area discussed, estimate the number of discrete, testable behaviors:

@@ -225,17 +257,20 @@ For each feature area discussed, estimate the number of discrete, testable behav

 > "Based on what we discussed, here's my feature breakdown:
 >
+> - **Infrastructure (required)**: 5 features (database setup, persistence verification)
 > - [Category 1]: ~X features
 > - [Category 2]: ~Y features
 > - [Category 3]: ~Z features
 > - ...
 >
-> **Total: ~N features**
+> **Total: ~N features** (including 5 infrastructure)
 >
 > Does this seem right, or should I adjust?"

 Let the user confirm or adjust. This becomes your `feature_count` for the spec.

+**Important:** The first 5 features (indices 0-4) created by the initializer MUST be the Infrastructure category with no dependencies. All other features depend on these.
+
 ## Phase 5: Technical Details (DERIVED OR DISCUSSED)

 **For Quick Mode users:**
--- a/.claude/templates/coding_prompt.template.md
+++ b/.claude/templates/coding_prompt.template.md
@@ -156,6 +156,9 @@ Use browser automation tools:
 - [ ] Deleted the test data - verified it's gone everywhere
 - [ ] NO unexplained data appeared (would indicate mock data)
 - [ ] Dashboard/counts reflect real numbers after my changes
+- [ ] **Ran extended mock data grep (STEP 5.6) - no hits in src/ (excluding tests)**
+- [ ] **Verified no globalThis, devStore, or dev-store patterns**
+- [ ] **Server restart test passed (STEP 5.7) - data persists across restart**

 #### Navigation Verification

@@ -174,10 +177,72 @@ Use browser automation tools:

 ### STEP 5.6: MOCK DATA DETECTION (Before marking passing)

-1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"`
-2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
-3. **Check database:** All displayed data must come from real DB queries
-4. If unexplained data appears, it's mock data - fix before marking passing.
+**Run ALL these grep checks. Any hits in src/ (excluding test files) require investigation:**
+
+```bash
+# 1. In-memory storage patterns (CRITICAL - catches dev-store)
+grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/
+grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" src/
+
+# 2. Mock data variables
+grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" src/
+
+# 3. TODO/incomplete markers
+grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" src/
+
+# 4. Development-only conditionals
+grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" src/
+
+# 5. In-memory collections as data stores (check lib/store/data directories)
+grep -r "new Map()\|new Set()" --include="*.ts" --include="*.tsx" src/lib/ src/store/ src/data/ 2>/dev/null
+```
+
+**Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing.
+
+**Runtime verification:**
+1. Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
+2. Check database directly - all displayed data must come from real DB queries
+3. If unexplained data appears, it's mock data - fix before marking passing.
+
+### STEP 5.7: SERVER RESTART PERSISTENCE TEST (MANDATORY for data features)
+
+**When required:** Any feature involving CRUD operations or data persistence.
+
+**This test is NON-NEGOTIABLE. It catches in-memory storage implementations that pass all other tests.**
+
+**Steps:**
+
+1. Create unique test data via UI or API (e.g., item named "RESTART_TEST_12345")
+2. Verify data appears in UI and API response
+
+3. **STOP the server completely:**
+   ```bash
+   pkill -f "node" || pkill -f "npm" || pkill -f "next"
+   sleep 5
+   # Verify server is stopped
+   pgrep -f "node" && echo "ERROR: Server still running!" && exit 1
+   ```
+
+4. **RESTART the server:**
+   ```bash
+   ./init.sh &
+   sleep 15  # Allow server to fully start
+   ```
+
+5. **Query for test data - it MUST still exist**
+   - Via UI: Navigate to data location, verify data appears
+   - Via API: `curl http://localhost:PORT/api/items` - verify data in response
+
+6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL
+   - Search for: `grep -r "globalThis\|devStore\|dev-store" src/`
+   - You MUST fix the mock data implementation before proceeding
+   - Replace in-memory storage with real database queries
+
+7. **Clean up test data** after successful verification
+
+**Why this test exists:** In-memory stores like `globalThis.devStore` pass all other tests because data persists during a single server run. Only a full server restart reveals this bug. Skipping this step WILL allow dev-store implementations to slip through.
+
+**YOLO Mode Note:** Even in YOLO mode, this verification is MANDATORY for data features. Use curl instead of browser automation.

 ### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)

--- a/.claude/templates/initializer_prompt.template.md
+++ b/.claude/templates/initializer_prompt.template.md
@@ -36,9 +36,9 @@ Use the feature_create_bulk tool to add all features at once. You can create fea

 - Feature count must match the `feature_count` specified in app_spec.txt
 - Reference tiers for other projects:
-  - **Simple apps**: ~150 tests
-  - **Medium apps**: ~250 tests
-  - **Complex apps**: ~400+ tests
+  - **Simple apps**: ~155 tests (includes 5 infrastructure)
+  - **Medium apps**: ~255 tests (includes 5 infrastructure)
+  - **Complex apps**: ~405+ tests (includes 5 infrastructure)
 - Both "functional" and "style" categories
 - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
 - At least 25 tests MUST have 10+ steps each (more for complex apps)
@@ -60,8 +60,9 @@ Dependencies enable **parallel execution** of independent features. When specifi
 2. **Can only depend on EARLIER features** (index must be less than current position)
 3. **No circular dependencies** allowed
 4. **Maximum 20 dependencies** per feature
-5. **Foundation features (index 0-9)** should have NO dependencies
-6. **60% of features after index 10** should have at least one dependency
+5. **Infrastructure features (indices 0-4)** have NO dependencies - they run FIRST
+6. **ALL features after index 4** MUST depend on `[0, 1, 2, 3, 4]` (infrastructure)
+7. **60% of features after index 10** should have additional dependencies beyond infrastructure

 ### Dependency Types

@@ -82,30 +83,107 @@ Create WIDE dependency graphs, not linear chains:

 ```json
 [
-  // FOUNDATION TIER (indices 0-2, no dependencies) - run first
-  { "name": "App loads without errors", "category": "functional" },
-  { "name": "Navigation bar displays", "category": "style" },
-  { "name": "Homepage renders correctly", "category": "functional" },
+  // INFRASTRUCTURE TIER (indices 0-4, no dependencies) - MUST run first
+  { "name": "Database connection established", "category": "functional" },
+  { "name": "Database schema applied correctly", "category": "functional" },
+  { "name": "Data persists across server restart", "category": "functional" },
+  { "name": "No mock data patterns in codebase", "category": "functional" },
+  { "name": "Backend API queries real database", "category": "functional" },

-  // AUTH TIER (indices 3-5, depend on foundation) - run in parallel
-  { "name": "User can register", "depends_on_indices": [0] },
-  { "name": "User can login", "depends_on_indices": [0, 3] },
-  { "name": "User can logout", "depends_on_indices": [4] },
+  // FOUNDATION TIER (indices 5-7, depend on infrastructure)
+  { "name": "App loads without errors", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
+  { "name": "Navigation bar displays", "category": "style", "depends_on_indices": [0, 1, 2, 3, 4] },
+  { "name": "Homepage renders correctly", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },

-  // CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login
-  // All 4 start as soon as login passes!
-  { "name": "User can create todo", "depends_on_indices": [4] },
-  { "name": "User can view todos", "depends_on_indices": [4] },
-  { "name": "User can edit todo", "depends_on_indices": [4, 6] },
-  { "name": "User can delete todo", "depends_on_indices": [4, 6] },
+  // AUTH TIER (indices 8-10, depend on foundation + infrastructure)
+  { "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] },
+  { "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] },
+  { "name": "User can logout", "depends_on_indices": [9] },

-  // ADVANCED TIER (indices 10-11) - both depend on view, not each other
-  { "name": "User can filter todos", "depends_on_indices": [7] },
-  { "name": "User can search todos", "depends_on_indices": [7] }
+  // CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login
+  { "name": "User can create todo", "depends_on_indices": [9] },
+  { "name": "User can view todos", "depends_on_indices": [9] },
+  { "name": "User can edit todo", "depends_on_indices": [9, 11] },
+  { "name": "User can delete todo", "depends_on_indices": [9, 11] },
+
+  // ADVANCED TIER (indices 15-16) - both depend on view, not each other
+  { "name": "User can filter todos", "depends_on_indices": [12] },
+  { "name": "User can search todos", "depends_on_indices": [12] }
 ]
 ```

-**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
+**Result:** With 3 parallel agents, this project completes efficiently with proper database validation first.
+
+---
+
+## MANDATORY INFRASTRUCTURE FEATURES (Indices 0-4)
+
+**CRITICAL:** Create these FIRST, before any functional features. These features ensure the application uses a real database, not mock data or in-memory storage.
+
+| Index | Name | Test Steps |
+|-------|------|------------|
+| 0 | Database connection established | Start server → check logs for DB connection → health endpoint returns DB status |
+| 1 | Database schema applied correctly | Connect to DB directly → list tables → verify schema matches spec |
+| 2 | Data persists across server restart | Create via API → STOP server completely → START server → query API → data still exists |
+| 3 | No mock data patterns in codebase | Run grep for prohibited patterns → must return empty |
+| 4 | Backend API queries real database | Check server logs → SQL/DB queries appear for API calls |
+
+**ALL other features MUST depend on indices [0, 1, 2, 3, 4].**
+
+### Infrastructure Feature Descriptions
+
+**Feature 0 - Database connection established:**
+```
+Steps:
+1. Start the development server
+2. Check server logs for database connection message
+3. Call health endpoint (e.g., GET /api/health)
+4. Verify response includes database status: connected
+```
+
+**Feature 1 - Database schema applied correctly:**
+```
+Steps:
+1. Connect to database directly (sqlite3, psql, etc.)
+2. List all tables in the database
+3. Verify tables match what's defined in app_spec.txt
+4. Verify key columns exist on each table
+```
+
+**Feature 2 - Data persists across server restart (CRITICAL):**
+```
+Steps:
+1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345")
+2. Verify data appears in API response (GET /api/items)
+3. STOP the server completely: pkill -f "node" && sleep 5
+4. Verify server is stopped: pgrep -f "node" returns nothing
+5. RESTART the server: ./init.sh & sleep 15
+6. Query API again: GET /api/items
+7. Verify "RESTART_TEST_12345" still exists
+8. If data is GONE → CRITICAL FAILURE (in-memory storage detected)
+9. Clean up test data
+```
+
+**Feature 3 - No mock data patterns in codebase:**
+```
+Steps:
+1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" src/
+2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" src/
+3. Run: grep -r "mockData\|fakeData\|sampleData\|dummyData" --include="*.ts" src/
+4. Run: grep -r "new Map()\|new Set()" --include="*.ts" src/lib/ src/store/ src/data/
+5. ALL grep commands must return empty (exit code 1)
+6. If any returns results → investigate and fix before passing
+```
+
+**Feature 4 - Backend API queries real database:**
+```
+Steps:
+1. Start server with verbose logging
+2. Make API call (e.g., GET /api/items)
+3. Check server logs
+4. Verify SQL query appears (SELECT, INSERT, etc.) or ORM query log
+5. If no DB queries in logs → implementation is using mock data
+```

 ---

@@ -117,6 +195,7 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou

 | Category                         | Simple  | Medium  | Complex  |
 | -------------------------------- | ------- | ------- | -------- |
+| **0. Infrastructure (REQUIRED)** | 5       | 5       | 5        |
 | A. Security & Access Control     | 5       | 20      | 40       |
 | B. Navigation Integrity          | 15      | 25      | 40       |
 | C. Real Data Verification        | 20      | 30      | 50       |
@@ -137,12 +216,14 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
 | R. Concurrency & Race Conditions | 5       | 8       | 15       |
 | S. Export/Import                 | 5       | 6       | 10       |
 | T. Performance                   | 5       | 5       | 10       |
-| **TOTAL**                        | **150** | **250** | **400+** |
+| **TOTAL**                        | **155** | **255** | **405+** |

 ---

 ### Category Descriptions

+**0. Infrastructure (REQUIRED - Priority 0)** - Database connectivity, schema existence, data persistence across server restart, absence of mock patterns. These features MUST pass before any functional features can begin. All tiers require exactly 5 infrastructure features (indices 0-4).
+
 **A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.

 **B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.
@@ -205,6 +286,16 @@ The feature_list.json must include tests that **actively verify real data** and
 - `setTimeout` simulating API delays with static data
 - Static returns instead of database queries

+**Additional prohibited patterns (in-memory stores):**
+
+- `globalThis.` (in-memory storage pattern)
+- `dev-store`, `devStore`, `DevStore` (development stores)
+- `json-server`, `mirage`, `msw` (mock backends)
+- `Map()` or `Set()` used as primary data store
+- Environment checks like `if (process.env.NODE_ENV === 'development')` for data routing
+
+**Why this matters:** In-memory stores (like `globalThis.devStore`) will pass simple tests because data persists during a single server run. But data is LOST on server restart, which is unacceptable for production. The Infrastructure features (0-4) specifically test for this by requiring data to survive a full server restart.
+
 ---

 **CRITICAL INSTRUCTION:**