diff --git a/.claude/commands/create-spec.md b/.claude/commands/create-spec.md
index f8cae28..f8a1b96 100644
--- a/.claude/commands/create-spec.md
+++ b/.claude/commands/create-spec.md
@@ -95,6 +95,27 @@ Ask the user about their involvement preference:
**For Detailed Mode users**, ask specific tech questions about frontend, backend, database, etc.
+### Phase 3b: Database Requirements (MANDATORY)
+
+**Always ask this question regardless of mode:**
+
+> "One foundational question about data storage:
+>
+> **Does this application need to store user data persistently?**
+>
+> 1. **Yes, needs a database** - Users create, save, and retrieve data (most apps)
+> 2. **No, stateless** - Pure frontend, no data storage needed (calculators, static sites)
+> 3. **Not sure** - Let me describe what I need and you decide"
+
+**Branching logic:**
+
+- **If "Yes" or "Not sure"**: Continue normally. The spec will include database in tech stack and the initializer will create 5 mandatory Infrastructure features (indices 0-4) to verify database connectivity and persistence.
+
+- **If "No, stateless"**: Note this in the spec. Skip database from tech stack. Infrastructure features will be simplified (no database persistence tests). Mark this clearly:
+ ```xml
+ none - stateless application
+ ```
+
## Phase 4: Features (THE MAIN PHASE)
This is where you spend most of your time. Ask questions in plain language that anyone can answer.
@@ -207,12 +228,23 @@ After gathering all features, **you** (the agent) should tally up the testable f
**Typical ranges for reference:**
-- **Simple apps** (todo list, calculator, notes): ~20-50 features
-- **Medium apps** (blog, task manager with auth): ~100 features
-- **Advanced apps** (e-commerce, CRM, full SaaS): ~150-200 features
+- **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure)
+- **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure)
+- **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure)
These are just reference points - your actual count should come from the requirements discussed.
+**MANDATORY: Infrastructure Features**
+
+If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4):
+1. Database connection established
+2. Database schema applied correctly
+3. Data persists across server restart
+4. No mock data patterns in codebase
+5. Backend API queries real database
+
+These features ensure the coding agent implements a real database, not mock data or in-memory storage.
+
**How to count features:**
For each feature area discussed, estimate the number of discrete, testable behaviors:
@@ -225,17 +257,20 @@ For each feature area discussed, estimate the number of discrete, testable behav
> "Based on what we discussed, here's my feature breakdown:
>
+> - **Infrastructure (required)**: 5 features (database setup, persistence verification)
> - [Category 1]: ~X features
> - [Category 2]: ~Y features
> - [Category 3]: ~Z features
> - ...
>
-> **Total: ~N features**
+> **Total: ~N features** (including 5 infrastructure)
>
> Does this seem right, or should I adjust?"
Let the user confirm or adjust. This becomes your `feature_count` for the spec.
+**Important:** The first 5 features (indices 0-4) created by the initializer MUST be the Infrastructure category with no dependencies. All other features depend on these.
+
## Phase 5: Technical Details (DERIVED OR DISCUSSED)
**For Quick Mode users:**
diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md
index d72b933..9322404 100644
--- a/.claude/templates/coding_prompt.template.md
+++ b/.claude/templates/coding_prompt.template.md
@@ -156,6 +156,9 @@ Use browser automation tools:
- [ ] Deleted the test data - verified it's gone everywhere
- [ ] NO unexplained data appeared (would indicate mock data)
- [ ] Dashboard/counts reflect real numbers after my changes
+- [ ] **Ran extended mock data grep (STEP 5.6) - no hits in src/ (excluding tests)**
+- [ ] **Verified no globalThis, devStore, or dev-store patterns**
+- [ ] **Server restart test passed (STEP 5.7) - data persists across restart**
#### Navigation Verification
@@ -174,10 +177,92 @@ Use browser automation tools:
### STEP 5.6: MOCK DATA DETECTION (Before marking passing)
-1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"`
-2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
-3. **Check database:** All displayed data must come from real DB queries
-4. If unexplained data appears, it's mock data - fix before marking passing.
+**Run ALL these grep checks. Any hits in src/ (excluding test files) require investigation:**
+
+```bash
+# Common exclusions for test files
+EXCLUDE="--exclude=*.test.* --exclude=*.spec.* --exclude=*__test__* --exclude=*__mocks__*"
+
+# 1. In-memory storage patterns (CRITICAL - catches dev-store)
+grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
+grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
+
+# 2. Mock data variables
+grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
+
+# 3. TODO/incomplete markers
+grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
+
+# 4. Development-only conditionals
+grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
+
+# 5. In-memory collections as data stores
+grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ 2>/dev/null
+```
+
+**Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing.
+
+**Runtime verification:**
+1. Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
+2. Check database directly - all displayed data must come from real DB queries
+3. If unexplained data appears, it's mock data - fix before marking passing.
+
+### STEP 5.7: SERVER RESTART PERSISTENCE TEST (MANDATORY for data features)
+
+**When required:** Any feature involving CRUD operations or data persistence.
+
+**This test is NON-NEGOTIABLE. It catches in-memory storage implementations that pass all other tests.**
+
+**Steps:**
+
+1. Create unique test data via UI or API (e.g., item named "RESTART_TEST_12345")
+2. Verify data appears in UI and API response
+
+3. **STOP the server completely:**
+ ```bash
+ # Kill by port (safer - only kills the dev server, not VS Code/Claude Code/etc.)
+ # Unix/macOS:
+ lsof -ti :${PORT:-3000} | xargs kill -TERM 2>/dev/null || true
+ sleep 3
+ lsof -ti :${PORT:-3000} | xargs kill -9 2>/dev/null || true
+ sleep 2
+
+ # Windows alternative (use if lsof not available):
+ # netstat -ano | findstr :${PORT:-3000} | findstr LISTENING
+ # taskkill /F /PID 2>nul
+
+ # Verify server is stopped
+ if lsof -ti :${PORT:-3000} > /dev/null 2>&1; then
+ echo "ERROR: Server still running on port ${PORT:-3000}!"
+ exit 1
+ fi
+ ```
+
+4. **RESTART the server:**
+ ```bash
+ ./init.sh &
+ sleep 15 # Allow server to fully start
+ # Verify server is responding
+ if ! curl -f http://localhost:${PORT:-3000}/api/health && ! curl -f http://localhost:${PORT:-3000}; then
+ echo "ERROR: Server failed to start after restart"
+ exit 1
+ fi
+ ```
+
+5. **Query for test data - it MUST still exist**
+ - Via UI: Navigate to data location, verify data appears
+ - Via API: `curl http://localhost:${PORT:-3000}/api/items` - verify data in response
+
+6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL
+ - Run all grep commands from STEP 5.6 to identify the mock pattern
+ - You MUST fix the in-memory storage implementation before proceeding
+ - Replace in-memory storage with real database queries
+
+7. **Clean up test data** after successful verification
+
+**Why this test exists:** In-memory stores like `globalThis.devStore` pass all other tests because data persists during a single server run. Only a full server restart reveals this bug. Skipping this step WILL allow dev-store implementations to slip through.
+
+**YOLO Mode Note:** Even in YOLO mode, this verification is MANDATORY for data features. Use curl instead of browser automation.
### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)
diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md
index c6ee081..bb914e2 100644
--- a/.claude/templates/initializer_prompt.template.md
+++ b/.claude/templates/initializer_prompt.template.md
@@ -36,9 +36,9 @@ Use the feature_create_bulk tool to add all features at once. You can create fea
- Feature count must match the `feature_count` specified in app_spec.txt
- Reference tiers for other projects:
- - **Simple apps**: ~150 tests
- - **Medium apps**: ~250 tests
- - **Complex apps**: ~400+ tests
+ - **Simple apps**: ~165 tests (includes 5 infrastructure)
+ - **Medium apps**: ~265 tests (includes 5 infrastructure)
+ - **Advanced apps**: ~405+ tests (includes 5 infrastructure)
- Both "functional" and "style" categories
- Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
- At least 25 tests MUST have 10+ steps each (more for complex apps)
@@ -60,8 +60,9 @@ Dependencies enable **parallel execution** of independent features. When specifi
2. **Can only depend on EARLIER features** (index must be less than current position)
3. **No circular dependencies** allowed
4. **Maximum 20 dependencies** per feature
-5. **Foundation features (index 0-9)** should have NO dependencies
-6. **60% of features after index 10** should have at least one dependency
+5. **Infrastructure features (indices 0-4)** have NO dependencies - they run FIRST
+6. **ALL features after index 4** MUST depend on `[0, 1, 2, 3, 4]` (infrastructure)
+7. **60% of features after index 10** should have additional dependencies beyond infrastructure
### Dependency Types
@@ -82,30 +83,113 @@ Create WIDE dependency graphs, not linear chains:
```json
[
- // FOUNDATION TIER (indices 0-2, no dependencies) - run first
- { "name": "App loads without errors", "category": "functional" },
- { "name": "Navigation bar displays", "category": "style" },
- { "name": "Homepage renders correctly", "category": "functional" },
+ // INFRASTRUCTURE TIER (indices 0-4, no dependencies) - MUST run first
+ { "name": "Database connection established", "category": "functional" },
+ { "name": "Database schema applied correctly", "category": "functional" },
+ { "name": "Data persists across server restart", "category": "functional" },
+ { "name": "No mock data patterns in codebase", "category": "functional" },
+ { "name": "Backend API queries real database", "category": "functional" },
- // AUTH TIER (indices 3-5, depend on foundation) - run in parallel
- { "name": "User can register", "depends_on_indices": [0] },
- { "name": "User can login", "depends_on_indices": [0, 3] },
- { "name": "User can logout", "depends_on_indices": [4] },
+ // FOUNDATION TIER (indices 5-7, depend on infrastructure)
+ { "name": "App loads without errors", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
+ { "name": "Navigation bar displays", "category": "style", "depends_on_indices": [0, 1, 2, 3, 4] },
+ { "name": "Homepage renders correctly", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
- // CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login
- // All 4 start as soon as login passes!
- { "name": "User can create todo", "depends_on_indices": [4] },
- { "name": "User can view todos", "depends_on_indices": [4] },
- { "name": "User can edit todo", "depends_on_indices": [4, 6] },
- { "name": "User can delete todo", "depends_on_indices": [4, 6] },
+ // AUTH TIER (indices 8-10, depend on foundation + infrastructure)
+ { "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] },
+ { "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] },
+ { "name": "User can logout", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
- // ADVANCED TIER (indices 10-11) - both depend on view, not each other
- { "name": "User can filter todos", "depends_on_indices": [7] },
- { "name": "User can search todos", "depends_on_indices": [7] }
+ // CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login
+ { "name": "User can create todo", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
+ { "name": "User can view todos", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
+ { "name": "User can edit todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] },
+ { "name": "User can delete todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] },
+
+ // ADVANCED TIER (indices 15-16) - both depend on view, not each other
+ { "name": "User can filter todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] },
+ { "name": "User can search todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] }
]
```
-**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
+**Result:** With 3 parallel agents, this project completes efficiently with proper database validation first.
+
+---
+
+## MANDATORY INFRASTRUCTURE FEATURES (Indices 0-4)
+
+**CRITICAL:** Create these FIRST, before any functional features. These features ensure the application uses a real database, not mock data or in-memory storage.
+
+| Index | Name | Test Steps |
+|-------|------|------------|
+| 0 | Database connection established | Start server → check logs for DB connection → health endpoint returns DB status |
+| 1 | Database schema applied correctly | Connect to DB directly → list tables → verify schema matches spec |
+| 2 | Data persists across server restart | Create via API → STOP server completely → START server → query API → data still exists |
+| 3 | No mock data patterns in codebase | Run grep for prohibited patterns → must return empty |
+| 4 | Backend API queries real database | Check server logs → SQL/DB queries appear for API calls |
+
+**ALL other features MUST depend on indices [0, 1, 2, 3, 4].**
+
+### Infrastructure Feature Descriptions
+
+**Feature 0 - Database connection established:**
+```text
+Steps:
+1. Start the development server
+2. Check server logs for database connection message
+3. Call health endpoint (e.g., GET /api/health)
+4. Verify response includes database status: connected
+```
+
+**Feature 1 - Database schema applied correctly:**
+```text
+Steps:
+1. Connect to database directly (sqlite3, psql, etc.)
+2. List all tables in the database
+3. Verify tables match what's defined in app_spec.txt
+4. Verify key columns exist on each table
+```
+
+**Feature 2 - Data persists across server restart (CRITICAL):**
+```text
+Steps:
+1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345")
+2. Verify data appears in API response (GET /api/items)
+3. STOP the server completely (kill by port to avoid killing unrelated Node processes):
+ - Unix/macOS: lsof -ti :$PORT | xargs kill -9 2>/dev/null || true && sleep 5
+ - Windows: FOR /F "tokens=5" %a IN ('netstat -aon ^| find ":$PORT"') DO taskkill /F /PID %a 2>nul
+ - Note: Replace $PORT with actual port (e.g., 3000)
+4. Verify server is stopped: lsof -ti :$PORT returns nothing (or netstat on Windows)
+5. RESTART the server: ./init.sh & sleep 15
+6. Query API again: GET /api/items
+7. Verify "RESTART_TEST_12345" still exists
+8. If data is GONE → CRITICAL FAILURE (in-memory storage detected)
+9. Clean up test data
+```
+
+**Feature 3 - No mock data patterns in codebase:**
+```text
+Steps:
+1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/
+2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" src/
+3. Run: grep -r "mockData\|testData\|fakeData\|sampleData\|dummyData" --include="*.ts" --include="*.tsx" --include="*.js" src/
+4. Run: grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" src/
+5. Run: grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/
+6. Run: grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/ 2>/dev/null
+7. Run: grep -E "json-server|miragejs|msw" package.json
+8. ALL grep commands must return empty (exit code 1)
+9. If any returns results → investigate and fix before passing
+```
+
+**Feature 4 - Backend API queries real database:**
+```text
+Steps:
+1. Start server with verbose logging
+2. Make API call (e.g., GET /api/items)
+3. Check server logs
+4. Verify SQL query appears (SELECT, INSERT, etc.) or ORM query log
+5. If no DB queries in logs → implementation is using mock data
+```
---
@@ -115,8 +199,9 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
### Category Distribution by Complexity Tier
-| Category | Simple | Medium | Complex |
+| Category | Simple | Medium | Advanced |
| -------------------------------- | ------- | ------- | -------- |
+| **0. Infrastructure (REQUIRED)** | 5 | 5 | 5 |
| A. Security & Access Control | 5 | 20 | 40 |
| B. Navigation Integrity | 15 | 25 | 40 |
| C. Real Data Verification | 20 | 30 | 50 |
@@ -137,12 +222,14 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
| R. Concurrency & Race Conditions | 5 | 8 | 15 |
| S. Export/Import | 5 | 6 | 10 |
| T. Performance | 5 | 5 | 10 |
-| **TOTAL** | **150** | **250** | **400+** |
+| **TOTAL** | **165** | **265** | **405+** |
---
### Category Descriptions
+**0. Infrastructure (REQUIRED - Priority 0)** - Database connectivity, schema existence, data persistence across server restart, absence of mock patterns. These features MUST pass before any functional features can begin. All tiers require exactly 5 infrastructure features (indices 0-4).
+
**A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.
**B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.
@@ -205,6 +292,16 @@ The feature_list.json must include tests that **actively verify real data** and
- `setTimeout` simulating API delays with static data
- Static returns instead of database queries
+**Additional prohibited patterns (in-memory stores):**
+
+- `globalThis.` (in-memory storage pattern)
+- `dev-store`, `devStore`, `DevStore` (development stores)
+- `json-server`, `mirage`, `msw` (mock backends)
+- `Map()` or `Set()` used as primary data store
+- Environment checks like `if (process.env.NODE_ENV === 'development')` for data routing
+
+**Why this matters:** In-memory stores (like `globalThis.devStore`) will pass simple tests because data persists during a single server run. But data is LOST on server restart, which is unacceptable for production. The Infrastructure features (0-4) specifically test for this by requiring data to survive a full server restart.
+
---
**CRITICAL INSTRUCTION:**