fix: Prevent mock data implementations with infrastructure features

Problem:
The coding agent can implement in-memory storage (e.g., `dev-store.ts` with
`globalThis`) instead of a real database. These implementations pass all tests
because data persists during runtime, but data is lost on server restart.

This is a root cause for #68 - agent "passes" features that don't actually work.

Solution:
1. Add 5 mandatory Infrastructure Features (indices 0-4) that run first:
   - Feature 0: Database connection established
   - Feature 1: Database schema applied correctly
   - Feature 2: Data persists across server restart (CRITICAL)
   - Feature 3: No mock data patterns in codebase
   - Feature 4: Backend API queries real database

2. Add STEP 5.7: Server Restart Persistence Test to coding prompt:
   - Create test data, stop server, restart, verify data still exists

3. Extend grep patterns for mock detection in STEP 5.6:
   - globalThis., devStore, dev-store, mockData, fakeData
   - TODO.*real, STUB, MOCK, new Map() as data stores

Changes:
- .claude/templates/initializer_prompt.template.md - Infrastructure features
- .claude/templates/coding_prompt.template.md - STEP 5.6/5.7 enhancements
- .claude/commands/create-spec.md - Phase 3b database question

Backwards Compatible:
- Works with YOLO mode (uses bash/grep, not browser automation)
- Stateless apps can skip database features via create-spec question

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
cabana8471
2026-01-25 08:01:30 +01:00
parent 486979c3d9
commit 8e23fee094
3 changed files with 223 additions and 32 deletions

View File

@@ -95,6 +95,27 @@ Ask the user about their involvement preference:
**For Detailed Mode users**, ask specific tech questions about frontend, backend, database, etc.
### Phase 3b: Database Requirements (MANDATORY)
**Always ask this question regardless of mode:**
> "One foundational question about data storage:
>
> **Does this application need to store user data persistently?**
>
> 1. **Yes, needs a database** - Users create, save, and retrieve data (most apps)
> 2. **No, stateless** - Pure frontend, no data storage needed (calculators, static sites)
> 3. **Not sure** - Let me describe what I need and you decide"
**Branching logic:**
- **If "Yes" or "Not sure"**: Continue normally. The spec will include database in tech stack and the initializer will create 5 mandatory Infrastructure features (indices 0-4) to verify database connectivity and persistence.
- **If "No, stateless"**: Note this in the spec. Skip database from tech stack. Infrastructure features will be simplified (no database persistence tests). Mark this clearly:
```xml
<database>none - stateless application</database>
```
## Phase 4: Features (THE MAIN PHASE)
This is where you spend most of your time. Ask questions in plain language that anyone can answer.
@@ -207,12 +228,23 @@ After gathering all features, **you** (the agent) should tally up the testable f
**Typical ranges for reference:**
- **Simple apps** (todo list, calculator, notes): ~20-50 features
- **Medium apps** (blog, task manager with auth): ~100 features
- **Advanced apps** (e-commerce, CRM, full SaaS): ~150-200 features
- **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure)
- **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure)
- **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure)
These are just reference points - your actual count should come from the requirements discussed.
**MANDATORY: Infrastructure Features**
If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4):
1. Database connection established
2. Database schema applied correctly
3. Data persists across server restart
4. No mock data patterns in codebase
5. Backend API queries real database
These features ensure the coding agent implements a real database, not mock data or in-memory storage.
**How to count features:**
For each feature area discussed, estimate the number of discrete, testable behaviors:
@@ -225,17 +257,20 @@ For each feature area discussed, estimate the number of discrete, testable behav
> "Based on what we discussed, here's my feature breakdown:
>
> - **Infrastructure (required)**: 5 features (database setup, persistence verification)
> - [Category 1]: ~X features
> - [Category 2]: ~Y features
> - [Category 3]: ~Z features
> - ...
>
> **Total: ~N features**
> **Total: ~N features** (including 5 infrastructure)
>
> Does this seem right, or should I adjust?"
Let the user confirm or adjust. This becomes your `feature_count` for the spec.
**Important:** The first 5 features (indices 0-4) created by the initializer MUST be the Infrastructure category with no dependencies. All other features depend on these.
## Phase 5: Technical Details (DERIVED OR DISCUSSED)
**For Quick Mode users:**

View File

@@ -156,6 +156,9 @@ Use browser automation tools:
- [ ] Deleted the test data - verified it's gone everywhere
- [ ] NO unexplained data appeared (would indicate mock data)
- [ ] Dashboard/counts reflect real numbers after my changes
- [ ] **Ran extended mock data grep (STEP 5.6) - no hits in src/ (excluding tests)**
- [ ] **Verified no globalThis, devStore, or dev-store patterns**
- [ ] **Server restart test passed (STEP 5.7) - data persists across restart**
#### Navigation Verification
@@ -174,10 +177,72 @@ Use browser automation tools:
### STEP 5.6: MOCK DATA DETECTION (Before marking passing)
1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"`
2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
3. **Check database:** All displayed data must come from real DB queries
4. If unexplained data appears, it's mock data - fix before marking passing.
**Run ALL these grep checks. Any hits in src/ (excluding test files) require investigation:**
```bash
# 1. In-memory storage patterns (CRITICAL - catches dev-store)
grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/
grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" src/
# 2. Mock data variables
grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" src/
# 3. TODO/incomplete markers
grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" src/
# 4. Development-only conditionals
grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" src/
# 5. In-memory collections as data stores (check lib/store/data directories)
grep -r "new Map()\|new Set()" --include="*.ts" --include="*.tsx" src/lib/ src/store/ src/data/ 2>/dev/null
```
**Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing.
**Runtime verification:**
1. Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
2. Check database directly - all displayed data must come from real DB queries
3. If unexplained data appears, it's mock data - fix before marking passing.
### STEP 5.7: SERVER RESTART PERSISTENCE TEST (MANDATORY for data features)
**When required:** Any feature involving CRUD operations or data persistence.
**This test is NON-NEGOTIABLE. It catches in-memory storage implementations that pass all other tests.**
**Steps:**
1. Create unique test data via UI or API (e.g., item named "RESTART_TEST_12345")
2. Verify data appears in UI and API response
3. **STOP the server completely:**
```bash
pkill -f "node" || pkill -f "npm" || pkill -f "next"
sleep 5
# Verify server is stopped
pgrep -f "node" && echo "ERROR: Server still running!" && exit 1
```
4. **RESTART the server:**
```bash
./init.sh &
sleep 15 # Allow server to fully start
```
5. **Query for test data - it MUST still exist**
- Via UI: Navigate to data location, verify data appears
- Via API: `curl http://localhost:PORT/api/items` - verify data in response
6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL
- Search for: `grep -r "globalThis\|devStore\|dev-store" src/`
- You MUST fix the mock data implementation before proceeding
- Replace in-memory storage with real database queries
7. **Clean up test data** after successful verification
**Why this test exists:** In-memory stores like `globalThis.devStore` pass all other tests because data persists during a single server run. Only a full server restart reveals this bug. Skipping this step WILL allow dev-store implementations to slip through.
**YOLO Mode Note:** Even in YOLO mode, this verification is MANDATORY for data features. Use curl instead of browser automation.
### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)

View File

@@ -36,9 +36,9 @@ Use the feature_create_bulk tool to add all features at once. You can create fea
- Feature count must match the `feature_count` specified in app_spec.txt
- Reference tiers for other projects:
- **Simple apps**: ~150 tests
- **Medium apps**: ~250 tests
- **Complex apps**: ~400+ tests
- **Simple apps**: ~155 tests (includes 5 infrastructure)
- **Medium apps**: ~255 tests (includes 5 infrastructure)
- **Complex apps**: ~405+ tests (includes 5 infrastructure)
- Both "functional" and "style" categories
- Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
- At least 25 tests MUST have 10+ steps each (more for complex apps)
@@ -60,8 +60,9 @@ Dependencies enable **parallel execution** of independent features. When specifi
2. **Can only depend on EARLIER features** (index must be less than current position)
3. **No circular dependencies** allowed
4. **Maximum 20 dependencies** per feature
5. **Foundation features (index 0-9)** should have NO dependencies
6. **60% of features after index 10** should have at least one dependency
5. **Infrastructure features (indices 0-4)** have NO dependencies - they run FIRST
6. **ALL features after index 4** MUST depend on `[0, 1, 2, 3, 4]` (infrastructure)
7. **60% of features after index 10** should have additional dependencies beyond infrastructure
### Dependency Types
@@ -82,30 +83,107 @@ Create WIDE dependency graphs, not linear chains:
```json
[
// FOUNDATION TIER (indices 0-2, no dependencies) - run first
{ "name": "App loads without errors", "category": "functional" },
{ "name": "Navigation bar displays", "category": "style" },
{ "name": "Homepage renders correctly", "category": "functional" },
// INFRASTRUCTURE TIER (indices 0-4, no dependencies) - MUST run first
{ "name": "Database connection established", "category": "functional" },
{ "name": "Database schema applied correctly", "category": "functional" },
{ "name": "Data persists across server restart", "category": "functional" },
{ "name": "No mock data patterns in codebase", "category": "functional" },
{ "name": "Backend API queries real database", "category": "functional" },
// AUTH TIER (indices 3-5, depend on foundation) - run in parallel
{ "name": "User can register", "depends_on_indices": [0] },
{ "name": "User can login", "depends_on_indices": [0, 3] },
{ "name": "User can logout", "depends_on_indices": [4] },
// FOUNDATION TIER (indices 5-7, depend on infrastructure)
{ "name": "App loads without errors", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
{ "name": "Navigation bar displays", "category": "style", "depends_on_indices": [0, 1, 2, 3, 4] },
{ "name": "Homepage renders correctly", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
// CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login
// All 4 start as soon as login passes!
{ "name": "User can create todo", "depends_on_indices": [4] },
{ "name": "User can view todos", "depends_on_indices": [4] },
{ "name": "User can edit todo", "depends_on_indices": [4, 6] },
{ "name": "User can delete todo", "depends_on_indices": [4, 6] },
// AUTH TIER (indices 8-10, depend on foundation + infrastructure)
{ "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] },
{ "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] },
{ "name": "User can logout", "depends_on_indices": [9] },
// ADVANCED TIER (indices 10-11) - both depend on view, not each other
{ "name": "User can filter todos", "depends_on_indices": [7] },
{ "name": "User can search todos", "depends_on_indices": [7] }
// CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login
{ "name": "User can create todo", "depends_on_indices": [9] },
{ "name": "User can view todos", "depends_on_indices": [9] },
{ "name": "User can edit todo", "depends_on_indices": [9, 11] },
{ "name": "User can delete todo", "depends_on_indices": [9, 11] },
// ADVANCED TIER (indices 15-16) - both depend on view, not each other
{ "name": "User can filter todos", "depends_on_indices": [12] },
{ "name": "User can search todos", "depends_on_indices": [12] }
]
```
**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
**Result:** With 3 parallel agents, this project completes efficiently with proper database validation first.
---
## MANDATORY INFRASTRUCTURE FEATURES (Indices 0-4)
**CRITICAL:** Create these FIRST, before any functional features. These features ensure the application uses a real database, not mock data or in-memory storage.
| Index | Name | Test Steps |
|-------|------|------------|
| 0 | Database connection established | Start server → check logs for DB connection → health endpoint returns DB status |
| 1 | Database schema applied correctly | Connect to DB directly → list tables → verify schema matches spec |
| 2 | Data persists across server restart | Create via API → STOP server completely → START server → query API → data still exists |
| 3 | No mock data patterns in codebase | Run grep for prohibited patterns → must return empty |
| 4 | Backend API queries real database | Check server logs → SQL/DB queries appear for API calls |
**ALL other features MUST depend on indices [0, 1, 2, 3, 4].**
### Infrastructure Feature Descriptions
**Feature 0 - Database connection established:**
```
Steps:
1. Start the development server
2. Check server logs for database connection message
3. Call health endpoint (e.g., GET /api/health)
4. Verify response includes database status: connected
```
**Feature 1 - Database schema applied correctly:**
```
Steps:
1. Connect to database directly (sqlite3, psql, etc.)
2. List all tables in the database
3. Verify tables match what's defined in app_spec.txt
4. Verify key columns exist on each table
```
**Feature 2 - Data persists across server restart (CRITICAL):**
```
Steps:
1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345")
2. Verify data appears in API response (GET /api/items)
3. STOP the server completely: pkill -f "node" && sleep 5
4. Verify server is stopped: pgrep -f "node" returns nothing
5. RESTART the server: ./init.sh & sleep 15
6. Query API again: GET /api/items
7. Verify "RESTART_TEST_12345" still exists
8. If data is GONE → CRITICAL FAILURE (in-memory storage detected)
9. Clean up test data
```
**Feature 3 - No mock data patterns in codebase:**
```
Steps:
1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" src/
2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" src/
3. Run: grep -r "mockData\|fakeData\|sampleData\|dummyData" --include="*.ts" src/
4. Run: grep -r "new Map()\|new Set()" --include="*.ts" src/lib/ src/store/ src/data/
5. ALL grep commands must return empty (exit code 1)
6. If any returns results → investigate and fix before passing
```
**Feature 4 - Backend API queries real database:**
```
Steps:
1. Start server with verbose logging
2. Make API call (e.g., GET /api/items)
3. Check server logs
4. Verify SQL query appears (SELECT, INSERT, etc.) or ORM query log
5. If no DB queries in logs → implementation is using mock data
```
---
@@ -117,6 +195,7 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
| Category | Simple | Medium | Complex |
| -------------------------------- | ------- | ------- | -------- |
| **0. Infrastructure (REQUIRED)** | 5 | 5 | 5 |
| A. Security & Access Control | 5 | 20 | 40 |
| B. Navigation Integrity | 15 | 25 | 40 |
| C. Real Data Verification | 20 | 30 | 50 |
@@ -137,12 +216,14 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
| R. Concurrency & Race Conditions | 5 | 8 | 15 |
| S. Export/Import | 5 | 6 | 10 |
| T. Performance | 5 | 5 | 10 |
| **TOTAL** | **150** | **250** | **400+** |
| **TOTAL** | **155** | **255** | **405+** |
---
### Category Descriptions
**0. Infrastructure (REQUIRED - Priority 0)** - Database connectivity, schema existence, data persistence across server restart, absence of mock patterns. These features MUST pass before any functional features can begin. All tiers require exactly 5 infrastructure features (indices 0-4).
**A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.
**B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.
@@ -205,6 +286,16 @@ The feature_list.json must include tests that **actively verify real data** and
- `setTimeout` simulating API delays with static data
- Static returns instead of database queries
**Additional prohibited patterns (in-memory stores):**
- `globalThis.` (in-memory storage pattern)
- `dev-store`, `devStore`, `DevStore` (development stores)
- `json-server`, `mirage`, `msw` (mock backends)
- `Map()` or `Set()` used as primary data store
- Environment checks like `if (process.env.NODE_ENV === 'development')` for data routing
**Why this matters:** In-memory stores (like `globalThis.devStore`) will pass simple tests because data persists during a single server run. But data is LOST on server restart, which is unacceptable for production. The Infrastructure features (0-4) specifically test for this by requiring data to survive a full server restart.
---
**CRITICAL INSTRUCTION:**