Merge pull request #95 from cabana8471-arch/fix/infrastructure-features-mock-prevention

fix: Prevent mock data implementations with infrastructure features
This commit is contained in:
Leon van Zyl
2026-01-29 11:00:13 +02:00
committed by GitHub
3 changed files with 250 additions and 33 deletions

View File

@@ -95,6 +95,27 @@ Ask the user about their involvement preference:
**For Detailed Mode users**, ask specific tech questions about frontend, backend, database, etc. **For Detailed Mode users**, ask specific tech questions about frontend, backend, database, etc.
### Phase 3b: Database Requirements (MANDATORY)
**Always ask this question regardless of mode:**
> "One foundational question about data storage:
>
> **Does this application need to store user data persistently?**
>
> 1. **Yes, needs a database** - Users create, save, and retrieve data (most apps)
> 2. **No, stateless** - Pure frontend, no data storage needed (calculators, static sites)
> 3. **Not sure** - Let me describe what I need and you decide"
**Branching logic:**
- **If "Yes" or "Not sure"**: Continue normally. The spec will include database in tech stack and the initializer will create 5 mandatory Infrastructure features (indices 0-4) to verify database connectivity and persistence.
- **If "No, stateless"**: Note this in the spec. Skip database from tech stack. Infrastructure features will be simplified (no database persistence tests). Mark this clearly:
```xml
<database>none - stateless application</database>
```
## Phase 4: Features (THE MAIN PHASE) ## Phase 4: Features (THE MAIN PHASE)
This is where you spend most of your time. Ask questions in plain language that anyone can answer. This is where you spend most of your time. Ask questions in plain language that anyone can answer.
@@ -207,12 +228,23 @@ After gathering all features, **you** (the agent) should tally up the testable f
**Typical ranges for reference:** **Typical ranges for reference:**
- **Simple apps** (todo list, calculator, notes): ~20-50 features - **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure)
- **Medium apps** (blog, task manager with auth): ~100 features - **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure)
- **Advanced apps** (e-commerce, CRM, full SaaS): ~150-200 features - **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure)
These are just reference points - your actual count should come from the requirements discussed. These are just reference points - your actual count should come from the requirements discussed.
**MANDATORY: Infrastructure Features**
If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4):
1. Database connection established
2. Database schema applied correctly
3. Data persists across server restart
4. No mock data patterns in codebase
5. Backend API queries real database
These features ensure the coding agent implements a real database, not mock data or in-memory storage.
**How to count features:** **How to count features:**
For each feature area discussed, estimate the number of discrete, testable behaviors: For each feature area discussed, estimate the number of discrete, testable behaviors:
@@ -225,17 +257,20 @@ For each feature area discussed, estimate the number of discrete, testable behav
> "Based on what we discussed, here's my feature breakdown: > "Based on what we discussed, here's my feature breakdown:
> >
> - **Infrastructure (required)**: 5 features (database setup, persistence verification)
> - [Category 1]: ~X features > - [Category 1]: ~X features
> - [Category 2]: ~Y features > - [Category 2]: ~Y features
> - [Category 3]: ~Z features > - [Category 3]: ~Z features
> - ... > - ...
> >
> **Total: ~N features** > **Total: ~N features** (including 5 infrastructure)
> >
> Does this seem right, or should I adjust?" > Does this seem right, or should I adjust?"
Let the user confirm or adjust. This becomes your `feature_count` for the spec. Let the user confirm or adjust. This becomes your `feature_count` for the spec.
**Important:** The first 5 features (indices 0-4) created by the initializer MUST be the Infrastructure category with no dependencies. All other features depend on these.
## Phase 5: Technical Details (DERIVED OR DISCUSSED) ## Phase 5: Technical Details (DERIVED OR DISCUSSED)
**For Quick Mode users:** **For Quick Mode users:**

View File

@@ -156,6 +156,9 @@ Use browser automation tools:
- [ ] Deleted the test data - verified it's gone everywhere - [ ] Deleted the test data - verified it's gone everywhere
- [ ] NO unexplained data appeared (would indicate mock data) - [ ] NO unexplained data appeared (would indicate mock data)
- [ ] Dashboard/counts reflect real numbers after my changes - [ ] Dashboard/counts reflect real numbers after my changes
- [ ] **Ran extended mock data grep (STEP 5.6) - no hits in src/ (excluding tests)**
- [ ] **Verified no globalThis, devStore, or dev-store patterns**
- [ ] **Server restart test passed (STEP 5.7) - data persists across restart**
#### Navigation Verification #### Navigation Verification
@@ -174,10 +177,92 @@ Use browser automation tools:
### STEP 5.6: MOCK DATA DETECTION (Before marking passing) ### STEP 5.6: MOCK DATA DETECTION (Before marking passing)
1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"` **Run ALL these grep checks. Any hits in src/ (excluding test files) require investigation:**
2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
3. **Check database:** All displayed data must come from real DB queries ```bash
4. If unexplained data appears, it's mock data - fix before marking passing. # Common exclusions for test files
EXCLUDE="--exclude=*.test.* --exclude=*.spec.* --exclude=*__test__* --exclude=*__mocks__*"
# 1. In-memory storage patterns (CRITICAL - catches dev-store)
grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
# 2. Mock data variables
grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
# 3. TODO/incomplete markers
grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
# 4. Development-only conditionals
grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
# 5. In-memory collections as data stores
grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ 2>/dev/null
```
**Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing.
**Runtime verification:**
1. Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
2. Check database directly - all displayed data must come from real DB queries
3. If unexplained data appears, it's mock data - fix before marking passing.
### STEP 5.7: SERVER RESTART PERSISTENCE TEST (MANDATORY for data features)
**When required:** Any feature involving CRUD operations or data persistence.
**This test is NON-NEGOTIABLE. It catches in-memory storage implementations that pass all other tests.**
**Steps:**
1. Create unique test data via UI or API (e.g., item named "RESTART_TEST_12345")
2. Verify data appears in UI and API response
3. **STOP the server completely:**
```bash
# Kill by port (safer - only kills the dev server, not VS Code/Claude Code/etc.)
# Unix/macOS:
lsof -ti :${PORT:-3000} | xargs kill -TERM 2>/dev/null || true
sleep 3
lsof -ti :${PORT:-3000} | xargs kill -9 2>/dev/null || true
sleep 2
# Windows alternative (use if lsof not available):
# netstat -ano | findstr :${PORT:-3000} | findstr LISTENING
# taskkill /F /PID <pid_from_above> 2>nul
# Verify server is stopped
if lsof -ti :${PORT:-3000} > /dev/null 2>&1; then
echo "ERROR: Server still running on port ${PORT:-3000}!"
exit 1
fi
```
4. **RESTART the server:**
```bash
./init.sh &
sleep 15 # Allow server to fully start
# Verify server is responding
if ! curl -f http://localhost:${PORT:-3000}/api/health && ! curl -f http://localhost:${PORT:-3000}; then
echo "ERROR: Server failed to start after restart"
exit 1
fi
```
5. **Query for test data - it MUST still exist**
- Via UI: Navigate to data location, verify data appears
- Via API: `curl http://localhost:${PORT:-3000}/api/items` - verify data in response
6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL
- Run all grep commands from STEP 5.6 to identify the mock pattern
- You MUST fix the in-memory storage implementation before proceeding
- Replace in-memory storage with real database queries
7. **Clean up test data** after successful verification
**Why this test exists:** In-memory stores like `globalThis.devStore` pass all other tests because data persists during a single server run. Only a full server restart reveals this bug. Skipping this step WILL allow dev-store implementations to slip through.
**YOLO Mode Note:** Even in YOLO mode, this verification is MANDATORY for data features. Use curl instead of browser automation.
### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!) ### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)

View File

@@ -36,9 +36,9 @@ Use the feature_create_bulk tool to add all features at once. You can create fea
- Feature count must match the `feature_count` specified in app_spec.txt - Feature count must match the `feature_count` specified in app_spec.txt
- Reference tiers for other projects: - Reference tiers for other projects:
- **Simple apps**: ~150 tests - **Simple apps**: ~165 tests (includes 5 infrastructure)
- **Medium apps**: ~250 tests - **Medium apps**: ~265 tests (includes 5 infrastructure)
- **Complex apps**: ~400+ tests - **Advanced apps**: ~405+ tests (includes 5 infrastructure)
- Both "functional" and "style" categories - Both "functional" and "style" categories
- Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps) - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
- At least 25 tests MUST have 10+ steps each (more for complex apps) - At least 25 tests MUST have 10+ steps each (more for complex apps)
@@ -60,8 +60,9 @@ Dependencies enable **parallel execution** of independent features. When specifi
2. **Can only depend on EARLIER features** (index must be less than current position) 2. **Can only depend on EARLIER features** (index must be less than current position)
3. **No circular dependencies** allowed 3. **No circular dependencies** allowed
4. **Maximum 20 dependencies** per feature 4. **Maximum 20 dependencies** per feature
5. **Foundation features (index 0-9)** should have NO dependencies 5. **Infrastructure features (indices 0-4)** have NO dependencies - they run FIRST
6. **60% of features after index 10** should have at least one dependency 6. **ALL features after index 4** MUST depend on `[0, 1, 2, 3, 4]` (infrastructure)
7. **60% of features after index 10** should have additional dependencies beyond infrastructure
### Dependency Types ### Dependency Types
@@ -82,30 +83,113 @@ Create WIDE dependency graphs, not linear chains:
```json ```json
[ [
// FOUNDATION TIER (indices 0-2, no dependencies) - run first // INFRASTRUCTURE TIER (indices 0-4, no dependencies) - MUST run first
{ "name": "App loads without errors", "category": "functional" }, { "name": "Database connection established", "category": "functional" },
{ "name": "Navigation bar displays", "category": "style" }, { "name": "Database schema applied correctly", "category": "functional" },
{ "name": "Homepage renders correctly", "category": "functional" }, { "name": "Data persists across server restart", "category": "functional" },
{ "name": "No mock data patterns in codebase", "category": "functional" },
{ "name": "Backend API queries real database", "category": "functional" },
// AUTH TIER (indices 3-5, depend on foundation) - run in parallel // FOUNDATION TIER (indices 5-7, depend on infrastructure)
{ "name": "User can register", "depends_on_indices": [0] }, { "name": "App loads without errors", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
{ "name": "User can login", "depends_on_indices": [0, 3] }, { "name": "Navigation bar displays", "category": "style", "depends_on_indices": [0, 1, 2, 3, 4] },
{ "name": "User can logout", "depends_on_indices": [4] }, { "name": "Homepage renders correctly", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
// CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login // AUTH TIER (indices 8-10, depend on foundation + infrastructure)
// All 4 start as soon as login passes! { "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] },
{ "name": "User can create todo", "depends_on_indices": [4] }, { "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] },
{ "name": "User can view todos", "depends_on_indices": [4] }, { "name": "User can logout", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
{ "name": "User can edit todo", "depends_on_indices": [4, 6] },
{ "name": "User can delete todo", "depends_on_indices": [4, 6] },
// ADVANCED TIER (indices 10-11) - both depend on view, not each other // CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login
{ "name": "User can filter todos", "depends_on_indices": [7] }, { "name": "User can create todo", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
{ "name": "User can search todos", "depends_on_indices": [7] } { "name": "User can view todos", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
{ "name": "User can edit todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] },
{ "name": "User can delete todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] },
// ADVANCED TIER (indices 15-16) - both depend on view, not each other
{ "name": "User can filter todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] },
{ "name": "User can search todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] }
] ]
``` ```
**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles. **Result:** With 3 parallel agents, this project completes efficiently with proper database validation first.
---
## MANDATORY INFRASTRUCTURE FEATURES (Indices 0-4)
**CRITICAL:** Create these FIRST, before any functional features. These features ensure the application uses a real database, not mock data or in-memory storage.
| Index | Name | Test Steps |
|-------|------|------------|
| 0 | Database connection established | Start server → check logs for DB connection → health endpoint returns DB status |
| 1 | Database schema applied correctly | Connect to DB directly → list tables → verify schema matches spec |
| 2 | Data persists across server restart | Create via API → STOP server completely → START server → query API → data still exists |
| 3 | No mock data patterns in codebase | Run grep for prohibited patterns → must return empty |
| 4 | Backend API queries real database | Check server logs → SQL/DB queries appear for API calls |
**ALL other features MUST depend on indices [0, 1, 2, 3, 4].**
### Infrastructure Feature Descriptions
**Feature 0 - Database connection established:**
```text
Steps:
1. Start the development server
2. Check server logs for database connection message
3. Call health endpoint (e.g., GET /api/health)
4. Verify response includes database status: connected
```
**Feature 1 - Database schema applied correctly:**
```text
Steps:
1. Connect to database directly (sqlite3, psql, etc.)
2. List all tables in the database
3. Verify tables match what's defined in app_spec.txt
4. Verify key columns exist on each table
```
**Feature 2 - Data persists across server restart (CRITICAL):**
```text
Steps:
1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345")
2. Verify data appears in API response (GET /api/items)
3. STOP the server completely (kill by port to avoid killing unrelated Node processes):
- Unix/macOS: lsof -ti :$PORT | xargs kill -9 2>/dev/null || true && sleep 5
- Windows: FOR /F "tokens=5" %a IN ('netstat -aon ^| find ":$PORT"') DO taskkill /F /PID %a 2>nul
- Note: Replace $PORT with actual port (e.g., 3000)
4. Verify server is stopped: lsof -ti :$PORT returns nothing (or netstat on Windows)
5. RESTART the server: ./init.sh & sleep 15
6. Query API again: GET /api/items
7. Verify "RESTART_TEST_12345" still exists
8. If data is GONE → CRITICAL FAILURE (in-memory storage detected)
9. Clean up test data
```
**Feature 3 - No mock data patterns in codebase:**
```text
Steps:
1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/
2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" src/
3. Run: grep -r "mockData\|testData\|fakeData\|sampleData\|dummyData" --include="*.ts" --include="*.tsx" --include="*.js" src/
4. Run: grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" src/
5. Run: grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/
6. Run: grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/ 2>/dev/null
7. Run: grep -E "json-server|miragejs|msw" package.json
8. ALL grep commands must return empty (exit code 1)
9. If any returns results → investigate and fix before passing
```
**Feature 4 - Backend API queries real database:**
```text
Steps:
1. Start server with verbose logging
2. Make API call (e.g., GET /api/items)
3. Check server logs
4. Verify SQL query appears (SELECT, INSERT, etc.) or ORM query log
5. If no DB queries in logs → implementation is using mock data
```
--- ---
@@ -115,8 +199,9 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
### Category Distribution by Complexity Tier ### Category Distribution by Complexity Tier
| Category | Simple | Medium | Complex | | Category | Simple | Medium | Advanced |
| -------------------------------- | ------- | ------- | -------- | | -------------------------------- | ------- | ------- | -------- |
| **0. Infrastructure (REQUIRED)** | 5 | 5 | 5 |
| A. Security & Access Control | 5 | 20 | 40 | | A. Security & Access Control | 5 | 20 | 40 |
| B. Navigation Integrity | 15 | 25 | 40 | | B. Navigation Integrity | 15 | 25 | 40 |
| C. Real Data Verification | 20 | 30 | 50 | | C. Real Data Verification | 20 | 30 | 50 |
@@ -137,12 +222,14 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
| R. Concurrency & Race Conditions | 5 | 8 | 15 | | R. Concurrency & Race Conditions | 5 | 8 | 15 |
| S. Export/Import | 5 | 6 | 10 | | S. Export/Import | 5 | 6 | 10 |
| T. Performance | 5 | 5 | 10 | | T. Performance | 5 | 5 | 10 |
| **TOTAL** | **150** | **250** | **400+** | | **TOTAL** | **165** | **265** | **405+** |
--- ---
### Category Descriptions ### Category Descriptions
**0. Infrastructure (REQUIRED - Priority 0)** - Database connectivity, schema existence, data persistence across server restart, absence of mock patterns. These features MUST pass before any functional features can begin. All tiers require exactly 5 infrastructure features (indices 0-4).
**A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users. **A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.
**B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects. **B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.
@@ -205,6 +292,16 @@ The feature_list.json must include tests that **actively verify real data** and
- `setTimeout` simulating API delays with static data - `setTimeout` simulating API delays with static data
- Static returns instead of database queries - Static returns instead of database queries
**Additional prohibited patterns (in-memory stores):**
- `globalThis.` (in-memory storage pattern)
- `dev-store`, `devStore`, `DevStore` (development stores)
- `json-server`, `mirage`, `msw` (mock backends)
- `Map()` or `Set()` used as primary data store
- Environment checks like `if (process.env.NODE_ENV === 'development')` for data routing
**Why this matters:** In-memory stores (like `globalThis.devStore`) will pass simple tests because data persists during a single server run. But data is LOST on server restart, which is unacceptable for production. The Infrastructure features (0-4) specifically test for this by requiring data to survive a full server restart.
--- ---
**CRITICAL INSTRUCTION:** **CRITICAL INSTRUCTION:**