mirror of
https://github.com/leonvanzyl/autocoder.git
synced 2026-01-29 22:02:05 +00:00
Merge pull request #95 from cabana8471-arch/fix/infrastructure-features-mock-prevention
fix: Prevent mock data implementations with infrastructure features
@@ -95,6 +95,27 @@ Ask the user about their involvement preference:

**For Detailed Mode users**, ask specific tech questions about frontend, backend, database, etc.

### Phase 3b: Database Requirements (MANDATORY)

**Always ask this question regardless of mode:**

> "One foundational question about data storage:
>
> **Does this application need to store user data persistently?**
>
> 1. **Yes, needs a database** - Users create, save, and retrieve data (most apps)
> 2. **No, stateless** - Pure frontend, no data storage needed (calculators, static sites)
> 3. **Not sure** - Let me describe what I need and you decide"

**Branching logic:**

- **If "Yes" or "Not sure"**: Continue normally. The spec will include database in tech stack and the initializer will create 5 mandatory Infrastructure features (indices 0-4) to verify database connectivity and persistence.

- **If "No, stateless"**: Note this in the spec. Skip database from tech stack. Infrastructure features will be simplified (no database persistence tests). Mark this clearly:
  ```xml
  <database>none - stateless application</database>
  ```
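
The branching rule above could be checked mechanically with a small shell helper. This is only a sketch: the function name `needs_infrastructure` and the choice to read the spec text from standard input are assumptions made for illustration. The only detail taken from the text is the `<database>none - stateless application</database>` marker.

```shell
# Hypothetical helper (not part of the prompt templates).
# Reads spec text on stdin. Returns 0 when the 5 infrastructure features
# apply, and 1 when the spec carries the stateless marker shown above.
needs_infrastructure() {
  if grep -q '<database>none - stateless application</database>'; then
    return 1   # "No, stateless": no database persistence tests
  fi
  return 0     # "Yes" or "Not sure": database features are required
}
```

A caller might run `needs_infrastructure < app_spec.txt` and branch on the exit status; the file name is the one the later hunks mention.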

## Phase 4: Features (THE MAIN PHASE)

This is where you spend most of your time. Ask questions in plain language that anyone can answer.

@@ -207,12 +228,23 @@ After gathering all features, **you** (the agent) should tally up the testable f

**Typical ranges for reference:**

- **Simple apps** (todo list, calculator, notes): ~20-50 features
- **Medium apps** (blog, task manager with auth): ~100 features
- **Advanced apps** (e-commerce, CRM, full SaaS): ~150-200 features
- **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure)
- **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure)
- **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure)

These are just reference points - your actual count should come from the requirements discussed.

**MANDATORY: Infrastructure Features**

If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4):
1. Database connection established
2. Database schema applied correctly
3. Data persists across server restart
4. No mock data patterns in codebase
5. Backend API queries real database

These features ensure the coding agent implements a real database, not mock data or in-memory storage.

**How to count features:**
For each feature area discussed, estimate the number of discrete, testable behaviors:

@@ -225,17 +257,20 @@ For each feature area discussed, estimate the number of discrete, testable behav

> "Based on what we discussed, here's my feature breakdown:
>
> - **Infrastructure (required)**: 5 features (database setup, persistence verification)
> - [Category 1]: ~X features
> - [Category 2]: ~Y features
> - [Category 3]: ~Z features
> - ...
>
> **Total: ~N features**
> **Total: ~N features** (including 5 infrastructure)
>
> Does this seem right, or should I adjust?"

Let the user confirm or adjust. This becomes your `feature_count` for the spec.
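
The tally behind `feature_count` is simple arithmetic, sketched below. The helper name `total_features`, its `yes|no` first argument, and the per-category numbers in the example call are all made up for illustration; only the rule "add 5 infrastructure features when the app uses a database" comes from the text.

```shell
# Hypothetical helper: sum per-category estimates and add the 5 mandatory
# infrastructure features (indices 0-4) when the app uses a database.
# Usage: total_features <uses_db: yes|no> <count>...
total_features() (
  uses_db=$1; shift
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  if [ "$uses_db" = yes ]; then
    total=$((total + 5))
  fi
  echo "$total"
)

total_features yes 12 8 10   # prints 35 (30 from categories + 5 infrastructure)
```

For a stateless app the same call with `no` would report only the category sum.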

**Important:** The first 5 features (indices 0-4) created by the initializer MUST be the Infrastructure category with no dependencies. All other features depend on these.

## Phase 5: Technical Details (DERIVED OR DISCUSSED)

**For Quick Mode users:**

@@ -156,6 +156,9 @@ Use browser automation tools:
- [ ] Deleted the test data - verified it's gone everywhere
- [ ] NO unexplained data appeared (would indicate mock data)
- [ ] Dashboard/counts reflect real numbers after my changes
- [ ] **Ran extended mock data grep (STEP 5.6) - no hits in src/ (excluding tests)**
- [ ] **Verified no globalThis, devStore, or dev-store patterns**
- [ ] **Server restart test passed (STEP 5.7) - data persists across restart**

#### Navigation Verification

@@ -174,10 +177,92 @@ Use browser automation tools:

### STEP 5.6: MOCK DATA DETECTION (Before marking passing)

1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"`
2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
3. **Check database:** All displayed data must come from real DB queries
4. If unexplained data appears, it's mock data - fix before marking passing.
**Run ALL these grep checks. Any hits in src/ (excluding test files) require investigation:**

```bash
# Common exclusions for test files
EXCLUDE="--exclude=*.test.* --exclude=*.spec.* --exclude=*__test__* --exclude=*__mocks__*"

# 1. In-memory storage patterns (CRITICAL - catches dev-store)
grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/

# 2. Mock data variables
grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/

# 3. TODO/incomplete markers
grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/

# 4. Development-only conditionals
grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/

# 5. In-memory collections as data stores
grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ 2>/dev/null
```

**Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing.
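
The rule above relies on grep's exit status: 0 when a match is found, 1 when none is. A wrapper along the following lines could turn the individual checks into one pass/fail result. It is a sketch, not the project's tooling: the name `scan_mock_patterns` is hypothetical, only three of the pattern groups are included to keep it short, and it assumes a grep that supports `-r`, `--include`, and `--exclude` (GNU grep does).

```shell
# Hypothetical wrapper: run a subset of the STEP 5.6 patterns against a
# directory and fail if any non-test source file matches.
# Usage: scan_mock_patterns [dir]   (dir defaults to src)
scan_mock_patterns() (
  dir=${1:-src}
  status=0
  for pattern in \
    'globalThis\.' \
    'dev-store\|devStore\|DevStore\|mock-db\|mockDb' \
    'mockData\|fakeData\|sampleData\|dummyData\|testData'
  do
    # grep exits 0 when it finds a match, 1 when it finds none.
    if grep -rn --include='*.ts' --include='*.tsx' --include='*.js' \
         --exclude='*.test.*' --exclude='*.spec.*' \
         -e "$pattern" "$dir"; then
      echo "HIT: $pattern" >&2
      status=1
    fi
  done
  return "$status"
)
```

A non-zero return means at least one pattern matched and the hits printed above it need investigation before the feature is marked passing.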

**Runtime verification:**
1. Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
2. Check database directly - all displayed data must come from real DB queries
3. If unexplained data appears, it's mock data - fix before marking passing.

### STEP 5.7: SERVER RESTART PERSISTENCE TEST (MANDATORY for data features)

**When required:** Any feature involving CRUD operations or data persistence.

**This test is NON-NEGOTIABLE. It catches in-memory storage implementations that pass all other tests.**

**Steps:**

1. Create unique test data via UI or API (e.g., item named "RESTART_TEST_12345")
2. Verify data appears in UI and API response

3. **STOP the server completely:**
   ```bash
   # Kill by port (safer - only kills the dev server, not VS Code/Claude Code/etc.)
   # Unix/macOS:
   lsof -ti :${PORT:-3000} | xargs kill -TERM 2>/dev/null || true
   sleep 3
   lsof -ti :${PORT:-3000} | xargs kill -9 2>/dev/null || true
   sleep 2

   # Windows alternative (use if lsof not available):
   # netstat -ano | findstr :${PORT:-3000} | findstr LISTENING
   # taskkill /F /PID <pid_from_above> 2>nul

   # Verify server is stopped
   if lsof -ti :${PORT:-3000} > /dev/null 2>&1; then
     echo "ERROR: Server still running on port ${PORT:-3000}!"
     exit 1
   fi
   ```

4. **RESTART the server:**
   ```bash
   ./init.sh &
   sleep 15 # Allow server to fully start
   # Verify server is responding
   if ! curl -f http://localhost:${PORT:-3000}/api/health && ! curl -f http://localhost:${PORT:-3000}; then
     echo "ERROR: Server failed to start after restart"
     exit 1
   fi
   ```

5. **Query for test data - it MUST still exist**
   - Via UI: Navigate to data location, verify data appears
   - Via API: `curl http://localhost:${PORT:-3000}/api/items` - verify data in response

6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL
   - Run all grep commands from STEP 5.6 to identify the mock pattern
   - You MUST fix the in-memory storage implementation before proceeding
   - Replace in-memory storage with real database queries

7. **Clean up test data** after successful verification

**Why this test exists:** In-memory stores like `globalThis.devStore` pass all other tests because data persists during a single server run. Only a full server restart reveals this bug. Skipping this step WILL allow dev-store implementations to slip through.

**YOLO Mode Note:** Even in YOLO mode, this verification is MANDATORY for data features. Use curl instead of browser automation.
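
The restart step waits a fixed `sleep 15` before probing the server. One possible alternative, sketched here under stated assumptions, is to poll until the probe succeeds or a retry budget runs out. The helper name `wait_until` and its argument order are hypothetical; the `curl` call in the trailing comment reuses the URL form from the steps above and is not executed by the block.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() (
  attempts=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the probe quietly; success ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1   # probe never succeeded within the budget
)

# Example (not run here): poll the restarted server for up to ~30 seconds.
# wait_until 30 1 curl -fs "http://localhost:${PORT:-3000}/"
```

Polling returns as soon as the server answers and still fails clearly if it never does, which a fixed sleep cannot distinguish.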

### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)

@@ -36,9 +36,9 @@ Use the feature_create_bulk tool to add all features at once. You can create fea

- Feature count must match the `feature_count` specified in app_spec.txt
- Reference tiers for other projects:
  - **Simple apps**: ~150 tests
  - **Medium apps**: ~250 tests
  - **Complex apps**: ~400+ tests
  - **Simple apps**: ~165 tests (includes 5 infrastructure)
  - **Medium apps**: ~265 tests (includes 5 infrastructure)
  - **Advanced apps**: ~405+ tests (includes 5 infrastructure)
- Both "functional" and "style" categories
- Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
- At least 25 tests MUST have 10+ steps each (more for complex apps)

@@ -60,8 +60,9 @@ Dependencies enable **parallel execution** of independent features. When specifi
2. **Can only depend on EARLIER features** (index must be less than current position)
3. **No circular dependencies** allowed
4. **Maximum 20 dependencies** per feature
5. **Foundation features (index 0-9)** should have NO dependencies
6. **60% of features after index 10** should have at least one dependency
5. **Infrastructure features (indices 0-4)** have NO dependencies - they run FIRST
6. **ALL features after index 4** MUST depend on `[0, 1, 2, 3, 4]` (infrastructure)
7. **60% of features after index 10** should have additional dependencies beyond infrastructure
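
Several of the revised rules are mechanical enough to check per feature. The sketch below covers four of them: earlier-index only, the 20-dependency cap, no dependencies for indices 0-4, and the mandatory `[0, 1, 2, 3, 4]` set for everything after. The function name `check_deps` and its calling convention are assumptions, and it assumes the dependency list contains no duplicates.

```shell
# Hypothetical checker for one feature's dependency list.
# Usage: check_deps <feature_index> [dep_index...]
# Returns 0 when the list satisfies the rules, 1 otherwise.
check_deps() (
  index=$1; shift
  # Infrastructure features (indices 0-4) must have no dependencies.
  if [ "$index" -le 4 ]; then
    if [ "$#" -eq 0 ]; then return 0; else return 1; fi
  fi
  # Maximum 20 dependencies per feature.
  [ "$#" -le 20 ] || return 1
  infra=0
  for dep in "$@"; do
    # A feature can only depend on EARLIER features.
    [ "$dep" -lt "$index" ] || return 1
    # Count how many of the infrastructure indices are present.
    case $dep in
      0|1|2|3|4) infra=$((infra + 1)) ;;
    esac
  done
  # Every feature after index 4 must depend on all of 0-4.
  [ "$infra" -eq 5 ]
)
```

For instance, `check_deps 9 0 1 2 3 4 5 8` succeeds, while `check_deps 5 0 1 2 3` fails because index 4 is missing. Requiring earlier indices only also rules out cycles, so no separate cycle check is needed in this sketch.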

### Dependency Types

@@ -82,30 +83,113 @@ Create WIDE dependency graphs, not linear chains:

```json
[
  // FOUNDATION TIER (indices 0-2, no dependencies) - run first
  { "name": "App loads without errors", "category": "functional" },
  { "name": "Navigation bar displays", "category": "style" },
  { "name": "Homepage renders correctly", "category": "functional" },
  // INFRASTRUCTURE TIER (indices 0-4, no dependencies) - MUST run first
  { "name": "Database connection established", "category": "functional" },
  { "name": "Database schema applied correctly", "category": "functional" },
  { "name": "Data persists across server restart", "category": "functional" },
  { "name": "No mock data patterns in codebase", "category": "functional" },
  { "name": "Backend API queries real database", "category": "functional" },

  // AUTH TIER (indices 3-5, depend on foundation) - run in parallel
  { "name": "User can register", "depends_on_indices": [0] },
  { "name": "User can login", "depends_on_indices": [0, 3] },
  { "name": "User can logout", "depends_on_indices": [4] },
  // FOUNDATION TIER (indices 5-7, depend on infrastructure)
  { "name": "App loads without errors", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
  { "name": "Navigation bar displays", "category": "style", "depends_on_indices": [0, 1, 2, 3, 4] },
  { "name": "Homepage renders correctly", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },

  // CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login
  // All 4 start as soon as login passes!
  { "name": "User can create todo", "depends_on_indices": [4] },
  { "name": "User can view todos", "depends_on_indices": [4] },
  { "name": "User can edit todo", "depends_on_indices": [4, 6] },
  { "name": "User can delete todo", "depends_on_indices": [4, 6] },
  // AUTH TIER (indices 8-10, depend on foundation + infrastructure)
  { "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] },
  { "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] },
  { "name": "User can logout", "depends_on_indices": [0, 1, 2, 3, 4, 9] },

  // ADVANCED TIER (indices 10-11) - both depend on view, not each other
  { "name": "User can filter todos", "depends_on_indices": [7] },
  { "name": "User can search todos", "depends_on_indices": [7] }
  // CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login
  { "name": "User can create todo", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
  { "name": "User can view todos", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
  { "name": "User can edit todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] },
  { "name": "User can delete todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] },

  // ADVANCED TIER (indices 15-16) - both depend on view, not each other
  { "name": "User can filter todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] },
  { "name": "User can search todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] }
]
```

**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
**Result:** With 3 parallel agents, this project completes efficiently with proper database validation first.

---

## MANDATORY INFRASTRUCTURE FEATURES (Indices 0-4)

**CRITICAL:** Create these FIRST, before any functional features. These features ensure the application uses a real database, not mock data or in-memory storage.

| Index | Name | Test Steps |
|-------|------|------------|
| 0 | Database connection established | Start server → check logs for DB connection → health endpoint returns DB status |
| 1 | Database schema applied correctly | Connect to DB directly → list tables → verify schema matches spec |
| 2 | Data persists across server restart | Create via API → STOP server completely → START server → query API → data still exists |
| 3 | No mock data patterns in codebase | Run grep for prohibited patterns → must return empty |
| 4 | Backend API queries real database | Check server logs → SQL/DB queries appear for API calls |

**ALL other features MUST depend on indices [0, 1, 2, 3, 4].**

### Infrastructure Feature Descriptions

**Feature 0 - Database connection established:**
```text
Steps:
1. Start the development server
2. Check server logs for database connection message
3. Call health endpoint (e.g., GET /api/health)
4. Verify response includes database status: connected
```

**Feature 1 - Database schema applied correctly:**
```text
Steps:
1. Connect to database directly (sqlite3, psql, etc.)
2. List all tables in the database
3. Verify tables match what's defined in app_spec.txt
4. Verify key columns exist on each table
```

**Feature 2 - Data persists across server restart (CRITICAL):**
```text
Steps:
1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345")
2. Verify data appears in API response (GET /api/items)
3. STOP the server completely (kill by port to avoid killing unrelated Node processes):
   - Unix/macOS: lsof -ti :$PORT | xargs kill -9 2>/dev/null || true && sleep 5
   - Windows: FOR /F "tokens=5" %a IN ('netstat -aon ^| find ":$PORT"') DO taskkill /F /PID %a 2>nul
   - Note: Replace $PORT with actual port (e.g., 3000)
4. Verify server is stopped: lsof -ti :$PORT returns nothing (or netstat on Windows)
5. RESTART the server: ./init.sh & sleep 15
6. Query API again: GET /api/items
7. Verify "RESTART_TEST_12345" still exists
8. If data is GONE → CRITICAL FAILURE (in-memory storage detected)
9. Clean up test data
```

**Feature 3 - No mock data patterns in codebase:**
```text
Steps:
1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/
2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" src/
3. Run: grep -r "mockData\|testData\|fakeData\|sampleData\|dummyData" --include="*.ts" --include="*.tsx" --include="*.js" src/
4. Run: grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" src/
5. Run: grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/
6. Run: grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/ 2>/dev/null
7. Run: grep -E "json-server|miragejs|msw" package.json
8. ALL grep commands must return empty (exit code 1)
9. If any returns results → investigate and fix before passing
```

**Feature 4 - Backend API queries real database:**
```text
Steps:
1. Start server with verbose logging
2. Make API call (e.g., GET /api/items)
3. Check server logs
4. Verify SQL query appears (SELECT, INSERT, etc.) or ORM query log
5. If no DB queries in logs → implementation is using mock data
```
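
The log inspection in Feature 4 could be partly automated. The sketch below is one possible approach, not the project's method: the name `log_has_db_queries` is hypothetical, the keyword list is an assumption, the match is case-sensitive, and ORM query logs that do not print raw SQL would need a different pattern.

```shell
# Hypothetical helper: succeed if a log file contains SQL-looking statements.
# Usage: log_has_db_queries <logfile>
log_has_db_queries() {
  # Match an uppercase SQL verb followed by whitespace, e.g. "SELECT * FROM items".
  grep -Eq '(SELECT|INSERT|UPDATE|DELETE)[[:space:]]' "$1"
}
```

A failing check is only a hint that the API call never reached the database; the log still needs a manual look, since verbose logging may simply be switched off.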

---

@@ -115,8 +199,9 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou

### Category Distribution by Complexity Tier

| Category | Simple | Medium | Complex |
| Category | Simple | Medium | Advanced |
| -------------------------------- | ------- | ------- | -------- |
| **0. Infrastructure (REQUIRED)** | 5 | 5 | 5 |
| A. Security & Access Control | 5 | 20 | 40 |
| B. Navigation Integrity | 15 | 25 | 40 |
| C. Real Data Verification | 20 | 30 | 50 |

@@ -137,12 +222,14 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
| R. Concurrency & Race Conditions | 5 | 8 | 15 |
| S. Export/Import | 5 | 6 | 10 |
| T. Performance | 5 | 5 | 10 |
| **TOTAL** | **150** | **250** | **400+** |
| **TOTAL** | **165** | **265** | **405+** |

---

### Category Descriptions

**0. Infrastructure (REQUIRED - Priority 0)** - Database connectivity, schema existence, data persistence across server restart, absence of mock patterns. These features MUST pass before any functional features can begin. All tiers require exactly 5 infrastructure features (indices 0-4).

**A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.

**B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.

@@ -205,6 +292,16 @@ The feature_list.json must include tests that **actively verify real data** and
- `setTimeout` simulating API delays with static data
- Static returns instead of database queries

**Additional prohibited patterns (in-memory stores):**

- `globalThis.` (in-memory storage pattern)
- `dev-store`, `devStore`, `DevStore` (development stores)
- `json-server`, `mirage`, `msw` (mock backends)
- `Map()` or `Set()` used as primary data store
- Environment checks like `if (process.env.NODE_ENV === 'development')` for data routing

**Why this matters:** In-memory stores (like `globalThis.devStore`) will pass simple tests because data persists during a single server run. But data is LOST on server restart, which is unacceptable for production. The Infrastructure features (0-4) specifically test for this by requiring data to survive a full server restart.

---

**CRITICAL INSTRUCTION:**