mirror of https://github.com/bmad-code-org/BMAD-METHOD.git synced 2026-01-30 04:32:02 +00:00

Files

murat 4001b175bd docs: fix docs build

2026-01-15 14:40:56 -06:00

25 KiB

Raw Blame History

title, description

title	description
Test Quality Standards Explained	Understanding TEA's Definition of Done for deterministic, isolated, and maintainable tests

Test Quality Standards Explained

Test quality standards define what makes a test "good" in TEA. These aren't suggestions - they're the Definition of Done that prevents tests from rotting in review.

Overview

TEA's Quality Principles:

Deterministic - Same result every run
Isolated - No dependencies on other tests
Explicit - Assertions visible in test body
Focused - Single responsibility, appropriate size
Fast - Execute in reasonable time

Why these matter: Tests that violate these principles create maintenance burden, slow down development, and lose team trust.

The Problem

Tests That Rot in Review

// ❌ The anti-pattern: This test will rot
test('user can do stuff', async ({ page }) => {
  await page.goto('/');
  await page.waitForTimeout(5000);  // Non-deterministic

  if (await page.locator('.banner').isVisible()) {  // Conditional
    await page.click('.dismiss');
  }

  try {  // Try-catch for flow control
    await page.click('#load-more');
  } catch (e) {
    // Silently continue
  }

  // ... 300 more lines of test logic
  // ... no clear assertions
});

What's wrong:

Hard wait - Flaky, wastes time
Conditional - Non-deterministic behavior
Try-catch - Hides failures
Too large - Hard to maintain
Vague name - Unclear purpose
No explicit assertions - What's being tested?

Result: PR review comments: "This test is flaky, please fix" → never merged → test deleted → coverage lost

AI-Generated Tests Without Standards

AI-generated tests without quality guardrails:

// AI generates 50 tests like this:
test('test1', async ({ page }) => {
  await page.goto('/');
  await page.waitForTimeout(3000);
  // ... flaky, vague, redundant
});

test('test2', async ({ page }) => {
  await page.goto('/');
  await page.waitForTimeout(3000);
  // ... duplicates test1
});

// ... 48 more similar tests

Result: 50 tests, 80% redundant, 90% flaky, 0% trusted by team - low-quality outputs that create maintenance burden.

The Solution: TEA's Quality Standards

1. Determinism (No Flakiness)

Rule: Test produces same result every run.

Requirements:

❌ No hard waits (waitForTimeout)
❌ No conditionals for flow control (if/else)
❌ No try-catch for flow control
✅ Use network-first patterns (wait for responses)
✅ Use explicit waits (waitForSelector, waitForResponse)

Bad Example:

test('flaky test', async ({ page }) => {
  await page.click('button');
  await page.waitForTimeout(2000);  // ❌ Might be too short

  if (await page.locator('.modal').isVisible()) {  // ❌ Non-deterministic
    await page.click('.dismiss');
  }

  try {  // ❌ Silently handles errors
    await expect(page.locator('.success')).toBeVisible();
  } catch (e) {
    // Test passes even if assertion fails!
  }
});

Good Example (Vanilla Playwright):

test('deterministic test', async ({ page }) => {
  const responsePromise = page.waitForResponse(
    resp => resp.url().includes('/api/submit') && resp.ok()
  );

  await page.click('button');
  await responsePromise;  // ✅ Wait for actual response

  // Modal should ALWAYS show (make it deterministic)
  await expect(page.locator('.modal')).toBeVisible();
  await page.click('.dismiss');

  // Explicit assertion (fails if not visible)
  await expect(page.locator('.success')).toBeVisible();
});

With Playwright Utils (Even Cleaner):

import { test } from '@seontechnologies/playwright-utils/fixtures';
import { expect } from '@playwright/test';

test('deterministic test', async ({ page, interceptNetworkCall }) => {
  const submitCall = interceptNetworkCall({
    method: 'POST',
    url: '**/api/submit'
  });

  await page.click('button');

  // Wait for actual response (automatic JSON parsing)
  const { status, responseJson } = await submitCall;
  expect(status).toBe(200);

  // Modal should ALWAYS show (make it deterministic)
  await expect(page.locator('.modal')).toBeVisible();
  await page.click('.dismiss');

  // Explicit assertion (fails if not visible)
  await expect(page.locator('.success')).toBeVisible();
});

Why both work:

Waits for actual event (network response)
No conditionals (behavior is deterministic)
Assertions fail loudly (no silent failures)
Same result every run (deterministic)

Playwright Utils additional benefits:

Automatic JSON parsing
{ status, responseJson } structure (can validate response data)
No manual await response.json()

2. Isolation (No Dependencies)

Rule: Test runs independently, no shared state.

Requirements:

✅ Self-cleaning (cleanup after test)
✅ No global state dependencies
✅ Can run in parallel
✅ Can run in any order
✅ Use unique test data

Bad Example:

// ❌ Tests depend on execution order
let userId: string;  // Shared global state

test('create user', async ({ apiRequest }) => {
  const { body } = await apiRequest({
    method: 'POST',
    path: '/api/users',
    body: { email: 'test@example.com' }   (hard-coded)
  });
  userId = body.id;  // Store in global
});

test('update user', async ({ apiRequest }) => {
  // Depends on previous test setting userId
  await apiRequest({
    method: 'PATCH',
    path: `/api/users/${userId}`,
    body: { name: 'Updated' }  
  });
  // No cleanup - leaves user in database
});

Problems:

Tests must run in order (can't parallelize)
Second test fails if first skipped (.only)
Hard-coded data causes conflicts
No cleanup (database fills with test data)

Good Example (Vanilla Playwright):

test('should update user profile', async ({ request }) => {
  // Create unique test data
  const testEmail = `test-${Date.now()}@example.com`;

  // Setup: Create user
  const createResp = await request.post('/api/users', {
    data: { email: testEmail, name: 'Original' }
  });
  const user = await createResp.json();

  // Test: Update user
  const updateResp = await request.patch(`/api/users/${user.id}`, {
    data: { name: 'Updated' }
  });
  const updated = await updateResp.json();

  expect(updated.name).toBe('Updated');

  // Cleanup: Delete user
  await request.delete(`/api/users/${user.id}`);
});

Even Better (With Playwright Utils):

import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

test('should update user profile', async ({ apiRequest }) => {
  // Dynamic unique test data
  const testEmail = faker.internet.email();

  // Setup: Create user
  const { status: createStatus, body: user } = await apiRequest({
    method: 'POST',
    path: '/api/users',
    body: { email: testEmail, name: faker.person.fullName() }  
  });

  expect(createStatus).toBe(201);

  // Test: Update user
  const { status, body: updated } = await apiRequest({
    method: 'PATCH',
    path: `/api/users/${user.id}`,
    body: { name: 'Updated Name' }  
  });

  expect(status).toBe(200);
  expect(updated.name).toBe('Updated Name');

  // Cleanup: Delete user
  await apiRequest({
    method: 'DELETE',
    path: `/api/users/${user.id}`
  });
});

Playwright Utils Benefits:

{ status, body } destructuring (cleaner than response.status() + await response.json())
No manual await response.json()
Automatic retry for 5xx errors
Optional schema validation with .validateSchema()

Why it works:

No global state
Unique test data (no conflicts)
Self-cleaning (deletes user)
Can run in parallel
Can run in any order

3. Explicit Assertions (No Hidden Validation)

Rule: Assertions visible in test body, not abstracted.

Requirements:

✅ Assertions in test code (not helper functions)
✅ Specific assertions (not generic toBeTruthy)
✅ Meaningful expectations (test actual behavior)

Bad Example:

// ❌ Assertions hidden in helper
async function verifyProfilePage(page: Page) {
  // Assertions buried in helper (not visible in test)
  await expect(page.locator('h1')).toBeVisible();
  await expect(page.locator('.email')).toContainText('@');
  await expect(page.locator('.name')).not.toBeEmpty();
}

test('profile page', async ({ page }) => {
  await page.goto('/profile');
  await verifyProfilePage(page);  // What's being verified?
});

Problems:

Can't see what's tested (need to read helper)
Hard to debug failures (which assertion failed?)
Reduces test readability
Hides important validation

Good Example:

// ✅ Assertions explicit in test
test('should display profile with correct data', async ({ page }) => {
  await page.goto('/profile');

  // Explicit assertions - clear what's tested
  await expect(page.locator('h1')).toContainText('Test User');
  await expect(page.locator('.email')).toContainText('test@example.com');
  await expect(page.locator('.bio')).toContainText('Software Engineer');
  await expect(page.locator('img[alt="Avatar"]')).toBeVisible();
});

Why it works:

See what's tested at a glance
Debug failures easily (know which assertion failed)
Test is self-documenting
No hidden behavior

Exception: Use helper for setup/cleanup, not assertions.

4. Focused Tests (Appropriate Size)

Rule: Test has single responsibility, reasonable size.

Requirements:

✅ Test size < 300 lines
✅ Single responsibility (test one thing well)
✅ Clear describe/test names
✅ Appropriate scope (not too granular, not too broad)

Bad Example:

// ❌ 500-line test testing everything
test('complete user flow', async ({ page }) => {
  // Registration (50 lines)
  await page.goto('/register');
  await page.fill('#email', 'test@example.com');
  // ... 48 more lines

  // Profile setup (100 lines)
  await page.goto('/profile');
  // ... 98 more lines

  // Settings configuration (150 lines)
  await page.goto('/settings');
  // ... 148 more lines

  // Data export (200 lines)
  await page.goto('/export');
  // ... 198 more lines

  // Total: 500 lines, testing 4 different features
});

Problems:

Failure in line 50 prevents testing lines 51-500
Hard to understand (what's being tested?)
Slow to execute (testing too much)
Hard to debug (which feature failed?)

Good Example:

// ✅ Focused tests - one responsibility each

test('should register new user', async ({ page }) => {
  await page.goto('/register');
  await page.fill('#email', 'test@example.com');
  await page.fill('#password', 'password123');
  await page.click('button[type="submit"]');

  await expect(page).toHaveURL('/welcome');
  await expect(page.locator('h1')).toContainText('Welcome');
});

test('should configure user profile', async ({ page, authSession }) => {
  await authSession.login({ email: 'test@example.com', password: 'pass' });
  await page.goto('/profile');

  await page.fill('#name', 'Test User');
  await page.fill('#bio', 'Software Engineer');
  await page.click('button:has-text("Save")');

  await expect(page.locator('.success')).toBeVisible();
});

// ... separate tests for settings, export (each < 50 lines)

Why it works:

Each test has one responsibility
Failure is easy to diagnose
Can run tests independently
Test names describe exactly what's tested

5. Fast Execution (Performance Budget)

Rule: Individual test executes in < 1.5 minutes.

Requirements:

✅ Test execution < 90 seconds
✅ Efficient selectors (getByRole > XPath)
✅ Minimal redundant actions
✅ Parallel execution enabled

Bad Example:

// ❌ Slow test (3+ minutes)
test('slow test', async ({ page }) => {
  await page.goto('/');
  await page.waitForTimeout(10000);  // 10s wasted

  // Navigate through 10 pages (2 minutes)
  for (let i = 1; i <= 10; i++) {
    await page.click(`a[href="/page-${i}"]`);
    await page.waitForTimeout(5000);  // 5s per page = 50s wasted
  }

  // Complex XPath selector (slow)
  await page.locator('//div[@class="container"]/section[3]/div[2]/p').click();

  // More waiting
  await page.waitForTimeout(30000);  // 30s wasted

  await expect(page.locator('.result')).toBeVisible();
});

Total time: 3+ minutes (95 seconds wasted on hard waits)

Good Example (Vanilla Playwright):

// ✅ Fast test (< 10 seconds)
test('fast test', async ({ page }) => {
  // Set up response wait
  const apiPromise = page.waitForResponse(
    resp => resp.url().includes('/api/result') && resp.ok()
  );

  await page.goto('/');

  // Direct navigation (skip intermediate pages)
  await page.goto('/page-10');

  // Efficient selector
  await page.getByRole('button', { name: 'Submit' }).click();

  // Wait for actual response (fast when API is fast)
  await apiPromise;

  await expect(page.locator('.result')).toBeVisible();
});

With Playwright Utils:

import { test } from '@seontechnologies/playwright-utils/fixtures';
import { expect } from '@playwright/test';

test('fast test', async ({ page, interceptNetworkCall }) => {
  // Set up interception
  const resultCall = interceptNetworkCall({
    method: 'GET',
    url: '**/api/result'
  });

  await page.goto('/');

  // Direct navigation (skip intermediate pages)
  await page.goto('/page-10');

  // Efficient selector
  await page.getByRole('button', { name: 'Submit' }).click();

  // Wait for actual response (automatic JSON parsing)
  const { status, responseJson } = await resultCall;

  expect(status).toBe(200);
  await expect(page.locator('.result')).toBeVisible();

  // Can also validate response data if needed
  // expect(responseJson.data).toBeDefined();
});

Total time: < 10 seconds (no wasted waits)

Both examples achieve:

No hard waits (wait for actual events)
Direct navigation (skip unnecessary steps)
Efficient selectors (getByRole)
Fast execution

Playwright Utils bonus:

Can validate API response data easily
Automatic JSON parsing
Cleaner API

TEA's Quality Scoring

TEA reviews tests against these standards in *test-review:

Scoring Categories (100 points total)

Determinism (35 points):

No hard waits: 10 points
No conditionals: 10 points
No try-catch flow: 10 points
Network-first patterns: 5 points

Isolation (25 points):

Self-cleaning: 15 points
No global state: 5 points
Parallel-safe: 5 points

Assertions (20 points):

Explicit in test body: 10 points
Specific and meaningful: 10 points

Structure (10 points):

Test size < 300 lines: 5 points
Clear naming: 5 points

Performance (10 points):

Execution time < 1.5 min: 10 points

Quality Scoring Breakdown

%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
pie title Test Quality Score (100 points)
    "Determinism" : 35
    "Isolation" : 25
    "Assertions" : 20
    "Structure" : 10
    "Performance" : 10

%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'13px'}}}%%
flowchart LR
    subgraph Det[Determinism - 35 pts]
        D1[No hard waits<br/>10 pts]
        D2[No conditionals<br/>10 pts]
        D3[No try-catch flow<br/>10 pts]
        D4[Network-first<br/>5 pts]
    end

    subgraph Iso[Isolation - 25 pts]
        I1[Self-cleaning<br/>15 pts]
        I2[No global state<br/>5 pts]
        I3[Parallel-safe<br/>5 pts]
    end

    subgraph Assrt[Assertions - 20 pts]
        A1[Explicit in body<br/>10 pts]
        A2[Specific/meaningful<br/>10 pts]
    end

    subgraph Struct[Structure - 10 pts]
        S1[Size < 300 lines<br/>5 pts]
        S2[Clear naming<br/>5 pts]
    end

    subgraph Perf[Performance - 10 pts]
        P1[Time < 1.5 min<br/>10 pts]
    end

    Det --> Total([Total: 100 points])
    Iso --> Total
    Assrt --> Total
    Struct --> Total
    Perf --> Total

    style Det fill:#ffebee,stroke:#c62828,stroke-width:2px
    style Iso fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style Assrt fill:#f3e5f5,stroke:#6a1b9a,stroke-width:2px
    style Struct fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    style Perf fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style Total fill:#fff,stroke:#000,stroke-width:3px

Score Interpretation

Score	Interpretation	Action
90-100	Excellent	Production-ready, minimal changes
80-89	Good	Minor improvements recommended
70-79	Acceptable	Address recommendations before release
60-69	Needs Work	Fix critical issues
< 60	Critical	Significant refactoring needed

Comparison: Good vs Bad Tests

Bad Test (Score: 45/100):

test('login test', async ({ page }) => {  // Vague name
  await page.goto('/login');
  await page.waitForTimeout(3000);  // -10 (hard wait)

  await page.fill('[name="email"]', 'test@example.com');
  await page.fill('[name="password"]', 'password');

  if (await page.locator('.remember-me').isVisible()) {  // -10 (conditional)
    await page.click('.remember-me');
  }

  await page.click('button');

  try {  // -10 (try-catch flow)
    await page.waitForURL('/dashboard', { timeout: 5000 });
  } catch (e) {
    // Ignore navigation failure
  }

  // No assertions! -10
  // No cleanup! -10
});

Issues:

Determinism: 5/35 (hard wait, conditional, try-catch)
Isolation: 10/25 (no cleanup)
Assertions: 0/20 (no assertions!)
Structure: 15/10 (okay)
Performance: 5/10 (slow)
Total: 45/100

Good Test (Score: 95/100):

test('should login with valid credentials and redirect to dashboard', async ({ page, authSession }) => {
  // Use fixture for deterministic auth
  const loginPromise = page.waitForResponse(
    resp => resp.url().includes('/api/auth/login') && resp.ok()
  );

  await page.goto('/login');
  await page.getByLabel('Email').fill('test@example.com');
  await page.getByLabel('Password').fill('password123');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Wait for actual API response
  const response = await loginPromise;
  const { token } = await response.json();

  // Explicit assertions
  expect(token).toBeDefined();
  await expect(page).toHaveURL('/dashboard');
  await expect(page.getByText('Welcome back')).toBeVisible();

  // Cleanup handled by authSession fixture
});

Quality:

Determinism: 35/35 (network-first, no conditionals)
Isolation: 25/25 (fixture handles cleanup)
Assertions: 20/20 (explicit and specific)
Structure: 10/10 (clear name, focused)
Performance: 5/10 (< 1 min)
Total: 95/100

Example: API Testing

Bad Test (Score: 50/100):

test('api test', async ({ request }) => {
  const response = await request.post('/api/users', {
    data: { email: 'test@example.com' }  // Hard-coded (conflicts)
  });

  if (response.ok()) {  // Conditional
    const user = await response.json();
    // Weak assertion
    expect(user).toBeTruthy();
  }

  // No cleanup - user left in database
});

Good Test (Score: 92/100):

test('should create user with valid data', async ({ apiRequest }) => {
  // Unique test data
  const testEmail = `test-${Date.now()}@example.com`;

  // Create user
  const { status, body } = await apiRequest({
    method: 'POST',
    path: '/api/users',
    body: { email: testEmail, name: 'Test User' }
  });

  // Explicit assertions
  expect(status).toBe(201);
  expect(body.id).toBeDefined();
  expect(body.email).toBe(testEmail);
  expect(body.name).toBe('Test User');

  // Cleanup
  await apiRequest({
    method: 'DELETE',
    path: `/api/users/${body.id}`
  });
});

How TEA Enforces Standards

During Test Generation (`atdd`, `automate`)

TEA generates tests following standards by default:

// TEA-generated test (automatically follows standards)
test('should submit contact form', async ({ page }) => {
  // Network-first pattern (no hard waits)
  const submitPromise = page.waitForResponse(
    resp => resp.url().includes('/api/contact') && resp.ok()
  );

  // Accessible selectors (resilient)
  await page.getByLabel('Name').fill('Test User');
  await page.getByLabel('Email').fill('test@example.com');
  await page.getByLabel('Message').fill('Test message');
  await page.getByRole('button', { name: 'Send' }).click();

  const response = await submitPromise;
  const result = await response.json();

  // Explicit assertions
  expect(result.success).toBe(true);
  await expect(page.getByText('Message sent')).toBeVisible();

  // Size: 15 lines (< 300 ✓)
  // Execution: ~2 seconds (< 90s ✓)
});

During Test Review (*test-review)

TEA audits tests and flags violations:

## Critical Issues

### Hard Wait Detected (tests/login.spec.ts:23)
**Issue:** `await page.waitForTimeout(3000)`
**Score Impact:** -10 (Determinism)
**Fix:** Use network-first pattern

### Conditional Flow Control (tests/profile.spec.ts:45)
**Issue:** `if (await page.locator('.banner').isVisible())`
**Score Impact:** -10 (Determinism)
**Fix:** Make banner presence deterministic

## Recommendations

### Extract Fixture (tests/auth.spec.ts)
**Issue:** Login code repeated 5 times
**Score Impact:** -3 (Structure)
**Fix:** Extract to authSession fixture

Definition of Done Checklist

When is a test "done"?

Test Quality DoD:

No hard waits (waitForTimeout)
No conditionals for flow control
No try-catch for flow control
Network-first patterns used
Assertions explicit in test body
Test size < 300 lines
Clear, descriptive test name
Self-cleaning (cleanup in afterEach or test)
Unique test data (no hard-coded values)
Execution time < 1.5 minutes
Can run in parallel
Can run in any order

Code Review DoD:

Test quality score > 80
No critical issues from *test-review
Follows project patterns (fixtures, selectors)
Test reviewed by team member

Common Quality Issues

Issue: "My test needs conditionals for optional elements"

Wrong approach:

if (await page.locator('.banner').isVisible()) {
  await page.click('.dismiss');
}

Right approach - Make it deterministic:

// Option 1: Always expect banner
await expect(page.locator('.banner')).toBeVisible();
await page.click('.dismiss');

// Option 2: Test both scenarios separately
test('should show banner for new users', ...);
test('should not show banner for returning users', ...);

Issue: "My test needs try-catch for error handling"

Wrong approach:

try {
  await page.click('#optional-button');
} catch (e) {
  // Silently continue
}

Right approach - Make failures explicit:

// Option 1: Button should exist
await page.click('#optional-button');  // Fails loudly if missing

// Option 2: Button might not exist (test both)
test('should work with optional button', async ({ page }) => {
  const hasButton = await page.locator('#optional-button').count() > 0;
  if (hasButton) {
    await page.click('#optional-button');
  }
  // But now you're testing optional behavior explicitly
});

Issue: "Hard waits are easier than network patterns"

Short-term: Hard waits seem simpler Long-term: Flaky tests waste more time than learning network patterns

Investment:

30 minutes to learn network-first patterns
Prevents hundreds of hours debugging flaky tests
Tests run faster (no wasted waits)
Team trusts test suite

Technical Implementation

For detailed test quality patterns, see:

Core TEA Concepts:

Risk-Based Testing - Quality scales with risk
Knowledge Base System - How standards are enforced
Engagement Models - Quality in different models

Technical Patterns:

Network-First Patterns - Determinism explained
Fixture Architecture - Isolation through fixtures

Overview:

TEA Overview - Quality standards in lifecycle
Testing as Engineering - Why quality matters

Practical Guides

Workflow Guides:

How to Run Test Review - Audit against these standards
How to Run ATDD - Generate quality tests
How to Run Automate - Expand with quality

Use-Case Guides:

Using TEA with Existing Tests - Improve legacy quality
Running TEA for Enterprise - Enterprise quality thresholds

Reference

TEA Command Reference - *test-review command
Knowledge Base Index - Test quality fragment
Glossary - TEA terminology

Generated with BMad Method - TEA (Test Architect)

25 KiB Raw Blame History

Test Quality Standards Explained

Overview

The Problem

Tests That Rot in Review

AI-Generated Tests Without Standards

The Solution: TEA's Quality Standards

1. Determinism (No Flakiness)

2. Isolation (No Dependencies)

3. Explicit Assertions (No Hidden Validation)

4. Focused Tests (Appropriate Size)

5. Fast Execution (Performance Budget)

TEA's Quality Scoring

Scoring Categories (100 points total)

Quality Scoring Breakdown

Score Interpretation

Comparison: Good vs Bad Tests

Example: User Login

Example: API Testing

How TEA Enforces Standards

During Test Generation (*atdd, *automate)

During Test Review (*test-review)

Definition of Done Checklist

Common Quality Issues

Issue: "My test needs conditionals for optional elements"

Issue: "My test needs try-catch for error handling"

Issue: "Hard waits are easier than network patterns"

Technical Implementation

Related Concepts

Practical Guides

Reference

25 KiB

Raw Blame History

During Test Generation (`atdd`, `automate`)