---
title: "AI-Generated Testing: Why Most Approaches Fail"
description: "How Playwright-Utils, TEA workflows, and Playwright MCPs solve AI test quality problems"
---
AI-generated tests frequently fail in production because they lack systematic quality standards. This document explains the problem and presents a solution combining three components: Playwright-Utils, TEA (Test Architect), and Playwright MCPs.
:::note[Source]
This article is adapted from *The Testing Meta Most Teams Have Not Caught Up To Yet* by Murat K Ozcan.
:::
## The Problem with AI-Generated Tests
When teams use AI to generate tests without structure, they often produce what can be called "slop factory" outputs:
| Issue | Description |
|---|---|
| Redundant coverage | Multiple tests covering the same functionality |
| Incorrect assertions | Tests that pass but don't actually verify behavior |
| Flaky tests | Non-deterministic tests that randomly pass or fail |
| Unreviewable diffs | Generated code too verbose or inconsistent to review |
The core problem is that prompt-driven test generation leans into nondeterminism, the exact opposite of the determinism that testing exists to protect.
:::caution[The Paradox]
AI excels at generating code quickly, but testing requires precision and consistency. Without guardrails, AI-generated tests amplify the chaos they're meant to prevent.
:::
## The Solution: A Three-Part Stack
The solution combines three components that work together to enforce quality:
### Playwright-Utils
Bridges the gap between Cypress ergonomics and Playwright's capabilities by standardizing commonly reinvented primitives through utility functions.
| Utility | Purpose |
|---|---|
| api-request | API calls with schema validation |
| auth-session | Authentication handling |
| intercept-network-call | Network mocking and interception |
| recurse | Retry logic and polling |
| log | Structured logging |
| network-recorder | Record and replay network traffic |
| burn-in | Smart test selection for CI |
| network-error-monitor | HTTP error detection |
| file-utils | CSV/PDF handling |
These utilities eliminate the need to reinvent authentication, API calls, retries, and logging for every project.
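To make "commonly reinvented primitives" concrete, here is a minimal sketch of the kind of ad-hoc plumbing teams write without shared utilities, using only stock @playwright/test APIs; the /api/orders endpoint and SHIPPED status are invented for illustration. Utilities such as api-request and recurse exist so this validation, retry, and logging logic isn't rewritten in every repository.

```typescript
import { test, expect } from '@playwright/test';

// Hand-rolled API call plus polling: the pattern that utilities like
// api-request and recurse standardize. Endpoint and payload are hypothetical.
test('order eventually reaches SHIPPED status', async ({ request }) => {
  const created = await request.post('/api/orders', { data: { sku: 'ABC-123', qty: 1 } });
  expect(created.ok()).toBeTruthy();
  const { id } = await created.json();

  // Poll the order status until it settles, instead of sprinkling fixed waits.
  await expect
    .poll(
      async () => {
        const res = await request.get(`/api/orders/${id}`);
        return (await res.json()).status;
      },
      { timeout: 15_000 }
    )
    .toBe('SHIPPED');
});
```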
### TEA (Test Architect Agent)
A quality operating model packaged as eight executable workflows spanning test design, CI/CD gates, and release readiness. TEA encodes test architecture expertise into repeatable processes.
| Workflow | Purpose |
|---|---|
| test-design | Risk-based test planning per epic |
| framework | Scaffold production-ready test infrastructure |
| ci | CI pipeline with selective testing |
| atdd | Acceptance test-driven development |
| automate | Prioritized test automation |
| test-review | Test quality audits (0-100 score) |
| nfr-assess | Non-functional requirements assessment |
| trace | Coverage traceability and gate decisions |
:::tip[Key Insight]
TEA doesn't just generate tests; it provides a complete quality operating model with workflows for planning, execution, and release gates.
:::
### Playwright MCPs
Model Context Protocol (MCP) servers enable real-time verification during test generation. Instead of inferring selectors and behavior from documentation, MCPs allow agents to:
- Run flows and confirm the DOM against the accessibility tree
- Validate network responses in real-time
- Discover actual functionality through interactive exploration
- Verify generated tests against live applications
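How these MCP servers are wired up depends on the coding agent or IDE in use. As a rough sketch, a client that reads an mcpServers-style JSON config can register Microsoft's Playwright MCP server as shown below; the exact file name and top-level key vary by client, so treat this as illustrative rather than canonical setup:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```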
## How They Work Together
The three components form a quality pipeline:
| Stage | Component | Action |
|---|---|---|
| Standards | Playwright-Utils | Provides production-ready patterns and utilities |
| Process | TEA Workflows | Enforces systematic test planning and review |
| Verification | Playwright MCPs | Validates generated tests against live applications |
**Before (AI-only):** 20 tests with redundant coverage, incorrect assertions, and flaky behavior.

**After (Full Stack):** Risk-based selection, verified selectors, validated behavior, reviewable code.
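As a hedged sketch of what an "after" test tends to look like, the example below uses accessibility-tree locators (the kind an MCP-driven exploration can confirm against the live app) and asserts on both the network response and the rendered UI; the /checkout route, button name, and status code are invented for illustration:

```typescript
import { test, expect } from '@playwright/test';

test('checkout shows confirmation after a successful order', async ({ page }) => {
  await page.goto('/checkout');

  // Role + accessible name: a locator that can be verified against the
  // accessibility tree rather than guessed from markup.
  const placeOrder = page.getByRole('button', { name: 'Place order' });

  // Assert on backend behavior, not just the UI: capture the order request
  // triggered by the click and check its response status.
  const [response] = await Promise.all([
    page.waitForResponse(
      (res) => res.url().includes('/api/orders') && res.request().method() === 'POST'
    ),
    placeOrder.click(),
  ]);
  expect(response.status()).toBe(201);

  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});
```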
## Why This Matters
Traditional AI testing approaches fail because they:
- Lack quality standards — No consistent patterns or utilities
- Skip planning — Jump straight to test generation without risk assessment
- Can't verify — Generate tests without validating against actual behavior
- Don't review — No systematic audit of generated test quality
The three-part stack addresses each gap:
| Gap | Solution |
|---|---|
| No standards | Playwright-Utils provides production-ready patterns |
| No planning | TEA test-design creates risk-based test plans |
| No verification | Playwright MCPs validate against live applications |
| No review | TEA test-review audits quality with scoring |
This approach is sometimes called context engineering—loading domain-specific standards into AI context automatically rather than relying on prompts alone. TEA's tea-index.csv manifest loads relevant knowledge fragments so the AI doesn't relearn testing patterns each session.
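To illustrate the manifest idea (not TEA's actual schema, which isn't reproduced here), a knowledge-fragment index of this kind could be as simple as a CSV mapping topics to fragment files that are loaded into context on demand; the columns and paths below are hypothetical:

```csv
# Hypothetical layout, for illustration only; not the real tea-index.csv schema.
topic,fragment,when_to_load
network-interception,fragments/network-interception.md,automate
risk-based-planning,fragments/risk-based-planning.md,test-design
test-quality-rubric,fragments/test-quality-rubric.md,test-review
```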