Mirror of https://github.com/bmad-code-org/BMAD-METHOD.git (synced 2026-01-30 04:32:02 +00:00)
Add an optional also_consider parameter that allows callers to pass domain-specific areas to keep in mind during review. This gently nudges the reviewer toward specific concerns without overriding normal analysis.
Testing showed:
- Specific items steer strongly (questions get directly answered)
- Domain-focused items shift the lens (e.g., security focus = deeper security findings)
- Vague items have minimal effect (similar to baseline)
- Single items nudge without dominating
- Contradictory items handled gracefully
Includes test cases with sample content and 10 configurations to validate the parameter behavior across different use cases.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Brian <bmadcode@gmail.com>
Adversarial Review Test Suite
Tests for the also_consider optional input in review-adversarial-general.xml.
Purpose
Evaluate whether the also_consider input gently nudges the reviewer toward specific areas without overriding normal adversarial analysis.
Test Content
All tests use sample-content.md - a deliberately imperfect User Authentication API doc with:
- Vague error handling section
- Missing rate limit details
- No token expiration info
- Password in plain text example
- Missing authentication headers
- No error response examples
Running Tests
For each test case in test-cases.yaml, invoke the adversarial review task.
Manual Test Invocation
Review this content using the adversarial review task:
<content>
[paste sample-content.md]
</content>
<also_consider>
[paste items from test case, or omit for TC01]
</also_consider>
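The manual invocation above can also be assembled programmatically. The following is a minimal sketch, not part of the test suite: the function name build_review_prompt is hypothetical, and rendering each also_consider item as a "- " bullet is an assumption about how items are pasted.

```python
from typing import Optional, Sequence


def build_review_prompt(content: str,
                        also_consider: Optional[Sequence[str]] = None) -> str:
    """Assemble the manual invocation text (hypothetical helper).

    The <also_consider> block is omitted entirely when no items are given,
    which corresponds to the TC01 baseline.
    """
    parts = [
        "Review this content using the adversarial review task:",
        "<content>",
        content.strip(),
        "</content>",
    ]
    if also_consider:
        parts.append("<also_consider>")
        # Assumption: one item per line, formatted as a bullet.
        parts.extend(f"- {item}" for item in also_consider)
        parts.append("</also_consider>")
    return "\n".join(parts)


# Baseline (TC01-style) prompt versus a prompt with one illustrative item.
baseline = build_review_prompt("# User Authentication API\n...")
nudged = build_review_prompt("# User Authentication API\n...",
                             ["token expiration"])
```

Omitting the block for an empty or missing list keeps the baseline prompt identical to a run with no also_consider input at all, so any difference in findings can be attributed to the items themselves.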
Evaluation Criteria
For each test, note:
- Total findings - Still hitting ~10 issues?
- Distribution - Are findings spread across concerns or clustered?
- Relevance - Do findings relate to also_consider items when provided?
- Balance - Are also_consider findings elevated over others, or naturally mixed?
- Quality - Are findings actionable regardless of source?
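Some of these criteria can be roughly tallied once the findings are copied out of a review. The sketch below is only an aid for note-taking, assuming findings are collected as a list of strings in the order the reviewer produced them; summarize_findings and its keyword matching are hypothetical and a crude proxy for relevance, and quality still needs human judgment.

```python
from typing import Dict, List, Sequence, Union


def summarize_findings(findings: Sequence[str],
                       keywords: Sequence[str]) -> Dict[str, Union[int, List[int]]]:
    """Count findings and locate those that mention an also_consider keyword.

    Matching is a case-insensitive substring check, so it can both miss
    paraphrased findings and over-match short keywords.
    """
    lowered = [k.lower() for k in keywords]
    # 1-based positions show whether related findings cluster at the top
    # (elevated) or are mixed in with the rest.
    related_positions = [
        position
        for position, finding in enumerate(findings, start=1)
        if any(k in finding.lower() for k in lowered)
    ]
    return {
        "total": len(findings),
        "related": len(related_positions),
        "related_positions": related_positions,
    }


# Illustrative findings, not real reviewer output.
summary = summarize_findings(
    [
        "No token expiration documented",
        "Rate limit values missing",
        "Password shown in plain text",
    ],
    ["token", "password"],
)
```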
Expected Outcomes
- TC01 (baseline): Generic spread of findings
- TC02-TC05 (domain-focused): Some findings align with domain, others still organic
- TC06 (single item): Light influence, not dominant
- TC07 (vague items): Minimal change from baseline
- TC08 (specific items): Direct answers if gaps exist
- TC09 (mixed): Balanced across domains
- TC10 (contradictory): Graceful handling
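For recording results, the expectations above can be kept as a small lookup table. This is a sketch under the assumption that test case IDs follow the TC01 to TC10 pattern used in this list; the names EXPECTED_OUTCOMES and expected_outcome are hypothetical and do not come from test-cases.yaml.

```python
from typing import Dict

# Expected outcomes copied from the list above; TC02-TC05 share one entry.
EXPECTED_OUTCOMES: Dict[str, str] = {
    "TC01": "Generic spread of findings",
    **{
        f"TC{n:02d}": "Some findings align with domain, others still organic"
        for n in range(2, 6)
    },
    "TC06": "Light influence, not dominant",
    "TC07": "Minimal change from baseline",
    "TC08": "Direct answers if gaps exist",
    "TC09": "Balanced across domains",
    "TC10": "Graceful handling",
}


def expected_outcome(test_case_id: str) -> str:
    """Return the expected outcome text for an ID such as 'TC03'."""
    try:
        return EXPECTED_OUTCOMES[test_case_id.upper()]
    except KeyError:
        raise ValueError(f"Unknown test case: {test_case_id!r}") from None
```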