mirror of https://github.com/bmad-code-org/BMAD-METHOD.git synced 2026-01-30 04:32:02 +00:00


Alex Verkhovsky aad132c9b1 feat: add optional also_consider input to adversarial review task (#1371)

Add an optional also_consider parameter that allows callers to pass
domain-specific areas to keep in mind during review. This gently nudges
the reviewer toward specific concerns without overriding normal analysis.

Testing showed:
- Specific items steer strongly (questions get directly answered)
- Domain-focused items shift the lens (e.g., security focus = deeper security findings)
- Vague items have minimal effect (similar to baseline)
- Single items nudge without dominating
- Contradictory items handled gracefully

Includes test cases with sample content and 10 configurations to validate
the parameter behavior across different use cases.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Brian <bmadcode@gmail.com>

2026-01-22 22:26:25 -06:00

1.7 KiB


Adversarial Review Test Suite

Tests for the optional also_consider input defined in review-adversarial-general.xml.

Purpose

Evaluate whether the also_consider input gently nudges the reviewer toward specific areas without overriding normal adversarial analysis.

Test Content

All tests use sample-content.md, a deliberately imperfect User Authentication API document with:

- A vague error handling section
- Missing rate limit details
- No token expiration info
- A password in plain text in an example
- Missing authentication headers
- No error response examples

Running Tests

For each test case in test-cases.yaml, invoke the adversarial review task on sample-content.md, passing that case's also_consider items (if any).

Manual Test Invocation

Review this content using the adversarial review task:

<content>
[paste sample-content.md]
</content>

<also_consider>
[paste items from test case, or omit for TC01]
</also_consider>
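If you run many cases by hand, it can help to generate the invocation text rather than paste it each time. The sketch below assembles the manual template above for one test case. The function name `build_review_prompt` is hypothetical, and listing one also_consider item per line is an assumption about formatting, not something the task requires. Passing no items omits the block entirely, as for the TC01 baseline.

```python
from typing import List, Optional


def build_review_prompt(content: str, also_consider: Optional[List[str]] = None) -> str:
    """Assemble the manual invocation text for one test case (hypothetical helper).

    The content goes inside <content> tags. When items are given they go inside
    <also_consider> tags, one item per line (an assumed layout). None or an
    empty list omits the block, matching the TC01 baseline.
    """
    parts = [
        "Review this content using the adversarial review task:",
        "",
        "<content>",
        content.strip(),
        "</content>",
    ]
    if also_consider:
        parts += ["", "<also_consider>"]
        parts += list(also_consider)
        parts.append("</also_consider>")
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    # Example: a single domain-focused item, in the spirit of TC06.
    print(build_review_prompt("# User Authentication API\n...", ["token expiration"]))
```

Keeping the builder separate from whatever reads test-cases.yaml means the same text can be pasted manually or sent by a script.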

Evaluation Criteria

For each test, note:

- Total findings: Is the review still producing ~10 issues?
- Distribution: Are findings spread across concerns or clustered?
- Relevance: Do findings relate to the also_consider items, when provided?
- Balance: Are also_consider findings elevated above the others, or naturally mixed in?
- Quality: Are findings actionable regardless of source?
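To keep notes comparable across runs, the first four criteria above can be tallied mechanically once an evaluator has labeled each finding. This is a minimal sketch under that assumption: the `Finding` record, its `concern` labels, and the `from_also_consider` flag are hypothetical bookkeeping filled in by a person, not output of the review task. Quality is left out because it remains a judgment call.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Finding:
    """One reviewer finding, labeled by the evaluator (hypothetical record)."""
    text: str
    concern: str                      # evaluator's label, e.g. "security"
    from_also_consider: bool = False  # does it relate to an also_consider item?


def summarize_run(findings: List[Finding]) -> Dict[str, object]:
    """Tally total, distribution, and relevance share for one test run."""
    total = len(findings)
    related = sum(1 for f in findings if f.from_also_consider)
    return {
        "total": total,                                        # still around 10?
        "distribution": Counter(f.concern for f in findings),  # spread or clustered?
        "related_share": related / total if total else 0.0,    # relevance / balance
    }
```

A `related_share` near 1.0 suggests the also_consider items dominated the run; a share near 0.0 with items provided suggests they were ignored.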

Expected Outcomes

- TC01 (baseline): Generic spread of findings
- TC02-TC05 (domain-focused): Some findings align with the domain; others remain organic
- TC06 (single item): Light influence, not dominant
- TC07 (vague items): Minimal change from baseline
- TC08 (specific items): Direct answers where gaps exist
- TC09 (mixed): Balanced across domains
- TC10 (contradictory): Handled gracefully
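Several expected outcomes are framed relative to the TC01 baseline ("minimal change", "light influence"). One way to put a number on that, offered as a sketch rather than part of the suite, is the total variation distance between the concern labels of a run and those of the baseline. It assumes each finding has been hand-labeled with a concern. The function name `concern_shift` is hypothetical, and no pass/fail thresholds are implied.

```python
from collections import Counter
from typing import Iterable


def concern_shift(baseline: Iterable[str], run: Iterable[str]) -> float:
    """Total variation distance between two runs' concern distributions.

    Each argument holds one concern label per finding. Returns 0.0 when both
    runs spread findings identically across concerns and 1.0 when they share
    no concerns at all.
    """
    b, r = Counter(baseline), Counter(run)
    nb, nr = sum(b.values()), sum(r.values())
    if nb == 0 or nr == 0:
        raise ValueError("both runs need at least one finding")
    labels = set(b) | set(r)
    # Counter returns 0 for labels missing from one run.
    return 0.5 * sum(abs(b[c] / nb - r[c] / nr) for c in labels)
```

Under this measure, TC07 would be expected to score close to the baseline, while TC02-TC05 would show a larger shift toward their domains.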
