Commit Graph

1 Commit

Author SHA1 Message Date
Ralph Furman
3c9bf4ff5d Add math-olympiad skill — adversarial verification for competition math
This skill addresses the Proof-or-Bluff gap: a self-verified 85.7% on IMO
becomes <5% under human grading. It uses fresh-context verifiers armed
with specific failure patterns (not generic 'check logic').

Validated: 17/18 IMO+Putnam 2025 problems solved, 0 false positives,
2 novel proofs. See eval data in the anthropic monorepo under
sandbox/sandbox/ralph/math_skills/.
2026-03-19 23:17:36 +00:00