mirror of
https://github.com/anthropics/claude-plugins-official.git
synced 2026-03-20 11:33:08 +00:00
The skill that addresses the Proof-or-Bluff gap: a self-verified score of 85.7% on IMO falls to <5% under human grading. It uses fresh-context verifiers armed with specific failure patterns (not a generic 'check logic' prompt). Validated: 17/18 IMO+Putnam 2025 problems solved, 0 false positives, 2 novel proofs. See eval data in the anthropic monorepo at sandbox/sandbox/ralph/math_skills/.
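A minimal sketch of the fresh-context idea described above, under stated assumptions: the verifier sees only the problem and the finished proof, never the solver's own scratch work, and it receives a checklist of concrete failure patterns. The names `Attempt`, `FAILURE_PATTERNS`, and `build_verifier_prompt` are hypothetical, and the three patterns are illustrative placeholders, not the skill's actual list.

```python
from dataclasses import dataclass

# Hypothetical failure patterns for illustration only; the skill's real
# list is not reproduced here.
FAILURE_PATTERNS = (
    "A claim is asserted as 'clearly' or 'obviously' true without a derivation.",
    "A case analysis omits a case or a boundary value.",
    "A general statement is inferred from checking small examples only.",
)


@dataclass(frozen=True)
class Attempt:
    problem: str
    proof: str
    solver_notes: str  # solver scratch work; deliberately never shown to the verifier


def build_verifier_prompt(attempt, patterns=FAILURE_PATTERNS):
    """Build a prompt from the problem and proof only (a fresh context)."""
    checklist = "\n".join(f"{i}. {p}" for i, p in enumerate(patterns, 1))
    return (
        "You are grading a proof you have not seen before.\n\n"
        f"Problem:\n{attempt.problem}\n\n"
        f"Proof:\n{attempt.proof}\n\n"
        "Check the proof against each failure pattern and report every match:\n"
        f"{checklist}\n"
    )
```

The resulting string would be sent to a verifier model in a new session; leaving `solver_notes` out is what keeps the verifier from inheriting the solver's assumptions.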