Proof / XBOW-style benchmark campaign
104 recorded cases, black-box and white-box. Every win has a retained artifact set; every black-box gap has a retained refutation log; the methodology spine was published before the score.