SecHive publishes benchmark results with explicit methodology: which run mode, what target version, how long, what counted as a win, what artifacts back the claim.
Paired black-box and source-enabled (white-box) best-of campaign across 104 cases: 95.19% black-box wins (99 of 104), 100% white-box wins, and no case left without a win.
Full unredacted black-box and white-box reports against the current 111-challenge tree. Reproducible.
Game of Active Directory. Lab framework loaded, scoring schema published. First public results scheduled for Q3 2026.
OWASP WrongSecrets challenge tree. Aggregator and runner integrated; first public scorecard scheduled for Q3 2026.
Every result published on this site uses the same methodology spine: scope, run mode, target version, time budget, scoring rule, and proof retention.
| Field | Definition | Why it matters |
|---|---|---|
| Run mode | black-box, source-enabled, fully source-aware, CTF | Comparing results across run modes is the most common error in reading public data. |
| Target version | commit hash, image digest, lab tag | Targets evolve; "Juice Shop" alone is not a benchmark. |
| Time budget | wall-clock and per-case caps | Speed claims without a budget are not comparable. |
| Scoring rule | any-win, full-win, source-aware-objective | The same campaign can show different rates under different rules. |
| Proof retention | artifact set retained per case | Without retention there is no audit trail to inspect. |
| Negative evidence | refutations kept with the case | Failed attempts inform the next run; they are not noise. |
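To make the scoring-rule row concrete, here is a minimal sketch of how one campaign can report different win rates under different rules. It is not SecHive's published schema. The class and field names (`CaseResult`, `CampaignRecord`, `objectives_won`, and so on) are hypothetical, and so are the readings of the rules: "any-win" is taken to mean at least one objective achieved in a case, and "full-win" to mean every objective achieved. The "source-aware-objective" rule is left out because the table does not define it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """One benchmark case. All field names are illustrative assumptions."""
    case_id: str
    objectives_total: int
    objectives_won: int
    artifacts: tuple[str, ...] = ()    # proof retention: evidence kept per case
    refutations: tuple[str, ...] = ()  # negative evidence: failed attempts kept


@dataclass(frozen=True)
class CampaignRecord:
    """The methodology fields from the table, attached to a set of cases."""
    run_mode: str          # e.g. "black-box" or "source-enabled"
    target_version: str    # commit hash, image digest, or lab tag
    time_budget_s: int     # wall-clock cap for the whole campaign
    per_case_cap_s: int    # cap per case
    scoring_rule: str      # "any-win" or "full-win" in this sketch
    cases: tuple[CaseResult, ...]


def win_rate(cases: tuple[CaseResult, ...], rule: str) -> float:
    """Fraction of cases counted as wins under the given (assumed) rule."""
    if not cases:
        raise ValueError("cannot score an empty campaign")
    if rule == "any-win":
        wins = sum(1 for c in cases if c.objectives_won >= 1)
    elif rule == "full-win":
        wins = sum(1 for c in cases if c.objectives_won == c.objectives_total)
    else:
        raise ValueError(f"unsupported scoring rule: {rule!r}")
    return wins / len(cases)


# Made-up example data, not real benchmark results.
example_cases = (
    CaseResult("case-a", objectives_total=2, objectives_won=2),
    CaseResult("case-b", objectives_total=2, objectives_won=1),
    CaseResult("case-c", objectives_total=3, objectives_won=0,
               refutations=("attempt-1 refuted",)),
    CaseResult("case-d", objectives_total=1, objectives_won=1),
)

any_win = win_rate(example_cases, "any-win")    # 3 of 4 cases
full_win = win_rate(example_cases, "full-win")  # 2 of 4 cases
```

With this made-up data the same four cases score 0.75 under any-win and 0.5 under full-win, which is why a published rate is only comparable when the scoring rule is stated alongside it.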