§ Benchmarks

Benchmarks that label their mode.

SecHive publishes benchmark results with explicit methodology: which run mode was used, which target version was tested, how long the run lasted, what counted as a win, and which artifacts back the claim.

§ Methodology

How a benchmark counts.

Every result published on this site uses the same methodology spine: scope, run mode, target version, time budget, scoring rule, and proof retention.

| Field | Definition | Why it matters |
| --- | --- | --- |
| Run mode | black-box, source-enabled, fully source-aware, CTF | Comparing across modes is the most common public-data error. |
| Target version | commit hash, image digest, lab tag | Targets evolve; "Juice Shop" alone is not a benchmark. |
| Time budget | wall-clock and per-case caps | Speed claims without a budget are not comparable. |
| Scoring rule | any-win, full-win, source-aware-objective | The same campaign can show different rates under different rules. |
| Proof retention | artifact set retained per case | Without retention there is no audit trail to inspect. |
| Negative evidence | refutations kept with the case | Failed attempts inform the next run; they are not noise. |
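
To make the table concrete, here is a minimal Python sketch of how the comparability fields could be recorded and checked, and why the scoring rule changes the reported rate. It is an illustration under stated assumptions, not SecHive's implementation: the names `BenchmarkRun`, `comparable`, and `success_rate` are hypothetical, the time budget is assumed to be stored in seconds, and the readings of "any-win" (at least one objective met in a case) and "full-win" (every objective met) are assumptions, since the source does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """Hypothetical record of the fields that decide comparability."""
    run_mode: str        # "black-box", "source-enabled", "fully source-aware", "CTF"
    target_version: str  # commit hash, image digest, or lab tag
    time_budget_s: int   # wall-clock cap in seconds (assumed unit)
    scoring_rule: str    # "any-win", "full-win", "source-aware-objective"


def comparable(a: BenchmarkRun, b: BenchmarkRun) -> bool:
    """Two runs are treated as comparable only if every field matches."""
    return a == b


def success_rate(cases: list[list[bool]], rule: str) -> float:
    """Rate of winning cases; each case is a list of per-objective outcomes.

    Assumed semantics: "any-win" needs one objective met,
    "full-win" needs all of them.
    """
    if not cases:
        raise ValueError("no cases to score")
    if rule == "any-win":
        wins = sum(any(case) for case in cases)
    elif rule == "full-win":
        wins = sum(all(case) for case in cases)
    else:
        raise ValueError(f"unsupported scoring rule: {rule}")
    return wins / len(cases)


# Illustrative data only: four cases with two objectives each.
campaign = [[True, True], [True, False], [False, False], [True, True]]

any_rate = success_rate(campaign, "any-win")    # 3 of 4 cases
full_rate = success_rate(campaign, "full-win")  # 2 of 4 cases

# Placeholder tag, not a real target version.
black_box = BenchmarkRun("black-box", "lab-tag-A", 3600, "any-win")
source_aware = BenchmarkRun("fully source-aware", "lab-tag-A", 3600, "any-win")
cross_mode_ok = comparable(black_box, source_aware)  # False: modes differ
```

With this made-up data the same campaign scores 0.75 under any-win and 0.5 under full-win, which is why a rate published without its scoring rule cannot be compared with another.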