Benchmarking

This page explains how the ByteOr OSS workspace publishes benchmark evidence and how to compare those numbers safely.

The checked-in benchmark surface for this repo lives in bench/byteor-bench, with conservative regression thresholds in bench/perf_baseline.kv and gating automation driven by cargo run -p xtask -- perf.

What these numbers are for

  • catch regressions in the transport and execution hot paths
  • compare like-for-like topology changes in a reproducible way
  • publish conservative baseline evidence for the OSS runtime surface

They are not blanket production guarantees. Treat them as harness-backed baselines for a stated topology and host posture.

Public benchmark surface

The current public OSS benchmark flow is:

  1. run the harness with cargo run -p byteor-bench --release for quick smoke output, or with cargo run -p byteor-bench --release -- --machine for machine-parseable results
  2. enforce or refresh the checked-in baseline with cargo run -p xtask -- perf and cargo run -p xtask -- perf --update
  3. compare the run against bench/perf_baseline.kv, which stores throughput, p99 latency, and CPU budgets for each bench name

The baseline is intentionally conservative because CI and shared lab hosts vary.
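Put together, the flow above runs end to end with the commands already named in this page (output shapes will vary by host):

```shell
# 1. Quick smoke run (human-readable output).
cargo run -p byteor-bench --release

# Machine-parseable results, suitable for comparison against the baseline.
cargo run -p byteor-bench --release -- --machine

# 2. Enforce the checked-in baseline, or refresh it after an intentional change.
cargo run -p xtask -- perf
cargo run -p xtask -- perf --update
```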

Current bench families

The OSS harness currently covers:

  • event-lane roundtrip and slot-forward paths
  • fan-in and MPSC queue shapes over SHM backing
  • SingleRing chain, DAG, and sharded execution shapes
  • startup-inclusive versus steady-state SingleRing comparisons

Use the bench names from the machine output when reporting results so they line up with the checked-in baseline keys.

Methodology expectations

When you publish or compare numbers, record at least:

  • the exact bench name and payload shape
  • whether the run is startup-inclusive or steady-state
  • SHM backing location and whether mappings were prefaulted
  • CPU placement, scheduler policy, memlock posture, and NUMA locality
  • throughput plus tail latency, not throughput alone
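As a minimal illustration of the last point, one report helper can derive both throughput and a nearest-rank p99 estimate from the same per-op latency samples. This is a sketch only: the helper and sample data are hypothetical, not byteor-bench API.

```rust
/// Hypothetical summary of one bench run: throughput plus tail latency.
/// Illustrates the reporting shape only; not part of the harness.
fn summarize(latencies_ns: &mut Vec<u64>) -> (f64, u64) {
    latencies_ns.sort_unstable();
    let total_ns: u64 = latencies_ns.iter().sum();
    // Throughput: ops per second over the measured window.
    let ops_per_s = latencies_ns.len() as f64 * 1e9 / total_ns as f64;
    // p99: nearest-rank percentile over the sorted samples.
    let rank = (latencies_ns.len() * 99 + 99) / 100 - 1;
    let p99_ns = latencies_ns[rank];
    (ops_per_s, p99_ns)
}

fn main() {
    // 95 fast ops plus 5 slow outliers: a throughput-only number would
    // look healthy while the tail tells a different story.
    let mut samples: Vec<u64> = vec![100; 95];
    samples.extend(vec![10_000u64; 5]);
    let (ops_per_s, p99_ns) = summarize(&mut samples);
    println!("ops/s={ops_per_s:.0} p99_ns={p99_ns}");
}
```

The outliers barely dent the throughput figure but dominate the p99, which is exactly why the methodology asks for both.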

If you change host posture materially, document it next to the result. CPU governor changes, huge-page setup, BYTEOR_BENCH_TMPDIR, BYTEOR_BENCH_SHM_PREFAULT, and BYTEOR_BENCH_SINGLE_RING_MODE can all move the outcome.

How to read the baseline file

bench/perf_baseline.kv stores three limits per bench:

  • min_ops_per_s: lower bound for throughput
  • max_p99_ns_per_op: upper bound for tail latency
  • max_cpu_ns_per_op: upper bound for CPU cost per operation

xtask perf enforces those limits with headroom rather than expecting exact reproduction. That makes the file useful as a regression gate without pretending every host is identical.
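To make the gate concrete, here is a sketch of a baseline check. The key=value line layout (bench.min_ops_per_s=...) and the 10% headroom factor are assumptions for illustration; the real bench/perf_baseline.kv format and the xtask headroom may differ.

```rust
use std::collections::HashMap;

/// The three per-bench limits described above.
#[derive(Default)]
struct Limits {
    min_ops_per_s: f64,
    max_p99_ns_per_op: f64,
    max_cpu_ns_per_op: f64,
}

/// Parse "bench.limit=value" lines (an assumed layout, not the real file format).
fn parse_baseline(text: &str) -> HashMap<String, Limits> {
    let mut out: HashMap<String, Limits> = HashMap::new();
    for line in text.lines().filter(|l| !l.trim().is_empty() && !l.starts_with('#')) {
        let (key, value) = line.trim().split_once('=').expect("key=value line");
        let (bench, limit) = key.rsplit_once('.').expect("bench.limit key");
        let v: f64 = value.trim().parse().expect("numeric value");
        let entry = out.entry(bench.to_string()).or_default();
        match limit {
            "min_ops_per_s" => entry.min_ops_per_s = v,
            "max_p99_ns_per_op" => entry.max_p99_ns_per_op = v,
            "max_cpu_ns_per_op" => entry.max_cpu_ns_per_op = v,
            other => panic!("unknown limit {other}"),
        }
    }
    out
}

/// Gate one run against the limits with headroom (e.g. 0.10 for 10%)
/// instead of expecting exact reproduction across hosts.
fn passes(l: &Limits, ops_per_s: f64, p99_ns: f64, cpu_ns: f64, headroom: f64) -> bool {
    ops_per_s >= l.min_ops_per_s * (1.0 - headroom)
        && p99_ns <= l.max_p99_ns_per_op * (1.0 + headroom)
        && cpu_ns <= l.max_cpu_ns_per_op * (1.0 + headroom)
}

fn main() {
    let baseline = parse_baseline(
        "demo_bench.min_ops_per_s=1000000\n\
         demo_bench.max_p99_ns_per_op=2000\n\
         demo_bench.max_cpu_ns_per_op=1500\n",
    );
    let limits = &baseline["demo_bench"];
    // A run slightly under the throughput floor still passes with 10% headroom.
    println!("{}", passes(limits, 950_000.0, 1900.0, 1400.0, 0.10));
}
```

The design point is the direction of each comparison: throughput gets a lower bound relaxed downward, while the two latency/CPU budgets get upper bounds relaxed upward.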

Provenance

Need the canonical source? Use the public hub to orient yourself, then jump to repo-owned docs or rustdoc when you need contract-level detail.