Benchmarking

This page defines the public benchmarking posture for the byteor-enterprise workspace.

The checked-in benchmark surface lives under bench/byteor-enterprise-bench, with regression thresholds in bench/perf_baseline.kv and automation through cargo run -p xtask -- perf.

What this page covers

  • how enterprise benchmarks should be interpreted publicly
  • which checked-in harnesses back published numbers
  • what context must accompany product-level performance claims

Public benchmark posture

Enterprise benchmark numbers are reproducible baseline evidence for product surfaces such as EdgePlane, ActionGraph, DataGuard, and SingleRing-backed execution.

They are neither hosted-service SLOs nor a substitute for operator readiness checks.

As noted in Operator Flow, doctor output is a contract check, not a benchmark.

Current harness and gate

The public enterprise benchmark flow is:

  1. run the harness with cargo run -p byteor-enterprise-bench for developer iteration, or with cargo run -p byteor-enterprise-bench -- --machine for lower-noise machine output
  2. enforce the checked-in gate with cargo run -p xtask -- perf, or refresh it with cargo run -p xtask -- perf --update
  3. compare against bench/perf_baseline.kv, which stores conservative throughput, p99, and CPU thresholds for each enterprise bench name

The harness supports the same tuning surfaces the runtime uses, including memlock, CPU pinning, scheduler policy, and RT priority.
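Those tuning surfaces can be captured in a small profile type. The sketch below is illustrative only: TuningProfile, SchedPolicy, and the summary format are hypothetical names for this page, not the harness's actual configuration API.

```rust
/// Scheduler policies the tuning surface distinguishes (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SchedPolicy {
    Other,
    Fifo,
    RoundRobin,
}

/// Hypothetical profile mirroring the surfaces named above:
/// memlock, CPU pinning, scheduler policy, and RT priority.
#[derive(Debug, Clone)]
struct TuningProfile {
    memlock: bool,
    cpu_allowlist: Vec<usize>,
    sched_policy: SchedPolicy,
    rt_priority: Option<u8>, // only meaningful for FIFO/RR
}

impl TuningProfile {
    /// Render the profile as a one-line posture string for a report.
    fn summary(&self) -> String {
        let cpus = if self.cpu_allowlist.is_empty() {
            "unpinned".to_string()
        } else {
            format!("pinned={:?}", self.cpu_allowlist)
        };
        format!(
            "memlock={} {} policy={:?} rt_prio={:?}",
            self.memlock, cpus, self.sched_policy, self.rt_priority
        )
    }
}

fn main() {
    let profile = TuningProfile {
        memlock: true,
        cpu_allowlist: vec![2, 3],
        sched_policy: SchedPolicy::Fifo,
        rt_priority: Some(50),
    };
    println!("{}", profile.summary());
}
```

Recording the whole profile as one string makes it cheap to attach the host posture to every published number, which is exactly the disclosure the sections below ask for.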

Bench families currently covered

The checked-in enterprise baseline currently includes:

  • edgeplane_identity_stage
  • actiongraph_http_post_dry_run
  • dataguard_chain_mapkv
  • dataguard_fused_mapkv
  • single_ring_chain
  • single_ring_dag
  • single_ring_sharded

Use those bench names directly when reporting results so readers can line them up with the checked-in baseline.

Minimum methodology disclosure

When you publish or compare enterprise numbers, include at least:

  • the exact bench name and product surface
  • whether the run used default or machine mode
  • CPU allowlist, pinning preset, scheduler policy, and RT priority (if any)
  • memlock posture, tmpdir backing, and host/kernel details
  • throughput plus tail latency and CPU cost
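A minimal disclosure could be carried as a single record per run. This is a sketch under assumptions: RunDisclosure and its field names are invented for illustration and are not a checked-in schema.

```rust
/// Hypothetical minimum disclosure for one published run.
struct RunDisclosure {
    bench_name: String,  // exact checked-in bench name, e.g. single_ring_chain
    machine_mode: bool,  // default run vs -- --machine
    host_kernel: String, // host/kernel details
    throughput_ops: f64, // operations per second
    p99_us: f64,         // tail latency, microseconds
    cpu_pct: f64,        // CPU cost
}

impl RunDisclosure {
    /// Reject reports that omit the minimum required context.
    fn is_complete(&self) -> bool {
        !self.bench_name.is_empty()
            && !self.host_kernel.is_empty()
            && self.throughput_ops > 0.0
            && self.p99_us > 0.0
    }

    /// One reportable line carrying the bench name, mode, host, and metrics.
    fn to_line(&self) -> String {
        format!(
            "{} mode={} host={} thr={:.0}ops/s p99={:.1}us cpu={:.1}%",
            self.bench_name,
            if self.machine_mode { "machine" } else { "default" },
            self.host_kernel,
            self.throughput_ops,
            self.p99_us,
            self.cpu_pct
        )
    }
}

fn main() {
    let run = RunDisclosure {
        bench_name: "single_ring_chain".into(),
        machine_mode: true,
        host_kernel: "x86_64/6.8".into(),
        throughput_ops: 250_000.0,
        p99_us: 180.0,
        cpu_pct: 85.0,
    };
    assert!(run.is_complete());
    println!("{}", run.to_line());
}
```

Because the record names the exact bench, a reader can line the reported numbers up against the corresponding entry in the checked-in baseline.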

If the run depends on Linux privilege posture, say so explicitly: mlockall, SCHED_FIFO, and SCHED_RR are meaningful only when the host limits and capabilities allow them.

Reading the baseline safely

bench/perf_baseline.kv is a regression gate, not a marketing table.

  • throughput thresholds are lower bounds
  • p99 and CPU thresholds are upper bounds
  • the stored numbers include headroom so the gate remains stable across reasonable CI and lab noise
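The bound directions above can be sketched as a small gate check. Thresholds, Measured, and gate_passes are hypothetical names for illustration, not the real xtask implementation or the perf_baseline.kv schema.

```rust
/// Thresholds stored per bench name: throughput is a floor,
/// p99 and CPU are ceilings.
struct Thresholds {
    throughput_min: f64, // lower bound: measured throughput must be >= this
    p99_max_us: f64,     // upper bound: measured p99 must be <= this
    cpu_max_pct: f64,    // upper bound: measured CPU cost must be <= this
}

/// One measured run of the same bench.
struct Measured {
    throughput: f64,
    p99_us: f64,
    cpu_pct: f64,
}

/// A run passes the regression gate only if it clears all three bounds.
fn gate_passes(t: &Thresholds, m: &Measured) -> bool {
    m.throughput >= t.throughput_min
        && m.p99_us <= t.p99_max_us
        && m.cpu_pct <= t.cpu_max_pct
}

fn main() {
    let t = Thresholds { throughput_min: 100_000.0, p99_max_us: 250.0, cpu_max_pct: 90.0 };
    let good = Measured { throughput: 120_000.0, p99_us: 200.0, cpu_pct: 80.0 };
    let slow = Measured { throughput: 90_000.0, p99_us: 200.0, cpu_pct: 80.0 };
    assert!(gate_passes(&t, &good));
    assert!(!gate_passes(&t, &slow)); // throughput below the floor regresses
    println!("gate semantics ok");
}
```

Note that a run can beat the stored throughput floor by a wide margin and still fail on the p99 or CPU ceiling; all three bounds gate independently.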

Use the file to detect regressions and to compare host postures. Do not reuse it as a universal promise for unrelated deployments.

Provenance

Need the canonical source? Use the public hub to orient yourself, then jump to repo-owned docs or rustdoc when you need contract-level detail.