Performance Tuning

This page covers performance tuning for IndexBus transport and ByteOr runtime deployments.

IndexBus Tuning

Wait Strategy

Strategy | Behavior                            | When to Use
spin     | Busy-wait on the lane position      | Lowest latency, highest CPU cost
backoff  | Exponential backoff between checks  | Lower CPU cost, higher tail latency

Use spin for latency-sensitive production workloads with dedicated cores. Use backoff for development, testing, and shared-host deployments.
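The two strategies reduce to how the consumer polls the lane position. A minimal C sketch, assuming an atomic position counter; the 1 µs base delay and 100 µs cap are illustrative constants, not IndexBus's actual values:

```c
#include <stdint.h>
#include <time.h>
#include <stdatomic.h>

/* Illustrative backoff curve: 1us doubling up to a 100us cap.
   These constants are examples, not IndexBus's real parameters. */
static uint64_t backoff_ns(unsigned attempt) {
    uint64_t ns = 1000ULL << (attempt < 7 ? attempt : 7);
    return ns > 100000ULL ? 100000ULL : ns;
}

/* Block until the lane position advances past `seen`; returns the new
   position. `spin` selects between the two strategies. */
static uint64_t wait_for_advance(_Atomic uint64_t *pos, uint64_t seen, int spin) {
    unsigned attempt = 0;
    uint64_t cur;
    while ((cur = atomic_load_explicit(pos, memory_order_acquire)) <= seen) {
        if (spin)
            continue;                 /* spin: burn the core for minimum latency */
        struct timespec ts = { 0, (long)backoff_ns(attempt++) };
        nanosleep(&ts, NULL);         /* backoff: trade tail latency for CPU */
    }
    return cur;
}
```

The spin branch never yields, which is why it needs a dedicated core; the backoff branch sleeps progressively longer, which is what produces the higher tail latency noted above.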

SHM Placement

  • Place SHM files on tmpfs for general use
  • Use hugetlbfs for large regions to reduce TLB pressure
  • Ensure SHM backing is on local storage, not network-mounted
  • Clean stale SHM files before starting new deployments
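On Linux, POSIX shared-memory objects created with shm_open live on tmpfs under /dev/shm, which satisfies the tmpfs and local-storage points. A hedged sketch of mapping such a region (the name and size are placeholders, not IndexBus's layout):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

/* Map a shared-memory region; on Linux the object is created on tmpfs
   under /dev/shm. Name and size here are placeholders. */
static void *map_shm_region(const char *name, size_t len) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping survives the close */
    return p == MAP_FAILED ? NULL : p;
}
/* For very large regions, anonymous mappings with MAP_HUGETLB (after
   reserving pages via vm.nr_hugepages) reduce TLB pressure, per the
   hugetlbfs note above. */
```

shm_unlink on the same name removes a stale object, which is how the cleanup step is typically done.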

Lane Sizing

  • Size lane capacity based on expected burst size, not average throughput
  • Over-provisioning capacity wastes memory; under-provisioning causes backpressure
  • Monitor router counters for routing distribution and drop counts
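One way to turn the burst-based rule into a number is to round a multiple of the expected burst up to a power of two, since ring capacities are typically powers of two for cheap index masking. The 2x headroom factor below is an assumption for illustration, not an IndexBus requirement:

```c
#include <stdint.h>

/* Illustrative sizing rule: capacity = next power of two >= 2x the
   expected burst. The 2x headroom factor is an assumption. */
static uint64_t lane_capacity_for_burst(uint64_t burst) {
    uint64_t need = burst * 2;
    uint64_t cap = 1;
    while (cap < need)
        cap <<= 1;
    return cap;
}
```

For a burst of 3,000 messages this yields 8,192 slots; sizing from an average rate of, say, 500 msg/s would give a far smaller ring and backpressure on every burst.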

Memory Locking

  • Use mlockall to prevent page faults in the hot path
  • Verify memory-lock limits with doctor
  • isolated-core profile requests memory locking by default
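A sketch of what the memory-locking setup amounts to on Linux: mlockall fails (ENOMEM or EPERM) when RLIMIT_MEMLOCK is too low and the process lacks CAP_IPC_LOCK, which is why the limit is worth verifying up front. The helper names are illustrative:

```c
#include <sys/mman.h>
#include <sys/resource.h>

/* Lock all current and future pages so the hot path never takes a
   major page fault. */
static int lock_all_memory(void) {
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Current soft memlock limit in bytes (RLIM_INFINITY if unlimited);
   returns 0 if the limit cannot be read. */
static unsigned long long memlock_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}
```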

CPU Tuning

Pinning

Mode     | Behavior
none     | No CPU affinity
balanced | Spread threads across available cores
physical | Pin to specific physical cores
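Under the hood, physical pinning is an affinity mask with a single core set. A Linux sketch; which core to pick is a deployment decision, and the helper name is illustrative:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Bind the calling thread to one core ("physical" pinning mode). */
static int pin_self_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof set, &set);  /* pid 0 = calling thread */
}
```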

Scheduling

Mode  | Behavior
other | Default Linux scheduling
fifo  | Real-time FIFO scheduling
rr    | Real-time round-robin scheduling

Real-time scheduling requires elevated privileges (CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO). Verify with doctor.
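A sketch of requesting fifo mode on Linux; the call returns -1 with errno set to EPERM exactly when the required permissions are missing:

```c
#include <sched.h>
#include <errno.h>

/* fifo mode: real-time FIFO at a fixed priority (1..99 on Linux).
   Without CAP_SYS_NICE or an RLIMIT_RTPRIO allowance this fails
   with EPERM. The priority value is a caller choice. */
static int enable_fifo(int prio) {
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(0, SCHED_FIFO, &sp);  /* pid 0 = this process */
}
```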

Core Isolation

For isolated-core profile:

  • Use isolcpus kernel parameter to dedicate cores
  • Ensure no other workloads scheduled on isolated cores
  • Verify isolation with doctor
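The kernel reports the isolcpus set in /sys/devices/system/cpu/isolated as a cpu-list string such as "2-5,8" (empty when no cores are isolated). A small parser for that format, useful when checking isolation programmatically; the function name is illustrative:

```c
#include <stdlib.h>

/* Count CPUs in a kernel cpu-list string such as "2-5,8" -- the format
   of /sys/devices/system/cpu/isolated, populated by isolcpus=.
   Assumes well-formed input. */
static int count_cpu_list(const char *s) {
    int count = 0;
    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);   /* range start (or lone CPU) */
        long hi = lo;
        if (*end == '-')
            hi = strtol(end + 1, &end, 10);  /* range end */
        count += (int)(hi - lo + 1);
        s = (*end == ',') ? end + 1 : end;
    }
    return count;
}
```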

Monitoring

Router Counters

Monitor IndexBus router counters for:

  • Total routed messages
  • Per-output distribution
  • Drop counts under pressure

Runtime Metrics

Cloud exposes metrics at /metrics including:

  • Request counters by route
  • Auth success/failure rates
  • Rate limiting events
  • Worker job states

Agent Telemetry

Monitor agent-reported metrics:

  • Heartbeat intervals
  • Applied vs. requested tuning
  • Degraded tuning reasons
  • Artifact upload success rates

Benchmarking

Use the baseline benchmark suite for reproducible measurements:

  • Run on isolated hardware for consistent results
  • Compare against the published baseline numbers
  • Report both throughput and latency percentiles (p50, p99, p99.9)
  • Document the exact hardware and kernel configuration used
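For the latency percentiles, a simple nearest-index computation over sorted samples is enough for reporting. A sketch without interpolation; function names are illustrative:

```c
#include <stdlib.h>

/* qsort comparator for 64-bit latency samples. */
static int cmp_u64(const void *a, const void *b) {
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

/* Nearest-index percentile (no interpolation) over latency samples
   in nanoseconds; sorts the array in place. p is in [0, 100]. */
static unsigned long long percentile_ns(unsigned long long *s, size_t n, double p) {
    qsort(s, n, sizeof *s, cmp_u64);
    size_t idx = (size_t)(p * (double)(n - 1) / 100.0);
    return s[idx];
}
```

Call it three times on the same buffer for p50, p99, and p99.9; only the first call pays the sort (the array is already sorted afterwards).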

Performance Baselines

Conservative CI-safe thresholds from bench/perf_baseline.kv (OSS readiness gate):

Benchmark                       | Min ops/s  | Max p99 (ns/op) | Max CPU (ns/op)
Events roundtrip (SPSC SHM)     | 7,936,804  | 114,306         | 246
Events slot-forward (3-hop)     | 4,345,874  | 2,094,488       | 530
Events fan-in (4P→1C SPSC)      | 5,438,972  | 2,194,482       | 582
Events MPSC (4P→1C)             | 4,000,292  | 3,649,227       | 683
SingleRing chain A→B (2 stages) | 4,030,662  | 447             | 403
SingleRing DAG A→{B,C}→D        | 487,923    | 6,886           | 5,536
SingleRing sharded (4 shards)   | 15,083,816 | 131             | 107

These are conservative CI-safe thresholds, not peak numbers. The perf gate enforces throughput ≥ baseline × 95% and latency ≤ baseline × 105%.
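The gate condition is a pair of inequalities per benchmark, sketched directly from the rule stated above:

```c
/* Per-benchmark perf gate: measured throughput must be >= 95% of the
   baseline and measured p99 latency at most 105% of the baseline. */
static int perf_gate_passes(double ops, double base_ops,
                            double p99, double base_p99) {
    return ops >= base_ops * 0.95 && p99 <= base_p99 * 1.05;
}
```

For the Events roundtrip row, for example, a run must sustain at least 7,936,804 × 0.95 ≈ 7.54M ops/s with p99 no worse than 114,306 × 1.05 ≈ 120,021 ns to pass.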

Provenance
Need the canonical source? Use the public hub to orient yourself, then jump to repo-owned docs or rustdoc when you need contract-level detail.