Performance Tuning

This page covers performance tuning for IndexBus transport and ByteOr runtime deployments.

IndexBus Tuning

Wait Strategy

Strategy | Behavior                            | When to Use
spin     | Busy-wait on the lane position      | Lowest latency, highest CPU cost
backoff  | Exponential backoff between checks  | Lower CPU cost, higher tail latency

Use spin for latency-sensitive production workloads with dedicated cores. Use backoff for development, testing, and shared-host deployments.
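The two strategies reduce to how the consumer polls the lane position. A minimal C sketch, assuming an atomic position counter; the 1 µs base delay and 100 µs cap are illustrative constants, not IndexBus's actual values:

```c
#include <stdint.h>
#include <time.h>
#include <stdatomic.h>

/* Illustrative backoff curve: 1us doubling up to a 100us cap.
   These constants are examples, not IndexBus's real parameters. */
static uint64_t backoff_ns(unsigned attempt) {
    uint64_t ns = 1000ULL << (attempt < 7 ? attempt : 7);
    return ns > 100000ULL ? 100000ULL : ns;
}

/* Block until the lane position advances past `seen`; returns the new
   position. `spin` selects between the two strategies. */
static uint64_t wait_for_advance(_Atomic uint64_t *pos, uint64_t seen, int spin) {
    unsigned attempt = 0;
    uint64_t cur;
    while ((cur = atomic_load_explicit(pos, memory_order_acquire)) <= seen) {
        if (spin)
            continue;                 /* spin: burn the core for minimum latency */
        struct timespec ts = { 0, (long)backoff_ns(attempt++) };
        nanosleep(&ts, NULL);         /* backoff: trade tail latency for CPU */
    }
    return cur;
}
```

The spin branch never yields, which is why it needs a dedicated core; the backoff branch sleeps progressively longer, which is what produces the higher tail latency noted above.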

SHM Placement

  • Place SHM files on tmpfs for general use
  • Use hugetlbfs for large regions to reduce TLB pressure
  • Ensure SHM backing is on local storage, not network-mounted
  • Clean stale SHM files before starting new deployments
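On Linux, POSIX shared-memory objects created with shm_open live on tmpfs under /dev/shm, which satisfies the tmpfs and local-storage points. A hedged sketch of mapping such a region (the name and size are placeholders, not IndexBus's layout):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

/* Map a shared-memory region; on Linux the object is created on tmpfs
   under /dev/shm. Name and size here are placeholders. */
static void *map_shm_region(const char *name, size_t len) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping survives the close */
    return p == MAP_FAILED ? NULL : p;
}
/* For very large regions, anonymous mappings with MAP_HUGETLB (after
   reserving pages via vm.nr_hugepages) reduce TLB pressure, per the
   hugetlbfs note above. */
```

shm_unlink on the same name removes a stale object, which is how the cleanup step is typically done.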

Lane Sizing

  • Size lane capacity based on expected burst size, not average throughput
  • Over-provisioning capacity wastes memory; under-provisioning causes backpressure
  • Monitor router counters for routing distribution and drop counts
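One way to turn the burst-based rule into a number is to round a multiple of the expected burst up to a power of two, since ring capacities are typically powers of two for cheap index masking. The 2x headroom factor below is an assumption for illustration, not an IndexBus requirement:

```c
#include <stdint.h>

/* Illustrative sizing rule: capacity = next power of two >= 2x the
   expected burst. The 2x headroom factor is an assumption. */
static uint64_t lane_capacity_for_burst(uint64_t burst) {
    uint64_t need = burst * 2;
    uint64_t cap = 1;
    while (cap < need)
        cap <<= 1;
    return cap;
}
```

For a burst of 3,000 messages this yields 8,192 slots; sizing from an average rate of, say, 500 msg/s would give a far smaller ring and backpressure on every burst.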

Memory Locking

  • Use mlockall to prevent page faults in the hot path
  • Verify memory-lock limits with doctor
  • isolated-core profile requests memory locking by default
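A sketch of what the memory-locking setup amounts to on Linux: mlockall fails (ENOMEM or EPERM) when RLIMIT_MEMLOCK is too low and the process lacks CAP_IPC_LOCK, which is why the limit is worth verifying up front. The helper names are illustrative:

```c
#include <sys/mman.h>
#include <sys/resource.h>

/* Lock all current and future pages so the hot path never takes a
   major page fault. */
static int lock_all_memory(void) {
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Current soft memlock limit in bytes (RLIM_INFINITY if unlimited);
   returns 0 if the limit cannot be read. */
static unsigned long long memlock_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}
```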

CPU Tuning

Pinning

Mode     | Behavior
none     | No CPU affinity
balanced | Spread threads across available cores
physical | Pin to specific physical cores
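Under the hood, physical pinning is an affinity mask with a single core set. A Linux sketch; which core to pick is a deployment decision, and the helper name is illustrative:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Bind the calling thread to one core ("physical" pinning mode). */
static int pin_self_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof set, &set);  /* pid 0 = calling thread */
}
```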

Scheduling

Mode  | Behavior
other | Default Linux scheduling
fifo  | Real-time FIFO scheduling
rr    | Real-time round-robin scheduling

Real-time scheduling requires elevated privileges (CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO). Verify with doctor.
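A sketch of requesting fifo mode on Linux; the call returns -1 with errno set to EPERM exactly when the required permissions are missing:

```c
#include <sched.h>
#include <errno.h>

/* fifo mode: real-time FIFO at a fixed priority (1..99 on Linux).
   Without CAP_SYS_NICE or an RLIMIT_RTPRIO allowance this fails
   with EPERM. The priority value is a caller choice. */
static int enable_fifo(int prio) {
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(0, SCHED_FIFO, &sp);  /* pid 0 = this process */
}
```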

Core Isolation

For isolated-core profile:

  • Use isolcpus kernel parameter to dedicate cores
  • Ensure no other workloads scheduled on isolated cores
  • Verify isolation with doctor
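The kernel reports the isolcpus set in /sys/devices/system/cpu/isolated as a cpu-list string such as "2-5,8" (empty when no cores are isolated). A small parser for that format, useful when checking isolation programmatically; the function name is illustrative:

```c
#include <stdlib.h>

/* Count CPUs in a kernel cpu-list string such as "2-5,8" -- the format
   of /sys/devices/system/cpu/isolated, populated by isolcpus=.
   Assumes well-formed input. */
static int count_cpu_list(const char *s) {
    int count = 0;
    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);   /* range start (or lone CPU) */
        long hi = lo;
        if (*end == '-')
            hi = strtol(end + 1, &end, 10);  /* range end */
        count += (int)(hi - lo + 1);
        s = (*end == ',') ? end + 1 : end;
    }
    return count;
}
```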

Monitoring

Router Counters

Monitor IndexBus router counters for:

  • Total routed messages
  • Per-output distribution
  • Drop counts under pressure

Runtime Metrics

Cloud exposes metrics at /metrics including:

  • Request counters by route
  • Auth success/failure rates
  • Rate limiting events
  • Worker job states

Agent Telemetry

Monitor agent-reported metrics:

  • Heartbeat intervals
  • Applied vs. requested tuning
  • Degraded tuning reasons
  • Artifact upload success rates

Benchmarking

Use the baseline benchmark suite for reproducible measurements:

  • Run on isolated hardware for consistent results
  • Compare against the published baseline numbers
  • Report both throughput and latency percentiles (p50, p99, p99.9)
  • Document the exact hardware and kernel configuration used
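For the latency percentiles, a simple nearest-index computation over sorted samples is enough for reporting. A sketch without interpolation; function names are illustrative:

```c
#include <stdlib.h>

/* qsort comparator for 64-bit latency samples. */
static int cmp_u64(const void *a, const void *b) {
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

/* Nearest-index percentile (no interpolation) over latency samples
   in nanoseconds; sorts the array in place. p is in [0, 100]. */
static unsigned long long percentile_ns(unsigned long long *s, size_t n, double p) {
    qsort(s, n, sizeof *s, cmp_u64);
    size_t idx = (size_t)(p * (double)(n - 1) / 100.0);
    return s[idx];
}
```

Call it three times on the same buffer for p50, p99, and p99.9; only the first call pays the sort (the array is already sorted afterwards).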

Performance Baselines

Conservative CI-safe thresholds from bench/perf_baseline.kv (OSS readiness gate):

Benchmark                       | Min ops/s  | Max p99 (ns/op) | Max CPU (ns/op)
Events roundtrip (SPSC SHM)     | 7,936,804  | 114,306         | 246
Events slot-forward (3-hop)     | 4,345,874  | 2,094,488       | 530
Events fan-in (4P→1C SPSC)      | 5,438,972  | 2,194,482       | 582
Events MPSC (4P→1C)             | 4,000,292  | 3,649,227       | 683
SingleRing chain A→B (2 stages) | 4,030,662  | 447             | 403
SingleRing DAG A→{B,C}→D        | 487,923    | 6,886           | 5,536
SingleRing sharded (4 shards)   | 15,083,816 | 131             | 107

These are conservative CI-safe thresholds, not peak numbers. The perf gate enforces throughput ≥ baseline × 95% and latency ≤ baseline × 105%.
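The gate condition is a pair of inequalities per benchmark, sketched directly from the rule stated above:

```c
/* Per-benchmark perf gate: measured throughput must be >= 95% of the
   baseline and measured p99 latency at most 105% of the baseline. */
static int perf_gate_passes(double ops, double base_ops,
                            double p99, double base_p99) {
    return ops >= base_ops * 0.95 && p99 <= base_p99 * 1.05;
}
```

For the Events roundtrip row, for example, a run must sustain at least 7,936,804 × 0.95 ≈ 7.54M ops/s with p99 no worse than 114,306 × 1.05 ≈ 120,021 ns to pass.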

Provenance
Need the canonical source? Use the public hub to orient yourself, then jump to repo-owned docs or rustdoc when you need contract-level detail.