Performance Tuning
This page covers performance tuning for IndexBus transport and ByteOr runtime deployments.
For benchmark methodology, publication policy, and links to the current public baseline surface, start with Benchmarking.
IndexBus Tuning
Wait Strategy
Use spin for latency-sensitive production workloads with dedicated cores. Use backoff for development, testing, and shared-host deployments.
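The trade-off between the two strategies can be sketched as follows. This is an illustrative Python model, not the IndexBus implementation: the function names, the ready callback, and the sleep bounds are all assumptions made for the example.

```python
import time
from typing import Callable


def wait_spin(ready: Callable[[], bool]) -> int:
    """Busy-poll until ready() is true.

    Lowest wake-up latency, but the loop occupies a full core,
    which is why it suits dedicated cores only.
    """
    polls = 0
    while not ready():
        polls += 1
    return polls


def wait_backoff(ready: Callable[[], bool],
                 initial_sleep_s: float = 1e-6,
                 max_sleep_s: float = 1e-3) -> int:
    """Poll, sleeping between polls with an exponentially growing delay.

    Yields the CPU to other work at the cost of wake-up latency,
    which is why it suits shared hosts and development machines.
    """
    polls = 0
    sleep_s = initial_sleep_s
    while not ready():
        polls += 1
        time.sleep(sleep_s)
        sleep_s = min(sleep_s * 2, max_sleep_s)  # hypothetical cap
    return polls
```

Both functions return the number of unsuccessful polls, which makes the cost difference easy to observe: the spin variant polls continuously, while the backoff variant polls far less often for the same wait.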
SHM Placement
- Place SHM files on tmpfs for general use
- Use hugetlbfs for large regions to reduce TLB pressure
- Ensure SHM backing is on local storage, not network-mounted
- Clean stale SHM files before starting new deployments
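One way to confirm where an SHM file actually lives is to look up its mount point in the contents of /proc/mounts. The sketch below is a minimal, standalone check and is not part of the product tooling. The function names and the set of network filesystem types are assumptions chosen for illustration.

```python
# Filesystem types treated as acceptable or network-backed (illustrative lists).
LOCAL_MEMORY_FS = {"tmpfs", "hugetlbfs"}
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "9p", "fuse.sshfs"}


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    "<device> <mount point> <fs type> <options> ...".
    """
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # ">=" lets a later mount on the same mount point win.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def classify(fs_type: str) -> str:
    """Map a filesystem type to "ok", "network", or "other"."""
    if fs_type in LOCAL_MEMORY_FS:
        return "ok"
    if fs_type in NETWORK_FS:
        return "network"
    return "other"
```

On a Linux host, pass the text of /proc/mounts and the resolved path of the SHM file. A result of "network" indicates the backing should be moved. A result of "other" means the file is on a filesystem that is neither memory-backed nor in the network list, for example a local disk.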
Lane Sizing
- Size lane capacity based on expected burst size, not average throughput
- Over-provisioning capacity wastes memory; under-provisioning causes backpressure
- Monitor router counters for routing distribution and drop counts
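As a rough worked example of sizing from burst rather than average, the backlog a lane must absorb is the gap between the burst arrival rate and the drain rate, multiplied by the burst duration. The model below is a simplification with assumed parameter names. The headroom factor and the power-of-two rounding are illustrative choices, not documented IndexBus requirements.

```python
import math


def lane_capacity(burst_msgs_per_s: float,
                  drain_msgs_per_s: float,
                  burst_duration_s: float,
                  headroom: float = 1.5) -> int:
    """Estimate lane slots needed to absorb one burst without backpressure."""
    # Messages that pile up while producers outrun the consumer.
    backlog = max(burst_msgs_per_s - drain_msgs_per_s, 0.0) * burst_duration_s
    slots = max(math.ceil(backlog * headroom), 1)
    # Round up to the next power of two (an assumption for this sketch).
    return 1 << (slots - 1).bit_length()
```

For a 2 ms burst at 2,000,000 msg/s against a consumer draining 1,500,000 msg/s, the backlog is about 1,000 messages, so the estimate with 1.5x headroom rounds up to 2,048 slots. If the arrival rate never exceeds the drain rate, the estimate is the minimum of 1 slot, because this model sizes for burst backlog only and ignores average throughput.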
Memory Locking
- Use mlockall to prevent page faults in the hot path
- Verify memory-lock limits with doctor
- isolated-core profile requests memory locking by default
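Independently of doctor, the process memory-lock limit can be inspected through the standard RLIMIT_MEMLOCK resource limit, the same value reported by ulimit -l. The sketch below uses Python's resource module. The helper names and the required_bytes figure are assumptions, and this page does not describe what doctor checks internally.

```python
import resource


def current_memlock_limit() -> tuple[int, int]:
    """Return the (soft, hard) RLIMIT_MEMLOCK values in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def memlock_sufficient(required_bytes: int, soft_limit: int) -> bool:
    """True if the soft limit allows locking required_bytes of memory."""
    # RLIM_INFINITY is a sentinel, so it must be tested before comparing sizes.
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required_bytes
```

A typical use is to compare the soft limit against the total size of the SHM regions the process intends to lock. If the limit is too low, mlockall can fail or later allocations can be refused, depending on the flags used.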
CPU Tuning
Pinning
Scheduling
RT scheduling requires permissions. Verify with doctor.
Core Isolation
For the isolated-core profile:
- Use the isolcpus kernel parameter to dedicate cores
- Ensure no other workloads are scheduled on isolated cores
- Verify isolation with doctor
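On Linux, the kernel reports the CPUs isolated via isolcpus in /sys/devices/system/cpu/isolated, using a list format such as 2-5,8. The sketch below parses that format and reports which intended cores are not isolated. The function names are assumptions, and this is a standalone cross-check rather than a description of what doctor does.

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as "2-5,8" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are isolated
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def missing_isolation(wanted: set[int], isolated_text: str) -> set[int]:
    """Return the CPUs in wanted that the kernel does not report as isolated."""
    return wanted - parse_cpu_list(isolated_text)
```

An empty result means every core intended for the hot path is isolated. Any CPU id returned is still visible to the general scheduler and may run other workloads.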
Monitoring
Router Counters
Monitor IndexBus router counters for:
- Total routed messages
- Per-output distribution
- Drop counts under pressure
Runtime Metrics
Cloud exposes metrics at /metrics, including:
- Request counters by route
- Auth success/failure rates
- Rate limiting events
- Worker job states
Agent Telemetry
Monitor agent-reported metrics:
- Heartbeat intervals
- Applied vs. requested tuning
- Degraded tuning reasons
- Artifact upload success rates
Benchmarking
Start with Benchmarking when you need guidance on how to interpret published numbers, which numbers belong in public docs, and where repo-owned evidence lives.
Use the baseline benchmark suite for reproducible measurements:
- Run on isolated hardware for consistent results
- Compare against the published baseline numbers
- Report both throughput and latency percentiles (p50, p99, p99.9)
- Document the exact hardware and kernel configuration used
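For the percentile reporting above, one common convention is the nearest-rank method, sketched below. This is an illustration only: the baseline suite may compute percentiles differently, and the function names are assumptions.

```python
import math
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    # Fraction keeps the rank exact for values like 99.9 that floats cannot represent.
    rank = max(math.ceil(Fraction(str(pct)) * len(ordered) / 100), 1)
    return ordered[rank - 1]


def latency_report(samples_ns):
    """Summarize latency samples at the percentiles recommended for reports."""
    return {
        "p50": percentile(samples_ns, 50),
        "p99": percentile(samples_ns, 99),
        "p99.9": percentile(samples_ns, 99.9),
    }
```

Note that p99.9 is only meaningful with enough samples. With fewer than 1,000 measurements, the nearest-rank p99.9 is simply the maximum observed value.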
Performance Baselines
Conservative CI-safe thresholds from bench/perf_baseline.kv (OSS readiness gate):
Read these in two passes:
- throughput answers how much work a path can sustain under the measured topology
- p99 latency and CPU answer whether that path remains operationally credible under load
Do not treat a bigger ops/s number as a win by itself. If throughput improves while tail latency or CPU cost degrades materially, the result may still be worse for latency-sensitive deployments.
Throughput floors
Tail-latency and CPU ceilings
These are conservative CI-safe thresholds, not peak numbers. The perf gate enforces throughput ≥ baseline × 95% and latency ≤ baseline × 105%.
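The gate rule stated above amounts to two comparisons. The sketch below expresses them directly. The function names and example figures are hypothetical, and the format of bench/perf_baseline.kv is deliberately not modelled here.

```python
THROUGHPUT_FACTOR = 0.95  # measured throughput must be >= baseline * 95%
LATENCY_FACTOR = 1.05     # measured latency must be <= baseline * 105%


def throughput_ok(measured_ops_per_s: float, baseline_ops_per_s: float) -> bool:
    """True if throughput stays at or above the gated floor."""
    return measured_ops_per_s >= baseline_ops_per_s * THROUGHPUT_FACTOR


def latency_ok(measured_ns: float, baseline_ns: float) -> bool:
    """True if latency stays at or below the gated ceiling."""
    return measured_ns <= baseline_ns * LATENCY_FACTOR


def gate_passes(measured_ops_per_s: float, baseline_ops_per_s: float,
                measured_p99_ns: float, baseline_p99_ns: float) -> bool:
    """Both conditions must hold; a throughput gain does not offset a latency regression."""
    return (throughput_ok(measured_ops_per_s, baseline_ops_per_s)
            and latency_ok(measured_p99_ns, baseline_p99_ns))
```

With a hypothetical baseline of 1,000,000 ops/s and a p99 of 1,000 ns, a run at 1,200,000 ops/s with a p99 of 1,060 ns fails the gate despite the higher throughput, because the latency ceiling is 1,050 ns.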