Incident Bundles and Replay

ByteOr's incident capture and replay system provides a structured path from failure detection through investigation to governed re-execution.

Incident Bundles

An incident bundle packages everything needed for triage into a single durable unit:

Spec — the pipeline spec that was running
Validation result — the compile and validation output
Policy audit — the governance decisions that were in effect
Environment data — runtime environment details
Snapshots — optional state captures from the runtime

Bundles are created with incident-bundle on the runtime CLI or through the Cloud artifact system.

Replay Modes

Dry Run

Dry run replays captured inputs without executing against live infrastructure. Use dry run for:

Investigating what happened
Validating the reconstructed input
Confirming the pipeline version matches the incident
Testing remediation strategies before committing

Dry run does not require execute-mode approval in most environment postures.

Execute Mode

Execute mode re-runs captured inputs with full side effects. Use execute mode only when:

The environment posture permits governed execution
A matching pipeline version can be resolved from the incident spec hash
An eligible approval is attached (when required)

Execute Mode Validation

Before execution starts, the backend validates:

Source agent environment posture
Pipeline version resolution from the incident artifact spec hash
Approval coverage when the environment requires it

If validation fails, the replay request is rejected before any governed execution begins.

Replay Audit

Every replay produces a structured audit record:

Audit version — schema version
Bundle directory — source artifact location
Spec hash — the pipeline version
Input — journal lane, path, scanned records, selected bytes, sample hashes
Policy — environment, approval status, mode (dry_run or execute)
Actions — each action taken with role, lane, stage, target, and decision

Operational Playbook

Identify the incident, agent, deployment, and environment
Open the stored artifact record
Confirm artifact metadata and capture time
Launch a dry-run replay
Review the replay audit
Escalate to execute mode only if dry run is insufficient
Obtain approval for execute-mode replay
Review resulting audit records

Escalation Points

Escalate when:

The source environment cannot be resolved
The artifact spec hash does not map to a known pipeline version
Approval coverage is missing or rejected
The agent shows repeated 401, 403, or 429 responses during investigation

Provenance

Need the source docs?

Use the public hub to orient yourself, then jump to repo-owned docs or rustdoc when you need contract-level detail.