Incident Bundles and Replay
ByteOr's incident capture and replay system provides a structured path from failure detection through investigation to governed re-execution.
Incident Bundles
An incident bundle packages everything needed for triage into a single durable unit:
- Spec — the pipeline spec that was running
- Validation result — the compile and validation output
- Policy audit — the governance decisions that were in effect
- Environment data — runtime environment details
- Snapshots — optional state captures from the runtime
Bundles are created with incident-bundle on the runtime CLI or through the Cloud artifact system.
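As a rough illustration of the bundle contents listed above, the following sketch models a bundle as a Python dataclass. The class name, field names, and the `missing_parts` helper are assumptions made for this example; they are not ByteOr's on-disk format or API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class IncidentBundle:
    """Hypothetical in-memory view of an incident bundle (names are assumptions)."""
    spec: dict[str, Any]                # the pipeline spec that was running
    validation_result: dict[str, Any]   # compile and validation output
    policy_audit: dict[str, Any]        # governance decisions in effect
    environment: dict[str, Any]         # runtime environment details
    snapshots: tuple[dict[str, Any], ...] = ()  # optional state captures


def missing_parts(bundle: IncidentBundle) -> list[str]:
    """Return the names of required parts that are empty; snapshots are optional."""
    required = ("spec", "validation_result", "policy_audit", "environment")
    return [name for name in required if not getattr(bundle, name)]
```

A check like `missing_parts` could run before triage starts, so an incomplete bundle is noticed before anyone relies on it.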
Replay Modes
Dry Run
Dry run replays captured inputs without executing against live infrastructure. Use dry run for:
- Investigating what happened
- Validating the reconstructed input
- Confirming the pipeline version matches the incident
- Testing remediation strategies before committing
Dry run does not require execute-mode approval in most environment postures.
Execute Mode
Execute mode re-runs captured inputs with full side effects. Use execute mode only when:
- The environment posture permits governed execution
- A matching pipeline version can be resolved from the incident spec hash
- An eligible approval is attached (when required)
Execute Mode Validation
Before execution starts, the backend validates:
- Source agent environment posture
- Pipeline version resolution from the incident artifact spec hash
- Approval coverage when the environment requires it
If validation fails, the replay request is rejected before any governed execution begins.
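The three checks above can be sketched as a single pre-flight function that collects rejection reasons. This is a minimal sketch under stated assumptions: `ReplayRequest`, its fields, and the `known_versions` mapping are hypothetical names, and the real backend may model posture and approvals differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplayRequest:
    # All field names are illustrative assumptions.
    posture_allows_execute: bool       # does the source environment permit governed execution?
    spec_hash: str                     # spec hash taken from the incident artifact
    approval_required: bool            # does the environment require approval?
    approval_id: Optional[str] = None  # attached approval, if any


def validate_execute_replay(request: ReplayRequest, known_versions: dict[str, str]) -> list[str]:
    """Return rejection reasons; an empty list means the request may proceed."""
    reasons = []
    if not request.posture_allows_execute:
        reasons.append("environment posture does not permit governed execution")
    if request.spec_hash not in known_versions:
        reasons.append("spec hash does not resolve to a known pipeline version")
    if request.approval_required and request.approval_id is None:
        reasons.append("approval is required but none is attached")
    return reasons
```

Collecting every reason instead of stopping at the first failure gives the requester the full list of problems in one rejected request.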
Replay Audit
Every replay produces a structured audit record:
- Audit version — schema version
- Bundle directory — source artifact location
- Spec hash — the pipeline version
- Input — journal lane, path, scanned records, selected bytes, sample hashes
- Policy — environment, approval status, mode (dry_run or execute)
- Actions — each action taken with role, lane, stage, target, and decision
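To make the shape of an audit record concrete, here is a sketch that mirrors the fields listed above and serializes them to JSON. The class names, key names, and JSON layout are assumptions for illustration; the actual audit schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReplayAction:
    role: str
    lane: str
    stage: str
    target: str
    decision: str


@dataclass
class ReplayAudit:
    audit_version: int   # schema version
    bundle_dir: str      # source artifact location
    spec_hash: str       # identifies the pipeline version
    input: dict          # journal lane, path, scanned records, selected bytes, sample hashes
    policy: dict         # environment, approval status, mode ("dry_run" or "execute")
    actions: list[ReplayAction] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict converts nested dataclasses (the actions) into plain dicts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A reviewer could then load the JSON and filter `actions` by `decision` to see which steps were allowed or denied during the replay.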
Operational Playbook
- Identify the incident, agent, deployment, and environment
- Open the stored artifact record
- Confirm artifact metadata and capture time
- Launch a dry-run replay
- Review the replay audit
- Escalate to execute mode only if dry run is insufficient
- Obtain approval for execute-mode replay when the environment requires it
- Review resulting audit records
Escalation Points
Escalate when:
- The source environment cannot be resolved
- The artifact spec hash does not map to a known pipeline version
- Approval coverage is missing or rejected
- The agent shows repeated 401, 403, or 429 responses during investigation
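The last escalation condition can be checked mechanically. The sketch below counts 401, 403, and 429 responses in a list of observed status codes and reports any that reach a threshold. The function name and the default threshold of 3 are assumptions; this document does not define how many responses count as "repeated".

```python
from collections import Counter

# Status codes named in the escalation list above.
WATCHED_STATUSES = (401, 403, 429)


def statuses_needing_escalation(status_codes, threshold=3):
    """Return {status: count} for watched statuses seen at least `threshold` times."""
    counts = Counter(code for code in status_codes if code in WATCHED_STATUSES)
    return {code: n for code, n in counts.items() if n >= threshold}
```

An empty result means none of the watched statuses reached the threshold in the sample provided.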