Incident and Replay Operator Guide

This guide covers the operator path from incident artifact collection to replay execution.

Investigation flow

  1. Identify the incident, agent, deployment, and environment involved.
  2. Open the stored artifact record for the incident bundle or snapshot.
  3. Confirm the artifact metadata and capture time.
  4. Launch a dry-run replay first to validate the reconstructed input.
  5. Escalate to execute-mode replay only when the environment posture and approval coverage allow it.

Available replay modes

Dry run

Use dry run when you want to inspect behavior without allowing the replayed action to execute against live infrastructure.

Execute mode

Use execute mode only when:

  • the environment posture permits governed execution
  • the replay can resolve a matching pipeline version from the incident spec hash
  • an eligible approval is attached when required

What execute replay validates

Before execution starts, the backend validates:

  • source agent environment posture
  • pipeline version resolution from the incident artifact spec hash
  • approval coverage when execute mode requires it

If any of these checks fails, the replay request is rejected before any governed execution begins.
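
The three checks above can be sketched as a single preflight function. This is a minimal illustration under stated assumptions, not the backend's implementation: the ReplayRequest type, its field names, and the known_versions mapping are all hypothetical stand-ins for whatever the real service uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplayRequest:
    # All field names here are hypothetical, chosen for illustration only.
    posture_allows_execution: bool   # source agent environment posture
    spec_hash: str                   # spec hash recorded in the incident artifact
    approval_required: bool          # whether execute mode needs an approval here
    approval_id: Optional[str] = None


def preflight_failures(req: ReplayRequest, known_versions: dict[str, str]) -> list[str]:
    """Return the reasons an execute replay would be rejected (empty if none)."""
    failures = []
    if not req.posture_allows_execution:
        failures.append("environment posture does not permit governed execution")
    if req.spec_hash not in known_versions:
        failures.append("spec hash does not resolve to a pipeline version")
    if req.approval_required and not req.approval_id:
        failures.append("required approval is missing")
    return failures


# Example: the spec hash is unknown and no approval is attached.
versions = {"abc123": "pipeline-v7"}
request = ReplayRequest(True, "ffff00", approval_required=True)
print(preflight_failures(request, versions))
```

Collecting every failure, rather than stopping at the first, gives the operator the full list of blockers in one pass. Whether the real backend reports all failures or only the first is not stated in this guide.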

Artifact handling notes

Artifact uploads use multipart submission and enforce:

  • type validation
  • size validation
  • content-hash deduplication
  • scoped agent API key authorization
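
A client can mirror the first three checks locally before submitting, which avoids wasted uploads. The sketch below is illustrative only: the type names, the 50 MiB limit, and the choice of SHA-256 as the content hash are assumptions, not values taken from the service. Authorization is left out because it is enforced server-side.

```python
import hashlib

# Hypothetical limits; the real service defines its own.
ALLOWED_TYPES = {"incident_bundle", "snapshot"}
MAX_BYTES = 50 * 1024 * 1024


def precheck_artifact(artifact_type: str, payload: bytes,
                      seen_hashes: set[str]) -> tuple[bool, str]:
    """Return (ok, detail). On success, detail is the content hash."""
    if artifact_type not in ALLOWED_TYPES:
        return False, f"unsupported artifact type: {artifact_type}"
    if len(payload) > MAX_BYTES:
        return False, f"artifact too large: {len(payload)} bytes"
    digest = hashlib.sha256(payload).hexdigest()  # assumed hash algorithm
    if digest in seen_hashes:
        return False, f"duplicate content: {digest}"
    seen_hashes.add(digest)
    return True, digest


seen: set[str] = set()
print(precheck_artifact("snapshot", b"example payload", seen))  # accepted
print(precheck_artifact("snapshot", b"example payload", seen))  # flagged as duplicate
```

Here a duplicate is reported as a failed precheck. The server may instead return the existing artifact record for a matching hash; this guide does not say which.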

Artifact upload routes are also rate limited and can return HTTP 429 (Too Many Requests) when an agent exceeds the allowed request volume.
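
One common way for a client to handle a 429 is capped exponential backoff. The sketch below assumes a hypothetical send callable that performs one upload attempt and returns the HTTP status plus an optional retry delay in seconds. Whether the service actually supplies a Retry-After value is not stated in this guide, so the code falls back to its own schedule when none is given.

```python
import random
import time
from typing import Callable, Optional, Tuple

# send() is a hypothetical callable that performs one upload attempt and
# returns (http_status, retry_after_seconds_or_None).
SendFn = Callable[[], Tuple[int, Optional[float]]]


def upload_with_backoff(send: SendFn, max_attempts: int = 5, base_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry on 429 with capped exponential backoff; return the last status."""
    status = 0
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        # Prefer a server-provided delay if one exists, else back off exponentially.
        if retry_after is not None:
            delay = retry_after
        else:
            delay = min(base_delay * 2 ** attempt, 60.0)
        sleep(delay + random.uniform(0, 0.1 * delay))  # small jitter
    return status


# Simulated server: two 429s, then a non-429 status. Sleep is stubbed for the demo.
responses = iter([(429, None), (429, 2.0), (201, None)])
print(upload_with_backoff(lambda: next(responses), sleep=lambda s: None))
```

If the 429 responses persist after the retries are exhausted, treat that as an escalation signal rather than retrying indefinitely.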

Response playbook

When investigating a production incident:

  1. Confirm the latest deployment status and approval context.
  2. Inspect recent heartbeats and snapshots from the source agent.
  3. Verify the incident artifact metadata and source environment.
  4. Run a dry-run replay and capture the result.
  5. Request execute-mode approval only if the dry run is insufficient.
  6. Review the resulting audit records after the replay request completes or fails.

Escalation points

Escalate when:

  • the source environment cannot be resolved
  • the artifact spec hash does not map to a known pipeline version
  • approval coverage is missing or rejected
  • the agent shows repeated HTTP 401, 403, or 429 responses during investigation
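
The escalation conditions above can be expressed as a small helper that returns the reasons that apply. This is a sketch under assumptions: the parameter names are hypothetical, None stands for "could not be resolved" or "missing", and the threshold of three is an arbitrary reading of "repeated", which this guide does not quantify.

```python
from collections import Counter
from typing import Iterable, Optional

# Status codes the guide names as escalation signals when repeated.
WATCHED_STATUSES = {401, 403, 429}
REPEAT_THRESHOLD = 3  # hypothetical; "repeated" is not quantified in this guide


def escalation_reasons(environment: Optional[str],
                       pipeline_version: Optional[str],
                       approval_state: Optional[str],
                       agent_statuses: Iterable[int]) -> list[str]:
    """Return the escalation conditions that currently apply (empty if none)."""
    reasons = []
    if environment is None:
        reasons.append("source environment cannot be resolved")
    if pipeline_version is None:
        reasons.append("spec hash does not map to a known pipeline version")
    # None means a required approval is missing; "rejected" is an assumed label.
    if approval_state in (None, "rejected"):
        reasons.append("approval coverage is missing or rejected")
    counts = Counter(s for s in agent_statuses if s in WATCHED_STATUSES)
    for status, n in sorted(counts.items()):
        if n >= REPEAT_THRESHOLD:
            reasons.append(f"agent returned {status} {n} times")
    return reasons


print(escalation_reasons(None, "pipeline-v7", "rejected", [429, 429, 429, 200]))
```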