Operational Runbook

This page provides the Enterprise-focused view of operational triage, including first-response checklists, router counters, and escalation paths for Enterprise deployments.

Enterprise First-Pass Checklist

  1. Verify the alert is firing in the expected region and environment
  2. Identify which agent and deployment are involved
  3. Confirm the deployment's effective tuning matches expectations
  4. Check agent heartbeat status and last-known state
  5. Determine whether the issue is runtime (execution), governance (approval/policy), or infrastructure (host/transport)

Enterprise-Specific Escalation

SymptomCheckAction
Deployment stuck in pendingApproval coverageVerify approval credential matches spec hash and environment
Agent not reportingHeartbeat and connectivityCheck agent enrollment, API key, and network path to control plane
Effective tuning degradeddoctor outputRun doctor on the target host, compare requested vs applied
Incident bundle export failingArtifact uploadCheck artifact storage reachability and agent permissions
Replay rejectedApproval and postureConfirm environment posture, spec hash resolution, and approval scope

Canonical Published Surfaces

Provenance
Need the canonical source?
Use the public hub to orient yourself, then jump to repo-owned docs or rustdoc when you need contract-level detail.