Operator Flow

This is the public runbook that connects host readiness, live state capture, incident export, replay, and artifact interpretation into one repeatable operator sequence.

1. Validate the target before execution

Start with the cheapest checks first:

  1. validate --spec=... to catch structural spec errors.
  2. describe --spec=... or dot --spec=... when you need a human review of the intended topology.
  3. doctor on the destination host before enabling aggressive tuning profiles.

Treat doctor output as a contract check, not a benchmark. It exists to separate unsupported host posture from runtime logic failures.

2. Capture live state when behavior diverges

Use snapshot as the direct state-capture verb.

  • SingleRing surfaces should export the ring state and gating view.
  • EdgePlane surfaces additionally expose adapter-level transport health in the snapshot payload.
  • ActionGraph and DataGuard still share the same snapshot and heartbeat machinery when running with agent reporting bootstrap.

Capture the snapshot before restarting a service if the state itself might explain the failure.

3. Export an incident bundle early

incident-bundle is the canonical triage handoff. Export it as soon as you know the event is no longer a transient operator typo.

The bundle is what turns a local runtime problem into a reproducible review unit.

4. Read artifacts in the right order

Start with the artifacts that explain runtime intent before the artifacts that explain downstream consequences:

  1. spec_validation.json
  2. policy_audit.json
  3. environment.json
  4. single_ring_snapshot.json or lane_graph_state.json
  5. product-specific artifacts such as DataGuard reports or replay audits

Typical interpretations:

  • spec-validation failure means you are not debugging runtime behavior yet,
  • policy-audit mismatch means the environment or approval posture is wrong,
  • environment mismatch means the host posture may differ from the assumed tuning contract,
  • snapshot anomalies point to lag, gating, adapter failures, or stalled execution state.

5. Replay from the bundle, not from memory

Replay should consume the captured bundle rather than an operator's reconstructed version of the incident.

Recommended posture:

  1. begin with dry-run replay,
  2. inspect the replay audit,
  3. promote to execute only when approval and environment posture are both correct.

That keeps replay audit trails meaningful and avoids turning replay into an uncontrolled second incident.

6. Product-specific interpretation shortcuts

Product surfaceFirst artifact focus
byteor-edgeplaneadapter snapshot entries: transport, connection_state, message_count, lag, recent_failures
byteor-actiongraphpolicy audit plus approval verification outcome for http_post:* and exec:* stages
byteor-dataguardDataGuard kv/v1 report fields under schema.*, policy.*, and counts.*
Provenance
Need the canonical source?
Use the public hub to orient yourself, then jump to repo-owned docs or rustdoc when you need contract-level detail.