Sandbox Container Lifecycle

This document describes the runtime lifecycle contract owned by agents-sandbox: primary sandbox container, dedicated network, service containers, runtime event stream, exec output redirection, and cleanup/reconciliation.

Runtime Resources

| Resource | Notes |
| --- | --- |
| Primary container | Main execution target for CreateExec |
| Dedicated network | One per sandbox; shared bridge and host network are not supported |
| Service containers | Required and optional services declared via ServiceSpec, on the same network |
| Persistent event history | Stored in bbolt; lifecycle and exec events survive daemon restart until retention cleanup |
| Exec output artifacts | Files under the configured artifact root |
| Exec log bind-mount | {ArtifactOutputRoot}/{sandbox_id}/ mounted at /var/log/agents-sandbox/ (rw); each exec writes {exec_id}.stdout.log and {exec_id}.stderr.log |

Docker object labels use the reverse-DNS namespace io.github.1996fanrui.agents-sandbox.*. User-defined sandbox labels are propagated with the prefix io.github.1996fanrui.agents-sandbox.user.<key>. Historical sandbox_id and exec_id values are reserved in a persistent registry before accepting create operations, preventing accidental ID reuse after daemon restart.
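
The label namespacing described above can be sketched as follows. This is a minimal illustration, not the daemon's implementation: the helper name `build_labels` and the `sandbox-id` key used in the usage note are hypothetical, and only the two prefixes come from the text.

```python
NAMESPACE = "io.github.1996fanrui.agents-sandbox"
USER_PREFIX = NAMESPACE + ".user."


def build_labels(daemon_labels: dict[str, str], user_labels: dict[str, str]) -> dict[str, str]:
    """Place daemon-owned keys under the namespace and user keys under `.user.`."""
    labels = {f"{NAMESPACE}.{key}": value for key, value in daemon_labels.items()}
    for key, value in user_labels.items():
        # User-defined labels never share a key with daemon-owned ones
        # because they always carry the extra `.user.` segment.
        labels[USER_PREFIX + key] = value
    return labels
```

For example, `build_labels({"sandbox-id": "sb-1"}, {"team": "infra"})` would yield one key ending in `.sandbox-id` and one ending in `.user.team`.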

Lifecycle States

```mermaid
stateDiagram-v2
    [*] --> NotCreated
    NotCreated --> Pending : CreateSandbox accepted
    Pending --> Ready : materialization succeeds
    Pending --> Failed : materialization fails
    Ready --> Ready : CreateExec
    Ready --> Stopped : StopSandbox or idle_ttl
    Stopped --> Ready : ResumeSandbox succeeds
    Ready --> Deleting : DeleteSandbox accepted
    Stopped --> Deleting : DeleteSandbox accepted
    Failed --> Deleting : DeleteSandbox accepted
    Pending --> Deleting : DeleteSandbox accepted
    Deleting --> Deleted : resource removal succeeds
    Deleted --> [*]
```

The externally visible states are PENDING, READY, FAILED, STOPPED, DELETING, and DELETED.
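
As a rough client-side aid, the transitions in the diagram can be encoded as a lookup table. The `State` enum and `can_transition` helper are hypothetical names, and the table mirrors only the edges drawn in the diagram; it does not model failure edges for stop or delete.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    READY = "READY"
    FAILED = "FAILED"
    STOPPED = "STOPPED"
    DELETING = "DELETING"
    DELETED = "DELETED"


# Edges copied from the state diagram; READY -> READY models CreateExec.
TRANSITIONS = {
    State.PENDING: {State.READY, State.FAILED, State.DELETING},
    State.READY: {State.READY, State.STOPPED, State.DELETING},
    State.STOPPED: {State.READY, State.DELETING},
    State.FAILED: {State.DELETING},
    State.DELETING: {State.DELETED},
    State.DELETED: set(),
}


def can_transition(current: State, target: State) -> bool:
    """Return True if the diagram contains an edge from current to target."""
    return target in TRANSITIONS[current]
```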

Lifecycle Event Contract

All lifecycle convergence must be observable through SubscribeSandboxEvents. The daemon guarantees monotonically increasing sequence numbers per sandbox, and daemon-issued event sequences are the authoritative source of truth for ordering.

| Transition | Event(s) |
| --- | --- |
| Create accepted | SANDBOX_ACCEPTED |
| Materialization in progress | SANDBOX_PREPARING |
| Required service ready | SANDBOX_SERVICE_READY |
| Optional service fails | SANDBOX_SERVICE_FAILED |
| Create or resume succeeds | SANDBOX_READY |
| Create/resume/stop/delete fails | SANDBOX_FAILED |
| Stop begins | SANDBOX_STOP_REQUESTED |
| Stop completes | SANDBOX_STOPPED |
| Delete begins | SANDBOX_DELETE_REQUESTED |
| Delete completes | SANDBOX_DELETED |

Idle-stop emits SANDBOX_STOP_REQUESTED(reason=idle_ttl) then SANDBOX_STOPPED.

Exec lifecycle events: EXEC_STARTED is emitted on a successful CreateExec, and the terminal events are EXEC_FINISHED, EXEC_FAILED, and EXEC_CANCELLED. GetExec().exec.last_event_sequence lets clients join the exec snapshot to the sandbox event stream without a race. Internal audit action reasons remain daemon-owned and must not appear in the public RPC or event schema.
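
One way a client might use last_event_sequence to join a snapshot to the stream is sketched below. The `Event` shape and the `events_after_snapshot` helper are assumptions for illustration; the real event schema is defined by the RPC contract, not by this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    sequence: int        # daemon-issued, monotonic per sandbox
    event_type: str
    exec_id: str = ""


def events_after_snapshot(events: Iterable[Event], last_event_sequence: int) -> Iterator[Event]:
    """Yield only events newer than the snapshot, checking strict ordering."""
    previous = last_event_sequence
    for event in events:
        if event.sequence <= last_event_sequence:
            continue  # already reflected in the GetExec snapshot
        if event.sequence <= previous:
            raise ValueError(f"non-monotonic sequence {event.sequence}")
        previous = event.sequence
        yield event
```

Because the snapshot carries the sequence of the last event it reflects, dropping everything at or below that sequence avoids both double-applying and missing an event.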

Exec Output Redirection

Exec stdout and stderr are redirected inside the container to bind-mounted host files at {ArtifactOutputRoot}/{sandbox_id}/{exec_id}.stdout.log and {ArtifactOutputRoot}/{sandbox_id}/{exec_id}.stderr.log. The daemon holds no per-exec I/O buffer; CreateExecResponse returns the host-side log paths so callers can read output independently. This design keeps the daemon out of the I/O hot path and keeps exec output durable across daemon restarts.
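
The path layout above can be illustrated with a small helper. The function name `exec_log_paths` and the example root directory are hypothetical; callers should normally use the paths returned in CreateExecResponse rather than deriving them.

```python
from pathlib import PurePosixPath


def exec_log_paths(artifact_output_root: str, sandbox_id: str, exec_id: str):
    """Return (stdout_path, stderr_path) following the documented layout."""
    base = PurePosixPath(artifact_output_root) / sandbox_id
    return base / f"{exec_id}.stdout.log", base / f"{exec_id}.stderr.log"


# Example with made-up identifiers and a made-up artifact root.
stdout_path, stderr_path = exec_log_paths("/srv/artifacts", "sb-1", "ex-9")
```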

Event Replay and Retention

For one sandbox_id:

  • from_sequence=0 replays the full ordered event history since creation
  • non-zero anchors must be daemon-issued event sequences from the same sandbox stream
  • stale anchors beyond the retained stream fail with OUT_OF_RANGE and reason SANDBOX_EVENT_SEQUENCE_EXPIRED
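
The replay rules above can be modelled in a few lines. This sketch makes two assumptions the text does not confirm: a non-zero anchor resumes strictly after that sequence, and an anchor that was never issued raises a plain ValueError. The names `SequenceExpired` and `replay` are hypothetical.

```python
class SequenceExpired(Exception):
    """Models the documented OUT_OF_RANGE failure for stale anchors."""
    code = "OUT_OF_RANGE"
    reason = "SANDBOX_EVENT_SEQUENCE_EXPIRED"


def replay(retained: list[int], from_sequence: int) -> list[int]:
    """Return the sequences a subscriber would receive for one sandbox_id.

    `retained` is the ordered list of sequences still held for the sandbox.
    """
    if from_sequence == 0:
        return list(retained)  # full retained history
    if not retained or from_sequence < retained[0]:
        raise SequenceExpired(f"anchor {from_sequence} is older than the retained stream")
    if from_sequence not in retained:
        # Assumption: anchors must be sequences the daemon actually issued.
        raise ValueError(f"anchor {from_sequence} was not issued for this stream")
    return [seq for seq in retained if seq > from_sequence]
```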

On restart, the daemon loads all persisted state, reconciles with Docker container inspect results, and rebuilds operational sandbox records (see Daemon State Management for the full recovery contract). Deleted sandbox event streams remain queryable until runtime.event_retention_ttl expires, after which cleanup removes the retained history.

Create Path

```mermaid
flowchart TB
    A[CreateSandbox accepted] --> B[Allocate sandbox_id and persist SANDBOX_ACCEPTED]
    B --> C[Validate create spec]
    C --> D[Create dedicated network]
    D --> E[Materialize filesystem inputs and built-in resources]
    E --> E1[Create exec log directory and bind-mount into primary container]
    E1 --> F[Create required and optional service containers]
    F --> G[Start each required service and wait until it is healthy]
    G --> H[Start primary and optional services]
    H --> I[Run post_start_on_primary for required services]
    I --> J[Persist runtime handles]
    J --> K[Emit SANDBOX_SERVICE_READY / SANDBOX_SERVICE_FAILED]
    K --> L[Emit SANDBOX_READY]
```

Create-path rules:

  • CreateSandbox returns immediately after acceptance; the caller must not infer readiness from the response alone.
  • The daemon must fail fast on invalid mounts, copies, unknown builtin_tools, invalid service declarations, or unsafe artifact targets. Duplicate sandbox_id returns a specific error code.
  • In V1, optional services report only their initial create/start result; they are not restarted or runtime-monitored after readiness.
  • If materialization fails after resources exist, cleanup continues on a daemon-owned background context with bounded timeout.
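
Because CreateSandbox returns before materialization finishes, a caller has to derive readiness from the event stream. The sketch below shows one possible client-side reduction over event type names; `resolve_create_outcome` is a hypothetical helper, and it assumes SANDBOX_SERVICE_FAILED for an optional service does not by itself prevent SANDBOX_READY.

```python
from typing import Iterable


def resolve_create_outcome(event_types: Iterable[str]) -> tuple[str, int]:
    """Return (outcome, optional_service_failures) from a create-path event stream.

    The outcome is "READY", "FAILED", or "PENDING" if no terminal event was seen.
    """
    optional_failures = 0
    for event_type in event_types:
        if event_type == "SANDBOX_SERVICE_FAILED":
            optional_failures += 1
        elif event_type == "SANDBOX_READY":
            return "READY", optional_failures
        elif event_type == "SANDBOX_FAILED":
            return "FAILED", optional_failures
    return "PENDING", optional_failures
```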

Resume Path

ResumeSandbox only resumes an already created sandbox; it does not accept the original create spec again.

Resume-path rules:

  • Missing runtime parts are treated as runtime corruption and must fail fast.
  • The daemon must not silently recreate a partially missing sandbox.
  • Resume keeps runtime identity stable.
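
The fail-fast rule can be pictured as a pre-resume check that compares recorded runtime parts with what actually exists. `RuntimeCorruption` and `check_resumable` are hypothetical names, and the part identifiers in the comments are placeholders.

```python
class RuntimeCorruption(Exception):
    """Raised when a recorded runtime part is missing at resume time."""


def check_resumable(recorded_parts: set[str], present_parts: set[str]) -> None:
    """Fail fast instead of recreating anything that has gone missing."""
    missing = sorted(recorded_parts - present_parts)
    if missing:
        # e.g. a recorded network or service container no longer exists
        raise RuntimeCorruption("missing runtime parts: " + ", ".join(missing))
```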

Stop and Delete

```mermaid
flowchart LR
    A[StopSandbox or idle_ttl] --> B[SANDBOX_STOP_REQUESTED] --> C[Stop containers] --> D[SANDBOX_STOPPED]
```

```mermaid
flowchart LR
    A[DeleteSandbox] --> B[SANDBOX_DELETE_REQUESTED] --> C[Remove containers] --> D[Remove network] --> E[Remove state/artifacts] --> F[SANDBOX_DELETED]
```

  • Delete is asynchronous and immediately acknowledged. Stop and delete continue on daemon-owned background contexts.
  • Cleanup uses structured Docker Engine API calls with idempotent not-found handling.
  • After SANDBOX_DELETED, the daemon retains the event stream for runtime.event_retention_ttl before removing retained history.
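
Idempotent not-found handling means a repeated cleanup pass treats an already removed resource as success. A minimal sketch, assuming a hypothetical `NotFound` error and a caller-supplied `remove` callable standing in for the real Docker Engine API call:

```python
from typing import Callable


class NotFound(Exception):
    """Stand-in for the API's resource-not-found error."""


def remove_idempotent(remove: Callable[[str], None], resource_id: str) -> bool:
    """Remove a resource; return False if it was already gone.

    Any error other than NotFound propagates to the caller.
    """
    try:
        remove(resource_id)
    except NotFound:
        return False
    return True
```

Returning a boolean rather than raising lets a retried delete continue with the next step (network, then state and artifacts) without special-casing earlier partial progress.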

Reconciliation

The daemon owns runtime reconciliation for resources under its namespace: idle sandboxes eligible for stop, resources left after failed materialization, orphaned service containers, and dedicated networks without live runtime membership. Reconciliation uses structured audit logs and explicit action reasons, deriving decisions from structured Docker metadata and recorded runtime state.
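
Two of the reconciliation decisions listed above can be sketched as pure functions over recorded state and Docker metadata. The record shape, field names, and helper names here are illustrative assumptions, not the daemon's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxRecord:
    sandbox_id: str
    state: str
    idle_seconds: float
    idle_ttl_seconds: float


def idle_stop_candidates(records: list[SandboxRecord]) -> list[str]:
    """READY sandboxes whose idle time has reached their idle_ttl."""
    return [
        record.sandbox_id
        for record in records
        if record.state == "READY" and record.idle_seconds >= record.idle_ttl_seconds
    ]


def orphaned_networks(live_members_by_network: dict[str, int]) -> list[str]:
    """Dedicated networks with no live runtime membership."""
    return sorted(name for name, members in live_members_by_network.items() if members == 0)
```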