Skip to content

Zart is in active development — breaking API changes may occur despite our best efforts to keep contracts stable.

Durable Execution

Durable execution lets you write long-running workflows as ordinary async Rust code. Each step is checkpointed to the database. If your process crashes, times out, or gets redeployed, Zart resumes from the last successful step — no work is lost, no step is repeated.

Think of it like GitHub Actions: your workflow has multiple steps, and if the infrastructure fails mid-run, it resumes where it left off. No one configures durability as a non-functional requirement — it’s just how the platform works. Zart brings that same default-reliable experience to your Rust backend.

Step 1: send_email ──✓── persisted
Step 2: charge_card ──💥── crash here
Step 3: send_receipt
─────────────────────────────
Restart → Step 1 is SKIPPED (cached)
→ Step 2 is SKIPPED (cached)
→ Step 3 runs → done

Every durable execution is identified by a unique execution ID that doubles as an idempotency key. Calling start_for with the same ID twice is safe — the second call returns ExecutionAlreadyExists so you can wait on the existing execution instead of creating a duplicate.

use zart::prelude::*;
use zart::scheduler::DurableScheduler;
let durable = DurableScheduler::new(scheduler.clone());
// Schedule a new execution (idempotent)
durable
.start_for::<OnboardingTask>("signup-user-42", "onboarding", &OnboardingData {
email: "alice@example.com".into(),
})
.await?;

You trigger executions from anywhere — a web handler, a CLI command, a background job. The scheduler and the worker share the same PostgreSQL backend, so they can run in the same process or on separate machines.

wait_completion<T> — block and deserialize

Section titled “wait_completion<T> — block and deserialize”

The simplest way to wait for a result. Blocks until the execution finishes, then deserializes the output to T.

let output: OnboardingOutput = durable
.wait_completion("signup-user-42", Duration::from_secs(60), None)
.await?;
println!("Done: {:?}", output);

If the execution doesn’t finish within the timeout, wait_completion returns WaitTimedOut. Deserialization errors are surfaced as SchedulerError::Deserialization.

start_and_wait_for<H> — start and wait in one call

Section titled “start_and_wait_for<H> — start and wait in one call”

For the common case where you start an execution and immediately wait, use start_and_wait_for. It infers input and output types from the handler type:

let output = durable
.start_and_wait_for::<OnboardingTask>(
"signup-user-42",
"onboarding",
&OnboardingData { email: "alice@example.com".into() },
Duration::from_secs(60),
)
.await?;
// output: OnboardingOutput — inferred from OnboardingTask::Output

wait_for<H> — wait with handler-inferred types

Section titled “wait_for<H> — wait with handler-inferred types”

When you started an execution earlier and now want the typed result without manual deserialization:

let output = durable
.wait_for::<OnboardingTask>("signup-user-42", Duration::from_secs(60))
.await?;
// output: OnboardingOutput — inferred from OnboardingTask::Output
let record = durable.status("signup-user-42").await?;
println!("Status: {:?}", record.status);
// ExecutionStatus::Pending | Running | Completed | Failed | Cancelled
let cancelled = durable.cancel("signup-user-42").await?;

A cancelled execution stops at its next scheduling point. Steps currently running will finish, but no new steps will be scheduled.

stateDiagram-v2
    [*] --> Pending: start_for
    Pending --> Running: worker picks up
    Running --> Waiting: sleep / wait_for_event
    Waiting --> Running: timer fires / event arrives
    Running --> Completed: all steps succeed
    Running --> Failed: step fails, no retry
    Running --> Failed: retries exhausted
    Running --> Cancelled: cancel called
    Waiting --> Failed: event timeout
  1. Schedulestart_for inserts a row in zart_executions with status pending.
  2. Claim — Workers poll PostgreSQL with SELECT … FOR UPDATE SKIP LOCKED. Exactly one worker owns one execution.
  3. Run — The handler body executes. Each step call checks the database:
    • Hit — stored result deserialized and returned; the step logic is never called again.
    • Miss — the step runs, the result is persisted, then returned.
  4. Complete — the handler returns Ok; status becomes completed and the output is stored.
  5. Fail — the handler returns Err; after retries are exhausted, status becomes failed.

The execution ID is your idempotency key. If the same user signs up twice due to a double-click, the second start_for call returns ExecutionAlreadyExists — you detect this and simply wait on the existing execution.

match durable.start_for::<OnboardingTask>(&exec_id, "onboarding", &input).await {
Ok(_) => println!("New execution started"),
Err(SchedulerError::ExecutionAlreadyExists(_, _)) => {
println!("Already running, waiting on existing…");
}
Err(e) => return Err(e.into()),
}
// In all cases, wait on the execution and get typed output
let output: OnboardingOutput = durable
.wait_completion(&exec_id, Duration::from_secs(60), None)
.await?;