Durable Execution
What Is Durable Execution?
Section titled “What Is Durable Execution?”Durable execution lets you write long-running workflows as ordinary async Rust code. Each step is checkpointed to the database. If your process crashes, times out, or gets redeployed, Zart resumes from the last successful step — no work is lost, no step is repeated.
Think of it like GitHub Actions: your workflow has multiple steps, and if the infrastructure fails mid-run, it resumes where it left off. No one configures durability as a non-functional requirement — it’s just how the platform works. Zart brings that same default-reliable experience to your Rust backend.
Step 1: send_email ──✓── persistedStep 2: charge_card ──💥── crash hereStep 3: send_receipt─────────────────────────────Restart → Step 1 is SKIPPED (cached) → Step 2 is SKIPPED (cached) → Step 3 runs → doneHow to Trigger a Workflow
Section titled “How to Trigger a Workflow”Every durable execution is identified by a unique execution ID that doubles as an idempotency key. Calling start_for with the same ID twice is safe — the second call returns ExecutionAlreadyExists so you can wait on the existing execution instead of creating a duplicate.
use zart::prelude::*;use zart::scheduler::DurableScheduler;
let durable = DurableScheduler::new(scheduler.clone());
// Schedule a new execution (idempotent)durable .start_for::<OnboardingTask>("signup-user-42", "onboarding", &OnboardingData { email: "alice@example.com".into(), }) .await?;You trigger executions from anywhere — a web handler, a CLI command, a background job. The scheduler and the worker share the same PostgreSQL backend, so they can run in the same process or on separate machines.
How to Wait for Results
Section titled “How to Wait for Results”wait_completion<T> — block and deserialize
Section titled “wait_completion<T> — block and deserialize”The simplest way to wait for a result. Blocks until the execution finishes, then deserializes the output to T.
let output: OnboardingOutput = durable .wait_completion("signup-user-42", Duration::from_secs(60), None) .await?;
println!("Done: {:?}", output);If the execution doesn’t finish within the timeout, wait_completion returns WaitTimedOut. Deserialization errors are surfaced as SchedulerError::Deserialization.
start_and_wait_for<H> — start and wait in one call
Section titled “start_and_wait_for<H> — start and wait in one call”For the common case where you start an execution and immediately wait, use start_and_wait_for. It infers input and output types from the handler type:
let output = durable .start_and_wait_for::<OnboardingTask>( "signup-user-42", "onboarding", &OnboardingData { email: "alice@example.com".into() }, Duration::from_secs(60), ) .await?;// output: OnboardingOutput — inferred from OnboardingTask::Outputwait_for<H> — wait with handler-inferred types
Section titled “wait_for<H> — wait with handler-inferred types”When you started an execution earlier and now want the typed result without manual deserialization:
let output = durable .wait_for::<OnboardingTask>("signup-user-42", Duration::from_secs(60)) .await?;// output: OnboardingOutput — inferred from OnboardingTask::Outputstatus — check without blocking
Section titled “status — check without blocking”let record = durable.status("signup-user-42").await?;println!("Status: {:?}", record.status);// ExecutionStatus::Pending | Running | Completed | Failed | Cancelledcancel — stop a running execution
Section titled “cancel — stop a running execution”let cancelled = durable.cancel("signup-user-42").await?;A cancelled execution stops at its next scheduling point. Steps currently running will finish, but no new steps will be scheduled.
The Lifecycle
Section titled “The Lifecycle”stateDiagram-v2
[*] --> Pending: start_for
Pending --> Running: worker picks up
Running --> Waiting: sleep / wait_for_event
Waiting --> Running: timer fires / event arrives
Running --> Completed: all steps succeed
Running --> Failed: step fails, no retry
Running --> Failed: retries exhausted
Running --> Cancelled: cancel called
Waiting --> Failed: event timeout
- Schedule —
start_forinserts a row inzart_executionswith statuspending. - Claim — Workers poll PostgreSQL with
SELECT … FOR UPDATE SKIP LOCKED. Exactly one worker owns one execution. - Run — The handler body executes. Each step call checks the database:
- Hit — stored result deserialized and returned; the step logic is never called again.
- Miss — the step runs, the result is persisted, then returned.
- Complete — the handler returns
Ok; status becomescompletedand the output is stored. - Fail — the handler returns
Err; after retries are exhausted, status becomesfailed.
Idempotency Guarantee
Section titled “Idempotency Guarantee”The execution ID is your idempotency key. If the same user signs up twice due to a double-click, the second start_for call returns ExecutionAlreadyExists — you detect this and simply wait on the existing execution.
match durable.start_for::<OnboardingTask>(&exec_id, "onboarding", &input).await { Ok(_) => println!("New execution started"), Err(SchedulerError::ExecutionAlreadyExists(_, _)) => { println!("Already running, waiting on existing…"); } Err(e) => return Err(e.into()),}
// In all cases, wait on the execution and get typed outputlet output: OnboardingOutput = durable .wait_completion(&exec_id, Duration::from_secs(60), None) .await?;Next Steps
Section titled “Next Steps”- Steps — define and compose durable work
- Flow Control — parallel steps, sleeps, and events
- Error Handling — the three-way outcome model