Zart is in active development — breaking API changes may occur despite our best efforts to keep contracts stable.

Observability

Zart emits structured logs via tracing and exposes Prometheus metrics through a single function. Both work whether you run Zart embedded inside your own server or as a standalone worker.


Call init_tracing() once at the top of main. It reads RUST_LOG and writes human-readable output to stderr.

use zart::logging::init_tracing;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    init_tracing().expect("failed to initialise tracing");
    // …
    Ok(())
}
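The fallback behaviour described above (read `RUST_LOG`, default to `info`) is easy to reason about in isolation. A minimal sketch of that lookup; the helper `filter_directive` is illustrative, not part of Zart's API:

```rust
use std::env;

/// Resolve the log filter the way an EnvFilter-backed initialiser
/// conventionally does: prefer the RUST_LOG directive when present,
/// otherwise fall back to "info". (Illustrative helper, not Zart API.)
fn filter_directive(rust_log: Option<&str>) -> &str {
    rust_log.unwrap_or("info")
}

fn main() {
    // In real code the caller would pass env::var("RUST_LOG").ok().as_deref().
    let from_env = env::var("RUST_LOG").ok();
    let directive = filter_directive(from_env.as_deref());
    println!("active filter: {directive}");

    assert_eq!(filter_directive(None), "info");
    assert_eq!(filter_directive(Some("zart=debug,info")), "zart=debug,info");
}
```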

Use init_tracing_with_config when you need structured JSON logs for log aggregation (Datadog, Loki, CloudWatch, etc.).

use zart::logging::{init_tracing_with_config, TracingConfig};

init_tracing_with_config(TracingConfig {
    env_filter: Some("zart=info,warn".to_string()),
    json_format: true,
})
.expect("failed to initialise tracing");
| Field | Default | Description |
| --- | --- | --- |
| `env_filter` | `RUST_LOG` env var, then `"info"` | Standard `EnvFilter` directive |
| `json_format` | `false` | `true` emits newline-delimited JSON; `false` emits human-readable text |

To see Zart’s internal debug logs without drowning in your own application’s output:

RUST_LOG=zart=debug,info cargo run

Prometheus instrumentation is opt-in. Enable the metrics Cargo feature to activate it:

Cargo.toml
[dependencies]
zart = { version = "0.1", features = ["metrics"] }
# If you also use zart-api, enable its matching feature:
zart-api = { version = "0.1", features = ["metrics"] }

Without this feature, neither prometheus nor lazy_static is compiled into your binary. All metric call sites become no-ops and gather_metrics() returns an empty string.
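This feature-gated no-op pattern is useful to know if you gate your own instrumentation the same way. A hedged sketch of the general pattern (module and function names are illustrative, not Zart's actual internals):

```rust
// Sketch of the feature-gating pattern described above; names are
// illustrative, not Zart's real source.
#[cfg(feature = "metrics")]
mod metrics {
    pub fn record_task(_status: &str) {
        // A real implementation would bump a prometheus counter here.
    }
    pub fn gather_metrics() -> String {
        // A real implementation would encode the registry in text format.
        String::new()
    }
}

#[cfg(not(feature = "metrics"))]
mod metrics {
    // Without the feature, every call site compiles down to a no-op and
    // gather_metrics() returns an empty string, matching Zart's behaviour.
    pub fn record_task(_status: &str) {}
    pub fn gather_metrics() -> String {
        String::new()
    }
}

fn main() {
    metrics::record_task("completed");
    // Compiled without --features metrics, this is the no-op path:
    assert!(metrics::gather_metrics().is_empty());
}
```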

Zart maintains a private Prometheus registry. All metrics are registered at startup; you expose them by calling gather_metrics() at whatever path suits your application.

| Metric | Type | Labels | What it measures |
| --- | --- | --- | --- |
| `zart_tasks_total` | Counter | `status` | Tasks completed, failed, cancelled, or scheduled |
| `zart_task_duration_seconds` | Histogram | `task_name`, `status` | End-to-end task execution time |
| `zart_executions_total` | Counter | `status`, `task_name` | Durable execution lifecycle events |
| `zart_steps_total` | Counter | `status`, `step_name` | Steps completed, failed, scheduled, or waiting for event |
| `zart_step_duration_seconds` | Histogram | `step_name`, `status` | Per-step execution time |
| `zart_queue_depth` | Gauge | (none) | Tasks waiting to be picked up |
| `zart_worker_concurrent_tasks` | Gauge | (none) | Tasks currently executing |
| `zart_poll_interval_seconds` | Histogram | (none) | Actual time between poll cycles |
| `zart_events_delivered_total` | Counter | `event_name`, `status` | Events delivered to waiting tasks |
| `zart_task_heartbeat_renewals_total` | Counter | `task_name`, `status` (`success`, `failed`, `not_found`) | Lease renewals; `not_found` means the task was already claimed or cancelled |
| `zart_heartbeat_active` | Gauge | (none) | Active heartbeat loops (one per running task) |

Duration histograms use buckets: 1ms → 5ms → 10ms → 50ms → 100ms → 500ms → 1s → 5s → 10s → 30s → 60s → 5m.
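Prometheus histogram buckets are cumulative: an observation is counted in every bucket whose upper bound is at or above it. A small illustrative sketch of where an observation first lands, using the bucket boundaries listed above:

```rust
// Upper bounds (in seconds) of the duration buckets listed above.
const BUCKETS: [f64; 12] = [
    0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0, 30.0, 60.0, 300.0,
];

/// Smallest bucket upper bound that contains the observation, or None
/// if it only lands in the implicit +Inf bucket.
fn smallest_bucket(duration_secs: f64) -> Option<f64> {
    BUCKETS.iter().copied().find(|&le| duration_secs <= le)
}

fn main() {
    // A 200 ms task first appears in the 500 ms bucket...
    assert_eq!(smallest_bucket(0.2), Some(0.5));
    // ...and a 10-minute task only lands in the implicit +Inf bucket.
    assert_eq!(smallest_bucket(600.0), None);
}
```

Because buckets are cumulative, that 200 ms observation also increments every bucket above 0.5 s, which is what lets PromQL's `histogram_quantile` estimate percentiles from bucket counts.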

When Zart runs inside your own Axum (or any other) server, wire gather_metrics() into a handler at whatever path you want:

use axum::{Router, routing::get, http::StatusCode, response::IntoResponse};
use zart::metrics::gather_metrics;

async fn metrics_handler() -> impl IntoResponse {
    (
        StatusCode::OK,
        [("Content-Type", "text/plain; version=0.0.4; charset=utf-8")],
        gather_metrics(),
    )
}

// Mount anywhere you like; /metrics is the Prometheus convention
let app = Router::new()
    .route("/metrics", get(metrics_handler));
    // … your other routes

If you use the zart-api crate, the GET /metrics endpoint is included automatically. No extra code needed.

use zart_api::ApiServer;

let server = ApiServer::new(scheduler, registry);
server.run("0.0.0.0:8080").await?;
// Prometheus can now scrape http://<host>:8080/metrics

A matching Prometheus scrape configuration:

scrape_configs:
  - job_name: zart
    static_configs:
      - targets: ["your-app:8080"]
    metrics_path: /metrics # adjust if you chose a different path
    scrape_interval: 15s

Expose the worker’s liveness via an HTTP handler in your own server:

use axum::{Router, routing::get, http::StatusCode, response::IntoResponse, extract::State};
use std::sync::Arc;
use zart::Worker;

async fn health(State(worker): State<Arc<Worker>>) -> impl IntoResponse {
    if worker.is_healthy() {
        StatusCode::OK
    } else {
        StatusCode::SERVICE_UNAVAILABLE
    }
}

let app = Router::new()
    .route("/health", get(health))
    .with_state(worker.clone());
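Zart's `is_healthy` internals aren't shown here, but if you need a similar check in your own worker, a common approach is a staleness threshold on the last successful poll. An illustrative stdlib-only sketch (all names hypothetical):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Tracks the last successful poll as a unix timestamp in seconds.
struct Liveness {
    last_poll_secs: AtomicU64,
}

impl Liveness {
    fn new() -> Self {
        let liveness = Liveness { last_poll_secs: AtomicU64::new(0) };
        liveness.touch();
        liveness
    }

    fn now_secs() -> u64 {
        SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs()
    }

    /// Call from the poll loop after each successful cycle.
    fn touch(&self) {
        self.last_poll_secs.store(Self::now_secs(), Ordering::Relaxed);
    }

    /// Healthy if we've polled within the allowed staleness window.
    fn is_healthy(&self, max_staleness_secs: u64) -> bool {
        Self::now_secs().saturating_sub(self.last_poll_secs.load(Ordering::Relaxed))
            <= max_staleness_secs
    }
}

fn main() {
    let liveness = Liveness::new();
    // Just touched, so well within a 30-second staleness budget.
    assert!(liveness.is_healthy(30));
}
```

Pick the staleness budget as a small multiple of the poll interval so a single slow cycle doesn't flap the probe.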

If you use zart-api, /healthz (liveness) and /readyz (readiness) are built in.

Kubernetes probe example:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15

Example Prometheus alerting rules for Zart:

groups:
  - name: zart
    rules:
      - alert: ZartTaskFailureRateHigh
        expr: |
          rate(zart_tasks_total{status="failed"}[5m])
            / rate(zart_tasks_total[5m]) > 0.05
        for: 2m
        annotations:
          summary: "More than 5% of tasks are failing"
      - alert: ZartQueueDepthHigh
        expr: zart_queue_depth > 500
        for: 5m
        annotations:
          summary: "Task queue depth exceeds 500 for 5 minutes"
      - alert: ZartHeartbeatRenewalsFailing
        expr: rate(zart_task_heartbeat_renewals_total{status="failed"}[5m]) > 0
        for: 1m
        annotations:
          summary: "Heartbeat renewals are failing; possible database connectivity issue"
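The failure-rate alert divides the failure rate by the total rate over the same window. The same arithmetic on two hypothetical counter samples, as a sketch:

```rust
/// Per-second rate of a monotonically increasing counter between two samples,
/// mirroring what PromQL's rate() computes over a window.
fn rate(earlier: f64, later: f64, window_secs: f64) -> f64 {
    (later - earlier) / window_secs
}

fn main() {
    let window = 300.0; // 5m, as in the alert expression above

    // Hypothetical counter samples at the start and end of the window.
    let failed_rate = rate(40.0, 52.0, window); // 12 failures over 5m
    let total_rate = rate(900.0, 1000.0, window); // 100 tasks over 5m

    let failure_ratio = failed_rate / total_rate;
    assert!((failure_ratio - 0.12).abs() < 1e-9);

    // 12% exceeds the 5% threshold, so ZartTaskFailureRateHigh would fire
    // once the condition has held for the 2m `for` duration.
    assert!(failure_ratio > 0.05);
}
```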