# Ursula — full documentation > Open-source Distributed Durable Streams over HTTP, backed by S3. Source: https://opendurability.github.io/ursula --- # Ursula > Distributed Durable Streams over HTTP, backed by S3. > [!NOTE] > Ursula is a joint project by Tonbo and Loro. Ursula is an open-source implementation of Durable Streams: URL-addressable, append-only byte streams that can be replayed, resumed, and tailed with plain HTTP. Run it as a single node for local development, or as a Raft cluster for replicated writes, follower reads, snapshots, bootstrap recovery, and optional S3-backed cold storage. ## What it is **HTTP-native streams.** `PUT` creates buckets and streams, `POST` appends bytes, and `GET` reads from an offset or tails live with long-poll/SSE. **One timeline per resource.** Give each document, session, workflow, room, or agent run its own durable stream. **Distributed serving.** Ursula adds leader-serialized appends, quorum replication, follower reads, and bootstrap/snapshot recovery around the Durable Streams protocol. ## Run locally Start a single node: ```bash cargo run --bin ursula -- serve \ --node-id 1 \ --listen 127.0.0.1:4437 \ --advertise 127.0.0.1:4437 \ --data-dir ./data ``` Create a bucket and stream, append bytes, and read them back: ```bash curl -X PUT http://127.0.0.1:4437/demo curl -X PUT http://127.0.0.1:4437/demo/hello curl -X POST http://127.0.0.1:4437/demo/hello \ -H 'Content-Type: application/octet-stream' \ --data-binary 'hello world' curl 'http://127.0.0.1:4437/demo/hello?offset=-1' ``` ## Learn more - **[Quick Start](/docs/quick-start):** step-by-step walkthrough - **[Why Ursula](/docs/why-ursula):** where it fits, and where it does not - **[Concepts](/docs/concepts/streams):** streams, buckets, offsets, read modes, snapshots - **[Architecture](/docs/architecture/overview):** how the cluster works - **[API Reference](/docs/api/overview):** full HTTP API surface - **[Protocol Spec](/docs/specs/durable-stream):** the Durable Streams Protocol and extensions --- # Why Ursula > Where Ursula fits as a durable, replayable stream layer for application state. Ursula is a durable, replayable stream layer for application state. Each document, session, task, room, or agent run gets its own ordered stream that supports recovery, replay, live updates, snapshots, and eventual expiry. You can think of it as a per-entity durable log runtime. It is purpose-built for this pattern, not a general-purpose event backbone or stream storage primitive. ## One stream per entity Most event systems multiplex many entities into shared channels. Consumers reconstruct per-entity state downstream. Ursula inverts this: each entity owns its own durable timeline. Writers append directly to that stream, and readers resume from its offsets. Buckets group related streams under one namespace. This gives you: - **Replayable recovery.** When a worker, sandbox, or agent restarts, replay the entity's log and continue. - **Live tails with simple clients.** Watch progress over HTTP with catch-up reads, long-poll, or SSE. - **Built-in lifecycle.** Snapshots, bootstrap, and TTL fit the same timeline model instead of becoming separate infrastructure. ## Tradeoffs Ursula makes one durable timeline per entity cheap and ergonomic. In exchange, it is not trying to be the most general abstraction for cross-system event distribution or arbitrary stream processing pipelines. 
That tradeoff is intentional: the simpler the per-entity model, the easier it is to recover, inspect, and operate long-running application state. For how the write path achieves low latency and strong durability without S3 on the hot path, see [Architecture](/docs/architecture/overview). For concrete latency and cost numbers, see [Competitive comparison](/docs/competitive-comparison). ## Next steps - [Quick start](/docs/quick-start): create a stream and append data with curl - [Streams](/docs/concepts/streams): the core stream abstraction and lifecycle - [Architecture](/docs/architecture/overview): how the cluster works - [API overview](/docs/api/overview): the HTTP routes and request patterns --- # Quick Start > Run Ursula locally, create a bucket, append data, and subscribe with SSE. Start a local Ursula node first: ```bash cargo run --bin ursula -- serve \ --node-id 1 \ --listen 127.0.0.1:4437 \ --advertise 127.0.0.1:4437 \ --data-dir ./data ``` Then use `curl` against the local HTTP API. ## Create a bucket and a stream ```bash curl -X PUT http://127.0.0.1:4437/demo curl -X PUT http://127.0.0.1:4437/demo/hello ``` ## Append data ```bash curl -X POST http://127.0.0.1:4437/demo/hello \ -H 'Content-Type: application/octet-stream' \ --data-binary 'first message' curl -X POST http://127.0.0.1:4437/demo/hello \ -H 'Content-Type: application/octet-stream' \ --data-binary 'second message' ``` ## Read everything from the beginning ```bash curl 'http://127.0.0.1:4437/demo/hello?offset=-1' ``` ## Subscribe for live updates Open a second terminal and start an SSE subscription: ```bash curl 'http://127.0.0.1:4437/demo/hello?offset=-1&live=sse' ``` Then go back to the first terminal and append more data. You'll see it arrive in the SSE stream instantly. ## Read more - [Install](/docs/install): build Ursula from source - [Run locally](/docs/run-locally): single-node config and smoke checks - [Deploy a cluster](/docs/deploy-cluster): three-node cluster shape - [Configure S3](/docs/configure-s3): shared cold storage for clusters - [API overview](/docs/api/overview): all HTTP routes and request patterns - [Streams](/docs/concepts/streams): the core stream abstraction and lifecycle - [Architecture](/docs/architecture/overview): how the cluster works under the hood --- # Install > Build Ursula from source and prepare a host to run the server. Ursula is currently easiest to install from source. The build compiles Rust plus the RocksDB native dependency, so the host needs a Rust toolchain and a C/C++ build toolchain. ## Prerequisites Required: - Rust stable and Cargo - a C/C++ compiler - CMake - `pkg-config` Optional but recommended for the repo's developer workflow: `just` (recipe runner) and `sccache` (compiler cache). They speed up rebuilds; everything still works with plain `cargo`. 
On macOS: ```bash brew install rust cmake pkg-config brew install just sccache # optional ``` On Debian or Ubuntu: ```bash sudo apt-get update sudo apt-get install -y build-essential cmake pkg-config curl curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh cargo install just sccache # optional ``` ## Build Clone the repository, then build the workspace: ```bash git clone https://github.com/opendurability/ursula.git cd ursula cargo build --release --bin ursula ``` If you installed `just` and `sccache`, you can use the cached recipes instead, which wire `sccache` into both Rust and RocksDB's C/C++ build steps: ```bash just build ``` For a release binary: ```bash cargo build --release --bin ursula ./target/release/ursula --help ``` ## Run from source For local development, use the checked-in `justfile`: ```bash just serve --node-id 1 \ --listen 127.0.0.1:4437 \ --advertise 127.0.0.1:4437 \ --data-dir ./data ``` For deployment, build `target/release/ursula` and run the binary with a config file. --- # Run locally > Start a single-node Ursula server and verify the HTTP API. A single node auto-bootstraps on first start. It is the fastest way to evaluate the HTTP API and stream semantics. ## Start the server ```bash cargo run --bin ursula -- serve \ --node-id 1 \ --listen 127.0.0.1:4437 \ --advertise 127.0.0.1:4437 \ --data-dir ./data ``` Or use the repo recipe: ```bash just serve --node-id 1 \ --listen 127.0.0.1:4437 \ --advertise 127.0.0.1:4437 \ --data-dir ./data ``` ## Smoke test ```bash curl -f http://127.0.0.1:4437/healthz curl -f http://127.0.0.1:4437/readyz/local curl -X PUT http://127.0.0.1:4437/demo curl -X PUT http://127.0.0.1:4437/demo/hello curl -X POST http://127.0.0.1:4437/demo/hello \ -H 'Content-Type: application/octet-stream' \ --data-binary 'hello' curl 'http://127.0.0.1:4437/demo/hello?offset=-1' ``` ## Config file ```toml version = 1 [server] node_id = 1 listen = "127.0.0.1:4437" advertise = "127.0.0.1:4437" data_dir = "data" hot_payload_backend = "rocksdb" write_only_leader = false [cold] backend = "fs" ``` Run it: ```bash cargo run --bin ursula -- serve --config ./ursula.toml ``` Configuration precedence is `defaults < TOML < environment < CLI flags`. --- # Deploy a cluster > Run a three-node Ursula cluster with Raft replication and shared cold storage. A production-style Ursula deployment runs multiple voters and a shared cold backend. The common shape is three nodes: one leader at a time, two followers, quorum-committed writes, and follower-capable reads. ## Topology Use one config per node. 
Each node needs: - a stable `server.node_id` - a `server.listen` address for local binding - a `server.advertise` address reachable by other nodes - a persistent `server.data_dir` - the same `cluster.bootstrap_node_id` - `cluster.routes` containing the other nodes' advertised addresses - a shared cold backend such as S3 ## Node 1 ```toml version = 1 [server] node_id = 1 listen = "0.0.0.0:4437" advertise = "10.0.0.1:4437" data_dir = "/var/lib/ursula" [cluster] name = "prod" bootstrap_node_id = 1 routes = ["10.0.0.2:4437", "10.0.0.3:4437"] [cold] backend = "s3" s3_bucket = "ursula-prod" s3_region = "us-east-1" ``` ## Node 2 ```toml version = 1 [server] node_id = 2 listen = "0.0.0.0:4437" advertise = "10.0.0.2:4437" data_dir = "/var/lib/ursula" [cluster] name = "prod" bootstrap_node_id = 1 routes = ["10.0.0.1:4437", "10.0.0.3:4437"] [cold] backend = "s3" s3_bucket = "ursula-prod" s3_region = "us-east-1" ``` ## Node 3 ```toml version = 1 [server] node_id = 3 listen = "0.0.0.0:4437" advertise = "10.0.0.3:4437" data_dir = "/var/lib/ursula" [cluster] name = "prod" bootstrap_node_id = 1 routes = ["10.0.0.1:4437", "10.0.0.2:4437"] [cold] backend = "s3" s3_bucket = "ursula-prod" s3_region = "us-east-1" ``` Start each node: ```bash ./ursula serve --config /etc/ursula/ursula.toml ``` For managed cluster operations, use [`ursulactl`](/docs/control-cli). It can inspect cluster state, deploy the orchestrator service, bootstrap membership, run rolling upgrades, migrate nodes, and wait for operations to complete. ## Bootstrap behavior When no initialized peer is found, only `cluster.bootstrap_node_id` may initialize the cluster. Other nodes wait for an initialized peer or for the control plane to add them. Cluster mode rejects `cold.backend = "fs"` because each node must be able to recover shared cold chunks and snapshots. --- # Configure S3 > Configure Ursula cold storage with S3-compatible object storage. Ursula keeps recent data in the hot payload backend and flushes cold chunks and snapshot blobs to the configured cold backend. Multi-node clusters should use S3 or an S3-compatible object store. ## Minimal S3 config ```toml [cold] backend = "s3" s3_bucket = "ursula-prod" s3_region = "us-east-1" ``` Ursula reads standard AWS credentials from the process environment through the AWS SDK stack. For explicit credentials in the config, prefer environment references: ```toml [cold] backend = "s3" s3_bucket = "ursula-prod" s3_region = "us-east-1" s3_access_key_id = "$URSULA_AWS_ACCESS_KEY_ID" s3_secret_access_key = "$URSULA_AWS_SECRET_ACCESS_KEY" s3_session_token = "$URSULA_AWS_SESSION_TOKEN" ``` If an environment reference is missing, startup fails instead of silently running with an incomplete credential. ## S3-compatible endpoints Set `s3_endpoint` for compatible stores: ```toml [cold] backend = "s3" s3_bucket = "ursula-dev" s3_region = "us-east-1" s3_endpoint = "http://127.0.0.1:9000" ``` ## Local development Single-node development can use the filesystem backend: ```toml [cold] backend = "fs" ``` Do not use the filesystem cold backend for multi-node clusters. --- # Control CLI > Use ursulactl to inspect, bootstrap, and operate Ursula clusters. `ursulactl` is Ursula's operational CLI for cluster administration. It wraps the lower-level HTTP admin endpoints, remote node execution, service control, artifact installation, and orchestrator operations used by multi-node deployments. For local single-node development, `curl` and the public HTTP API are usually enough. 
Use `ursulactl` when you are running a real cluster and need repeatable operational actions. ## Build ```bash cargo build --release -p ursulactl ``` Or run it from source: ```bash cargo run --release -p ursulactl -- status ``` ## Cluster inspection ```bash ursulactl status ursulactl cluster status ursulactl service status --node-id 1 --node-id 2 --node-id 3 ursulactl service tail --service ursula-ds --node-id 1 --lines 120 ``` `status` returns a cluster-level JSON report, including Ursula node status, service state, control-plane state, and load-generator state when present. `cluster status` focuses on Ursula membership and readiness across nodes. ## Service operations ```bash ursulactl service restart --service ursula-ds --node-id 2 ursulactl service stop --service ursula-ds --node-id 3 ursulactl service start --service ursula-ds --node-id 3 ``` For routine cluster maintenance, prefer orchestrator operations over ad hoc service restarts. They encode ordering, readiness checks, leader stability windows, and operation status. ## Orchestrator service Deploy the control-plane service onto a control node: ```bash ursulactl deploy orchestrator --control-node-id 4 ``` The orchestrator exposes an HTTP control service. `ursulactl operations ...` talks to it through `ORCHESTRATOR_SERVICE_URL`, defaulting to `http://127.0.0.1:7010`. ```bash export ORCHESTRATOR_SERVICE_URL=http://127.0.0.1:7010 ursulactl operations list ursulactl operations latest --limit 5 ``` ## Cluster lifecycle operations Bootstrap a three-voter cluster: ```bash ursulactl operations bootstrap \ --bootstrap-node-id 1 \ --voter-id 1 \ --voter-id 2 \ --voter-id 3 \ --wait ``` Run a graceful rolling upgrade: ```bash ursulactl operations graceful-upgrade \ --target-version 20260514-abcdef0 \ --node-id 1 \ --node-id 2 \ --node-id 3 \ --wait ``` Replace a node: ```bash ursulactl operations node-migration \ --source-node-id 2 \ --target-node-id 4 \ --wait ``` Scale membership: ```bash ursulactl operations scale-up --target-node-id 4 --wait ursulactl operations scale-down --target-node-id 4 --wait ``` Restart and recover one node at a time: ```bash ursulactl operations restart-recovery --node-id 2 --wait ``` ## Operation control ```bash ursulactl operations get --id ursulactl operations wait --id ursulactl operations pause --id ursulactl operations resume --id ursulactl operations cancel --id ``` Operations are persisted by the orchestrator service, so a client can submit an operation, disconnect, and later inspect or wait on the same operation ID. ## Direct node access `ursulactl node ...` is useful for diagnostics: ```bash ursulactl node exec --node-id 1 --script 'hostname && systemctl status ursula-ds --no-pager' ursulactl node cat --node-id 1 --path /etc/ursula/ursula.toml ursulactl node tail --node-id 1 --path /var/log/ursula/ursula.log --lines 100 ``` These commands are powerful and bypass normal orchestrator sequencing. Use them for inspection and break-glass repair, not as the default way to perform cluster changes. --- # Operations > Health, capacity, snapshots, backup, and upgrades for Ursula clusters. For routine cluster-wide operations (bootstrap, graceful upgrade, node migration, scale up/down, restart-recovery), use [`ursulactl`](/docs/control-cli) against an orchestrator. The endpoints documented on this page are the lower-level HTTP surface those operations build on, useful for ad-hoc inspection and break-glass repair on a single node. 
## Health and readiness ```bash curl -f http://127.0.0.1:4437/healthz curl -f http://127.0.0.1:4437/readyz/local curl -f http://127.0.0.1:4437/readyz/cluster ``` | Endpoint | OK when | | --- | --- | | `/healthz` | Process is alive. | | `/readyz/local` | Local recovery and hot-payload rehydration complete. | | `/readyz/cluster` | Local payload is ready, membership has voters, and a leader is known. | `/readyz/cluster` is the right gate for rolling-restart automation. Don't move to the next node until the just-restarted node passes it. ## Cluster status ```bash curl http://127.0.0.1:4437/cluster/status ``` Returns node ID, leader, term, membership (voters + learners), Raft state, last-log / last-applied indices, and payload-readiness flags. The fields are documented in [Troubleshooting → Diagnostic endpoints](/docs/troubleshooting#diagnostic-endpoints). ## Metrics ```bash curl http://127.0.0.1:4437/metrics ``` Prometheus text format. Series families: - Public HTTP request counters and latency - Raft RPC request counters, durations, in-flight, and last-success age (`ursula_raft_rpc_*`) - Snapshot transfer counters and latency (`ursula_raft_snapshot_*`) - Hot-payload rehydrate counters (`ursula_rehydrate_*`) - Write backpressure state (`ursula_write_backpressure_*`) - Live connection and watch fan-out Start triage with `raft_rpc_*` when replication slows or elections look unstable, `raft_snapshot_*` when follower catch-up shifts into snapshot install, and `rehydrate_*` when a restarting node stays payload-incomplete. ## Capacity inspection ```bash curl http://127.0.0.1:4437/cluster/hot-cold-stats ``` Returns a snapshot of hot bytes, cold bytes, per-stream hot tallies, flush backlog, current backpressure state and reason, and cache sizes. This is the right endpoint to check when you're deciding whether to raise the hot-payload thresholds documented in [Configuration → Runtime tuning](/docs/configuration#runtime-tuning-environment-only). ## Manual Raft snapshots and log compaction `ursulactl` and the orchestrator handle snapshots automatically as part of upgrade and migration flows. For one-off operator work, the raw HTTP endpoints are: ```bash # Take a Raft snapshot of cluster state on this node curl -X POST http://127.0.0.1:4437/cluster/trigger-snapshot # After a snapshot, the leader can purge log entries below the snapshot index curl -X POST http://127.0.0.1:4437/cluster/purge-log ``` `trigger-snapshot` queues the snapshot and returns immediately; the result lands in cold storage asynchronously. `purge-log` is only safe after a snapshot has been taken and replicated. ## Rolling restarts (manual) For one-off manual restarts without the orchestrator: 1. Pick a follower (not the current leader; confirm via `/cluster/status.leader_id`). 2. Stop the process. 3. Restart it pointing at the same `server.data_dir` and config. 4. Wait for `/readyz/cluster` to return `200`. 5. Move to the next follower. 6. To restart the leader, first transfer leadership with `POST /cluster/transfer-leader`, wait for the new leader to be elected, then proceed as for a follower. For repeatable rolling restarts across a cluster, use [`ursulactl operations restart-recovery`](/docs/control-cli#cluster-lifecycle-operations) instead. It encodes leader transfer, readiness waits, and stability windows. ## Logs ```bash RUST_LOG=ursula=info ./ursula serve --config /etc/ursula/ursula.toml RUST_LOG=ursula=debug ./ursula serve --config /etc/ursula/ursula.toml ``` `info` is the expected baseline for production. 
`debug` is helpful when chasing replication or rehydrate issues but is verbose enough that you'll want to redirect to a file. ## Backup and disaster recovery > [!WARNING] > Ursula does not provide a backup or restore tool. There is no `--export` flag, no cluster-state dump, and no "restore from snapshot file" command. Plan your recovery story accordingly. What you can rely on: - **Quorum durability.** Acknowledged writes are durable as long as a majority of voters survives. With three voters in three AZs, any single-AZ outage is recoverable. - **Cold-tier durability.** Once a Raft snapshot or chunk has been flushed to S3, it inherits S3-grade durability. The window of acknowledged-but-unflushed data is measured in the configured flush interval (seconds). - **Recovery invariants.** A node refuses to start if its data directory is inconsistent (e.g., logs purged without a covering snapshot). This protects against silent corruption on restart. What this means in practice: - **For node-level loss**, replace the node and re-join the cluster via [`ursulactl operations node-migration`](/docs/control-cli#cluster-lifecycle-operations). The new node rehydrates from peers and cold storage. - **For total cluster loss** (all voters gone simultaneously), you can recover only what's in S3: the last persisted Raft snapshot plus any chunks. There is no tooling to spin up a fresh cluster from those S3 objects today. Treat full-cluster loss as needing a bespoke recovery procedure. - **For accidental deletion** of a specific stream, there is no point-in-time restore. Application-level snapshots (`PUT /{bucket}/{stream}/snapshot/{offset}`) are the only application-visible undo and are the writer's responsibility. If you have stricter RTO/RPO needs, snapshot the S3 bucket out-of-band (versioning, cross-region replication, or scheduled object copies) so you at least retain the durable chunks for offline recovery analysis. ## Upgrades Ursula is at v0.x. **There is no on-disk format compatibility shim and no migration code.** Across versions, expect to rebuild clusters. The runtime will not refuse to start on a stale format, but it also will not migrate anything. For routine same-version rolling restarts and for upgrades **where the team has confirmed the new build is on-disk compatible with the running one**, use: ```bash ursulactl operations graceful-upgrade \ --target-version \ --node-id 1 --node-id 2 --node-id 3 \ --wait ``` This drains each node, transfers leadership when needed, installs the new artifact, and waits for readiness before moving on. See [Control CLI → Cluster lifecycle operations](/docs/control-cli#cluster-lifecycle-operations). For upgrades that are **not** known-compatible (or any time you bump across a breaking change), the operational shape is: 1. Stop writes at the edge (proxy / application). 2. Drain the cold flush backlog (let the hot tier empty; check `/cluster/hot-cold-stats`). 3. Tear the cluster down. 4. Bring up the new version against fresh data directories. 5. Reload application state from your own sources of truth. This is intentionally heavy-handed. It will get lighter as v1.0 approaches and on-disk format compatibility is committed to. --- # Streams > Ursula's core append-only stream primitive, with naming, content type, and lifecycle. A stream is an append-only byte sequence addressed by a URL: `/{bucket}/{stream}`. Once data is written it cannot be modified or reordered; only new data can be appended. ## Naming Stream IDs may contain any byte that is not `/`, `\0`, or `..`. 
Maximum stream ID length is 122 bytes, and the combined bucket-plus-stream path is also capped at 122 bytes. The literal name `streams` is reserved (it's the bucket-level listing endpoint). A workspace app might use `/workspace-a/doc-123`; an agent system might use `/agents/run-2026-05-13-abc`. Anything within the rules above is fair. ## Content type A stream's content type is set on creation (PUT) or on first append (POST) and defaults to `application/octet-stream` if no `Content-Type` header is supplied. Subsequent appends must declare the same content type, or the server rejects them with `400`. The content type rides along on reads so clients can dispatch on it without inspecting payloads. ## Lifecycle Every stream is in one of these states: - **Open.** Accepts appends. The default on creation. - **Closed.** Sealed by `Stream-Closed: true` on a POST. Readers receive an EOF signal; further appends return `409`. Close is irreversible. - **Expired.** Past its `Stream-TTL` or `Stream-Expires-At` deadline. Reads and writes return `404`. Expired streams are eventually garbage-collected. - **Deleted.** Removed by `DELETE`. Reads return `404`. `Stream-TTL` (seconds, relative) and `Stream-Expires-At` (RFC 3339 absolute) are mutually exclusive; sending both yields `400`. ## Related - [Buckets](/docs/concepts/buckets): how streams are grouped - [Snapshots](/docs/concepts/snapshots): compacting a long stream - [Bootstrap](/docs/concepts/bootstrap): efficient first-load for new clients --- # Buckets > Buckets group related streams into namespaces. Streams are organized into buckets. A bucket is a namespace, like a folder that holds related streams under one URL prefix: ``` /{bucket}/{stream} ``` A collaborative editing app might use one bucket per workspace, with one stream per document: `/workspace-a/doc-1`, `/workspace-a/doc-2`. An agent platform might use one bucket per tenant: `/tenant-acme/run-xyz`. ## Naming Bucket IDs match the regex `[a-z0-9_-]{4,64}`: 4–64 bytes, lowercase ASCII letters and digits plus `_` and `-`. Uppercase letters, `/`, `.`, and most punctuation are rejected with `400`. The combined bucket-plus-stream URL path also has a 122-byte ceiling. With a 64-byte bucket name, you have 58 bytes left for the stream ID. ## Lifecycle - **`PUT /{bucket}`** creates the bucket; idempotent if it already exists. - **`GET /{bucket}`** returns metadata. - **`DELETE /{bucket}`** removes the bucket but only when empty. If any stream still exists, `DELETE` returns `409 Conflict` with `bucket_not_empty`. There is no cascading delete; remove streams first. - **`GET /{bucket}/streams`** lists streams in the bucket with optional prefix filtering and cursor-based pagination (page size up to 1000, default 1000). ## Related - [Streams](/docs/concepts/streams) - [API: list streams](/docs/api/list-streams) --- # Offsets > Position tokens inside a stream, with how to read, resume, and follow safely. An offset is a position inside a stream. Clients use offsets to read from a specific point or resume where they left off. ## Format Offsets are numeric (`u64` byte positions internally) but are returned to clients as 20-character zero-padded decimal strings, for example `"00000000000000000042"`, so they sort lexicographically. Treat them as opaque tokens: read the value from the server's response and pass it back unchanged. Two special values are accepted on read requests: - `offset=-1`: start from the very beginning of retained data (the earliest still-available offset). 
- `offset=now`: start from the stream's current tail; useful for "only new data" subscriptions. ## Response headers After every read or append, the server returns two related headers: - **`Stream-Next-Offset`**: always present. The numeric position to use for the next request. - **`Stream-Cursor`**: set on live (long-poll, SSE) responses. An opaque token that bundles the stream identity and epoch with the offset, so a reconnecting client lands on the same stream version it was reading before. For pure catch-up reads, `Stream-Next-Offset` is enough. For live tailing across reconnects, prefer `Stream-Cursor` (passed as `?cursor=`); it surfaces stream re-creation as a clean error rather than silently re-reading new data under the same name. ## Stability across snapshots Offsets are byte positions, not sequence numbers, and they're stable across snapshot publishes. Publishing a snapshot at offset 100 doesn't renumber later offsets; it only enables the server to garbage-collect data before offset 100. Reads to trimmed offsets return `410 Gone` with a `stream-earliest-offset` header pointing at the first still-available position. ## Related - [Read modes](/docs/concepts/read-modes) - [Snapshots](/docs/concepts/snapshots) - [Bootstrap](/docs/concepts/bootstrap): recovery when offsets you remember have been trimmed --- # Read Modes > Compare catch-up, long-poll, and SSE reads for Ursula clients. Streams support three read modes, all via `GET`: - **Catch-up**: `GET /b/s?offset=-1` returns all data from the given offset immediately. Use this to sync historical data. - **Long-poll**: `GET /b/s?offset=...&live=long-poll` returns immediately if data is available, otherwise holds the connection until new data arrives or a timeout. Good for simple polling loops. - **SSE**: `GET /b/s?offset=...&live=sse` opens a persistent Server-Sent Events connection. The server pushes new data as it arrives. This is the recommended mode for real-time frontends (`EventSource` in the browser works out of the box). --- # Snapshots > Compact stream history and accelerate first-load for new clients. As a stream grows, replaying it from `offset=-1` becomes expensive. Snapshots solve this by storing a compacted representation of all data up to a given offset. ``` PUT /{bucket}/{stream}/snapshot/{offset} publish a snapshot GET /{bucket}/{stream}/snapshot read the latest snapshot GET /{bucket}/{stream}/snapshot/{offset} read a specific snapshot ``` ## Who publishes Snapshots are an **application-level** operation. Ursula stores them but does not produce them. The writer that knows how to compute the merged state is responsible for publishing. A CRDT editor, for example, periodically computes the merged document and publishes it as the latest snapshot. An agent system might snapshot accumulated tool-call state. Application snapshots are separate from the Raft-internal snapshots Ursula takes for replication and recovery. The two never collide. ## Size and content type Snapshot bodies are capped at **128 MiB**. The snapshot has its own content type (separate from the stream's), stored as `snapshot_content_type` and defaulting to `application/octet-stream`. A binary CRDT document can be snapshotted under `application/octet-stream` even when the stream itself carries `application/json` deltas. ## Garbage collection Publishing a snapshot at offset N tells Ursula that data before N is no longer needed for application replay. Cold-tier chunks below N are enqueued for asynchronous GC immediately on publish. 
Reads to trimmed offsets return `410 Gone`; clients should switch to the snapshot via [`/bootstrap`](/docs/concepts/bootstrap) or seek forward. ## Immutability Snapshots are immutable once published. The `DELETE /{bucket}/{stream}/snapshot/{offset}` endpoint exists for protocol-level completeness, but Ursula refuses every delete: targeting the latest snapshot returns `409`, anything else returns `404`. To replace a snapshot, publish a new one at a higher offset. ## Related - [Bootstrap](/docs/concepts/bootstrap): fetch latest snapshot plus post-snapshot updates in one request - [API: publish snapshot](/docs/api/publish-snapshot) - [API: read snapshot](/docs/api/read-snapshot) --- # Bootstrap > Initialize a new client with one request, fetching snapshot plus post-snapshot updates. Bootstrap is the new-client initialization endpoint. A single request returns everything a client needs to catch up: 1. The latest snapshot (if any) 2. All updates after the snapshot, up to the stream's current tail ``` GET /{bucket}/{stream}/bootstrap ``` The response is `multipart/mixed`: - **First part.** The snapshot. If no snapshot exists, this part is present but empty. - **Subsequent parts.** Incremental updates from the snapshot offset (or from the earliest retained offset, if no snapshot exists). Each part has its own `Content-Type`: the snapshot part uses `snapshot_content_type`; update parts use the stream's content type. The client does not need to know in advance whether a snapshot exists. The same parsing path handles both cases. ## When to use bootstrap vs catch-up | You want… | Use | | --- | --- | | Latest state, fastest first-load | `/bootstrap` | | Replay every byte from the beginning | `GET /{b}/{s}?offset=-1` | | Resume from a known offset that's still retained | `GET /{b}/{s}?offset=N` | | Resume from an offset the server has since trimmed (`410 Gone`) | `/bootstrap`, then continue from the snapshot offset | ## Live continuation Bootstrap does not accept `?live=sse`. Combining the multipart body with an SSE event stream is rejected with `400`. To go live after bootstrap, finish reading the bootstrap response, then open a separate `GET /{b}/{s}?offset=&live=sse` (or pass the `cursor` returned by bootstrap). ## Related - [Snapshots](/docs/concepts/snapshots) - [Read modes](/docs/concepts/read-modes) --- # Exactly-Once Writes > Producer identifiers, epochs, and sequence numbers deliver deduplicated appends. Network retries can produce duplicate appends. Ursula provides server-side deduplication so your application doesn't need its own bookkeeping. Three headers form the identity: - **`Producer-Id`**: a stable client identifier (UUID, hostname, etc.). - **`Producer-Epoch`**: bumped on producer restart. Must be ≥ the last epoch the server saw from this producer. - **`Producer-Seq`**: a per-epoch sequence number. Starts at `0` for a new epoch and must increase by exactly `1` per append. Both epoch and seq are capped at `2^53 − 1` so they survive a round-trip through JSON. ## How dedup works The server records the highest `(epoch, seq)` accepted from each `Producer-Id` per stream. On an append: - **Exact duplicate** (same epoch and same seq as already accepted): silently deduplicated; the response carries the existing offset. - **Next in sequence** (`seq = last_seq + 1`): accepted. - **Out of order** (seq skips ahead or goes backward): rejected with `409 producer_seq_conflict`; the response body includes the `expected_seq` the server wanted. 
- **Epoch regression** (new epoch less than last accepted): rejected with `409`. ## Restart and epoch hygiene When a producer crashes and restarts, **bump `Producer-Epoch`**. Otherwise the server still expects the next contiguous seq within the old epoch, and the producer (which has lost its in-memory seq counter) will collide. A new epoch starts the seq counter back at `0`. ## Per-stream state Dedup state is scoped per stream. The same `Producer-Id` writing to two different streams maintains two independent epoch/seq counters, so you do not have to elect a single global writer per producer. The server retains only the highest accepted `(epoch, seq)` per `(producer_id, stream)` pair. Older epochs are dropped once a higher epoch is observed. ## When you don't need this For append-only workloads where each write is independent (event logging, audit trails), exactly-once headers are optional; just POST without them. Add them when retries are real and double-applies would be visible. ## Related - [Conditional writes](/docs/concepts/conditional-writes): coordinate multiple writers, different problem - [API: append](/docs/api/append) --- # Conditional Writes > Coordinate multiple writers using If-Match and Stream-Seq without adding locks or transactions. Multiple writers appending to the same stream may need coordination. Ursula supports two forms of optimistic concurrency control via request headers, without locks or transactions. ## If-Match `If-Match` guards an append against a stale view of the stream. The value is an ETag returned by a previous `HEAD` or read response. If the stream's current ETag doesn't match, the server rejects the append with `412 Precondition Failed`. ```bash # Get the current ETag etag=$(curl -s -I 'http://127.0.0.1:4437/demo/hello' | grep -i etag | tr -d '\r' | awk '{print $2}') # Append only if nobody else has written since curl -X POST http://127.0.0.1:4437/demo/hello \ -H 'Content-Type: application/octet-stream' \ -H "If-Match: $etag" \ --data-binary 'conditional update' ``` Use this when your write depends on the stream being in a known state. For example, a CRDT client that only wants to append if its local view is up to date. ## Stream-Seq `Stream-Seq` is a client-supplied monotonic sequence token. The server tracks the last accepted value per stream and rejects any append whose `Stream-Seq` is not lexicographically greater than the previous one. This is useful when a single logical writer wants to enforce ordering without relying on server-side ETags. For instance, an agent that numbers its steps and wants the server to reject out-of-order delivery. ## When to use which - **If-Match**: multi-writer coordination. "Only write if the stream hasn't changed since I last looked." - **Stream-Seq**: single-writer ordering. "Reject this if my writes arrive out of order." - **Neither**: append-only workloads where every write is independent (e.g. event logging). Just POST. --- # Durability and Consistency > Understand Ursula's durability guarantees, linearizable writes, and hot-cold storage model. ## When is a write durable? An append is acknowledged after a **majority of nodes** have replicated it. A single node failure cannot lose an acknowledged write. No write is ever acknowledged based on a single copy. Acknowledged data is held in replicated hot state across the cluster. A background process then flushes data to S3 on a configurable interval, after which the data inherits S3-grade durability. 
In a cross-region deployment with a 5-second flush interval, Ursula provides approximately 9-10 nines of per-message durability. The window where acknowledged-but-unflushed data is at risk from a simultaneous multi-region failure is measured in seconds. ## Consistency model Ursula provides **linearizable writes**. All appends are serialized through a single leader node, which assigns a total order. Reads on the leader are linearizable: if an append has been acknowledged, any subsequent read on the leader will reflect that write. Follower reads are eventually consistent and may lag slightly behind the leader (typically sub-millisecond). They will never see data out of order or observe a write that was later rolled back, but a follower may not yet have applied the most recent committed writes. The `Stream-Up-To-Date` response header indicates whether the serving node has fully caught up. ## Data lifecycle Writes first enter replicated hot state across cluster nodes. This is where acknowledgment happens and where recent reads are served from. A background process flushes older data to S3 on a configurable interval. Once flushed, the data gains S3-grade durability and nodes can recover it without replaying the full history. Reads fall through transparently from hot to cold storage. ## Availability and durability The estimates below assume a standard Ursula deployment: three voting replicas across availability zones, with two non-voting replicas in a second region. | Property | What it means | |---|---| | **Write availability** | Writes continue as long as a majority of voting replicas remain healthy. In this layout, that means the system maintains write quorum through any single voting-AZ failure. Non-voting replicas do not participate in write quorum, but they hold additional copies of acknowledged data for durability and read serving. | | **Read availability** | Any healthy replica can serve reads, including non-voting replicas. Read availability can outlast write quorum in some failure scenarios. | | **Per-message durability** | Approximately 9-10 nines with a 5-second flush interval. Writes are acknowledged after quorum replication in hot state. Full durability follows after background flush to S3. | | **Annual zero-loss probability** | Approximately 3-4 nines. The probability of experiencing no data-loss events over a full year. | --- # Length-Prefixed Framing > A simple convention for delimiting messages inside raw byte streams. Streams are raw byte sequences. The protocol does not impose message boundaries. Every append and every read returns whatever bytes are in flight, possibly mid-record. For streams carrying many small messages (CRDT operations, agent events), Ursula recommends a simple framing convention: ``` [4-byte big-endian length][payload bytes] ``` Each application-level append writes one framed record. Each read returns a concatenation of framed records that the client parses incrementally. ## Server is frame-agnostic The server **does not** validate or enforce frames. It stores and serves raw bytes. That means: - A read may end mid-frame; the client must handle partial frames and resume parsing on the next read. - An ill-formed frame (wrong length, truncated payload) is the producer's bug, not a server-rejectable error. - The framing convention is per-application; writers and readers of a stream must agree on it. 
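Concretely, a client-side framing helper might look like the sketch below. It is not part of Ursula; `encodeFrame` and `decodeFrames` are hypothetical names, written in TypeScript against the convention described above. The decoder keeps any trailing partial frame so parsing can resume on the next read.

```ts
// Sketch of the recommended framing convention: [4-byte big-endian length][payload bytes].
// encodeFrame / decodeFrames are illustrative names, not part of any Ursula client library.

function encodeFrame(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(4 + payload.length);
  new DataView(frame.buffer).setUint32(0, payload.length, false); // big-endian length prefix
  frame.set(payload, 4);
  return frame;
}

// Parses as many complete frames as possible from `buf`, returning the frames plus the
// unparsed remainder. A read may end mid-frame; keep the remainder and prepend it to
// the bytes of the next read before calling decodeFrames again.
function decodeFrames(buf: Uint8Array): { frames: Uint8Array[]; rest: Uint8Array } {
  const frames: Uint8Array[] = [];
  let pos = 0;
  while (pos + 4 <= buf.length) {
    const len = new DataView(buf.buffer, buf.byteOffset + pos).getUint32(0, false);
    if (pos + 4 + len > buf.length) break; // partial frame: wait for more bytes
    frames.push(buf.slice(pos + 4, pos + 4 + len));
    pos += 4 + len;
  }
  return { frames, rest: buf.slice(pos) };
}
```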
## When you don't need framing - Streams using `application/json` typically use newline-delimited JSON or a single self-describing document; JSON has its own delimiters. - Streams holding one logical blob per stream don't need framing at all. Use framing when one stream carries many independently meaningful messages and the client wants to dispatch on each one without scanning. ## Related - [Streams](/docs/concepts/streams) - [Binary SSE](/docs/concepts/binary-sse): over SSE, each event is one delivery; framing lets one event carry multiple records. --- # Binary SSE > How Ursula delivers binary payloads over the Server-Sent Events transport. The SSE wire format is text-only: every `data:` field must be valid UTF-8. For streams with text-shaped content types (`text/*`, `application/json`), SSE works out of the box: each event's `data` field carries the bytes verbatim. For other content types (`application/octet-stream`, custom binary), Ursula wraps each event in a JSON envelope and base64-encodes the payload: ```json { "encoding": "base64", "contentType": "application/octet-stream", "payload": "aGVsbG8=" } ``` The choice is automatic, determined by the stream's content type. There is no client opt-in; what you get is what the stream's content type implied. The response advertises the choice with the `Stream-Sse-Data-Encoding` header: `base64` for wrapped streams, absent for raw. ## Decoding For wrapped streams, decode each event as JSON, then base64-decode `payload`. For raw streams, the `data` field is already the payload (UTF-8 text or JSON). Browser `EventSource` works for both modes; only the `data` handler differs. ## Related - [Read modes](/docs/concepts/read-modes) - [API: read](/docs/api/read) - [Length-prefixed framing](/docs/concepts/len-prefixed-framing): when one SSE event contains multiple application records --- # Overview ## The problem If you want to build a collaborative app, a real-time AI agent, or anything that needs to push state changes to multiple clients the moment they happen, you need a durable stream. Something you can append to, read back from any point, and subscribe to for live updates. The obvious place to build this is on top of object storage. S3 is cheap, durable, and infinitely scalable. But if you've tried to build a low-latency write path on S3, you've probably hit two problems that no amount of clever engineering can fully hide. The first is that S3 has no append operation. Objects are immutable. Every write creates a new object, which means you either pay per-PUT at high frequency or you batch writes and accept the latency that comes with waiting for a batch to fill. The second is that S3's conditional writes are optimistic. Under concurrent writers, they degrade into retry storms where most attempts conflict, back off, and try again. Latency grows exponentially with contention. ([This post from Chroma](https://www.trychroma.com/engineering/wal3) explains the mechanics well.) [S2](https://s2.dev/blog/intro) addresses the latency problem by writing to S3 Express One Zone, which gets them under 50ms. But Express storage costs 7x more than Standard, and the optimistic concurrency model is still there underneath. [WarpStream](https://www.warpstream.com/blog/minimizing-s3-api-costs-with-warpstream) goes the other direction: batch everything to S3 Standard and keep costs low, but accept 250ms+ p50 latency as a consequence. Both are making the best of what the S3 write path allows. 
The fact that multiple teams are independently building the same thing tells you something: "append to a stream, read it back, subscribe for updates" is a general-purpose primitive, not an implementation detail of any one product. ## How Ursula works Ursula sidesteps this trade-off by not using S3 for the write path at all. ### The interface The API implements the [Durable Streams](https://durablestreams.com/) protocol, an open specification (MIT licensed) published by [Electric](https://electric-sql.com/). It defines the primitive that keeps coming up: create a stream, append to it, read from any offset, and tail it in real time, with exactly-once semantics. The whole thing runs over plain HTTP with SSE for live subscriptions, so there is no custom binary protocol and no mandatory client library. But a protocol only defines the interface. The hard part is what sits underneath, specifically the write path. "Append this record durably based on S3" is easy to say. Making it fast, consistent, and cheap is where the architectural choices live. ### The write path Instead, writes go to a cluster of nodes that coordinate through a [consensus algorithm](https://en.wikipedia.org/wiki/Consensus_(computer_science)). One node acts as the leader and sequences every append into a replicated log. Once a majority of nodes have persisted the entry, it's committed and the client gets back a confirmed offset. There are no CAS races, no retry storms, and no batching delays. Appending is just appending. Each node keeps the recent portion of every stream in memory, so reads serve from local state with no network round-trip. A background process flushes older data to S3 Standard for long-term retention. When a client reads an offset that has left the hot window, the read transparently pulls from S3. From the client's perspective, there's just one stream with one API. The hot-to-cold boundary is invisible. ``` Clients │ ┌──────┴──────┐ │ HTTP API │ │ /{b}/{s} │ └──────┬──────┘ │ ┌────────────┼────────────┐ v v v ┌─────────┐ ┌─────────┐ ┌─────────┐ │ Node A │ │ Node B │ │ Node C │ │ leader │ │follower │ │follower │ ├─────────┤ ├─────────┤ ├─────────┤ │ OpenRaft│ │ OpenRaft│ │ OpenRaft│ ├─────────┤ ├─────────┤ ├─────────┤ │ state │ │ state │ │ state │ │ machine │ │ machine │ │ machine │ ├─────────┤ ├─────────┤ ├─────────┤ │ RocksDB │ │ RocksDB │ │ RocksDB │ │ (hot) │ │ (hot) │ │ (hot) │ └────┬────┘ └────┬────┘ └────┬────┘ │ │ │ └────────────┼────────────┘ v ┌─────────────┐ │ Cold storage│ │ (fs / S3) │ └─────────────┘ ``` The whole thing is written in async Rust. ## Why this matters now A year ago, most applications could get away without a durable stream. State lived in a database, updates happened through polling or webhooks, and if something got lost in transit, the user refreshed the page. That's changing fast, for a few reasons. The first is AI agents. When an agent takes an action on behalf of a user, other agents and the user both need to see it immediately and be able to replay the full history. Traditional request-response APIs don't support subscribing to a feed of changes. WebSockets are real-time but ephemeral. What agents need is something in between: a persistent, ordered log they can write to and read from, ideally over a protocol they already understand. HTTP and SSE fit that bill, and the Durable Streams protocol is exactly this interface. Collaborative applications also have gone from niche to default. 
The Notion/Figma model, where multiple users see each other's changes in real time, is now what users expect from any tool that involves shared state. Every team building one of these apps ends up building their own sync layer on top of WebSockets, CRDTs, or a message queue. A durable stream is the primitive that makes all of those layers simpler, because it gives you a single source of truth that supports both catch-up reads and live tailing. --- # API overview > Learn the public Ursula HTTP routes for creating, appending, reading, and replaying streams. Ursula exposes a small public HTTP API for durable append-only streams. Most users only need the `/{bucket}/{stream}` route family, which maps cleanly to buckets and stream IDs. This API is Ursula's implementation of the [Durable Streams Protocol](/docs/specs/durable-stream), defined by the durable-streams community, with additional route families for compatibility and deployment needs. ## Bucket operations | Method | Path | Description | | -------- | ----------------------------- | ------------------------------------------------------------ | | `PUT` | `/{bucket}` | [Create a bucket](/docs/api/create-bucket) | | `GET` | `/{bucket}` | [Get bucket metadata](/docs/api/get-bucket) | | `DELETE` | `/{bucket}` | [Delete a bucket](/docs/api/delete-bucket) | | `GET` | `/{bucket}/streams` | [List streams in a bucket](/docs/api/list-streams) | ## Stream operations | Method | Path | Description | | -------- | ----------------------------- | ------------------------------------------------------------ | | `PUT` | `/{bucket}/{stream}` | [Create a stream](/docs/api/create-stream) | | `POST` | `/{bucket}/{stream}` | [Append data or close](/docs/api/append) | | `GET` | `/{bucket}/{stream}` | [Read (catch-up, long-poll, SSE)](/docs/api/read) | | `HEAD` | `/{bucket}/{stream}` | [Get stream metadata](/docs/api/head-stream) | | `DELETE` | `/{bucket}/{stream}` | [Delete a stream](/docs/api/delete-stream) | ## Bootstrap and snapshots | Method | Path | Description | | -------- | --------------------------------------------- | ---------------------------------------------------- | | `GET` | `/{bucket}/{stream}/bootstrap` | [Snapshot + tail replay](/docs/api/bootstrap) | | `GET` | `/{bucket}/{stream}/snapshot` | [Read latest snapshot](/docs/api/read-snapshot) | | `GET` | `/{bucket}/{stream}/snapshot/{offset}` | [Read snapshot at offset](/docs/api/read-snapshot) | | `PUT` | `/{bucket}/{stream}/snapshot/{offset}` | [Publish a snapshot](/docs/api/publish-snapshot) | ## v1 compatibility A flat [v1 compatibility layer](/docs/api/v1-compatibility) is also available under `/v1/stream/{path}` for simpler integrations that don't need explicit bucket management. 
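## Programmatic example

The same routes can be driven from any HTTP client. A minimal TypeScript sketch of the create → append → read cycle, mirroring the curl patterns below; the base URL and names are examples, and this is not an official client library:

```ts
// Create a bucket and stream, append one entry, and read it back.
async function demo() {
  const base = "http://127.0.0.1:4437";

  await fetch(`${base}/demo`, { method: "PUT" });       // create bucket
  await fetch(`${base}/demo/hello`, { method: "PUT" }); // create stream

  // Append; the response's Stream-Next-Offset is the position after this write.
  const appendRes = await fetch(`${base}/demo/hello`, {
    method: "POST",
    headers: { "Content-Type": "application/octet-stream" },
    body: "hello world",
  });
  console.log("next offset:", appendRes.headers.get("Stream-Next-Offset"));

  // Catch-up read from the beginning of retained data.
  const readRes = await fetch(`${base}/demo/hello?offset=-1`);
  console.log(await readRes.text());
}
```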
## Common request patterns

### Create a bucket and stream

```bash
curl -X PUT http://127.0.0.1:4437/demo
curl -X PUT http://127.0.0.1:4437/demo/hello
```

### Append data

```bash
curl -X POST http://127.0.0.1:4437/demo/hello \
  -H 'Content-Type: application/octet-stream' \
  --data-binary 'hello world'
```

### Read from the beginning

```bash
curl 'http://127.0.0.1:4437/demo/hello?offset=-1'
```

### Subscribe with SSE

```bash
curl 'http://127.0.0.1:4437/demo/hello?offset=-1&live=sse'
```

## Related concepts

- [Read modes](/docs/concepts/read-modes): catch-up vs long-poll vs SSE
- [Bootstrap](/docs/concepts/bootstrap): snapshot + tail recovery
- [Snapshots](/docs/concepts/snapshots): snapshot lifecycle
- [Exactly-once writes](/docs/concepts/exactly-once-writes): producer deduplication
- [Conditional writes](/docs/concepts/conditional-writes): ETag and sequence-based writes

---

# Create bucket

> Create a new bucket to organize streams.

## Path parameters

| Parameter | Description |
| --- | --- |
| `bucket` | Bucket ID. Must match `[a-z0-9_-]{4,64}`. |

## Response

| Status | Meaning |
| ------ | ---------------------------- |
| `201`  | Bucket created successfully. |
| `400`  | Invalid bucket ID.           |
| `409`  | Bucket already exists.       |

```bash Example
curl -X PUT http://127.0.0.1:4437/demo
```

> [!NOTE]
> Write requests are automatically forwarded to the cluster leader. If the current node is not the leader, you will receive a `307` redirect with a `Location` header pointing to the leader.

---

# Get bucket

> Retrieve metadata for a bucket.

## Path parameters

| Parameter | Description |
| --- | --- |
| `bucket` | Bucket ID. Must match `[a-z0-9_-]{4,64}`. |

## Response

| Status | Meaning |
| ------ | ------------------ |
| `200`  | Bucket found.      |
| `400`  | Invalid bucket ID. |
| `404`  | Bucket not found.  |

Response body fields:

| Field | Description |
| --- | --- |
| `bucket_id` | The bucket identifier. |
| `streams` | Number of streams in the bucket. |

```bash Example
curl http://127.0.0.1:4437/demo
```

```json 200
{
  "bucket_id": "demo",
  "streams": 3
}
```

---

# Delete bucket

> Delete an empty bucket.

## Path parameters

| Parameter | Description |
| --- | --- |
| `bucket` | Bucket ID. Must match `[a-z0-9_-]{4,64}`. |

## Response

| Status | Meaning |
| ------ | -------------------- |
| `204`  | Bucket deleted.      |
| `400`  | Invalid bucket ID.   |
| `404`  | Bucket not found.    |
| `409`  | Bucket is not empty. |

```bash Example
curl -X DELETE http://127.0.0.1:4437/demo
```

---

# List streams

> List streams within a bucket with optional prefix filtering and pagination.

## Path parameters

| Parameter | Description |
| --- | --- |
| `bucket` | Bucket ID. Must match `[a-z0-9_-]{4,64}`. |

## Query parameters

| Parameter | Description |
| --- | --- |
| `prefix` | Only return streams whose ID starts with this prefix. |
| `after` | Cursor for pagination. Return streams after this stream ID. |
| `limit` | Maximum number of streams to return. Must be between 1 and 1000. |

## Response

| Status | Meaning |
| ------ | ---------------------------------------- |
| `200`  | Streams listed.                          |
| `400`  | Invalid bucket ID or limit out of range. |
| `404`  | Bucket not found.                        |

Response body fields:

| Field | Description |
| --- | --- |
| `bucket_id` | The bucket identifier. |
| `prefix` | The prefix filter applied. |
| `stream_count` | Number of streams in this page. |
| `streams[].stream_id` | Stream ID (local to the bucket). |
| `streams[].status` | Stream status: `open` or `closed`. |
| `streams[].content_type` | The stream's content type. |
| `streams[].tail_offset` | Current tail offset (next writable position). |
| `streams[].created_at_ms` | Creation timestamp in milliseconds since epoch. |
| `streams[].last_write_at_ms` | Last write timestamp in milliseconds since epoch. |
| `next_cursor` | Cursor for the next page. Pass as `after` to fetch the next page. |
| `has_more` | Whether more streams exist beyond this page. |
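To walk a large bucket, a client follows `next_cursor` until `has_more` is false. The curl examples below show single requests; a paging loop might look like this hedged TypeScript sketch (`listAllStreams` and the base URL are illustrative, not part of any official client):

```ts
// Pages through GET /{bucket}/streams by passing next_cursor back as ?after=...
async function listAllStreams(base: string, bucket: string): Promise<string[]> {
  const ids: string[] = [];
  let after: string | undefined;
  while (true) {
    const url = new URL(`${base}/${bucket}/streams`);
    url.searchParams.set("limit", "1000");
    if (after) url.searchParams.set("after", after);
    const page = await (await fetch(url)).json();
    for (const s of page.streams) ids.push(s.stream_id);
    if (!page.has_more) return ids;
    after = page.next_cursor; // cursor for the next page
  }
}
```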
```bash All streams
curl 'http://127.0.0.1:4437/demo/streams'
```

```bash With prefix and pagination
curl 'http://127.0.0.1:4437/demo/streams?prefix=session-&limit=100'
```

```json 200
{
  "bucket_id": "demo",
  "prefix": "",
  "stream_count": 2,
  "streams": [
    {
      "stream_id": "hello",
      "status": "open",
      "content_type": "application/json",
      "tail_offset": 42,
      "created_at_ms": 1711234567890,
      "last_write_at_ms": 1711234599000
    }
  ],
  "next_cursor": "hello",
  "has_more": false
}
```

---

# Create stream

> Create a new append-only stream within a bucket.

## Path parameters

| Parameter | Description |
| --- | --- |
| `bucket` | Bucket ID. Must match `[a-z0-9_-]{4,64}`. |
| `stream` | Stream ID within the bucket. Max 122 bytes; cannot contain `/`, `\0`, or `..`. The literal name `streams` is reserved (it routes to bucket-level listing). Combined `bucket/stream` path is also capped at 122 bytes. |

## Request headers

| Header | Description |
| --- | --- |
| `Content-Type` | Content type of the initial payload (e.g. `application/json`). Becomes the stream's content type. |
| `Stream-Closed` | Set to `true` to close the stream immediately after creation. |
| `Stream-TTL` | Time-to-live in seconds. The stream will expire after this duration. |
| `Stream-Expires-At` | Absolute expiration timestamp (RFC 3339). Mutually exclusive with `Stream-TTL`. |
| `If-Match` | Conditional create: only succeed if the stream's ETag matches. |
| `Stream-Seq` | Client-supplied sequence token for conflict detection. |
| `Producer-Id` | Producer identity for [exactly-once writes](/docs/concepts/exactly-once-writes). |
| `Producer-Epoch` | Producer epoch (must accompany `Producer-Id`). |
| `Producer-Seq` | Producer sequence number (must accompany `Producer-Id`). |

## Request body

Optional initial payload. If provided, becomes the first entry in the stream.

## Response

| Status | Meaning |
| ------ | ------------------------------------------------------------------------- |
| `201` | Stream created. |
| `200` | Stream already exists with identical configuration (idempotent). |
| `400` | Invalid stream ID, invalid headers, or bad JSON payload. |
| `404` | Bucket not found. |
| `409` | Stream already exists with different configuration, or sequence conflict. |
| `412` | `If-Match` precondition failed. |

Response headers include `Location`, `Content-Type`, `Stream-Next-Offset`, and `ETag`.

```bash Create empty stream
curl -X PUT http://127.0.0.1:4437/demo/hello
```

```bash Create with initial payload
curl -X PUT http://127.0.0.1:4437/demo/hello \
  -H 'Content-Type: application/json' \
  --data-binary '{"msg": "first entry"}'
```

```bash Create with TTL
curl -X PUT http://127.0.0.1:4437/demo/ephemeral \
  -H 'Stream-TTL: 3600'
```

> [!NOTE]
> If a stream with the same ID already exists and has identical configuration, the response is `200 OK` (idempotent). If the existing stream differs in content type, closed state, or retention, the response is `409 Conflict`.

---

# Append

> Append data to an existing stream, or close it.

## Path parameters

| Parameter | Description |
| --- | --- |
| `bucket` | Bucket ID. |
| `stream` | Stream ID within the bucket. |

## Request headers

| Header | Description |
| --- | --- |
| `Content-Type` | Must match the stream's content type (set at PUT or first POST). Required when the body is non-empty; mismatch returns `400`. |
| `Stream-Closed` | Set to `true` to close the stream after this append. |
| `If-Match` | Conditional write: only succeed if the stream's ETag matches. |
| `Stream-Seq` | Client-supplied sequence token for conflict detection. |
| `Producer-Id` | Stable producer identity (UUID, hostname, etc.) for [exactly-once writes](/docs/concepts/exactly-once-writes). Dedup state is per-stream. |
| `Producer-Epoch` | Producer epoch. Bumped on producer restart; must be ≥ the last epoch the server accepted for this `Producer-Id`. A new epoch resets the seq counter. Max value `2^53 − 1`. |
| `Producer-Seq` | Producer sequence number. Starts at `0` for a new epoch and must increase by exactly `1` per append. Exact `(epoch, seq)` duplicates are silently deduplicated. Max value `2^53 − 1`. |

## Request body

The bytes to append.
Must not be empty unless `Stream-Closed: true` is set (close-only request). ## Response | Status | Meaning | | ------ | ------------------------------------------------------------------ | | `200` | Append successful. | | `400` | Empty body without `Stream-Closed: true`, missing content type, or bad JSON. | | `404` | Stream not found. | | `409` | Stream is already closed, or sequence/producer conflict. | | `412` | `If-Match` precondition failed. | | `503` | Write backpressure. Retry after the duration in `Retry-After`. | Response headers include `Stream-Next-Offset` and `ETag`. ```bash Append binary curl -X POST http://127.0.0.1:4437/demo/hello \ -H 'Content-Type: application/octet-stream' \ --data-binary 'hello world' ``` ```bash Append JSON curl -X POST http://127.0.0.1:4437/demo/hello \ -H 'Content-Type: application/json' \ --data-binary '{"event": "click", "ts": 1711234567}' ``` ```bash Close a stream curl -X POST http://127.0.0.1:4437/demo/hello \ -H 'Stream-Closed: true' ``` > [!NOTE] > Appends to JSON streams are validated and normalized. The server may coalesce multiple concurrent appends into a single batch for performance. --- # Read stream > Read data from a stream using catch-up, long-poll, or SSE modes. Bucket ID. Stream ID within the bucket. Starting offset. Use `-1` to read from the beginning, or a numeric offset. Opaque cursor token returned by a previous read. Alternative to `offset`. Alias for `cursor`. Live mode: `sse` for Server-Sent Events, `long-poll` for long-polling. Omit for catch-up read. Maximum bytes to return in a single response (catch-up and long-poll only). ## Read modes No `live` parameter. Returns all available data from the given offset immediately. ```bash curl 'http://127.0.0.1:4437/demo/hello?offset=-1' ``` `live=long-poll`. Returns immediately if data is available, otherwise holds the connection until new data arrives or a ~3 second timeout. ```bash curl 'http://127.0.0.1:4437/demo/hello?offset=42&live=long-poll' ``` `live=sse`. Opens a persistent Server-Sent Events connection. The server pushes data events as new entries are appended. Includes periodic heartbeat comments. ```bash curl 'http://127.0.0.1:4437/demo/hello?offset=-1&live=sse' ``` ## Response | Status | Meaning | | ------ | ------------------------------------------------------------- | | `200` | Data returned (catch-up or long-poll with data). | | `204` | No new data at the requested offset (catch-up only). | | `400` | Invalid offset or live mode. | | `404` | Stream not found or expired. | | `410` | Requested offset has been trimmed (data no longer available). | Response headers include `Stream-Next-Offset`, `Stream-Cursor`, `ETag`, `Stream-Up-To-Date`, `Stream-Closed`, and `Content-Type`. ## SSE event format In SSE mode, the server sends: - **Data events** (`event: data`): stream payload in the `data` field. For binary streams, data is base64-encoded (controlled by the `Stream-Sse-Data-Encoding` header). - **Control events** (`event: control`): JSON metadata including the current offset and stream state. - **Heartbeat comments**: periodic `:` lines to keep the connection alive through proxies. ```bash Catch-up curl 'http://127.0.0.1:4437/demo/hello?offset=-1' ``` ```bash Long-poll curl 'http://127.0.0.1:4437/demo/hello?offset=42&live=long-poll' ``` ```bash SSE tail curl 'http://127.0.0.1:4437/demo/hello?offset=-1&live=sse' ``` > [!NOTE] > See [read modes](/docs/concepts/read-modes), [binary SSE](/docs/concepts/binary-sse), and [offsets](/docs/concepts/offsets) for more details. 
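For browser or Node clients, SSE mode maps directly onto `EventSource`. A minimal tailing sketch in TypeScript, assuming a binary stream so that each data event carries the base64 JSON envelope described in [binary SSE](/docs/concepts/binary-sse); the variable names are illustrative only:

```ts
// Tail a stream live over SSE, starting from the earliest retained offset.
const source = new EventSource("http://127.0.0.1:4437/demo/hello?offset=-1&live=sse");

// Data events carry the payload. For binary streams the server wraps each event in a
// JSON envelope and base64-encodes the bytes (Stream-Sse-Data-Encoding: base64).
source.addEventListener("data", (event) => {
  const envelope = JSON.parse((event as MessageEvent).data);
  const bytes = Uint8Array.from(atob(envelope.payload), (c) => c.charCodeAt(0));
  console.log("received", bytes.length, "bytes of", envelope.contentType);
});

// Control events carry JSON metadata such as the current offset and stream state.
source.addEventListener("control", (event) => {
  console.log("control", (event as MessageEvent).data);
});
```

For text-shaped streams the `data` field is the payload verbatim, so only the `data` handler changes.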
--- # Head stream > Get stream metadata without reading its content. Bucket ID. Stream ID within the bucket. ETag for conditional request. Returns `304` if the stream state has not changed. ## Response | Status | Meaning | | ------ | ------------------------------------------------------ | | `200` | Stream found. | | `304` | Stream state unchanged (when `If-None-Match` matches). | | `404` | Stream not found or expired. | Response headers include: | Header | Description | | ------------------------ | --------------------------------------------------------- | | `Content-Type` | The stream's content type. | | `Stream-Next-Offset` | The next writable offset (= current length). | | `ETag` | Stream state ETag (encodes offset and open/closed state). | | `Stream-Closed` | Present and `true` if the stream is closed. | | `Stream-Snapshot-Offset` | Present if a snapshot exists, showing its offset. | | `Stream-TTL` | Remaining TTL in seconds (if a TTL was set). | | `Stream-Expires-At` | Expiration timestamp (if set). | | `Cache-Control` | `no-store`. | ```bash Example curl -I http://127.0.0.1:4437/demo/hello ``` > [!TIP] > Use `If-None-Match` with a previously received `ETag` to efficiently poll for state changes without transferring data. --- # Delete stream > Delete a stream and its data. Bucket ID. Stream ID within the bucket. ## Response | Status | Meaning | | ------ | ----------------- | | `204` | Stream deleted. | | `404` | Stream not found. | ```bash Example curl -X DELETE http://127.0.0.1:4437/demo/hello ``` > [!WARNING] > Deletion is permanent. The stream's data will be asynchronously garbage-collected after the delete is committed. --- # Bootstrap > Recover full stream state from snapshot plus retained updates in a single request. Returns the stream's latest snapshot (if any) plus all retained updates after the snapshot offset, packed as a `multipart/mixed` response. This is the recommended way to initialize a client that needs the complete current state of a stream. Bucket ID. Stream ID within the bucket. Bootstrap does not accept `?live=sse`. Combining the multipart body with an SSE event stream is rejected with `400`. To go live after bootstrap, finish the multipart response, then open a separate `GET /{bucket}/{stream}?offset=&live=sse`. ## Response | Status | Meaning | | ------ | --------------------------------------------- | | `200` | Bootstrap response with snapshot and updates. | | `400` | Invalid query parameters (including `live=sse`). | | `404` | Stream not found or expired. | | `410` | Requested offset has been trimmed. | Response headers include: | Header | Description | | ------------------------ | ---------------------------------------------------------------- | | `Content-Type` | `multipart/mixed; boundary=...` | | `Stream-Next-Offset` | The offset after the last included update. | | `Stream-Snapshot-Offset` | The snapshot offset (or `none` if no snapshot exists). | | `Stream-Up-To-Date` | `true` if the response includes all data up to the tail. | | `Stream-Closed` | Present and `true` if the stream is closed and fully caught up. | ## Response body The body is a `multipart/mixed` message: **Snapshot part** The first part is the snapshot blob (or an empty part if no snapshot exists). **Update parts** Subsequent parts are individual update messages appended after the snapshot offset. For JSON streams, each update is a separate `application/json` part. 
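Putting this together with live reads, a client can bootstrap and then tail from the returned `Stream-Next-Offset`. A minimal sketch (the temp-file header parsing is illustrative):

```bash
# 1. Fetch the multipart bootstrap response and capture its headers.
hdrs=$(mktemp)
curl -s -D "$hdrs" -o state.multipart 'http://127.0.0.1:4437/demo/hello/bootstrap'

# 2. Read the offset after the last included update.
next=$(grep -i '^stream-next-offset:' "$hdrs" | tr -d '\r' | awk '{print $2}')
rm -f "$hdrs"

# 3. state.multipart now holds the snapshot part plus retained updates;
#    new entries arrive live on a separate SSE connection.
curl -N "http://127.0.0.1:4437/demo/hello?offset=${next}&live=sse"
```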
```bash Example curl 'http://127.0.0.1:4437/demo/hello/bootstrap' ``` > [!TIP] > After bootstrapping, switch to [SSE reads](/docs/api/read) with `live=sse` starting from the `Stream-Next-Offset` to receive real-time updates. > [!NOTE] > See [bootstrap](/docs/concepts/bootstrap) and [snapshots](/docs/concepts/snapshots) for the conceptual model. --- # Read snapshot > Read the latest snapshot or a snapshot at a specific offset. ## Latest snapshot `GET /{bucket}/{stream}/snapshot` redirects to the latest published snapshot's offset-specific URL. Bucket ID. Stream ID. | Status | Meaning | | ------ | ------------------------------------------------------------- | | `307` | Redirect to `/{bucket}/{stream}/snapshot/{offset}`. | | `404` | Stream not found, expired, or no snapshot has been published. | Response headers include `Location`, `Stream-Next-Offset`, `Stream-Snapshot-Offset`, and `Stream-Up-To-Date`. --- ## Snapshot at offset `GET /{bucket}/{stream}/snapshot/{offset}` returns the snapshot blob at a specific offset. Snapshot offset: the 20-character zero-padded decimal token returned by previous reads or `Stream-Snapshot-Offset` headers. | Status | Meaning | | ------ | -------------------------------------- | | `200` | Snapshot blob returned. | | `404` | Stream, snapshot, or offset not found. | Response headers include `Content-Type`, `Stream-Next-Offset`, `Stream-Snapshot-Offset`, `Stream-Up-To-Date`, and `Stream-Closed`. ```bash Follow redirect to latest curl -L 'http://127.0.0.1:4437/demo/hello/snapshot' ``` ```bash Read specific offset curl 'http://127.0.0.1:4437/demo/hello/snapshot/00000000000000000042' ``` > [!NOTE] > Snapshot reads go through a linearizable freshness check to ensure you see the latest published snapshot. If the snapshot blob hasn't replicated to the current node yet, the request may be redirected to the leader. > [!NOTE] > See [snapshots](/docs/concepts/snapshots) for the snapshot lifecycle. --- # Publish snapshot > Upload a snapshot blob at a specific stream offset. Publishes a new snapshot blob at the given offset. The snapshot replaces any previously published snapshot and enables cold storage cleanup of data before this offset. Bucket ID. Stream ID. Stream offset this snapshot represents: the 20-character zero-padded decimal token. Content type of the snapshot blob, stored separately from the stream's own content type. Defaults to `application/octet-stream`. The snapshot blob bytes. Maximum size is **128 MiB**; larger bodies return `413`. ## Response | Status | Meaning | | ------ | --------------------------------------------------------------- | | `204` | Snapshot published successfully. | | `400` | Invalid offset or content type. | | `404` | Stream not found or expired. | | `409` | Stale publish (a newer snapshot exists) or offset out of range. | | `410` | Snapshot offset is too old (data already garbage-collected). | | `413` | Snapshot body exceeds the maximum allowed size. | Response headers include `Stream-Next-Offset` and `Stream-Snapshot-Offset`. ```bash Example curl -X PUT 'http://127.0.0.1:4437/demo/hello/snapshot/00000000000000000042' \ -H 'Content-Type: application/json' \ --data-binary '{"state": "aggregated snapshot data"}' ``` --- ## Delete snapshot `DELETE /{bucket}/{stream}/snapshot/{offset}` Attempting to delete the current visible snapshot is not allowed. | Status | Meaning | | ------ | ----------------------------------- | | `404` | No snapshot at the given offset. | | `409` | Cannot delete the current snapshot. 
| > [!NOTE] > Snapshots are immutable once published. Each publish creates a new blob with a unique key, so retries cannot overwrite a committed snapshot. After a snapshot is committed, cold storage chunks before the snapshot offset are automatically cleaned up. --- # v1 compatibility > Flat Durable Streams protocol routes under /v1/stream/. Ursula provides a compatibility layer under `/v1/stream/` that implements the [Durable Streams Protocol](/docs/specs/durable-stream), defined by the durable-streams community, with a flat path model. This is useful when you want a simpler path structure without explicit bucket management. ## Path mapping The v1 path is mapped to the bucket/stream model using the first `/` as the separator: | v1 path | Bucket | Stream | | ------------------------------- | ---------- | ---------- | | `/v1/stream/mystream` | `_default` | `mystream` | | `/v1/stream/workspace/mystream` | `workspace`| `mystream` | | `/v1/stream/a/b/c` | `a` | `b/c` | Paths without a `/` are placed in the `_default` bucket. ## Routes These routes accept the same headers, query parameters, and body formats as their `/{bucket}/{stream}` equivalents: | Method | Path | Equivalent | | -------- | ------------------- | ----------------------------------- | | `PUT` | `/v1/stream/{path}` | [Create stream](/docs/api/create-stream) | | `POST` | `/v1/stream/{path}` | [Append](/docs/api/append) | | `GET` | `/v1/stream/{path}` | [Read](/docs/api/read) | | `HEAD` | `/v1/stream/{path}` | [Head stream](/docs/api/head-stream) | | `DELETE` | `/v1/stream/{path}` | [Delete stream](/docs/api/delete-stream) | ## Differences from bucketed routes - Buckets do not need to be created beforehand. Streams are created without bucket-existence validation. - No bucket-level operations (create/get/delete bucket, list streams). Use the `/` routes for those. - No snapshot or bootstrap routes. Use the `/` equivalents. - Stream close uses the `Stream-Closed: true` header on POST, not a path suffix. ```bash Create curl -X PUT http://127.0.0.1:4437/v1/stream/mystream ``` ```bash Append curl -X POST http://127.0.0.1:4437/v1/stream/mystream \ -H 'Content-Type: application/octet-stream' \ --data-binary 'hello' ``` ```bash Read curl 'http://127.0.0.1:4437/v1/stream/mystream?offset=-1' ``` ```bash SSE tail curl 'http://127.0.0.1:4437/v1/stream/mystream?offset=-1&live=sse' ``` --- # Durable Streams Protocol > The base HTTP protocol specification for creating, appending to, and reading from durable, append-only byte streams. > [!NOTE] > This page is a verbatim mirror of the upstream Durable Streams Protocol specification, authored by ElectricSQL. The authoritative source is [github.com/durable-streams/durable-streams](https://github.com/durable-streams/durable-streams/blob/main/PROTOCOL.md); please open issues and pull requests against that repository, not Ursula's. Ursula's own [extensions](/docs/specs/extensions) are documented separately. **Document:** Durable Streams Protocol **Version:** 1.0 **Author:** ElectricSQL --- ## Abstract This document specifies the Durable Streams Protocol, an HTTP-based protocol for creating, appending to, and reading from durable, append-only byte streams. The protocol provides a simple, web-native primitive for applications requiring ordered, replayable data streams with support for catch-up reads, live tailing, and explicit stream closure (EOF). 
It is designed to be a foundation for higher-level abstractions such as event sourcing, database synchronization, collaborative editing, AI conversation histories, and finite response streaming. ## Copyright Notice Copyright (c) 2025 ElectricSQL ## Table of Contents 1. [Introduction](#1-introduction) 2. [Terminology](#2-terminology) 3. [Protocol Overview](#3-protocol-overview) 4. [Stream Model](#4-stream-model) - 4.1. [Stream Closure](#41-stream-closure) 5. [HTTP Operations](#5-http-operations) - 5.1. [Create Stream](#51-create-stream) - 5.2. [Append to Stream](#52-append-to-stream) - 5.2.1. [Idempotent Producers](#521-idempotent-producers) - 5.3. [Close Stream](#53-close-stream) - 5.4. [Delete Stream](#54-delete-stream) - 5.5. [Stream Metadata](#55-stream-metadata) - 5.6. [Read Stream - Catch-up](#56-read-stream---catch-up) - 5.7. [Read Stream - Live (Long-poll)](#57-read-stream---live-long-poll) - 5.8. [Read Stream - Live (SSE)](#58-read-stream---live-sse) 6. [Offsets](#6-offsets) 7. [Content Types](#7-content-types) 8. [Caching and Collapsing](#8-caching-and-collapsing) 9. [Extensibility](#9-extensibility) 10. [Security Considerations](#10-security-considerations) 11. [IANA Considerations](#11-iana-considerations) 12. [References](#12-references) --- ## 1. Introduction Modern web and cloud applications frequently require ordered, durable sequences of data that can be replayed from arbitrary points and tailed in real time. Common use cases include: - Database synchronization and change feeds - Event-sourced architectures - Collaborative editing and CRDTs - AI conversation histories and token streaming - Workflow execution histories - Real-time application state updates - Finite response streaming (proxied HTTP responses, job outputs, file transfers) While these patterns are widespread, the web platform lacks a simple, first-class primitive for durable streams. Applications typically implement ad-hoc solutions using combinations of databases, queues, and polling mechanisms, each reinventing similar offset-based replay semantics. The Durable Streams Protocol provides a minimal HTTP-based interface for durable, append-only byte streams. It is intentionally low-level and byte-oriented, allowing higher-level abstractions to be built on top without protocol changes. ## 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. **Stream**: A URL-addressable, append-only byte stream that can be read and written to. A stream is simply a URL; the protocol defines how to interact with that URL using HTTP methods, query parameters, and headers. Streams are durable and immutable by position; new data can only be appended. **Offset**: An opaque, lexicographically sortable token that identifies a position within a stream. Clients use offsets to resume reading from a specific previously reached point. **Content Type**: A MIME type set on stream creation that describes the format of the stream's bytes. The content type is returned on reads and may be used by clients to interpret message boundaries. **Tail Offset**: The offset immediately after the last byte in the stream. This is the position where new appends will be written. **Closed Stream**: A stream that has been explicitly closed by a writer. 
Once closed, a stream is in a terminal state: no further appends are permitted, and readers can observe the closure as an end-of-stream (EOF) signal. Closure is durable and monotonic — once closed, a stream remains closed. ## 3. Protocol Overview The Durable Streams Protocol is an HTTP-based protocol that operates on URLs. A stream is simply a URL; the protocol defines how to interact with that URL using standard HTTP methods, query parameters, and custom headers. The protocol defines operations to create, append to, read, close, delete, and query metadata for streams. Reads have three modes: catch-up, long-poll, and Server-Sent Events (SSE). The primary operations are: 1. **Create**: Establish a new stream at a URL with optional initial content (PUT) 2. **Append**: Add bytes to the end of an existing stream (POST) 3. **Close**: Transition a stream to closed state, optionally with a final append (POST with `Stream-Closed: true`) 4. **Read**: Retrieve bytes starting from a given offset, with support for catch-up and live modes (GET) 5. **Delete**: Remove a stream (DELETE) 6. **Head**: Query stream metadata without transferring data (HEAD) The protocol does not prescribe a specific URL structure. Servers may organize streams using any URL scheme they choose (e.g., `/v1/stream/{path}`, `/{id}`, or domain-specific paths). The protocol is defined by the HTTP methods, query parameters, and headers applied to any stream URL. Streams support arbitrary content types. The protocol operates at the byte level, leaving message framing and schema interpretation to clients. **Independent Read/Write Implementation**: Servers **MAY** implement the read and write paths independently. For example, a database synchronization server may only implement the read path and use its own injection system for writes, while a collaborative editing service might implement both paths. ## 4. Stream Model A stream is an append-only sequence of bytes with the following properties: - **Durability**: Once written and acknowledged, bytes persist until the stream is deleted or expired - **Immutability by Position**: Bytes at a given offset never change; new data is only appended - **Ordering**: Bytes are strictly ordered by offset - **Content Type**: Each stream has a MIME content type set at creation - **TTL/Expiry**: Streams may have optional time-to-live or absolute expiry times - **Retention**: Servers **MAY** implement retention policies that drop data older than a certain age while the stream continues. If a stream is deleted a new stream **SHOULD NOT** be created at the same URL. - **Stream State**: A stream is either **open** (accepts appends) or **closed** (no further appends permitted). Streams start in the open state and transition to closed via an explicit close operation. This transition is **durable** (persisted) and **monotonic** (once closed, a stream cannot be reopened). Clients track their position in a stream using offsets. Offsets are opaque to clients but are lexicographically sortable, allowing clients to determine ordering and resume from any point. ### 4.1. Stream Closure Stream closure provides an explicit end-of-stream (EOF) signal that allows readers to distinguish between "no data yet" and "no more data ever." 
This is essential for finite streams where writers need to signal completion, such as: - Proxied HTTP responses that have finished streaming - Completed job outputs or workflow executions - Finalized conversation histories or document streams **Properties of stream closure:** - **Durable**: The closed state is persisted and survives server restarts - **Monotonic**: Once closed, a stream cannot be reopened - **Idempotent**: Closing an already-closed stream succeeds (or returns a stable "already closed" response) - **Observable**: Readers can detect closure and treat it as EOF - **Atomic with final append**: Writers can atomically append a final message and close in a single operation After closure, the stream's data remains fully readable. Only new appends are rejected. **Stream-Closed Header Value:** The `Stream-Closed` header uses the value `true` (case-insensitive) to indicate closure. Servers **MUST** treat the header as present only when its value is exactly `true` (case-insensitive comparison). Other values such as `false`, `yes`, `1`, or empty string **MUST** be treated as if the header were absent. Servers **SHOULD NOT** return error responses for non-`true` values; they simply ignore the header. ## 5. HTTP Operations The protocol defines operations that are applied to a stream URL. The examples in this section use `{stream-url}` to represent any stream URL. Servers may implement any URL structure they choose; the protocol is defined by the HTTP methods, query parameters, and headers. ### 5.1. Create Stream #### Request ``` PUT {stream-url} ``` Where `{stream-url}` is any URL that identifies the stream to be created. Creates a new stream. If the stream already exists at `{stream-url}`, the server **MUST** either: - return `200 OK` if the existing stream's configuration (content type, TTL/expiry, and closure status) matches the request, or - return `409 Conflict` if it does not. This provides idempotent "create or ensure exists" semantics aligned with HTTP PUT expectations. **Closure status matching:** When checking for idempotent success (200 OK), servers **MUST** compare the `Stream-Closed` header in the request against the stream's current closure status. For example: - `PUT /stream` (no `Stream-Closed`) to an **open** stream with matching config → `200 OK` - `PUT /stream` (no `Stream-Closed`) to a **closed** stream → `409 Conflict` (closure status mismatch) - `PUT /stream + Stream-Closed: true` to a **closed** stream with matching config → `200 OK` - `PUT /stream + Stream-Closed: true` to an **open** stream → `409 Conflict` (closure status mismatch) #### Request Headers (Optional) - `Content-Type: ` - Sets the stream's content type. If omitted, the server **MAY** default to `application/octet-stream`. - `Stream-TTL: ` - Sets a relative time-to-live in seconds from creation. The value **MUST** be a non-negative integer in decimal notation without leading zeros, plus signs, decimal points, or scientific notation (e.g., `3600` is valid; `+3600`, `03600`, `3600.0`, and `3.6e3` are not). - `Stream-Expires-At: ` - Sets an absolute expiry time as an RFC 3339 timestamp. - If both `Stream-TTL` and `Stream-Expires-At` are supplied, servers **SHOULD** reject the request with `400 Bad Request`. Implementations **MAY** define a deterministic precedence rule, but **MUST** document it. - `Stream-Closed: true` (optional) - When present, the stream is created in the **closed** state. Any body provided becomes the complete and final content of the stream. 
- This enables atomic "create and close" semantics for single-message or empty streams that are immediately complete (e.g., cached responses, placeholder errors, pre-computed results). - **Examples:** - `PUT /stream + Stream-Closed: true` (empty body): Creates an empty, immediately-closed stream (useful for "completed with no output" or error placeholders). - `PUT /stream + Stream-Closed: true + body`: Creates a single-shot stream with the body as its complete content (useful for cached responses, pre-computed results). #### Request Body (Optional) - Initial stream bytes. If provided, these bytes form the first content of the stream. #### Response Codes - `201 Created`: Stream created successfully - `200 OK`: Stream already exists with matching configuration (idempotent success) - `409 Conflict`: Stream already exists with different configuration - `400 Bad Request`: Invalid headers or parameters (including conflicting TTL/expiry) - `429 Too Many Requests`: Rate limit exceeded #### Response Headers (on 201 or 200) - `Location: {stream-url}` (on 201): Servers **SHOULD** include a `Location` header equal to `{stream-url}` in `201 Created` responses. - `Content-Type: `: The stream's content type - `Stream-Next-Offset: `: The tail offset after any initial content - `Stream-Closed: true`: Present when the stream was created in the closed state ### 5.2. Append to Stream #### Request ``` POST {stream-url} ``` Where `{stream-url}` is the URL of an existing stream. Appends bytes to the end of an existing stream. Supports both full-body and streaming (chunked) append operations. Optionally closes the stream atomically with the append. Servers that do not support appends for a given stream **SHOULD** return `405 Method Not Allowed` or `501 Not Implemented` to `POST` requests on that URL. #### Request Headers - `Content-Type: ` - **MUST** match the stream's existing content type when a body is provided. Servers **MUST** return `409 Conflict` when the content type is valid but does not match the stream's configured type. - **MAY** be omitted when the request body is empty (i.e., close-only requests with `Stream-Closed: true`). When the request body is empty, servers **MUST NOT** reject based on `Content-Type` and **MAY** ignore it entirely. This ensures close-only requests remain robust even when clients/libraries attach default `Content-Type` headers. - `Transfer-Encoding: chunked` (optional) - Indicates a streaming body. Servers **SHOULD** support HTTP/1.1 chunked encoding and HTTP/2 streaming semantics. - `Stream-Seq: ` (optional) - A monotonic, lexicographic writer sequence number for coordination. - `Stream-Seq` values are opaque strings that **MUST** compare using simple byte-wise lexicographic ordering. Sequence numbers are scoped per authenticated writer identity (or per stream, depending on implementation). Servers **MUST** document the scope they enforce. - If provided and less than or equal to the last appended sequence (as determined by lexicographic comparison), the server **MUST** return `409 Conflict`. Sequence numbers **MUST** be strictly increasing. - `Stream-Closed: true` (optional) - When present with value `true`, the stream is **closed** after the append completes. This is an atomic operation: the body (if any) is appended as the final data, and the stream transitions to the closed state in the same step. - If the request body is empty (Content-Length: 0 or no body), the stream is closed without appending any data. This is the only case where an empty POST body is valid. 
- Once closed, the stream rejects all subsequent appends with `409 Conflict` (see below). - **Close-only requests are idempotent**: if the stream is already closed and the request includes `Stream-Closed: true` with an empty body, servers **SHOULD** return `204 No Content` with `Stream-Closed: true`. - **Append-and-close requests are NOT idempotent** (without idempotent producer headers): if the stream is already closed and the request includes a body but no idempotent producer headers, servers **MUST** return `409 Conflict` with `Stream-Closed: true`, since the body cannot be appended. However, if idempotent producer semantics apply and the request matches the `(producerId, epoch, seq)` tuple that performed the closing append, servers treat it as a deduplicated success (see Section 5.2.1). #### Request Body - Bytes to append to the stream. Servers **MUST** reject POST requests with an empty body (Content-Length: 0 or no body) with `400 Bad Request`, **unless** the `Stream-Closed: true` header is present (which allows closing without appending data). #### Response Codes - `204 No Content`: Append successful (or stream already closed when closing idempotently) - `400 Bad Request`: Malformed request (invalid header syntax, missing Content-Type, empty body without `Stream-Closed: true`) - `404 Not Found`: Stream does not exist - `405 Method Not Allowed` or `501 Not Implemented`: Append not supported for this stream - `409 Conflict`: Content type mismatch with stream's configured type, sequence regression (if `Stream-Seq` provided), or **stream is closed** (when attempting to append without `Stream-Closed: true`) - `413 Payload Too Large`: Request body exceeds server limits - `429 Too Many Requests`: Rate limit exceeded #### Response Headers (on success) - `Stream-Next-Offset: `: The new tail offset after the append - `Stream-Closed: true`: Present when the stream is now closed (either by this request or previously) #### Response Headers (on 409 Conflict due to closed stream) When a client attempts to append to a closed stream (without `Stream-Closed: true`), servers **MUST** return: - `409 Conflict` status code - `Stream-Closed: true` header - `Stream-Next-Offset: `: The final offset of the closed stream (useful for clients to know the stream's final position) This allows clients to detect and handle the "stream already closed" condition programmatically without parsing the response body. Servers **SHOULD** keep the response body empty or use a standardized error format; clients **SHOULD NOT** rely on parsing the body to determine the reason for rejection. **Error Precedence:** When an append request would trigger multiple conflict conditions (e.g., stream is closed AND content type mismatches), servers **SHOULD** check the stream's closed status first. This ensures clients receive the `Stream-Closed: true` header, enabling correct error handling. The recommended precedence is: 1. Stream closed → `409 Conflict` with `Stream-Closed: true` 2. Content type mismatch → `409 Conflict` 3. Sequence regression → `409 Conflict` ### 5.2.1. Idempotent Producers Durable Streams supports Kafka-style idempotent producers for exactly-once write semantics. This enables fire-and-forget writes with server-side deduplication, eliminating duplicates from client retries. 
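As an illustration of the headers defined below, a producer's first append in epoch 0 and a safe retry of it might look like this (the stream URL and producer ID are placeholders, and the stream is assumed to exist with a JSON content type):

```bash
STREAM_URL='http://127.0.0.1:4437/demo/events'   # placeholder stream URL

# First append in epoch 0.
curl -X POST "$STREAM_URL" \
  -H 'Content-Type: application/json' \
  -H 'Producer-Id: order-service-1' \
  -H 'Producer-Epoch: 0' \
  -H 'Producer-Seq: 0' \
  --data-binary '{"event": "created"}'

# Retrying the exact same (epoch, seq) after a timeout is safe:
# the server deduplicates it instead of appending the data twice.
curl -X POST "$STREAM_URL" \
  -H 'Content-Type: application/json' \
  -H 'Producer-Id: order-service-1' \
  -H 'Producer-Epoch: 0' \
  -H 'Producer-Seq: 0' \
  --data-binary '{"event": "created"}'
```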
#### Design - **Client-provided producer IDs**: Zero RTT overhead, no handshake required - **Client-declared epochs, server-validated fencing**: Client increments epoch on restart; server validates monotonicity and fences stale epochs - **Per-batch sequence numbers**: Separate from `Stream-Seq`, used for retry safety - **Two-layer sequence design**: - Transport layer: `Producer-Id` + `Producer-Epoch` + `Producer-Seq` (retry safety) - Application layer: `Stream-Seq` (cross-restart ordering, lexicographic) #### Request Headers All three producer headers **MUST** be provided together or none at all. If only some headers are provided, servers **MUST** return `400 Bad Request`. - `Producer-Id: ` - Client-supplied stable identifier (e.g., "order-service-1", UUID) - **MUST** be a non-empty string; empty values result in `400 Bad Request` - Identifies the logical producer across restarts - `Producer-Epoch: ` - Client-declared epoch, starting at 0 - Increment on producer restart to establish a new session - Server validates that epoch is monotonically non-decreasing - **MUST** be a non-negative integer ≤ 2^53-1 (for JavaScript interoperability) - `Producer-Seq: ` - Monotonically increasing sequence number per epoch - Starts at 0 for each new epoch - Applies per-batch (per HTTP request), not per-message - **MUST** be a non-negative integer ≤ 2^53-1 (for JavaScript interoperability) #### Response Headers - `Producer-Epoch: `: Echoed back on success (200/204), or current server epoch on stale epoch (403) - `Producer-Seq: `: On success (200/204), the highest accepted sequence number for this `(stream, producerId, epoch)` tuple. Enables clients to confirm pipelined requests and recover state after crashes. - `Producer-Expected-Seq: `: On 409 Conflict (sequence gap), the expected sequence - `Producer-Received-Seq: `: On 409 Conflict (sequence gap), the received sequence #### Validation Logic ``` # Epoch validation (client-declared, server-validated) if epoch < state.epoch: → 403 Forbidden → Headers: Producer-Epoch: if epoch > state.epoch: if seq != 0: → 400 Bad Request (new epoch must start at seq=0) → Accept: update state.epoch = epoch, state.lastSeq = 0 → 200 OK (new epoch established) # Same epoch: sequence validation if seq <= state.lastSeq: → 204 No Content (duplicate, idempotent success) if seq == state.lastSeq + 1: → Accept, update state.lastSeq = seq → 200 OK if seq > state.lastSeq + 1: → 409 Conflict → Headers: Producer-Expected-Seq: , Producer-Received-Seq: ``` #### Response Codes (with Producer Headers) - `200 OK`: Append successful (new data) - `204 No Content`: Duplicate append (idempotent success, data already exists) - `400 Bad Request`: Invalid producer headers (e.g., non-integer values, epoch increase with seq != 0) - `403 Forbidden`: Stale producer epoch (zombie fencing). Response includes `Producer-Epoch` header with current server epoch. - `409 Conflict`: Sequence gap detected. Response includes `Producer-Expected-Seq` and `Producer-Received-Seq` headers. #### Bootstrap and Restart Flow 1. **Initial start (epoch=0)**: - Producer sends `(epoch=0, seq=0)` - Server accepts, establishes producer state 2. **Producer restart**: - Producer increments local epoch (0 → 1), resets seq to 0 - Sends `(epoch=1, seq=0)` - Server sees epoch > state.epoch, accepts, updates state 3. 
**Zombie fencing**: - Old producer (zombie) still sending `(epoch=0, seq=N)` gets 403 Forbidden - Response includes `Producer-Epoch: 1` header #### Auto-claim Flow (for ephemeral producers) For serverless or ephemeral producers without persisted epoch: 1. Producer starts fresh with `(epoch=0, seq=0)` 2. If server has `state.epoch=5`, returns 403 with `Producer-Epoch: 5` 3. Client can retry with `(epoch=6, seq=0)` to claim the producer ID This is opt-in client behavior and should be used with caution. #### Concurrency Requirements Servers **MUST** serialize validation + append operations per `(stream, producerId)` pair. HTTP requests can arrive out-of-order; without serialization, seq=1 arriving before seq=0 would cause false sequence gaps. #### Atomicity Requirements For persistent storage, servers **SHOULD** commit producer state updates and log appends atomically (e.g., in a single database transaction). Non-atomic implementations have a crash window where: 1. Data is appended to the log 2. Crash occurs before producer state is updated 3. On recovery, a retry may be re-accepted, causing duplicate data **Recovery for non-atomic stores**: Clients can bump their epoch after a crash to establish a clean session. This trades "exactly once within epoch" for "at least once across crashes" which is acceptable for many use cases. Stores **SHOULD** document their atomicity guarantees clearly. #### Producer State Cleanup Servers **MAY** implement TTL-based cleanup for producer state: - **In-memory stores**: 7 days TTL recommended, clean up on stream access - **Persistent stores**: Retain as long as stream data exists (stronger guarantee) After state expiry, the producer is treated as new. A zombie alive past TTL expiry can write again, which is acceptable for testing but persistent stores should use longer retention. #### Stream Closure with Idempotent Producers Idempotent producers can close streams using the `Stream-Closed: true` header. The behavior is: - **Close with final append**: Include body, producer headers, and `Stream-Closed: true`. The append is deduplicated normally, and the stream closes atomically with the final append. - **Close without append**: Include `Stream-Closed: true` with empty body. Producer headers are optional but if provided, the close operation is still idempotent. - **Duplicate close**: If the stream was already closed by the same `(producerId, epoch, seq)` tuple, servers **SHOULD** return `204 No Content` with `Stream-Closed: true`. When a closed stream receives an append from an idempotent producer: - If the `(producerId, epoch, seq)` matches the request that closed the stream, return `204 No Content` (duplicate/idempotent success) with `Stream-Closed: true` - Otherwise, return `409 Conflict` with `Stream-Closed: true` (stream is closed, no further appends allowed) ### 5.3. Close Stream To close a stream without appending data, send a POST request with `Stream-Closed: true` and an empty body: #### Request ``` POST {stream-url} Stream-Closed: true ``` #### Response Codes - `204 No Content`: Stream closed successfully (or already closed—idempotent) - `404 Not Found`: Stream does not exist - `405 Method Not Allowed` or `501 Not Implemented`: Append/close not supported for this stream #### Response Headers - `Stream-Next-Offset: `: The tail offset (unchanged, since no data was appended) - `Stream-Closed: true`: Confirms the stream is now closed This is the canonical "close-only" operation. 
For atomic "append final message and close", include a request body as described in Section 5.2. ### 5.4. Delete Stream #### Request ``` DELETE {stream-url} ``` Where `{stream-url}` is the URL of the stream to delete. Deletes the stream and all its data. In-flight reads may terminate with a `404 Not Found` on subsequent requests after deletion. #### Response Codes - `204 No Content`: Stream deleted successfully - `404 Not Found`: Stream does not exist - `405 Method Not Allowed` or `501 Not Implemented`: Delete not supported for this stream ### 5.5. Stream Metadata #### Request ``` HEAD {stream-url} ``` Where `{stream-url}` is the URL of the stream. Checks stream existence and returns metadata without transferring a body. This is the canonical way to find the tail offset, TTL, expiry information, and **closure status**. #### Response Codes - `200 OK`: Stream exists - `404 Not Found`: Stream does not exist - `429 Too Many Requests`: Rate limit exceeded #### Response Headers (on 200) - `Content-Type: `: The stream's content type - `Stream-Next-Offset: `: The tail offset (next offset after the current end) - `Stream-TTL: ` (optional): Remaining time-to-live, if applicable - `Stream-Expires-At: ` (optional): Absolute expiry time, if applicable - `Stream-Closed: true` (optional): Present when the stream has been closed. Absence indicates the stream is still open. - `Cache-Control`: See Section 8 #### Caching Guidance Servers **SHOULD** make `HEAD` responses effectively non-cacheable, for example by returning `Cache-Control: no-store`. Servers **MAY** use `Cache-Control: private, max-age=0, must-revalidate` as an alternative, but `no-store` is recommended to avoid stale tail offsets and closure status. ### 5.6. Read Stream - Catch-up #### Request ``` GET {stream-url}?offset= ``` Where `{stream-url}` is the URL of the stream. Returns bytes starting from the specified offset. This is used for catch-up reads when a client needs to replay stream content from a known position. #### Query Parameters - `offset` (optional) - Start offset token. If omitted, defaults to the stream start (offset -1). #### Response Codes - `200 OK`: Data available (or empty body if offset equals tail) - `400 Bad Request`: Malformed offset or invalid parameters - `404 Not Found`: Stream does not exist - `410 Gone`: Offset is before the earliest retained position (retention/compaction) - `429 Too Many Requests`: Rate limit exceeded For non-live reads without data beyond the requested offset, servers **SHOULD** return `200 OK` with an empty body and `Stream-Next-Offset` equal to the requested offset. If the stream is closed, this response **MUST** also include `Stream-Closed: true` to signal EOF. #### Response Headers (on 200) - `Cache-Control`: Derived from TTL/expiry (see Section 8) - `ETag: {internal_stream_id}:{start_offset}:{end_offset}` - Entity tag for cache validation - `Stream-Cursor: ` (optional for catch-up, required for live modes) - Cursor to echo on subsequent long-poll requests to improve CDN collapsing. Servers **MAY** include this on catch-up reads; it is **required** for live modes when the stream is open (see Sections 5.7, 5.8). Servers **MAY** omit it when `Stream-Closed` is true. Clients **MUST** tolerate its absence when `Stream-Closed` is present. 
- `Stream-Next-Offset: ` - The next offset to read from (for subsequent requests) - `Stream-Up-To-Date: true` - **MUST** be present and set to `true` when the response includes all data available in the stream at the time the response was generated (i.e., when the requested offset has reached the tail and no more data exists). - **SHOULD NOT** be present when returning partial data due to server-defined chunk size limits (when more data exists beyond what was returned). - Clients **MAY** use this header to determine when they have caught up and can transition to live tailing mode. - **Important:** `Stream-Up-To-Date: true` does **NOT** imply EOF. More data may be appended in the future. Only `Stream-Closed: true` indicates that no more data will ever arrive. - `Stream-Closed: true` - **MUST** be present when the stream is closed **and** the client has reached the final offset **at the time the response is generated**. This includes: - Responses that return the final chunk of data, when the stream is already closed at response generation time, or - Responses with an empty body when the requested offset equals the tail offset of a closed stream (the canonical EOF signal). - When present, clients can conclude that no more data will ever be appended and treat this as EOF. - **SHOULD NOT** be present when returning partial data from a closed stream (when more data exists between the response and the final offset). In this case, `Stream-Closed: true` will be returned on a subsequent request that reaches the final offset. - **Timing note:** If a stream is closed **after** the final chunk was served (or cached), that chunk will not include `Stream-Closed: true`. Clients discover closure by requesting the next offset (`Stream-Next-Offset` from the previous response), which returns an empty body with `Stream-Closed: true`. This is the expected flow when closure occurs between chunk responses or when serving cached chunks. - Clients that need to know closure status before reaching the tail **SHOULD** use `HEAD` (see Section 5.5). #### Response Body - Bytes from the stream starting at the specified offset, up to a server-defined maximum chunk size. ### 5.7. Read Stream - Live (Long-poll) #### Request ``` GET {stream-url}?offset=&live=long-poll[&cursor=] ``` Where `{stream-url}` is the URL of the stream. If no data is available at the specified offset, the server waits up to a timeout for new data to arrive. This enables efficient live tailing without constant polling. #### Query Parameters - `offset` (required) - The offset to read from. **MUST** be provided. - `live=long-poll` (required) - Indicates long-polling mode. - `cursor` (optional) - Echo of the last `Stream-Cursor` header value from a previous response. - Used for collapsing keys in CDN/proxy configurations. #### Response Codes - `200 OK`: Data became available within the timeout - `204 No Content`: Timeout expired with no new data - `400 Bad Request`: Invalid parameters - `404 Not Found`: Stream does not exist - `429 Too Many Requests`: Rate limit exceeded #### Response Headers (on 200) - Same as catch-up reads (Section 5.6), plus: - `Stream-Cursor: `: Servers **MUST** include this header. See Section 8.1. #### Response Headers (on 204) - `Stream-Next-Offset: `: Servers **MUST** include a `Stream-Next-Offset` header indicating the current tail offset. - `Stream-Up-To-Date: true`: Servers **MUST** include this header to indicate the client is caught up with all available data. 
- `Stream-Cursor: `: Servers **MUST** include this header when the stream is open. Servers **MAY** omit this header when `Stream-Closed` is true (cursor is unnecessary when no further polling is expected). Clients **MUST** tolerate its absence when `Stream-Closed` is present. See Section 8.1. - `Stream-Closed: true`: **MUST** be present when the stream is closed (see Section 5.6 for semantics). A `204 No Content` with `Stream-Closed: true` indicates EOF. **EOF Signaling Across Modes:** Clients should treat **either** of the following as EOF, depending on the mode used: - **Catch-up mode**: `200 OK` with empty body and `Stream-Closed: true` - **Long-poll mode**: `204 No Content` with `Stream-Closed: true` - **SSE mode**: `control` event with `streamClosed: true` In all cases, `Stream-Closed` / `streamClosed` is the definitive EOF signal. The presence of `Stream-Up-To-Date` / `upToDate` alone does **not** indicate EOF—it only means the client has caught up with currently available data, but more may arrive. #### Stream Closure Behavior in Long-poll Mode When the stream is closed and the client is already at the tail offset: - Servers **MUST NOT** wait for the long-poll timeout - Servers **MUST** immediately return `204 No Content` with `Stream-Closed: true` and `Stream-Up-To-Date: true` This ensures clients observing a closed stream do not have hanging connections waiting for data that will never arrive. #### Response Body (on 200) - New bytes that arrived during the long-poll period. #### Timeout Behavior The timeout for long-polling is implementation-defined. Servers **MAY** accept a `timeout` query parameter (in seconds) as a future extension, but this is not required by the base protocol. ### 5.8. Read Stream - Live (SSE) #### Request ``` GET {stream-url}?offset=&live=sse ``` Where `{stream-url}` is the URL of the stream. Returns data as a Server-Sent Events (SSE) stream. SSE mode supports all content types. For streams with `content-type: text/*` or `application/json`, data events carry UTF-8 text directly. For streams with any other `content-type` (binary streams), servers **MUST** automatically base64-encode data events and include the response header `stream-sse-data-encoding: base64`. SSE responses **MUST** use `Content-Type: text/event-stream` in the HTTP response headers. When the stream's configured `content-type` is neither `text/*` nor `application/json`, servers **MUST** include the HTTP response header `stream-sse-data-encoding: base64`. Clients **MUST** check for this header and decode data events accordingly. #### Query Parameters - `offset` (required) - The offset to start reading from. - `live=sse` (required) - Indicates SSE streaming mode. #### Response Codes - `200 OK`: Streaming body (SSE format) - `400 Bad Request`: Invalid parameters - `404 Not Found`: Stream does not exist - `429 Too Many Requests`: Rate limit exceeded #### Response Format Data is emitted in [Server-Sent Events format](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format). **Events:** - `data`: Emitted for each batch of data - Each line prefixed with `data:` - For binary streams (where `stream-sse-data-encoding: base64` is present), the `data` event payload represents bytes encoded using standard base64 per [RFC 4648](https://www.rfc-editor.org/rfc/rfc4648) (alphabet: A-Z, a-z, 0-9, +, /). - Servers **MAY** split the base64 text across multiple `data:` lines within the same SSE `data` event. 
- Clients **MUST** concatenate the `data:` lines for the event (per SSE rules) and **MUST** remove all `\n` and `\r` characters inserted between lines before base64-decoding. - The resulting string (after removing `\n` and `\r`) **MUST** be valid base64 text with length that is a multiple of 4 (or empty). - If a `data` event's byte payload length is 0, the base64 text **MUST** be the empty string. - Base64 encoding affects only `event: data` payloads. `event: control` events remain JSON as specified and are not encoded. - When the stream content type is `application/json`, implementations **MAY** batch multiple logical messages into a single SSE `data` event by streaming a JSON array across multiple `data:` lines, as in the example below. - `control`: Emitted after every data event - **MUST** include `streamNextOffset`. See Section 8.1. - **MUST** include `streamCursor` when the stream is open. Servers **MAY** omit `streamCursor` when `streamClosed` is true (cursor is unnecessary when no reconnection is expected). - **MUST** include `upToDate: true` when the client is caught up with all available data. Note: `streamClosed: true` implies `upToDate: true` (a closed stream at the final offset is by definition up-to-date), so `upToDate` **MAY** be omitted when `streamClosed` is true. - **MUST** include `streamClosed: true` when the stream is closed and all data up to the final offset has been sent. - Format: JSON object with offset, cursor (when applicable), up-to-date status, and optionally closed status. Field names use camelCase: `streamNextOffset`, `streamCursor`, `upToDate`, and `streamClosed`. **Example (normal data):** ``` event: data data: [ data: {"k":"v"}, data: {"k":"w"}, data: ] event: control data: {"streamNextOffset":"123456_789","streamCursor":"abc"} ``` **Example (final data with stream closure):** ``` event: data data: [ data: {"k":"final"} data: ] event: control data: {"streamNextOffset":"123456_999","streamClosed":true} ``` Note: `streamCursor` is omitted when `streamClosed` is true, since clients must not reconnect after receiving a closed signal. **Client Compatibility:** Clients **MUST** tolerate the absence of `streamCursor` (in SSE) and `Stream-Cursor` (in HTTP headers) when `streamClosed` / `Stream-Closed` is present. Implementations that assume cursor is always present will break when processing closed stream responses. #### Stream Closure Behavior in SSE Mode When the stream is closed: - The final `control` event **MUST** include `streamClosed: true` - After emitting the final control event, servers **MUST** close the SSE connection - Clients receiving `streamClosed: true` **MUST NOT** attempt to reconnect, as no more data will arrive If the stream is already closed when an SSE connection is established and the client's offset is at the tail: - Servers **MUST** immediately emit a `control` event with `streamClosed: true` and `upToDate: true` - Servers **MUST** then close the connection **Example (binary stream with automatic base64 encoding):** ``` event: data data: AQIDBAUG data: BwgJCg== event: control data: {"streamNextOffset":"123456_789","streamCursor":"abc"} ``` #### Connection Lifecycle - Server **SHOULD** close connections roughly every ~60 seconds to enable CDN collapsing - Client **MUST** reconnect using the last received `streamNextOffset` value from the control event - Client **MUST NOT** reconnect if the last control event included `streamClosed: true` ## 6. Offsets Offsets are opaque tokens that identify positions within a stream. 
They have the following properties: 1. **Opaque**: Clients **MUST NOT** interpret offset structure or meaning 2. **Lexicographically Sortable**: For any two valid offsets for the same stream, a lexicographic comparison determines their relative position in the stream. Clients **MAY** compare offsets lexicographically to determine ordering. 3. **Persistent**: Offsets remain valid for the lifetime of the stream (until deletion or expiration) 4. **Unique**: Each offset identifies exactly one position in the stream. No two positions **MAY** share the same offset. 5. **Strictly Increasing**: Offsets assigned to appended data **MUST** be lexicographically greater than all previously assigned offsets. Server implementations **MUST NOT** use schemes (such as raw UTC timestamps) that can produce duplicate or non-monotonic offsets. Time-based identifiers like ULIDs, which combine timestamps with random components to guarantee uniqueness and monotonicity, are acceptable. **Format**: Offset tokens are opaque, case-sensitive strings. Their internal structure is implementation-defined. Offsets are single tokens and **MUST NOT** contain `,`, `&`, `=`, `?`, or `/` (to avoid conflict with URL query parameter syntax). Servers **SHOULD** use URL-safe characters to avoid encoding issues, but clients **MUST** properly URL-encode offset values when including them in query parameters. Servers **SHOULD** keep offsets reasonably short (under 256 characters) since they appear in every request URL. **Sentinel Values**: The protocol defines two special offset sentinel values: - **`-1` (Stream Beginning)**: The special offset value `-1` represents the beginning of the stream. Clients **MAY** use `offset=-1` as an explicit way to request data from the start. This is semantically equivalent to omitting the offset parameter. Servers **MUST** recognize `-1` as a valid offset that returns data from the beginning of the stream. - **`now` (Current Tail Position)**: The special offset value `now` allows clients to skip all existing data and begin reading from the current tail position. This is useful for applications that only care about future data (e.g., presence tracking, live monitoring, late joiners to a conversation). 
The behavior varies by read mode: **Catch-up mode** (`offset=now` without `live` parameter): - Servers **MUST** return `200 OK` with an empty response body appropriate to the stream's content type: - For `application/json` streams: the body **MUST** be `[]` (empty JSON array), consistent with Section 7.1 - For all other content types: the body **MUST** be 0 bytes (empty) - Servers **MUST** include a `Stream-Next-Offset` header set to the current tail position - Servers **MUST** include `Stream-Up-To-Date: true` header - Servers **SHOULD** return `Cache-Control: no-store` to prevent caching of the tail offset - The response **MUST** contain no data messages, regardless of stream content **Long-poll mode** (`offset=now&live=long-poll`): - Servers **MUST** immediately begin waiting for new data (no initial empty response) - This eliminates a round-trip: clients can subscribe to future data in a single request - If new data arrives during the wait, servers return `200 OK` with the new data - If the timeout expires, servers return `204 No Content` with `Stream-Up-To-Date: true` - The `Stream-Next-Offset` header **MUST** be set to the tail position **SSE mode** (`offset=now&live=sse`): - Servers **MUST** immediately begin the SSE stream from the tail position - The first control event **MUST** include the tail offset in `streamNextOffset` - If no data has arrived, the first control event **MUST** include `upToDate: true` - If data arrives before the first control event, `upToDate` reflects the current state - No historical data is sent; only future data events are streamed **Closed streams** (`offset=now` on a closed stream): - Regardless of the `live` parameter, servers **MUST** return immediately with the closure signal - The response **MUST** include `Stream-Closed: true` and `Stream-Up-To-Date: true` headers - The `Stream-Next-Offset` header **MUST** be set to the stream's final (tail) offset - For catch-up mode: `200 OK` with empty body (or empty JSON array for JSON streams) - For long-poll mode: `204 No Content` (no waiting, immediate return) - For SSE mode: The first (and only) control event includes `streamClosed: true` and `upToDate: true`, then the connection closes - This ensures clients using `offset=now` can immediately discover that a stream has no future data **Reserved Values**: The sentinel values `-1` and `now` are reserved by the protocol. Server implementations **MUST NOT** generate these strings as actual stream offsets (in `Stream-Next-Offset` headers or SSE control events). This ensures clients can always distinguish between sentinel requests and real offset values. The opaque nature of offsets enables important server-side optimizations. For example, offsets may encode chunk file identifiers, allowing catch-up requests to be served directly from object storage without touching the main database. Clients **MUST** use the `Stream-Next-Offset` value returned in responses for subsequent read requests. They **SHOULD** persist offsets locally (e.g., in browser local storage or a database) to enable resumability after disconnection or restart. ## 7. Content Types The protocol supports arbitrary MIME content types. Most content types operate at the byte level, leaving message framing and interpretation to clients. The `application/json` content type has special semantics defined below. **SSE Encoding:** - SSE mode (Section 5.8) supports all content types. For streams with `content-type: text/*` or `application/json`, data events carry UTF-8 text natively. 
For all other content types, servers automatically base64-encode data events (see Section 5.8). Clients **MAY** use any content type for their streams, including: - `application/json` for JSON mode with message boundary preservation - `application/ndjson` for newline-delimited JSON - `application/x-protobuf` for Protocol Buffer messages - `text/plain` for plain text - Custom types for application-specific formats ### 7.1. JSON Mode Streams created with `Content-Type: application/json` have special semantics for message boundaries and batch operations. #### Message Boundaries For `application/json` streams, servers **MUST** preserve message boundaries. Each POST request stores messages as a distinct unit, and GET responses **MUST** return data as a JSON array containing all messages from the requested offset range. #### Array Flattening for Batch Operations When a POST request body contains a JSON array, servers **MUST** flatten exactly one level of the array, treating each element as a separate message. This enables clients to batch multiple messages in a single HTTP request while preserving individual message semantics. **Examples (direct POST to server):** - POST body `{"event": "created"}` stores one message: `{"event": "created"}` - POST body `[{"event": "a"}, {"event": "b"}]` stores two messages: `{"event": "a"}`, `{"event": "b"}` - POST body `[[1,2], [3,4]]` stores two messages: `[1,2]`, `[3,4]` - POST body `[[[1,2,3]]]` stores one message: `[[1,2,3]]` **Note:** Client libraries **MAY** automatically wrap individual values in arrays for batching. For example, a client calling `append({"x": 1})` might send POST body `[{"x": 1}]` to the server, which flattens it to store one message: `{"x": 1}`. #### Empty Arrays Servers **MUST** reject POST requests containing empty JSON arrays (`[]`) with `400 Bad Request`. Empty arrays in append operations represent no-op operations with no semantic meaning and likely indicate a client bug. PUT requests with an empty array body (`[]`) are valid and create an empty stream. The empty array simply means no initial messages are being written. #### JSON Validation Servers **MUST** validate that appended data is valid JSON. If validation fails, servers **MUST** return `400 Bad Request` with an appropriate error message. #### Response Format GET responses for `application/json` streams **MUST** return `Content-Type: application/json` with a body containing a JSON array of all messages in the requested range: ```http HTTP/1.1 200 OK Content-Type: application/json [{"event":"created"},{"event":"updated"}] ``` If no messages exist in the range, servers **MUST** return an empty JSON array `[]`. ## 8. Caching and Collapsing ### 8.1. Catch-up and Long-poll Reads For **shared, non-user-specific streams**, servers **SHOULD** return: ``` Cache-Control: public, max-age=60, stale-while-revalidate=300 ``` For **streams that may contain user-specific or confidential data**, servers **SHOULD** use `private` instead of `public` and rely on CDN configurations that respect `Authorization` or other cache keys: ``` Cache-Control: private, max-age=60, stale-while-revalidate=300 ``` This enables CDN/proxy caching while allowing stale content to be served during revalidation. **Caching and Stream Closure:** Catch-up chunks remain fully cacheable, including chunks at the tail of the stream. When a chunk is returned, it may or may not be the final chunk—this is unknown until the client requests the next offset. 
The closure signal is discovered when the client requests the offset **after** the final data: 1. Client reads data and receives `Stream-Next-Offset: X` (the tail offset) 2. Client requests offset `X` 3. If stream is closed: server returns `200 OK` with **empty body** and `Stream-Closed: true` 4. If stream is open: server returns `200 OK` with empty body and `Stream-Up-To-Date: true` (or long-poll/SSE waits for data) This design ensures: - All data chunks are cacheable (a chunk that later becomes "final" was still valid data) - The closure signal is a distinct request/response at the tail offset - Cached chunks never become "stale" due to closure—clients simply make one more request to discover EOF **ETag Usage:** Servers **MUST** generate `ETag` headers for GET responses, except for `offset=now` responses. Clients **MAY** use `If-None-Match` with the `ETag` value on repeat catch-up requests. When a client provides a valid `If-None-Match` header that matches the current ETag, servers **MUST** respond with `304 Not Modified` (with no body) instead of re-sending the same data. This is essential for fast loading and efficient bandwidth usage. **ETag and Stream Closure:** ETags **MUST** vary with the stream's closure status. When a stream is closed (without new data being appended), the ETag **MUST** change to ensure clients do not receive `304 Not Modified` responses that would hide the closure signal. Implementations **SHOULD** include a closure indicator in the ETag format (e.g., appending `:c` to the ETag when the stream is closed). **Query Parameter Ordering:** For optimal cache behavior, clients **SHOULD** order query parameters lexicographically by key name. This ensures consistent URL serialization across implementations and improves CDN cache hit rates. **Collapsing:** Clients **SHOULD** echo the `Stream-Cursor` value as `cursor=` in subsequent long-poll requests. This, along with the appropriate `Cache-Control` header, enables CDNs and proxies to collapse multiple clients waiting for the same data into a single upstream request. **Server-Generated Cursors:** To prevent infinite CDN cache loops (where clients receive the same cached empty response indefinitely), servers **MUST** generate cursors on all live mode responses: - **Long-poll**: `Stream-Cursor` response header - **SSE**: `streamCursor` field in `control` events The cursor mechanism works as follows: 1. **Interval-based Calculation**: Servers divide time into fixed intervals (default: 20 seconds) counted from an epoch (default: October 9, 2024 00:00:00 UTC). The cursor value is the interval number as a decimal string. 2. **Cursor Generation**: For each live response, the server calculates the current interval number and returns it as the cursor value. 3. **Monotonic Progression**: Servers **MUST** ensure cursors never go backwards. When a client provides a `cursor` query parameter that is greater than or equal to the current interval number, the server **MUST** return a cursor strictly greater than the client's cursor (by adding random jitter of 1-3600 seconds). This guarantees monotonic progression and prevents cache cycles. 4. **Client Behavior**: Clients **MUST** include the received cursor value as the `cursor` query parameter in subsequent requests. This creates different cache keys as time progresses, ensuring CDN caches eventually expire. 
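As a non-normative sketch of the interval arithmetic above, the following TypeScript assumes the defaults from this section (20-second intervals counted from the 2024-10-09T00:00:00Z epoch). How the 1-3600 second jitter maps onto cursor units is left to implementations; here it is converted to whole intervals.

```typescript
// Non-normative sketch of interval-based cursor generation.
// Defaults from this section: 20-second intervals, epoch 2024-10-09T00:00:00Z.
const CURSOR_EPOCH_MS = Date.parse("2024-10-09T00:00:00Z");
const CURSOR_INTERVAL_MS = 20_000;

// Current interval number, returned as the cursor on live responses.
function currentCursor(nowMs: number = Date.now()): number {
  return Math.floor((nowMs - CURSOR_EPOCH_MS) / CURSOR_INTERVAL_MS);
}

// Monotonic progression: if the client's cursor has already reached the current
// interval, advance past it with random jitter so CDN cache keys cannot cycle.
// Mapping the 1-3600 second jitter onto interval units is an implementation choice.
function nextCursor(clientCursor: number | undefined, nowMs: number = Date.now()): number {
  const cursor = currentCursor(nowMs);
  if (clientCursor === undefined || clientCursor < cursor) return cursor;
  const jitterSeconds = 1 + Math.floor(Math.random() * 3600);
  const jitterIntervals = Math.max(1, Math.round((jitterSeconds * 1000) / CURSOR_INTERVAL_MS));
  return clientCursor + jitterIntervals;
}
```

The server would return this value as `Stream-Cursor` (long-poll) or `streamCursor` (SSE), and the client echoes it back on the next request, as in the example flow below.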
**Example Cursor Flow:** ``` # Client makes initial long-poll request GET /stream?offset=123&live=long-poll # Server returns cursor based on current interval (e.g., interval 1000) < Stream-Cursor: 1000 # Client echoes cursor on next request GET /stream?offset=123&live=long-poll&cursor=1000 # If still in same interval, server adds jitter and returns advanced cursor < Stream-Cursor: 1050 ``` **Long-poll Caching:** CDNs and proxies **SHOULD NOT** cache `204 No Content` responses from long-poll requests in most cases. Long-poll `200 OK` responses are safe to cache when keyed by `offset`, `cursor`, and authentication credentials. ### 8.2. SSE SSE connections **SHOULD** be closed by the server approximately every 60 seconds. This enables new clients to collapse onto edge requests rather than maintaining long-lived connections to origin servers. ## 9. Extensibility The Durable Streams Protocol is designed to be extended for specific use cases and implementations. Extensions **SHOULD** be pure supersets of the base protocol, ensuring compatibility with any client that implements the base protocol. ### 9.1. Protocol Extensions Implementations **MAY** extend the protocol with additional query parameters, headers, or response fields to support domain-specific semantics. For example, a database synchronization implementation might add query parameters to filter by table or schema, or include additional metadata in response headers. Extensions **SHOULD** follow these principles: - **Backward Compatibility**: Extensions **MUST NOT** break base protocol semantics. Clients that do not understand extension parameters or headers **MUST** be able to operate using only base protocol features. - **Pure Superset**: Extensions **SHOULD** be additive only. New parameters and headers **SHOULD** be optional, and servers **SHOULD** provide sensible defaults or fallback behavior when extensions are not used. - **Version Independence**: Extensions **SHOULD** work with any version of a client that implements the base protocol. Extension negotiation **MAY** be handled through headers or query parameters, but base protocol operations **MUST** remain functional without extension support. ### 9.2. Authentication Extensions See Section 10.1 for authentication and authorization details. Implementations **MAY** extend the protocol with authentication-related query parameters or headers (e.g., API keys, OAuth tokens, custom authentication headers). ## 10. Security Considerations ### 10.1. Authentication and Authorization Authentication and authorization are explicitly out of scope for this protocol specification. Clients **SHOULD** implement all standard HTTP authentication primitives (e.g., Basic Authentication [RFC7617], Bearer tokens [RFC6750], Digest Authentication [RFC7616]). Implementations **MUST** provide appropriate access controls to prevent unauthorized stream creation, modification, or deletion, but may do so using any mechanism they choose, including extending the protocol with authentication-related parameters or headers as described in Section 9.2. ### 10.2. Multi-tenant Safety If stream URLs are guessable, servers **MUST** enforce access controls even when using shared caches. Servers **SHOULD** validate and sanitize stream URLs to prevent path traversal attacks and ensure URL components are within acceptable limits. ### 10.3. Untrusted Content Clients **MUST** treat stream contents as untrusted input and **MUST NOT** evaluate or execute stream data without appropriate validation. 
This is particularly important for append-only streams used as logs, where log injection attacks are a concern. ### 10.4. Content Type Validation Servers **MUST** validate that appended content types match the stream's declared content type to prevent type confusion attacks. ### 10.5. Rate Limiting Servers **SHOULD** implement rate limiting to prevent abuse. The `429 Too Many Requests` response code indicates rate limit exhaustion. ### 10.6. Sequence Validation The optional `Stream-Seq` header provides protection against out-of-order writes in multi-writer scenarios. Servers **MUST** reject sequence regressions to maintain stream integrity. ### 10.7. Browser Security Headers When serving streams to browser clients, servers **SHOULD** include the following headers to prevent MIME-sniffing attacks, cross-origin embedding exploits, and cache-related vulnerabilities: - `X-Content-Type-Options: nosniff` - Servers **SHOULD** include this header on all responses. This prevents browsers from MIME-sniffing the response content and potentially executing it as a different content type (e.g., interpreting binary data as HTML/JavaScript). - `Cross-Origin-Resource-Policy: cross-origin` (or `same-origin`/`same-site`) - Servers **SHOULD** include this header to explicitly control cross-origin embedding. Use `cross-origin` to allow cross-origin access via `fetch()`, `same-site` to restrict to the same registrable domain, or `same-origin` for strict same-origin only. This prevents Cross-Origin Read Blocking (CORB) issues and protects against Spectre-like side-channel attacks. - `Cache-Control: no-store` - Servers **SHOULD** include this header on HEAD responses and on responses containing sensitive or user-specific stream data. This prevents intermediate proxies and CDNs from caching potentially sensitive content. For public, non-sensitive historical reads, servers **MAY** use `Cache-Control: public, max-age=60, stale-while-revalidate=300` as described in Section 8. - `Content-Disposition: attachment` (optional) - Servers **MAY** include this header for `application/octet-stream` responses to prevent inline rendering if a user navigates directly to the stream URL. These headers provide defense-in-depth for scenarios where stream URLs might be accessed outside the intended programmatic fetch context (e.g., direct navigation, malicious cross-origin embedding via `