OSS HTTP Streams Benchmark

A uniform HTTP benchmark for Ursula, Durable Streams, and S2 Lite S3. It covers bulk append/read, small-event append, and SSE live-tail delivery.

Benchmark setup

Workload

Bulk append

8 KiB records, 100 sequential latency samples, then a 15-second write throughput phase.
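
For orientation, the latency phase can be approximated by hand with curl. This is a minimal sketch, not the tool's implementation; the append path is hypothetical, since each target exposes its own HTTP API:

# One 8 KiB record, appended 100 times sequentially; curl prints the
# per-request wall time. The /streams/bench/append path is hypothetical.
head -c 8192 /dev/urandom > payload.bin
for i in $(seq 100); do
  curl -s -o /dev/null -w '%{time_total}\n' \
    -X POST --data-binary @payload.bin \
    http://URSULA_LEADER:4437/streams/bench/append
done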

Catch-up read

512 KiB records, read only after the data is visible; the throughput phase validates each response's status code and body length.
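
That validation can be reproduced by hand; the read path below is hypothetical:

# Expect status 200 and a 524288-byte body for one 512 KiB record
# (hypothetical endpoint; real paths depend on the target's API).
curl -s -o /dev/null -w '%{http_code} %{size_download}\n' \
  'http://URSULA_NODE_1:4437/streams/bench/read?offset=0'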

Small events

1 KiB appends, reported as req/s rather than MiB/s.

SSE live tail

512 B text events delivered to open SSE subscriptions; the reported metric is append-to-delivery latency. For Ursula, three subscribers are spread across the replicas.
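
A subscription can be opened by hand as a quick sanity check. The sketch below shows only the subscribe side; the benchmark itself embeds a timestamp in each appended event and subtracts it from the arrival time to compute delivery latency. The /tail path is hypothetical:

# -N disables output buffering so events print as they arrive;
# the endpoint path is an assumption, not the tool's actual API.
curl -N -s -H 'Accept: text/event-stream' \
  http://URSULA_NODE_2:4437/streams/bench/tail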

Environment

| Role | Instance type | vCPU | Memory | Network |
| --- | --- | --- | --- | --- |
| Service instance | c5.xlarge | 4 | 8 GiB | Up to 10 Gbit/s |
| Load generator | c6in.4xlarge | 16 | 32 GiB | Up to 50 Gbit/s |

Server configuration

Ursula

  • 3 EC2 nodes
  • Writes go to the leader
  • Reads are distributed across replicas (see the sketch after this list)
  • Local log/store, in-memory hot-payload backend, S3 cold-storage configuration
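
As a rough illustration of that read fan-out, a client can address any of the three nodes directly; the read path is the same hypothetical endpoint used earlier:

# Any replica serves reads; spread requests across the three nodes.
for node in URSULA_NODE_1 URSULA_NODE_2 URSULA_NODE_3; do
  curl -s -o /dev/null -w "$node: %{http_code}\n" \
    "http://$node:4437/streams/bench/read?offset=0"
done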

Durable Streams

  • Single EC2 process
  • Single-node reads and writes
  • File-backed data directory

S2 Lite S3

  • Single EC2 process
  • Single-node S2 Lite API
  • S3 object-store backend

Baseline comparison at c256

| Metric | Ursula vs S2 Lite S3 |
| --- | --- |
| Write throughput | 1.5x higher |
| Read throughput | 3.7x higher |
| Small-event append | 1.2x higher req/s |
| Append p99 latency | 1.8x lower |
| SSE live-tail p99 | 2.1x lower |
| Client-observed errors | 0 vs 0 |

| Metric | Ursula vs Durable Streams |
| --- | --- |
| Write throughput | 10.3x higher |
| Read throughput | 19.8x higher |
| Small-event append | 17.2x higher req/s |
| Append p99 latency | 7.6x higher (worse) |
| SSE live-tail p99 | 3.8x higher (worse) |
| Client-observed errors | 0 vs 89 |

Availability posture

| Target | Serving availability estimate | Acked-write durability | Failure posture |
| --- | --- | --- | --- |
| Ursula | ~99.99% | RPO 0 while a Raft majority survives; writes are committed through the replicated quorum path. | Tolerates one service instance or availability-zone loss when two of three voters remain reachable; leader loss causes a short election and retry window. |
| S2 Lite S3 | ~99.9% | RPO 0 for acknowledged writes with object storage enabled; writes are durable in object storage before acknowledgment. | The benchmarked Lite process is a single serving point. Instance loss stops service until restart or replacement, but acknowledged data can be recovered from object storage. |
| Durable Streams | ~99.9% | Durability depends on the local file-backed store. The official file store documents a crash-atomicity caveat for producer state recovery. | The benchmarked server is one process with local state. Process restart can recover local data, while instance or AZ loss depends on volume recovery, backup, or external replication. |

Trend

[Charts: six panels plot each metric against client concurrency (4 to 256) for Ursula, Durable Streams, and S2 Lite S3: write throughput (MiB/s, higher is better), read throughput (MiB/s, higher is better), small-event append (req/s, higher is better), append p99 latency (ms, lower is better), read p99 latency (ms, lower is better), and SSE live-tail p99 (ms, lower is better). The plotted values match the full results table below.]

c256 results

| Target | Write MiB/s | Read MiB/s | Small-event req/s | SSE p99 ms | Errors |
| --- | --- | --- | --- | --- | --- |
| Ursula | 19.59 | 3,309.93 | 4,294 | 41.18 | 0 |
| Durable Streams | 1.90 | 167.35 | 250 | 10.71 | 89 |
| S2 Lite S3 | 12.89 | 891.72 | 3,617 | 87.69 | 0 |

Full results

| Concurrency | Target | Append p99 ms | Read p99 ms | Write MiB/s | Read MiB/s | Small-event req/s | SSE p99 ms | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4 | Ursula | 54.24 | 0.58 | 1.35 | 1,871.03 | 203 | 40.83 | 0 |
| 4 | Durable Streams | 6.84 | 7.83 | 1.80 | 171.15 | 233 | 12.60 | 0 |
| 4 | S2 Lite S3 | 128.95 | 0.67 | 0.52 | 770.72 | 72 | 100.49 | 0 |
| 8 | Ursula | 58.12 | 0.56 | 2.27 | 3,046.62 | 382 | 41.80 | 0 |
| 8 | Durable Streams | 6.57 | 8.19 | 1.73 | 169.95 | 216 | 10.61 | 0 |
| 8 | S2 Lite S3 | 107.89 | 0.78 | 1.04 | 862.31 | 135 | 187.74 | 0 |
| 16 | Ursula | 59.11 | 0.60 | 3.66 | 3,521.28 | 705 | 40.62 | 0 |
| 16 | Durable Streams | 7.80 | 8.14 | 1.74 | 174.76 | 224 | 10.67 | 0 |
| 16 | S2 Lite S3 | 114.85 | 0.78 | 2.00 | 909.06 | 275 | 114.29 | 0 |
| 32 | Ursula | 52.78 | 0.51 | 6.34 | 3,510.55 | 1,199 | 44.23 | 0 |
| 32 | Durable Streams | 6.81 | 8.11 | 1.75 | 178.15 | 223 | 10.92 | 0 |
| 32 | S2 Lite S3 | 107.30 | 0.95 | 3.45 | 941.92 | 455 | 146.38 | 0 |
| 64 | Ursula | 54.59 | 0.60 | 9.08 | 3,456.31 | 2,177 | 40.29 | 0 |
| 64 | Durable Streams | 6.59 | 7.98 | 1.74 | 169.95 | 225 | 10.74 | 0 |
| 64 | S2 Lite S3 | 118.43 | 0.76 | 3.50 | 913.12 | 302 | 97.84 | 0 |
| 128 | Ursula | 58.68 | 0.70 | 14.73 | 3,442.20 | 3,489 | 21.72 | 0 |
| 128 | Durable Streams | 6.73 | 9.29 | 1.89 | 171.44 | 260 | 10.42 | 0 |
| 128 | S2 Lite S3 | 114.09 | 0.91 | 6.27 | 949.27 | 1,826 | 98.50 | 0 |
| 256 | Ursula | 55.66 | 0.56 | 19.59 | 3,309.93 | 4,294 | 41.18 | 0 |
| 256 | Durable Streams | 7.35 | 8.37 | 1.90 | 167.35 | 250 | 10.71 | 89 |
| 256 | S2 Lite S3 | 102.02 | 0.70 | 12.89 | 891.72 | 3,617 | 87.69 | 0 |

Reproduce

Use crates/perf-compare as the load generator. Run it from a separate EC2 instance in the same private network as the tested services, after starting each target with the server configuration above.

cargo run --release -p perf-compare --bin perf_compare -- \
  --ursula http://URSULA_LEADER:4437 \
  --ursula-read-bases http://URSULA_NODE_1:4437,http://URSULA_NODE_2:4437,http://URSULA_NODE_3:4437 \
  --durable http://DURABLE_STREAMS:8080 \
  --s2 http://S2_LITE:3000 \
  --payload-bytes 8192 \
  --small-payload-bytes 1024 \
  --read-payload-bytes 524288 \
  --sse-payload-bytes 512 \
  --latency-count 100 \
  --throughput-secs 15 \
  --sse-count 100 \
  --sse-readers 3 \
  --concurrency 256

Repeat the same command with --concurrency set to 4, 8, 16, 32, 64, 128, and 256. The benchmark emits JSON containing append latency, read latency, write throughput, read throughput, small-event request/s, SSE delivery latency, and client-observed errors for each target.
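
One way to script the sweep is sketched below. It reuses the flags from the command above and assumes the JSON report is written to stdout, which may not match the tool's actual output behavior:

# Run every concurrency level with identical flags.
args=(
  --ursula http://URSULA_LEADER:4437
  --ursula-read-bases http://URSULA_NODE_1:4437,http://URSULA_NODE_2:4437,http://URSULA_NODE_3:4437
  --durable http://DURABLE_STREAMS:8080
  --s2 http://S2_LITE:3000
  --payload-bytes 8192
  --small-payload-bytes 1024
  --read-payload-bytes 524288
  --sse-payload-bytes 512
  --latency-count 100
  --throughput-secs 15
  --sse-count 100
  --sse-readers 3
)
for c in 4 8 16 32 64 128 256; do
  # Redirecting stdout assumes the JSON report goes there.
  cargo run --release -p perf-compare --bin perf_compare -- \
    "${args[@]}" --concurrency "$c" > "perf-c$c.json"
done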