CIRISServer — benchmarks, interpreted

The CEWP crux use case (stream and store at massive scale), measured and interpreted. Receivers are charged open-only; every blended figure names its model. Raw criterion numbers are at the bottom. Absolute µs figures are specific to the host that ran this benchmark; the ratios and ceilings carry over to other hardware.

PQC group video is effectively free at steady state. The steady-state frame path runs at up to 2.2 GiB/s (AES-256-GCM, which is not meaningfully weakened by quantum attack at a 256-bit key); the post-quantum tax is ~83 µs once per peer-link, never per frame.

Video throughput — realtime A/V mesh (two-layer hybrid-PQC)

Per frame: inner AES-256-GCM (E2E epoch DEK) → wire codec → outer AES-256-GCM (per-Link transit key). seal is the sender's cost; open is the receiver's (it never re-seals).

| Frame | size | seal→wire→open | seal (send) | open (recv) | throughput |
|---|---|---|---|---|---|
| Opus voice frame (~20 ms @128 kbps) | 320 B | 0.95 µs | 0.51 µs | 0.45 µs | 0.31 GiB/s |
| 720p inter-frame (low motion) | 4 KiB | 2.37 µs | 1.21 µs | 1.12 µs | 1.61 GiB/s |
| 720p inter-frame (typical) | 16 KiB | 7.07 µs | 3.26 µs | 3.08 µs | 2.16 GiB/s |
| 1080p inter / 720p keyframe | 64 KiB | 27.24 µs | 12.78 µs | 12.79 µs | 2.24 GiB/s |
| 1080p keyframe | 256 KiB | 112.27 µs | 56.78 µs | 48.99 µs | 2.17 GiB/s |
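
The throughput column appears to be frame size divided by the full seal→wire→open time, expressed in GiB/s. A minimal sketch of that arithmetic, using the raw `av_frame_e2e` means; the function name `throughput_gib_s` is hypothetical and exists only for this illustration:

```rust
/// Bytes per GiB (2^30).
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Payload throughput in GiB/s for one frame of `bytes` processed in `micros` µs.
fn throughput_gib_s(bytes: u64, micros: f64) -> f64 {
    (bytes as f64) / (micros * 1e-6) / GIB
}

fn main() {
    // (frame size in bytes, full seal→wire→open mean in µs), from the raw criterion means.
    let rows = [
        (320u64, 0.951),
        (4096, 2.372),
        (16384, 7.065),
        (65536, 27.241),
        (262144, 112.272),
    ];
    for (bytes, micros) in rows {
        println!(
            "{:>7} B  {:>8.3} µs  {:.2} GiB/s",
            bytes,
            micros,
            throughput_gib_s(bytes, micros)
        );
    }
}
```

Small frames are dominated by fixed per-frame cost, which is why the 320 B voice frame lands far below the large-frame ceiling.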

A 50-person room costs ~0.5% of one core to receive

This holds under a stated model: 720p at 30 fps with GOP=30 (per second, 1×64 KiB keyframe + 29×16 KiB inter frames), with the receiver opening 49 streams. The range across motion is ~0.17% (low-motion, 4 KiB) to ~0.45% (typical, 16 KiB), receive-only; both ends match 30 inter frames per second with the keyframe left out. Publishing one stream to the room is ~0.24% of a core. Crypto is nowhere near the bottleneck; bandwidth and the network are.
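
The percentages above can be reproduced from the open-side means. A sketch under these assumptions: the headline figure uses one keyframe plus 29 inter frames per stream per second, the motion range uses 30 inter frames with no keyframe, and the publish figure uses 30 frames at the shared-inner N=50 fan-out cost (these readings are inferred from the numbers matching, not stated by the source). `gop_open_us` and `core_fraction` are hypothetical helper names.

```rust
/// Per-stream open cost for one second of video with one GOP per second:
/// one keyframe plus (fps - 1) inter frames, in µs.
fn gop_open_us(fps: u32, keyframe_us: f64, inter_us: f64) -> f64 {
    keyframe_us + (fps - 1) as f64 * inter_us
}

/// Fraction of one core spent when `streams` streams each cost
/// `us_per_stream_per_sec` µs of CPU per wall-clock second.
fn core_fraction(streams: u32, us_per_stream_per_sec: f64) -> f64 {
    streams as f64 * us_per_stream_per_sec / 1_000_000.0
}

fn main() {
    // Receiver-side open costs from the raw criterion means (µs).
    let (open_4k, open_16k, open_64k) = (1.125, 3.080, 12.792);

    // Headline model: 49 streams, 30 fps, 1 x 64 KiB keyframe + 29 x 16 KiB inter.
    let blended = core_fraction(49, gop_open_us(30, open_64k, open_16k));

    // Motion range (assumption: 30 inter frames per second, keyframe excluded).
    let low = core_fraction(49, 30.0 * open_4k);
    let typical = core_fraction(49, 30.0 * open_16k);

    // Publish cost (assumption: 30 frames x shared-inner fan-out at N=50, 78.574 µs).
    let publish = core_fraction(1, 30.0 * 78.574);

    println!("receive, blended: {:.2}%", blended * 100.0);
    println!("receive, range:   {:.2}% to {:.2}%", low * 100.0, typical * 100.0);
    println!("publish:          {:.2}%", publish * 100.0);
}
```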

Post-quantum overhead — the hybrid handshake (per peer-link, one-time)

The only place the post-quantum cost can live: bulk frames are AES-256-GCM (already quantum-fine), so the PQ cost is structurally confined to the KEM at session setup. Per-frame PQC cost is zero.

| handshake | full (initiate+respond) |
|---|---|
| Hybrid X25519 + ML-KEM-768 (PQ-safe) | 161.9 µs |
| Classical X25519 only | 78.7 µs |
| ML-KEM-768 tax | +83.2 µs, once per peer-link |
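
The tax row is the difference between the two full handshakes, each being initiate plus respond from the raw `pqc_kex` means. The sketch below also spreads that one-time cost over a session to show how small it is per frame; the 60 s session at 30 fps is a hypothetical example, not a figure from the benchmark, and both function names are illustrative.

```rust
/// Full handshake cost: initiator plus responder, in µs.
fn full_handshake_us(initiate_us: f64, respond_us: f64) -> f64 {
    initiate_us + respond_us
}

/// A one-time cost spread over every frame sent on the link during a session.
fn amortized_us_per_frame(one_time_us: f64, fps: u32, session_secs: u32) -> f64 {
    one_time_us / (fps as f64 * session_secs as f64)
}

fn main() {
    let hybrid = full_handshake_us(68.210, 93.655);
    let classical = full_handshake_us(39.588, 39.104);
    let tax = hybrid - classical;
    println!(
        "hybrid {:.1} µs, classical {:.1} µs, ML-KEM-768 tax {:.1} µs",
        hybrid, classical, tax
    );

    // Hypothetical session: one link carrying 30 fps for 60 s.
    println!(
        "amortized over 1800 frames: {:.4} µs/frame",
        amortized_us_per_frame(tax, 30, 60)
    );
}
```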

Mesh fan-out — sender cost per frame (16 KiB)

naive = N× full seal_av_chunk. shared_inner = v3.7.0's seal_av_inner once + seal_av_outer per Link (CIRISEdge#122) — wire-identical, inner AEAD done once.

| room (N) | naive | shared-inner | speedup |
|---|---|---|---|
| 2 | 6.51 µs | 4.79 µs | 1.36× |
| 8 | 25.98 µs | 14.02 µs | 1.85× |
| 50 | 162.93 µs | 78.57 µs | 2.07× |
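
A minimal sketch of the two fan-out shapes, assuming toy closures in place of the real seal functions. `fan_out_naive`, `fan_out_shared`, and both closure signatures are hypothetical names for illustration; they are not the CIRISEdge API, and the closures here copy bytes rather than encrypt.

```rust
/// Naive fan-out: run the full inner + outer seal once per link.
fn fan_out_naive<I, O>(frame: &[u8], links: usize, mut seal_inner: I, mut seal_outer: O) -> Vec<Vec<u8>>
where
    I: FnMut(&[u8]) -> Vec<u8>,
    O: FnMut(usize, &[u8]) -> Vec<u8>,
{
    (0..links)
        .map(|link| {
            let inner = seal_inner(frame); // inner layer repeated for every link
            seal_outer(link, &inner[..])
        })
        .collect()
}

/// Shared-inner fan-out: seal the inner layer once, then the outer layer per link.
fn fan_out_shared<I, O>(frame: &[u8], links: usize, mut seal_inner: I, mut seal_outer: O) -> Vec<Vec<u8>>
where
    I: FnMut(&[u8]) -> Vec<u8>,
    O: FnMut(usize, &[u8]) -> Vec<u8>,
{
    let inner = seal_inner(frame); // inner layer done once per frame
    (0..links).map(|link| seal_outer(link, &inner[..])).collect()
}

fn main() {
    let frame = vec![0u8; 16 * 1024];
    let links = 50;

    // Placeholder "seals" that only copy bytes and count inner-layer calls.
    let (mut inner_naive, mut inner_shared) = (0usize, 0usize);
    let naive = fan_out_naive(&frame, links, |f| { inner_naive += 1; f.to_vec() }, |_link, f| f.to_vec());
    let shared = fan_out_shared(&frame, links, |f| { inner_shared += 1; f.to_vec() }, |_link, f| f.to_vec());

    println!("naive:  {} inner seals for {} links", inner_naive, naive.len());
    println!("shared: {} inner seal(s) for {} links", inner_shared, shared.len());
}
```

If the inner seal costs k and the outer seal costs o, naive is N·(k+o) and shared-inner is k + N·o, so the speedup tends toward (k+o)/o as N grows. The measured 2.07× at N=50 is consistent with the two layers costing about the same.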

Membership-change rekey — the churn path (projected)

Projected, not implemented. The substrate does not yet rekey on join/leave (CIRISEdge#129 — EpochDek has no ratchet; epoch rotation is owned out-of-module). These numbers project the intended cost from the real hybrid-KEM key_grant wrap. They are the answer to "is rekey-on-membership-change affordable?" — not a measurement of shipped code.

Each join/leave rewraps the fresh epoch DEK to the member-set. flat = O(N) (the unicast-mesh baseline, #129). tree = O(log N) (the TreeKEM optimization, needs multicast, #66). The outer per-Link key is not re-KEX'd on churn (KEX is one-shot per session).

| room (N) | flat O(N) / delta | tree O(log N) / delta | tree win |
|---|---|---|---|
| 2 | 0.135 ms | 0.068 ms | 2.0× |
| 8 | 0.543 ms | 0.203 ms | 2.7× |
| 50 | 3.405 ms | 0.412 ms | 8.3× |

A 50-person room paying the flat baseline spends ~3.41 ms per membership delta: ~10.2% of a single 33 ms frame, or ~0.34% of a core at one join/leave per second. That is affordable even unoptimized; the tree (#66) removes it as a concern. Steady-state video is untouched by churn.
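
The projected figures fit a simple wrap-count model: N wraps per delta on the flat path and ceil(log2 N) on the tree path, at roughly 68 µs per hybrid-KEM wrap. The ceil(log2 N) count is an assumption (a balanced binary tree) that happens to match the raw `av_rekey` means; it is not stated by the source. All function names below are hypothetical.

```rust
/// Wraps per membership delta on the flat path: one per member.
fn flat_wraps(n: u32) -> u32 {
    n
}

/// Wraps per delta on the tree path, assuming a balanced binary tree: ceil(log2 n).
fn tree_wraps(n: u32) -> u32 {
    if n <= 1 { 0 } else { 32 - (n - 1).leading_zeros() }
}

/// Share of one frame interval consumed by a cost of `cost_us` at `fps`.
fn frame_share(cost_us: f64, fps: u32) -> f64 {
    cost_us / (1_000_000.0 / fps as f64)
}

/// Fraction of one core spent on `cost_us` repeated `per_sec` times per second.
fn core_fraction(cost_us: f64, per_sec: f64) -> f64 {
    cost_us * per_sec / 1_000_000.0
}

fn main() {
    // (room size, flat_rewrap mean µs, tree_rewrap mean µs) from the raw criterion means.
    for (n, flat_us, tree_us) in [(2u32, 134.567, 67.523), (8, 543.160, 203.095), (50, 3405.314, 411.945)] {
        println!(
            "N={:>2}: flat {:.1} µs/wrap x {}, tree {:.1} µs/wrap x {}, win {:.1}x",
            n,
            flat_us / flat_wraps(n) as f64,
            flat_wraps(n),
            tree_us / tree_wraps(n) as f64,
            tree_wraps(n),
            flat_us / tree_us
        );
    }
    println!("flat N=50 share of a 30 fps frame: {:.1}%", frame_share(3405.314, 30) * 100.0);
    println!("flat N=50 at 1 delta/s: {:.2}% of a core", core_fraction(3405.314, 1.0) * 100.0);
}
```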

Replication speed — CEG-RC5 corpus ingest (the "store" spine)

What a node pays per replicated trace: Ed25519 verify → decompose → persist (5-tuple ON CONFLICT DO NOTHING dedup — not a content-hash lookup).

| path | per trace | throughput |
|---|---|---|
| new trace (insert, replication intake) | 230.61 µs | 4,336 traces/s/core |
| re-delivery (dedup, anti gossip-loop) | 205.79 µs | 4,859 traces/s/core |
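
The per-core rates are the reciprocal of the per-trace cost, and the gap between the two rows gives the relative saving of a deduplicated re-delivery. A short sketch of both calculations, using the raw `replication_ingest` means (function names are hypothetical):

```rust
/// Traces per second on one core, given the per-trace cost in µs.
fn traces_per_sec(us_per_trace: f64) -> f64 {
    1_000_000.0 / us_per_trace
}

/// Fraction of the fresh-insert cost that a deduplicated re-delivery avoids.
fn relative_saving(new_us: f64, dedup_us: f64) -> f64 {
    (new_us - dedup_us) / new_us
}

fn main() {
    let (ingest_new, ingest_dedup) = (230.610, 205.788);
    println!("new trace:   {:.0} traces/s/core", traces_per_sec(ingest_new));
    println!("re-delivery: {:.0} traces/s/core", traces_per_sec(ingest_dedup));
    println!("saving:      {:.1}%", relative_saving(ingest_new, ingest_dedup) * 100.0);
}
```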

The finding: re-delivery saves only ~10.8% over a fresh insert — because verify runs before dedup, a duplicate still pays full Ed25519 verification. So a replay / gossip flood is bounded by verify throughput, not a cheap reject. This is deliberate (verify-before-mutation; reordering dedup ahead of verify is an AV-9 suppression oracle — the dedup key is attacker-controllable). The scale levers are the pre-verified relay path (VerifyMode::TrustPreVerified gated on an Edge verify_outcome) and batch verification (CIRISPersist#225) — not dedup-first.
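
The ordering described above can be illustrated with a toy store. This is a sketch under loud assumptions: `Store`, `Outcome`, and `ingest` are hypothetical names, the `HashSet` stands in for the 5-tuple unique constraint, and the `verify` closure stands in for Ed25519 verification. None of this is the CIRISPersist API.

```rust
use std::collections::HashSet;
use std::hash::Hash;

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Inserted,
    Duplicate,
    Rejected,
}

/// Toy store: a set of dedup keys standing in for the 5-tuple unique constraint.
struct Store<K> {
    seen: HashSet<K>,
}

impl<K: Hash + Eq> Store<K> {
    fn new() -> Self {
        Store { seen: HashSet::new() }
    }

    /// Verify first, then attempt the insert. A duplicate key is a no-op,
    /// but it has already paid for verification by the time it is detected.
    fn ingest<V: FnMut() -> bool>(&mut self, key: K, mut verify: V) -> Outcome {
        if !verify() {
            return Outcome::Rejected; // unverified input never touches the store
        }
        if self.seen.insert(key) {
            Outcome::Inserted
        } else {
            Outcome::Duplicate
        }
    }
}

fn main() {
    let mut store = Store::new();
    let mut verifies = 0u32;
    let key = (1u8, 2u8, 3u8, 4u8, 5u8); // placeholder 5-tuple

    let first = store.ingest(key, || { verifies += 1; true });
    let second = store.ingest(key, || { verifies += 1; true });
    println!("{:?} then {:?}, after {} verifications", first, second, verifies);
}
```

Because the dedup check runs only on verified input, an attacker cannot use a forged trace with a chosen key to learn or suppress what the store already holds; the price is that every re-delivery costs a full verification.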

Assumptions

Raw criterion means (36)
| bench | mean |
|---|---|
| av_fanout_plan_50 | 15.405 µs |
| av_frame_e2e/16384 | 7.065 µs |
| av_frame_e2e/262144 | 112.272 µs |
| av_frame_e2e/320 | 0.951 µs |
| av_frame_e2e/4096 | 2.372 µs |
| av_frame_e2e/65536 | 27.241 µs |
| av_frame_halves/open/16384 | 3.080 µs |
| av_frame_halves/open/262144 | 48.985 µs |
| av_frame_halves/open/320 | 0.452 µs |
| av_frame_halves/open/4096 | 1.125 µs |
| av_frame_halves/open/65536 | 12.792 µs |
| av_frame_halves/open_64KiB | 12.361 µs |
| av_frame_halves/seal/16384 | 3.258 µs |
| av_frame_halves/seal/262144 | 56.775 µs |
| av_frame_halves/seal/320 | 0.514 µs |
| av_frame_halves/seal/4096 | 1.214 µs |
| av_frame_halves/seal/65536 | 12.776 µs |
| av_frame_halves/seal_64KiB | 12.802 µs |
| av_mesh_fanout/naive/2 | 6.507 µs |
| av_mesh_fanout/naive/50 | 162.927 µs |
| av_mesh_fanout/naive/8 | 25.978 µs |
| av_mesh_fanout/shared_inner/2 | 4.787 µs |
| av_mesh_fanout/shared_inner/50 | 78.574 µs |
| av_mesh_fanout/shared_inner/8 | 14.020 µs |
| av_rekey/flat_rewrap/2 | 134.567 µs |
| av_rekey/flat_rewrap/50 | 3405.314 µs |
| av_rekey/flat_rewrap/8 | 543.160 µs |
| av_rekey/tree_rewrap/2 | 67.523 µs |
| av_rekey/tree_rewrap/50 | 411.945 µs |
| av_rekey/tree_rewrap/8 | 203.095 µs |
| pqc_kex/classical_initiate | 39.588 µs |
| pqc_kex/classical_respond | 39.104 µs |
| pqc_kex/hybrid_initiate | 68.210 µs |
| pqc_kex/hybrid_respond | 93.655 µs |
| replication_ingest/ingest_dedup | 205.788 µs |
| replication_ingest/ingest_new | 230.610 µs |