The CEWP crux use case — stream and store at massive scale — measured and interpreted. Receivers are charged open-only; every blended figure names its model. Raw criterion numbers at the bottom. Absolute µs are host-relative (this run's runner); the ratios and ceilings travel.
PQC group video is effectively free at steady state. the steady-state frame path runs at up to 2.2 GiB/s (AES-256-GCM — quantum-irrelevant at 256-bit); the post-quantum tax is ~83 µs once per peer-link, never per frame.
Per frame: inner AES-256-GCM (E2E epoch DEK) → wire codec → outer AES-256-GCM (per-Link transit
key). seal is the sender's cost; open is the receiver's (it never re-seals).
| Frame | size | seal→wire→open | seal (send) | open (recv) | throughput |
|---|---|---|---|---|---|
| Opus voice frame (~20 ms @128 kbps) | 320 B | 0.95 µs | 0.51 µs | 0.45 µs | 0.31 GiB/s |
| 720p inter-frame (low motion) | 4 KiB | 2.37 µs | 1.21 µs | 1.12 µs | 1.61 GiB/s |
| 720p inter-frame (typical) | 16 KiB | 7.07 µs | 3.26 µs | 3.08 µs | 2.16 GiB/s |
| 1080p inter / 720p keyframe | 64 KiB | 27.24 µs | 12.78 µs | 12.79 µs | 2.24 GiB/s |
| 1080p keyframe | 256 KiB | 112.27 µs | 56.78 µs | 48.99 µs | 2.17 GiB/s |
A 50-person room costs ~0.5% of one core to receive
…under a stated model: 720p, 30 fps, GOP=30 (1×64 KiB keyframe + 29×16 KiB inter per second), receiver opens 49 streams. Range across motion: ~0.17% (low-motion 4 KiB) to ~0.45% (typical 16 KiB), receive-only. Publishing one stream to the room is ~0.24% of a core. Crypto is nowhere near the bottleneck — bandwidth and the network are.
The only place the post-quantum cost can live: bulk frames are AES-256-GCM (already quantum-fine), so the PQ cost is structurally confined to the KEM at session setup. Per-frame PQC cost is zero.
| handshake | full (initiate+respond) |
|---|---|
| Hybrid X25519 + ML-KEM-768 (PQ-safe) | 161.9 µs |
| Classical X25519 only | 78.7 µs |
| ML-KEM-768 tax | +83.2 µs, once per peer-link |
naive = N× full seal_av_chunk. shared_inner = v3.7.0's
seal_av_inner once + seal_av_outer per Link (CIRISEdge#122) — wire-identical,
inner AEAD done once.
| room (N) | naive | shared-inner | speedup |
|---|---|---|---|
| 2 | 6.51 µs | 4.79 µs | 1.36× |
| 8 | 25.98 µs | 14.02 µs | 1.85× |
| 50 | 162.93 µs | 78.57 µs | 2.07× |
EpochDek has no ratchet; epoch rotation is owned out-of-module). These
numbers project the intended cost from the real hybrid-KEM key_grant wrap. They
are the answer to "is rekey-on-membership-change affordable?" — not a measurement of shipped code.Each join/leave rewraps the fresh epoch DEK to the member-set. flat = O(N) (the
unicast-mesh baseline, #129). tree = O(log N) (the TreeKEM optimization, needs multicast,
#66). The outer per-Link key is not re-KEX'd on churn (KEX is one-shot per session).
| room (N) | flat O(N) / delta | tree O(log N) / delta | tree win |
|---|---|---|---|
| 2 | 0.135 ms | 0.068 ms | 2.0× |
| 8 | 0.543 ms | 0.203 ms | 2.7× |
| 50 | 3.405 ms | 0.412 ms | 8.3× |
A 50-room paying the flat baseline spends ~3.41 ms per membership delta — ~10.2% of a single 33 ms frame, or ~0.34% of a core at one join/leave per second. Affordable even unoptimized; the tree (#66) removes it as a concern. Steady-state video is untouched by churn.
What a node pays per replicated trace: Ed25519 verify → decompose → persist (5-tuple
ON CONFLICT DO NOTHING dedup — not a content-hash lookup).
| path | per trace | throughput |
|---|---|---|
| new trace (insert — replication intake) | 230.61 µs | 4,336 traces/s/core |
| re-delivery (dedup — anti gossip-loop) | 205.79 µs | 4,859 /s/core |
The finding: re-delivery saves only ~10.8% over a fresh
insert — because verify runs before dedup, a duplicate still pays full Ed25519 verification. So a
replay / gossip flood is bounded by verify throughput, not a cheap reject. This is deliberate
(verify-before-mutation; reordering dedup ahead of verify is an AV-9 suppression oracle — the dedup key
is attacker-controllable). The scale levers are the pre-verified relay path
(VerifyMode::TrustPreVerified gated on an Edge verify_outcome) and batch
verification (CIRISPersist#225) — not dedup-first.
| bench | mean |
|---|---|
| av_fanout_plan_50 | 15.405 µs |
| av_frame_e2e/16384 | 7.065 µs |
| av_frame_e2e/262144 | 112.272 µs |
| av_frame_e2e/320 | 0.951 µs |
| av_frame_e2e/4096 | 2.372 µs |
| av_frame_e2e/65536 | 27.241 µs |
| av_frame_halves/open/16384 | 3.080 µs |
| av_frame_halves/open/262144 | 48.985 µs |
| av_frame_halves/open/320 | 0.452 µs |
| av_frame_halves/open/4096 | 1.125 µs |
| av_frame_halves/open/65536 | 12.792 µs |
| av_frame_halves/open_64KiB | 12.361 µs |
| av_frame_halves/seal/16384 | 3.258 µs |
| av_frame_halves/seal/262144 | 56.775 µs |
| av_frame_halves/seal/320 | 0.514 µs |
| av_frame_halves/seal/4096 | 1.214 µs |
| av_frame_halves/seal/65536 | 12.776 µs |
| av_frame_halves/seal_64KiB | 12.802 µs |
| av_mesh_fanout/naive/2 | 6.507 µs |
| av_mesh_fanout/naive/50 | 162.927 µs |
| av_mesh_fanout/naive/8 | 25.978 µs |
| av_mesh_fanout/shared_inner/2 | 4.787 µs |
| av_mesh_fanout/shared_inner/50 | 78.574 µs |
| av_mesh_fanout/shared_inner/8 | 14.020 µs |
| av_rekey/flat_rewrap/2 | 134.567 µs |
| av_rekey/flat_rewrap/50 | 3405.314 µs |
| av_rekey/flat_rewrap/8 | 543.160 µs |
| av_rekey/tree_rewrap/2 | 67.523 µs |
| av_rekey/tree_rewrap/50 | 411.945 µs |
| av_rekey/tree_rewrap/8 | 203.095 µs |
| pqc_kex/classical_initiate | 39.588 µs |
| pqc_kex/classical_respond | 39.104 µs |
| pqc_kex/hybrid_initiate | 68.210 µs |
| pqc_kex/hybrid_respond | 93.655 µs |
| replication_ingest/ingest_dedup | 205.788 µs |
| replication_ingest/ingest_new | 230.610 µs |