Developer Update #2

Mar 10, 2026
Welcome back to Hyve protocol updates. In [Update #1](https://hyve.com/blog/update-1) we introduced the overall architecture and previewed our Hadamard ZODA-style erasure coding scheme. We promised a deeper dive into the write path: how blobs move from client submission to distributed storage, and how we secure the whole thing.
This is that dive. We'll walk through the encoding pipeline, how composable encoding removes the single-encoder bottleneck, how the protocol stays secure when nodes misbehave, and why the math backs it up. Benchmarks are still being finalized and will land in Update #3 alongside the read path and recovery flow.
Quick Recap: Why Encoding Matters
Most DA systems require a single encoder—a block proposer or sequencer—to collect *all* blobs, batch-encode them, and produce validity proofs in one shot. This creates a centralization chokepoint: total bandwidth to feed the encoder grows linearly with node count, even though storage per node decreases. We call this the bandwidth-storage paradox.
Our solution: composable erasure coding. The foundation is the [ZODA](https://eprint.iacr.org/2025/034) construction from Bain Capital Crypto, which builds erasure codes where sampled rows are self-verifying—no KZG, no trusted setup. Our implementation builds on the open-source [Commonware](https://github.com/commonwarexyz/monorepo) libraries for field arithmetic, NTT, and Merkle primitives. What we've added on top is composability: every node encodes its own blobs independently, and the network assembles a verifiable batch from the pieces after the fact. No coordinator, no mempool, no sequencer bottleneck.
The Write Path
When a client submits a blob, the entire encoding pipeline runs locally on the receiving node. No network round-trips, no waiting on other blobs.
Bytes → Field Elements → Matrix. Raw bytes are packed into elements of the Goldilocks prime field (`p = 2^64 − 2^32 + 1`), chosen because its structure admits fast NTT (roots of unity for any power-of-two domain) and its arithmetic maps to native 64-bit CPU instructions. The field elements are arranged into a matrix and transposed so each column defines a polynomial in coefficient form. The row count is fixed by network parameters (node count and coding rate); the column count absorbs the blob size. This means **every blob, regardless of size, produces polynomials over the same evaluation domain**—the structural property that makes composition possible.
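As a rough illustration (our own toy packing scheme, not the protocol's codec—the names `bytes_to_elements` and `to_matrix` are ours), the byte-to-matrix step might look like:

```python
# Illustrative sketch: pack raw bytes into Goldilocks field elements and
# lay them out as a fixed-height, column-major matrix.
P = 2**64 - 2**32 + 1  # Goldilocks prime

def bytes_to_elements(data: bytes) -> list[int]:
    # Pack 7 bytes per element so every value is < 2^56 < p, avoiding any
    # modular reduction on input. (A real codec would use a reversible,
    # padding-aware packing.)
    return [int.from_bytes(data[i:i+7], "little") for i in range(0, len(data), 7)]

def to_matrix(elems: list[int], data_rows: int) -> list[list[int]]:
    # Row count is fixed by network parameters; column count absorbs blob size.
    cols = -(-len(elems) // data_rows)  # ceiling division
    padded = elems + [0] * (data_rows * cols - len(elems))
    # Column-major layout: each column is one polynomial in coefficient form.
    return [[padded[c * data_rows + r] for c in range(cols)] for r in range(data_rows)]

blob = b"hello hyve, this is a demo blob payload"
M = to_matrix(bytes_to_elements(blob), data_rows=4)  # 4 x 2 toy matrix
```

Because the row count never changes, a tiny blob and a huge blob differ only in how many columns they occupy—which is exactly what makes later composition a simple concatenation.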
**Reed-Solomon Encoding via NTT.** Each column polynomial of degree `< data_rows` is evaluated at `encoded_rows` points via a forward Number Theoretic Transform. We get more evaluation points than coefficients, so any `data_rows` of the `encoded_rows` evaluations suffice to reconstruct the original. The NTT runs in-place in `O(n log n)` and parallelizes across columns with Rayon.
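A minimal sketch of this step, using a naive `O(n^2)` evaluation loop for clarity rather than a fast in-place NTT (and assuming 7 as the Goldilocks multiplicative group generator, as used in common Goldilocks implementations):

```python
# Reed-Solomon step: evaluate a column polynomial on a larger power-of-two
# domain of roots of unity. Production code would use an O(n log n) NTT.
P = 2**64 - 2**32 + 1   # Goldilocks: p - 1 = 2^32 * (2^32 - 1)
G = 7                   # generator of the multiplicative group (assumed)

def root_of_unity(n: int) -> int:
    # Primitive n-th root of unity for any power-of-two n up to 2^32.
    assert (P - 1) % n == 0
    return pow(G, (P - 1) // n, P)

def encode_column(coeffs: list[int], encoded_rows: int) -> list[int]:
    # Evaluate the degree < len(coeffs) polynomial at encoded_rows points.
    w = root_of_unity(encoded_rows)
    return [sum(c * pow(w, i * j, P) for j, c in enumerate(coeffs)) % P
            for i in range(encoded_rows)]

data = [3, 1, 4, 1]              # one column, degree < 4
evals = encode_column(data, 8)   # rate-1/2 encoding: 8 evaluations
```

Running the inverse transform over all 8 evaluations recovers the zero-padded coefficient vector, which is the redundancy the erasure code exploits.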
**Commitment.** Each row of the encoded matrix is hashed into a Merkle Mountain Range (MMR). A Fiat-Shamir transcript binds the blob's identity: `σ = Transcript(namespace ‖ blob_size ‖ mmr_root)`. This commitment is deterministic and computed *before* any randomness is derived—critical for security.
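To make the binding concrete, here is a toy sketch using a plain binary Merkle tree in place of the MMR (an illustrative simplification; the helper names and transcript layout are ours, not the protocol's):

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    # Simplified stand-in for the MMR: hash rows, then fold pairwise.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                   # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def transcript(namespace: bytes, blob_size: int, root: bytes) -> bytes:
    # sigma = H(namespace || blob_size || mmr_root): fully deterministic,
    # computed before any verification randomness exists.
    return hashlib.sha256(namespace + blob_size.to_bytes(8, "little") + root).digest()

rows = [b"row-0", b"row-1", b"row-2", b"row-3"]
sigma = transcript(b"demo-namespace", 39, merkle_root(rows))
```

Any change to a single row changes the root, which changes `σ`—and with it every piece of randomness derived downstream.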
**Checksum.** Using `σ` as a PRG seed, we derive a random checking matrix `R` of shape `(cols × t)` where `t` is a small number of column samples (tuned for 126-bit security). The checksum `C = M · R` is a compact fingerprint of the full data matrix. Because `R` is derived from the commitment (which is derived from the encoded data), an adversary who wants to cheat must produce encoded data that commits to an `R` under which their corrupted shard still looks valid—the Fiat-Shamir binding prevents this.
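A sketch of the derivation (using SHA-256 in counter mode as a stand-in PRG—the real expansion and any rejection sampling may differ):

```python
import hashlib

P = 2**64 - 2**32 + 1  # Goldilocks prime

def derive_R(sigma: bytes, cols: int, t: int) -> list[list[int]]:
    # Expand sigma into a (cols x t) checking matrix of field elements.
    vals = [int.from_bytes(hashlib.sha256(sigma + i.to_bytes(4, "little")).digest()[:8],
                           "little") % P
            for i in range(cols * t)]
    return [vals[i * t:(i + 1) * t] for i in range(cols)]

def checksum(M: list[list[int]], R: list[list[int]]) -> list[list[int]]:
    # C = M . R over F_p: one t-wide fingerprint per row of M.
    t = len(R[0])
    return [[sum(M[r][c] * R[c][k] for c in range(len(R))) % P for k in range(t)]
            for r in range(len(M))]

M = [[1, 2], [3, 4], [5, 6]]          # toy 3 x 2 data matrix
R = derive_R(b"\x01" * 32, cols=2, t=2)
C = checksum(M, R)                    # 3 x 2 checksum
```

The key property is that `C` is linear in `M`, which is what makes the composition step in the next section work.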
**Shard Extraction.** The encoded matrix is sliced into `N` shards (one per node), each packaged with an MMR multi-inclusion proof. The encoding node gossips each shard to its designated storage node. Total outbound bandwidth is proportional to the blob size—*not* to the number of nodes.
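A toy sketch of the slicing (round-robin row assignment is our illustrative choice, and the MMR multi-inclusion proofs are omitted here):

```python
# Slice an encoded matrix's rows into N shards, one per storage node.
# Each shard keeps its row indices so a verifier can evaluate the
# checksum at exactly those positions later.
def extract_shards(encoded: list[list[int]], n_nodes: int) -> list[dict]:
    shards = []
    for k in range(n_nodes):
        idx = list(range(k, len(encoded), n_nodes))
        shards.append({"node": k, "rows": idx, "data": [encoded[i] for i in idx]})
    return shards

encoded = [[r * 10 + c for c in range(3)] for r in range(8)]  # toy 8 x 3 matrix
shards = extract_shards(encoded, n_nodes=4)
```

Note that the shards together contain exactly the encoded matrix—the encoder sends each node its slice once, so outbound traffic tracks blob size, not node count.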
Composition: The Part That Changes Everything
After individual encoding, each storage node holds shard `k` from potentially many different blobs. Composition is local and incremental:
- Shards are horizontally concatenated. Blob A's shard has 5 columns, blob B's has 8—the composed shard has 13 columns. Row count stays the same because every blob shares the same NTT domain.
- Checksums are added elementwise over the field, which works because checksums are linear:
`C_combined = M_A · R_A + M_B · R_B`.
- Checking matrices are vertically stacked. When multiplied by the concatenated shard, the block structure decomposes exactly into the sum of individual checksum checks.
The result: a verifier checks a composed shard from hundreds of blobs with a single matrix multiply, and the result is mathematically equivalent to checking each blob individually. No node ever needed to see all the blobs. No coordinator ran a batch encoding pass.
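The block decomposition can be checked numerically. This toy sketch (dimensions and values are ours) verifies that multiplying the concatenated shard by the stacked checking matrix equals the sum of the per-blob checksums:

```python
# Numerical check of the composition identity:
#   [M_A | M_B] . [R_A; R_B] == M_A . R_A + M_B . R_B  over F_p
P = 2**64 - 2**32 + 1

def matmul(A, B):
    return [[sum(A[r][c] * B[c][k] for c in range(len(B))) % P
             for k in range(len(B[0]))] for r in range(len(A))]

def madd(A, B):
    return [[(x + y) % P for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

M_A = [[1, 2, 3], [4, 5, 6]]    # blob A's shard: 2 rows x 3 cols
M_B = [[7, 8], [9, 10]]         # blob B's shard: 2 rows x 2 cols
R_A = [[11], [12], [13]]        # (3 x 1) checking matrix for A
R_B = [[14], [15]]              # (2 x 1) checking matrix for B

M_cat = [ra + rb for ra, rb in zip(M_A, M_B)]   # hstack shards
R_stk = R_A + R_B                               # vstack checking matrices

assert matmul(M_cat, R_stk) == madd(matmul(M_A, R_A), matmul(M_B, R_B))
```

One multiply over the composed shard, same verdict as checking each blob on its own.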
Security Under Byzantine Conditions
A decentralized encoding scheme must hold up when nodes lie. Hyve catches cheating encoders through three independent layers:
**Merkle Inclusion.** Every shard carries an MMR multi-inclusion proof. A verifier hashes each row and checks against the committed root. Swapping a single row breaks the proof. Soundness is bounded by the collision resistance of SHA-256 (128-bit security).
**Checksum Verification.** An adversary could commit to a *correctly structured* Merkle tree over an incorrectly encoded matrix. The checksum catches this: the verifier derives `R` from the public commitment, computes `shard · R`, and compares against the checksum evaluated at the shard's row indices. If the shard contains any error `Δ`, the check passes only if `Δ · R = 0`. Since `R` is random and derived *after* the adversary committed (Fiat-Shamir), the probability of any non-zero error escaping is at most `|F_p|^{-t}` per row—with `t ≥ 2` over Goldilocks, that's `2^{-126}`.
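A sketch of the verifier's side of this check, reusing the same SHA-256-counter PRG stand-in from the checksum sketch above (our simplification, not the production derivation):

```python
import hashlib

P = 2**64 - 2**32 + 1

def derive_R(sigma: bytes, cols: int, t: int) -> list[list[int]]:
    # Re-derive the checking matrix from the public commitment alone.
    vals = [int.from_bytes(hashlib.sha256(sigma + i.to_bytes(4, "little")).digest()[:8],
                           "little") % P
            for i in range(cols * t)]
    return [vals[i * t:(i + 1) * t] for i in range(cols)]

def row_check(row: list[int], R: list[list[int]], checksum_row: list[int]) -> bool:
    # Accept iff row . R matches the committed checksum at this row index.
    t = len(R[0])
    return [sum(x * R[c][k] for c, x in enumerate(row)) % P
            for k in range(t)] == checksum_row

sigma = b"\x02" * 32                # stands in for the Fiat-Shamir output
R = derive_R(sigma, cols=4, t=2)
row = [10, 20, 30, 40]
C_row = [sum(x * R[c][k] for c, x in enumerate(row)) % P for k in range(2)]

assert row_check(row, R, C_row)     # honest row passes
bad = [10, 20, 30, 41]              # single-element error Delta
assert not row_check(bad, R, C_row) # caught unless Delta . R = 0
```

The corrupted row slips through only if its error vector lies in the kernel of `R`—which the adversary cannot steer, because `R` was fixed by their own commitment.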
**Sampling Security.** Each shard contains `s` samples from the evaluation domain. The parameter `s` is chosen so the probability of an adversary hiding an inconsistency across all samples is at most `2^{-126}`.
**Composed Security.** When blobs are composed, each blob's checking matrix is independently derived from its own commitment, so forgery events across blobs are independent. A union bound gives overall soundness at most `B · ε` for `B` blobs where `ε < 2^{-125}`—astronomically small for any practical batch size.
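A back-of-envelope check of both parameters (our illustrative numbers, not the protocol's constants; this uses the conservative with-replacement sampling bound):

```python
import math

# If an adversary corrupts a delta-fraction of encoded rows, s uniform
# samples all miss the corruption with probability (1 - delta)^s.
# Solve for the smallest s that pushes this below 2^-bits.
def samples_needed(delta: float, bits: int = 126) -> int:
    return math.ceil(bits / -math.log2(1 - delta))

s = samples_needed(0.5)            # half the rows corrupted

# Union bound over a composed batch: B blobs, each with soundness error
# below 2^-125, give total error below B * 2^-125.
B = 2**20
remaining_bits = 125 - math.log2(B)
```

Even a batch of a million blobs leaves over 100 bits of soundness, which is why the union bound is harmless in practice.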
The Full Picture
```
Client submits blob
        │
        ▼
[Receiving Node — all local, no coordination]
        │
        ├─ Pack bytes → Goldilocks field elements
        ├─ Matrix layout → Forward NTT (Reed-Solomon)
        ├─ MMR commitment → Fiat-Shamir transcript → σ
        ├─ Derive R from σ → Checksum C = M · R
        └─ Slice into N shards + Merkle proofs → Gossip
        │
        ▼
[Each Storage Node k — incremental, as blobs arrive]
        │
        ├─ hstack shard onto composed shard
        ├─ Add checksum to composed checksum
        └─ vstack checking matrix → Ready for verification
```
The write path scales horizontally: adding nodes increases total encoding throughput without increasing per-node bandwidth. Storage per node decreases as `1/N`, bandwidth stays constant. No proposer, no batching window, no single point of failure—126 bits of computational soundness, no trusted setup.
Acknowledgments
The encoding scheme described here builds on the foundational work of **Bain Capital Crypto**, whose [ZODA paper](https://eprint.iacr.org/2025/034) introduced zero-overhead data availability via self-verifying erasure codes, and on the open-source **[Commonware](https://github.com/commonwarexyz/monorepo)** libraries. They gave us the building blocks; composability is what we built with them.
What's Next
In **Update #3** we'll cover the read side: recovery from shard subsets, selective blob reconstruction in composed batches, and encoding/recovery benchmarks. We'll also dive into our Symbiotic integration and how staking and slashing tie into the data availability guarantees described here.
Questions? Drop them in our community channels—we'll address them in the next update.
Disclaimer:
This content is provided for informational and educational purposes only and does not constitute legal, business, investment, financial, or tax advice. You should consult your own advisers regarding those matters.
References to any protocols, projects, or digital assets are for illustrative purposes only and do not represent any recommendation or offer to buy, sell, or participate in any activity involving digital assets or financial products. This material should not be relied upon as the basis for any investment or network participation decision.
Hyve and its contributors make no representations or warranties, express or implied, regarding the accuracy, completeness, or reliability of the information provided. Digital assets and decentralized networks operate within evolving legal and regulatory environments; such risks are not addressed in this content.
All views and opinions expressed are those of the authors as of the date of publication and are subject to change without notice.
