Scalability on Mir

Predicate Labs · 2021-01-25

TLDR: Mir uses zero-knowledge proofs to scale in two ways. First, the application layer can be abstracted away from validators. This simplifies transaction execution so throughput isn't constrained by bandwidth, storage, I/O, or compute costs.

Second, throughput scales with each new node that joins the network. This creates a flywheel effect:¹ increased transaction (and fee) volume incentivizes more nodes to join the network, increasing throughput and allowing applications to scale further.

Mir's approach to scalability

Mir uses zero-knowledge proofs to radically simplify transaction execution, improving throughput and scalability² without sacrificing decentralization, security, or usability.

On Mir, the application layer is abstracted away from validation. This makes the current state 1000x smaller for validators,³ significantly reduces transaction size, and makes transaction execution orders of magnitude less expensive (and there's no gas!).⁴

Recursive proofs allow the complex part of a transaction on Mir (verifying a transaction proof and accessing state) to be verified by just a single validator. Mir scales with every node that joins the network, without sacrificing security, decentralization, or usability.

Execution on existing chains

Let's look at how transactions work on every existing blockchain that supports smart contracts. Developers build applications that run as bytecode on a virtual machine, like the EVM. Validators execute transactions by running the bytecode associated with an application in the virtual machine.

Transactions in an Ethereum block

This model limits privacy and presents bandwidth, storage, and compute costs. Accessing and updating application state becomes a bottleneck for validators. Validators must re-execute complex computations directly, limiting application complexity and increasing cost.

Simplifying execution

On Mir, all application logic is executed off-chain. For validators, there's no notion of tokens, application logic or state storage, or even a standardized signature scheme.⁵ Transactions simply create and consume generic records⁶ and are verified with proofs.

This is preferable for developers, who can use any programming language or virtual machine to perform off-chain computations, without worrying about gas. Gas-cost optimization is typically the most difficult and time-consuming part of building a decentralized application.

A transaction on Mir

Mir's execution environment supports the range of applications running on Ethereum, but verifying complex computations is abstracted away in the ZKP. Transactions on Mir are maximally simple: they can perform two operations, adding and removing a record.

On Mir, validators don't store application state, just a single bit for every record, which we can compress further to ~5 bits per active record (we call this structure the Liveness Mask), which is extremely compact and fits in memory, even for tens of billions of active records.⁷ Applying transactions to the Liveness Mask is incredibly cheap and consists of flipping a bit for each consumed record.

Executing transactions in a block on Mir

Scaling with recursion

Even though SNARKs are succinctly verifiable, they are much more expensive to verify than most transactions on Ethereum. This is a major drawback for systems like ZEXE, because it means that throughput is even lower than on Ethereum.

Fortunately, Mir uses recursive proofs to parallelize transaction proof verification. The expensive part of verifying a transaction (verifying a proof and accessing state) is performed by just a single node generating a recursive proof (more detail). The state transitions that every validator must perform to update the Commitment Set are extremely cheap, so throughput isn't constrained by compute cost.⁸

Validating a block on Mir

Mir is also bandwidth-efficient. Nodes only need to download an estimated average of ~10 bytes per transaction, because the set of pointers to consumed records and created record commitments is compact.⁹ Therefore, throughput on Mir isn't limited by bandwidth, storage, memory, or disk I/O.

Throughput is limited by the aggregate compute available to generate recursive proofs. Fortunately, compute is cheap and we've made recursive proof generation efficient.¹⁰ This gives Mir a flywheel mechanism for scaling throughput: increased fee volume attracts more compute, which allows applications to scale and increase transaction volume, all without sacrificing decentralization, security, or usability.

Tradeoffs

What's the catch? Mir applications run in an expressive execution environment, but applications must be compiled to arithmetic circuits. This presents negligible overhead for most applications. For complex applications, proving can be delegated to powerful proving nodes.

More importantly, this prevents developers from using existing tooling developed for Ethereum. Fortunately, developer-friendly tools for building circuits already exist, and improving developer tools and programming languages is a problem that is well understood. We're hard at work on developer tooling for Mir that will make this even easier and more efficient.

We argue that Mir is actually more developer-friendly than other chains. In other scalable blockchain designs, such as Ethereum 2.0, the sharding model leaks into the developer experience via mechanisms like contract yanking. Mir avoids this complexity by fully replicating record state (in a highly compact form). Additionally, developers on Mir never need to consider gas costs, which has traditionally been the most time-consuming part of building a decentralized app.

Forward.

TCP/IP provided the foundation for the Internet revolution because it created a feedback loop between bandwidth providers and application developers, and abstracted the application layer from providers. As application usage grew, bandwidth providers were able to bootstrap increased capacity, allowing applications to scale even further and repeating the cycle.

Mir is similar. It's a minimal protocol that separates application execution from validation, radically improving scalability. Recursive proofs allow throughput on Mir to scale with each new node. Increased transaction volume incentivizes new validators to join the network, paving the way for a new generation of scalable decentralized applications.

Eventually, re-executing bytecode on a blockchain to run smart contracts will be considered archaic. Mir offers a better future: developers can build Internet-scale (and private!) applications without fighting for blockspace.

Notes

This framing is influenced by Ali Yahya's writing on thin protocols and the flywheel effect in TCP/IP.

We follow these definitions, where scalability refers to the number of transactions that a node can process, and throughput refers to the number of transactions that can be processed by the network as a whole.

This is based on a comparison with Ethereum, where Etherscan reports 130gb using Geth, with ~50gb in the current state, and ~80 million addresses, yielding 625 bytes/active account, vs ~5 bits/active account for Mir.

⁴

As we'll see, compute costs are negligible for updates to the Liveness Mask, current state fits in memory with tens of billions of active records, and bandwidth costs will average ~10 bytes/transaction, so even with 1m tps, bandwidth costs will be negligible.

⁵

Verifying consensus obviously requires tokens, but is handled separately from transaction validation.

⁶

A record is the base unit of state on Mir, a commitment to an application's state and logic. It can be thought of as a snapshot of an account's state on Ethereum.

⁷

The 1000x improvement in state size is based on a run-length encoding of the sparse bit vector. This reduces the amount of state that a new validator will need to download to sync, but in practice, it might be more efficient for a validator to use the decoded form of the Liveness Mask in certain cases or for certain subtrees, slightly reducing the space savings.

⁸

For Mir, or any system that uses arithmetic hash functions, hashing can be a bottleneck in the system. But hashing is only necessary to check that the sets of consumed and created records are consistent with the public input in the recursive proof, and it's fairly straightforward to design the system so that each node only needs to check a subset of each set.

⁹

Transaction size increases slightly to ensure data availability for newly-created records (32 bytes/created record commitment). However, not every validator needs to download every newly-created record.

¹⁰

We've made significant improvements on recursive proof efficiency, both architecturally and in the cryptographic primitives that we're using. We're currently working on speeding up recursive proofs by as much as 5x, optimistically targeting < 1s proof times.