Special thanks to Justin Drake, Francesco, Hsiao-wei Wang, @antonttc and Georgios Konstantopoulos.
Initially, there were two scaling strategies in Ethereum's roadmap. One (see an early paper from 2015) was 'sharding': each node only needs to verify and store a small fraction of transactions, rather than verifying and storing every transaction in the chain. This is also how any other peer-to-peer network (such as BitTorrent) works, so we could certainly make blockchains work the same way. The other was Layer 2 protocols: networks built on top of Ethereum that fully benefit from its security while keeping most data and computation off-chain. 'Layer 2 protocols' meant state channels in 2015, Plasma in 2017, and then rollups in 2019. Rollups are more powerful than state channels or Plasma, but they require a large amount of on-chain data bandwidth. Fortunately, by 2019 sharding research had solved the problem of verifying 'data availability' at scale. As a result, the two paths merged, and we got the rollup-centric roadmap, which remains Ethereum's scaling strategy today.
The Surge, 2023 Roadmap Edition
The rollup-centric roadmap proposes a simple division of labor: Ethereum L1 focuses on being a robust and decentralized base layer, while L2s take on the task of helping the ecosystem scale. This pattern is ubiquitous in society: the court system (L1) does not exist to pursue ultra-high speed and efficiency, but to protect contracts and property rights, while entrepreneurs (L2s) build on top of that solid base layer and take humanity toward Mars, whether literally or metaphorically.
This year, the rollup-centric roadmap has reached important milestones: with the launch of EIP-4844 blobs, Ethereum L1's data bandwidth has increased significantly, and multiple Ethereum Virtual Machine (EVM) rollups have reached Stage 1. Each L2 exists as a 'shard' with its own internal rules and logic, and the diversity and heterogeneity of shard implementations is now a reality. But as we have seen, this path also faces some unique challenges. Our task now is to finish the rollup-centric roadmap and solve these problems, while preserving the robustness and decentralization that make Ethereum L1 special.
The Surge: Key Objectives
Reach 100,000+ TPS on Ethereum through L2s;
Preserve the decentralization and robustness of L1;
Ensure at least some L2s fully inherit Ethereum's core properties (trustlessness, openness, censorship resistance);
Make the Ethereum ecosystem feel like one unified ecosystem, not 34 different blockchains.
Contents of this chapter
Scalability Trilemma
Further Progress on Data Availability Sampling
Data Compression
Generalized Plasma
Mature L2 Proof Systems
Cross-L2 Interoperability Improvements
Extend Execution on L1
Scalability Trilemma
The scalability trilemma is an idea proposed in 2017, which argues that there is a tension between three properties of a blockchain: decentralization (more specifically, low cost to run a node), scalability (handling a large number of transactions), and security (an attacker would need to compromise a large fraction of the nodes in the network to make even a single transaction fail).
It is worth noting that the trilemma is not a theorem, and the post introducing it did not come with a mathematical proof. It does give a heuristic mathematical argument: if a decentralization-friendly node (e.g., a consumer laptop) can verify N transactions per second, and you have a chain that processes k*N transactions per second, then either (i) each transaction is only seen by 1/k of the nodes, which means an attacker can push through a malicious transaction by compromising only a few nodes, or (ii) your nodes have to be powerful, and your chain is not decentralized. The point of the argument is not to show that breaking the trilemma is impossible; rather, it is to show that doing so is hard, and requires in some way stepping outside the framework of thinking that the argument implies.
For years, some high-performance chains have claimed to solve the trilemma without doing anything fundamentally different at the architectural level, usually by applying software engineering tricks to optimize the node. This is always misleading: running a node on such chains is far more difficult than running a node on Ethereum. This article explores why that is, and why L1 client software engineering alone cannot scale Ethereum.
However, the combination of data availability sampling and SNARKs does solve the trilemma: it allows a client to verify that some amount of data is available, and that some number of computational steps were executed correctly, while downloading only a small portion of that data and running only a small amount of computation. SNARKs are trustless. Data availability sampling has a nuanced few-of-N trust model, but it preserves the fundamental property of non-scalable chains: even a 51% attack cannot force bad blocks to be accepted by the network.
Another way to solve the trilemma is the Plasma architecture, which uses clever techniques to push the responsibility of watching data availability onto users in an incentive-compatible way. Back in 2017-2019, when all we had to scale computation was fraud proofs, Plasma was very limited in what it could securely execute. But with the widespread adoption of SNARKs (succinct non-interactive arguments of knowledge), the Plasma architecture has become far more viable for a wider range of use cases than before.
Further Progress on Data Availability Sampling
What problem are we solving?
Since the Dencun upgrade went live on March 13, 2024, Ethereum has had approximately three blobs of about 125 kB per 12-second slot, or about 375 kB of data availability bandwidth per slot. Assuming transaction data is posted on-chain directly, and an ERC20 transfer takes about 180 bytes, the maximum TPS of rollups on Ethereum is roughly 173.6 TPS.
If we add Ethereum calldata (theoretical max: 30 million gas per slot / 16 gas per byte = 1,875,000 bytes per slot), this becomes 607 TPS. With PeerDAS, the number of blobs may increase to 8-16, which would provide 463-926 TPS from blob data.
This is a significant improvement over plain Ethereum L1, but it is not enough. We want much more scalability. Our mid-term goal is 16 MB per slot, which, combined with improvements in rollup data compression, would give ~58,000 TPS.
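Spelled out, the throughput numbers above are simple byte-budget arithmetic. The block below uses the same assumptions as the text (~125 kB usable per blob, 180 bytes per rollup ERC20 transfer, 12-second slots); it is illustrative only.

```python
# Back-of-the-envelope versions of the numbers above.
SLOT_SECONDS = 12
TRANSFER_BYTES = 180          # approximate size of an ERC20 transfer in a rollup

dencun_bandwidth = 3 * 125_000        # 3 blobs per slot since Dencun
print(dencun_bandwidth / SLOT_SECONDS / TRANSFER_BYTES)   # ≈ 173.6 TPS

target_bandwidth = 16_000_000         # mid-term goal: 16 MB per slot
print(target_bandwidth / SLOT_SECONDS / TRANSFER_BYTES)   # ≈ 7,407 TPS before compression
# The ~58,000 TPS figure additionally assumes the data compression gains
# discussed later in this chapter.
```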
What is it? How does it work?
PeerDAS is a relatively simple implementation of '1D sampling'. In Ethereum, every blob is a degree-4096 polynomial over a 253-bit prime field. We broadcast shares of the polynomial, where each share consists of 16 evaluations at 16 adjacent coordinates taken from a total set of 8192 coordinates. Any 4096 of the 8192 evaluations (with the currently proposed parameters: any 64 of the 128 possible samples) can recover the blob.
PeerDAS works by having each client listen on a small number of subnets, where the i-th subnet broadcasts the i-th sample of every blob, and additionally request blobs it needs on other subnets by asking peers in the global p2p network (who listen on different subnets). A more conservative version, SubnetDAS, uses only the subnet mechanism, without the extra layer of asking peers. The current proposal is for nodes participating in proof of stake to use SubnetDAS, and for other nodes (i.e., clients) to use PeerDAS.
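To see why a handful of random samples is enough, here is a toy simulation of the sampling argument (not the real protocol; parameters and names are illustrative): if an attacker withholds more than half of the extended data, the blob cannot be reconstructed at all, and each uniformly random sample is then served with probability below 1/2, so k successful samples bound the chance of being fooled by roughly 2^-k.

```python
import random

def node_accepts(available: set[int], total_samples: int, k: int = 16) -> bool:
    """A node draws k random sample indices and accepts only if all are served."""
    return all(random.randrange(total_samples) in available for _ in range(k))

# An attacker serving just under half of the samples fools a 16-sample node
# with probability at most about:
print(0.5 ** 16)   # ≈ 1.5e-5 per attempt
```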
In theory, we can scale 1D sampling quite far: if we increase the maximum blob count to 256 (with a target of 128), we reach the 16 MB goal, and data availability sampling would cost each node 16 samples * 128 blobs * 512 bytes per sample per blob = 1 MB of data bandwidth per slot. This is just barely within tolerance: it is feasible, but it means bandwidth-constrained clients cannot sample. We can optimize this somewhat by decreasing the blob count and increasing blob size, but that makes reconstruction more expensive.
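The per-node sampling bandwidth quoted above, spelled out (a sample is 16 adjacent evaluations of 32 bytes each, i.e. 512 bytes):

```python
samples_per_node = 16
blobs_per_slot = 128          # the target in the hypothetical 256-max configuration
bytes_per_sample = 16 * 32    # 16 evaluations of 32 bytes = 512 bytes
print(samples_per_node * blobs_per_slot * bytes_per_sample)  # 1,048,576 bytes ≈ 1 MB per slot
```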
Ultimately, therefore, we want to go further and do 2D sampling, which samples randomly not only within blobs but also across blobs. The linear properties of KZG commitments are used to extend the set of blobs in a block with a list of new 'virtual blobs' that redundantly encode the same information.
2D Sampling. Source: a16z crypto
Crucially, computing the extension of the commitments does not require having the blobs, so the scheme is fundamentally friendly to distributed block construction. The node that actually builds the block only needs to have the blob KZG commitments, and can itself rely on data availability sampling (DAS) to verify the availability of the blobs. 1D DAS is also inherently friendly to distributed block construction.
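To make the 'virtual blobs' idea concrete, here is a toy sketch of extending a set of blobs column by column so that any half of the rows can recover the data. It uses tiny parameters, a toy prime field instead of the BLS12-381 scalar field, and plain Lagrange interpolation instead of KZG machinery, so it is an illustration of the redundancy structure only, not the production scheme.

```python
P = 97  # toy prime field; the real scheme uses the BLS12-381 scalar field

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial mod P interpolating the (xi, yi) pairs."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * (x - xj)) % P
                den = (den * (xi - xj)) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

# Each row is a (tiny) blob of field elements.
blobs = [
    [3, 14, 15, 92],
    [65, 35, 89, 79],
]

# Extend every column: interpret column c as evaluations of a degree-<2 polynomial
# at x = 0, 1, then also evaluate it at x = 2, 3 to get two "virtual blobs".
# Any 2 of the resulting 4 rows now suffice to recover each column.
virtual = []
for x_new in (2, 3):
    row = []
    for c in range(len(blobs[0])):
        pts = [(x, blobs[x][c]) for x in range(len(blobs))]
        row.append(lagrange_eval(pts, x_new))
    virtual.append(row)

print(virtual)
```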
What are the links to existing research?
Original post introducing data availability (2018)
Follow-up paper
Explainer post on DAS (Paradigm)
2D availability with KZG commitments
PeerDAS on ethresear.ch, and the accompanying paper
EIP-7594
SubnetDAS on ethresear.ch
Nuances of recoverability in 2D sampling
What else needs to be done? What are the trade-offs?
Next up is the implementation and rollout of PeerDAS. From there, increasing the blob count on PeerDAS while carefully watching the network and improving the software to ensure safety is a gradual process. In parallel, we want more academic work formalizing PeerDAS and other versions of DAS and their interactions with issues such as fork choice rule safety.
Further out, we need more work to figure out the ideal version of 2D DAS and prove its security properties. We also want to eventually migrate away from KZG to a quantum-resistant alternative that does not require a trusted setup. At present, we do not know of candidate schemes that are friendly to distributed block construction. Even the expensive 'brute force' technique of using recursive STARKs to generate validity proofs for reconstructing rows and columns is not sufficient, because while technically a STARK is O(log(n) * log(log(n))) hashes in size (with STIR), in practice a STARK is almost as large as an entire blob.
The realistic long-term paths I see are:
Implement the ideal 2D DAS;
Stick with 1D DAS, sacrificing sampling bandwidth efficiency and accepting a lower data cap for the sake of simplicity and robustness;
(Hard pivot) Give up on DA and fully embrace Plasma as the main Layer 2 architecture we focus on.
Note that this option exists even if we decide to scale execution on L1 directly. This is because if L1 is to process a large number of TPS, L1 blocks will become very large, and clients will want an efficient way to verify that they are correct, so we would have to use the same technologies that power rollups (such as ZK-EVMs and DAS) on L1 as well.
How to interact with other parts of the roadmap?
If data compression is implemented, the need for 2D DAS decreases, or is at least delayed; if Plasma becomes widely used, it decreases further. DAS also poses a challenge for distributed block construction protocols and mechanisms: while DAS is theoretically friendly to distributed reconstruction, in practice this needs to be combined with inclusion list proposals and the fork choice mechanics around them.
Data Compression
What problem are we solving?
Every transaction in a rollup takes up a significant amount of on-chain data space: an ERC20 transfer requires about 180 bytes. Even with ideal data availability sampling, this caps the scalability of Layer 2 protocols. At 16 MB per slot, we get:
16000000 / 12 / 180 = 7407 TPS
What if we could not only solve the problem of the numerator, but also the problem of the denominator, allowing each transaction in the Rollup to occupy fewer bytes on-chain?
What is it and how does it work?
In my opinion, the best explanation is this picture from two years ago:
The simplest trick is zero-byte compression: replace every long sequence of zero bytes with two bytes indicating how many zero bytes there are (a minimal sketch follows the list below). Going further, we can take advantage of specific properties of transactions:
Signature aggregation: we switch from ECDSA signatures to BLS signatures, which have the property that many signatures can be combined into a single signature attesting to the validity of all the original signatures. This is not being considered for L1 because verification is computationally expensive even with aggregation; but in a data-scarce environment like L2, it arguably makes sense. The aggregation features of ERC-4337 provide one path to implementing this.
Replacing addresses with pointers: if an address has been used before, we can replace the 20-byte address with a 4-byte pointer to its position in history.
Custom serialization of transaction values: most transaction values have very few significant digits; for example, 0.25 ETH is represented as 250,000,000,000,000,000 wei. Max base fees and priority fees are similar. We can therefore use a custom decimal floating-point format to represent most currency values.
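As promised above, a minimal sketch of the zero-byte compression step (illustrative only; real rollup compressors use more elaborate schemes and combine it with the transaction-specific tricks in the list):

```python
def compress_zeros(data: bytes) -> bytes:
    """Replace every run of zero bytes with a 0x00 marker plus a one-byte run
    length (runs longer than 255 are split into multiple markers)."""
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def decompress_zeros(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] == 0:
            out += bytes(data[i + 1])   # expand the zero run
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

# A 32-byte ABI-encoded value of 0.25 ETH is mostly zero bytes:
value = (250_000_000_000_000_000).to_bytes(32, "big")
assert decompress_zeros(compress_zeros(value)) == value
print(len(value), "->", len(compress_zeros(value)))
```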
What are the links to existing research?
Exploration from sequence.xyz
Calldata-optimized contracts for L2
Validity-proof-based rollups (a.k.a. ZK rollups) publishing state diffs instead of transactions
BLS Wallet - BLS aggregation via ERC-4337
What else needs to be done and what are the trade-offs?
The next step is to actually implement the above solution. The main considerations include:
Switching to BLS signatures takes significant effort and reduces compatibility with trusted hardware chips that can enhance security. A ZK-SNARK wrapper around another signature scheme can be used instead.
Dynamic compression (e.g., replacing addresses with pointers) makes client code more complex.
Publishing state diffs on-chain instead of transactions reduces auditability and breaks a lot of software (such as block explorers).
How to interact with other sections of the roadmap?
Adopting ERC-4337, and eventually enshrining parts of it in the L2 EVM, can greatly accelerate the deployment of aggregation techniques. Enshrining parts of ERC-4337 on L1 can speed up its deployment on L2s.
Generalized Plasma
What problem are we solving?
Even with 16 MB blobs and data compression, 58,000 TPS may not be enough to fully cover consumer payments, decentralized social media, or other high-bandwidth areas, especially once we start accounting for privacy, which could reduce scalability by 3-8x. For high-volume, low-value use cases, one option today is a validium, which keeps data off-chain and uses an interesting security model: the operator cannot steal users' funds, but it can temporarily or permanently freeze all users' funds. But we can do better.
What is it, how does it work?
Plasma is a scaling solution in which an operator publishes blocks off-chain and puts the Merkle roots of those blocks on-chain (unlike rollups, which put the full block data on-chain). For each block, the operator sends every user a Merkle branch proving what changes, if any, happened to that user's assets. Users can withdraw their assets by providing a Merkle branch. Importantly, this branch does not have to be rooted in the latest state: even if data availability fails, users can still recover their assets by withdrawing the latest state available to them. If a user submits an invalid branch (e.g., withdrawing an asset they already sent to someone else, or the operator creating an asset out of thin air), an on-chain challenge mechanism adjudicates who rightfully owns the asset.
Plasma Cash chain diagram. Transactions spending coin i are placed at the i-th position in the tree. In this example, assuming all previous trees are valid, we know that Eve currently owns Token 1, David owns Token 4, and George owns Token 6.
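To make the exit mechanism concrete: a user holds a Merkle branch from their coin's leaf up to a root the operator published on-chain, and anyone can recheck it. A minimal sketch, assuming a plain binary SHA-256 tree rather than any specific Plasma construction:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def verify_branch(leaf: bytes, index: int, branch: list[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path; compare to the
    on-chain root the operator published for that block."""
    node = h(leaf)
    for sibling in branch:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Toy 4-leaf tree to exercise the check
leaves = [b"coin0:alice", b"coin1:eve", b"coin2:bob", b"coin3:david"]
lvl1 = [h(h(leaves[0]) + h(leaves[1])), h(h(leaves[2]) + h(leaves[3]))]
root = h(lvl1[0] + lvl1[1])
assert verify_branch(leaves[1], 1, [h(leaves[0]), lvl1[1]], root)
```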
Early versions of Plasma could only handle the payments use case and could not be generalized much further. But if we require every root to be verified with a SNARK, Plasma becomes much more powerful. Every challenge game can be greatly simplified, because we eliminate most of the possible paths by which the operator could cheat. New paths also open up, allowing Plasma techniques to be extended to a much wider range of asset classes. Finally, in the case where the operator does not cheat, users can withdraw their funds instantly, without waiting for a one-week challenge period.
One way (not the only way) to build an EVM Plasma chain: use a ZK-SNARK to construct a parallel UTXO tree that reflects the balance changes made by the EVM, and define a unique mapping of what counts as 'the same token' at different points in history. A Plasma construction can then be built on top of that.
A key insight is that a Plasma system does not need to be perfect. Even if you can only protect a subset of assets (e.g., tokens that have not moved in the past week), you have already greatly improved on the status quo of the ultra-scalable EVM, which is a validium.
Another class of constructions is hybrid Plasma/rollups, such as Intmax. These constructions put a very small amount of data per user on-chain (e.g., 5 bytes), and in doing so achieve properties somewhere between Plasma and rollups: in the Intmax case you get very high scalability and privacy, although even at 16 MB per slot, capacity is theoretically capped at roughly 266,667 TPS.
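For reference, that theoretical cap is just the 16 MB budget divided by the slot time and the ~5 bytes per user:

```python
# 16 MB per slot / 12-second slots / 5 bytes of on-chain data per user
print(16_000_000 / 12 / 5)   # ≈ 266,667 TPS
```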
What are the links to existing research?
Original Plasma paper
Plasma Cash
Plasma Cashflow
Intmax (2023)
What else needs to be done? What are the trade-offs?
The main remaining task is to bring Plasma systems into production. As noted above, 'Plasma vs validium' is not a binary choice: any validium can improve its security properties at least somewhat by adding Plasma features to its exit mechanism. The research focus is on getting the best possible properties for the EVM (in terms of trust requirements, worst-case L1 gas cost, and resistance to DoS attacks), as well as on application-specific alternative constructions. In addition, Plasma is conceptually more complex than rollups, and this needs to be addressed directly, both through research and through building better generalized frameworks.
The main trade-offs of Plasma designs are that they depend more on operators and are harder to make 'based', although hybrid Plasma/rollup designs can often avoid this weakness.
How to interact with other parts of the roadmap?
The more effective Plasma solutions are, the less pressure there is on L1 to provide high-performance data availability. Moving activity to L2 also reduces MEV pressure on L1.
Mature L2 Proof Systems
What problem are we solving?
Today, most rollups are not actually trustless: a security council has the ability to override the behavior of the (optimistic or validity) proof system. In some cases, the proof system does not even run at all, or, if it does, it has only an 'advisory' role. The furthest ahead are: (i) some application-specific rollups, such as Fuel, that are trustless; and (ii) as of the time of writing, Optimism and Arbitrum, two full-EVM rollups that have reached a partial-trustlessness milestone known as 'Stage 1'. The reason rollups have not gone further is concern about bugs in the code. We need trustless rollups, so we have to face this problem and solve it.
What is it, how does it work?
First, let’s review the ‘stage’ system introduced in this article.
Stage 0: users must be able to run a node and sync the chain. It is fine if validation is fully trusted/centralized.
Stage 1: there must be a (trustless) proof system that ensures only valid transactions are accepted. A security council that can override the proof system is allowed, but only with a 75% threshold vote. Additionally, a quorum-blocking portion of the council (i.e., 26%+) must be outside the main company building the rollup. A weaker upgrade mechanism (e.g., a DAO) is allowed, but it must have a long enough delay that users can withdraw their funds before a malicious upgrade takes effect.
Stage 2: there must be a (trustless) proof system that ensures only valid transactions are accepted. The security council is only allowed to intervene in cases of provable bugs in the code, e.g., if two redundant proof systems disagree with each other, or if one proof system accepts two different post-state roots for the same block (or accepts nothing for a sufficiently long time, such as a week). An upgrade mechanism is allowed, but it must have a very long delay.
Our goal is to reach Stage 2. The main challenge in getting there is gaining enough confidence that the proof system really is trustworthy enough. There are two main ways to do this:
Formal verification: we can use modern mathematical and computational techniques to prove that an (optimistic or validity) proof system accepts only blocks that comply with the EVM specification. These techniques have existed for decades, but recent advances (such as Lean 4) have made them much more practical, and progress in AI-assisted proving may accelerate this trend further.
Multi-provers: build multiple proof systems, and put funds into a multisig between those proof systems and a security council (and/or another gadget with trust assumptions, such as TEEs). If the proof systems agree, the security council has no power; if they disagree, the security council can only choose one of their answers, it cannot unilaterally impose an answer of its own.
Schematic diagram of a multi-prover setup, combining one optimistic proof system, one validity proof system, and a security council.
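To make the resolution rule concrete, here is a minimal sketch of the multi-prover logic described above; the names and structure are illustrative assumptions, not any particular rollup's contract:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    prover: str          # e.g. "optimistic", "zk-A", "zk-B" (illustrative labels)
    state_root: bytes    # the post-state root this prover attests to

def resolve(claims: list[Claim], council_choice: Optional[bytes] = None) -> bytes:
    """If all provers agree, their answer is final and the council is powerless.
    On disagreement, the council may only pick among the submitted answers."""
    roots = {c.state_root for c in claims}
    if len(roots) == 1:
        return roots.pop()
    if council_choice in roots:
        return council_choice
    raise ValueError("provers disagree: council must pick one of the claimed roots")

# Agreement case: the council's opinion is irrelevant.
a, b = Claim("optimistic", b"\x01" * 32), Claim("zk", b"\x01" * 32)
assert resolve([a, b]) == b"\x01" * 32
```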
What are the links to existing research?
EVM K Semantics (formal verification work from 2017)
Talk on the idea of multi-provers (2022)
A plan to use multi-proofs
What else needs to be done? What are the trade-offs?
For formal verification, the workload is large: we need to create a formally verified version of an entire SNARK prover for the EVM. This is an incredibly complex project, although we have already started. There is one trick that greatly simplifies the task: we can create a formally verified SNARK prover for a minimal virtual machine (such as RISC-V or Cairo), and then implement the EVM inside that minimal VM (and formally prove its equivalence to some other specification of the EVM).
For multi-provers, two main pieces remain. First, we need enough confidence in at least two different proof systems, both that they are individually secure and that, if they break, they break for different and unrelated reasons (so they do not break at the same time). Second, we need a very high level of assurance in the underlying logic that merges the proof systems. This piece of code is much smaller. There are ways to make it extremely small, e.g., just store the funds in a multisig contract (such as a Safe) whose signers are contracts representing the individual proof systems, but this increases on-chain gas costs. Some balance between efficiency and safety needs to be found.
How to interact with other parts of the roadmap?
Moving activity to L2 reduces MEV pressure on L1.
Cross-L2 Interoperability Improvements
What problem are we solving?
One of the main challenges of today's L2 ecosystem is that it is hard for users to navigate. Moreover, the easiest ways of doing so often reintroduce trust assumptions, such as centralized bridges, RPC clients, and so on. We need to make using the L2 ecosystem feel like using a single unified Ethereum ecosystem.
What is it? How does it work?
There are many kinds of cross-L2 interoperability improvements. In theory, a rollup-centric Ethereum is the same thing as an execution-sharded L1. In practice, today's Ethereum L2 ecosystem falls short of this ideal in several ways:
Chain-specific addresses: an address should include the chain (L1, Optimism, Arbitrum, etc.) as part of it. Once this is in place, cross-L2 sending can be done by simply putting the address into the 'send' field, at which point the wallet can figure out how to make the send (including using bridging protocols) in the background. (A minimal sketch of such an address format follows this list.)
Chain-specific payment requests: it should be easy and standardized to create a message of the form 'send me X of token Y on chain Z'. The two main use cases are (i) payments between individuals, or from individuals to merchant services, and (ii) dapps requesting funds.
Cross-chain swaps and gas payment: there should be a standardized open protocol for expressing cross-chain operations such as 'I will send 1 ETH on Optimism to whoever sends me 0.9999 ETH on Arbitrum' and 'I will send 0.0001 ETH on Optimism to whoever includes this transaction on Arbitrum'. ERC-7683 is an attempt at the former and RIP-7755 at the latter, although both are more general than just these specific use cases.
Light clients: users should be able to actually verify the chains they are interacting with, not just trust RPC providers. a16z crypto's Helios does this for Ethereum itself, but we need to extend this trustlessness to L2s. ERC-3668 (CCIP-read) is one strategy for getting there.
How a light client updates its view of the Ethereum header chain. Once you have the header chain, you can verify any state object with a Merkle proof. And once you have the right L1 state objects, you can use Merkle proofs (plus signatures, if you want to check pre-confirmations) to verify any state object on L2. Helios already does the former; extending to the latter is a standardization challenge.
Keystore wallets: today, if you want to update the keys that control your smart contract wallet, you have to do it on all N chains on which that wallet exists. Keystore wallets are a technique that allows the keys to live in just one place (either on L1, or potentially later on an L2), from which any L2 that has a copy of the wallet can read them. This means updates only need to happen once. To be efficient, keystore wallets require a standardized way for L2s to read L1 at very low cost; two proposals for this are L1SLOAD and REMOTESTATICCALL.
How Keystore Wallet Works
A more radical 'shared token bridge' idea: imagine a world where all L2s are validity-proof rollups that commit to Ethereum every slot. Even in that world, moving assets from one L2 to another 'natively' would still require withdrawing and depositing, which costs a significant amount of L1 gas. One way to solve this is to create a shared minimal rollup whose only function is to keep track of which L2 owns how much of each type of token, and to allow those balances to be updated in bulk through a series of cross-L2 send operations initiated by any of the L2s. This would allow cross-L2 transfers to happen without paying L1 gas for every transfer, and without liquidity-provider-based techniques such as ERC-7683.
Synchronous composability: allow synchronous calls to happen either between a specific L2 and L1, or between multiple L2s. This could help improve the financial efficiency of DeFi protocols. The former can be done without any cross-L2 coordination; the latter requires shared sequencing. Based rollups are automatically friendly to all of these techniques.
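As promised above, a minimal sketch of what a chain-specific address might look like, assuming an ERC-3770-style 'shortName:address' text format; the short names used here are illustrative:

```python
def parse_chain_address(s: str) -> tuple[str, str]:
    """Split an ERC-3770-style string into (chain short name, hex address).
    A real wallet would also validate the checksum and resolve the short name
    against the chain registry."""
    chain, _, addr = s.partition(":")
    if not chain or not addr.startswith("0x") or len(addr) != 42:
        raise ValueError(f"not a chain-specific address: {s!r}")
    return chain, addr

# e.g. an Optimism-flavored address vs an Arbitrum-flavored one
print(parse_chain_address("oeth:" + "0x" + "ab" * 20))
print(parse_chain_address("arb1:" + "0x" + "cd" * 20))
```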
What are the links to existing research?
Chain-specific addresses: ERC-3770
ERC-7683
RIP-7755
Scroll keystore wallet design
Helios
ERC-3668 (sometimes referred to as CCIP-read)
Justin Drake's proposal for based (shared) preconfirmations
L1SLOAD (RIP-7728)
REMOTESTATICCALL in Optimism
AggLayer, which includes a shared token bridge idea
What else needs to be done? What are the trade-offs?
Many of the examples above face the dilemma of when to standardize and which layers to standardize. If standardization occurs too early, it may entrench a suboptimal solution. If standardization occurs too late, it may lead to unnecessary fragmentation. In some cases, there is both a short-term solution with weaker attributes that is easier to implement, as well as a long-term solution that is “ultimately correct” but may take years to materialize.
These tasks are not just technical problems; they are also (perhaps even primarily) social problems, requiring cooperation between L2s, wallets, and L1.
How to interact with other parts of the roadmap?
Most of these proposals are 'higher-layer' constructions and therefore have little impact on L1. One exception is shared sequencing, which has a significant impact on maximal extractable value (MEV).
Extend Execution on L1
What problem are we solving?
If L2s become highly scalable and successful while L1 remains capable of processing only a very small volume of transactions, Ethereum faces several risks:
The economic position of ETH the asset becomes riskier, which in turn affects the long-term security of the network.
Many L2s benefit greatly from close ties to the highly developed financial ecosystem on L1; if that ecosystem is significantly weakened, the incentive to become an L2 (rather than an independent L1) weakens too.
It will take a long time for L2s to reach exactly the same security guarantees as L1.
If an L2 fails (e.g., through a malicious or vanishing operator), users will still need to go through L1 to recover their assets. So L1 needs to be powerful enough to occasionally handle the highly complex and messy wind-down of an L2.
For these reasons, continuing to expand L1 itself and ensuring that it can continue to accommodate more and more use cases is very valuable.
What is it? How does it work?
The simplest way to scale is to simply raise the gas limit. However, this risks centralizing L1 and thereby weakening the other important property that makes Ethereum L1 credible as a robust base layer. How far a simple gas limit increase is sustainable is still debated, and it also depends on which other technologies get implemented to make larger blocks easier to verify (e.g., history expiry, statelessness, L1 EVM validity proofs). Another important thing to keep improving is the efficiency of Ethereum client software, which is far more optimized today than it was five years ago. An effective L1 gas limit increase strategy involves accelerating these verification technologies. Beyond raising the gas limit, there are also ways to make specific operations cheaper or to increase average capacity without raising worst-case load:
EOF: a new EVM bytecode format that is friendlier to static analysis and allows for faster implementations. Given these efficiency gains, EOF bytecode could be given lower gas costs.
Multidimensional gas pricing: setting separate base fees and limits for computation, data, and storage can increase the average capacity of Ethereum L1 without increasing its maximum capacity (and thereby avoid creating new security risks). A sketch of this idea follows the list below.
Reduce the gas costs of specific opcodes and precompiles: historically, we have repeatedly raised the gas costs of certain underpriced operations to prevent denial-of-service attacks. Something that could be done more is lowering the gas costs of operations that are overpriced. For example, addition is much cheaper than multiplication, yet the ADD and MUL opcodes currently cost the same. We could make ADD cheaper, and even simpler opcodes such as PUSH cheaper still. EOF is overall more optimized in this regard.
EVM-MAX and SIMD: EVM-MAX is a proposal to allow more efficient native large-number modular arithmetic as a separate module of the EVM. Values computed by EVM-MAX operations can only be accessed by other EVM-MAX opcodes unless deliberately exported; this leaves more room to store these values in optimized formats. SIMD (single instruction, multiple data) is a proposal to allow efficient execution of the same instruction over arrays of values. Together, they can create a powerful coprocessor alongside the EVM for much more efficient cryptographic operations. This would be particularly useful for privacy protocols and for L2 proof systems, so it would help both L1 and L2 scaling.
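As one illustration of the multidimensional pricing idea above, the sketch below reuses the exponential base-fee update from EIP-4844, giving each resource its own running excess and its own base fee. The resource names, targets, and constants are illustrative assumptions, not the EIP-7706 specification.

```python
MIN_BASE_FEE = 1
UPDATE_FRACTION = 3338477  # EIP-4844's blob base-fee update fraction, reused here for illustration

def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    """Integer approximation of factor * e^(numerator / denominator), as in EIP-4844."""
    i, output, accum = 1, 0, factor * denominator
    while accum > 0:
        output += accum
        accum = (accum * numerator) // (denominator * i)
        i += 1
    return output // denominator

class Resource:
    def __init__(self, target_per_block: int):
        self.target = target_per_block
        self.excess = 0

    def on_block(self, used: int) -> int:
        """Update the running excess after a block and return the next base fee."""
        self.excess = max(0, self.excess + used - self.target)
        return fake_exponential(MIN_BASE_FEE, self.excess, UPDATE_FRACTION)

# Two independently priced resources (illustrative targets)
resources = {"execution": Resource(15_000_000), "blob_data": Resource(393_216)}
print(resources["blob_data"].on_block(786_432))   # blob base fee after one over-target block
```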
These improvements will be discussed in more detail in future Splurge articles.
Finally, the third strategy is native rollups (or 'enshrined rollups'): essentially, creating many parallel copies of the EVM that run side by side, giving a model equivalent to what rollups provide, but far more natively integrated into the protocol.
What are the links to existing research?
Polynya's Ethereum L1 scaling roadmap
Multidimensional gas pricing
EIP-7706
EOF
EVM-MAX
SIMD
Native rollups
Interview with Max Resnick on the value of scaling L1
Justin Drake on using SNARKs and native rollups to scale
What else needs to be done and what are the trade-offs?
There are three strategies for scaling L1, which can be pursued individually or in parallel:
Improve the technology (e.g., client code, stateless clients, history expiry) that makes L1 easier to verify, and then raise the gas limit;
Make specific operations cheaper, increasing average capacity without increasing worst-case risk;
Native rollups (i.e., creating N parallel copies of the EVM).
Working through these different techniques, we find that each has different trade-offs. For example, native rollups have many of the same weaknesses for composability as ordinary rollups: you cannot send a single transaction that synchronously performs operations across many of them, the way you can with contracts on the same L1 (or L2). Raising the gas limit takes away from other benefits of making L1 easier to verify, such as increasing the share of users who run verifying nodes and the number of solo stakers. And depending on how it is done, making specific EVM operations cheaper may increase the overall complexity of the EVM.
A big question that any L1 scaling roadmap needs to answer is: what is the ultimate vision of what belongs on L1 and what belongs on L2? Clearly, it would be absurd to put everything on L1: potential use cases run into hundreds of thousands of transactions per second, which would make L1 completely unverifiable (unless we go the native-rollup route). But we do need some guiding principles, so that we do not end up in a situation where we raise the gas limit 10x and severely damage the decentralization of Ethereum L1.
One view of the division of labor between L1 and L2
How to interact with other parts of the roadmap?
Bringing more users onto L1 means improving not just scale but other aspects of L1 as well. It means more MEV will stay on L1 (rather than being a problem only for L2s), so the need to handle MEV explicitly becomes even more urgent. It greatly increases the value of fast slot times on L1. And it also depends heavily on L1 verification ('the Verge') going well.
Related reading: “Vitalik’s new article: What are the areas for improvement in Ethereum’s PoS? How can it be achieved?”
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
Vitalik's new article: The possible future of Ethereum, The Surge
Special thanks to Justin Drake, Francesco, Hsiao-wei Wang, @antonttc and Georgios Konstantopoulos.
Initially, there were two scaling strategies in the roadmap of Ethereum. One (see an early paper in 2015) is ‘Sharding’: each Node only needs to validate and store a small portion of the transactions, rather than validating and storing all transactions in the chain. Any other peer-to-peer network (such as BitTorrent) also works this way, so we can certainly make the blockchain work in the same way. The other is Layer2 protocol: these networks will be built on top of Ethereum, allowing it to fully benefit from its security while keeping most of the data and computation off-chain. Layer2 protocols refer to state channels in 2015, Plasma in 2017, and then Rollup in 2019. Rollup is more powerful than state channels or Plasma, but they require a large amount of on-chain data bandwidth. Fortunately, by 2019, Sharding research had already solved the problem of validating ‘data availability’ at a large scale. As a result, the two paths merged together, and we obtained a roadmap centered around Rollup, which is still the scaling strategy of Ethereum today.
The Rollup-centric roadmap proposes a simple division of labor: ETHereum L1 focuses on being a robust and Decentralization-oriented base layer, while L2 takes on the task of helping the ecosystem expand. This model is ubiquitous in society: the existence of the court system (L1) is not for the pursuit of ultra-high speed and efficiency, but to protect contracts and property rights, while entrepreneurs (L2) are to build on this solid base layer and lead humanity towards Mars, whether in a literal or metaphorical sense.
This year, the roadmap centered on Rollup has achieved significant milestones: with the launch of EIP-4844 blobs, the data bandwidth of Ethereum L1 has increased significantly, and multiple Ethereum Virtual Machine (EVM) Rollups have entered the first stage. Each L2 exists as a ‘Sharding’ with its own internal rules and logic, and the diversity and diversification of Sharding implementation have now become a reality. However, as we can see, this path also faces some unique challenges. Therefore, our current task is to complete the roadmap centered on Rollup, address these issues, while maintaining the robustness and decentralization unique to Ethereum L1.
The Surge: Key Objectives
In the future, Ethereum can achieve more than 100,000 TPS through L2.
Maintain the Decentralization and robustness of L1;
At least some L2 fully inherit Ethereum’s core properties (Trustless, Open, Anti-censorship);
4, the ETH community should feel like a unified ecosystem, not 34 different blockchains.
Contents of this chapter
Scalability Trilemma
Further Progress in Data Availability Sampling
Data Compression
4.Generalized Plasma
Mature L2 proof system
Improved L2 interoperability
Extend execution on L1
Scalability Trilemma
The scalability trilemma is an idea proposed in 2017 that suggests a contradiction between the three characteristics of blockchain: Decentralization (specifically, low cost of running Nodes), scalability (handling a large number of transactions), and security (attackers need to destroy a large portion of the network’s Nodes to make a single transaction fail).
It is worth noting that the triangle paradox is not a theorem, and the post introducing the triangle paradox does not come with a mathematical proof. It does present a heuristic mathematical argument: if a Decentralization-friendly Node (e.g., a consumer-grade laptop) can verify N transactions per second, and you have a chain that processes k*N transactions per second, then (i) each transaction can only be seen by 1/k nodes, which means that an attacker can carry out a malicious transaction by attacking a small number of nodes, or (ii) your node will become powerful, but your chain will not be Decentralized. The purpose of this article is not to prove that breaking the triangle paradox is impossible; on the contrary, it aims to show that breaking the trilemma is difficult, and it requires to some extent jumping out of the thinking framework implied by the argument.
For many years, some high-performance chains have often claimed to have solved the trilemma without fundamentally changing the architecture, usually by optimizing Nodes through software engineering techniques. This is always misleading, as running Nodes on-chain is much more difficult than running Nodes on the Ethereum blockchain. This article will explore why this is the case and why L1 client software engineering alone cannot scale Ethereum.
However, the combination of data availability sampling and SNARKs does solve the Trilemma: it allows clients to verify that a certain amount of data is available and a certain number of computational steps are correctly executed with only a small amount of data downloaded and minimal computation. SNARKs are trustless. Data availability sampling has a subtle few-of-N trust model, but it retains the fundamental property of an unscalable chain, which is that even a 51% attack cannot force bad blocks to be accepted by the network.
Another way to solve the trilemma is the Plasma architecture, which uses clever techniques to incentivize users with the responsibility of monitoring data availability in a compatible manner. As early as 2017-2019, when we only had fraud proof as a means to scale computational power, Plasma was severely limited in terms of secure execution. However, with the widespread adoption of SNARKs (Succinct Non-Interactive Zero Knowledge Proofs), the Plasma architecture becomes more feasible for a wider range of use cases than ever before.
Further Progress on Data Availability Sampling
What problem are we solving?
On March 13, 2024, when Dencun is upgraded and launched, there will be approximately 3 blobs of about 125 kB per 12-second slot in the Ethereum blockchain, or about 375 kB of available bandwidth for each slot. Assuming transaction data is published directly on-chain, the maximum TPS of ERC20 transfers on Ethereum Rollup is approximately 173.6 TPS.
If we add calldata from ETH Ethereum (theoretical maximum: 30 million Gas per slot / 16 gas per byte = 1,875,000 bytes per slot), the TPS will become 607. With PeerDAS, the number of blobs may increase to 8-16, which will provide 463-926 TPS for calldata.
This is a significant upgrade to the Ethereum L1, but it’s not enough. We want more scalability. Our mid-term goal is 16 MB per slot, which, combined with Rollup data compression improvements, will bring ~58000 TPS.
What is it? How does it work?
PeerDAS is a relatively simple implementation of ‘1D sampling’. In the ETH network, each blob is a 4096-degree polynomial over a 253-bit prime field. We broadcast shares of the polynomial, where each share contains 16 evaluation values from adjacent 16 coordinates out of a total of 8192 coordinates. Among these 8192 evaluation values, any 4096 (any 64 out of 128 possible samples according to the current proposed parameters) can recover the blob.
The working principle of PeerDAS is to make each client listen to a small amount of subnet, where the i-th subnet broadcasts any blob’s i-th sample, and requests for blobs on other subnets needed by asking peers in the global p2p network (who will listen to different subnets). A more conservative version, SubnetDAS, uses only the subnet mechanism without additional inquiries at the peer layer. The current proposal is for Nodes participating in Proof of Stake to use SubnetDAS, while other Nodes (i.e., clients) use PeerDAS.
In theory, we can scale the ‘1D sampling’ to a fairly large extent: if we increase the maximum number of blobs to 256 (targeting 128), we can achieve the 16MB goal, and the data availability sampling has 16 samples per Node * 128 blobs * 512 bytes per sample per blob = 1MB data bandwidth per slot. This is just within our tolerance range: it is feasible, but it means that bandwidth-constrained clients cannot sample. We can optimize this to some extent by reducing the number of blobs and increasing the blob size, but this will increase the reconstruction cost.
Therefore, we ultimately want to go further and perform 2D sampling, which not only randomly samples within a blob, but also randomly samples between blobs. By utilizing the linear properties of KZG commitments, we can extend a set of blobs within a Block with a new set of virtual blobs that redundantly encode the same information.
Therefore, in the end, we want to go further and perform 2D sampling, which not only occurs within the blob but also between blobs for random sampling. The linear properties of KZG commitments are used to expand a set of blobs within a Block, which includes a new virtual blob list that redundantly encodes the same information.
It is crucial that the expansion of computing commitments does not require a blob, so the scheme is fundamentally friendly to distributed Block construction. The Node that actually constructs the Block only needs to have a blob KZG commitment, and they can rely on Data Availability Sampling (DAS) to verify the availability of data blocks. One-dimensional Data Availability Sampling (1D DAS) is also fundamentally friendly to distributed block construction.
What are the links to existing research?
2.Follow-up paper:
Explanation article about DAS, paradigm:
2D availability with KZG commitment:
PeerDAS on 5.ethresear.ch: And the paper:
SubnetDAS on 7.ethresear.ch:
8.2D recoverability of subtle differences in sampling:
What else needs to be done? What are the trade-offs?
Next up is the implementation and launch of PeerDAS. Afterward, we will continue to increase the number of blobs on PeerDAS while carefully observing the network and improving the software to ensure security. This is a gradual process. At the same time, we hope to have more academic work to standardize the interaction of PeerDAS and other versions of DAS with fork choice rule security issues.
In the further future, we need to do more work to determine the ideal version of 2D DAS and prove its security properties. We also hope to eventually move away from KZG towards a quantum-safe alternative that does not require trusted setup. Currently, we are not clear which candidate schemes are friendly to distributed Block construction. Even with the expensive ‘brute force’ technique, using recursive STARK to generate validity proof for reconstructing rows and columns is not enough to meet the requirements, because although technically, the size of a STARK is O(log(n) * log(log(n)) hash values (using STIR), in practice, a STARK is almost as large as the entire blob.
The long-term realistic path I believe is:
Implement ideal 2D DAS;
Stick to using 1D DAS, sacrificing sampling bandwidth efficiency, accepting lower data upper limits for simplicity and robustness.
3.(Hard pivot)Give up DA, fully accept Plasma as the main Layer2 architecture we follow.
Please note that even if we decide to scale directly on the L1 layer, this option exists. This is because if the L1 layer needs to process a large number of TPS, the L1 Block will become very large, and clients will want an efficient way to verify their correctness, so we will have to use the same technology as Rollup (such as ZK-EVM and DAS) at the L1 layer.
How to interact with other parts of the roadmap?
If data compression is implemented, the demand for 2D DAS will be reduced, or at least there will be latency, if Plasma is widely used, the demand will be further reduced. DAS also poses a challenge to the distributed Block construction protocol and mechanism: although DAS is theoretically friendly to distributed reconstruction, this needs to be combined with the package inclusion list proposal and its surrounding fork selection mechanism in practice.
Data Compression
What problem are we solving?
Each transaction in Rollup occupies a lot of on-chain data space: about 180 bytes are required for ERC20 transfer. Even with ideal data availability sampling, this limits the scalability of the Layer protocol. With each slot being 16MB, we get:
16000000 / 12 / 180 = 7407 TPS
What if we could not only solve the problem of the numerator, but also the problem of the denominator, allowing each transaction in the Rollup to occupy fewer bytes on-chain?
What is it and how does it work?
In my opinion, the best explanation is this picture from two years ago:
In zero-byte compression, replace each long sequence of zero bytes with two bytes to indicate how many zero bytes there are. Going further, we take advantage of the specific properties of the transaction:
Signature Aggregation: We have switched from ECDSA signature to BLS signature. The feature of BLS signature is that multiple signatures can be combined into a single signature, which can prove the validity of all original signatures. At the L1 level, due to the high cost of verification even after aggregation, the use of BLS signature is not considered. However, in environments such as L2 where data is scarce, using BLS signature is meaningful. The aggregation feature of ERC-4337 provides a way to implement this functionality.
Replace Address with pointers: If a certain Address has been used before, we can replace the 20-byte Address with a 4-byte pointer that points to a position in the history.
Custom Serialization of Transaction Values - Most transaction values have very few digits, for example, 0.25 ETH is represented as 250,000,000,000,000,000 wei. The maximum base fee and priority fee are also similar. Therefore, we can use a custom decimal float format to represent most currency values.
What are the links to existing research?
Explore sequence.xyz:
L2 Calldata optimized contract:
Based on validity proof Rollups (also known as ZK rollups) release status differences rather than transactions:
4.BLS Wallet - Achieve BLS aggregation through ERC-4337:
What else needs to be done and what are the trade-offs?
The next step is to actually implement the above solution. The main considerations include:
Switching to BLS signatures requires a lot of effort and may drop compatibility with trusted hardware chips that enhance security. It can be replaced with ZK-SNARK encapsulation using other signature schemes.
Dynamic compression (e.g., replacing Address with pointers) will make the client code more complex.
Publishing the state difference to on-chain instead of transactions will drop auditability and make many software (such as blockchain explorer) unable to work.
How to interact with other sections of the roadmap?
Adopting ERC-4337, and eventually incorporating part of it into L2 EVM, can greatly accelerate the deployment of aggregation technology. Placing part of the content of ERC-4337 on L1 can expedite its deployment on L2.
Generalized Plasma
What problem are we solving?
Even with a 16 MB blob and data compression, 58,000 TPS may not be enough to fully meet the needs of consumer payments, Decentralization social, or other high-bandwidth areas, especially when we begin to consider privacy factors, which could drop scalability 3-8 times. For high volume, low-value use cases, one current option is to use Validium, which keeps data off-chain and utilizes an interesting security model: operators cannot steal user funds, but they may temporarily or permanently freeze all user funds. But we can do better.
What is it, how does it work?
Plasma is a scaling solution that involves an operator publishing Blocks off-chain and putting the Merkle roots of these Blocks on-chain (unlike Rollup, which puts complete Blocks on-chain). For each Block, the operator sends each user a Merkle branch to prove what changes, if any, occurred to their assets. Users can extract their assets by providing the Merkle branch. Importantly, this branch does not have to be rooted in the latest state. Therefore, even if data availability becomes an issue, users can still recover their assets by extracting the latest state available to them. If a user submits an invalid branch (e.g., extracting assets they have already sent to someone else, or the operator creating an asset out of thin air), the legitimacy of the asset can be determined through the on-chain challenge mechanism.
Plasma Cash chain diagram. Transactions spending coin i are placed at the i-th position in the tree. In this example, assuming all previous trees are valid, we know that Eve currently owns Token 1, David owns Token 4, and George owns Token 6.
Earlier versions of Plasma can only handle payment use cases and cannot be effectively further promoted. However, if we require each root to be verified with SNARK, then Plasma will become much more powerful. Each challenge game can be greatly simplified because we exclude most of the possible paths for operator cheating. At the same time, new paths are also opened up, enabling Plasma technology to be expanded to a wider range of asset categories. Finally, in the case where the operator does not cheat, users can withdraw funds immediately without waiting for a one-week challenge period.
One method (not the only method) for creating an EVM Plasma chain: using ZK-SNARK to build a parallel UTXO tree that reflects the balance changes made by the EVM and defines a unique mapping of the “same Token” at different points in history. Plasma structure can then be built on top of it.
A key insight is that the Plasma system does not need to be perfect. Even if you can only protect a subset of assets (e.g., tokens that have not moved in the past week), you have greatly improved the current state of the super-scalable EVM (i.e., Validium).
Another type of structure is the hybrid Plasma/Rollup, such as Intmax. These constructions put a very small amount of data (e.g., 5 bytes) on-chain for each user, which can achieve certain characteristics between Plasma and Rollup: in the case of Intmax, you can achieve very high scalability and privacy, although theoretically limited to about 266,667 TPS even in the capacity of 16 MB.
What are the links related to existing research?
1.Original Plasma paper:
2.Plasma Cash:
4.Intmax (2023):
What else needs to be done? What are the trade-offs?
The remaining main task is to put the Plasma system into practical production applications. As mentioned above, Plasma is not a choice between ‘Plasma’ and ‘Validium’: any Validium can at least improve its security attributes to some extent by incorporating Plasma features into its exit mechanism. The focus of the research is on obtaining the best attributes for the EVM (considering trust requirements, worst-case L1 Gas costs, and the ability to resist DoS attacks), as well as alternative specific application structures. In addition, compared to Rollup, Plasma is conceptually more complex, which requires direct resolution through research and construction of a better general framework.
The main trade-offs of using Plasma designs are that they rely more on operators and are more difficult to base, although the hybrid Plasma/Rollup design can often avoid this weakness.
How to interact with other parts of the roadmap?
The more effective the Plasma solution is, the less pressure there is on L1 with high-performance data availability. Moving activities to L2 can also reduce MEV pressure on L1.
Mature L2 Proof System
What problem are we solving?
Currently, most Rollups are not actually Trustless. There is a security committee that has the ability to override the behavior of the proof system (optimistic or validity). In some cases, the proof system may not even run, or if it does run, it only has a ‘consultation’ function. The most advanced Rollups include: (i) some Trustless application-specific Rollups, such as Fuel; (ii) as of the time of writing, Optimism and Arbitrum are two EVM Rollups that have implemented a partial Trustless milestone called ‘Phase 1’. The reason why Rollups have not made greater progress is due to concerns about bugs in the code. We need Trustless Rollups, so we must face and solve this problem.
What is it, how does it work?
First, let’s review the ‘stage’ system introduced in this article.
Phase 0: Users must be able to run Node and synchronize the chain. It’s okay if the verification is completely trustworthy / centralized.
Phase 1: There must be a (trustless) proof system to ensure that only valid transactions are accepted. A security committee that can override the proof system is allowed, but it requires a threshold vote of 75%. Additionally, the quorum-blocking portion (i.e. 26%+) of the committee must be outside the main company building the Rollup. A weaker upgrade mechanism (e.g. DAO) is allowed, but it must have sufficient latency so that users can withdraw their funds before the approval of malicious upgrades.
Phase 2: There must be a (trustless) proof system that ensures only valid transactions are accepted. The security council is only allowed to intervene in the case of provable bugs in the code, for example if two redundant proof systems disagree with each other, or if one proof system accepts two different post-state roots for the same block (or accepts nothing for a sufficiently long time, such as a week). An upgrade mechanism is allowed, but it must have a very long delay.
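To make these criteria concrete, here is a deliberately simplified, hypothetical Python sketch. The field names and the 7-day and 30-day delay thresholds are illustrative placeholders; only the 75% and 26% figures come from the Phase 1 description above.

```python
from dataclasses import dataclass

@dataclass
class RollupConfig:
    # Hypothetical, simplified encoding of the phase criteria described above.
    proof_system_live: bool                 # a proof system actually enforces state validity
    council_override_threshold: float       # fraction of council votes needed to override it
    quorum_blockers_outside_team: bool      # >= 26% of the council sits outside the core company
    council_limited_to_provable_bugs: bool  # council may only act on provable bugs / prover disagreement
    upgrade_delay_days: int                 # delay before any upgrade takes effect

def phase(cfg: RollupConfig) -> int:
    """Return 0, 1, or 2 according to the simplified criteria above."""
    if not cfg.proof_system_live:
        return 0
    reaches_phase_1 = (
        cfg.council_override_threshold >= 0.75
        and cfg.quorum_blockers_outside_team
        and cfg.upgrade_delay_days >= 7      # "long enough for users to exit" -- placeholder value
    )
    if not reaches_phase_1:
        return 0
    reaches_phase_2 = (
        cfg.council_limited_to_provable_bugs
        and cfg.upgrade_delay_days >= 30     # "very long delay" -- placeholder value
    )
    return 2 if reaches_phase_2 else 1
```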
Our goal is to reach Phase 2. The main challenge in getting there is gaining enough confidence that the proof system really is trustworthy. There are two main ways to do this:
Formal verification: We can use modern mathematical and computational techniques to prove that an (optimistic or validity) proof system only accepts blocks that comply with the EVM specification. These techniques have existed for decades, but recent advances (such as Lean 4) have made them more practical, and advances in AI-assisted proving may accelerate this trend further.
Multi-provers: Create multiple proof systems and put funds behind a combination of these proof systems and a security council (or another gadget with trust assumptions, such as a TEE). When the proof systems agree, the council has no power; when they disagree, the council can only choose between their answers and cannot unilaterally impose its own.
Schematic diagram of a multi-prover design combining an optimistic proof system, a validity proof system, and a security council.
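A minimal sketch of this resolution rule, assuming exactly one optimistic and one validity proof system plus a council; the function and parameter names are hypothetical:

```python
from typing import Optional

def resolve_post_state_root(
    optimistic_root: str,
    validity_root: str,
    council_choice: Optional[str] = None,
) -> str:
    """If the two proof systems agree, their answer is final and the council has
    no power; if they disagree, the council may only pick one of the two candidate
    roots, never impose a third value of its own."""
    if optimistic_root == validity_root:
        return optimistic_root
    if council_choice is not None and council_choice in (optimistic_root, validity_root):
        return council_choice
    raise ValueError("the council may only choose between the proof systems' answers")
```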
What are the links to existing research?
EVM K Semantics (formal verification work from 2017):
Presentation on the idea of multi-provers (2022):
Plan to use multi-proofs:
What else needs to be done? What are the trade-offs?
For formal verification, the workload is substantial. We need to create a formally verified SNARK prover for the entire Ethereum Virtual Machine (EVM). This is an extremely complex project, although we have already started. There is a trick that can greatly simplify the task: we can create a formally verified SNARK prover for a minimal virtual machine (such as RISC-V or Cairo), and then implement the EVM inside that minimal virtual machine (and formally prove its equivalence to the EVM specification).
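As a toy illustration of what "formally prove its equivalence to the specification" means, here is a tiny Lean 4 example; it has nothing to do with the real EVM, where the same kind of proof must be carried out at vastly larger scale.

```lean
-- Toy example: a 256-bit wrapping addition "implementation" and a "specification",
-- plus a machine-checked proof that they agree.
def MOD : Nat := 2 ^ 256

def addImpl (a b : Nat) : Nat := (a + b) % MOD   -- what the toy VM computes
def addSpec (a b : Nat) : Nat := (b + a) % MOD   -- what the toy spec prescribes

theorem addImpl_eq_addSpec (a b : Nat) : addImpl a b = addSpec a b := by
  unfold addImpl addSpec
  rw [Nat.add_comm]
```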
For multi-proofs, there are still two main pieces missing. First, we need sufficient confidence in at least two different proof systems, both that each is secure on its own and that if they do fail, they fail for different and unrelated reasons (so they do not fail simultaneously). Second, we need a very high level of trust in the underlying logic that merges the proof systems. This piece of code is much smaller. There are ways to make it very small, for example by depositing the funds into a multisig contract whose signers are contracts representing the individual proof systems, but this increases on-chain gas costs. We need to find a balance between efficiency and security.
How to interact with other parts of the roadmap?
Moving activity to L2 reduces MEV pressure on L1.
Improved Cross-L2 Interoperability
What problem are we solving?
One of the main challenges facing the current L2 ecosystem is that it is difficult for users to navigate. Moreover, the simplest approaches often reintroduce trust assumptions: centralized bridges, RPC clients, and so on. We need using the L2 ecosystem to feel like using a single, unified Ethereum ecosystem.
What is it? How does it work?
There are many categories of cross-L2 interoperability improvements. In theory, a Rollup-centric Ethereum is the same thing as an execution-sharded L1; in practice, however, the current Ethereum L2 ecosystem falls short of this ideal in the following ways:
Chain-specific addresses: an address should include chain information (L1, Optimism, Arbitrum, etc.). Once this is in place, cross-L2 sending can be done by simply pasting the address into the "send" field, at which point the wallet can figure out in the background how to make the send (including by using a cross-chain protocol); a small sketch of this follows the list below.
Chain-specific payment requests: it should be easy and standardized to create a message of the form "send me X of token Y on chain Z". There are two main use cases: (i) payments between individuals, or from individuals to merchant services; (ii) DApps requesting funds.
Cross-chain swaps and gas payment: there should be a standardized open protocol for expressing cross-chain operations, such as "I will send 1 ETH on Optimism to whoever sends me 0.9999 ETH on Arbitrum" and "I will send 0.0001 ETH on Optimism to whoever includes this transaction on Arbitrum". ERC-7683 is an attempt at the former and RIP-7755 at the latter, although both are more general than these specific use cases.
Light clients: users should be able to actually verify the chains they are interacting with, rather than just trusting RPC providers. a16z crypto's Helios does this for Ethereum itself, but we need to extend this trustlessness to L2. ERC-3668 (CCIP-read) is one strategy for achieving this.
How a light client updates its view of the Ethereum header chain. Once you have the header chain, any state object can be verified with a Merkle proof. And once you have the correct L1 state objects, you can use Merkle proofs (and signatures, if you want to check pre-confirmations) to verify any state object on L2. Helios already does the former; extending to the latter is a standardization challenge.
A more radical "shared token bridge" idea: imagine a world in which all L2s are validity-proof Rollups that commit to Ethereum every slot. Even in that world, moving assets from one L2 to another "natively" still requires withdrawing and depositing, which costs a significant amount of L1 gas. One way to solve this is to create a shared minimal Rollup whose only function is to maintain which L2 owns how much of each type of token, and to allow those balances to be updated in batches through a series of cross-L2 send operations initiated by any of the L2s. This would allow cross-L2 transfers to happen without paying L1 gas for each transfer and without liquidity-provider-based techniques such as ERC-7683.
Synchronous composability: allow synchronous calls to happen either between a specific L2 and L1, or between multiple L2s. This could help improve the financial efficiency of DeFi protocols. The former can be done without any cross-L2 coordination; the latter requires shared sequencing. Based Rollups are automatically friendly to all of these techniques.
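The sketch below illustrates the chain-specific address and payment-request ideas from the first two items. The "chain:address" text format, the short names, and the message fields are all assumptions for illustration, not a finished standard.

```python
from dataclasses import dataclass

# Illustrative short names for L1, Optimism, and Arbitrum; hypothetical, not normative.
KNOWN_CHAINS = {"eth", "oeth", "arb1"}

@dataclass
class ChainAddress:
    chain: str
    address: str

def parse_chain_address(text: str) -> ChainAddress:
    """Split a chain-qualified address so a wallet can route the transfer in the
    background (possibly via a cross-chain protocol such as ERC-7683)."""
    chain, sep, address = text.partition(":")
    if not sep or chain not in KNOWN_CHAINS or not address.startswith("0x"):
        raise ValueError(f"unrecognized chain-specific address: {text!r}")
    return ChainAddress(chain=chain, address=address)

def payment_request(chain: str, token: str, amount: int, to: str) -> dict:
    """A 'send me X of token Y on chain Z' message as a plain dict; the field
    names are illustrative placeholders."""
    return {"chain": chain, "token": token, "amount": amount, "to": to}

# Example: pasting "oeth:0xAbc..." into a send field tells the wallet the funds
# should land on Optimism, without the user having to think about bridges.
print(parse_chain_address("oeth:0xAbc0000000000000000000000000000000000000"))
```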
What are the links to existing research?
1. ERC-7683:
2. RIP-7755:
3. Scroll's keystore wallet design:
4. Helios:
5. ERC-3668 (sometimes referred to as CCIP-read):
6. L1SLOAD (RIP-7728):
7. REMOTESTATICCALL in Optimism:
What else needs to be done? What are the trade-offs?
Many of the examples above face the dilemma of when to standardize and which layer to standardize at. Standardize too early and you risk entrenching an inferior solution; standardize too late and you risk unnecessary fragmentation. In some cases there is both a short-term solution with weaker properties that is easier to implement, and a long-term solution that is "ultimately correct" but will take years to arrive.
These tasks are not just technical problems; they are also (perhaps even primarily) social problems that require cooperation between L2s, wallets, and L1.
How to interact with other parts of the roadmap?
Most of these proposals are "higher-layer" constructions and therefore have little impact on L1. One exception is shared sequencing, which has a significant impact on maximal extractable value (MEV).
Extend Execution on L1
What problem are we solving?
If L2 becomes highly scalable and successful but L1 itself can still only process a very small volume of transactions, Ethereum faces several risks:
The economic position of the ETH asset becomes riskier, which in turn affects the long-term security of the network.
Many L2s benefit greatly from being closely tied to the highly developed financial ecosystem on L1. If that ecosystem is significantly weakened, the incentive to become an L2 (rather than an independent L1) weakens as well.
It takes a long time for L2 to achieve the same level of security as L1.
If an L2 fails (e.g., due to a malicious operator or an operator that disappears), users still need to go through L1 to recover their assets. Therefore, L1 needs to be powerful enough to at least occasionally handle the highly complex and messy wind-down of an L2.
For these reasons, continuing to expand L1 itself and ensuring that it can continue to accommodate more and more use cases is very valuable.
What is it? How does it work?
The simplest way to scale is to directly increase the gas limit. However, this risks centralizing L1 and thereby weakening another important property of Ethereum L1: its credibility as a robust base layer. How far the gas limit can sustainably be raised is still debated, and the answer also depends on which other technologies get implemented to make larger blocks easier to verify (e.g., history expiry, statelessness, L1 EVM validity proofs). Another thing that needs continuous improvement is the efficiency of Ethereum client software, which is much better today than it was five years ago. An effective strategy for raising the L1 gas limit will involve accelerating these verification technologies. A separate strategy is to make specific operations and classes of computation cheaper, increasing average capacity without increasing worst-case risk; examples include:
EOF: a new EVM bytecode format that is friendlier to static analysis and allows for faster implementations. Given these efficiency gains, EOF bytecode could be given lower gas costs.
Multi-dimensional gas pricing: setting separate basefees and limits for computation, data, and storage can increase Ethereum L1's average capacity without increasing its maximum capacity (and thus without creating new security risks); a minimal fee sketch appears just below this list.
Reduce the gas costs of specific opcodes and precompiles: historically, to prevent denial-of-service attacks, we have repeatedly increased the gas costs of operations that were underpriced. Something that could be done more is reducing the gas costs of operations that are overpriced. For example, addition is much cheaper than multiplication, yet the ADD and MUL opcodes currently cost the same. We could make ADD cheaper, and even simpler opcodes such as PUSH cheaper still. EOF is overall more optimized in this regard.
These improvements will be discussed in more detail in a future article on the Splurge.
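To make the multi-dimensional gas pricing idea concrete, here is a minimal sketch in the spirit of that proposal (cf. EIP-7706 in the links below). All prices, limits, and names are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical basefees (wei per unit) and per-block limits for three resources.
BASEFEES = {"compute": 10, "calldata": 40, "storage": 20_000}
BLOCK_LIMITS = {"compute": 30_000_000, "calldata": 1_000_000, "storage": 50_000}

@dataclass
class ResourceUsage:
    compute_gas: int
    calldata_bytes: int
    storage_slots_written: int

def tx_fee(u: ResourceUsage) -> int:
    """Charge each resource against its own basefee rather than one shared gas scale."""
    return (u.compute_gas * BASEFEES["compute"]
            + u.calldata_bytes * BASEFEES["calldata"]
            + u.storage_slots_written * BASEFEES["storage"])

def block_within_limits(txs: list[ResourceUsage]) -> bool:
    """Each dimension has its own cap, so average capacity can rise while the
    worst case in any single dimension stays bounded."""
    return (sum(t.compute_gas for t in txs) <= BLOCK_LIMITS["compute"]
            and sum(t.calldata_bytes for t in txs) <= BLOCK_LIMITS["calldata"]
            and sum(t.storage_slots_written for t in txs) <= BLOCK_LIMITS["storage"])
```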
Finally, the third strategy is native Rollups (or "enshrined Rollups"): essentially, creating many copies of the EVM that run in parallel, resulting in a model equivalent to what Rollups can provide, but much more natively integrated into the protocol.
What are the links to existing research?
1. Polynya's Ethereum L1 scalability roadmap:
2. Multi-dimensional gas pricing:
3. EIP-7706:
4. EOF:
5. EVM-MAX:
6. SIMD:
7. Native Rollups:
8. Max Resnick's interview on the value of scaling L1:
9. Justin Drake on scaling with SNARKs and native Rollups:
What else needs to be done and what are the trade-offs?
There are three strategies for scaling L1, which can be pursued individually or in parallel:
Improve technology (e.g., client code, stateless clients, history expiry) to make L1 easier to verify, and then raise the gas limit;
Reduce the cost of specific operations, increasing average capacity without increasing worst-case risk;
Native Rollups (i.e., creating N parallel copies of the EVM).
Looking at these different technologies, we find that each has different trade-offs. For example, native Rollups have many of the same weaknesses as regular Rollups when it comes to composability: you cannot send a single transaction that synchronously performs operations across several of them, as you can with contracts on the same L1 (or L2). Raising the gas limit takes away from other benefits of making L1 easier to verify, such as increasing the share of users who run verifying nodes and the share of solo stakers. And depending on how it is done, making specific operations in the EVM cheaper may increase the overall complexity of the EVM.
A major question that any L1 scaling roadmap needs to answer is: what is the ultimate vision for what belongs on L1 and what belongs on L2? Clearly, it would be absurd to put everything on L1: the potential use cases run to hundreds of thousands of transactions per second, which would make L1 completely unverifiable (unless we go the native-Rollup route). But we also need some guiding principle, to make sure we do not end up in a situation where the gas limit is raised tenfold and the decentralization of Ethereum L1 is severely damaged.
How to interact with other parts of the roadmap?
Bringing more users onto L1 means improving not just scale but other aspects of L1 as well. It means more MEV will remain on L1 (rather than becoming only an L2 problem), so the need to handle MEV explicitly becomes more urgent. It also greatly increases the value of fast slot times on L1. And it depends heavily on verification of L1 (the Verge) going well.