Two and a half years ago, in the article “Endgame”, I argued that the different future development paths for blockchains look very similar from a technical standpoint. In both cases, there is a very large number of transactions on-chain, and processing them requires (1) a large amount of computation and (2) a large amount of data bandwidth. Ordinary Ethereum nodes (such as the 2 TB reth archive node running on my computer now) are not powerful enough to verify such huge amounts of data and computation directly, even with heroic software engineering and Verkle trees. Instead, in both the “L1 sharding” and the rollup-centric approaches, ZK-SNARKs are used to verify computation and DAS (data availability sampling) is used to verify data availability. The DAS is the same in both cases, and the ZK-SNARKs are the same technology; the only difference is that in one case they are smart contract code and in the other they are an enshrined feature of the protocol. In a very real technical sense, Ethereum is doing sharding, and rollups are shards.
This naturally raises a question: what is the difference between the two? One difference is the consequences of code bugs: in a rollup, tokens can be stolen; in a shard chain, consensus can fail. However, I expect that as protocols stabilize and formal verification technology improves, the impact of code bugs will become smaller and smaller. So what other differences between the two approaches can we expect to persist in the long term?
Diversity of execution environments
In 2019, the Ethereum community briefly discussed the concept of execution environments. Essentially, Ethereum would have different “zones” that could define different rules for how accounts work (including completely different approaches like UTXOs), how the virtual machine works, and other features. This would enable a diversity of approaches in parts of the stack where diversity would be hard to achieve if Ethereum tried to do everything by itself.
In the end, we abandoned some of the more ambitious plans and kept only the EVM. However, Ethereum L2s (including rollups, validiums, and Plasmas) have arguably ended up serving the role of execution environments. Today, we usually focus on EVM-equivalent L2s, but this overlooks the diversity brought by many other approaches:
Arbitrum Stylus, which adds a second, WASM-based virtual machine alongside the EVM;
Fuel, which uses a UTXO-based architecture similar to Bitcoin's (but more feature-rich);
Aztec, which introduces a new language and programming paradigm designed around ZK-SNARK-based privacy-preserving smart contracts.
UTXO-based architecture. Source: Fuel documentation.
We could try to turn the EVM into a super virtual machine that covers all possible paradigms, but that would make the implementation of each of these concepts much less effective. It is better to let platforms like these specialize.
Trade-off between Security, Scalability, and Transaction Speed
Ethereum L1 provides very strong security guarantees. If some piece of data is included in a block that is finalized on L1, the entire consensus (including social consensus in extreme cases) will work to ensure that this data cannot be modified, that any execution triggered by this data cannot be reverted, and that the data remains accessible. To achieve these guarantees, Ethereum L1 is willing to accept high costs. At the time of writing, transaction fees are relatively low: Layer 2 transactions cost less than 1 cent, and even basic ETH transfers on L1 cost less than 1 dollar. If technology improves fast enough that the growth of available block space keeps up with the growth in demand, these fees may remain low in the future, but they may not. And for many non-financial applications, such as social media or games, even a fee of 0.01 US dollars per transaction is too high.
But social media and gaming do not need the same security model as L1. It doesn't matter much if someone can spend a million dollars to revert the record of a game they lost, or to make your tweet appear as if it was posted three days later than it really was. Therefore, these applications should not have to pay the same security costs. L2 solutions make this possible by supporting a range of data availability approaches, from rollups to plasma to validiums.
Different L2 types are suitable for different use cases.
Another security tradeoff arises around moving assets from L2 to L2. I expect that in the next 5 to 10 years, all rollups will be ZK rollups, and ultra-efficient proof systems like Binius and Circle STARKs with lookups, along with proof aggregation layers, will make it possible for L2s to provide a finalized state root in every slot. But for now, we are stuck with a complicated mix of optimistic rollups and ZK rollups with different proof time windows. If we had implemented execution sharding in 2021, the security model for keeping shards honest would have been optimistic rollups, not ZK. L1 would therefore have had to manage the complex fraud proof logic on-chain, and moving assets between shards would have required a withdrawal period of up to a week. However, as with code bugs, I believe this problem is ultimately temporary.
Transaction speed is the third, and more lasting, dimension of the security tradeoff. Ethereum produces a block every 12 seconds and is unwilling to go much faster, because that would make the network too centralized. Many Layer 2 solutions, however, are exploring ways to reduce block times to a few hundred milliseconds. 12 seconds is not too bad: on average, a user who submits a transaction waits about 6-7 seconds for it to be included in a block (not exactly 6 seconds, because the next block may not include it). This is comparable to the wait when paying with a credit card. However, many applications need faster speeds, which Layer 2 solutions can provide.
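The 6-7 second figure can be reproduced with a little arithmetic. The sketch below is only an illustration under a simplifying assumption that is not in the original text: a transaction arrives at a uniformly random moment within a slot, and each subsequent block independently includes it with a fixed probability. The function name and the sample probabilities are hypothetical.

```python
def expected_wait_seconds(slot_seconds: float, inclusion_probability: float) -> float:
    """Expected delay until inclusion for a transaction sent at a random time.

    Assumption (for illustration only): every upcoming block independently
    includes the transaction with probability `inclusion_probability`.
    """
    if not 0 < inclusion_probability <= 1:
        raise ValueError("inclusion_probability must be in (0, 1]")
    # On average the next block is half a slot away.
    wait_for_next_block = slot_seconds / 2
    # Mean number of blocks that skip the transaction (geometric distribution).
    expected_missed_blocks = (1 - inclusion_probability) / inclusion_probability
    return wait_for_next_block + slot_seconds * expected_missed_blocks


if __name__ == "__main__":
    # L1 with 12-second slots and a few assumed inclusion probabilities.
    for p in (1.0, 0.95, 0.9):
        print(f"L1, p={p}: {expected_wait_seconds(12, p):.2f} s")
    # A hypothetical L2 with 250 ms blocks that always includes the transaction.
    print(f"L2: {expected_wait_seconds(0.25, 1.0):.3f} s")
```

With guaranteed inclusion in the next block the average wait is exactly half a slot; any chance of being skipped pushes it above 6 seconds, which is why the text says "6-7" rather than "6".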
To go even faster, L2s can use a preconfirmation mechanism: the L2's own validators digitally sign a promise to include a transaction at a specific time, and if the transaction is not included, they can be penalized. A mechanism called StakeSure generalizes this idea further.
L2 pre-confirmation
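The preconfirmation flow described above can be sketched as follows. This is a minimal illustration, not any particular L2's protocol: an HMAC tag stands in for a real digital signature (a real system would use public-key signatures so that anyone can verify the promise), and all names such as `Preconfirmation`, `sign_promise`, and `penalty_due` are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Preconfirmation:
    tx_hash: str
    deadline_block: int
    signature: str  # HMAC tag standing in for a real digital signature


def _tag(validator_key: bytes, tx_hash: str, deadline_block: int) -> str:
    message = f"{tx_hash}:{deadline_block}".encode()
    return hmac.new(validator_key, message, hashlib.sha256).hexdigest()


def sign_promise(validator_key: bytes, tx_hash: str, deadline_block: int) -> Preconfirmation:
    """Validator promises to include `tx_hash` no later than `deadline_block`."""
    return Preconfirmation(tx_hash, deadline_block, _tag(validator_key, tx_hash, deadline_block))


def is_authentic(validator_key: bytes, promise: Preconfirmation) -> bool:
    expected = _tag(validator_key, promise.tx_hash, promise.deadline_block)
    return hmac.compare_digest(expected, promise.signature)


def penalty_due(
    validator_key: bytes,
    promise: Preconfirmation,
    txs_by_block: dict[int, set[str]],
    current_block: int,
    stake: int,
) -> int:
    """Return the amount of stake to slash (0 if the promise was kept or is not yet due)."""
    if not is_authentic(validator_key, promise):
        return 0  # a forged promise cannot be used to penalize the validator
    if current_block < promise.deadline_block:
        return 0  # the deadline has not been reached yet
    kept = any(
        promise.tx_hash in txs
        for block, txs in txs_by_block.items()
        if block <= promise.deadline_block
    )
    return 0 if kept else stake
```

The user can treat the signed promise as a fast, economically backed confirmation: either the transaction lands by the deadline, or the promise itself is evidence for penalizing the validator.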
Now, we could try to implement all of these features on L1. L1 could include a “fast pre-confirmation” and “slow final confirmation” system. It could include different shards with different security levels. However, this would increase the complexity of the protocol. In addition, doing all of this work on L1 risks overloading consensus, because many larger-scale or higher-throughput approaches carry higher centralization risks or require stronger forms of “governance”. If they were implemented on L1, the effects of these stronger requirements would spill over into the rest of the protocol. By offering these tradeoffs through L2s, Ethereum can largely avoid these risks.
The Benefits of Layer2 for Organization and Culture
Imagine a country divided into two halves: one half becomes a capitalist country, and the other becomes a government-dominated country (unlike what happens in reality, let's assume in this thought experiment that this is not the result of any traumatic war, but simply a border that naturally emerges one day). In the capitalist half, restaurants are run by various combinations of decentralized ownership, chains, and franchises. In the government-dominated half, they are all branches of the government, just like police stations. On the first day, not much changes. People mostly follow their existing habits, and what is feasible and what is not depends on labor skills and on technological realities such as infrastructure. After a year, however, you will see significant changes, because different incentives and control structures lead to significant changes in behavior, influencing people's choices about what is built, what is maintained, and what is abandoned.
Industrial organization theory covers many such distinctions: it discusses not only the differences between a government-managed economy and a capitalist economy, but also the differences between an economy dominated by large franchises and an economy where each supermarket is run by an independent entrepreneur. I believe the difference between an L1-centric ecosystem and an L2-centric ecosystem is of a similar kind.
“The architecture of ‘core developer manages everything’ has a big problem.”
I would describe the main advantage of Ethereum being an L2-centric ecosystem as follows:
Since Ethereum is an L2-centric ecosystem, you are free to independently build a sub-ecosystem with its own unique features, while also being a part of the larger Ethereum.
If you are just building an Ethereum client, you are part of the larger Ethereum ecosystem, but although you have some room for innovation, it is far less than what an L2 has. And if you are building a completely independent chain, your room for creativity is very large, but you lose benefits such as shared security and shared network effects. L2s strike a good balance.
An L2 is not only a technical opportunity to explore new execution environments and security tradeoffs in order to achieve scale, flexibility, and speed. It is also an incentive mechanism: developers have an incentive to build and maintain it, and the community has an incentive to support it.
The fact that each L2 is isolated also means that deploying new approaches is permissionless: you don't need to convince all core developers that your new approach is “safe” for the rest of the chain. If your L2 fails, that is your own responsibility. Anyone can come up with strange ideas (such as Intmax's approach to Plasma), and even if Ethereum core developers show no interest at all, they can keep building and eventually deploy. L1 features and precompiles do not work like this. Even in Ethereum, the success or failure of L1 development often depends on politics to a greater degree than we would like. Regardless of what can theoretically be built, the different incentives created by an L1-centric ecosystem and an L2-centric ecosystem end up having a significant impact on what actually gets built, at what quality, and in what order.
What challenges does the Ethereum L2-centric ecosystem face?
L1 + L2 architecture has major issues.
Image source: Reddit
This L2-centric approach faces a key challenge that L1-centric ecosystems hardly need to deal with: coordination. In other words, although Ethereum has many L2s, the challenge is to make it still feel like “Ethereum” and have the network effects of Ethereum, rather than those of N independent chains. Today, the situation is unsatisfactory in several respects:
Moving tokens between L2s usually requires centralized cross-chain bridges and is very complicated for ordinary users. If you have tokens on Optimism, you cannot simply paste someone else's Arbitrum address into your wallet and send them funds.
Cross-chain support for smart contract wallets, both personal wallets and organizational wallets (including DAOs), is not very good. If you change your key on one L2, you still need to change it on every other L2.
Decentralized verification infrastructure is often lacking. Ethereum finally has decent light clients, such as Helios. However, this is of little use if all activity happens on L2s that each require their own centralized RPC. In principle, once you have the Ethereum block headers, building a light client for an L2 is not difficult; in practice, it does not get enough attention.
The community is working hard to improve all three of these areas. For cross-chain token exchange, the ERC-7683 standard is a new solution that, unlike existing “centralized cross-chain bridges”, has no fixed centralized nodes, tokens, or governance. For cross-chain accounts, the approach most wallets are taking is to update keys with cross-chain replayable messages in the short term and to use keystore rollups in the long term. Light clients for L2s are starting to appear, such as Beerus for Starknet. In addition, recent user experience improvements in next-generation wallets have already addressed more basic problems, such as letting users access DApps without manually switching networks.
Rabby showing a comprehensive view of asset balances across multiple chains. Previous wallets could not do this!
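The short-term approach to cross-chain accounts mentioned above, updating keys with cross-chain replayable messages, can be illustrated with a toy model. This sketch makes several assumptions that are not from the original article: an HMAC tag stands in for a signature by the wallet's current key, each L2 holds one instance of the same wallet, and the class and function names are hypothetical. The key idea it tries to show is that the authorized message deliberately contains no chain identifier, so one authorization can be replayed on every L2, while a nonce prevents it from being applied twice on the same chain.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyUpdate:
    wallet_id: str
    nonce: int
    new_key: bytes
    auth_tag: str  # HMAC tag standing in for a signature by the current key


def _payload(wallet_id: str, nonce: int, new_key: bytes) -> bytes:
    # No chain identifier on purpose: the same message is valid on every chain.
    return f"{wallet_id}:{nonce}:".encode() + new_key


def authorize_update(current_key: bytes, wallet_id: str, nonce: int, new_key: bytes) -> KeyUpdate:
    tag = hmac.new(current_key, _payload(wallet_id, nonce, new_key), hashlib.sha256).hexdigest()
    return KeyUpdate(wallet_id, nonce, new_key, tag)


class WalletInstance:
    """One deployment of the same wallet on a single L2 (toy model)."""

    def __init__(self, chain_name: str, wallet_id: str, key: bytes) -> None:
        self.chain_name = chain_name
        self.wallet_id = wallet_id
        self.key = key
        self.nonce = 0

    def apply(self, update: KeyUpdate) -> bool:
        """Apply a key update if it is for this wallet, fresh, and authorized."""
        if update.wallet_id != self.wallet_id or update.nonce != self.nonce:
            return False
        expected = hmac.new(
            self.key, _payload(update.wallet_id, update.nonce, update.new_key), hashlib.sha256
        ).hexdigest()
        if not hmac.compare_digest(expected, update.auth_tag):
            return False
        self.key = update.new_key
        self.nonce += 1
        return True


if __name__ == "__main__":
    wallets = [WalletInstance(name, "alice", b"old-key") for name in ("Optimism", "Arbitrum")]
    update = authorize_update(b"old-key", "alice", 0, b"new-key")
    # The user authorizes once; anyone can relay the same message to each L2.
    print([w.apply(update) for w in wallets])
```

The user still depends on someone relaying the message to every chain, which is one reason the text treats this as a short-term measure and keystore rollups as the longer-term design.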
However, it must be recognized that an L2-centric ecosystem really does face a coordination problem. Individual L2s lack a natural economic incentive to build coordination infrastructure: small L2s won't do it because they would capture only a small share of the benefit, and large L2s won't do it because they can gain as much or more from strengthening their own local network effects. If each L2 considers only itself and no one thinks about how it fits into the broader Ethereum system, we will fail, just like the urban utopia in the image above.
It is hard to claim there is a perfect solution to this problem. All I can say is that the ecosystem needs to better understand that cross-L2 infrastructure is a type of Ethereum infrastructure, just like L1 clients, development tools, and programming languages, and should be valued and funded accordingly. We have the Protocol Guild; perhaps we need a Basic Infrastructure Guild.
Summary
In public discussions, “L2” and “sharding” are often presented as two opposing strategies for scaling a blockchain. However, when you look at the underlying technology, you find a puzzle: the actual underlying approaches to scaling are exactly the same. There is some form of data sharding. There are fraud provers or ZK-SNARK provers. There are solutions for communication across rollups or shards. The main difference is: who is responsible for building and updating these components, and how much autonomy do they have?
An L2-centric ecosystem is sharding in a very real technical sense, but it is sharding in which you can build your own shard with your own rules. This is incredibly powerful: it opens up a great deal of room for creativity and independent innovation. However, it also presents some key challenges, especially around coordination. For an L2-centric ecosystem like Ethereum to succeed, it needs to understand these challenges and address them head-on, so that it captures as many of the benefits of L1-centric ecosystems as possible and comes as close as possible to the best of both worlds.
Vitalik's new work: How is L2 different from performing Sharding?
Words: Vitalik Buterin
Peng Sun, Foresight News