Unblocking and unloading: The speed secrets behind Hedera Smart Contracts 2.0
Mar 10, 2022
by Hedera Team
Hedera is the most used, sustainable, enterprise-grade public network for the decentralized economy.

Hedera launched Smart Contracts 2.0 on the mainnet on February 7th, 2022. Smart Contracts offer Turing-complete layer 1 programmability to applications built on Hedera using Solidity. Developers building on Hedera can expect efficiencies and performance, low costs, and scalability with Smart Contracts 2.0 across DeFi applications, DAOs, oracles, network bridges and many other types of decentralized applications.

The Smart Contracts 2.0 service utilizes the Hyperledger Besu EVM. As part of the preparation, Hedera contributed a number of performance optimizations to the Hyperledger codebase. Hedera’s implementation is completely open-source, but developers of the service have received questions about the specific optimizations implemented, so we wanted to summarize what’s been improved.

In order to increase scalability and throughput, while optimizing the Besu EVM for hashgraph consensus, there were two primary focuses for developers who helped contribute: Keeping busy and eliminating busywork.

What does that mean exactly? Let’s find out.

P.S. We’re always looking for core contributors to help solve challenging problems like the ones described below — check out the open roles here: https://hedera.com/careers

Leaderless Consensus: Keeping Busy

Leaderless Consensus, which is an aspect of the hashgraph algorithm Hedera runs on top of, is the main solution to the “keeping busy” problem. Hashgraph belongs to the category of “Asynchronous Byzantine Fault Tolerant” (aBFT) consensus algorithms. This changes the consensus process in several ways that create opportunities for increased performance.

Reduced Block Gaps

There is no gap between transactions reaching final ordering in the ledger and when any other actions may occur. This contrasts with the two different models employed by most proof-of-work (PoW) and non-aBFT proof-of-stake (PoS) chains.

For PoW, there is an indeterminate amount of time between blocks while the network finds an acceptable nonce. In most PoS chains, blocks are emitted on a timed schedule. With the hashgraph consensus algorithm, no such waiting occurs, as nodes can proceed with transaction processing without waiting for synchronous acknowledgment. No processing time is wasted waiting for the network to come to consensus.

No Propose Propagate Validate loop

Another side effect of the hashgraph consensus algorithm is that there is no block producer, and hence no process of assembling the block, propagating the block, and then validating a block processed by another node. Transactions are executed as they are ordered, rather than waiting on a block producer to add them and then on the rest of the network to validate them. While each node still evaluates the transaction once, the need to have a transaction processed first by a leader and then by the followers is eliminated. Rather than two distinct and disjoint periods of time when a transaction is processed, there is only one.

No memory pool

The last and most important aspect of the hashgraph consensus algorithm is that there is no memory pool of transactions. A transaction is accepted into the network the moment any node gossips it to other nodes. This provides an important speed-up for client applications looking for prompt transaction execution. Managing a secondary list of transactions and double-checking against it during block production consumes a lot of CPU time and networking bandwidth. Without a memory pool, those resources can instead be allocated to transaction processing.

Code-Level Optimizations: Eliminating Busywork

While the higher-level architecture of the network provides a solid framework for performance improvements, there is no substitute for raw speed. Part of the room for improvement comes from the origin of the Hyperledger Besu EVM. Prior to becoming part of Hyperledger, the EVM was part of a Consensys project named Pantheon. This project created, from scratch, a fully functional Ethereum client in about a year’s time. One of the techniques used was to map the code as closely to the written specifications as possible, to ensure proper conformance and to make fixing compatibility bugs easier for engineers who were not involved in the initial development. While this made the code easy to read and reason about, it had a net negative impact on performance: a lot of duplicate calculations and unneeded processing resulted.

Collapsing Steps

In a number of the hot spots of the evaluation loop, steps that could be combined or inlined were separated out into different functions. While this increased the readability of the code, it often resulted in excess stack allocations and unneeded function calls that could not always be inlined by the JIT compiler.

One specific example was incrementing the program counter. This may sound simple but it is not, as one class of EVM instructions (PUSH) results in the program counter advancing more than one byte (there are also interactions with JUMP operations outside the scope of this blog post).

[ code sample - https://github.com ]

Manually inlining this function allowed for code refactoring that eliminated several stack allocations and related megamorphic method calls. In less technical terms, this resulted in a significant speed-up in the evaluation loop.
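To make the idea concrete, here is a minimal sketch of an evaluation loop in which the program counter is advanced inline rather than through a separate helper. It is not the Besu code: the class name TinyEvmLoop, the long-based stack, and the three supported opcodes are assumptions made for illustration. Only the opcode values (STOP, ADD, PUSH1 through PUSH32) and the rule that PUSHn advances the counter by 1 + n bytes follow the EVM.

```java
// Hypothetical toy interpreter, not the Besu implementation.
// Values are held in a long, so pushes wider than 8 bytes overflow here.
final class TinyEvmLoop {
    static final int STOP = 0x00;
    static final int ADD = 0x01;
    static final int PUSH1 = 0x60;
    static final int PUSH32 = 0x7f;

    /** Runs the bytecode and returns the value left on top of the stack. */
    static long run(byte[] code) {
        long[] stack = new long[1024];
        int sp = 0; // index of the next free stack slot
        int pc = 0; // program counter, updated inline in each branch
        while (pc < code.length) {
            int opcode = code[pc] & 0xff;
            if (opcode >= PUSH1 && opcode <= PUSH32) {
                int operandBytes = opcode - PUSH1 + 1;
                long value = 0;
                for (int i = 1; i <= operandBytes && pc + i < code.length; i++) {
                    value = (value << 8) | (code[pc + i] & 0xff);
                }
                stack[sp++] = value;
                pc += 1 + operandBytes; // skip the opcode and its operand bytes
            } else if (opcode == ADD) {
                long b = stack[--sp];
                long a = stack[--sp];
                stack[sp++] = a + b;
                pc += 1; // every other instruction advances by one byte
            } else if (opcode == STOP) {
                break;
            } else {
                throw new IllegalStateException(
                    "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return sp == 0 ? 0L : stack[sp - 1];
    }
}
```

For example, TinyEvmLoop.run(new byte[] {0x60, 0x02, 0x60, 0x03, 0x01, 0x00}), which encodes PUSH1 2, PUSH1 3, ADD, STOP, returns 5. Because the counter update lives in the same branch as the instruction, no per-step result object or extra method call is needed to say how far to advance.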

Reduce Wrapping

Another significant performance issue was related to the copying of internal memory. The effort invested into reducing the unneeded copying of bytes has a positive effect on both the CPU and memory caches close to the CPU.

[ code sample - https://github.com ]

In this example, rather than taking input bytes from an immutable source, slicing out the data, copying the data into an intermediate object, and then copying that intermediate object into a UInt256 type, we only take the initial slice and add that to the stack. What enabled this change was moving the stack from the specialized UInt256 type to a more generic Bytes type. Operations that need a UInt256 object convert the value at the last possible moment, saving a lot of processing time when no math is required.
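The pattern described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Besu code: ByteSlice and LazyStackDemo are hypothetical names, and java.math.BigInteger stands in for UInt256. The stack holds views over the shared, unmodified input, and bytes are copied and converted only when an operation actually needs a number.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical zero-copy view over a shared byte array that is never modified. */
final class ByteSlice {
    private final byte[] backing;
    private final int offset;
    private final int length;

    ByteSlice(byte[] backing, int offset, int length) {
        this.backing = backing;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    /** The only place bytes are copied: when a numeric value is required. */
    BigInteger toUnsignedBigInteger() {
        return new BigInteger(1, Arrays.copyOfRange(backing, offset, offset + length));
    }
}

/** Hypothetical stack operations working on slices instead of numbers. */
final class LazyStackDemo {
    /** PUSH-like step: records a view of code[offset, offset + length), no copy. */
    static void push(Deque<ByteSlice> stack, byte[] code, int offset, int length) {
        stack.push(new ByteSlice(code, offset, length));
    }

    /** DUP-like step: no arithmetic is involved, so nothing is converted. */
    static void dup(Deque<ByteSlice> stack) {
        stack.push(stack.peek());
    }

    /** ADD-like step: converts both operands at the last possible moment. */
    static BigInteger add(Deque<ByteSlice> stack) {
        BigInteger b = stack.pop().toUnsignedBigInteger();
        BigInteger a = stack.pop().toUnsignedBigInteger();
        return a.add(b);
    }
}
```

In this sketch a push followed by a dup touches no payload bytes at all. Only the final add pays for the conversion, which is the saving the paragraph above describes for operations that do no math.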

Duplicate Code Strategically

A third technique used to speed up the code was the strategic use of code duplication. “Don’t Repeat Yourself” is a common principle of software design. This principle has maintenance as its driving factor, and sometimes performance requires a less extreme adherence to it (see also AHA, “Avoid Hasty Abstractions”). Take for example the EXTCODEHASH and EXTCODESIZE operations. They perform exactly the same work in retrieving a contract’s code and differ only in the data they report.

[ code sample - https://sonarcloud.io ]

In this code sample from ExtCodeSizeOperation, the grey sidebar represents lines that are duplicated in ExtCodeHashOperation. But the "account == null ? UInt256.ZERO : UInt256.valueOf(account.getCode().size()));" line is the most important part of the operation. There is a fairly trivial way to abstract this out, but it introduces a third class and a multi-morphic method call at runtime. The resulting overhead is almost as much as getting the code for a warmed-up account. Because this is a regularly called opcode in Solidity, the performance impact cannot be ignored.
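The trade-off can be illustrated with a small sketch. Everything here is hypothetical: the Account class, the map used as world state, and the class names ending in Sketch are not Besu types, and Arrays.hashCode is only a stand-in for the real code hash. The point is that each final class repeats the account lookup so that its execute method contains only direct, easily inlined code.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical account holding contract code. */
final class Account {
    final byte[] code;

    Account(byte[] code) {
        this.code = code;
    }
}

/** EXTCODESIZE-like operation with its own copy of the lookup. */
final class ExtCodeSizeSketch {
    long execute(Map<String, Account> world, String address) {
        Account account = world.get(address); // duplicated in ExtCodeHashSketch
        return account == null ? 0L : account.code.length;
    }
}

/** EXTCODEHASH-like operation; Arrays.hashCode is a stand-in, not the real hash. */
final class ExtCodeHashSketch {
    int execute(Map<String, Account> world, String address) {
        Account account = world.get(address); // duplicated in ExtCodeSizeSketch
        return account == null ? 0 : Arrays.hashCode(account.code);
    }
}
```

The abstraction this avoids would be a shared base class that performs the lookup once and then calls an overridable method to produce the reported value. That design removes the two repeated lines but adds a virtual call on every execution, which is the runtime cost the paragraph above weighs against the maintenance benefit.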

Conclusion

While many factors contribute to the increased gas capacity of Smart Contracts 2.0, the consensus algorithm and the lower-level code optimizations are two of the more important ones.

The third key change, the purpose-built database JasperDB, which has been open sourced under the Apache 2.0 license, deserves a deep dive of its own in the future. There are also more optimizations to explore, such as compiling frequently used contracts into Java bytecode.