What is Byzantine fault tolerance?
Let’s begin by understanding what being Byzantine fault tolerant actually means. The term Byzantine fault tolerance is derived from a hypothetical scenario called the “Byzantine General’s Problem”.
This hypothetical scenario was developed to describe a situation in which, in order to avoid failure of a distributed system, the system's actors must agree on a concerted strategy, but some of these actors are unreliable. It uses a planned attack on an enemy city by four Byzantine generals. These generals and their armies are each occupying a different side of the city and so are totally separated from each other, making direct, coordinated communication impossible. In order to plan their attack – or retreat – they must use their own messengers to handle communication and so they each send their messengers to share their plan with each other.
But there are serious flaws in this approach — there are questions about the trustworthiness of the messages shared, such as: what if any of the messengers are captured and replaced with enemy messengers, who share intentionally false information? What if one (or even two) of the generals themselves is actually a traitor and sends a messenger with information meant to misinform? What if one of the messengers themselves are traitors and intentionally shares false information?
You quickly begin to see that there is a trust issue here that it is impossible to ignore the way these Byzantine generals are trying to come to a consensus (or total agreement) on whether to attack or retreat from the city they are surrounding. And by having a solution to this problem of trust, known as byzantine fault tolerance, you can trust that the network is going to act fairly.
How does Byzantine fault tolerance (BFT) apply to decentralized networks?
All decentralized networks (distributed ledger, such as a blockchain) have independent nodes that act like the byzantine generals in the hypothetical situation above.
These nodes must come to an agreement, or consensus, on things like: transactions submitted to the network, the ordering of those transactions, and the state of the network itself. These network nodes are likely separated from each other geographically, virtually, architecturally, and sometimes even ethically.
So not only do nodes need to be able to communicate, in order to achieve their end goal of consensus, but they also need to account for some nodes being malicious; trust is a key issue here, as some nodes may be acting dishonestly on the network: They may be taken over by hackers or be intentionally sending incorrect information. Communication between what can sometimes be thousands of nodes, makes coming to a consensus, or agreement, on a decision very challenging.
In a decentralized network, there is massive value in knowing that the timing and order of transactions taking place on the network have been reached by consensus, and that every transaction is recorded, verified, and shared by participating nodes – even if some of those nodes are not trustworthy, or may even be trying to negatively affect consensus. In a decentralized network, this entire process has the additional benefit of having a mathematical guarantee of consensus, or that the same decision is reached by honestly participating nodes. This ability for the nodes in a decentralized network to come to the correct agreement on transactions without needing to trust each other – at scale – is what truly sets distributed ledger technology apart.
What’s “asynchronous” Byzantine fault tolerance (ABFT)?
When a decentralized network is Byzantine fault tolerant, it means that the honest members, or nodes, of a network can be guaranteed to agree on the timing and order (consensus) of a set of transactions. Regardless as to whether there are some nodes maliciously trying to prevent that consensus — even if as many as 1/3 of nodes are trying to negatively affect consensus by delaying transactions or otherwise corrupting things. This is the ‘fault tolerance’ of the network, meaning how many nodes can the network tolerate acting maliciously, but still come to an honest consensus.
The ‘asynchronous’ property of Byzantine fault tolerance overcomes a challenge of fault tolerance, which is that of timing. Many forms of Byzantine fault tolerance assume there is a maximum threshold of message latency when coming to a consensus. An asynchronous byzantine fault tolerant (ABFT) network eliminates this assumption and allows for some messages to be lost or indefinitely delayed.
An ABFT network allows for messages to be lost or indefinitely delayed and assumes only that at some point an honest node’s messages will eventually get through. It is much more challenging for an honest node to assess whether another node is not following the rules, if that node’s messages can be indeterminately delayed, but this scenario much better reflects that network reliability in the real world.