In a Distributed Ledger Technology (DLT), the consensus algorithm is the method by which the nodes (computers) are able to come to an agreement (consensus) on the order of a set of transactions, and potentially other information such as a timestamp for each transaction. For some consensus algorithms, it is possible to prove that they are Asynchronous Byzantine Fault Tolerant (ABFT).
For example, the hashgraph consensus algorithm has a proof that it is ABFT. But what does that mean, and what are the practical implications? This blog will explore what an ABFT proof can guarantee about the correctness, finality, and liveness of a consensus algorithm.
Correctness refers to the ability of a network of nodes to prevent forks developing, where different nodes disagree on the value for which consensus is sought. Correctness is sometimes referred to as safety.
Finality refers to a model of consensus in which, once a node determines a value for which consensus is sought, there is no chance the node will subsequently reevaluate that decision.
Liveness refers to the ability of a network of nodes running a consensus algorithm being able to continue towards establishing consensus for transactions, that is the algorithm’s ability to proceed.
Correctness and liveness are not separable – it is trivial to guarantee correctness if liveness is not a requirement, and trivial to progress towards consensus if correctness is sacrificed.
A consensus algorithm will generally make some sort of guarantees over correctness, finality, and safety if certain conditions are true. Generally, the stronger the guarantees, and the fewer and less restrictive the conditions, the more secure the algorithm. A consensus algorithm may have a mathematical proof that makes those conditions clear, and demonstrates how the guarantees emerge.
If a consensus algorithm has an ABFT proof, then there will exist a mathematical proof that if certain conditions are true, then certain results for correctness, finality, and liveness are guaranteed.
The following is one way of writing the conditions and results for a typical ABFT proof:
ABFT Conditions
ABFT Results
There are other forms of ABFT conditions, such as using stake, and requiring that more than 2/3 of the stake be owned by honest nodes. But for simplicity, we will only consider this form of ABFT here.
Condition 1 requires that more than 2/3 of the nodes are non-Byzantine, which means they follow the protocol and do not act maliciously. It also requires that they are non-faulty, which means a node can crash and be offline for a while, but it must eventually come back online and sync with the other nodes. An honest node can’t crash and stay offline forever.
Condition 2 requires that any two honest nodes will eventually sync. And then they will eventually sync again. And so on forever.
Conditions 3 and 4 simply say that an attacker can be powerful enough to completely control nodes and the internet itself, as long as the first two Conditions still hold.
Proofs of this sort also assume that secure cryptography exists. In other words, the attacker can’t break the hash algorithm or the digital signature algorithm.
Result 1 is a guarantee of finality - there is a moment in time when each node knows the consensus value with certainty, and will not thereafter reconsider that value. It’s important to note that Result 1 does not stipulate a particular time frame for honest nodes to determine a consensus value – it says only that honest nodes will eventually do so.
Result 2 is a guarantee of correctness – every node will reach the same conclusion.
The guarantees of results 1 and 2 only apply if the conditions are met. Of course, if all of the nodes are malicious, then they can do anything. But that would violate Condition 1, so the results would no longer be guaranteed. Similarly, if all nodes are unable to communicate with each other, then they won’t be able to come to consensus on anything. But that would violate Condition 2. So, as long as the Conditions are met, the Results are guaranteed, which give strong results for correctness and finality.
An ABFT proof makes different guarantees about liveness than the guarantees it makes for correctness or finality.
There are some broad guarantees that it makes about liveness, and other guarantees that it does not make about liveness. It is useful to go through a number of classes of liveness attacks, and see which are ruled out and which aren’t.
A liveness attack is an attempt to prevent honest nodes from reaching consensus.
In general, a liveness attack will consist of one (or a combination) of the following mechanisms:
One way to reason about the liveness guarantees an ABFT proof will or will not make is to consider an attacker making a claim about their ability to prevent liveness and to determine if that claim is valid or not.
We will see that an ABFT proof will allow us to refute some claims about liveness attacks, but not others.
Each claimed attack will be labelled as either FALSE or TRUE.
Attack A – Byzantine nodes – FALSE CLAIM
The attacker makes the following claim:
I control malicious nodes that are less than 1/3 of all the nodes, and can direct those nodes to prevent the other honest nodes from progressing towards consensus, either by having my nodes refusing to communicate or otherwise breaking the rules of the protocol. I can freeze the network for as long as the attack continues.
If this claim were true and the attack were possible, it would mean that an attacker could freeze the network’s progress towards consensus for as long as the attack continues, so the network would not be live during that time. And the attack could continue forever, and so freeze it forever, and prevent it from ever coming to consensus.
But the attack is impossible if there is an ABFT proof. Because the ABFT proof says that it will come to consensus eventually. The details of the attack don’t matter. It doesn’t matter how the malicious nodes are violating the protocol. No matter how clever they are, the attack will fail and so the claim is false. The ABFT proof guarantees that.
The attack doesn’t stop the honest nodes from syncing so Condition 2 is true.
Additionally, because less than 1/3 nodes are Byzantine, Condition 1 is true as well.
The ABFT proof then guarantees Result 1, which guarantees that all honest nodes will reach consensus on every transaction.
But the attacker was claiming that they could prevent the above, that is that, if the attack continued forever, the network would be frozen for all time.
Consequently the claim is refuted by the ABFT proof.
Attack B – DDoS against single node - FALSE CLAIM
I can launch a DDoS attack to shut down a single honest node at a time. While a node is being DDoSed, no other nodes can sync with it. I also control malicious nodes (less than 1/3), which know what is going on, and can help me direct the attack to repeatedly change which node is being DDoSed. I can freeze the network for as long as the attack continues.
This is a Distributed Denial of Service (DDoS) attack. The attacker has compromised many computers on the internet, and can use them to flood a single computer with so many packets that it shuts down for as long as the attack continues. It doesn’t matter how clever the attacker is when choosing the DDoS target. It doesn’t matter that the attacker has a malicious node as a spy to help choose that target.
The attack is still impossible. As long as every pair of honest nodes eventually syncs, and more than 2/3 of the nodes are honest nodes, then it isn’t possible for an attacker to freeze the network for as long as the attack continues. Because if it were continued forever, all four Conditions would be true, but the Results would not be true. That’s ruled out by the ABFT proof. So this form of liveness attack is declared impossible by the ABFT proof and the claim must be false.
This is particularly interesting, because the ABFT proof guarantees resilience to a Follow The Leader attack in which the attacker performs the above DDoS attack against different nodes in sequence.
Some consensus algorithms have a ‘leader’, which is one node that is treated differently with respect to contribution to consensus from the other nodes – typically for some period of time. The protocol might have the nodes take turns being leader. Or it might allow one node be the leader until it crashes, then another node is elected to become the new leader. It is possible that some protocols using leaders would be vulnerable to a liveness attack where the leader is DDoSed, and as soon as a new node becomes the leader, the DDoS attack switches to attacking the new leader. That is a Follow The Leader attack. For leader-based protocols in which the leader can be predicted, a Follow The Leader attack could freeze the entire network for as long as the attack continues, while only DDoSing a single computer at a time, and with only a single malicious node acting as a spy.
It is even possible that a protocol vulnerable to such a Follow the Leader attack could have a mathematical proof that it is BFT. But it can’t have a proof that it is ABFT because an ABFT proof would guarantee that it is safe from that attack. This is one important way in which ABFT is stronger than just BFT. A protocol with a BFT proof has guaranteed correctness, but if the proof can be upgraded to ABFT, then the correctness guarantee can be expanded to include some liveness guarantees, as well.
Attack C – Dynamic Partitioning - FALSE CLAIM
I can partition the network, where some nodes cannot sync with other nodes. Unfortunately, I am forced to constantly change the partitioning, so that any two honest nodes are occasionally in the same partition and sync with each other. I have a malicious node to act as a spy and tell me what is going on. I can freeze the network as long as the attack continues.
Again, this attack will fail if there is an ABFT proof.
At a given moment in time, the partition will divide the nodes – potentially into two equal halves such that neither half has the requisite 2/3 honest nodes – even though there are that many honest nodes as a whole.
But, once the partition changes, then the mix of nodes between the two halves will also change. Two nodes that were previously unable to sync will now be able to do so.
Any two honest nodes will eventually be able to sync so Condition 2 is satisfied.
If the attack were continued forever, then all of the Conditions would be satisfied, and so the ABFT proof guarantees Result 1 – which refutes the claim.
Attack D – Static partitioning - TRUE CLAIM
I can partition the network such that 1/3 of the nodes cannot sync with the other 2/3. I can freeze the network as long as the attack continues.
This is an attack on liveness that will succeed, even if there is an ABFT proof.
If the partition does not change, then there always be some honest nodes that are unable to sync with some other nodes during the partition. If the attack were to continue forever, then Condition 2, that any two honest nodes would eventually be able to sync, would be violated.
Consequently, the ABFT Proof cannot guarantee Result 1 and liveness can be compromised by this attack.
In fact, there is a simple math proof that shows no consensus algorithm can be secure in the case where almost 1/3 of the nodes are malicious and an attacker can partition the network. In such a case, you either have to sacrifice correctness or liveness. You can’t have both. In an ABFT algorithm, correctness will be chosen and liveness sacrificed. The network will be frozen until the partition is healed, at which point it will start reaching consensus on new transactions again.
There are many similar attacks that will also be successful on liveness. Such as:
Attack E – Broad DDoS - TRUE CLAIM
I can DoS 1/3 of the nodes and keep then offline for the duration of the attack. I can freeze the network as long as the attack continues.
This attack on liveness also succeeds, for the same reason as above.
Attack F – Malware - TRUE CLAIM
I know of bugs in Linux, Windows, and MacOS that allow me to shut down that computer, and every node runs on one of those operating systems, so I can shut down all the nodes. I can freeze the network as long as the attack continues.
Again, this succeeds for the same reason as above, even if there is an ABFT proof. And it might be better to say it this way:
This liveness Attack F succeeds because it isn’t an attack on the consensus algorithm, and the ABFT proof was only about the resilience of the consensus algorithm.
Attack G – Corrupted node code - TRUE CLAIM
The attacker makes the following claim
I know of a bug in the code the nodes run, and control a malicious node that can send a special message to any honest node that triggers the bug and crashes the machine. And I can continue to exploit more such bugs as the software is patched. I can freeze the network as long as the attack continues.
Of course, this also succeeds – AFBT proof notwithstanding - because it has nothing to do with the consensus algorithm.
The best solution in the long run is to mathematically prove that the code is correct, with a computer checking the math proof. This is called formal methods. The first step is to formally define the algorithm itself and have a computer check that it is truly ABFT. This was done for hashgraph in Coq. The next step is to expand it so that the computer checks a proof that the implementation code matches the definition. Or have a computer write the implementation code directly from the formal definition (proof by construction). Either way, that is how formal methods can ensure correct software. This can be extended to proving the compiler is correct, the operating system is correct, and the CPU is correct. Research on formal methods is ongoing at all these layers of the stack. But it is not yet in widespread use.
Attack H – Firewall - TRUE CLAIM
I control the internet and can delete every other bit passing through each router on the internet. I can freeze the network as long as the attack continues.
In this case, each pair of honest nodes still manages to send an infinite number of bits to each other. But if they are using TCP/IP, it can’t handle the loss of 50% of the bits in every packet, and so they can never sync. Again, this attack on liveness succeeds, despite the ABFT proof, because Condition 2 fails.
Liveness could be regained by switching to an internet protocol that uses error correcting codes capable of handling the 50% bit loss. But again, it is best described as an attack on something other than the consensus algorithm itself, so the ABFT proof provides no guarantees on liveness here.
For a consensus algorithm, an ABFT proof makes a number of strong guarantees about correctness, finality, and liveness. Each guarantee is limited in some ways, but still broad in other ways.
An ABFT proof makes some broad guarantees about liveness – it rules out a number of liveness attacks that could succeed against some protocols that are only BFT (not ABFT). But there are limits on how much ABFT can say about liveness: it doesn't help if 1/3 of the nodes are DDoSed or partitioned off. And it doesn’t help with attacks that prevent all the honest nodes from syncing, or that somehow shut down all those nodes. But for attacks on the consensus algorithm itself, the ABFT proof provides useful guarantees related to liveness.