Hedera Technical Insights: State Proofs on Hedera
1516208829798
Jul 23, 2019
by Paul Madsen
Head of Identity, The HBAR Foundation

Introduction

A distributed ledger has multiple nodes maintain the state created by a set of business transactions. Clients send transactions to nodes for consensus and subsequent application to state.

Clients can then query the state by asking a particular node for some aspect of that state. For instance, clients could ask for the contents of a particular file, the balance of an account, or the state of a smart contract.

We have to recognize that a node, particularly if anonymous, might lie when responding. Or, even if the node is honest and does not lie, the client might do so when they present that data to others. How can we know the data we receive is valid?

Hedera’s state proofs will give participants that confidence. In a previous post, we covered a variety of confirmation mechanisms from which a client can choose when querying for the status of a transaction or some aspect of state – varying in both the level of detail returned and the reliability of the response. State proofs provide the highest level of confidence.

A state proof is a cryptographically secure and portable data structure that is optionally returned as part of a query response. A state proof proves that the returned data is valid, that is, is part of the state of Hedera. Simply put, it allows any actor presented with the data and the corresponding state proof to be able to verify the data.

Following Open Access of the Hedera mainnet, we will look to make state proofs available to everyone as a stronger confirmation option.

State proofs by example

Alice is applying for a loan from Bob and must demonstrate to him she has sufficient collateral. To do so Alice will present to Bob her hbar balance as returned to her from a Hedera node (or mirror node).

Alice can request her balance from a Hedera node and then pass that balance on to Bob. But the node might have lied about the balance. Or Alice might have edited the value to exaggerate her wealth to Bob. Having the node digitally sign the balance would prevent the latter attack but not the former. And for all Bob knows, the node might be colluding with Alice, or even be run by Alice herself.

In order to convince Bob of the veracity of the balance, she will also present to Bob a state proof for her hbar balance – the state proof returned along with her balance from the node, and then both presented together to Bob.

The flow is shown below:

SP1.jpg#asset:1081679

Note: not shown is the fee that Alice pays the node to acknowledge the costs in processing the response, that is, looking up the balance, building the state proof, and consuming bandwidth. In general, Alice will pay a fee for every query to a node, asking for a state proof increases that fee.

Importantly, in the above:

  • The node can be either on the mainnet or mirror net
  • Creation of the state proof requires no interaction with the rest of the network – the responding node already has the necessary components

Before we discuss how Bob uses the state proof to validate Alice’s hbar balance, we need to cover what the state proof includes.

State proof components

A state proof for Alice’s balance includes the following information:

  • A timestamp
  • A Merkle path to Alice’s balance
  • A set of node signatures of the root hash of the above Merkle path
  • The current “address book” of the network, which is a list of the current
  • public key and stake of each mainnet node
  • The address book history

A timestamp indicates the time for which the state proof is valid. Periodically a node writes to memory a snapshot (a Merkle tree) of the current state. The timestamp of the state proof will be the timestamp of the last transaction that contributed to that state.

A Merkle tree is an efficient and scalable way to demonstrate that a given piece of data is contained within some larger set.

Suppose X is a large set of data. The simplest mechanism for proving that A is within X would be to provide X, and allow the party performing the validation to find A within X. If X however is a large dataset, this would be very inefficient – most of the data being shared would be irrelevant to the fundamental validation.

With a Merkle tree, rather than directly demonstrating that A is within X, we demonstrate that the hash of A is logically ‘contained’ within the ‘hash tree of X’. A Merkle tree is created by hashing the elements of X, combining those hashes into pairs, and hashing those pairs. The process is repeated until a final hash is calculated – this the Merkle root or root hash. If any of the data elements within X were different, then the root hash would change as well – the root hash is consequently in a sense a fingerprint of all the data elements of X.

If you can demonstrate that a given data element A does not result in a different root hash for X, then you have proved that A must have been part of the calculation of the root hash for X or, in other words, that A is part of X.

The magic of Merkle trees is that if the other party knows and trusts the Merkle root, then to demonstrate that A is within X, you need only communicate to the party you are trying to convince of that fact minimal information:

  • The data A
  • The hashes in the sibling nodes of the ancestors of A. This is the Merkle Path.

For large data sets, the above set of information can be many times smaller than the actual data set X.

For our scenario, to prove that Alice’s balance is valid the state proof will contain:

  • Alice’s balance
  • The Merkle path for the balance
  • Signatures of the Merkle root from most of the nodes

The Merkle path for Alice’s balance is represented by the following diagram. The balances of each account are shown at the bottom of the diagram. The Merkle path of Alice’s balance is shown in green. The validator will use the (green) provided hashes of the Merkle path to calculate the (yellow) hashes all the way up to the Merkle root.

SP2.jpg#asset:1081680

The Merkle root itself is not provided as part of the state proof. Instead, the nodes’ signatures of the Merkle root are included to give to the validator the necessary trust.

The node signatures of the Merkle root are collected as part of routine processing. At the end of each round, each node calculates the shared state after processing all transactions that were received in that round and before. Each node then creates the Merkle tree of that state, digitally signs the Merkle root, puts that signature in a transaction, and gossips it out to the community. Each node collects the equivalent signatures from all other nodes.

This process is represented by the following graphic:

SP3.jpg#asset:1081681

Transactions cause the state to transition. Snapshots of the state are calculated periodically, and signatures of the Merkle root are collected by all nodes.

The signed Merkle root can be used in a state proof for any aspect of that current state – it need not be created on demand. The only component of the state proof that is constructed in real time is the Merkle path to the data in question.

To verify that Alice’s balance is within the Merkle tree with a given set of signatures for the root hash, the verifier performs the following check:

1. Calculate the hash of the provided balance
2. Use the provided sibling hashes (green boxes) to calculate a root hash (yellow)
3. Verify the signatures for the calculated root hash
4. For the valid signatures, count the stake of the signing nodes and determine if it exceeds 2/3 of total stake

If #4 is true, then the verifier knows that Alice’s balance was indeed within the signed data set and that the signatures represent the consensus of the network as a whole and not just a particular node.

It is from the address book that nodes can determine the stake of other nodes, and so be able to perform the calculation of #4.

The address book is a list of public keys and stakes of all nodes in the network at the moment of the creation of the snapshot. The address book is maintained as part of the state like any other aspect. Transactions that change it (adding or deleting a node, updating stake balances, etc) are gossiped, reach consensus, and are then applied. Consequently, all nodes have a consistent view of the address book at a given moment in time.

The state proof also includes an “address book history”. This is a sequence of address books, where each address book is signed by members from the previous address book. Any given address book must be signed by a set of members that own more than 2/3 of the stake, according to the membership and stake from the previous address book.

This chain of address books extends back to the genesis address book, which is the initial members who created the ledger at the beginning.

This chain of address books does not need to include every address book in history. If address book X is changed to give Y, and later Y is changed to give Z, then the history might include all 3. But if X and Y are similar enough, then the list of signatures for Z might still have more than 2/3 of the keys in X represented, so there is no need to keep Y. The history can jump from X to Z. So we can delete Y whenever it is sufficiently similar to X.

The hash of the genesis address book is important. It serves as a unique identifier of the ledger. This Ledger ID is effectively the “name” of the ledger.

No hacker would be able to fool Bob with a state proof for any corrupted state, as they would be unable to recreate the signatures and address book history that matches the known and trusted Ledger ID – this to be published by Hedera in prominent and known channels.

State proof assembly

As represented above, all nodes collect most of the components required for a state proof as part of normal operations. It is only when a state proof is requested by a client that those components are assembled to create the state proof for a particular part of the state.

Critically, the assembly operation is purely local – the node creating the state proof already has all the necessary components – it need not interact with the rest of the network to create the state proof. It already has the necessary signatures, collected as part of normal gossiping as described above.

Additionally, no special trust need be placed in the node assembling a state proof. Consequently, mirror nodes can create state proofs in addition to mainnet nodes. Mirror nodes receive all the necessary components to create state proofs from the mainnet.

When Alice requested a state proof for her balance, the node she sent the CryptoGetAccountBalance API call will assemble the state proof and send the state proof in its response to the query – along with Alice’s balance.

State proof validation

Alice hands to Bob her balance and the corresponding state proof returned to her from the node.

The validation process that Bob will perform consists of the following checks:

  • Is Alice actually the owner of the account in question? (Alice could have asked for the balance of anybody’s account)
  • Is the timestamp of the state proof appropriate? (Alice could have a state proof from year past when she had a much large account balance)
  • Is Alice’s purported balance contained within the purported current state?
    • Calculate Merkle root from balance and Merkle path
  • Is the purported state endorsed by a super majority of the nodes of the current network?
    • Retrieve current address book, use public keys to validate node signatures
  • Is the current set of nodes consistent with the history of the Hedera network
    • Retrieve address book history and confirm signature chain
  • Check that the Ledger ID is the true Ledger ID of Hedera (this retrieved via some other trusted channel)

If the above checks are true, then Bob can be confident that the balance reported for Alice is indeed her actual balance and can make an informed decision on approving her loan request. To check the state proof, Bob only needed to know the Ledger ID. He did not need to know any public keys or stakes or anything else ahead of time. All state proofs can be checked given just the Ledger ID.

Summary

The fundamental value proposition of a distributed ledger is that participants need not have special trust in any single node maintaining the state. Hedera’s state proofs ensure that clients querying the state can be given the necessary confidence that any data, even if returned by a single node, does indeed accurately represent the consensus state as maintained by the full network.

Without the option of a state proof for Alice’s balance, Alice could ask multiple nodes and gain confidence if their responses matched. But, even if she herself believed the claimed balance was valid, Alice would be unable to convince anybody else of that fact – her trust would not be transferable.

Hedera’s state proofs are a cryptographically secure, transferable and storable mechanism for enabling trust and confidence in representations of the Hedera state.

State proofs will be a supported confirmation option on Hedera sometime following the mainnet’s Open Access event.