Zero-knowledge proofs, a board game, and leaky abstractions: how I learned to write zk-SNARKs

Image source: https://commons.wikimedia.org/wiki/File:Mastermind_game_pieces.png

A great way to learn a new skill is to build something with it. This is particularly true in the cryptocurrency and blockchain space, where accessible documentation and usable software libraries often lag behind research and development. In fact, the more esoteric a field is, the higher its barriers to entry. I discovered this when I started to learn to write zero-knowledge proofs, a field which to beginners like myself, can seem like a riddle wrapped in mystery.

Zero-knowledge proofs nevertheless hold great potential. Their applications include untraceable cryptocurrencies such as ZCash, better scalability for Ethereum via off-chain transaction batching, and privacy-preserving identity proofs. Developers in the crypto ecosystem would therefore do well to learn about ZKPs. They may, however, easily feel deterred by the steep learning curve inherent in this rather niche field — one based on advanced mathematics. Fortunately, in recent months, ZKP software libraries become relatively more accessible. In this post, I will showcase a bite-sized application I built using one form of ZKPs known as zk-SNARKs (henceforth referred to as snarks), and journal the learning process I went through as a developer with no prior experience in this niche. This post will therefore be part technical explainer, and part reflection on a learning process and my takeaways.

Choosing the right libraries

I believe that the ability to verify snark proofs in a web browser has important implications for decentralised apps. Although by the time I discovered these libraries, Ethereum smart contracts which could verify snarks on-chain had already existed, I found no other libraries which could do the same in web apps. Since many dApps are web-based frontends for smart contracts, and snark-enabled smart contracts have a great deal of potential, it follows that the ability for Javascript code in the browser to verify snarks could similarly unlock interesting use cases for dApps.

Use case ideation

How Mastermind works

  1. Codebreaker makes a guess (e.g. YRGB)
  2. Codemaster gives a clue (3 black, 0 white)
  3. Repeat from (1) until the game ends, or the codebreaker runs out of turns.

The clue is made of zero to four black or white pieces; each black piece means that there exists a peg in the guess which exactly matches a peg in the secret of the same colour and in the same position, while each white peg means that there exists a peg in the secret of the same colour but in a different location.

For example, if the secret is:

R R G B

and the guess is

Y R G B

then the clue is

3 black pegs and 0 white pegs

Note since clue pegs are only counted once, and black pegs are counted first, so the first R in the guess doesn’t count towards a white peg, as the second R in the clue already has an exact match.

Applied to the Mastermind board game, snarks could thereby prove that a clue about a secret combination of colours is correct, without revealing the secret itself.

This is trivial in real life because the physical setup of the gameboard prevents cheating, but this is not easy online, as a remote server or client could fraudulently manipulate the game state. Additionally, to simply hash the solution and reveal it at the end of the game is insufficient, since there is no way for the codebreaker to be sure, mid-game, that the codemaster is not cheating. In fact, mid-game clue verification is a step up above real-world gameplay, where the solution is always hidden from view.

Learning through trial and error

The acronym zk-SNARK stands for “Zero-Knowledge Succinct Non-Interactive Argument of Knowledge,” and refers to a proof construction where one can prove possession of certain information, e.g. a secret key, without revealing that information, and without any interaction between the prover and verifier.

Christian Lundkvist’s introductory blog post was also extremely helpful. (For the sake of brevity, I will not repeat the basic concepts around snarks in this blog post, as they are easily discoverable via web searches.)

Next, I dove into the snarkjs and circom libraries, and wrote very simple circuits to get a sense of how the circom’s domain-specific language (DSL) worked. This is where the learning curve started to steepen. I mainly relied on circuits which Jordi had already written, as well as trial and error, to infer the syntax of the DSL. In doing so, I encountered various unexpected issues such as if statements not working, a lack of support for variable block scope, and out-of-memory errors with complex circuits. I filed bug reports and was happy to see Jordi resolve most of these issues.

With these issues fixed, I wrote the first version of a circuit for the Mastermind game. Expressed in pseudocode, the circuit does the following:

1. Given the following public inputs:
a. The guess — pubGuess
b. A salted hash of the secret combination — pubSolnHash
c. The number of black pegs in the clue — nb
d. The number of white pegs in the clue — nw
2. Given the following secret inputs:
a. The secret and salted combination — privSoln
3. Calculate the number of black pegs and white pegs given pubGuess and privSoln
4. Establish a constraint that the number of black pegs is equal to nb; likewise for white pegs and nw
5. Hash the secret combination
6. Establish a constraint that the hash of the secret combination is equal to pubSolnHash

In programming parlance, a snark constraint is like an assertion; for a circuit to be valid, all the constraints established in a circuit must be true. If any fail, then the whole circuit is invalid. Since the circuit establishes the above constraints, the circuit is valid if and only if the clue is correctly computed for the given solution and guess.

Additionally, the hash of the solution, which is made known to the codebreaker at the start of the game, commits the codemaster to the same solution every time, and thereby prevents them from generating a valid but fraudulent proof using a different secret colour combination.

Finally, I wrote a quick-and-dirty web frontend and backend server for the Mastermind game, and published the source code on Github. I was then able to share what I learned with other developers in ConsenSys, as well as present at the ETHSingapore hackerthon and at the ETHKL meetup.

Performance improvements

It was only until @therealyingtong connected me with barryWhiteHat that I learned of other hash functions with better performance in snarks. These were Pedersen and MiMC. Fortunately, Jordi had already published these circuits in the circomlib repository, so I could relatively easily replace SHA256 with Pedersen.

(This process took slightly longer than I had anticipated; as the Pedersen hash output is made of an x- and y-coordinate on a particular elliptic curve, I had to write another circuit to encode this point into a single big-integer by retaining the first byte of the x-point and the last 31 bytes of y, as the curve is symmetrical across the y-axis.)

When I was done, I had cut the proof generation time to about 18 seconds, and the setup phrase to about 30 seconds — a very satisfying 6x performance improvement.

Takeaways

All non-trivial abstractions, to some degree, are leaky.

With snarks, this is absolutely the case. Without knowing how snarks work under the hood, one might assume, as I did, that the slowness of a SHA256 circuit was unavoidable. Yet it turned out that the Pedersen hash function, which is conventionally inefficient, works very efficiently as a snark circuit. As Zooko Wilcox wrote:

Ian Miers noticed that a pedersen hash is an exceptionally good application for this fast ECC. Pedersen hashes have been mentioned in papers spanning decades, but have been largely ignored due to inefficiency. However, in our case they are perfect.

The law of leaky abstractions applies to many things in the cryptocurrency and blockchain space. Are you used to centralised databases? Get ready to be surprised by how slow smart contract platforms are; the tradeoffs inherent in the distributed-consistent-scale triangle will force you to not think of blockchains as neat abstractions, but as complex mechanisms whose features are deeply tied to their implementation.

Think the rhetoric that blockchains are “trustless” is true? Think again — what that actually means is that they reconfigure and perhaps, as Vitalik Buterin argues, increase the motives that people can have and exercise while keeping the probability of failure low. Or perhaps not — it really depends on how the system is set up.

In the case of snarks, their performance characteristics intimately depend on their underlying mathematical primitives. Perhaps I could have avoided using SHA256 in the first place had I done deeper research into snark math. The tradeoff, however, was that would take more time to reach a working demonstration. Without building something and talking to other developers, however, I would not have learnt that.

This leads me to the next takeaway: getting hands-on is a great way to learn a new skill. When one doesn’t know much about a new subject, one has to face plenty of unknown unknowns. By jumping straight into building, as opposed to trying to learn everything from scratch, one can more learn the most important unknowns with less time and effort, simply because the process of building forces one to tackle them along a natural progression.

Future work

Credits

More resources for learning about zk-SNARKs can be found in this webpage by Coda Protocol.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store