Non-interactive zero-knowledge proofs of knowledge for general NP statements are a powerful cryptographic primitive, both in theory and in practical applications. Recently, much research has focused on achieving an additional property, succinctness, requiring the proof to be very short and easy to verify. Such proof systems are known as zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs), and are desired when communication is expensive, or the verifier is computationally weak.

Introduction
Non-interactive zero-knowledge proofs of knowledge [BFM88, NY90, BDSMP91] are a powerful tool, studied extensively both in theoretical and applied cryptography. Recently, much research has focused on achieving an additional property, succinctness, that requires the proof to be very short and easy to verify. A proof system with this additional property is called a zero-knowledge Succinct Non-interactive ARgument of Knowledge (zk-SNARK). Because succinctness is a desirable, sometimes critical, property in numerous security applications, prior work has investigated zk-SNARK implementations. Unfortunately, all implementations to date suffer from severe scalability limitations, due to high space complexity, as we now explain.
What we know from theory
Ideally, we would like to implement a zk-SNARK that does not suffer from either of the scalability limitations mentioned in the previous section, i.e., a zk-SNARK where:
• Key generation is cheap (i.e., its running time only depends on the security parameter) and suffices for all computations (of polynomial size). Such a zk-SNARK is called fully succinct.
• Proof generation is carried out incrementally, alongside the original computation, by updating, at each step, a proof of correctness of the computation so far. Such a zk-SNARK is called incrementally computable.
Work in cryptography tells us that the above properties can be achieved in theoretical zk-SNARK constructions. Namely, building on the work of Valiant on incrementally-verifiable computation [Val08] and the work of Chiesa and Tromer on proof-carrying data [CT10, CT12], Bitansky et al. [BCCT13] showed how to construct zk-SNARKs that are fully succinct and incrementally computable.
Concretely, the approach of [BCCT13] consists of a transformation that takes as input a preprocessing zk-SNARK (such as one from existing implementations) and bootstraps it, via recursive proof composition, into a new zk-SNARK that is fully succinct and incrementally computable. In recursive proof composition, a prover produces a proof about an NP statement that, among other checks, also checks that the proof system's own verifier accepts. In a zk-SNARK, verifying a proof is asymptotically cheaper than directly checking the corresponding NP statement, so recursive proof composition is viable in theory. In practice, however, this step introduces enormous concrete costs: even though zk-SNARK verifiers run in just a few milliseconds on a modern desktop [PGHR13, BCTV14], they still take millions of machine cycles to execute, and the cost of proving grows with the number of cycles being proved. Hence, known zk-SNARK implementations cannot achieve even one step of recursive proof composition in practical time. Thus, whether recursive proof composition can be realized in practice, with any reasonable efficiency, has so far remained an intriguing open question.
Remark 1.2 (PCPs). Suitably instantiating Micali's "computationally-sound proofs" [Mic00] yields fully-succinct zk-SNARKs. However, it is not known how to also achieve incremental computation with this approach (without also invoking the aforementioned approach of Bitansky et al. [BCCT13]). Indeed, [Mic00] requires probabilistically-checkable proofs (PCPs) [BFLS91], for which one can achieve a prover that runs in quasilinear time [BCGT13b], but only through space-intensive computations, again due to the need to write down the entire computation and conduct global operations on it.
Contributions
We present the first prototype implementation that practically achieves recursive composition of zk-SNARKs. This enables us to achieve the following results:
(i) Scalable zk-SNARKs. We present the first implementation of a zk-SNARK that is fully succinct and incrementally computable. Our implementation follows the approach of Bitansky et al. [BCCT13].
Our zk-SNARK works for proving/verifying computations on a general notion of random-access machine. The key generator takes as input a machine specification, consisting of settings for random-access memory (number of addresses and number of bits at each address) and a CPU circuit that defines the machine's behavior. The keys sampled by the key generator support proving/verifying computations, of any polynomial length, on this machine. Thus, our zk-SNARK implementation directly supports many architectures (e.g., floating-point processors, SIMD-based processors, etc.): one only needs to specify memory settings and a CPU circuit.
Compared to the original machine computation, our zk-SNARK only imposes a constant multiplicative overhead in time and an essentially-constant additive overhead in space. Indeed, the proving process steps through the machine's computation, each time producing a new proof that the computation is correct so far, by relying on the prior proof; each proof asserts the satisfiability of a constant-size circuit, and requires few resources in time and space to produce. Our zk-SNARK scales, on today's hardware, to any computation size.
(ii) Proof-carrying data. The main tool in [BCCT13]'s approach is proof-carrying data (PCD) [CT10, CT12], a cryptographic primitive that encapsulates the security guarantees provided by recursive proof composition. Thus, as a stepping stone towards the aforementioned zk-SNARK implementation, we also achieve the first implementation of PCD, for arithmetic circuits.
(iii) Evaluation on vnTinyRAM. We evaluate our zk-SNARK on a specific choice of random-access machine: vnTinyRAM, a simple RISC von Neumann architecture that is supported by the most recent preprocessing zk-SNARK implementation [BCTV14]. The evaluation confirms our expectation that our approach is slower for small computations but scales to large ones.
We evaluated our prototype on 16-bit and 32-bit vnTinyRAM with 16 registers (as in [BCTV14]). For instance, for 32-bit vnTinyRAM, our prototype incrementally proves correct program execution at a cost of 26.2 seconds per program step, using a 55 MB proving key and 993 MB of additional memory. In contrast, for a T-step program, the system of [BCTV14] requires roughly 0.05 · T seconds, provided that roughly 3.1 · T MB of main memory are available. Thus, for T > 321 our system is more space-efficient, and the savings in space continue to grow as T increases. (These numbers are for an 80-bit security level.)
The road ahead. Obtaining scalable zk-SNARKs is but one application of PCD. More generally, PCD enables efficient "distributed theorem proving", which has applications ranging from securing the IT supply chain to information flow control and distributed programming-language semantics [CT10, CT12, CTV13]. Now that a first prototype of PCD has been achieved, these applications are waiting to be explored in practice.
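As a quick sanity check on the space comparison for 32-bit vnTinyRAM, the sketch below computes the smallest step count T at which the roughly 3.1 · T MB needed by [BCTV14] exceeds a fixed memory budget, using exact integer arithmetic. Which fixed budget to compare against is our assumption: counting only the 993 MB of additional memory gives a crossover at T = 321, consistent with the threshold stated above, while also counting the 55 MB proving key moves it to T = 339.

```python
# Figures quoted in the text for 32-bit vnTinyRAM at an 80-bit security level.
EXTRA_MEMORY_MB = 993            # our prototype: additional memory, independent of T
PROVING_KEY_MB = 55              # our prototype: proving key, independent of T
BCTV14_TENTHS_MB_PER_STEP = 31   # [BCTV14]: about 3.1 MB per step, kept in tenths of a MB


def crossover_steps(fixed_mb: int) -> int:
    """Smallest number of steps T with 3.1 * T > fixed_mb.

    Uses integers only: 3.1 * T > M  <=>  31 * T > 10 * M.
    """
    return (10 * fixed_mb) // BCTV14_TENTHS_MB_PER_STEP + 1


if __name__ == "__main__":
    print(crossover_steps(EXTRA_MEMORY_MB))                   # 321
    print(crossover_steps(EXTRA_MEMORY_MB + PROVING_KEY_MB))  # 339
```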
Remark 1.3 (parametrization). In this work we describe a concrete implementation of a cryptographic system whose efficiency scales with the security parameter and other quantities (e.g., the word size of a machine, the size of random-access memory, and so on). Since we make several concrete choices (e.g., fixing the security level at 80 bits, and fixing vnTinyRAM's word size to 16 or 32 bits as in [BCTV14]), many asymptotic dependencies "collapse" to constants. We focus on scalability as a function of the computation size, i.e., the number of steps and the amount of memory in the original program's execution on the concrete random-access machine.
Summary of challenges and techniques
As we recall in Section 2, bootstrapping zk-SNARKs involves two main ingredients: a collision-resistant hash function and a preprocessing zk-SNARK. Practical implementations of both ingredients exist, so one may conclude that "practical bootstrapping" is merely a matter of stitching together implementations of these two ingredients. As we now explain, this conclusion is mistaken: bootstrapping a zk-SNARK in practice poses several challenges that must be tackled in order to obtain any reasonable efficiency.
Common theme: leverage field structure. The techniques that we employ to overcome efficiency barriers leverage the fact that the "native" NP language whose membership is proved/verified by the zk-SNARK is the satisfiability of F-arithmetic circuits, for a certain finite field F. While any NP statement can be reduced to F-arithmetic circuit satisfiability, the proof system is most efficient for statements expressible as small F-arithmetic circuits. Prior work only partially leveraged this fact, by using circuits that conduct large-integer arithmetic or "pack" bits into field elements for non-bitwise checks (e.g., equality) [PGHR13, BCGTV13a, BFRS+13, BCTV14]. In this paper, we go further and, for improved efficiency, use circuits that conduct field operations.
1.4.1 Challenge: how to efficiently "close the loop"?
By far the most prominent challenge is efficiently "closing the loop". In the bootstrapping approach, each step requires proving a statement that (i) verifies the validity of previous zk-SNARK proofs; and (ii) checks another execution step. For recursive composition, this statement needs to be expressed as an F-arithmetic circuit $C_{\mathrm{pcd}}$, so that it can be proved using the very same zk-SNARK. In particular, we need to implement the verifier V as an F-arithmetic circuit $C_V$ (a subcircuit of $C_{\mathrm{pcd}}$).
In principle, constructing $C_V$ is possible, because circuits are a universal model of computation. And not just in principle: much research has been devoted to improving the efficiency and functionality of circuit generators in practice [SVPB+12, BCGT13a, SBVB+13, PGHR13, BCGTV13a, BCTV14]. Hence, a reasonable approach to constructing $C_V$ is to apply a suitable circuit generator to a suitable software implementation of V.
However, such an approach is likely to be inefficient. Circuit generators strive to support complex program computations, by providing ways to efficiently handle data-dependent control flow, memory accesses, and so on. By contrast, verifiers in preprocessing zk-SNARK constructions are "circuit-like" programs, consisting of a few pairing-based arithmetic checks that involve no complex data-dependent control flow or memory accesses, so the generality of a circuit generator adds overhead without benefit.
Thus, we want to avoid circuit generators and instead construct $C_V$ directly, keeping its size small. As we shall explain (see Section 3), this is not merely a programming difficulty: there are mathematical obstructions to constructing $C_V$ efficiently.
Main technique: PCD-friendly cycles of elliptic curves. In our underlying preprocessing zk-SNARK, the verifier V consists mainly of operations in an elliptic curve over a field F′, and is thus expressed most efficiently as an F′-arithmetic circuit. We observe that if this field F′ is the same as the aforementioned native field F of the zk-SNARK's statement, then recursive composition can be orders of magnitude more efficient than otherwise. Unfortunately, as we shall explain, the "field matching" F′ = F is mathematically impossible.
However, we show how to circumvent this obstruction by using multiple, suitably-chosen elliptic curves that lie on a PCD-friendly cycle. For example, a PCD-friendly 2-cycle consists of two curves such that the (prime) size of the base field of one curve equals the group order of the other curve, and vice versa. Our implementation uses a PCD-friendly cycle of elliptic curves (found at great computational expense) to attain zk-SNARKs that are tailored for recursive proof composition.
Additional technique: nondeterministic verification of pairings. More specifically, the zk-SNARK verifier involves several pairing-based checks over its elliptic curve, and each pairing evaluation is very expensive unless carefully implemented. To further improve efficiency, we exploit the fact that the zk-SNARK supports NP statements, and provide a hand-optimized circuit implementation of the zk-SNARK verifier that leverages nondeterminism. For instance, in our construction, we make heavy use of affine coordinates for both curve arithmetic and divisor evaluations [LMN10], because these are particularly efficient to verify (as opposed to computing, for which projective or Jacobian coordinates are known to be faster).
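To illustrate why affine coordinates are cheap to verify, the sketch below checks an affine point addition $P_3 = P_1 + P_2$ when the slope λ is supplied as a nondeterministic hint: the checker needs only additions and multiplications, whereas computing λ requires a field inversion. The toy curve $y^2 = x^3 + 2x + 3$ over $F_{97}$ is our own choice for illustration, not a curve from this work, and the sketch covers only point addition, not the divisor evaluations of [LMN10].

```python
P = 97          # toy prime; hypothetical parameters chosen only for illustration
A, B = 2, 3     # toy curve y^2 = x^3 + A*x + B over F_P


def on_curve(pt):
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P == 0


def add_affine(p1, p2):
    """Out-of-circuit computation of p1 + p2 (requires x1 != x2).

    Returns the sum together with the slope lambda, which a prover would supply as a hint.
    Computing lambda needs a field inversion (here via Fermat's little theorem).
    """
    (x1, y1), (x2, y2) = p1, p2
    lam = (y2 - y1) * pow(x2 - x1, P - 2, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3), lam


def check_add_affine(p1, p2, p3, lam):
    """Verify p3 == p1 + p2 given the hint lam, using only additions and multiplications.

    The x1 != x2 test is done directly here; a circuit would instead enforce it with
    one more hint (an inverse of x2 - x1) and one multiplication.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return ((x2 - x1) % P != 0
            and (lam * (x2 - x1) - (y2 - y1)) % P == 0
            and (x3 - (lam * lam - x1 - x2)) % P == 0
            and (y3 - (lam * (x1 - x3) - y1)) % P == 0)
```

Because the first check pins down λ uniquely once $x_1 \neq x_2$, a wrong hint or a wrong claimed sum is rejected.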
Challenge: how to efficiently verify collision-resistant hashing?
Bootstrapping zk-SNARKs uses, at multiple places, a collision-resistant hash function H and an arithmetic circuit $C_H$ for verifying computations of H. If not performed efficiently, this would be another bottleneck.
For instance, the aforementioned circuit $C_{\mathrm{pcd}}$, besides verifying prior zk-SNARK proofs, is also tasked with verifying one step of machine execution. This involves checking not only the CPU execution but also the validity of loads and stores to random-access memory, done via memory-checking techniques based on Merkle trees [BEGKN91, BCGT13a]. Thus $C_{\mathrm{pcd}}$ also needs a subcircuit to check Merkle-tree authentication paths. Constructing such circuits is straightforward, given a circuit $C_H$ for verifying computations of H. The main question is how to pick H so that $C_H$ is small. Indeed, if random-access memory consists of A addresses, then checking an authentication path requires at least $\log A \cdot |C_H|$ gates. If $C_H$ is large, this subcircuit dwarfs the CPU and "wastes" most of the size of $C_{\mathrm{pcd}}$ on a single load/store.
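To make the $\log A \cdot |C_H|$ cost concrete, the sketch below checks a Merkle authentication path: verifying one load recomputes the root from the leaf and its siblings, which takes exactly $\log_2 A$ hash evaluations for A addresses. SHA-256 from Python's `hashlib` stands in for H purely for illustration, and the helper names are hypothetical; a circuit would unroll the same loop with one copy of $C_H$ per level.

```python
import hashlib
from typing import List


def H(left: bytes, right: bytes) -> bytes:
    """Two-to-one compression; SHA-256 is only a stand-in for the hash H in the text."""
    return hashlib.sha256(left + right).digest()


def build_tree(leaves: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels: levels[0] are the leaves, levels[-1] == [root]."""
    assert len(leaves) > 0 and len(leaves) & (len(leaves) - 1) == 0, "need a power of 2"
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([H(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def auth_path(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from the leaf at `index` up to (but excluding) the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling differs only in the lowest index bit
        index >>= 1
    return path


def verify_path(root: bytes, leaf: bytes, index: int, path: List[bytes]) -> bool:
    """Recompute the root; this costs len(path) = log2(A) evaluations of H."""
    node = leaf
    for sibling in path:
        node = H(node, sibling) if index % 2 == 0 else H(sibling, node)
        index >>= 1
    return node == root
```

For A = 8 addresses, `auth_path` returns 3 siblings and `verify_path` calls H three times.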
Merely picking a standard hash function H (e.g., SHA-256 or Keccak) yields a $C_H$ with tens of thousands of gates [PGHR13, BCGG+14], making hash verifications very expensive. Is this inherent?
Additional technique: field-specific hashes. We select a hash function H that is tailored to efficient verification in the field F. In our setting, F has prime order p, so its additive group is isomorphic to $\mathbb{Z}_p$. Thus, a natural approach is to let H be a modular subset-sum function over $\mathbb{Z}_p$. For suitable parameter choices and random coefficients, subset-sum functions are collision-resistant [Ajt96, GGH96]. In this paper we base all of our collision-resistant hashing on suitable subset sums, and thereby greatly reduce the burden of hashing.³
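A minimal sketch of a modular subset-sum hash over $\mathbb{Z}_p$, in the spirit of the paragraph above: $H(x) = A x \bmod p$ for a bit vector x and a public matrix A of random elements of $\mathbb{Z}_p$ (a single row gives the plain subset-sum function). The seeded `random` module, the matrix shape, and the toy prime in the usage are assumptions for illustration only; they are not this work's parameters and provide no security. The point is the shape of the computation: each output is a sum of the coefficients selected by the 1-bits, i.e., a linear combination of the input wires. In constraint systems where linear combinations are free (as in the rank-1 constraint systems used by many pairing-based preprocessing zk-SNARKs), the main remaining cost is checking that each input is a bit.

```python
import random
from typing import Callable, List, Sequence, Tuple


def make_subset_sum_hash(p: int, n_out: int, m_in: int, seed: int = 0
                         ) -> Tuple[Callable[[Sequence[int]], List[int]], List[List[int]]]:
    """Return (H, A) with H: {0,1}^m_in -> Z_p^n_out, H(x) = A x mod p.

    Coefficients come from a seeded PRNG for reproducibility only; a real instantiation
    would derive them from public randomness and pick p, n_out, m_in for security.
    Compression requires m_in > n_out * log2(p).
    """
    rng = random.Random(seed)
    A = [[rng.randrange(p) for _ in range(m_in)] for _ in range(n_out)]

    def H(bits: Sequence[int]) -> List[int]:
        assert len(bits) == m_in and all(b in (0, 1) for b in bits)
        # Subset sum: add up the coefficients whose input bit is 1, then reduce mod p.
        return [sum(a for a, b in zip(row, bits) if b) % p for row in A]

    return H, A


def bit_constraint_holds(b: int, p: int) -> bool:
    """The per-input non-linear check in an F_p circuit: b * (b - 1) = 0, i.e., b is 0 or 1."""
    return (b * (b - 1)) % p == 0
```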
Roadmap
The rest of this paper is organized as follows. In Section 2 we recall the main ideas of [BCCT13] 's approach. Then we discuss our construction in more detail, in the following three steps: In Section 7, we evaluate our system on the machine vnTinyRAM. In Section 8, we discuss open problems.
Preliminaries
We give here the essential definitions needed for the technical discussions in the body of the paper; more detailed definitions can be found in the appendices (where some definitions are taken verbatim from [BCTV14]). We denote by F a field, and by $F_n$ the field of size n. Throughout, we assume familiarity with finite fields; for background on these, see the book of Lidl and Niederreiter [LN97].
Preprocessing zk-SNARKs for arithmetic circuits
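Given a field F, the circuit satisfaction problem of an F-arithmetic circuit $C\colon F^n \times F^h \to F^l$ is defined by the relation $R_C = \{(x, a) \in F^n \times F^h : C(x, a) = 0^l\}$; its language is $L_C = \{x \in F^n : \exists\, a \in F^h,\ C(x, a) = 0^l\}$.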
A preprocessing zk-SNARK for F-arithmetic circuit satisfiability (see, e.g., [BCIOP13]) is a triple of polynomial-time algorithms (G, P, V), called key generator, prover, and verifier. The key generator G, given a security parameter λ and an F-arithmetic circuit $C\colon F^n \times F^h \to F^l$, samples a proving key pk and a verification key vk; these are the proof system's public parameters, which need to be generated only once per circuit. After that, anyone can use pk to generate non-interactive proofs for the language $L_C$, and anyone can use vk to check these proofs. Namely, given pk and any $(x, a) \in R_C$, the honest prover P(pk, x, a) produces a proof π attesting that $x \in L_C$; the verifier V(vk, x, π) checks that π is a valid proof for $x \in L_C$. A proof π is a proof of knowledge, as well as a (statistical) zero-knowledge proof. The succinctness property requires that π have length $O_\lambda(1)$ and that V run in time $O_\lambda(|x|)$, where $O_\lambda$ hides a (fixed) polynomial in λ.
See Appendix C for details.
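To make $R_C$ and $L_C$ concrete, here is a toy instance, assuming the standard convention that $(x, a) \in R_C$ exactly when $C(x, a) = 0^l$ (so $L_C$ is the set of x admitting some witness a). The circuit $C(x, a) = a^2 - x$ over $F_{101}$, with n = h = l = 1, is a hypothetical example whose language is the set of squares in $F_{101}$. Brute-force search over witnesses is only feasible for tiny fields; a zk-SNARK instead lets a prover who knows a convince the verifier without revealing it.

```python
from itertools import product
from typing import Callable, List

P = 101  # toy prime field size (illustrative only)
Circuit = Callable[[List[int], List[int]], List[int]]


def C(x: List[int], a: List[int]) -> List[int]:
    """Toy F_P-arithmetic circuit C: F^1 x F^1 -> F^1 with C(x, a) = a*a - x."""
    return [(a[0] * a[0] - x[0]) % P]


def in_relation(circuit: Circuit, x: List[int], a: List[int]) -> bool:
    """(x, a) is in R_C iff every output wire of C(x, a) is zero."""
    return all(v == 0 for v in circuit(x, a))


def in_language_bruteforce(circuit: Circuit, x: List[int], h: int) -> bool:
    """x is in L_C iff some witness a in F^h satisfies C; exhaustive search, tiny fields only."""
    return any(in_relation(circuit, x, list(a)) for a in product(range(P), repeat=h))
```

For example, x = 4 has witnesses a = 2 and a = 99, while x = 2 is not a square modulo 101 and so is outside $L_C$.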
Proof-carrying data
Proof-carrying data (PCD) [CT10, CT12] is a cryptographic primitive that encapsulates the security guarantees obtainable via recursive composition of proofs. Since recursive proof composition naturally involves multiple (physical or virtual) parties, PCD is phrased in the language of a dynamically-evolving distributed computation among mutually-untrusting computing nodes, who perform local computations, based on local data and previous messages, and then produce output messages. Given a compliance predicate Π to express local checks, the goal of PCD is to ensure that any given message z in the distributed computation is Π-compliant, i.e., is consistent with a history in which each node's local computation satisfies Π. This formulation includes as special cases incrementally-verifiable computation [Val08] and targeted malleability [BSW12].
Concretely, a proof-carrying data (PCD) system is a triple of polynomial-time algorithms $(\mathbb{G}, \mathbb{P}, \mathbb{V})$, called key generator, prover, and verifier. The key generator $\mathbb{G}$ is given as input a predicate Π (specified as an arithmetic circuit), and outputs a proving key pk and a verification key vk; these keys allow anyone to prove/verify that a piece of data z is Π-compliant. This is achieved by attaching a short and easy-to-verify proof to each piece of data. Namely, given pk, received messages $z_{\mathrm{in}}$ with proofs $\pi_{\mathrm{in}}$, local data $z_{\mathrm{loc}}$, and a claimed outgoing message z, $\mathbb{P}$ computes a new proof π to attach to z, which attests that z is Π-compliant; the verifier $\mathbb{V}$(vk, z, π) verifies that z is Π-compliant. A proof π is a proof of knowledge, as well as a (statistical) zero-knowledge proof; succinctness requires that π have length $O_\lambda(1)$ and that $\mathbb{V}$ run in time $O_\lambda(|z|)$. Finally, note that since Π is expressed as an F-arithmetic circuit for a given field F, the sizes of messages and local data are fixed; we denote these sizes by $n_{\mathrm{msg}}, n_{\mathrm{loc}} \in \mathbb{N}$. Similarly, the number of input messages is also fixed; we call this the arity, and denote it by $s \in \mathbb{N}$. Moreover, for convenience, Π also takes as input a flag $b_{\mathrm{base}} \in \{0, 1\}$ denoting whether the node has no predecessors (i.e., $b_{\mathrm{base}}$ is a "base-case" flag). Overall, Π takes as input an outgoing message $z \in F^{n_{\mathrm{msg}}}$, local data $z_{\mathrm{loc}} \in F^{n_{\mathrm{loc}}}$, s input messages $z_{\mathrm{in}} \in F^{s \cdot n_{\mathrm{msg}}}$, and the flag $b_{\mathrm{base}}$.
Footnote 3: We note that subset-sum functions were also used in [BFRS+13], but, crucially, they were not tailored to the field. This is a key difference in usage and efficiency. (E.g., our hash function can be verified in ≤ 300 gates, while [BFRS+13] report 13,000.)
See Appendix D for details.
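The notion of Π-compliance can be illustrated with a toy predicate of arity s = 1, where each message z = (i, v) carries a step counter and a state, and each non-base node must apply one step of a hypothetical transition $v \mapsto v^2 + 1 \bmod p$. The predicate, its argument order, and the transition function are our own illustrative choices. The sketch checks a whole chain directly, which is exactly the work that PCD proofs let a verifier avoid.

```python
from typing import List, Sequence, Tuple

P = 2 ** 31 - 1             # toy prime modulus (illustrative only)
Message = Tuple[int, int]   # z = (step counter i, state v)


def step(v: int) -> int:
    """Hypothetical one-step transition of a toy machine: v -> v^2 + 1 mod P."""
    return (v * v + 1) % P


def pi(z: Message, z_loc: Sequence[int], z_in: List[Message], b_base: int) -> bool:
    """Toy compliance predicate with arity s = 1."""
    if b_base:
        # Base case: no predecessors; the computation starts from the node's local data.
        return len(z_in) == 0 and z == (0, z_loc[0] % P)
    if len(z_in) != 1:
        return False
    i_prev, v_prev = z_in[0]
    return z == (i_prev + 1, step(v_prev))


def chain_is_compliant(messages: List[Message], initial_value: int) -> bool:
    """Check every node of a chain-shaped distributed computation against pi."""
    if not messages or not pi(messages[0], [initial_value], [], 1):
        return False
    return all(pi(z, [], [z_prev], 0) for z_prev, z in zip(messages, messages[1:]))
```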
The bootstrapping approach
Our implementation follows [BCCT13], which we now review. The approach consists of a transformation that, on input a preprocessing zk-SNARK and a collision-resistant hash function, outputs a scalable zk-SNARK. Thus, the input zk-SNARK is bootstrapped into one with improved scalability properties. So fix a preprocessing zk-SNARK (G, P, V) and a collision-resistant hash function H. The goal is to construct a fully-succinct incrementally-computable zk-SNARK (G′, P′, V′) for proving/verifying correct execution on a given random-access machine M. Informally, we describe the transformation in four steps.
Step 1: from zk-SNARKs to PCD. The first step, independent of M, is to construct a PCD system $(\mathbb{G}, \mathbb{P}, \mathbb{V})$ using the zk-SNARK (G, P, V). This step involves recursive composition of zk-SNARK proofs.
Step 2: delegate the machine's memory. The second step is to reduce the footprint of the machine M by delegating its random-access memory to untrusted storage, via standard memory-checking techniques based on Merkle trees [BEGKN91, BCGT13a]. We thus modify M so that its "CPU" receives values loaded from memory as nondeterministic guesses, along with corresponding authentication paths that are checked against the root of a Merkle tree based on the hash function H. Thus, the entire state of M consists only of a (short) CPU state and a (short) Merkle-tree root that "summarizes" memory.⁴
Step 3: design a predicate $\Pi_{M,H}$ for step-wise verification. The third step is to design a compliance predicate $\Pi_{M,H}$ ensuring that the only $\Pi_{M,H}$-compliant messages z are those that result from the correct execution of the (modified) machine M, one step at a time; this is analogous to the notion of incremental computation [Val08]. Crucially, because $\Pi_{M,H}$ is only asked to verify one step of execution at a time, we can implement $\Pi_{M,H}$'s requisite checks with a circuit of merely constant size.
Step 4: construct the new proof system. The new zk-SNARK (G′, P′, V′) is constructed as follows. The new key generator G′ is set to the PCD generator $\mathbb{G}$ invoked on $\Pi_{M,H}$. The new prover P′ uses the PCD prover $\mathbb{P}$ to prove correct execution of M one step at a time, conducting the incremental distributed computation "in his head". The new verifier V′ simply uses the PCD verifier $\mathbb{V}$ to verify $\Pi_{M,H}$-compliance. In sum, since $\Pi_{M,H}$ is small and suffices for all computations, the new zk-SNARK is scalable: it is fully succinct; moreover, because the new prover computes a proof for each new step based on the previous one, it is also incrementally computable. (See Appendix E for definitions of these properties.)
Our goal is to realize the above approach in a practical implementation.
Security of recursive proof composition. Security in [BCCT13] is proved by using the proof-of-knowledge property of zk-SNARKs; we refer the interested reader to [BCCT13] for details. One aspect that must be addressed from a theoretical standpoint is the depth of composition. Depending on the strength of the assumptions, one may have to recursively compose proofs in "proof trees above the message chain", rather than along the chain. From a practical perspective we make the heuristic assumption that the depth of composition does not affect the security of the zk-SNARK, because no evidence suggests otherwise for the constructions that we use.
Footnote 4: Similarly to [BCCT13] and our realization thereof, Braun et al. [BFRS+13] leverage memory-checking techniques based on Merkle trees [BEGKN91] to enable a circuit to "securely" load from and store to untrusted storage. However, the systems' goals (delegation of MapReduce computations via a 2-move protocol) and techniques are different (cf. Footnote 3).
PCD-friendly preprocessing zk-SNARKs
We first construct preprocessing zk-SNARKs that are tailored for efficient recursive composition of proofs. Later, in Section 5, we discuss how we use such zk-SNARKs to construct a PCD system.
PCD-friendly cycles of elliptic curves
Let F be a finite field, and (G, P, V) a preprocessing zk-SNARK for F-arithmetic circuit satisfiability. The idea of recursive proof composition is to prove/verify satisfiability of an F-arithmetic circuit $C_{\mathrm{pcd}}$ that checks the validity of previous proofs (among other things). Thus, we need to implement the verifier V as an F-arithmetic circuit $C_V$, to be used as a sub-circuit of $C_{\mathrm{pcd}}$.
How to write $C_V$ depends on the algorithm of V, which in turn depends on which elliptic curve is used to instantiate the pairing-based zk-SNARK. For prime r, in order to prove statements about $F_r$-arithmetic circuit satisfiability, one instantiates (G, P, V) using an elliptic curve E defined over some finite field $F_q$, where the group $E(F_q)$ of $F_q$-rational points has order $r = \#E(F_q)$ (or, more generally, r divides $\#E(F_q)$). Then, all of V's arithmetic computations are over $F_q$, or extensions of $F_q$ of degree up to k, where k is the embedding degree of E with respect to r (i.e., the smallest integer k such that r divides $q^k - 1$).
We motivate our approach by first describing two "failed attempts".
Attempt #1: pick a curve with q = r. Ideally, we would like to select a curve E with q = r, so that V's arithmetic is over the same field as the one over which the zk-SNARK's native NP language is defined. Unfortunately, this cannot happen: the condition that E has embedding degree k with respect to r implies that r divides $q^k - 1$, which implies that $q \neq r$ (if q = r, then $q^k - 1 \equiv -1 \pmod{r}$ for every $k \geq 1$). The same implication holds even if $E(F_q)$ has a non-prime order n and the prime r (with respect to which k is defined) only divides n. So, while appealing, this idea cannot even be instantiated.⁵
Attempt #2: long arithmetic. Since we are stuck with $q \neq r$, we may consider doing "long arithmetic": simulating $F_q$ operations via $F_r$ operations, by working with bit chunks to perform integer arithmetic, and reducing modulo q when needed. Alas, having to work at the "bit level" implies a blowup on the order of log q compared to native arithmetic. So, while this approach can at least be instantiated, it is very expensive.⁶
Our approach: cycle through multiple curves. We formulate, and instantiate, a new property of elliptic curves that enables us to completely circumvent long arithmetic, even with $q \neq r$. In short, our idea is to base recursive proof composition not on a single zk-SNARK, but on multiple zk-SNARKs, each instantiated on a different elliptic curve, that jointly satisfy a special property.
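The obstruction in Attempt #1 can be seen directly: the embedding degree is the multiplicative order of q modulo r, and when q = r the power $q^k$ is 0 modulo r for every $k \geq 1$, so $q^k - 1$ is never divisible by r. The sketch below computes embedding degrees by brute force; the toy primes 211 and 197 are our own choice (they give degrees 4 and 6 in the two directions) and are far too small for any real use.

```python
from typing import Optional


def embedding_degree(q: int, r: int, k_max: int = 1000) -> Optional[int]:
    """Smallest k >= 1 with r | q^k - 1 (the order of q modulo r), or None if none up to k_max."""
    for k in range(1, k_max + 1):
        if pow(q, k, r) == 1:
            return k
    return None
```

With q = r = 197 the search returns `None`, since $197^k \bmod 197 = 0$ for every k.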
For the simplest case, suppose we have two primes $q_\alpha$ and $q_\beta$, and elliptic curves $E_\alpha/F_{q_\alpha}$ and $E_\beta/F_{q_\beta}$ such that $q_\alpha = \#E_\beta(F_{q_\beta})$ and $q_\beta = \#E_\alpha(F_{q_\alpha})$, i.e., the size of the base field of one curve equals the group order of the other curve, and vice versa. We then construct two preprocessing zk-SNARKs $(G_\alpha, P_\alpha, V_\alpha)$ and $(G_\beta, P_\beta, V_\beta)$, respectively instantiated on the two curves $E_\alpha/F_{q_\alpha}$ and $E_\beta/F_{q_\beta}$. Now note that $(G_\alpha, P_\alpha, V_\alpha)$ works for $F_{q_\beta}$-arithmetic circuit satisfiability, but all of $V_\alpha$'s arithmetic computations are over $F_{q_\alpha}$ (or extensions thereof); while $(G_\beta, P_\beta, V_\beta)$ works for $F_{q_\alpha}$-arithmetic circuits, but $V_\beta$'s arithmetic computations are over $F_{q_\beta}$ (or extensions thereof). Instead of having each zk-SNARK handle statements about its own verifier, as in the prior attempts (i.e., writing $V_\alpha$ as an $F_{q_\beta}$-arithmetic circuit, or $V_\beta$ as an $F_{q_\alpha}$-arithmetic circuit), we let each zk-SNARK handle statements about the verifier of the other zk-SNARK. That is, we write $V_\alpha$ as an $F_{q_\alpha}$-arithmetic circuit $C_{V_\alpha}$, and $V_\beta$ as an $F_{q_\beta}$-arithmetic circuit $C_{V_\beta}$.
We can then perform recursive proof composition by alternating between the two proof systems. Roughly, one can use $P_\alpha$ to prove successful verification of a proof by $C_{V_\beta}$ and, conversely, $P_\beta$ to prove successful verification of a proof by $C_{V_\alpha}$. Doing so in alternation ensures that fields "match up", and no long arithmetic is needed. (This sketch omits key technical details; see Section 4.)
Since $E_\alpha$ and $E_\beta$ facilitate constructing PCD, we say that $(E_\alpha, E_\beta)$ is a PCD-friendly 2-cycle of elliptic curves. More generally, the idea extends to cycling through curves satisfying this definition:
Definition 3.1. Let $E_0, \ldots, E_{\ell-1}$ be elliptic curves, respectively defined over finite fields $F_{q_0}, \ldots, F_{q_{\ell-1}}$, with each $q_i$ a prime. We say that $(E_0, \ldots, E_{\ell-1})$ is a PCD-friendly cycle of length ℓ if each $E_i$ is pairing-friendly and, moreover, for all $i \in \{0, \ldots, \ell-1\}$, $q_i = \#E_{(i+1) \bmod \ell}(F_{q_{(i+1) \bmod \ell}})$.
To our knowledge, this notion has not been explicitly sought before.⁷ Fortunately, though, a family of curves satisfying it is already known, as discussed in the next subsection.
Remark 3.2 (relaxation). One can relax Definition 3.1 to require a weaker, but still useful, condition: for each $i \in \{0, \ldots, \ell-1\}$, $q_i$ divides $\#E_{(i+1) \bmod \ell}(F_{q_{(i+1) \bmod \ell}})$. Although weaker, this condition is still very strong. For instance, it implies that each curve $E_i$ has ρ-value ≈ 1, i.e., that each $E_i$ has near-prime order.⁸ Constructing pairing-friendly curves with such good ρ-values is challenging even without the cycle condition! Hence, generic methods such as Cocks-Pinch [CP01] and Dupont-Enge-Morain [DEM05], which yield (with high probability) curves with ρ-values > 1 (specifically, ≈ 2), cannot be used to construct PCD-friendly cycles.⁹ This also applies to generalizations of the Cocks-Pinch method [BLS03, BW05, SB06] that improve the ρ-value to 1 < ρ < 2. In this work we do not investigate the above relaxation, because we can fulfill Definition 3.1 with ℓ = 2, the minimal length possible.
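The order condition of Definition 3.1, and its relaxation in Remark 3.2, can be checked mechanically once the base-field primes and group orders are known. The sketch below takes a list of pairs $(q_i, n_i)$ with $n_i = \#E_i(F_{q_i})$; it checks that each $q_i$ is prime, that the Hasse bound $|q_i + 1 - n_i| \leq 2\sqrt{q_i}$ holds, and either $q_i = n_{(i+1) \bmod \ell}$ or, in relaxed mode, $q_i \mid n_{(i+1) \bmod \ell}$. It does not check pairing-friendliness or construct any curve, and the toy numbers in the usage note are our own.

```python
from typing import List, Tuple


def is_prime(n: int) -> bool:
    """Trial division; fine for the toy sizes used here only."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_pcd_friendly_cycle(curves: List[Tuple[int, int]], relaxed: bool = False) -> bool:
    """curves[i] = (q_i, n_i), where n_i = #E_i(F_{q_i}).

    Strict mode checks q_i == n_{(i+1) mod ell} (Definition 3.1);
    relaxed mode checks that q_i divides n_{(i+1) mod ell} (Remark 3.2).
    Pairing-friendliness of each E_i is NOT checked.
    """
    ell = len(curves)
    for i, (q, n) in enumerate(curves):
        if not is_prime(q) or (q + 1 - n) ** 2 > 4 * q:  # prime base field, Hasse bound
            return False
        n_next = curves[(i + 1) % ell][1]
        if (n_next % q != 0) if relaxed else (n_next != q):
            return False
    return ell > 0
```

For instance, the toy pairs (211, 197) and (197, 211), corresponding to traces 15 and −13, pass both checks; a genuine PCD-friendly cycle additionally needs pairing-friendly curves of cryptographic size.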
Two-cycles based on MNT curves
We construct pairs of elliptic curves, $E_4$ and $E_6$, that form PCD-friendly 2-cycles $(E_4, E_6)$. These are MNT curves [MNT01] of embedding degrees 4 and 6. Our construction also ensures that $E_4$ and $E_6$ are sufficiently 2-adic (see below), a desirable property for efficient implementations of preprocessing zk-SNARKs.
MNT curves and the KT correspondence. Miyaji, Nakabayashi, and Takano [MNT01] characterized prime-order elliptic curves with embedding degrees k = 3, 4, 6; such curves are now known as MNT curves. Given an elliptic curve E defined over a prime field $F_q$, they gave necessary and sufficient conditions on the pair (q, t), where t is the trace of E over $F_q$, for E to have embedding degree k = 3, 4, 6. We refer to an MNT curve with embedding degree k as an MNTk curve. Karabina and Teske [KT08] proved an explicit one-to-one correspondence between MNT4 and MNT6 curves:
Theorem ([KT08]). Let r, q > 64 be primes. Then the following two conditions are equivalent: 1. r and q represent an elliptic curve $E_4/F_q$ with embedding degree k = 4 and $r = \#E_4(F_q)$; 2. r and q represent an elliptic curve $E_6/F_r$ with embedding degree k = 6 and $q = \#E_6(F_r)$.
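Footnote 7: Definition 3.1 is reminiscent of, but different from, the notion of an aliquot cycle of elliptic curves by Silverman and Stange [SS11]. An aliquot cycle considers a single curve (over ℚ) reduced modulo different primes, rather than different curves, and does not require pairing-friendliness.
Footnote 8: For each $i \in \{0, \ldots, \ell-1\}$ (with indices taken modulo ℓ), the condition that $q_{i-1}$ divides $\#E_i(F_{q_i})$ implies, via the Hasse bound, that $h_i q_{i-1} \le q_i \cdot (1 + 2/\sqrt{q_i})$ for some cofactor $h_i \in \mathbb{N}$; hence $\log q_{i-1} \le \log q_i + \log(1 + 2/\sqrt{q_i}) - \log h_i$, and thus, since $h_i \ge 1$,
$$\log q_{i-1} \le \log q_i + a_i, \qquad \text{where } a_i := \log\left(1 + 2/\sqrt{q_i}\right).$$
Note that each $a_i$ is exponentially small. Chaining these inequalities around the cycle, from index i forward to index i − 1, gives $\log q_i \le \log q_{i-1} + \sum_{j \neq i} a_j$. Therefore, for each $i \in \{0, \ldots, \ell-1\}$, we can upper bound the ρ-value of $E_i$, equal to $\log q_i / \log q_{i-1}$, as follows:
$$\rho(E_i) = \frac{\log q_i}{\log q_{i-1}} \le 1 + \frac{\sum_{j \neq i} a_j}{\log q_{i-1}}.$$
In sum, the ρ-value of $E_i$ is upper bounded by $1 + \varepsilon_i$, where the quantity $\varepsilon_i := \sum_{j \neq i} a_j / \log q_{i-1}$ is exponentially small.
Footnote 9: At best, such methods can be used to construct PCD-friendly "chains", which can be used to reduce the space complexity of preprocessing zk-SNARKs via a limited application of recursive proof composition. But the large ρ-values would imply that each recursive composition roughly doubles the cost of the zk-SNARK, so long chains do not seem to be advantageous.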
PCD-friendly 2-cycles on MNT curves. The above theorem implies that:
Each MNT6 curve lies on a PCD-friendly 2-cycle with the corresponding MNT4 curve (and vice versa).
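The correspondence can be exercised numerically. The sketch below assumes one of the MNT4 families of [MNT01], $q = x^2 + x + 1$ with trace $t = x + 1$ (background we supply, not stated in the text), derives the partner parameters $q_6 = q_4 + 1 - t_4$ and $t_6 = 2 - t_4$ given later in this section, and checks that the group orders swap, that the embedding degrees are 4 and 6, and that both curves share $4q - t^2$. With the hypothetical choice x = 14 it yields $(q_4, t_4) = (211, 15)$ and $(q_6, t_6) = (197, -13)$: primes above the theorem's bound of 64, but far below cryptographic size.

```python
from typing import Optional, Tuple


def is_prime(n: int) -> bool:
    """Trial division; adequate for toy sizes only."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def order_mod(a: int, m: int, k_max: int = 1000) -> Optional[int]:
    """Multiplicative order of a modulo m (the embedding degree when a = q, m = r)."""
    for k in range(1, k_max + 1):
        if pow(a, k, m) == 1:
            return k
    return None


def mnt4_params(x: int) -> Tuple[int, int]:
    """Assumed MNT4 family from [MNT01]: q = x^2 + x + 1, t = x + 1."""
    return x * x + x + 1, x + 1


def mnt6_partner(q4: int, t4: int) -> Tuple[int, int]:
    """Parameters of the corresponding MNT6 curve: q6 = q4 + 1 - t4, t6 = 2 - t4."""
    return q4 + 1 - t4, 2 - t4


q4, t4 = mnt4_params(14)        # (211, 15)
r4 = q4 + 1 - t4                # order of E4: 197
q6, t6 = mnt6_partner(q4, t4)   # (197, -13)
r6 = q6 + 1 - t6                # order of E6: 211
```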
Thus, a PCD-friendly 2-cycle can be obtained by constructing an MNT4 curve and its corresponding MNT6 curve. Next, we explain at a high level how this can be done.
Constructing PCD-friendly 2-cycles. First, we recall the only known method to construct MNTk curves [MNT01]. It consists of two steps:
• Step I: curve discovery. Find suitable $(q, t) \in \mathbb{N}^2$ such that there exists an ordinary elliptic curve $E/F_q$ of prime order $r := q + 1 - t$ and embedding degree k.
• Step II: curve construction. Starting from (q, t), use the Complex-Multiplication method (CM method) [AM93] to compute the equation of E over $F_q$.
The complexity of Step II depends on the discriminant D of E, which is the square-free part of $4q - t^2$. At present, the CM method is feasible for discriminants D up to size $10^{16}$ [Sut12]. Thus, Step I is conducted in a way that results in candidate parameters (q, t) inducing relatively small discriminants, to aid Step II. (By contrast, "most" (q, t) induce a discriminant D of size √q, which is too large to handle.) Concretely, [MNT01] derived, for k ∈ {3, 4, 6} and discriminant D, Pell-type equations whose solutions yield candidate parameters (q, t) for MNTk curves $E/F_q$ of trace t and discriminant D. So Step I can be performed by iteratively solving the MNTk Pell-type equation, for increasing discriminant size, until a suitable (q, t) is found.
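Computing the discriminant D of a candidate (q, t), as defined above, amounts to taking the square-free part of $4q - t^2$. The trial-division sketch below only suits toy sizes (real candidates would need a proper factoring routine); the helper `cm_feasible` is a hypothetical convenience that compares D against the $10^{16}$ bound quoted above, and the sample (q, t) = (211, 15) is a toy value of our own.

```python
def squarefree_part(n: int) -> int:
    """Square-free part of n >= 1: the unique square-free D with n = D * y^2."""
    assert n >= 1
    result, d = 1, 2
    while d * d <= n:
        exponent = 0
        while n % d == 0:
            n //= d
            exponent += 1
        if exponent % 2 == 1:
            result *= d
        d += 1
    return result * n  # any leftover n > 1 is a prime appearing to the first power


CM_FEASIBLE_D_MAX = 10 ** 16  # feasibility bound for the CM method quoted in the text [Sut12]


def cm_discriminant(q: int, t: int) -> int:
    """D = square-free part of 4q - t^2; requires |t| < 2*sqrt(q) so that 4q - t^2 > 0."""
    value = 4 * q - t * t
    if value <= 0:
        raise ValueError("trace is outside the strict Hasse interval")
    return squarefree_part(value)


def cm_feasible(q: int, t: int) -> bool:
    """Whether Step II looks feasible for (q, t) under the quoted bound on D."""
    return cm_discriminant(q, t) <= CM_FEASIBLE_D_MAX
```

For (q, t) = (211, 15), $4q - t^2 = 619$ is already square-free, so D = 619.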
The above strategy can be extended, in a straightforward way, to construct PCD-friendly 2-cycles. First perform Step I to obtain suitable parameters $(q_4, t_4)$ for an MNT4 curve $E_4/\mathbb{F}_{q_4}$; the parameters $(q_6, t_6)$ for the corresponding MNT6 curve $E_6/\mathbb{F}_{q_6}$ are $q_6 := q_4 + 1 - t_4$ and $t_6 := 2 - t_4$. Then perform Step II for $(q_4, t_4)$ to compute the equation of $E_4$, and then also for $(q_6, t_6)$ to compute that of $E_6$. The complexity in both cases is the same: one can verify that $4q_6 - t_6^2 = 4q_4 - t_4^2$, so $E_4$ and $E_6$ have the same discriminant. The two curves $E_4$ and $E_6$ form a PCD-friendly 2-cycle $(E_4, E_6)$.
Suitable cycle parameters. We now explain what "suitable $(q_4, t_4)$" means in our context, by specifying a list of additional properties that we wish a PCD-friendly cycle to satisfy.
• Bit lengths. In a 2-cycle $(E_4, E_6)$, the curve $E_4$ is "less secure" than $E_6$, because $E_4$ has embedding degree 4 while $E_6$ has embedding degree 6. Thus, we use $E_4$ to set lower bounds on bit lengths. Since we aim at a security level of 80 bits, we need $r_4 \geq 2^{160}$ and $q_4 \geq 2^{240}$ (so that $\sqrt{r_4} \geq 2^{80}$ and $q_4^4 \geq 2^{960}$ [FST10]). Since $\log r_4 \approx \log q_4$ for MNT4 curves, we only need to ensure that $q_4$ has at least 240 bits.¹⁰
• Towering friendliness. We restrict our focus to moduli $q_4$ and $q_6$ that are towering friendly (i.e., congruent to 1 modulo 6) [BS10]; this improves the efficiency of arithmetic in $\mathbb{F}_{q_4^4}$ and $\mathbb{F}_{q_6^6}$ (and their subfields).
• 2-adicity. When a preprocessing zk-SNARK $(G, P, V)$ is instantiated with an elliptic curve $E/\mathbb{F}_q$ of prime order $r$ (or with $\#E(\mathbb{F}_q)$ divisible by a prime $r$), it is important, for efficiency reasons, that $r - 1$ is divisible by a large power of 2, i.e., that $\nu_2(r - 1)$ is large. (Recall that $\nu_2(n)$, the 2-adic order of $n$, is the largest integer $k$ such that $2^k$ divides $n$.) Concretely, if $G$ is invoked on an $\mathbb{F}_r$-arithmetic circuit $C$, it is important that $\nu_2(r - 1) \geq \lceil \log |C| \rceil$. We call $\nu_2(r - 1)$ the 2-adic order of $E$, or the 2-adicity of $E$. (See Appendix C.2 for more details.)
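One common reason 2-adicity matters (we assume this is the kind of detail covered in Appendix C.2) is that the prover's polynomial arithmetic uses radix-2 FFTs over a multiplicative subgroup of $\mathbb{F}_r^*$ of size $2^{\lceil \log |C| \rceil}$, and such a subgroup exists exactly when $2^{\lceil \log |C| \rceil}$ divides $r - 1$. The Python sketch below computes $\nu_2$, checks the condition above, and finds a generator of that subgroup; the prime 97 is a toy value and the helper names are ours.

```python
def two_adicity(n: int) -> int:
    """nu_2(n): the largest k such that 2^k divides n (for n > 0)."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def ceil_log2(n: int) -> int:
    """ceil(log2 n) for n >= 1."""
    return (n - 1).bit_length()


def supports_circuit(r: int, circuit_size: int) -> bool:
    """The condition nu_2(r - 1) >= ceil(log |C|) from the text."""
    return two_adicity(r - 1) >= ceil_log2(circuit_size)


def root_of_unity(r: int, log_size: int) -> int:
    """A primitive 2^log_size-th root of unity in F_r, for an odd prime r."""
    k = two_adicity(r - 1)
    if log_size > k:
        raise ValueError("F_r* has no multiplicative subgroup of that size")
    # Any quadratic non-residue g (Euler's criterion: g^((r-1)/2) = -1) yields
    # g^((r-1)/2^k), a generator of the subgroup of order 2^k.
    g = next(a for a in range(2, r) if pow(a, (r - 1) // 2, r) == r - 1)
    w = pow(g, (r - 1) >> k, r)
    return pow(w, 1 << (k - log_size), r)


# Toy prime r = 97: r - 1 = 96 = 2^5 * 3, so power-of-two domains up to 2^5 exist.
print(two_adicity(96), supports_circuit(97, 32), supports_circuit(97, 33))  # prints: 5 True False
```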
So let $\ell_4$ and $\ell_6$ be the target values for $\nu_2(r_4 - 1)$ and $\nu_2(r_6 - 1)$. One can verify that, for any MNT-based PCD-friendly 2-cycle $(E_4, E_6)$, it holds that $\nu_2(r_4 - 1) = 2 \cdot \nu_2(r_6 - 1)$; in other words, $E_4$ is always "twice as 2-adic" as $E_6$. Thus, to achieve the target 2-adic orders, it suffices to ensure that $\nu_2(r_4 - 1) \geq \max\{\ell_4, 2\ell_6\}$ (where, as before, $r_4 := q_4 + 1 - t_4$). As we shall see (in Section 5), in this paper it will suffice to take $\nu_2(r_4 - 1) \geq 34$.
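The identities behind the claims above ($r_6 = q_4$ and $r_4 = q_6$, equal discriminants, and $E_4$ being twice as 2-adic as $E_6$) can be checked mechanically. The sketch below assumes the standard MNT4 parametrization from [MNT01], $q_4 = x^2 + x + 1$ with $t_4 \in \{-x, x + 1\}$, and checks the identities, together with the embedding-degree divisibilities $r_4 \mid q_4^2 + 1$ and $r_6 \mid q_6^2 - q_6 + 1$, for many small $x$. It ignores primality and every other requirement, so it illustrates the algebra rather than the search for usable curves.

```python
def two_adicity(n: int) -> int:
    """nu_2(n): the largest k such that 2^k divides n (for n > 0)."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def mnt6_partner(q4: int, t4: int) -> tuple:
    """Parameters (q6, t6) of the MNT6 curve paired with an MNT4 curve (q4, t4)."""
    return q4 + 1 - t4, 2 - t4


def check_cycle_identities(x_max: int) -> int:
    """Check the 2-cycle identities on the MNT4 families q4 = x^2 + x + 1, t4 in {-x, x + 1}.

    Returns how many (x, t4) pairs were checked; primality is NOT enforced here.
    """
    checked = 0
    for x in range(1, x_max):
        for t4 in (-x, x + 1):
            q4 = x * x + x + 1
            r4 = q4 + 1 - t4
            if r4 % 2 == 0:
                continue                                    # an even r4 can never be prime
            q6, t6 = mnt6_partner(q4, t4)
            r6 = q6 + 1 - t6
            assert r4 == q6 and r6 == q4                    # the cycle property
            assert 4 * q4 - t4 ** 2 == 4 * q6 - t6 ** 2     # same discriminant
            assert (q4 ** 2 + 1) % r4 == 0                  # r4 divides Phi_4(q4)
            assert (q6 ** 2 - q6 + 1) % r6 == 0             # r6 divides Phi_6(q6)
            assert two_adicity(r4 - 1) == 2 * two_adicity(r6 - 1)
            checked += 1
    return checked


print(check_cycle_identities(2000))  # prints: 1999
```

For each $x$, exactly one of the two trace choices gives an odd $r_4$, which is why 1999 pairs are checked; in that case $r_4 - 1$ is a perfect square ($(x+1)^2$ or $x^2$), which is where the factor of 2 in the 2-adicity comes from.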
10 Alas, since $E_4$ has a low embedding degree, the ECDLP in $E_4(\mathbb{F}_{q_4})$ and the DLP in $\mathbb{F}_{q_4^4}$ are "unbalanced": the former provides 120 bits of security, while the latter provides only 80. Moreover, the same is true for $E_6$: the ECDLP in $E_6(\mathbb{F}_{q_6})$ provides 120 bits of security, while the DLP in $\mathbb{F}_{q_6^6}$ …
Of the above properties, the most restrictive one is 2-adicity, because it requires examining enough curves until, "by sheer statistics", one finds $(q_4, t_4)$ with a high-enough value of $\nu_2(r_4 - 1)$. Collecting enough samples is costly because, as the discriminant size increases, the density of MNT curves decreases: empirically, one finds that the number of MNT curves with discriminant $D \leq N$ is (approximately) less than …
An extensive computation for a suitable cycle. Overall, finding and constructing a suitable cycle required a substantial computational effort.
• Cycle discovery. In order to find suitable parameters for a cycle, we explored a large space: all discriminants up to $1.1 \cdot 10^{15}$, requiring about 610,000 core-hours on a large cluster of modern x86 servers. Our search algorithm is a modification of [KT08, Algorithm 3]. Among all the 2-cycles that we found, we selected parameters $(q_4, t_4)$ and $(q_6, t_6)$ for a 2-cycle $(E_4, E_6)$ of curves such that: (i) $q_4, q_6$ each have 298 bits; (ii) $q_4, q_6$ are towering friendly; and (iii) $\nu_2(r_4 - 1) = 34$ and $\nu_2(r_6 - 1) = 17$. The bit length of $q_4, q_6$ is higher than the lower bound of 240; we accept this cost so as to pick a rare cycle with high 2-adicity, which helps the zk-SNARK's efficiency more than the higher bit length slows it down.
• Cycle construction. Both $E_4$ and $E_6$ have discriminant 614144978799019, whose size requires state-of-the-art techniques in the CM method [Sut11, ES10, Sut12] in order to explicitly construct the curves.¹¹ Below, we report the parameters and equations for the 2-cycle $(E_4, E_6)$ that we selected.
Security. One may wonder if curves lying on PCD-friendly cycles are weak (e.g., in terms of DL hardness). Yet, MNT4 and MNT6 curves of suitable parameters are widely believed to be secure, and they all fall in PCD-friendly 2-cycles. The additional requirement of high 2-adicity is not known to cause weakness either.
A matched pair of preprocessing zk-SNARKs
Based on the PCD-friendly cycle $(E_4, E_6)$, we designed and constructed two preprocessing zk-SNARKs for arithmetic circuit satisfiability: $(G_4, P_4, V_4)$ based on the curve $E_4$, and $(G_6, P_6, V_6)$ based on $E_6$. The software implementation follows [BCTV14], the fastest preprocessing zk-SNARK implementation for circuits at the time of writing. We thus adapt the techniques in [BCTV14] to our algebraic setting, which consists of the two MNT curves $E_4$ and $E_6$, and achieve efficient implementations of $(G_4, P_4, V_4)$ and $(G_6, P_6, V_6)$.
The implementation itself entails many algorithmic and engineering details, and we refer the reader to [BCTV14] for a discussion of these techniques. We only provide a high-level efficiency comparison between the preprocessing zk-SNARK of [BCTV14] based on Edwards curves (also at 80-bit security), and our implementations of $(G_4, P_4, V_4)$ and $(G_6, P_6, V_6)$; see Figure 1. Our implementation is slower, for two main reasons: (i) MNT curves do not enjoy the advantageous properties that Edwards curves do; and (ii) the modulus sizes are larger (298 bits in our case vs. 180 bits in [BCTV14]). On the other hand, the fact that MNT curves lie on a PCD-friendly 2-cycle is crucial for the PCD construction described next.
11 The authors are grateful to Andrew V. Sutherland for generous help in running the CM method on such a large discriminant.
Figure 1: Comparison of $(G_{\mathrm{Ed}}, P_{\mathrm{Ed}}, V_{\mathrm{Ed}})$, $(G_4, P_4, V_4)$, and $(G_6, P_6, V_6)$, on a circuit $C$ with $2^{17}$ gates and inputs of 10 field elements. The size of $C$ was chosen so that the 2-adicity of each zk-SNARK's curve is high enough (i.e., $\nu_2(r_i - 1) \geq 17$ for $i = \mathrm{Ed}, 4, 6$). The experiment was conducted on our benchmarking machine (described in Section 7), running in single-thread mode. (The reported times are the average of 10 experiments, with standard deviation less than 1%.)
Proof-carrying data from PCD-friendly zk-SNARKs
In Section 3 we formulated, and instantiated, PCD-friendly cycles of elliptic curves (see Definition 3.1); this notion was motivated by efficiency considerations arising when recursively composing zk-SNARK proofs. Roughly, given two zk-SNARKs based on elliptic curves forming a PCD-friendly 2-cycle, one can alternate between the two proof systems, and the 2-cycle property ensures that fields "match up" at each recursive verification, allowing for an efficient circuit implementation of the verifier of both proof systems.
The discussion so far, however, is only a sketch of the approach and omits key technical details. We now spell these out by describing how to construct a PCD system, given the two zk-SNARKs. So let $(E_\alpha, E_\beta)$ be a PCD-friendly 2-cycle of elliptic curves, and let $(G_\alpha, P_\alpha, V_\alpha)$ and $(G_\beta, P_\beta, V_\beta)$ be two preprocessing zk-SNARKs respectively instantiated with the two elliptic curves $E_\alpha/\mathbb{F}_{q_\alpha}$ and $E_\beta/\mathbb{F}_{q_\beta}$. Note that:
• $(G_\alpha, P_\alpha, V_\alpha)$ works for $\mathbb{F}_{r_\alpha}$-arithmetic circuit satisfiability, while $V_\alpha$'s computations are over $\mathbb{F}_{q_\alpha}$; and
• $(G_\beta, P_\beta, V_\beta)$ works for $\mathbb{F}_{r_\beta}$-arithmetic circuit satisfiability, while $V_\beta$'s computations are over $\mathbb{F}_{q_\beta}$.
Due to the 2-cycle property, $\mathbb{F}_{r_\alpha}$ equals $\mathbb{F}_{q_\beta}$, and $\mathbb{F}_{r_\beta}$ equals $\mathbb{F}_{q_\alpha}$. Our goal is to use $(G_\alpha, P_\alpha, V_\alpha)$ and $(G_\beta, P_\beta, V_\beta)$, along with other ingredients, to construct a PCD system $(G, P, V)$.
Remark 4.1 (longer cycles). As we have PCD-friendly cycles of length $\ell = 2$, the PCD construction described in this section (including our code) is specialized to this case. One can extend the construction to work with (preprocessing zk-SNARKs based on) PCD-friendly cycles of length $\ell > 2$.
Intuition
We begin by giving the intuition behind our construction of the PCD generator $G$, prover $P$, and verifier $V$. For simplicity, for now, we focus on the case where each node receives a single input message (i.e., the special case of "message chains", having arity $s = 1$).
Starting point. A natural first attempt is to construct two arithmetic circuits, $C_{\mathrm{pcd},\alpha}$ over $\mathbb{F}_{r_\alpha}$ and $C_{\mathrm{pcd},\beta}$ over $\mathbb{F}_{r_\beta}$, that, for a given compliance predicate $\Pi$, work as follows. …
In other words, $C_{\mathrm{pcd},\alpha}$ checks $\Pi$-compliance at a node and also verifies a previous proof, relative to $V_\beta$; while $C_{\mathrm{pcd},\beta}$ does the same, but verifies a previous proof relative to $V_\alpha$. Also note that the input $x$, but not the witness $a$ (over which we have no control), specifies the choice of verification keys.
More precisely, on input $\Pi$, the PCD generator $G$ would work as follows: (i) construct $C_{\mathrm{pcd},\alpha}$ and $C_{\mathrm{pcd},\beta}$ from $\Pi$; (ii) sample two key pairs, $(pk_\alpha, vk_\alpha) \leftarrow G_\alpha(C_{\mathrm{pcd},\alpha})$ and $(pk_\beta, vk_\beta) \leftarrow G_\beta(C_{\mathrm{pcd},\beta})$; and (iii) output $pk := (pk_\alpha, pk_\beta, vk_\alpha, vk_\beta)$ and $vk := (vk_\alpha, vk_\beta)$. On input proving key $pk$, outgoing message $z$, local data $z_{\mathrm{loc}}$, and incoming message $z_{\mathrm{in}}$ with its proof $\pi_{\mathrm{in}}$, the PCD prover $P$ would invoke $P_\alpha(pk_\alpha, x, a)$ if $\pi_{\mathrm{in}}$ is relative to $V_\beta$, or $P_\beta(pk_\beta, x, a)$ if $\pi_{\mathrm{in}}$ is relative to $V_\alpha$, where $x := (vk_\alpha, vk_\beta, z)$ and $a := (z_{\mathrm{loc}}, z_{\mathrm{in}}, \pi_{\mathrm{in}})$. Finally, on input verification key $vk$, message $z$, and proof $\pi$, the PCD verifier $V$ would invoke either $V_\alpha(vk_\alpha, x, \pi)$ or $V_\beta(vk_\beta, x, \pi)$, where $x := (vk_\alpha, vk_\beta, z)$.
However, the above simple sketch suffers from two main problems, which we now describe.
Problem #1. The compliance predicate $\Pi$ is an arithmetic circuit. However, should $\Pi$ be defined over $\mathbb{F}_{r_\alpha}$ or $\mathbb{F}_{r_\beta}$? If $\Pi$ is defined over $\mathbb{F}_{r_\alpha}$, then the $\mathbb{F}_{r_\beta}$-arithmetic circuit $C_{\mathrm{pcd},\beta}$ will be very inefficient, because it has to evaluate $\Pi$ over the "wrong" field; conversely, if $\Pi$ is defined over $\mathbb{F}_{r_\beta}$, then $C_{\mathrm{pcd},\alpha}$ will be very inefficient.
Problem #2. In known preprocessing zk-SNARK constructions, including the one underlying $(G_\alpha, P_\alpha, V_\alpha)$ and $(G_\beta, P_\beta, V_\beta)$, a verification key has length $\ell_{\mathrm{vk}}(n) > n$, where $n$ is the size of the input to the circuit with respect to which the key was created. Thus, it is not possible to obtain either $vk_\alpha$ or $vk_\beta$ that works for inputs of the form $x = (vk_\alpha, vk_\beta, z)$.
Our solution (at high level). To address the first problem, we simply "pick one side": only one of $C_{\mathrm{pcd},\alpha}$ and $C_{\mathrm{pcd},\beta}$ evaluates $\Pi$, while the other circuit merely enables the PCD prover to translate a proof relative to one zk-SNARK verifier into one relative to the other zk-SNARK verifier. Arbitrarily, we pick $C_{\mathrm{pcd},\alpha}$ to be the one that evaluates $\Pi$; in particular, $\Pi$ will be an $\mathbb{F}_{r_\alpha}$-arithmetic circuit.¹² (The choice of $C_{\mathrm{pcd},\alpha}$ is without loss of generality, since we can always relabel: if $(E_\alpha, E_\beta)$ is a PCD-friendly 2-cycle, so is $(E_\beta, E_\alpha)$.)
To address the second problem, the ideal solution is to simply hardcode $vk_\beta$ in $C_{\mathrm{pcd},\alpha}$ and $vk_\alpha$ in $C_{\mathrm{pcd},\beta}$ (and let an input $x$ consist only of a message $z$). However, this is not possible: $vk_\beta$ depends on $C_{\mathrm{pcd},\beta}$, while $vk_\alpha$ depends on $C_{\mathrm{pcd},\alpha}$ (i.e., there is a circular dependency). We thus proceed as follows. We hardcode $vk_\alpha$ in $C_{\mathrm{pcd},\beta}$. Then, for $vk_\beta$, we rely on collision-resistant hashing. Namely, inputs $x$ have the form $(\chi_\beta, z)$ where, allegedly, $\chi_\beta$ is the hash of $vk_\beta$. We modify $C_{\mathrm{pcd},\alpha}$ to check that this holds: $C_{\mathrm{pcd},\alpha}$'s witness is extended to (allegedly) contain $vk_\beta$, and then $C_{\mathrm{pcd},\alpha}$ checks that $H_\alpha(vk_\beta) = \chi_\beta$, where $H_\alpha\colon \{0,1\}^{m_{H,\alpha}} \to \mathbb{F}_{r_\alpha}^{d_{H,\alpha}}$ is a suitable collision-resistant hash function.
The above modifications to C pcd,α and C pcd,β yield the following construction.
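Further details. The above discussion omits various technical details and optimizations. For instance, thus far we have ignored the fact that, while $C_{\mathrm{pcd},\alpha}$ expects inputs over $\mathbb{F}_{r_\alpha}$, $C_{\mathrm{pcd},\beta}$ expects inputs over $\mathbb{F}_{r_\beta}$. Since $x = \chi_\beta$ lies in $\mathbb{F}_{r_\alpha}^{d_{H,\alpha}}$ (as it is the output of $H_\alpha$), we cannot use the same representation of it for both $C_{\mathrm{pcd},\alpha}$ and $C_{\mathrm{pcd},\beta}$; instead, we need two representations: $x_\alpha \in \mathbb{F}_{r_\alpha}^{n_\alpha}$ for $C_{\mathrm{pcd},\alpha}$, and $x_\beta \in \mathbb{F}_{r_\beta}^{n_\beta}$ for $C_{\mathrm{pcd},\beta}$. Naturally, for the first, we can set $n_\alpha := d_{H,\alpha}$ and let $x_\alpha := \chi_\beta$. For the second, merely letting $x_\beta$ be the list of $n_\alpha \lceil \log r_\alpha \rceil$ bits in $\chi_\beta$ is not efficient: it would cause $vk_\beta$ to have length $\ell_{\mathrm{vk},\beta}(n_\alpha \lceil \log r_\alpha \rceil)$.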
Instead, we let $x_\beta$ pack these bits into as few elements of $\mathbb{F}_{r_\beta}$ as possible; specifically, into $n_\beta := \lceil n_\alpha \lceil \log r_\alpha \rceil / \lfloor \log r_\beta \rfloor \rceil$ of them. So let:
• $S_{\alpha\to\beta}$ denote the function that maps $x_\alpha$ to (the binary representation of) $x_\beta$; and
• $S_{\alpha\leftarrow\beta}$ denote the function that maps $x_\beta$ back to (the binary representation of) $x_\alpha$.
The above implies that we need to further modify $C_{\mathrm{pcd},\alpha}$ and $C_{\mathrm{pcd},\beta}$, and include explicit subcircuits $C_{S,\alpha\to\beta}$ and $C_{S,\alpha\leftarrow\beta}$ to carry out these "type conversions"; both of these circuits are simple to construct, and have size $|C_{S,\alpha\to\beta}| = |C_{S,\alpha\leftarrow\beta}| = n_\alpha \lceil \log r_\alpha \rceil$.
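The packing performed by the two type conversions can be sketched in a few lines of Python. This is an illustration under our own assumptions: chunks of $\lfloor \log r_\beta \rfloor$ bits are used so that every packed chunk is guaranteed to be a valid $\mathbb{F}_{r_\beta}$ element (the implementation's exact rounding may differ), the functions return packed integers rather than bit strings, and the small primes are toy stand-ins for $r_\alpha$ and $r_\beta$.

```python
def to_bits(values, width):
    """Little-endian decomposition of each value into exactly `width` bits."""
    bits = []
    for v in values:
        assert 0 <= v < (1 << width)
        bits.extend((v >> j) & 1 for j in range(width))
    return bits


def from_bits(bits, width):
    """Inverse of to_bits: regroup consecutive `width`-bit chunks into integers."""
    return [sum(b << j for j, b in enumerate(bits[i:i + width]))
            for i in range(0, len(bits), width)]


def s_alpha_to_beta(x_alpha, r_alpha, r_beta):
    """Pack the bits of x_alpha (over F_{r_alpha}) into few F_{r_beta} elements."""
    width_alpha = (r_alpha - 1).bit_length()    # ceil(log2 r_alpha) bits per element
    cap_beta = r_beta.bit_length() - 1          # floor(log2 r_beta): such chunks are < r_beta
    bits = to_bits(x_alpha, width_alpha)
    bits += [0] * (-len(bits) % cap_beta)       # zero-pad to a whole number of chunks
    return from_bits(bits, cap_beta)


def s_alpha_from_beta(x_beta, n_alpha, r_alpha, r_beta):
    """Undo s_alpha_to_beta, given the original length n_alpha."""
    width_alpha = (r_alpha - 1).bit_length()
    cap_beta = r_beta.bit_length() - 1
    bits = to_bits(x_beta, cap_beta)[:n_alpha * width_alpha]
    return from_bits(bits, width_alpha)


# Toy primes: r_alpha = 97 (7-bit elements), r_beta = 89 (6 usable bits per element).
x_alpha = [5, 96, 0, 42]
x_beta = s_alpha_to_beta(x_alpha, 97, 89)
print(len(x_beta), s_alpha_from_beta(x_beta, len(x_alpha), 97, 89))  # prints: 5 [5, 96, 0, 42]
```

Here $n_\alpha \lceil \log r_\alpha \rceil = 28$ bits fit into $\lceil 28 / 6 \rceil = 5$ elements of $\mathbb{F}_{89}$, instead of 28 one-bit elements.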
Moreover, we leverage precomputation techniques [BCTV14]. A zk-SNARK verifier $V$ can be viewed as two functions: an "offline" function $V^{\mathrm{offline}}$ that, given the verification key $vk$, computes a processed verification key $pvk$; and an "online" function $V^{\mathrm{online}}$ that, given $pvk$, an input $x$, and a proof $\pi$, computes the decision bit. (I.e., $V(vk, x, \pi) := V^{\mathrm{online}}(V^{\mathrm{offline}}(vk), x, \pi)$.) Precomputation offers a tradeoff: while $V^{\mathrm{online}}$ is cheaper to compute than $V$, $pvk$ is larger than $vk$ (in each case, the difference is an additive constant). In our setting, it turns out that it pays off to use precomputation techniques only in $C_{\mathrm{pcd},\beta}$ but not in $C_{\mathrm{pcd},\alpha}$. We address all the details in the next subsection, where we give the construction of the PCD system.
Construction
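We now describe in more detail our construction of the PCD generator $G$, prover $P$, and verifier $V$. Throughout, we fix a message size $n_{\mathrm{msg}} \in \mathbb{N}$, local-data size $n_{\mathrm{loc}} \in \mathbb{N}$, and arity $s \in \mathbb{N}$. The construction will then work for $\mathbb{F}_{r_\alpha}$-arithmetic compliance predicates $\Pi\colon \mathbb{F}_{r_\alpha}^{n_{\mathrm{msg}}} \times \mathbb{F}_{r_\alpha}^{n_{\mathrm{loc}}} \times \mathbb{F}_{r_\alpha}^{s \cdot n_{\mathrm{msg}}} \times \mathbb{F}_{r_\alpha} \to \mathbb{F}_{r_\alpha}^{l}$ (i.e., for message size $n_{\mathrm{msg}}$, local-data size $n_{\mathrm{loc}}$, arity $s$, and some output size $l \in \mathbb{N}$). In terms of ingredients, we make use of the following arithmetic circuits: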
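• An $\mathbb{F}_{r_\alpha}$-arithmetic circuit $C_{H,\alpha}$, implementing a collision-resistant function $H_\alpha\colon \{0,1\}^{m_{H,\alpha}} \to \mathbb{F}_{r_\alpha}^{d_{H,\alpha}}$ such that $m_{H,\alpha} \geq \ell_{\mathrm{vk},\beta}(n_\beta) + n_{\mathrm{msg}} \lceil \log r_\alpha \rceil$, where $n_\alpha := d_{H,\alpha}$ and $n_\beta := \lceil n_\alpha \lceil \log r_\alpha \rceil / \lfloor \log r_\beta \rfloor \rceil$.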
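• An $\mathbb{F}_{r_\alpha}$-arithmetic circuit $C_{S,\alpha\to\beta}$, implementing $S_{\alpha\to\beta}$ …
… for inputs of $n_\alpha$ elements in $\mathbb{F}_{r_\alpha}$; an input $x_\alpha \in \mathbb{F}_{r_\alpha}^{n_\alpha}$ is given to $C^{\mathrm{online}}_{V,\alpha}$ as a string of $n_\alpha \lceil \log r_\alpha \rceil$ elements in $\mathbb{F}_{r_\beta}$, each carrying a bit of $x_\alpha$.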
• An $\mathbb{F}_{r_\alpha}$-arithmetic circuit $C_{V,\beta}$, implementing $V_\beta$ for inputs of $n_\beta$ elements in $\mathbb{F}_{r_\beta}$; an input $x_\beta \in \mathbb{F}_{r_\beta}^{n_\beta}$ is given to $C_{V,\beta}$ as a string of $n_\beta \lceil \log r_\beta \rceil$ elements in $\mathbb{F}_{r_\alpha}$, each carrying a bit of $x_\beta$.
For now we take the above circuits as given; later, in Section 5, we discuss our concrete instantiations of them. Also, we generically denote by $\mathrm{bits}_\alpha$ a function that, given an input $y \in \mathbb{F}_{r_\alpha}^{\ell}$ (for some $\ell$), outputs $y$'s binary representation; the corresponding $\mathbb{F}_{r_\alpha}$-arithmetic circuit is denoted $C_{\mathrm{bits},\alpha}$, and has $\ell \lceil \log r_\alpha \rceil$ gates.¹⁴ For reference, pseudocode for the triple $(G, P, V)$ is given in Figure 2.
14 More precisely, for each $\mathbb{F}_{r_\alpha}$-element $y_i$ in the vector $y$, $\mathrm{bits}_\alpha$ outputs bits $b_0, \dots, b_{\lceil \log r_\alpha \rceil - 1}$ such that $\sum_{j=0}^{\lceil \log r_\alpha \rceil - 1} b_j 2^j = y_i$, where arithmetic is conducted over $\mathbb{F}_{r_\alpha}$. Due to wraparound, some elements in $\mathbb{F}_{r_\alpha}$ have two such representations; if so, $\mathrm{bits}_\alpha$ outputs the lexicographically first one. Nonetheless, we construct $C_{\mathrm{bits},\alpha}$ to accept either of these two representations, because: (i) discriminating between representations costs an additional $\ell \lceil \log r_\alpha \rceil$ gates; and (ii) doing so does not affect completeness or soundness of our construction.
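The wraparound mentioned in footnote 14 is easy to see on a toy prime. In the sketch below, a value $y \in \mathbb{F}_r$ has a second $\lceil \log r \rceil$-bit representation exactly when $y + r < 2^{\lceil \log r \rceil}$, and a check that only enforces booleanity plus $\sum_j b_j 2^j = y$ over $\mathbb{F}_r$ accepts both. The helper names are ours, $r = 97$ is a toy value, and the sketch does not reproduce the lexicographic tie-breaking rule of $\mathrm{bits}_\alpha$.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2 n) for n >= 1."""
    return (n - 1).bit_length()


def bit_representations(y: int, r: int) -> list:
    """All L-bit strings (L = ceil(log2 r), little-endian) whose value is congruent to y mod r."""
    L = ceil_log2(r)
    return [[(c >> j) & 1 for j in range(L)]
            for c in range(y % r, 1 << L, r)]   # candidates y, y + r, ... below 2^L


def bits_constraint_holds(y: int, bits: list, r: int) -> bool:
    """An unrestricted bit-decomposition check: booleanity and sum_j b_j 2^j == y over F_r."""
    return all(b in (0, 1) for b in bits) and sum(b << j for j, b in enumerate(bits)) % r == y % r


r = 97                                  # toy prime; L = 7 and 2^7 = 128 < 2r
reps = bit_representations(5, r)        # 5 and 5 + 97 = 102 both fit in 7 bits
print(len(reps), all(bits_constraint_holds(5, b, r) for b in reps))  # prints: 2 True
print(sum(len(bit_representations(y, r)) == 2 for y in range(r)))   # prints: 31
```

Since $2^{\lceil \log r \rceil} < 2r$ for any odd prime $r$, no value ever has more than two representations; here the $128 - 97 = 31$ values $y < 31$ are the ones with two.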
The PCD generator. The PCD generator $G$ takes as input an $\mathbb{F}_{r_\alpha}$-arithmetic compliance predicate $\Pi$, and outputs a key pair $(pk, vk)$ for proving/verifying $\Pi$-compliance. The PCD generator works as follows: (i) it uses $C_{H,\alpha}$, $C_{S,\alpha\to\beta}$, $C_{V,\beta}$, and $\Pi$ to construct the circuit $C_{\mathrm{pcd},\alpha}$; (ii) it samples a key pair $(pk_\alpha, vk_\alpha) \leftarrow G_\alpha(C_{\mathrm{pcd},\alpha})$; (iii) it uses $vk_\alpha$, $C_{S,\alpha\leftarrow\beta}$, and $C^{\mathrm{online}}_{V,\alpha}$ to construct the other circuit $C_{\mathrm{pcd},\beta}$; (iv) it samples another key pair $(pk_\beta, vk_\beta) \leftarrow G_\beta(C_{\mathrm{pcd},\beta})$; and (v) it outputs $pk := (pk_\alpha, pk_\beta, vk_\alpha, vk_\beta)$ and $vk := (vk_\alpha, vk_\beta)$.
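We now describe $C_{\mathrm{pcd},\alpha}$ and $C_{\mathrm{pcd},\beta}$. The circuit $C_{\mathrm{pcd},\beta}$ acts as a "proof converter": it takes an input $x_\beta \in \mathbb{F}_{r_\beta}^{n_\beta}$ and a witness $a_\beta \in \mathbb{F}_{r_\beta}^{h_\beta}$, parses $a_\beta$ as a zk-SNARK proof $\pi_\alpha$ for $V_\alpha$, and simply checks that $C^{\mathrm{online}}_{V,\alpha}(vk_\alpha, C_{S,\alpha\leftarrow\beta}(x_\beta), \pi_\alpha) = 1$. (The verification key $vk_\alpha$ is hardcoded in $C_{\mathrm{pcd},\beta}$.) In contrast, the circuit $C_{\mathrm{pcd},\alpha}$ verifies $\Pi$-compliance: it takes an input $x_\alpha \in \mathbb{F}_{r_\alpha}^{n_\alpha}$ and a witness $a_\alpha \in \mathbb{F}_{r_\alpha}^{h_\alpha}$, parses $a_\alpha$ as $(vk_\beta, z, z_{\mathrm{loc}}, z_{\mathrm{in}}, b_{\mathrm{base}}, \pi_{\mathrm{in}}, b_{\mathrm{res}})$, and verifies that $x_\alpha = C_{H,\alpha}(C_{\mathrm{bits},\alpha}(vk_\beta) \,\|\, C_{\mathrm{bits},\alpha}(z))$ and that $\Pi(z, z_{\mathrm{loc}}, z_{\mathrm{in}}, b_{\mathrm{base}}) = 0$. Moreover, if $b_{\mathrm{base}} = 0$ (not the base case), $C_{\mathrm{pcd},\alpha}$ also recursively verifies $\Pi$-compliance of previous messages: for each corresponding pair …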
See Figure 2 for details. Overall, the two circuits have the following sizes:
The PCD prover. The PCD prover $P$ takes as input a proving key $pk$, outgoing message $z$, local data $z_{\mathrm{loc}}$, and incoming messages $z_{\mathrm{in}}$; when not in the base case, it also takes as input proofs $\pi_{\mathrm{in}}$, each attesting that a message in $z_{\mathrm{in}}$ is $\Pi$-compliant. The PCD prover outputs a proof $\pi$ attesting to the fact that $z$ is $\Pi$-compliant. At a high level, the PCD prover performs not one, but two, steps of recursive composition, "going around the PCD-friendly 2-cycle". The first step is relative to $C_{\mathrm{pcd},\alpha}$ and checks $\Pi$-compliance; the second step is relative to $C_{\mathrm{pcd},\beta}$ and merely converts the proof produced by the first step into one for the other verifier. More precisely, the PCD prover constructs $x_\alpha := H_\alpha(\mathrm{bits}_\alpha(vk_\beta) \,\|\, \mathrm{bits}_\alpha(z)) \in \mathbb{F}_{r_\alpha}^{n_\alpha}$ and then uses $P_\alpha$ to produce a proof $\pi_\alpha$ attesting that $x_\alpha \in \mathcal{L}_{C_{\mathrm{pcd},\alpha}}$. In the base case, $C_{\mathrm{pcd},\alpha}$ only verifies that $\Pi(z, z_{\mathrm{loc}}, z_{\mathrm{in}}, 1) = 0$; but, when previous proofs $\pi_{\mathrm{in}}$ are supplied, $C_{\mathrm{pcd},\alpha}$ verifies instead that $\Pi(z, z_{\mathrm{loc}}, z_{\mathrm{in}}, 0) = 0$ and, for each pair …
Next, the PCD prover uses $P_\beta$ to convert $\pi_\alpha$ into a proof $\pi$ attesting that $x_\beta \in \mathcal{L}_{C_{\mathrm{pcd},\beta}}$, where $x_\beta := S_{\alpha\to\beta}(x_\alpha) \in \mathbb{F}_{r_\beta}^{n_\beta}$; this is merely a translation, because $C_{\mathrm{pcd},\beta}$ only verifies that $\pi_\alpha$ is valid. The proof $\pi$ is $P$'s output.
The PCD verifier. The PCD verifier $V$ takes as input a verification key $vk$, message $z$, and proof $\pi$. Proofs relative to $(G_\alpha, P_\alpha, V_\alpha)$ are never "seen" outside the PCD prover, because the prover converts them into proofs relative to $(G_\beta, P_\beta, V_\beta)$. Hence, the proof $\pi$ is relative to $(G_\beta, P_\beta, V_\beta)$, and the PCD verifier checks that $z$ is $\Pi$-compliant by checking that $V_\beta(vk_\beta, x_\beta, \pi) = 1$, where $x_\beta := S_{\alpha\to\beta}(H_\alpha(\mathrm{bits}_\alpha(vk_\beta) \,\|\, \mathrm{bits}_\alpha(z)))$.
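The data flow of $(G, P, V)$ can be traced with a deliberately insecure toy model. In the Python sketch below, the "zk-SNARK" is a stand-in whose proof is the entire witness and whose verifier simply re-runs the circuit (so there is no succinctness, no zero knowledge, and soundness only in the honest case); SHA-256 over a `repr` stands in for $H_\alpha \circ \mathrm{bits}_\alpha$; $S_{\alpha\to\beta}$ is the identity; arity is $s = 1$; and the compliance predicate returns `True` on compliance, whereas the circuits in the text output 0. What the sketch does preserve is the structure described above: $vk_\alpha$ is hardcoded in $C_{\mathrm{pcd},\beta}$, $vk_\beta$ is only bound through the hash, and each PCD proof takes two steps around the cycle.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Key:
    """Toy key: a circuit plus a label that stands in for the key's serialization."""
    circuit: Callable[[Any, Any], bool]
    label: str


# Toy "zk-SNARK": the proof is the witness and verification re-runs the circuit.
def toy_G(circuit, label):
    key = Key(circuit, label)
    return key, key                                  # (proving key, verification key)


def toy_P(pk, x, a):
    if not pk.circuit(x, a):
        raise ValueError("witness does not satisfy the circuit")
    return a


def toy_V(vk, x, proof):
    return vk.circuit(x, proof)


def H(vk_label, z):
    """Stand-in for S_{alpha->beta}(H_alpha(bits(vk_beta) || bits(z))); S is the identity here."""
    return hashlib.sha256(repr((vk_label, z)).encode()).hexdigest()


def pcd_generator(Pi):
    def C_pcd_alpha(x_alpha, a):                     # checks Pi-compliance (arity s = 1)
        vk_beta, z, z_loc, z_in, b_base, pi_in = a
        if x_alpha != H(vk_beta.label, z) or not Pi(z, z_loc, z_in, b_base):
            return False
        return b_base or toy_V(vk_beta, H(vk_beta.label, z_in), pi_in)

    pk_a, vk_a = toy_G(C_pcd_alpha, "alpha")

    def C_pcd_beta(x_beta, pi_alpha):                # proof converter; vk_alpha is hardcoded
        return toy_V(vk_a, x_beta, pi_alpha)

    pk_b, vk_b = toy_G(C_pcd_beta, "beta")           # created after vk_a: no circularity
    return (pk_a, pk_b, vk_a, vk_b), (vk_a, vk_b)


def pcd_prover(pk, z, z_loc, z_in, pi_in):
    pk_a, pk_b, vk_a, vk_b = pk
    x = H(vk_b.label, z)
    a_alpha = (vk_b, z, z_loc, z_in, pi_in is None, pi_in)
    pi_alpha = toy_P(pk_a, x, a_alpha)               # step 1: prove compliance w.r.t. C_pcd_alpha
    return toy_P(pk_b, x, pi_alpha)                  # step 2: convert to a proof w.r.t. C_pcd_beta


def pcd_verifier(vk, z, pi):
    vk_a, vk_b = vk
    return toy_V(vk_b, H(vk_b.label, z), pi)


def counter_pi(z, z_loc, z_in, b_base):
    """Hypothetical predicate: the base message is 0 and each later message increments by 1."""
    return z == 0 if b_base else z == z_in + 1


pk, vk = pcd_generator(counter_pi)
z, proof = 0, pcd_prover(pk, 0, None, None, None)
for _ in range(3):
    z, proof = z + 1, pcd_prover(pk, z + 1, None, z, proof)
print(pcd_verifier(vk, 3, proof), pcd_verifier(vk, 4, proof))  # prints: True False
```

In a real instantiation, `toy_P` and `toy_V` would be replaced by $P_\alpha, V_\alpha$ and $P_\beta, V_\beta$ over the two curves of the cycle, and the hash would be computed over the binary representations of $vk_\beta$ and $z$.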
Security
The intuition for the security of $(G, P, V)$ is straightforward. Suppose that a malicious polynomial-size prover $\tilde{P}$ outputs a message $z$ and proof $\pi$ that are accepted by the PCD verifier $V$. Our goal is to deduce that $z$ is $\Pi$-compliant. By construction of $V$, we deduce that $S_{\alpha\to\beta}(H_\alpha(\mathrm{bits}_\alpha(vk_\beta) \,\|\, \mathrm{bits}_\alpha(z))) \in \mathcal{L}_{C_{\mathrm{pcd},\beta}}$. In turn, by construction of $C_{\mathrm{pcd},\beta}$, we deduce that $H_\alpha(\mathrm{bits}_\alpha(vk_\beta) \,\|\, \mathrm{bits}_\alpha(z)) \in \mathcal{L}_{C_{\mathrm{pcd},\alpha}}$. In turn, by construction of $C_{\mathrm{pcd},\alpha}$, we deduce that there exist local data $z_{\mathrm{loc}}$ and previous messages $z_{\mathrm{in}}$ such that one of the following holds: (i) $\Pi(z, z_{\mathrm{loc}}, z_{\mathrm{in}}, 1) = 0$, which is the base case; or (ii) $\Pi(z, z_{\mathrm{loc}}, z_{\mathrm{in}}, 0) = 0$ and, for each incoming message $z_{\mathrm{in}}$, $S_{\alpha\to\beta}(H_\alpha(\mathrm{bits}_\alpha(vk_\beta) \,\|\, \mathrm{bits}_\alpha(z_{\mathrm{in}}))) \in \mathcal{L}_{C_{\mathrm{pcd},\beta}}$ (and thus, by induction, each $z_{\mathrm{in}}$ is $\Pi$-compliant). In either case, we conclude that $z$ is $\Pi$-compliant.
The above argument can be formalized by using the proof-of-knowledge property of zk-SNARKs. Yet, as explained in Section 2.3, a formal argument lies beyond the scope of this paper, which instead focuses on practical aspects of PCD systems; see [BCCT13] for more details.
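Output the $\mathbb{F}_{r_\alpha}$-arithmetic circuit $C_{\mathrm{pcd},\alpha}$ that, given input $x_\alpha \in \mathbb{F}_{r_\alpha}^{n_\alpha}$ and witness $a_\alpha \in \mathbb{F}_{r_\alpha}^{h_\alpha}$, works as follows:
1. Parse $a_\alpha$ as $(vk_\beta, z, z_{\mathrm{loc}}, z_{\mathrm{in}}, b_{\mathrm{base}}, \pi_{\mathrm{in}}, b_{\mathrm{res}})$.
2. Compute $\sigma_{\mathrm{vk},\beta} := C_{\mathrm{bits},\alpha}(vk_\beta)$.
3. Check that $x_\alpha = C_{H,\alpha}(\sigma_{\mathrm{vk},\beta} \,\|\, C_{\mathrm{bits},\alpha}(z))$.
…
PARAMETERS. Message size $n_{\mathrm{msg}} \in \mathbb{N}$, local-data size $n_{\mathrm{loc}} \in \mathbb{N}$, and arity $s \in \mathbb{N}$.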
• OUTPUTS: proving key $pk$ and verification key $vk$
1. Set $n_\alpha := d_{H,\alpha}$ and $n_\beta := \lceil n_\alpha \lceil \log r_\alpha \rceil / \lfloor \log r_\beta \rfloor \rceil$.
2. Construct $C_{H,\alpha}$, the $\mathbb{F}_{r_\alpha}$-arithmetic circuit implementing $H_\alpha$ …
11. Compute $(pk_\beta, vk_\beta) := G_\beta(C_{\mathrm{pcd},\beta})$.
12. Set $pk := (pk_\alpha, pk_\beta, vk_\alpha, vk_\beta)$ and $vk := (vk_\alpha, vk_\beta)$.
13. Output $(pk, vk)$.
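PCD prover $P$
• INPUTS:
- proving key $pk$
- outgoing message $z \in \mathbb{F}_{r_\alpha}^{n_{\mathrm{msg}}}$
- local data $z_{\mathrm{loc}} \in \mathbb{F}_{r_\alpha}^{n_{\mathrm{loc}}}$
- incoming messages $z_{\mathrm{in}} \in \mathbb{F}_{r_\alpha}^{s \cdot n_{\mathrm{msg}}}$
- previous proofs $\pi_{\mathrm{in}}$ ($\pi_{\mathrm{in}} = \bot$ in the base case, as there are no previous proofs)
• OUTPUTS: proof $\pi$ for the outgoing message $z$
1. Compute $x_\alpha := H_\alpha(\mathrm{bits}_\alpha(vk_\beta) \,\|\, \mathrm{bits}_\alpha(z)) \in \mathbb{F}_{r_\alpha}^{n_\alpha}$ and $x_\beta := S_{\alpha\to\beta}(x_\alpha) \in \mathbb{F}_{r_\alpha}^{n_\beta \lceil \log r_\beta \rceil}$, and parse $x_\beta$ as lying in $\mathbb{F}_{r_\beta}^{n_\beta}$.
2. If base case (i.e., $\pi_{\mathrm{in}} = \bot$), then set $a_\alpha := (vk_\beta, z, z_{\mathrm{loc}}, z_{\mathrm{in}}, 1, *, *)$, where $*$ is any assignment (of the correct length).
3. If not base case (i.e., $\pi_{\mathrm{in}} \neq \bot$), then set $a_\alpha := (vk_\beta, z, z_{\mathrm{loc}}, z_{\mathrm{in}}, 0, \pi_{\mathrm{in}}, 1)$.
4. Compute $\pi_\alpha := P_\alpha(pk_\alpha, x_\alpha, a_\alpha)$.
5. Set $a_\beta := (\pi_\alpha)$.
6. Compute $\pi := P_\beta(pk_\beta, x_\beta, a_\beta)$.
7. Output $\pi$.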
Constructions of arithmetic circuits
In Section 4 we gave a construction of a PCD system $(G, P, V)$ in terms of two preprocessing zk-SNARKs, based on a PCD-friendly 2-cycle $(E_\alpha, E_\beta)$, and various arithmetic circuits. We now discuss concrete implementations of these arithmetic circuits, which determine the sizes of $C_{\mathrm{pcd},\alpha}$ and $C_{\mathrm{pcd},\beta}$ (see Equation 1).
In our code implementation, $(E_\alpha, E_\beta)$ equals $(E_4, E_6)$, a specific 2-cycle based on MNT curves of embedding degree 4 and 6, selected to have high 2-adicity (see Section 3.2). Thus, in the text below, "$\alpha = 4$ and $\beta = 6$".¹⁵ We obtain the following efficiency for the two circuits $C_{\mathrm{pcd},4}$ and $C_{\mathrm{pcd},6}$.
Lemma 5.1 (informal)
Total 32027
Next, we discuss the various subcircuits: for the zk-SNARK verifiers and for collision-resistant hashing.
Remark 5.2. We selected the PCD-friendly 2-cycle $(E_4, E_6)$ to have high 2-adicity: it has $\nu_2(r_4 - 1) = 34$ and $\nu_2(r_6 - 1) = 17$. These values are not accidental, but were chosen so that the 2-cycle $(E_4, E_6)$ suffices for "essentially all practical uses" of our PCD system. Specifically, recall that, for efficiency reasons, we would like (i) $\nu_2(r_4 - 1) \geq \lceil \log |C_{\mathrm{pcd},4}| \rceil$ and (ii) $\nu_2(r_6 - 1) \geq \lceil \log |C_{\mathrm{pcd},6}| \rceil$ (see Section 3.2 and Appendix C.2). First, since $\lceil \log |C_{\mathrm{pcd},6}| \rceil = 15$, Condition (ii) always holds. As for Condition (i), it depends on $\Pi$; however, since $\nu_2(r_4 - 1) = 34$ is so large, it is not a limitation for practically feasible choices of $\Pi$.¹⁶
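For completeness, the estimate in footnote 16 is a one-line bound: the sum of two positive terms is at most twice the larger one, and $89412 < 2^{17} = 131072$. Logarithms are base 2.

```latex
\log\bigl(|\Pi| + s \cdot 89412\bigr)
  \;\le\; \log\bigl(2 \max\{|\Pi|,\ s \cdot 89412\}\bigr)
  \;=\; 1 + \max\{\log|\Pi|,\ \log 89412 + \log s\}
  \;\le\; 1 + \max\{\log|\Pi|,\ 17 + \log s\}.
```

For example, even with arity $s = 2^{10}$ the second term contributes at most $1 + 27 = 28 < 34$, so Condition (i) is then limited only by $|\Pi|$.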
Arithmetic circuits for zk-SNARK verifiers
We seek arithmetic circuits for the two zk-SNARK verifiers: an $\mathbb{F}_{r_6}$-arithmetic circuit $C_{V,4}$ implementing $V_4$ and an $\mathbb{F}_{r_4}$-arithmetic circuit $C_{V,6}$ implementing $V_6$. Note the field characteristics: $V_4$'s arithmetic operations are over $\mathbb{F}_{q_4}$ (which is equal to $\mathbb{F}_{r_6}$) and $V_6$'s operations are over $\mathbb{F}_{q_6}$ (which is equal to $\mathbb{F}_{r_4}$). We design and construct $C_{V,4}$ and $C_{V,6}$, each consisting of two subcircuits for the "offline" and "online" parts of the verifier (see Section 4.1), and achieve the following efficiency:
• There is an $\mathbb{F}_{q_4}$-arithmetic circuit $C_{V,4}$ with size … that implements $V_4$ for all inputs $x \in \mathbb{F}_{r_4}^n$ such that each $x_i$ has at most $l$ bits. (Naturally, $l \leq \lceil \log r_4 \rceil$.) Moreover, $C_{V,4}$ consists of two subcircuits, $C$ …
15 The choice $(E_\alpha, E_\beta) = (E_4, E_6)$, rather than $(E_\alpha, E_\beta) = (E_6, E_4)$, is intentional. We expect $C_{\mathrm{pcd},\alpha}$ to be larger than $C_{\mathrm{pcd},\beta}$ (due to a larger number of checks), so $E_\alpha$ should be the curve with the higher 2-adicity. In this case, $E_4$ is twice as 2-adic as $E_6$.
16 More precisely, if we take $|\Pi| + s \cdot 89412$ to be the leading terms in $|C_{\mathrm{pcd},4}|$ (which we expect to be the case), we obtain that $\log(|\Pi| + s \cdot 89412) \leq 1 + \max\{\log |\Pi|, 17 + \log s\}$, which is likely to be well below 34.
• There is an $\mathbb{F}_{q_6}$-arithmetic circuit $C_{V,6}$ with size $(10 \cdot l - 4) \cdot n + 83{,}181$ that implements $V_6$ for all inputs $x \in \mathbb{F}_{r_6}^n$ such that each $x_i$ has at most $l$ bits. (Naturally, $l \leq \lceil \log r_6 \rceil$.) Moreover, $C_{V,6}$ consists of two subcircuits, $C^{\mathrm{offline}}_{V,6}$ …
… [BCTV14] address this second part by (i) obtaining optimized implementations of sub-components of a pairing, and then (ii) combining these in a way that is tailored to $V$'s protocol. In short, after breaking a pairing into its two main parts, the Miller loop and the final exponentiation, and implementing both (using optimal pairings [Ver10] …
Our techniques for fast circuit verification of $V$. The high-level structure of our construction of $C_{V,4}$ and $C_{V,6}$ mirrors that of our software implementation of $V_4$ and $V_6$, itself based on techniques from [BCTV14]. Namely, both $C_{V,4}$ and $C_{V,6}$ also break an (optimal) pairing into a Miller loop and final exponentiation, and combine these components in a way that is tailored to the verifier protocol.
However, our construction differs in how these two components are implemented, especially with regard to the Miller loop. This is because, in our setting, two main operations come "for free": (a) field operations over the circuit's field, and (b) nondeterministic guessing (i.e., auxiliary advice). In particular, field divisions cost the same as field multiplications (since we can guess the answer and check it).
Traditional software implementations go to great lengths to avoid expensive field divisions (e.g., by using projective coordinates instead of affine ones in the "addition" and "doubling" steps of the Miller loop). By contrast, both $C_{V,4}$ and $C_{V,6}$ perform the Miller loop by using affine coordinates for both curve arithmetic and divisor evaluations [LMN10], which can be done very efficiently by nondeterministic arithmetic circuits.
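The "guess and check" principle can be illustrated on affine point addition, the basic step of an affine Miller loop. In the Python sketch below (a toy curve over $\mathbb{F}_{97}$ with hypothetical coefficients), the prover computes the chord slope $\lambda$ outside the circuit, using one field inversion, and the circuit merely checks three multiplicative relations, so the division costs no more than a multiplication. The sketch covers only chord addition of two points with distinct $x$-coordinates; doubling and the divisor (line-function) evaluations of [LMN10] are not shown.

```python
P_MOD = 97                 # toy prime field F_97 (hypothetical parameters)
A_COEF, B_COEF = 2, 3      # toy curve y^2 = x^3 + 2x + 3 over F_97


def on_curve(pt):
    x, y = pt
    return (y * y - (x ** 3 + A_COEF * x + B_COEF)) % P_MOD == 0


def prover_add(p1, p2):
    """Out-of-circuit work: compute P1 + P2 (x1 != x2) and output the slope as advice."""
    (x1, y1), (x2, y2) = p1, p2
    lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD   # the only inversion
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3), lam


def circuit_check_add(p1, p2, p3, lam):
    """In-circuit work: no inversion, just three multiplicative checks on guessed values."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return ((lam * (x2 - x1) - (y2 - y1)) % P_MOD == 0        # lam is the chord's slope
            and (lam * lam - x1 - x2 - x3) % P_MOD == 0       # x3 = lam^2 - x1 - x2
            and (lam * (x1 - x3) - y1 - y3) % P_MOD == 0)     # y3 = lam (x1 - x3) - y1


points = [(x, y) for x in range(P_MOD) for y in range(P_MOD) if on_curve((x, y))]
p1 = points[0]
p2 = next(q for q in points if q[0] != p1[0])
p3, lam = prover_add(p1, p2)
print(on_curve(p3), circuit_check_add(p1, p2, p3, lam),
      circuit_check_add(p1, p2, p3, (lam + 1) % P_MOD))     # prints: True True False
```

A wrong guess for $\lambda$ is always caught by the first check, because it changes $\lambda (x_2 - x_1)$ by a nonzero amount.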
Moreover, sharing Miller loop subcomputation traditionally only applies to products of pairings, of which there are only two in the verifier. Instead, in our setting, such techniques extend to ratios of products of pairings, and can thus be applied to every pairing check in the verifier, to further improve efficiency.
Overall, the number of field multiplications that our software implementation uses to compute the checks of $V_4$ and $V_6$ is, respectively, 3.8× and 3.2× the number used by $C_{V,4}$ and $C_{V,6}$ to verify them.
Arithmetic circuits for collision-resistant hashing
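We also require arithmetic circuits for hashing: an $\mathbb{F}_{r_4}$-arithmetic circuit $C_{H,4}$ for a collision-resistant function $H_4\colon \{0,1\}^{m_{H,4}} \to \mathbb{F}_{r_4}^{d_{H,4}}$ such that $m_{H,4} \geq \ell_{\mathrm{vk},6}(n_6) + n_{\mathrm{msg}} \lceil \log r_4 \rceil$; indeed, $C_{\mathrm{pcd},\alpha}$ uses $H_4$ to hash (the binary representation of) both the verification key $vk_6$ and a message $z$.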
We base collision-resistant hashing on subset-sum functions [Ajt96, GGH96], chosen to have an especially compact representation as arithmetic circuits over the zk-SNARK's "native field".
Subset sums. We have fixed the prime of the subset sum to be $r_4$; this ensures that $C_{H,4}$, which is defined over $\mathbb{F}_{r_4}$, works over the correct ring, and only requires $d_{H,4}$ gates. Next, for any given dimension $d_{H,4}$ and PCD message length $n_{\mathrm{msg}}$, we set the input length to $m_{H,4} := \ell_{\mathrm{vk},6}(\lceil d_{H,4} \lceil \log r_4 \rceil / \lfloor \log r_6 \rfloor \rceil) + n_{\mathrm{msg}} \lceil \log r_4 \rceil$, to ensure the aforementioned condition on the input and output lengths. It remains to fix the output length $d_{H,4}$. This is delicate, because it affects security (and recall that we aim at 80-bit security). Since $r_4$ is a 298-bit prime, it appears heuristically sufficient to fix $d_{H,4} = 1$ [JJ98].¹⁸ In particular, this yields $|C_{H,4}| = 1$.
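A minimal sketch of a subset-sum hash $H_M(x) = Mx \bmod p$ on bit strings follows. Because the entries of $M$ are public constants, each output coordinate is a linear combination of the input bits; in rank-1 constraint systems, where such a linear combination costs nothing extra, this is consistent with the $d_{H,4}$-gate cost stated above (a single gate for $d_{H,4} = 1$). The Mersenne prime $2^{61} - 1$ and the seeded `random` module are toy stand-ins for $r_4$ and for a proper public-parameter generation procedure; neither is what the paper uses.

```python
import random


def subset_sum_matrix(m, d, p, seed=0):
    """Public matrix M in Z_p^{d x m}; hypothetical setup from a seeded (non-cryptographic) PRNG."""
    rng = random.Random(seed)
    return [[rng.randrange(p) for _ in range(m)] for _ in range(d)]


def subset_sum_hash(M, bits, p):
    """H_M(x) = M x mod p for x in {0,1}^m: each output is the sum of the selected row entries."""
    assert len(bits) == len(M[0]) and all(b in (0, 1) for b in bits)
    return [sum(entry for entry, b in zip(row, bits) if b) % p for row in M]


p = 2 ** 61 - 1            # toy prime standing in for the 298-bit r_4
M = subset_sum_matrix(m=16, d=1, p=p)
x = [1, 0, 1, 1] * 4
print(subset_sum_hash(M, x, p))   # a single F_p element, since d = 1
```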
Remark 5.4 (boolean input). Ideally, we would like a collision-resistant function whose "natural" domain is strings of $\mathbb{F}_{r_4}$-elements, rather than strings of bits (as in subset-sum functions). Indeed, converting a string $x \in \mathbb{F}_p^m$ to its binary representation $s \in \{0,1\}^{m \lceil \log p \rceil}$ (in order to "prepare" the function's input) costs $m \lceil \log p \rceil$ gates, which is a nontrivial contribution to the size of $C_{\mathrm{pcd},\alpha}$ unless one keeps $m$ quite small (see Section 4). While a subset-sum function $H_M\colon \{0,1\}^m \to \mathbb{Z}_p^d$ remains collision-resistant even for domains consisting of "small-norm" vectors (of which binary strings are a special case), $H_M$ is not collision-resistant (or even one-way) when the domain is enlarged to include all elements in $\mathbb{Z}_p^m$ (simply because, being a linear function, it can be efficiently inverted). It is an open question whether the cost of converting to binary strings can be avoided via some other choice of hash function.
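The failure of collision resistance on all of $\mathbb{Z}_p^m$ follows directly from linearity: any nonzero vector in the kernel of $M$ yields collisions. The Python sketch below (toy $p$ and $M$, with $d = 1$) builds such a kernel vector explicitly; the colliding input it produces has entries outside $\{0, 1\}$, which is exactly why restricting the domain to bit strings (or other small-norm vectors) matters.

```python
def linear_hash(M, x, p):
    """H_M(x) = M x mod p, evaluated on an arbitrary vector x in Z_p^m (not only bit strings)."""
    return [sum(mij * xj for mij, xj in zip(row, x)) % p for row in M]


def collision_outside_bits(M, x, p):
    """For d = 1, return x' != x with H_M(x') = H_M(x), by adding the kernel vector (M_2, -M_1, 0, ...)."""
    row = M[0]
    kernel = [row[1], -row[0]] + [0] * (len(row) - 2)   # row . kernel = M_1 M_2 - M_2 M_1 = 0
    return [(xj + kj) % p for xj, kj in zip(x, kernel)]


p = 101
M = [[3, 7, 11, 13]]        # toy public matrix (d = 1, m = 4)
x = [1, 0, 1, 1]            # a legitimate bit-string input
x_prime = collision_outside_bits(M, x, p)
print(x_prime, linear_hash(M, x, p), linear_hash(M, x_prime, p))  # prints: [8, 98, 1, 1] [27] [27]
```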
Scalable zk-SNARKs
Having constructed a PCD system (see Section 4 and Section 5), we use it to obtain a new zk-SNARK that is scalable (i.e., fully succinct and incrementally computable).
Specifying a machine
A notable feature of our zk-SNARK is generality: it can prove/verify correctness of executions on any given random-access machine M, specified by a memory configuration and a corresponding CPU circuit. For instance, M may encode a floating-point-arithmetic processor for running quantitative analysis programs; or, M may encode a SIMD-based architecture for running multimedia programs.
Parameters. More precisely, a machine M is specified by a tuple $(A, W, N, \mathrm{CPU}_{\mathrm{exe}}, \mathrm{CPU}_{\mathrm{ver}})$ where:
• $A, W \in \mathbb{N}$ specify that (random-access) memory contains $A$ addresses each storing $W$ bits, i.e., that memory is a function $M\colon [A] \to \{0,1\}^W$;
• $N \in \mathbb{N}$ specifies the length, in bits, of a CPU state;
• $\mathrm{CPU}_{\mathrm{exe}}$ is a (stateful) function for executing the CPU;
• $\mathrm{CPU}_{\mathrm{ver}}$ is an $\mathbb{F}$-arithmetic circuit for verifying the CPU's execution.
We now elaborate on the above parameters. For more details, see Appendix A.2.
17 We do not require the hash function to be universal, so we do not need to add a random vector to the subset sum.
18 Recent works [LM06, PR06, LMPR08, ADLM+08, BLPRS13] use a small modulus and larger dimension, but our "native" modulus is already a large one.
Execution on M.
A computation on M proceeds in steps, as determined by $\mathrm{CPU}_{\mathrm{exe}}$, which can be thought of as M's "processor": step after step, $\mathrm{CPU}_{\mathrm{exe}}$ takes the previous state and instruction (and its address), executes the instruction, communicates with random-access memory, and produces the next state and instruction address. More precisely, each step consists of two phases:
• Instruction fetch. Given the current CPU state $s_{\mathrm{cpu}} \in \{0,1\}^N$ and the address $a_{\mathrm{pc}} \in [A]$ of the instruction to be executed, the new instruction to be executed is fetched: $v_{\mathrm{pc}} := M(a_{\mathrm{pc}}) \in \{0,1\}^W$.
• Instruction execution. For an auxiliary input $g \in \{0,1\}^W$, $\mathrm{CPU}_{\mathrm{exe}}$ receives $(s_{\mathrm{cpu}}, a_{\mathrm{pc}}, v_{\mathrm{pc}}, g)$ and outputs $(a_{\mathrm{mem}}, v_{\mathrm{st}}, f_{\mathrm{st}})$, where $a_{\mathrm{mem}} \in [A]$ is an address, $v_{\mathrm{st}} \in \{0,1\}^W$ a value, and $f_{\mathrm{st}} \in \{0,1\}$ a store flag. Afterwards, $\mathrm{CPU}_{\mathrm{exe}}$ receives $v_{\mathrm{ld}} := M(a_{\mathrm{mem}}) \in \{0,1\}^W$ (i.e., the value at that address) and outputs a new CPU state $s'_{\mathrm{cpu}} \in \{0,1\}^N$, an address $a'_{\mathrm{pc}} \in [A]$ for the next instruction, and a flag $f_{\mathrm{acc}} \in \{0,1\}$ denoting whether the machine has accepted. Meanwhile, if a store was requested, it is performed: if $f_{\mathrm{st}} = 1$ then $M(a_{\mathrm{mem}}) := v_{\mathrm{st}}$. Finally, at the end of every step, $\mathrm{CPU}_{\mathrm{exe}}$'s internal state is reset.
See Figure 3 for a diagram of these two phases.
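To make the two phases concrete, the Python sketch below runs a machine whose $\mathrm{CPU}_{\mathrm{exe}}$ is a hypothetical one-instruction CPU: an instruction word $a > 0$ exchanges the single register with $M(a)$, and the word 0 means "accept". This toy CPU is our own illustration, not a machine from the paper. The driver follows the order described above: fetch, first half of execution, load, store (if requested), second half. Loading the old value before performing the store is our reading of "meanwhile", chosen to match a load-then-store discipline.

```python
class XchgCPU:
    """Hypothetical CPU_exe: word a > 0 swaps the register with M(a); word 0 accepts."""

    def __init__(self):
        self._pending = None                       # internal state kept between the two halves

    def execute(self, s_cpu, a_pc, v_pc, g):
        """First half: (s_cpu, a_pc, v_pc, g) -> (a_mem, v_st, f_st)."""
        self._pending = (s_cpu, a_pc, v_pc)
        if v_pc == 0:
            return 0, 0, 0                         # dummy load from address 0, no store
        return v_pc, s_cpu, 1                      # store the register at address v_pc

    def finish(self, v_ld):
        """Second half: v_ld -> (new s_cpu, new a_pc, f_acc); then reset internal state."""
        s_cpu, a_pc, v_pc = self._pending
        self._pending = None
        if v_pc == 0:
            return s_cpu, a_pc, 1                  # accept
        return v_ld, a_pc + 1, 0                   # register <- old M(v_pc); next instruction


def step(memory, cpu, s_cpu, a_pc, g=0):
    v_pc = memory[a_pc]                                     # instruction fetch
    a_mem, v_st, f_st = cpu.execute(s_cpu, a_pc, v_pc, g)   # instruction execution, part 1
    v_ld = memory[a_mem]                                    # load (old value) ...
    if f_st:
        memory[a_mem] = v_st                                # ... then the requested store
    return cpu.finish(v_ld)                                 # instruction execution, part 2


def run(memory, cpu, s_cpu=0, a_pc=0, max_steps=1000):
    for _ in range(max_steps):
        s_cpu, a_pc, f_acc = step(memory, cpu, s_cpu, a_pc)
        if f_acc:
            return s_cpu
    raise RuntimeError("machine did not accept within max_steps")


memory = [0] * 16                        # A = 16 addresses
memory[0:5] = [10, 11, 12, 10, 0]        # program: rotate M(10), M(11), M(12), then accept
memory[10:13] = [7, 8, 9]
print(run(memory, XchgCPU()), memory[10:13])   # prints: 0 [9, 7, 8]
```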
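Verification of the CPU. The circuit $\mathrm{CPU}_{\mathrm{ver}}$ verifies the correct input/output relationship of $\mathrm{CPU}_{\mathrm{exe}}$ (but not memory consistency). In other words, $\mathrm{CPU}_{\mathrm{ver}}$ satisfies the following property:
… $\in \{0,1\}$, and let $x_{\mathrm{ver}}$ be the concatenation of all these. There is a witness $a_{\mathrm{ver}}$ such that $\mathrm{CPU}_{\mathrm{ver}}(x_{\mathrm{ver}}, a_{\mathrm{ver}}) = 0$ iff $(a_{\mathrm{mem}}, v_{\mathrm{st}}, f_{\mathrm{st}}) \leftarrow \mathrm{CPU}_{\mathrm{exe}}(s_{\mathrm{cpu}}, a_{\mathrm{pc}}, v_{\mathrm{pc}}, g)$ and, afterwards, $(s'_{\mathrm{cpu}}, a'_{\mathrm{pc}}, f_{\mathrm{acc}}) \leftarrow \mathrm{CPU}_{\mathrm{exe}}(v_{\mathrm{ld}})$. Moreover, $a_{\mathrm{ver}}$ can be efficiently computed from $x_{\mathrm{ver}}$.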
While we do not care about how the function $\mathrm{CPU}_{\mathrm{exe}}$ is specified (e.g., it can be a computed program), $\mathrm{CPU}_{\mathrm{ver}}$ must be an arithmetic circuit; if $\mathrm{CPU}_{\mathrm{ver}}$ is defined over $\mathbb{F}$, we say that M has verification over $\mathbb{F}$.
Construction summary
The construction of the new zk-SNARK consists of the following transformation:
proof-carrying data system (G, P, V) for F-arithmetic compliance predicates
⇓
scalable zk-SNARK (G', P', V') for any random-access machine M = (A, W, N, CPU_exe, CPU_ver) with verification over F
The transformation's outline is as follows (see Section 2.3). First, given M, we design a compliance predicate Π_{M,H} for the incremental verification of M's execution, when its random-access memory M is delegated via memory-checking techniques based on a collision-resistant hash H [BEGKN91, BCGT13a]. Then, we use the PCD system (G, P, V) to enforce the compliance predicate Π_{M,H}, and thereby construct the algorithms (G', P', V') of the new zk-SNARK, which is fully succinct and incrementally computable.
For the first part, we again use field-specific subset-sum functions for constructing circuits that verify authentication paths (Section 6.3), and then combine these, together with M's CPU circuit, to construct Π M,H (Section 6.4). For the second part, the construction of the new zk-SNARK's three algorithms is fairly straightforward in light of previous work, and we include its details for completeness (Section 6.5).
Later, in Section 7, we evaluate our scalable zk-SNARK when the machine M equals vnTinyRAM.
Arithmetic circuits for secure loads and stores
We construct arithmetic circuits for checking loads/stores of an untrusted random-access memory, relative to a (trusted) root of a Merkle tree over the memory; this task is known as memory checking (see Remark 6.1). Let A, W ∈ N specify that memory contains A addresses each storing W bits, i.e., that memory is a function M : [A] → {0,1}^W. Let H : {0,1}^m → {0,1}^ℓ be a collision-resistant function suitable for building binary Merkle trees over M (i.e., m ≥ W and m/ℓ ≥ 2); we say that H is (A, W)-good. For a field F, let C_H be an F-arithmetic circuit that verifies H; we construct the following two F-arithmetic circuits.
Secure load. A secure-load circuit C_SecLd that, for a given address a, checks the validity of a loaded value v against a Merkle-tree root ρ. More precisely, the circuit C_SecLd satisfies the following property: for any root ρ ∈ {0,1}^ℓ, address a ∈ [A], value v ∈ {0,1}^W, and authentication path p ∈ {0,1}^{W + (log A − 1)ℓ}, C_SecLd(ρ, a, v, p) = 0 if and only if p is a valid authentication path for the value v as the a-th leaf in a Merkle tree of root ρ. One can verify that the size of such a circuit is log A · (|C_H| + 2ℓ), because the check can be performed via log A invocations of H plus 2ℓ gates per level.¹⁹
Secure load-then-store. A secure-load-then-store circuit C_SecLdSt that, for a given address a, checks: (i) the validity of a loaded value v_ld against a Merkle-tree root ρ; and (ii) the validity of storing v_st, to the same address, against a (possibly different) Merkle-tree root ρ'. More precisely, the circuit C_SecLdSt satisfies the following property: for any two roots ρ, ρ' ∈ {0,1}^ℓ, address a ∈ [A], two values v_ld, v_st ∈ {0,1}^W, and authentication path p ∈ {0,1}^{W + (log A − 1)ℓ}, C_SecLdSt(ρ, ρ', a, v_ld, v_st, p) = 0 if and only if:
• p is a valid authentication path for the value v_ld as the a-th leaf in a Merkle tree of root ρ, AND
• p is a valid authentication path for the value v_st as the a-th leaf in a Merkle tree of root ρ'.
One can verify that the size of such a circuit is log A · (2|C_H| + 4ℓ).
Instantiation with subset-sum functions. We are left to choose the function H and construct C_H, which are required to obtain the two circuits C_SecLd and C_SecLdSt.
As in Section 5.2, subset-sum functions are a natural candidate, for efficiency reasons. Namely, since F has prime order p, its additive group is isomorphic to Z_p; hence, for M ∈ Z_p^{d×m}, the subset-sum function H_M : {0,1}^m → Z_p^d, x ↦ Mx, can be computed with only d gates over F. Unlike in Section 5.2, however, both C_SecLd and C_SecLdSt require H's outputs to be inputs to other invocations of H. Thus, here we treat a subset-sum function as having binary output: H_M : {0,1}^m → {0,1}^ℓ where ℓ := d · log p. Doing so requires additional gates (for the binary decomposition of each output element), for a total of d + ℓ = d · (1 + log p) gates over F to compute H_M. Moreover, the condition on input length is different: here we need to ensure that the function is (A, W)-good, which requires that m ≥ max{W, 2ℓ} = max{W, 2d log p}.
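The following Python sketch puts these pieces together under toy assumptions: a subset-sum hash H_M over a hypothetical Mersenne-prime modulus p = 2^61 − 1 (not the paper's 298-bit field), bit-decomposed to ℓ = d·⌈log p⌉ output bits, is used to build a binary Merkle tree and to perform the checks that C_SecLd and C_SecLdSt enforce. Zero-padding the concatenation of two children to m bits is our own layout choice; at the bottom level it needs 2W ≤ m, which holds whenever W ≤ ℓ, and the paper's exact leaf layout may differ. The checks are evaluated directly rather than as F-arithmetic circuits, and all names (`BinarySubsetSumHash`, `sec_load`, and so on) are hypothetical.

```python
import random


def subset_sum(matrix, p, x_bits):
    """H_M(x) = M x mod p for a binary vector x (one row of M per output element)."""
    return [sum(c for c, b in zip(row, x_bits) if b) % p for row in matrix]


class BinarySubsetSumHash:
    """Toy H_M: {0,1}^m -> {0,1}^ell with ell = d * ceil(log2 p)."""

    def __init__(self, p, d, m, seed=0):
        rng = random.Random(seed)              # toy "random M", fixed seed for repeatability
        self.p, self.m = p, m
        self.log_p = p.bit_length()            # equals ceil(log2 p) for an odd prime p
        self.ell = d * self.log_p
        self.matrix = [[rng.randrange(p) for _ in range(m)] for _ in range(d)]

    def compress(self, left, right):
        """Hash the concatenation of two children, zero-padded to m bits."""
        x = left + right
        assert len(x) <= self.m, "toy layout needs m >= 2 * max(W, ell)"
        x = x + [0] * (self.m - len(x))
        digest = subset_sum(self.matrix, self.p, x)
        # Binary decomposition: log_p bits per output element, ell bits in total.
        return [(e >> i) & 1 for e in digest for i in range(self.log_p)]


def build_levels(H, leaves):
    """levels[0] = W-bit leaves, levels[-1] = [root]; len(leaves) must be a power of 2."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H.compress(prev[j], prev[j + 1]) for j in range(0, len(prev), 2)])
    return levels


def auth_path(levels, a):
    """Siblings bottom-up: one W-bit leaf, then (log A - 1) digests of ell bits each."""
    path = []
    for level in levels[:-1]:
        path.append(level[a ^ 1])
        a >>= 1
    return path


def sec_load(H, root, a, v, path):
    """Accepts iff C_SecLd(root, a, v, path) = 0: recompute the root and compare."""
    cur = v
    for sibling in path:
        cur = H.compress(sibling, cur) if a & 1 else H.compress(cur, sibling)
        a >>= 1
    return cur == root


def sec_load_then_store(H, root, new_root, a, v_ld, v_st, path):
    """Accepts iff C_SecLdSt = 0: one path authenticates v_ld under root and v_st under new_root."""
    return sec_load(H, root, a, v_ld, path) and sec_load(H, new_root, a, v_st, path)


# Toy memory with A = 8 addresses of W = 4 bits, M(x) = x (little-endian bits).
W, A = 4, 8
H = BinarySubsetSumHash(p=2**61 - 1, d=1, m=2 * 61)
mem = [[(x >> i) & 1 for i in range(W)] for x in range(A)]
levels = build_levels(H, mem)
root, path = levels[-1][0], auth_path(levels, 5)
new_mem = [row[:] for row in mem]
new_mem[5] = [1, 1, 1, 1]                       # store v_st = 15 at address 5
new_root = build_levels(H, new_mem)[-1][0]
print(sec_load(H, root, 5, mem[5], path))                                       # True
print(sec_load_then_store(H, root, new_root, 5, mem[5], new_mem[5], path))      # True
```

The same path works for both roots because a store to address a changes only the leaf itself and its ancestors, never its siblings.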
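Overall, if we set H := H_M (for a random M), we can achieve circuit sizes that are:
$$|C_{\mathrm{SecLd}}| = \log A \cdot d \cdot (1 + 3\log p) \quad\text{and}\quad |C_{\mathrm{SecLdSt}}| = \log A \cdot d \cdot (2 + 6\log p).$$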
In terms of concrete numbers, recalling from Section 5.2 that d = 1 and log p = 298, we get that |C SecLd | = log A · 895 and |C SecLdSt | = log A · 1,790.
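The arithmetic behind these numbers can be checked with a few lines of Python. This simply evaluates the size formulas stated above (|C_H| = d(1 + log p), plus 2ℓ or 4ℓ selection gates per level); the function name is our own and the code is not taken from the authors' implementation.

```python
def merkle_check_sizes(log_A, d, log_p):
    """Gate counts of C_SecLd and C_SecLdSt when H := H_M (binary-output subset sum)."""
    ell = d * log_p                               # output length of H_M in bits
    c_h = d + ell                                 # |C_H| = d * (1 + log p)
    sec_ld = log_A * (c_h + 2 * ell)              # log A * (|C_H| + 2 ell)
    sec_ld_st = log_A * (2 * c_h + 4 * ell)       # log A * (2 |C_H| + 4 ell)
    return sec_ld, sec_ld_st


# Per tree level (log_A = 1) with d = 1 and log p = 298, as in the text:
print(merkle_check_sizes(1, 1, 298))              # (895, 1790)
```

Summing the two entries gives log A · d · (3 + 9 log p), the per-cycle memory-checking cost that reappears in the efficiency analysis of Π_{M,H}.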
Remark 6.1 (memory checking). Memory checking was introduced by Blum et al. [BEGKN91] ; they showed how to use Merkle hashing to delegate a machine's memory to an untrusted storage, and dynamically verify its consistency using only a small poly(λ)-size "trusted" memory. Blum et al. instantiated Merkle hashing with universal one-way hash functions [NY89, Rom90] . Yet, in general, a machine's computation includes a "nondeterministic component", e.g., an auxiliary input. In such a case (as in this paper), Merkle hashing must be based on hash functions that are collision resistant.
Memory-checking techniques have found numerous practical applications for securing untrusted storage [MVS00, MS01, GSCvDD03, GSMB03, KRSWF03]. Ben-Sasson et al. [BCGT13a] suggested that verification of memory via Merkle hashing can be a useful computational alternative to the information-theoretic use of nondeterministic routing for efficient circuit generators. For instance, the recent circuit generator of Braun et al. [BFRS+13] uses memory checking to verify accesses to untrusted storage.
The RAM compliance predicate
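Given a random-access machine M and a (suitable) collision-resistant function H, we construct a compliance predicate Π_{M,H} that checks a step of execution of M. The transformation is:
random-access machine M = (A, W, N, CPU_exe, CPU_ver) with verification over F, and (A, W)-good collision-resistant function H
⇓
F-arithmetic compliance predicate Π_{M,H}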
Briefly, a message z for Π_{M,H} encodes a short representation of M's state at a given time step. Then, at a node with input message z_in and output message z_out, the compliance predicate Π_{M,H} checks that the transition from the state in z_in to the state in z_out is a valid transition of the machine M. Below, we make this plan more concrete by describing the format of messages and local data for Π_{M,H}, and the checks performed by Π_{M,H}. See Figure 5 for details (and Appendix A.2 for reference).
Format of messages. A message z summarizes M's entire state at a given time t, by storing the following:
• a timestamp t, denoting how many computation steps have occurred;
• a root ρ of a Merkle tree of random-access memory (after t computation steps);
• a CPU state s_cpu (after t computation steps); and
• a flag f_acc denoting whether the machine has accepted (after t computation steps).
Furthermore, z also stores ρ_0, the root of a Merkle tree over initial memory, so as to "remember" M's input. Note that a message z is short because the large memory is "summarized" by the short root of a Merkle tree.
Format of local data. Now consider a node with input message z_in and output message z_out. The goal of Π_{M,H} is to verify M's transition from z_in to z_out, using two main tools:
• CPU ver for checking CPU transitions, given consistent memory accesses ("what you store is what you get");
• C_SecLd and C_SecLdSt for checking memory accesses.
Thus, in the local data z_loc provided at a node, we store whatever auxiliary information Π_{M,H} needs to evaluate CPU_ver (e.g., requested memory addresses, memory values, flags, etc.) and C_SecLd and C_SecLdSt (e.g., addresses, values, and authentication paths). Furthermore, z_loc includes a flag f_halt specifying whether the computation should halt (in which case Π_{M,H} performs a different set of checks).
Construction. The compliance predicate Π_{M,H} takes an input (z_out, z_loc, z_in, b_base), where z_out is the outgoing message, z_loc the local data, z_in the incoming message, and b_base the base-case flag, and must verify M's transition from z_in to z_out. (Thus, Π_{M,H} has arity s = 1.) Our construction of Π_{M,H} goes as follows.
In the base case (i.e., b_base = 1), Π_{M,H} ensures that z_in is correctly initialized: its timestamp, CPU state, instruction address, and accept flag should all be zero; Π_{M,H} also checks that the root of the Merkle tree of memory equals that of the Merkle tree of initial memory.
Moreover, regardless of base case or not, Π M,H always checks that the root of the Merkle tree of initial memory is preserved from z in to z out , in order to not "forget" what the initial state of the machine was.
When the computation does not halt (i.e., f halt = 0), Π M,H checks that the timestamp is incremented by 1 and that CPU ver (on the appropriate inputs) accepts; furthermore, it uses C SecLd to check that the instruction was correctly loaded and C SecLdSt to check that the memory access (a load or a store) was correctly performed.
When the computation does halt (i.e., f_halt = 1), Π_{M,H} first ensures that the computation has in fact accepted so far; then it clears out the root of the Merkle tree over memory and the CPU state (as these may leak information about the private auxiliary input) and ensures that the timestamp in z_out is at least as large as the number of steps so far. (Again for privacy reasons, Π_{M,H} does not force z_out to carry the exact number of computation steps, but only a number that is at least as large.)
Overall, the above checks suffice for Π_{M,H} to ensure that any Π_{M,H}-compliant distributed computation corresponds to correctly initializing, stepping through, and halting an accepting computation of M.
Efficiency. By implementing Π_{M,H} as an F-arithmetic circuit, we obtain the following efficiency:
$$|\Pi_{M,H}| = |\mathrm{CPU}_{\mathrm{ver}}| + |C_{\mathrm{SecLd}}| + |C_{\mathrm{SecLdSt}}| + \varepsilon = |\mathrm{CPU}_{\mathrm{ver}}| + \log A \cdot d \cdot (3 + 9\log p) + \varepsilon,$$
where ε is a "small but ugly" term, depending on d, N, F, that can be upper bounded as follows ε ≤ 2 · 301 + 4N + 2 + 301 + 4N + 2 log |F| + 24 N log |F| + 2N + 12 log |F| + + 10 .
Crucially, |Π M,H | only depends (nicely) on M and H, but is independent of the computation length on M:
the term |CPU_ver| is the cost of verifying M's CPU (and depends on how complex the CPU is), while the term |C_SecLd| + |C_SecLdSt| = log A · d · (3 + 9 log p) is the per-cycle cost to ensure memory consistency via collision-resistant hashing (see Section 6.3). In Section 7, we consider the case when M equals vnTinyRAM (a simple RISC von Neumann machine), with word sizes w ∈ {16, 32} and k = 16 registers. In Figure 4 we report, for these cases, the size of Π_{M,H}, its sub-circuits, and the resulting PCD circuits C_pcd,4 and C_pcd,6 (which affect the PCD system's efficiency).
Remark 6.2. The per-cycle cost of ensuring memory consistency via collision-resistant hashing (i.e., |C_SecLd| + |C_SecLdSt|) is typically much larger than that incurred when using nondeterministic routing (in [BCTV14], it is less than 1000). However, collision-resistant hashing ultimately enables scalability, whereas nondeterministic routing is not known to be useful for scalability (also see Section 8).
[Figure 4 (table): 16-bit vnTinyRAM | 32-bit vnTinyRAM]
In particular, while in Section 7 we focus on the case M = vnTinyRAM, for which |CPU_ver| ≈ 10^3, we could have chosen more complex machines. Indeed, even if CPU_ver had a few tens of thousands of gates, the size of Π_{M,H} would remain on the order of 10^5 gates. In other words, our scalable zk-SNARK can accommodate much more complex machines at a relatively small additional cost.
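The checks performed by Π_{M,H}, as described above, can be summarized in a short Python sketch. Everything concrete in it is an assumption on our part: messages are a hypothetical `Msg` named tuple with integer-packed fields, local data is a dict keyed by the names used in Figure 5, and `cpu_ver`, `sec_ld`, `sec_ld_st` are caller-supplied stand-ins for CPU_ver, C_SecLd, and C_SecLdSt that return True when the corresponding circuit outputs 0. We also assume that a load-then-store with f_st = 0 writes back v_ld, and that a halting node outputs f_acc = 1 (as the prover's final message does). Because the message format in Figure 5 has no separate instruction-address field, the sketch does not check an instruction address in the base case.

```python
from typing import Callable, NamedTuple


class Msg(NamedTuple):
    rho0: int   # Merkle root over initial memory (the program)
    t: int      # timestamp
    rho: int    # Merkle root over current memory
    s_cpu: int  # CPU state, packed into an int
    f_acc: int  # accept flag


def compliance(z_out: Msg, loc: dict, z_in: Msg, b_base: bool,
               cpu_ver: Callable, sec_ld: Callable, sec_ld_st: Callable) -> bool:
    """Sketch of the checks of Pi_{M,H} on input (z_out, z_loc, z_in, b_base)."""
    if z_out.rho0 != z_in.rho0:                       # never "forget" the initial memory
        return False
    if b_base and not (z_in.t == 0 and z_in.s_cpu == 0 and z_in.f_acc == 0
                       and z_in.rho == z_in.rho0):    # correct initialization
        return False
    if loc["f_halt"]:
        # Halting: must have accepted; root and CPU state cleared; timestamp only lower-bounded.
        return (z_in.f_acc == 1 and z_out.f_acc == 1 and z_out.rho == 0
                and z_out.s_cpu == 0 and z_out.t >= z_in.t)
    if z_out.t != z_in.t + 1:                         # exactly one machine step per node
        return False
    x_ver = (z_in.s_cpu, z_out.s_cpu, loc["a_pc"], loc["a_mem"], loc["a_pc_next"],
             loc["v_pc"], loc["v_st"], loc["v_ld"], loc["g"], loc["f_st"], z_out.f_acc)
    stored = loc["v_st"] if loc["f_st"] else loc["v_ld"]   # assumption: no-op store keeps v_ld
    return (cpu_ver(x_ver, loc["a_ver"])
            and sec_ld(z_in.rho, loc["a_pc"], loc["v_pc"], loc["p_pc"])
            and sec_ld_st(z_in.rho, z_out.rho, loc["a_mem"], loc["v_ld"], stored, loc["p_mem"]))


def ok(*args):
    """Stand-in sub-check that always accepts."""
    return True


z0 = Msg(rho0=7, t=0, rho=7, s_cpu=0, f_acc=0)
z1 = Msg(rho0=7, t=1, rho=9, s_cpu=3, f_acc=1)
loc = dict(f_halt=0, a_pc=0, a_mem=0, a_pc_next=1, v_pc=0, v_st=0, v_ld=0,
           g=0, f_st=0, a_ver=None, p_pc=None, p_mem=None)
print(compliance(z1, loc, z0, True, ok, ok, ok))                                     # True
print(compliance(Msg(7, 100, 0, 0, 1), dict(loc, f_halt=1), z1, False, ok, ok, ok))  # True
```

Passing the three sub-checks as parameters keeps the control flow of Π_{M,H} separate from the circuits it invokes, which matches how the text composes Π_{M,H} from CPU_ver, C_SecLd, and C_SecLdSt.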
Format of messages for Π_{M,H}. A message z for the compliance predicate Π_{M,H} is a tuple z = (ρ_0, t, ρ, s_cpu, f_acc) where:
• ρ_0 ∈ {0,1}^ℓ is an output of H; allegedly, it is the root of a Merkle tree whose leaves are a program P (i.e., initial memory).
• t ∈ {0,1}^300 is a timestamp; allegedly, it is the number of computation steps so far.
(For concreteness, we bound all computations to 2^300 steps, which is good enough for a long while.)
• ρ ∈ {0,1}^ℓ is an output of H; allegedly, it is the root of a Merkle tree whose leaves are M_t (memory after t steps).
• s_cpu ∈ {0,1}^N is a CPU state; allegedly, it is the machine's CPU state after t steps of computation.
• f_acc ∈ {0,1} is a flag that denotes whether the machine has accepted so far. The length n_msg of a message is equal to 2ℓ + 300 + N + 1.
Format of local data for Π_{M,H}. Local data z_loc for the compliance predicate Π_{M,H} is a tuple z_loc = (a_pc, a_mem, a'_pc, v_pc, v_st, v_ld, g, f_st, f_halt, a_ver, p_pc, p_mem) where:
• a_pc, a_mem, a'_pc ∈ [A] are memory addresses.
• v_pc, v_st, v_ld ∈ {0,1}^W are memory values.
• g ∈ {0,1}^W is a nondeterministic guess.
• fst, f halt ∈ {0, 1} are flags.
• aver is a witness for the F-arithmetic circuit CPUver.
• p_pc, p_mem ∈ {0,1}^{W + (log A − 1)ℓ} are authentication paths for Merkle trees over memory. The length n_loc of local data is equal to (3 + 2ℓ) · log A + 6W + 2 − 2ℓ + |a_ver|.
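As a sanity check on the two length formulas, this Python sketch (function names hypothetical) adds up the field widths of a message and of local data in the order listed above, so the sums can be compared with the closed forms n_msg = 2ℓ + 300 + N + 1 and n_loc = (3 + 2ℓ)·log A + 6W + 2 − 2ℓ + |a_ver|. The sample values (ℓ = 298, log A = 14, W = 32, and |a_ver| = 0 purely as a placeholder, since its length depends on CPU_ver) are our own choices.

```python
def message_length(ell, N):
    """n_msg: rho_0 and rho (ell bits each), t (300 bits), s_cpu (N bits), f_acc (1 bit)."""
    return sum([ell, 300, ell, N, 1])


def local_data_length(ell, log_A, W, a_ver_len):
    """n_loc, summed field by field in the order of z_loc."""
    path = W + (log_A - 1) * ell              # one W-bit sibling leaf + (log A - 1) digests
    fields = ([log_A] * 3                     # a_pc, a_mem, a'_pc
              + [W] * 4                       # v_pc, v_st, v_ld, g
              + [1] * 2                       # f_st, f_halt
              + [a_ver_len]                   # a_ver
              + [path] * 2)                   # p_pc, p_mem
    return sum(fields)


print(local_data_length(ell=298, log_A=14, W=32, a_ver_len=0))   # 7984
```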
Compliance predicate ΠM,H .
• INPUTS:
- output PCD message z_out = (ρ'_0, t', ρ', s'_cpu, f'_acc) ∈ {0,1}^{n_msg}
- local data z_loc = (a_pc, a_mem, a'_pc, v_pc, v_st, v_ld, g, f_st, f_halt, a_ver, p_pc, p_mem) ∈ {0,1}^{n_loc}
[…]
New zk-SNARK prover P'
• INPUTS: proving key pk, program P, time bound T, and auxiliary input G = (g_0, g_1, ..., g_{T−1})
• OUTPUTS: proof π for the instance (P, T )
1. Use H to compute ρ_0, the root of the Merkle tree over P.
[…]
• a new CPU state (s_{cpu,i+1} ∈ {0,1}^N),
• an address for the next instruction (a_{pc,i+1} ∈ [A]), and
• a flag denoting whether the machine has accepted (f_{acc,i+1} ∈ {0,1}).
Reset CPU_exe's state.
(g) Create the next message: z_{msg,i+1} := (ρ_0, i + 1, ρ_{i+1}, s_{cpu,i+1}, f_{acc,i+1}).
(h) Deduce a_ver from x_ver := (s_{cpu,i}, s_{cpu,i+1}, a_{pc,i}, a_{mem,i}, a_{pc,i+1}, v_{pc,i}, v_{st,i}, v_{ld,i}, g_i, f_{st,i}, f_{acc,i+1}).
(i) Let p_{pc,i} (resp., p_{mem,i}) be the authentication path for address a_{pc,i} (resp., a_{mem,i}) in M_i.
(j) Create local data: z_{loc,i+1} := (a_{pc,i}, a_{mem,i}, a_{pc,i+1}, v_{pc,i}, v_{st,i}, v_{ld,i}, g_i, f_{st,i}, 0, a_ver, p_{pc,i}, p_{mem,i}).
(k) Compute the next proof: π_{i+1} := P(pk_pcd, z_{msg,i+1}, z_{loc,i+1}, z_{msg,i}, π_i).
7. Prepare the final message: z_{msg,fin} := (ρ_0, T, 0^ℓ, 0^N, 1).
8. Prepare the final local data: z_{loc,fin} := (*, *, *, *, *, *, *, *, 1, *, *, *), where * can be set to anything of the right length.
9. Compute the final proof: π := P(pk_pcd, z_{msg,fin}, z_{loc,fin}, z_{msg,T}, π_T).
10. Output π.
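Steps 2 to 6 of the prover are missing from this excerpt, so the sketch below only illustrates the overall shape visible in steps (g) to (k) and 7 to 10, under stated assumptions: `step(i, z)` is a hypothetical callback that performs one machine step and returns the next message and local data, `pcd_prove` stands in for the PCD prover P, and the initial message and proof are simply taken as given. What it shows is that only the current message and proof are retained between iterations, which is why the prover's memory does not grow with T.

```python
def prove_incrementally(pcd_prove, pk_pcd, rho0, T, z0, pi0, step):
    """Hypothetical driver mirroring steps (g)-(k) and 7-10 of the new prover P'."""
    z, pi = z0, pi0
    for i in range(T):
        z_next, z_loc = step(i, z)                    # one machine step: next message + local data
        pi = pcd_prove(pk_pcd, z_next, z_loc, z, pi)  # pi_{i+1} := P(pk_pcd, z_{msg,i+1}, z_{loc,i+1}, z_{msg,i}, pi_i)
        z = z_next                                    # the previous message and proof are dropped
    z_fin = (rho0, T, 0, 0, 1)                        # (rho_0, T, 0^ell, 0^N, 1), zero strings as ints
    z_loc_fin = (None,) * 8 + (1,) + (None,) * 3      # only f_halt (9th entry) matters
    return pcd_prove(pk_pcd, z_fin, z_loc_fin, z, pi)


# Usage with stand-ins: the "proof" just counts how many PCD steps were taken.
calls = []


def fake_pcd_prove(pk, z_out, z_loc, z_in, pi):
    calls.append((z_out, z_loc))
    return pi + 1


result = prove_incrementally(fake_pcd_prove, None, 7, 5, (7, 0, 7, 0, 0), 0,
                             lambda i, z: ((7, i + 1, 0, 0, 0), ()))
print(result, calls[-1][0])   # 6 (7, 5, 0, 0, 1)
```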
Evaluation on vnTinyRAM
We evaluate our scalable zk-SNARK when the given random-access machine M equals vnTinyRAM, a simple RISC von Neumann architecture [BCTV14, BCGTV13b]. We also compare our scalable zk-SNARK with [BCTV14]'s preprocessing zk-SNARK (which also supports vnTinyRAM).
We ran our experiments on a desktop PC with a 3.40 GHz Intel Core i7-4770 CPU and 16 GB of RAM. Unless otherwise specified, all times are in single-thread mode; for our multi-core experiments, we enabled one thread for each of the CPU's 4 cores (for a total of 4 threads).
Recalling vnTinyRAM. The architecture vnTinyRAM is parametrized by the word size, denoted w, and the number of registers, denoted k. In terms of instructions, vnTinyRAM includes load and store instructions for accessing random-access memory (in byte or word blocks), as well as simple integer, shift, logical, compare, move, and jump instructions. Thus, vnTinyRAM can efficiently implement control flow, loops, subroutines, recursion, and so on. Complex instructions (e.g., floating-point arithmetic) are not directly supported and can be implemented "in software". See Appendix A.3 for how vnTinyRAM can be expressed in our random-access machine formalism (i.e., given w, k, how to construct M to express w-bit vnTinyRAM with k registers).
Costs on vnTinyRAM. The performance of our zk-SNARK (G', P', V') on vnTinyRAM is easy to characterize, because it is determined by a few quantities. For the key generator G', the relevant quantities are:
• the constant time and space complexity of G', when given as input a description of vnTinyRAM; and
• the constant sizes of the generated proving key pk and verification key vk.
For the proving algorithm P', which proceeds step by step alongside the original computation, they are:
• the constant time necessary to incrementally compute the new (constant-size) proof at each step; and
• the constant space needed to compute the new proof (on top of the space needed by the original program).²¹
Finally, the verifier V' takes as input a program P and a time bound T, and runs in time O(|P| + log T); in our implementation, we fix T ≤ 2^300 (which is plenty), so that V' runs in time O(|P|).
In Figure 7 , we report our measurements for two settings of vnTinyRAM: (w, k) = (16, 16) and (w, k) = (32, 16), i.e., 16-bit and 32-bit vnTinyRAM with 16 registers. (The same settings as in [BCTV14] .)
Comparison with [BCTV14]. In Figure 8, we compare the efficiency of [BCTV14]'s preprocessing zk-SNARK and our scalable zk-SNARK, for a (random) program P of 10^4 instructions, as a function of T (the number of vnTinyRAM computation steps).
The (approximate) asymptotic efficiency for [BCTV14] was obtained by linearly interpolating [BCTV14]'s measurements (which were collected on a machine with characteristics similar to our benchmarking machine). For our own measurements, we use the relevant numbers from Figure 7.
Conclusion. Our experiments demonstrate that, as expected, our approach is slower for small computations but, on the other hand, scales to large computations by avoiding any space-intensive computations.
Indeed, [BCTV14] (as well as other preprocessing zk-SNARK implementations [PGHR13, BCGTV13a]) requires space-intensive computations to maintain its efficiency. As T grows, such approaches simply run out of memory and must resort to "computing in blocks", sacrificing time complexity.²² In contrast, our zk-SNARK, while requiring more time per execution step, requires only a constant amount of memory to prove any number of execution steps. In particular, our zk-SNARK becomes more space-efficient than [BCTV14]'s zk-SNARK when T > 422 for 16-bit vnTinyRAM, and when T > 321 for 32-bit vnTinyRAM; moreover, these space savings grow without bound as T increases.
[Figure 7 (table): key generator (TIME, SPACE) | key sizes (|pk|, |vk|) | prover (TIME, SPACE) | verifier (TIME); rows include 16-bit vnTinyRAM]
VIPS. Finally, being scalable, our zk-SNARK implementation is the first to achieve a well-defined clock rate of verified instructions per second (VIPS). For vnTinyRAM, we obtain the following VIPS values:
[Table: VIPS (Hz) for 16-bit vnTinyRAM and 32-bit vnTinyRAM]
While perhaps too slow for most applications, our prototype empirically demonstrates the feasibility of the bootstrapping approach as a way to achieve scalability of zk-SNARKs and, more generally, to achieve the rich functionality of proof-carrying data.
Open problems
Higher clock rate. There are ample opportunities for improving the clock rate of "verified instructions per second". Besides potential improvements in the cryptographic protocol and elliptic curves, there is also an engineering challenge. In particular, the algorithms are highly amenable to parallelism and hardware support. Since each step of proof generation in our zk-SNARK is a constant-size operation, it could even be carefully optimized and wholly implemented in a fixed-sized, general-purpose "proving processor" hardware.
Other PCD-friendly cycles. The PCD-friendly 2-cycle proposed in this paper facilitates a great improvement in the efficiency of recursively composing pairing-based zk-SNARKs. Do there exist any other PCD-friendly 2-cycles, not based on MNT curves? Or cycles of length greater than 2? Are these easier to find, achieve smaller bit size and higher 2-adicity, or admit faster nondeterministic pairing verification? Investigating these questions may lead to further efficiency improvements to recursive proof composition. Another consideration is that, with MNT-based PCD-friendly cycles, increasing the security level is costly, since one of the curves has low embedding degree (k = 4, for which 128-bit security requires q 4 ≥ 2 750 [FST10]). Alternative zk-SNARKs constructions. What are the advantages or disadvantages of pairing-based zk-SNARKs in which the pairing is not instantiated via a pairing-friendly elliptic curve, but instead via lattice techniques [GGH13] ? Moreover, are there preprocessing zk-SNARKs that are not based on pairings? (E.g., can they be based on groups without bilinear maps?)
A Computation models
We introduce notions and notations for two computation models used in this paper: arithmetic circuits (see Appendix A.1) and random-access machines (see Appendix A.2).
A.1 Arithmetic circuits
We work with circuits that are not boolean but arithmetic. Given a field F, an F-arithmetic circuit takes inputs that are elements in F, and its gates output elements in F. We naturally associate a circuit with the function it computes. The circuits we consider only have bilinear gates,²³ and a circuit's size is defined as the number of gates. To model nondeterminism we consider circuits with an input x ∈ F^n and an auxiliary input a ∈ F^h, called a witness. Arithmetic circuit satisfiability is analogous to the boolean case, as follows.
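Definition A.1. Let n, h, l ∈ N denote the input, witness, and output size, respectively. The circuit satisfaction problem of an F-arithmetic circuit C : F^n × F^h → F^l (with bilinear gates) is defined by the relation
$$\mathcal{R}_C = \{(x, a) \in \mathbb{F}^n \times \mathbb{F}^h : C(x, a) = 0^l\};$$
its language is
$$\mathcal{L}_C = \{x \in \mathbb{F}^n : \exists\, a \in \mathbb{F}^h \text{ s.t. } C(x, a) = 0^l\}.$$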
At times, we also write C(x, a) = 0 to mean C(x, a) = 0^l for an unspecified l. All the arithmetic circuits we consider are over fields F_p with p prime.
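Footnote 23, which defines bilinear gates, is not part of this excerpt; the sketch below assumes the common reading that a bilinear gate multiplies two linear combinations of earlier wires (with a constant-1 wire available). Under that assumption, it evaluates a tiny F_13-arithmetic circuit C(x, a) = a·a − x and tests membership in R_C; the circuit, wire layout, and function names are our own illustration.

```python
def evaluate(p, gates, outputs, x, a):
    """Evaluate an F_p-arithmetic circuit with bilinear gates.

    Wires: 0 is the constant 1, then the inputs x, the witness a, then one wire per gate.
    Each gate multiplies two linear combinations (dicts wire -> coefficient) of earlier wires.
    """
    wires = [1] + list(x) + list(a)
    for left, right in gates:
        lv = sum(c * wires[i] for i, c in left.items()) % p
        rv = sum(c * wires[i] for i, c in right.items()) % p
        wires.append(lv * rv % p)
    return [wires[i] for i in outputs]


def satisfies(p, gates, outputs, x, a):
    """(x, a) is in R_C iff C(x, a) = 0^l."""
    return all(v == 0 for v in evaluate(p, gates, outputs, x, a))


# C(x, a) = a*a - x over F_13: wires 0 = 1, 1 = x, 2 = a, 3 = a*a, 4 = (a*a - x) * 1.
gates = [({2: 1}, {2: 1}), ({3: 1, 1: -1}, {0: 1})]
print(satisfies(13, gates, [4], x=[10], a=[6]))   # True: 6^2 = 36 = 10 mod 13
print(satisfies(13, gates, [4], x=[10], a=[5]))   # False: 5^2 = 25 = 12 mod 13
```

Here x = 10 is in L_C because a witness exists (6 or 7), while a = 5 is simply a wrong witness for the same statement.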
A.2 Random-access machines
There are many possible definitions of random-access machines [CR72, AV77]. Here we formulate a concrete, yet relatively flexible, definition that suffices for the purposes of this paper. Informally, a machine is specified by a configuration for random-access memory (number of addresses, and number of bits stored at each address) and a CPU. At each step, the CPU gets the current state and the next instruction from memory; executes the instruction; communicates with memory (by storing or loading data); and then outputs the next state and the address for the next instruction. (Thus, random-access memory contains both program and data.) More precisely, a (non-deterministic) random-access machine with verification over a finite field F is a tuple M = (A, W, N, CPU exe , CPU ver ) where:
• A, W ∈ N specify that (random-access) memory M contains A addresses each storing W bits (i.e., that memory is a function M : [A] → {0,1}^W).
• N ∈ N specifies the length, in bits, of a CPU state.
• CPU exe is a (stateful) function for executing the CPU (see below).
• CPU ver is an F-arithmetic circuit for verifying the CPU's execution (see below). The machine M takes as input a program P and an auxiliary input G, and computes on them. More precisely:
• A program for M is a function P : [A] → {0,1}^W that specifies the initial memory contents. The program P is typically represented in sparse form, by listing the (few) addresses and values of non-zero memory entries, which may store any code and data for the machine.
• An auxiliary input for M is a sequence G = (g_0, g_1, g_2, ...). Each g_i consists of W bits and is accessed at the i-th computation step. The auxiliary input is treated as a nondeterministic guess.
Then, the computation of M on program P and auxiliary input G, denoted M(P; G), proceeds as follows. Initialize the CPU state and instruction address to zero: s_{cpu,0} := 0^N, a_{pc,0} := 0. Next, for i = 0, 1, 2, ...:
1. CPU_exe is given the current CPU state (s_{cpu,i} ∈ {0,1}^N), the address of the instruction to be executed (a_{pc,i} ∈ [A]), the instruction to be executed (v_{pc,i} := M_i(a_{pc,i}) ∈ {0,1}^W), and the guess (g_i ∈ {0,1}^W).
2. CPU_exe outputs an address (a_{mem,i} ∈ [A]), a value (v_{st,i} ∈ {0,1}^W), and a store flag (f_{st,i} ∈ {0,1}).
Thus CPU_exe can be thought of as M's "processor": step after step, CPU_exe takes the previous state and instruction (and its address), executes the instruction, communicates with random-access memory, and produces the next state and instruction address. In contrast, CPU_ver is a predicate that verifies the correct input/output relationship of CPU_exe. In other words, CPU_ver satisfies the following property:
Let s_cpu, s'_cpu ∈ {0,1}^N, a_pc, a'_pc, a_mem ∈ [A], v_pc, v_st, v_ld, g ∈ {0,1}^W, and f_st, f_acc ∈ {0,1}, and let x_ver be the concatenation of all these. There is a witness a_ver such that CPU_ver(x_ver, a_ver) = 0 iff (a_mem, v_st, f_st) ← CPU_exe(s_cpu, a_pc, v_pc, g) and, afterwards, (s'_cpu, a'_pc, f_acc) ← CPU_exe(v_ld).
Moreover, a ver can be efficiently computed from x ver .
Of course, CPU_ver may simply internally execute CPU_exe to perform its verification; but, having access to additional advice a_ver, CPU_ver may instead perform "smarter", and more efficient, checks. We are not concerned with how the function CPU_exe is specified (e.g., it can be a computer program), but CPU_ver must be specified as an F-arithmetic circuit (for an appropriate F that we will discuss).
The language of accepting computations. We define the language of accepting computations on M. A program P is treated as "given", while the auxiliary input G is treated as nondeterministic advice.
Definition A.2. For a random-access machine M, the language L M consists of pairs (P, T ) such that:
• P is a program for M;
• T is a time bound;
• there exists an auxiliary input G such that M(P; G) accepts in at most T steps. We denote by R_M the relation corresponding to L_M.
In this paper we obtain an implementation of scalable zk-SNARKs for proving/verifying membership in the above language (see Appendix E for a definition). We evaluate our system for a specific choice of machine: vnTinyRAM, a simple RISC von Neumann architecture introduced by [BCTV14] (see below). Of course, other choices of random-access machines are possible, and our implementation supports them.
A.3 The architecture vnTinyRAM
We evaluate our scalable zk-SNARK on an architecture that previously appeared in (preprocessing) zk-SNARK implementations: vnTinyRAM [BCTV14] . (See Section 7.) We explain how to set "M = vnTinyRAM", i.e., how to specify the architecture vnTinyRAM via the formalism introduced above (and used by our prototype).
Given w, k, we want to construct a tuple M = (A, W, N, CPU_exe, CPU_ver) that implements w-bit vnTinyRAM with k registers. First we need to specify the parameters A, W ∈ N for random-access memory. vnTinyRAM accesses memory, consisting of 2^w bytes, either as bytes or as words; moreover, vnTinyRAM instructions (which are stored in memory) take two words to encode. Thus, we set A, W so that memory consists of A := 8·2^w/(2w) addresses, each storing W := 2w bits. Next, we set the CPU state length to N := (1 + k)w + 1 because, in vnTinyRAM, a CPU state consists of the program counter (w bits), k general-purpose registers (each of w bits), and a (condition) flag (1 bit). Finally, CPU_exe can be chosen to be any program implementation of vnTinyRAM's CPU, while CPU_ver can be chosen to be any F-arithmetic circuit for verifying the input-output relationship of CPU_exe. In our implementation, F is a prime field of 298 bits (namely, F = F_{r_4}), and we get the following sizes for the two settings we consider:
• for (w, k) = (16, 16), |CPU_ver| = 766; and
• for (w, k) = (32, 16), |CPU_ver| = 1108.
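The parameter choices above are easy to evaluate mechanically. This Python sketch (function name hypothetical) recomputes A, W, and N for the two settings used in Section 7; it also prints log A, the depth of the Merkle tree over memory and hence the multiplier in the per-cycle memory-checking cost log A · d · (3 + 9 log p).

```python
def vntinyram_params(w, k):
    """(A, W, N) for w-bit vnTinyRAM with k registers, per the formulas above."""
    W = 2 * w                            # one address holds two words (one instruction)
    total_bits = 8 * 2**w                # 2^w bytes of memory
    assert total_bits % W == 0
    A = total_bits // W                  # A = 8 * 2^w / (2w)
    N = (1 + k) * w + 1                  # program counter + k registers + condition flag
    return A, W, N


for w, k in [(16, 16), (32, 16)]:
    A, W, N = vntinyram_params(w, k)
    print(w, k, A, W, N, A.bit_length() - 1)   # last column: log A
# 16 16 16384 32 273 14
# 32 16 536870912 64 545 29
```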
B Pairings and elliptic curves
The cryptographic primitives we study are based on pairings, which we briefly recall in Appendix B.1. Pairings can, in turn, be based on pairing-friendly elliptic curves; in Appendix B.2 we review basic notions about these.
B.1 Pairings
Let G_1 and G_2 be cyclic groups of a prime order r. We denote elements of G_1, G_2 via calligraphic letters such as P, Q. We write G_1 and G_2 in additive notation. Let P_1 be a generator of G_1, i.e., G_1 = {αP_1}_{α∈F_r}; let P_2 be a generator of G_2. (We also view α as an integer, so that αP_1 is well-defined.)
A pairing is an efficient map e : G 1 × G 2 → G T , where G T is also a cyclic group of order r (which we write in multiplicative notation), satisfying the following properties:
• BILINEARITY. For all nonzero α, β ∈ F_r, it holds that e(αP_1, βP_2) = e(P_1, P_2)^{αβ}.
• NON-DEGENERACY. e(P_1, P_2) is not the identity in G_T.
When describing cryptographic primitives at a high level, the choice of instantiation of G_1, G_2, G_T, e often does not matter. In this paper, however, we discuss implementation details, and such choices matter a great deal. Typically, pairings are based on (pairing-friendly) elliptic curves, discussed next.
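The two properties can be checked on a deliberately insecure toy: we represent αP_1 ∈ G_1 and βP_2 ∈ G_2 by the scalars α, β ∈ Z_r and map the pair to g^{αβ} in the order-r subgroup of F_23^* generated by g = 2 (with r = 11). This is our own illustration of bilinearity and non-degeneracy, not an elliptic-curve pairing: discrete logarithms are trivial in this representation, so it has no cryptographic value.

```python
P_MOD, R, G = 23, 11, 2          # 2 has multiplicative order 11 modulo 23


def toy_pairing(alpha, beta):
    """e(alpha*P1, beta*P2) := g^(alpha*beta) in the order-r subgroup of F_23^*."""
    return pow(G, (alpha * beta) % R, P_MOD)


# Bilinearity: e(alpha*P1, beta*P2) == e(P1, P2)^(alpha*beta) for all nonzero alpha, beta.
e11 = toy_pairing(1, 1)
assert all(toy_pairing(a, b) == pow(e11, a * b, P_MOD)
           for a in range(1, R) for b in range(1, R))
# Non-degeneracy: e(P1, P2) is not the identity of G_T.
assert e11 != 1
print("order of e(P1, P2):", next(n for n in range(1, R + 1) if pow(e11, n, P_MOD) == 1))  # 11
```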
B.2 Elliptic curves
We assume familiarity with elliptic curves; here, we only recall the basic definitions in order to fix notation. See, e.g., [Was08, Sil09, FST10, CFAD+12] for more details.
Definition and curve groups. Given a field K, an elliptic curve E defined over K, denoted E/K, is a smooth projective curve of genus 1 (defined over K) with a distinguished K-rational point. We denote by E(K) the group of K-rational points on E; when finite, we denote the cardinality of this group by #E(K). For any r ∈ N, E[r] denotes the group of r-torsion points in E(K̄), where K̄ is an algebraic closure of K, and E(K)[r] the group of r-torsion points in E(K). In this paper, we only consider elliptic curves where K is a finite field F_q, so the definitions below are specific to this case.
Trace and CM discriminant. The trace of E/F_q is t := q + 1 − #E(F_q). The Hasse bound states that |t| ≤ 2√q. If gcd(q, t) = 1, then E/F_q is ordinary; otherwise, it is supersingular. If E/F_q is ordinary, the CM discriminant of E is the square-free part D of the integer 4q − t², which is non-negative by the Hasse bound.²⁴
ECDLP. The elliptic-curve discrete logarithm problem (ECDLP) is the following: given E/F_q, P ∈ E(F_q), and Q ∈ ⟨P⟩, find a ∈ N such that Q = aP. Several methods, with different time and space complexities, are known for solving the ECDLP. Cryptographic uses require the ECDLP to be hard. For points P of large prime order r, this is widely believed to be the case. Thus, one only considers curves E with trace t ≠ 1 and having cyclic subgroups of E(F_q) of large prime order r. So #E(F_q) is either a prime r, or hr for a small cofactor h.
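The definitions of trace, the Hasse bound, ordinary versus supersingular curves, and the CM discriminant can be exercised by brute force over a tiny prime field. The Python sketch below is our own illustration (function names hypothetical; brute-force point counting is only feasible for toy q): it assumes a short Weierstrass curve y² = x³ + ax + b over F_q with q > 3, counts #E(F_q), computes t, checks |t| ≤ 2√q, and returns the square-free part of 4q − t² for ordinary curves.

```python
from math import gcd


def count_points(q, a, b):
    """#E(F_q) for E: y^2 = x^3 + a x + b over a prime field F_q (q > 3), incl. infinity."""
    assert (4 * a**3 + 27 * b**2) % q != 0, "curve must be smooth"
    sqrt_count = {}                                   # value -> number of y with y^2 = value
    for y in range(q):
        s = y * y % q
        sqrt_count[s] = sqrt_count.get(s, 0) + 1
    return 1 + sum(sqrt_count.get((x**3 + a * x + b) % q, 0) for x in range(q))


def squarefree_part(n):
    """Divide out square factors of a positive integer n."""
    f = 2
    while f * f <= n:
        while n % (f * f) == 0:
            n //= f * f
        f += 1
    return n


def classify(q, a, b):
    n = count_points(q, a, b)
    t = q + 1 - n                                     # trace of E/F_q
    assert t * t <= 4 * q                             # Hasse bound |t| <= 2 sqrt(q)
    if gcd(q, t) != 1:
        return n, t, "supersingular", None
    return n, t, "ordinary", squarefree_part(4 * q - t * t)   # CM discriminant D


print(classify(7, 0, 1))   # (12, -4, 'ordinary', 3)
print(classify(7, 1, 0))   # (8, 0, 'supersingular', None)
```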
Pairings. For cryptographic uses that require efficient computation of pairings (such as the uses considered in this paper), suitable elliptic curves need to satisfy additional requirements, as we now recall.
²⁴ Alternatively, some authors define the discriminant to be −D, or the discriminant of the imaginary quadratic field Q(√−D).
For any r ∈ N with gcd(q, r) = 1, the embedding degree k of E/F q (with respect to r) is the smallest integer such that r divides q k − 1; for such r, a bilinear map e r : E[r] × E[r] → µ r can be defined, where µ r ⊂ F * q k is the subgroup of r-th roots of unity in F q . The map e r is known as the Weil pairing. The Weil pairing is not the only bilinear map that can be defined. Depending on properties of the curve E other, sometimes more efficient, pairings can be defined, e.g., the Tate pairing [FR94, FMR06] , the Eta pairing [BGOhM07] , and the Ate pairing [HSV06] . In each of these cases, the pairing computation requires arithmetic in F q k , so that k cannot be too large. On the other hand, the ECDLP can be translated (via the pairing itself [MOV91, FR94] ) to the discrete logarithm problem over F * q k , which is susceptible to subexponential-time attacks via index calculus [Odl85] , so that k has to be large enough to achieve the desired level of hardness for the DLP in F * q k . In light of the above considerations, an (ordinary) elliptic curve E/F q is said to be pairing friendly if (i) E(F q ) contains a subgroup of large prime order r, and (ii) E has embedding degree k (with respect to r) that is not too large (i.e., computations in the field F q k are feasible) and not too small (i.e., the DLP in F * q k is hard enough). The ideal case is when E has prime order r, and the embedding degree k is such that the ECDLP in E(F q ) and the DLP in F * q k have approximately the same hardness, i.e., are balanced. Instantiations of pairings. A pairing is specified by a prime r ∈ N, three cyclic groups G 1 , G 2 , G T of order r, and an efficient bilinear map e :
(See Appendix B.1.) Suppose one uses a curve $E/\mathbb{F}_q$ with embedding degree $k$ to instantiate the pairing. Then $\mathbb{G}_T$ is set to $\mu_r \subset \mathbb{F}_{q^k}^*$. The instantiation of $\mathbb{G}_1$ and $\mathbb{G}_2$ depends on the choice of $e$; typically, $\mathbb{G}_1$ is instantiated as an order-$r$ subgroup of $E(\mathbb{F}_q)$, while, for efficiency reasons [BKLS02, BLS04], $\mathbb{G}_2$ is instantiated as an order-$r$ subgroup of $E'(\mathbb{F}_{q^{k/d}})$, where $E'$ is a $d$-th twist of $E$.
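The embedding degree $k$ defined above is simply the multiplicative order of $q$ modulo $r$. The following Python sketch computes it by direct iteration; it is our own illustration with toy parameters, and the function name `embedding_degree` is hypothetical.

```python
from math import gcd


def embedding_degree(q: int, r: int) -> int:
    """Return the smallest k >= 1 such that r divides q**k - 1.

    This is the multiplicative order of q modulo r, which exists when gcd(q, r) = 1.
    """
    if r < 2 or gcd(q, r) != 1:
        raise ValueError("need r >= 2 and gcd(q, r) = 1")
    k, power = 1, q % r  # power holds q**k mod r
    while power != 1:
        power = power * q % r
        k += 1
    return k


# Toy parameters only: real pairing-friendly curves use q and r of hundreds of bits.
print(embedding_degree(7, 5))   # 4: 7**4 - 1 = 2400 is the first q**k - 1 divisible by 5
print(embedding_degree(11, 3))  # 2: r divides q + 1, so q**2 = 1 mod r
```

The loop runs exactly $k$ iterations, so it is only practical when $k$ is small, which is precisely the pairing-friendly regime; for a generic curve $k$ is typically of the same size as $r$.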
C Preprocessing zk-SNARKs for arithmetic circuit satisfiability
At a high level, a preprocessing zk-SNARK for arithmetic-circuit satisfiability is a cryptographic primitive that provides short and easy-to-verify non-interactive zero-knowledge proofs of knowledge for the satisfiability of arithmetic circuits. A public proving key is used to generate proofs, and a public verification key is used to verify them; the two keys are jointly generated once, and can then be used any number of times. The adjective "preprocessing" denotes the fact that the key pair depends on the arithmetic circuit C whose satisfiability is being proved/verified; in particular, the time to generate a key pair for C is at least linear in the size of C. Below, we informally define this primitive; we refer the reader to, e.g., [BCIOP13] for a formal definition.
Given a field F, a preprocessing zk-SNARK for F-arithmetic circuit satisfiability (see Appendix A.1) is a triple of polynomial-time algorithms (G, P, V), with V deterministic,$^{25}$ working as follows.
• $G(1^\lambda, C) \to (pk, vk)$. On input a security parameter λ (presented in unary) and an F-arithmetic circuit C, the key generator G probabilistically samples a proving key pk and a verification key vk. We assume, without loss of generality, that pk contains (a description of) the circuit C.
The keys pk and vk are published as public parameters and can be used, any number of times, to prove/verify membership in the language $L_C$, as follows.
• $P(pk, x, a) \to \pi$. On input a proving key pk and any $(x, a) \in R_C$, the prover P outputs a non-interactive proof π for the statement "$x \in L_C$".
• $V(vk, x, \pi) \to b$. On input a verification key vk, an input x, and a proof π, the verifier V outputs b = 1 if he is convinced by π that $x \in L_C$.
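To make the syntax of (G, P, V) and the completeness experiment concrete, here is a deliberately trivial Python sketch: the "proof" is the witness itself (the classical NP proof), so it is neither succinct nor zero knowledge and is not a zk-SNARK. The field $\mathbb{F}_{101}$, the modeling of a circuit as a Python function whose outputs must all be 0, and the names `generate`, `prove`, `verify` (standing for G, P, V) are all our assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

FIELD_MODULUS = 101  # hypothetical prime; F = F_101 for illustration only

# Hypothetical model of an F-arithmetic circuit C: maps (input x, witness a)
# to a list of output values; (x, a) is in R_C when every output is 0 in F.
Circuit = Callable[[Sequence[int], Sequence[int]], List[int]]


@dataclass
class KeyPair:
    pk: Circuit  # the proving key contains (a description of) C
    vk: Circuit


def generate(security_parameter: int, circuit: Circuit) -> KeyPair:
    # A real preprocessing generator samples secret randomness; this toy one does not.
    return KeyPair(pk=circuit, vk=circuit)


def prove(pk: Circuit, x: Sequence[int], a: Sequence[int]) -> List[int]:
    # Trivial NP "proof": send the witness. Neither succinct nor zero knowledge.
    return list(a)


def verify(vk: Circuit, x: Sequence[int], proof: Sequence[int]) -> bool:
    # Deterministic check that (x, proof) is in R_C.
    return all(out % FIELD_MODULUS == 0 for out in vk(x, proof))


def square_circuit(x: Sequence[int], a: Sequence[int]) -> List[int]:
    # Satisfied iff a[0] * a[0] == x[0] in F.
    return [a[0] * a[0] - x[0]]


keys = generate(128, square_circuit)
proof = prove(keys.pk, [4], [2])
print(verify(keys.vk, [4], proof))  # True: completeness for (x, a) = ([4], [2])
print(verify(keys.vk, [5], proof))  # False: 2 * 2 != 5 in F_101
```

A real preprocessing zk-SNARK keeps exactly this interface but replaces `prove` and `verify` with the pairing-based algorithms, so that π has $O_\lambda(1)$ bits and reveals nothing about a.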
The triple (G, P, V ) satisfies the following properties.
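Completeness. The honest prover can convince the verifier for any instance in the language. Namely, for every security parameter λ, F-arithmetic circuit C, and instance $x \in L_C$ with a witness a,
$$\Pr\left[ V(vk, x, \pi) = 1 \;\middle|\; \begin{array}{l} (pk, vk) \leftarrow G(1^\lambda, C) \\ \pi \leftarrow P(pk, x, a) \end{array} \right] = 1 .$$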
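Succinctness. For every security parameter λ, F-arithmetic circuit C, and $(pk, vk) \in G(1^\lambda, C)$,
• an honestly-generated proof π has $O_\lambda(1)$ bits;
• $V(vk, x, \pi)$ runs in time $O_\lambda(|x|)$.
Above, $O_\lambda$ hides a (fixed) polynomial factor in λ.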
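Proof of knowledge (and soundness). If the verifier accepts a proof for an instance, the prover "knows" a witness for that instance. (Thus, soundness holds.) Namely, for every constant c > 0 and every polynomial-size adversary A there is a polynomial-size witness extractor E such that, for every large-enough security parameter λ, for every F-arithmetic circuit C of size $\lambda^c$,
$$\Pr\left[ \begin{array}{c} V(vk, x, \pi) = 1 \\ (x, a) \notin R_C \end{array} \;\middle|\; \begin{array}{l} (pk, vk) \leftarrow G(1^\lambda, C) \\ (x, \pi) \leftarrow A(pk, vk) \\ a \leftarrow E(pk, vk) \end{array} \right] \le \mathrm{negl}(\lambda) .$$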
Statistical zero knowledge. An honestly-generated proof is statistical zero knowledge. Namely, there is a polynomial-time stateful simulator S such that, for all stateful distinguishers D, the following two probabilities are negligibly-close:
Remark C.1. All known preprocessing zk-SNARK constructions can in fact be made perfect zero knowledge, at the sole expense of a negligible probability of error in completeness.

Several prior works also investigate and provide implementations of preprocessing zk-SNARKs. As we discuss in Section 3.3, in this work we follow the implementation of [BCTV14], which, at the time of writing, is the fastest.
C.1 Known constructions and security
Security of zk-SNARKs is based on knowledge-of-exponent assumptions and variants of Diffie-Hellman assumptions in bilinear groups [Gro10, BB04, Gen04] . Knowledge-of-exponent assumptions are fairly strong, but there is evidence that such assumptions may be inherent for constructing zk-SNARKs [GW11, BCCT12].
Remark C.2 (auxiliary input). More generally, the security of zk-SNARKs relies on the extractability of certain functions. Extractability is a delicate property that, depending on how it is stated, yields conditions of different relative strength. One aspect that affects this is the choice of auxiliary input (a discussion of which was omitted in the informal definition above). For instance, if the adversary is allowed any auxiliary input, extraction may be difficult because the auxiliary input may encode an obfuscated strategy [BCCT12] ; such intuition can in fact be formalized to yield limitations to extractability [BCPR13] . On the other end of the spectrum, certain notions of extractability can be achieved [BCP13] .
Since this paper focuses on practical aspects of zk-SNARKs, our perspective on extractability here is that, similarly to the Fiat-Shamir paradigm [FS87], knowledge-of-exponent assumptions, despite not being fully understood, provide solid heuristics in practice, since no effective attacks against them are known.
C.2 Instantiations via elliptic curves
Known preprocessing zk-SNARK constructions are based on pairings (see Appendix B.1), which can in turn be based on pairing-friendly elliptic curves (see Appendix B.2). We recall two facts used in this paper.

Field for the circuit language. Let $E$ be an elliptic curve that is defined over a finite field $\mathbb{F}_q$, has a group $E(\mathbb{F}_q)$ of $\mathbb{F}_q$-rational points with prime order $r$ (or order divisible by a large prime $r$), and has embedding degree $k$ with respect to $r$. Suppose that a preprocessing zk-SNARK (G, P, V) is instantiated with $E$. Then (G, P, V) works for $\mathbb{F}_r$-arithmetic circuit satisfiability, but all of V's arithmetic computations are over $\mathbb{F}_q$ (or extensions of $\mathbb{F}_q$ up to degree $k$).$^{26}$ This fact motivates most of the discussions in Section 3.1.

2-adicity of a curve. Prior work identified the 2-adicity of a curve as an important ingredient for efficient implementations of the generator and, especially, the prover [BCGTV13a, BCTV14].
An elliptic curve $E/\mathbb{F}_q$ has 2-adicity $\ell$ if the large prime $r$ dividing $\#E(\mathbb{F}_q)$ is such that $2^\ell$ divides $r - 1$. This property ensures that the multiplicative group of $\mathbb{F}_r$ contains a $2^\ell$-th root of unity, which significantly improves the efficiency of interpolation and evaluation of functions defined over certain domains in $\mathbb{F}_r$.

When instantiating a preprocessing zk-SNARK (G, P, V) with $E$, the zk-SNARK works for $\mathbb{F}_r$-arithmetic circuit satisfiability, and both G and P need to solve interpolation/evaluation problems over domains of size $|C|$, where C is the $\mathbb{F}_r$-arithmetic circuit given as input to G. Thus, efficiency can be improved if $E$ is sufficiently 2-adic. Concretely, to fully take advantage of the efficiency benefits of 2-adicity, one requires that $2^\ell \ge |C|$, i.e., $\nu_2(r - 1) \ge \log |C|$, where $\nu_2(\cdot)$ denotes the 2-adic order function.
This fact motivates much of the extensive search for suitable curve parameters, described in Section 3.2.
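To make the role of 2-adicity concrete, the Python sketch below computes $\nu_2(r - 1)$, checks the requirement $2^{\nu_2(r-1)} \ge |C|$, finds a primitive $n$-th root of unity $\omega \in \mathbb{F}_r$ for a power of two $n$ dividing $r - 1$, and uses it in a textbook radix-2 number-theoretic transform (NTT): evaluation of a polynomial on $\{1, \omega, \ldots, \omega^{n-1}\}$ and interpolation back. This is our own illustration with a toy prime ($r = 17$), not the generator or prover code of [BCTV14]; all function names are hypothetical.

```python
from typing import List


def two_adicity(r: int) -> int:
    """Return nu_2(r - 1): the largest s such that 2**s divides r - 1."""
    m, s = r - 1, 0
    while m % 2 == 0:
        m //= 2
        s += 1
    return s


def supports_radix2_domain(r: int, circuit_size: int) -> bool:
    """The requirement 2**nu_2(r - 1) >= |C| from the text."""
    return (1 << two_adicity(r)) >= circuit_size


def root_of_unity(r: int, n: int) -> int:
    """Return a primitive n-th root of unity in F_r, for n a power of two dividing r - 1."""
    if n < 1 or n & (n - 1) or (r - 1) % n:
        raise ValueError("n must be a power of two dividing r - 1")
    for g in range(2, r):
        # Euler's criterion: g is a quadratic non-residue iff g^((r-1)/2) = -1 mod r.
        if pow(g, (r - 1) // 2, r) == r - 1:
            # omega^(n/2) = g^((r-1)/2) = -1, so omega has order exactly n.
            return pow(g, (r - 1) // n, r)
    raise ValueError("no quadratic non-residue found (is r an odd prime?)")


def ntt(coeffs: List[int], omega: int, r: int) -> List[int]:
    """Evaluate sum_j coeffs[j] * X^j at X = omega^0, ..., omega^(n-1) (n a power of two)."""
    n = len(coeffs)
    if n == 1:
        return [coeffs[0] % r]
    # P(X) = E(X^2) + X * O(X^2); recurse on the even- and odd-index coefficients.
    even = ntt(coeffs[0::2], omega * omega % r, r)
    odd = ntt(coeffs[1::2], omega * omega % r, r)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % r
        out[i] = (even[i] + t) % r
        out[i + n // 2] = (even[i] - t) % r  # uses omega^(n/2) = -1
        w = w * omega % r
    return out


def intt(evals: List[int], omega: int, r: int) -> List[int]:
    """Interpolate: recover the coefficients from evaluations on the powers of omega."""
    n = len(evals)
    inv_n = pow(n, -1, r)  # modular inverse (Python 3.8+)
    return [c * inv_n % r for c in ntt(evals, pow(omega, -1, r), r)]


r = 17  # toy prime with r - 1 = 16 = 2**4
print(two_adicity(r), supports_radix2_domain(r, 16), supports_radix2_domain(r, 17))  # 4 True False
omega = root_of_unity(r, 8)
coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
evals = ntt(coeffs, omega, r)  # evaluation on the domain {omega^i}
assert intt(evals, omega, r) == coeffs  # interpolation recovers the coefficients
```

Both directions cost $O(n \log n)$ field operations because each level halves the domain using $\omega^{n/2} = -1$; without a large power of two dividing $r - 1$ this halving is unavailable, which is the situation Remark C.3 addresses.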
Remark C.3 (lack of 2-adicity). One can consider other, weaker requirements (e.g., $\nu_3(r - 1) \ge \log_3 |C|$, or $r - 1$ is divisible by a smooth number $M \ge |C|$), which would still somewhat simplify interpolation/evaluation problems over $|C|$-size domains in $\mathbb{F}_r$. The above requirement that $\nu_2(r - 1) \ge \log |C|$ is, in a sense, the "ideal" one. Moreover, even if $E$ does not satisfy these other, weaker requirements, it is still possible to instantiate the zk-SNARK, but at a higher computational cost (both asymptotically and in practice), due to the necessary use of "heavier" techniques applying to "generic" fields [PGHR13].
C.3 The zk-SNARK verifier protocol
The (pairing-based) preprocessing zk-SNARKs that we use follow those of [BCTV14] (see Section 3.3); in turn, these improve upon and implement those of [PGHR13]. In this paper, we construct arithmetic circuits for verifying the evaluation of the zk-SNARK verifier V: a circuit $C_{V,4}$ for an instantiation based on the curve $E_4$, and a circuit $C_{V,6}$ for one based on the curve $E_6$ (see Section 5.1). For completeness, in Figure 9 we summarize V's abstract protocol. V's protocol consists of two main parts: (a) use the verification key vk and input $x \in \mathbb{F}_r^n$ to compute $vk_x$ (see Step 1); and (b) use the verification key vk, the value $vk_x$, and the proof π to compute 12 pairings and perform the required checks (see Step 2, Step 3, Step 4). Thus, the first part requires $O(n)$ scalar multiplications in $\mathbb{G}_1$, while the second part requires $O(1)$ pairing evaluations. For additional details regarding V (and, more generally, the preprocessing zk-SNARK construction), we refer the reader to [BCTV14, PGHR13]. Indeed, our focus in this work is not why V executes these checks, but how we can efficiently verify its checks via suitable arithmetic circuits.

Figure 9. zk-SNARK verifier V for inputs of size n.
ALGEBRAIC SETUP. A prime $r$, two cyclic groups $\mathbb{G}_1$ and $\mathbb{G}_2$ of order $r$ with generators $P_1$ and $P_2$ respectively, and a pairing $e \colon \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$, where $\mathbb{G}_T$ is also cyclic of order $r$. (See Appendix B.1 for a pairing's definition.)
2. Check validity of knowledge commitments: $e(\pi_A, vk_A) = e(\pi'_A, P_2)$, $e(vk_B, \pi_B) = e(\pi'_B, P_2)$, $e(\pi_C, vk_C) = e(\pi'_C, P_2)$.
3. Check same coefficients were used: $e(\pi_K, vk_\gamma) = e(vk_x + \pi_A + \pi_C, vk$ …
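The two parts of V can be illustrated with an intentionally insecure Python toy in which every group element is represented by its discrete logarithm modulo a prime $R$, so that $e(aP_1, bP_2) = e(P_1, P_2)^{ab}$ is represented by $ab \bmod R$. A real pairing cannot be computed this way, since discrete logarithms are hidden. The modulus `R`, the list `vk_ic`, and the linear-combination form of $vk_x$ are our assumptions (Step 1 of Figure 9 is not reproduced above); they are consistent with the statement that part (a) costs $O(n)$ scalar multiplications in $\mathbb{G}_1$. The check shown has the shape of the first equation in Step 2.

```python
from typing import Sequence

# INSECURE toy model, for illustration only: every group element is represented by
# its discrete logarithm modulo R (P1 and P2 are represented by 1), so the pairing
# e(a*P1, b*P2) = e(P1, P2)^(a*b) is represented by a*b mod R.
R = 2**61 - 1  # hypothetical prime group order (a Mersenne prime), not from the paper


def pair(a_g1: int, b_g2: int) -> int:
    """Toy bilinear map: returns the discrete log of e(a*P1, b*P2)."""
    return a_g1 * b_g2 % R


def compute_vk_x(vk_ic: Sequence[int], x: Sequence[int]) -> int:
    """Part (a), assumed form: vk_x = vk_ic[0] + sum_i x[i] * vk_ic[i + 1] in G1.

    This uses n scalar multiplications for an input x of length n.
    """
    if len(vk_ic) != len(x) + 1:
        raise ValueError("need one vk_ic element per input, plus a constant term")
    acc = vk_ic[0]
    for x_i, base in zip(x, vk_ic[1:]):
        acc = (acc + x_i * base) % R
    return acc


def knowledge_commitment_ok(pi: int, pi_prime: int, vk_alpha: int, p2: int = 1) -> bool:
    """Part (b), one check shaped like Step 2: e(pi, vk_alpha) == e(pi_prime, P2)."""
    return pair(pi, vk_alpha) == pair(pi_prime, p2)


alpha = 12345                  # hypothetical secret from key generation
vk_a = alpha                   # vk_A = alpha * P2
pi_a = 999                     # some element pi_A of G1
pi_a_prime = alpha * pi_a % R  # honest pi'_A = alpha * pi_A
print(knowledge_commitment_ok(pi_a, pi_a_prime, vk_a))            # True
print(knowledge_commitment_ok(pi_a, (pi_a_prime + 1) % R, vk_a))  # False
print(compute_vk_x([1, 2, 3], [10, 20]))                          # 81 = 1 + 10*2 + 20*3
```

In this model the knowledge-commitment check passes exactly when $\pi'_A = \alpha \cdot \pi_A$, which shows, via bilinearity, why each equation in Step 2 pairs an element with its "primed" counterpart.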
D Proof-carrying data for arithmetic compliance predicates
We define a proof-carrying data system (PCD system), which is a cryptographic primitive that captures the notion of proof-carrying data [CT10, CT12]. More precisely, we define preprocessing PCD systems [BCCT13]. The definitions here are somewhat informal; for details, we refer the reader to [BCCT13].

Proof-carrying data at a glance. Fix a predicate Π. Consider a distributed computation where nodes perform computations; each node's computation takes input messages and produces a new output message. The security goal is to ensure that each output message is compliant with the predicate Π. Proof-carrying data achieves this goal by attaching short and easy-to-verify proofs of Π-compliance to each message.
Concretely, a key generator G first sets up a proving key and a verification key. Anyone can then use a prover P, which is given as input the proving key, prior messages $z_{in}$ with proofs $\pi_{in}$, and an output message z, to generate a proof π attesting that z is Π-compliant. Anyone can use a verifier V, which is given as input the verification key, a message z, and a proof, to verify that z is Π-compliant.
Crucially, proof generation and proof verification times are "history independent": the former depends only on the time to execute Π on a node's messages, while the latter depends only on the message length.
We now spell out more details, by first specifying the notion of distributed computation, and then that of compliance with a predicate Π. Our discussion is specific to predicates specified as F-arithmetic circuits.

Transcripts. Given $n_{msg}, n_{loc}, s \in \mathbb{N}$ and a field $\mathbb{F}$, an $\mathbb{F}$-arithmetic transcript (for message size $n_{msg}$, local-data size $n_{loc}$, and arity $s$) is a triple $T = (G, loc, data)$, where $G = (V, E)$ is a directed acyclic graph, $loc \colon V \to \mathbb{F}^{n_{loc}}$ are node labels, and $data \colon E \to \mathbb{F}^{n_{msg}}$ are edge labels. The output of T, denoted out(T), equals $data(\tilde{u}, \tilde{v})$, where $(\tilde{u}, \tilde{v})$ is the lexicographically-first edge with $\tilde{v}$ a sink.
Intuitively, the label loc(v) of a node v represents the local data used by v in his local computation; the edge label data(u, v) of a directed edge (u, v) represents the message sent from node u to node v. Typically, a party at node v uses the local data loc(v) and "input messages" $(data(u, v))_{u \in parents(v)}$ to compute an "output message" data(v, w) for each child $w \in children(v)$.

Compliance. Given a field $\mathbb{F}$ and $n_{msg}, n_{loc}, s \in \mathbb{N}$, an $\mathbb{F}$-arithmetic compliance predicate Π (for message size $n_{msg}$, local-data size $n_{loc}$, and arity $s$) is an $\mathbb{F}$-arithmetic circuit with domain $\mathbb{F}^{n_{msg}} \times \mathbb{F}^{n_{loc}} \times \mathbb{F}^{s \cdot n_{msg}} \times \mathbb{F}$. The compliance predicate Π specifies whether a given transcript T is compliant or not, as follows. Consider any transcript T with message size $n_{msg}$, local-data size $n_{loc}$, and arity $s$. We say that $T = (G, loc, data)$ is Π-compliant, denoted $\Pi(T) = 0$, if, for every $v \in V$ and $w \in children(v)$, it holds that
$$\Pi\big(data(v, w),\ loc(v),\ (data(u, v))_{u \in parents(v)},\ b_{base}\big) = 0 ,$$
where $b_{base} \in \{0, 1\}$ is the base-case flag (i.e., it equals 1 if and only if v is a source). Furthermore, we say that a message z is Π-compliant if there is a T such that $\Pi(T) = 0$ and $out(T) = z$.
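The definitions of transcripts, out(T), and Π-compliance can be checked mechanically. The Python sketch below does so for the simplest case $n_{msg} = n_{loc} = 1$ (scalar labels) over a hypothetical field $\mathbb{F}_{101}$; it models Π as a Python function returning a field element (0 meaning compliant) rather than as an arithmetic circuit, and the example predicate `running_sum` is our own.

```python
from typing import Callable, Dict, List, Tuple

FIELD = 101  # hypothetical small prime field F_101, for illustration only

Edge = Tuple[int, int]
# Hypothetical Python stand-in for Pi: (z_out, z_loc, z_ins, b_base) -> field element,
# where 0 means "compliant".
Predicate = Callable[[int, int, List[int], int], int]


def parents(data: Dict[Edge, int], v: int) -> List[int]:
    return sorted(u for (u, w) in data if w == v)


def children(data: Dict[Edge, int], v: int) -> List[int]:
    return sorted(w for (u, w) in data if u == v)


def is_compliant(pi: Predicate, loc: Dict[int, int], data: Dict[Edge, int]) -> bool:
    """Pi(T) = 0 iff Pi(data(v, w), loc(v), (data(u, v))_u, b_base) = 0 for all v, w."""
    for v in loc:  # the keys of loc are the vertices V
        ps = parents(data, v)
        b_base = 1 if not ps else 0  # 1 iff v is a source
        z_ins = [data[(u, v)] for u in ps]
        for w in children(data, v):
            if pi(data[(v, w)], loc[v], z_ins, b_base) != 0:
                return False
    return True


def out(data: Dict[Edge, int]) -> int:
    """Label of the lexicographically-first edge whose head is a sink."""
    sink_edges = [(u, v) for (u, v) in data if not children(data, v)]
    return data[min(sink_edges)]


def running_sum(z_out: int, z_loc: int, z_ins: List[int], b_base: int) -> int:
    """Example predicate: a node outputs the sum of its inputs plus its local data."""
    expected = z_loc if b_base else sum(z_ins) + z_loc
    return (z_out - expected) % FIELD


loc = {1: 5, 2: 7, 3: 1, 4: 0}
data = {(1, 3): 5, (2, 3): 7, (3, 4): 13}
print(is_compliant(running_sum, loc, data), out(data))  # True 13
```

Changing the label of edge (3, 4) to 14 makes node 3 violate Π, so the transcript is no longer compliant; a PCD system lets a verifier reach the same conclusion from out(T) and a short proof, without seeing T.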
We are now ready to describe the syntax, semantics, and security of a proof-carrying data system. Given a field F, a (preprocessing) proof-carrying data system (PCD system) for F-arithmetic compliance predicates is a triple of polynomial-time algorithms (G, P, V) working as follows.
• $G(1^\lambda, \Pi) \to (pk, vk)$. On input a security parameter λ (presented in unary) and an F-arithmetic compliance predicate Π, the key generator G probabilistically samples a proving key pk and a verification key vk. We assume, without loss of generality, that pk contains (a description of) the predicate Π.
The keys pk and vk are published as public parameters and can be used, any number of times, to prove/verify Π-compliance of messages.
• $P(pk, z, z_{loc}, z_{in}, \pi_{in}) \to \pi$. On input a proving key pk, outgoing message z, local data $z_{loc}$, and incoming messages $z_{in}$ with proofs $\pi_{in}$, the prover P outputs a proof π for the statement "z is Π-compliant".
• V(vk, z, π) → b. On input a verification key vk, a message z, and a proof π, the verifier V outputs b = 1 if he is convinced by π that z is Π-compliant.
The triple (G, P, V) satisfies the following properties.
Completeness. The honest prover can convince the verifier that the output of any compliant transcript is indeed compliant. Namely, for every security parameter λ, F-arithmetic compliance predicate Π, and distributed-computation generator S (see below),
$$\Pr\left[ \begin{array}{c} \Pi(T) = 0 \\ V(vk, out(T), \pi) \neq 1 \end{array} \;\middle|\; \begin{array}{l} (pk, vk) \leftarrow G(1^\lambda, \Pi) \\ (T, \pi) \leftarrow \mathrm{ProofGen}(S, pk, P) \end{array} \right] = 0 .$$
Above, ProofGen is an interactive protocol between a distributed-computation generator S and the PCD prover P, in which both are given the compliance predicate Π and the proving key pk. Essentially, at every time step, S chooses to do one of the following actions: add a new unlabeled vertex to the computation transcript so far (this corresponds to adding a new computing node to the computation), label an unlabeled vertex (this corresponds to a choice of local data by a computing node), or add a new labeled edge (this corresponds to a new message from one node to another). If S chooses the third action, the PCD prover P produces a proof for the Π-compliance of the new message, and adds this new proof as an additional label to the new edge. When S halts, the interactive protocol outputs the distributed computation transcript T, as well as T's output and corresponding proof. Intuitively, the completeness property requires that if T is compliant with Π, then the proof attached to the output (which is the result of dynamically invoking P for each message in T, as T was being constructed by S) is accepted by the verifier.

Succinctness. For every security parameter λ, F-arithmetic compliance predicate Π, and $(pk, vk) \in G(1^\lambda, \Pi)$,
• an honestly-generated proof π has $O_\lambda(1)$ bits;
• $V(vk, z, \pi)$ runs in time $O_\lambda(|z|)$.
Above, $O_\lambda$ hides a (fixed) polynomial factor in λ.
Proof of knowledge (and soundness). If the verifier accepts a proof for a message, the prover "knows" a compliant transcript T with output z. (Thus, soundness holds.) Namely, for every constant c > 0 and every polynomial-size adversary A there is a polynomial-size witness extractor E such that, for every large-enough security parameter λ, for every F-arithmetic compliance predicate Π of size $\lambda^c$,

Statistical zero knowledge. An honestly-generated proof is statistical zero knowledge.$^{27}$ Namely, there is a polynomial-time stateful simulator S such that, for all stateful distinguishers D, the following two probabilities are negligibly-close:
E Scalable zk-SNARKs for random-access machines
At a high level, a zk-SNARK for random-access machines is a cryptographic primitive that provides short and easy-to-verify non-interactive zero-knowledge proofs of knowledge for the correct execution of programs. A public proving key is used to generate proofs, and a public verification key is used to verify them; the two keys are jointly generated once, and can then be used any number of times.
In this work, we seek, and obtain an implementation of, zk-SNARKs that are scalable, i.e., that are:
• Fully succinct. This property requires that a single pair of keys suffices for computations of any (polynomial) size. In particular, the time to generate a key pair is short (i.e., bounded by a fixed polynomial in the security parameter) and so is the key length.
• Incrementally computable. This property requires that proof generation is carried out incrementally, along the original computation, by updating, at each step, a proof of correctness of the computation so far.
Below, we informally define fully-succinct zk-SNARKs for random-access machines, as well as the additional property of incremental computation. We refer the reader to, e.g., [BCCT13] for a formal treatment. (Also see Remark C.2 for a technical comment that applies here too.)
A fully-succinct zk-SNARK for random-access machines (see Appendix A.2) is a triple of polynomial-time algorithms (G , P , V ) working as follows.
• $G(1^\lambda, M) \to (pk, vk)$. On input a security parameter λ (presented in unary) and a random-access machine M, the key generator G probabilistically samples a proving key pk and a verification key vk. We assume, without loss of generality, that pk contains (a description of) the machine M.
The keys pk and vk are published as public parameters and can be used, any number of times, to prove/verify membership of instances in the language $L_M$ of accepting computations on M (see Definition A.2). The key generator G is thus succinct and universal (i.e., it does not depend on the program P, or even computation size, but only on the machine M used to run programs). The keys pk and vk are used as follows.$^{28}$
• $P(pk, P, T, G) \to \pi$. On input a program P, time bound T, and auxiliary input G such that M(P; G) accepts in ≤ T steps, the prover P outputs a non-interactive proof π for the statement "$(P, T) \in L_M$".
• $V(vk, P, T, \pi) \to b$. On input a program P, time bound T, and proof π, the verifier V outputs b = 1 if he is convinced by π that $(P, T) \in L_M$.
The triple (G , P , V ) satisfies the following properties.
Completeness. The honest prover can convince the verifier for any instance in the language. Namely, for every security parameter λ, random-access machine M, and instance $(P, T) \in L_M$ with a witness G,
$$\Pr\left[ V(vk, P, T, \pi) = 1 \;\middle|\; \begin{array}{l} (pk, vk) \leftarrow G(1^\lambda, M) \\ \pi \leftarrow P(pk, P, T, G) \end{array} \right] = 1 .$$
Succinctness. For every security parameter λ, random-access machine M, and $(pk, vk) \in G(1^\lambda, M)$,
• an honestly-generated proof π has $O_{\lambda,M}(1)$ bits;
• $V(vk, P, T, \pi)$ runs in time $O_{\lambda,M}(|P| + \log T)$.
Above, $O_{\lambda,M}$ hides a (fixed) polynomial factor in λ and |M|. (In our implementation, these will be constants.)

Proof of knowledge (and soundness). If the verifier accepts a proof for a polynomial-size computation, the prover "knows" a witness for the instance. (Thus, soundness holds.) Namely, for every constant c > 0 and every polynomial-size adversary A there is a polynomial-size witness extractor E such that, for every large-enough security parameter λ, for every random-access machine M,
$$\Pr\left[ \begin{array}{c} T \le \lambda^c \\ V(vk, P, T, \pi) = 1 \\ ((P, T), G) \notin R_M \end{array} \;\middle|\; \begin{array}{l} (pk, vk) \leftarrow G(1^\lambda, M) \\ (P, T, \pi) \leftarrow A(pk, vk) \\ G \leftarrow E(pk, vk) \end{array} \right] \le \mathrm{negl}(\lambda) .$$
Statistical zero knowledge. An honestly-generated proof is statistical zero knowledge.$^{29}$ Namely, there is a polynomial-time stateful simulator S such that, for all stateful distinguishers D, the following two probabilities are negligibly-close:
π ← P (pk, P, T, G)
Finally, a fully-succinct zk-SNARK is also incrementally computable if there exist two algorithms, a computation supervisor SV and a sub-prover SP, such that, for every security parameter λ, random-access machine M, instance $(P, T) \in L_M$ with a witness $G = (g_0, \ldots, g_{T-1})$, key pair $(pk, vk) \in G(1^\lambda, M)$, and letting $\pi_T := P(pk, P, T, G)$, the following holds.
• For $i = 1, \ldots, T$, $\pi_i = SP(pk, aux_i, \pi_{i-1})$.
• For $i = 1, \ldots, T$, $aux_i$ is the final state of memory when $SV(M, g_i)$ has read-write random access to a memory initialized to the state $aux_{i-1}$. Moreover, each $aux_i$ has size $O_{\lambda,M}(S_i)$, where $S_i$ is the space usage of M(P; G) at time i.$^{30}$
• The proof $\pi_0$ is defined as ⊥, and $aux_0$ as P.
In particular, SV and SP have time and space complexity $O_{\lambda,M}(1)$; these costs are incurred each time a new proof is generated from an old one.
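The control flow of incremental computation, where each step updates the memory state $aux_i$ and then derives $\pi_i$ from $\pi_{i-1}$, can be sketched in Python as follows. Only the loop structure follows the definition above; `toy_supervisor`, `toy_sub_prover`, and the tuple-of-integers memory are hypothetical stand-ins, and the SHA-256 hash chain is a placeholder that proves nothing (it is not a zk-SNARK proof). The sketch pairs step i with the i-th witness element.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple

Memory = Tuple[int, ...]  # hypothetical memory representation


def prove_incrementally(
    supervisor: Callable[[Memory, int], Memory],
    sub_prover: Callable[[bytes, Memory, Optional[bytes]], bytes],
    pk: bytes,
    program: Memory,
    witness: Sequence[int],
) -> Optional[bytes]:
    """Run the loop aux_i <- SV(aux_{i-1}, g_i); pi_i <- SP(pk, aux_i, pi_{i-1})."""
    aux: Memory = program          # aux_0 := P
    proof: Optional[bytes] = None  # pi_0 := bottom, modeled as None
    for g in witness:              # step i consumes the i-th witness element
        aux = supervisor(aux, g)
        proof = sub_prover(pk, aux, proof)
    return proof                   # pi_T (None if T = 0)


def toy_supervisor(aux: Memory, g: int) -> Memory:
    # Hypothetical machine step: record the auxiliary-input word in memory.
    return aux + (g,)


def toy_sub_prover(pk: bytes, aux: Memory, prev: Optional[bytes]) -> bytes:
    # Placeholder "proof": a SHA-256 hash chain. It is NOT a zk-SNARK proof and
    # proves nothing; it only shows that pi_i depends on pk, aux_i, and pi_{i-1}.
    h = hashlib.sha256()
    h.update(pk)
    h.update(repr(aux).encode())
    h.update(prev if prev is not None else b"")
    return h.digest()


final_proof = prove_incrementally(toy_supervisor, toy_sub_prover, b"pk", (), [1, 2, 3])
print(final_proof.hex())
```

The point of the structure is that the prover never revisits earlier steps: each iteration needs only pk, the current memory, and the previous proof, which is what lets proof generation proceed alongside the original computation.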
E.1 Known constructions and security
Theoretical constructions of fully-succinct zk-SNARKs are known, based on various cryptographic assumptions [Mic00, Val08, BCCT13] . Despite achieving essentially-optimal asymptotics [BFLS91, BGHSV05, BCGT13b, BCGT13a, BCCT13] no implementations of them have been reported in the literature to date.
Of the above, the only approach that also achieves incremental computation is the one of Bitansky et al. [BCCT13] , which we follow in this paper. Security in [BCCT13] is based on the security of preprocessing zk-SNARKs (see Appendix C.1) and collision-resistant hash functions.
