Secure 2-party computation (2PC) is becoming practical for some applications. However, most approaches are limited by the fact that the desired functionality must be represented as a boolean circuit. In response, random-access machines (RAM programs) have recently been investigated as a promising alternative representation.
Introduction
General secure two-party computation (2PC) allows two parties to perform "arbitrary" computation on their joint inputs without revealing any information about their private inputs beyond what is deducible from the output of computation. This is an extremely powerful paradigm that allows for applications to utilize sensitive data without jeopardizing its privacy.
From a feasibility perspective, we know that it is possible to securely compute any function, thanks to the seminal results of [Yao82, GMW87]. The last decade has also witnessed significant progress in the design and implementation of more practical and scalable secure computation techniques, improving performance by orders of magnitude and enabling computation of circuits with billions of gates.
These techniques, however, are largely restricted to functions represented as Boolean or arithmetic circuits, whereas the majority of applications we encounter in practice are more efficiently captured using random-access memory (RAM) programs that allow constant-time memory lookup. Modern algorithms of practical interest (e.g., binary search, Dijkstra's shortest-paths algorithm, and the Gale-Shapley stable matching algorithm) all rely on fast memory access for efficiency, and suffer from major blowup in running time otherwise. More generally, a circuit computing a RAM program with running time $T$ requires $\Theta(T^2)$ gates in the worst case, making it prohibitively expensive (as a general approach) to compile RAM programs into a circuit and then apply known circuit 2PC techniques.
A promising alternative approach uses the building block of oblivious RAM (ORAM), introduced by Goldreich and Ostrovsky [GO96]. ORAM is an approach for making a RAM program's memory access pattern input-oblivious while still retaining fast (polylogarithmic) memory access time. Recent work in 2PC has begun to investigate direct secure evaluation of ORAM computations as an alternative to RAM-to-circuit compilation [GKK+12, LO13, KS14, GHL+14, LHS+14]. These works all follow the same general approach of evaluating a […]

Hence, instead of secret-sharing the internal state of the RAM program between the parties, we simply "re-use" the garbled wire labels from the output of one circuit as the input of the next circuit. These wire labels already inherit the required authenticity properties, so no oblivious transfers or consistency checks are needed.
Similarly, we also encode the RAM's memory via wire labels. When the RAM reads from memory location $\ell$, we simply reuse the appropriate output wire labels from the most recent circuit that wrote to location $\ell$ (not necessarily the circuit of the previous instruction, as is the case for the internal state). Since the wire labels already hide the underlying logical values, we only require an oblivious RAM that hides the memory access pattern and not the contents of memory. More concretely, this means that we do not need to add encryption/decryption and MAC/verify circuitry inside the circuit that is being garbled, or perform oblivious transfers on shared intermediate secrets. Importantly, if the RAM program being evaluated is "non-cryptographic" (i.e., has a small circuit description), then the circuits garbled at each round of our protocols will be small.
Of course, it is a delicate task to make these intuitive ideas work with state-of-the-art techniques for cut-and-choose. We present two protocols, which use different approaches for reusing wire labels.
The first protocol uses ideas from the LEGO paradigm [NO09, FJN+13] for 2PC and other recent works on batch-preprocessing of garbled circuits [HKK+14, LR14]. The idea behind these techniques is to generate all the necessary garbled circuits in an offline phase (before inputs are selected), open and check a random subset, and randomly assign the rest into buckets, where each bucket corresponds to one execution of the circuit. But unlike the setting of [HKK+14, LR14], where circuits are processed for many independent evaluations of a function, we have the additional requirement that the wire labels for memory and state data should be directly reused between various garbled circuits. Since we cannot know which circuits must have shared wire labels (due to the random assignment to buckets and the run-time memory access pattern), we use the "soldering" technique of [NO09, FJN+13], which directly transfers garbled wire labels from one wire to another after the circuits have been generated. However, we must adapt the soldering approach to make it amenable to soldering entire circuits, as opposed to soldering simple gates as in [NO09, FJN+13]. For a discussion of subtle problems that arise from a direct application of their soldering technique, see Section 3.
Our second approach directly reuses wire labels without soldering. As a result, garbled circuits cannot be generated offline, but the scheme does not require the homomorphic commitments needed for the LEGO soldering technique. At a high level, we must prevent the cut-and-choose phase from revealing secret wire labels that are shared with other garbled circuits. The technique recently proposed in [MGFB14] allows us to use a single cut-and-choose for all steps of the RAM computation (rather than independent cut-and-choose steps for each time step), and further to hide the set of opened/evaluated circuits from the garbler using an OT-based cut-and-choose [KMR12, KsS12]. We observe that this approach is compatible with the state-of-the-art techniques for input-consistency checks [MR13, sS13].
We also show how to incorporate the cheating-recovery technique of [Lin13] for reducing the number of circuits by a factor of three. The naive solution of running the cheating recovery after each timestep would be prohibitively expensive, since it would require running a malicious 2PC for the cheating-recovery circuit (and the corresponding input-consistency checks) at every timestep. We show a modified approach that only requires a final cheating-recovery step at the end of the computation.
Based on some concrete measurements in Appendix A (see Table 1), the "extra overhead" of achieving malicious security for RAM programs (i.e., the additional cost beyond what is needed for malicious security of the circuits involved in the computation) is at least an order of magnitude smaller than that of the naive solutions, and this gap grows as the running time of the RAM program increases.
Related work. Starting with the seminal work of [Yao86, GMW87], the bulk of secure multiparty computation protocols focus on functions represented as circuits (arithmetic or Boolean). More relevant to this work, there is over a decade's worth of active research on the design and implementation of practical 2PC protocols with malicious security based on garbled circuits [MF06, KS06, LP07, LP11, sS11, sS13, Lin13, HKE13, MR13], based on GMW [NNOB12], and based on arithmetic circuits [DPSZ12].
The work on secure computation of RAM programs is much more recent. [GKK+12] introduces the idea of using ORAM inside a Yao-based secure two-party computation in order to accommodate (amortized) sublinear-time secure computation. The works of [LO13, GHL+14] study non-interactive garbling schemes for RAM programs, which can be used to design protocols for secure RAM program computation. The recent work of [KS14] implements ORAM-based computation using the arithmetic secure computation protocol of [DPSZ12], hence extending these ideas to the multiparty case, and implements various oblivious data structures. SCVM [LHS+14] and Obliv-C [Zah14] provide frameworks (including programming languages) for secure computation of RAM programs that can be instantiated using different secure computation protocols on the back-end. The above works all focus on the semi-honest adversarial model. To the best of our knowledge, our work provides the first practical solution for secure computation of RAM programs with malicious security. Our constructions can be used to instantiate the back-end in SCVM and Obliv-C with malicious security.
Preliminaries

(Oblivious) RAM Programs
A RAM program is characterized by a deterministic circuit $\Pi$ and is executed in the presence of memory $M$. The memory is an array of blocks, which are initially set to $0^n$. An execution of the RAM program $\Pi$ on inputs $(x_1, x_2)$ with memory $M$ is given by:
$\mathsf{st} := x_1 \| x_2 \| 0^n$; $\mathsf{block} := 0^n$; $\mathsf{inst} := \bot$
do until $\mathsf{inst}$ has the form $(\mathsf{halt}, z)$:
    […]
Oblivious RAM, introduced in [GO96], is a technique for hiding all information about a RAM program's memory (both its contents and the data-dependent access pattern). Our constructions require an ORAM that hides only the memory access pattern, and we use other techniques to hide the contents of memory. Throughout this work, when we use the term "ORAM", we refer to this weaker security notion. Concretely, such an ORAM can often be obtained by taking a standard ORAM construction (e.g., [SvDS+13, CP13]) and removing the steps where it encrypts/decrypts memory contents.
Define $I(\Pi, M, x_1, x_2)$ as the random variable denoting the sequence of values taken by the $\mathsf{inst}$ variable in $\mathsf{RamEval}(\Pi, M, x_1, x_2)$. Our precise notion of ORAM security for $\Pi$ requires that there exist a simulator $\mathcal{S}$ such that, for all $x_1, x_2$ and initially empty $M$, the output $\mathcal{S}(1^\lambda, z)$ is indistinguishable from $I(\Pi, M, x_1, x_2)$, where $z$ is the final output of the RAM program on inputs $x_1, x_2$.
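The loop body of RamEval did not survive in this copy, so the following minimal Python sketch assumes a body consistent with the rest of the text: $\Pi$ maps (internal state, incoming memory block, random tape) to (internal state, instruction, outgoing memory block), a read feeds the stored block into the next step, and every other step receives $0^n$ (unwritten locations also read as $0^n$). The names `ram_eval` and `toy_pi` and the tuple encoding of instructions are hypothetical. The returned trace plays the role of $I(\Pi, M, x_1, x_2)$, the sequence the ORAM simulator must reproduce.

```python
import secrets
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical instruction encoding: ("read", loc), ("write", loc) or ("halt", z).
Inst = Tuple[str, object]
# One step of the next-instruction circuit: (st, block, r) -> (st, inst, block).
StepFn = Callable[[bytes, bytes, bytes], Tuple[bytes, Inst, bytes]]


def ram_eval(pi: StepFn, memory: Dict[int, bytes], x1: bytes, x2: bytes,
             n: int) -> Tuple[object, List[Inst]]:
    """Run `pi` until it emits ("halt", z); return z and the instruction trace."""
    zero = bytes(n)
    st = x1 + x2 + zero          # st := x1 || x2 || 0^n
    block = zero                 # block := 0^n
    inst: Optional[Inst] = None  # inst := ⊥
    trace: List[Inst] = []       # the sequence I(Π, M, x1, x2)
    while inst is None or inst[0] != "halt":
        r = secrets.token_bytes(n)             # fresh random tape for this step
        st, inst, block_out = pi(st, block, r)
        trace.append(inst)
        if inst[0] == "read":
            block = memory.get(inst[1], zero)  # unwritten locations read as 0^n
        else:
            if inst[0] == "write":
                memory[inst[1]] = block_out
            block = zero                       # only a read feeds a block back
    return inst[1], trace


# Usage: a toy three-step program (illustrative only) that writes x1 XOR x2
# to location 5, reads it back, and halts with the block it read.
def toy_pi(st: bytes, block: bytes, r: bytes) -> Tuple[bytes, Inst, bytes]:
    x1, x2, step = st[0], st[1], st[2]
    if step == 0:
        return bytes([x1, x2, 1]), ("write", 5), bytes([x1 ^ x2])
    if step == 1:
        return bytes([x1, x2, 2]), ("read", 5), bytes(1)
    return st, ("halt", block), bytes(1)


mem: Dict[int, bytes] = {}
z, trace = ram_eval(toy_pi, mem, b"\x06", b"\x03", n=1)
```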
Garbling Schemes
In this section we adapt the abstraction of garbling schemes [BHR12b] to our needs. Our 2PC protocol constructions re-use wire labels between different garbled circuits, so we define a specialized syntax for garbling schemes in which the input and output wire labels are pre-specified.
We represent a set of wire labels $W$ as an $m \times 3$ array. Wire labels $W[i,0]$ and $W[i,1]$ denote the two wire labels associated with some wire $i$. We employ the point-permute optimization [PSSW09], so we require $\mathsf{lsb}(W[i,b]) = b$ for $b \in \{0,1\}$. The entry $W[i,2] \in \{0,1\}$ is a single bit, and $W[i, W[i,2]]$ is the wire label that encodes false for wire $i$. For shorthand, we use $\tau(W)$ to denote the $m$-bit string $W[1,2] \cdots W[m,2]$.
We require the garbling scheme to have syntax $F \leftarrow \mathsf{Garble}(f, E, D)$, where $f$ is a circuit and $E$ and $D$ describe its input and output wire labels, respectively, as above.
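For $v \in \{0,1\}^m$, we define $W|_v := (W[1, v_1], \ldots, W[m, v_m])$, i.e., the wire labels with select bits $v$. We also define $W|^*_x := W|_{x \oplus \tau(W)}$, i.e., the wire labels corresponding to truth values $x$. The correctness condition we require for garbling is that, for all $f$, $x$, and valid wire label descriptions $E, D$, we have:

$$\mathsf{Eval}(\mathsf{Garble}(f, E, D), E|^*_x) = D|^*_{f(x)}.$$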
If $Y$ denotes a vector of output wire labels, then it can be decoded to a plain output via $\mathsf{lsb}(Y) \oplus \tau(D)$, where $\mathsf{lsb}$ is applied component-wise. Hence, $\tau(D)$ can be used as output-decoding information. More generally, if $\mu \in \{0,1\}^m$ is a mask value, then revealing $(\mu, \tau(D) \wedge \mu)$ allows the evaluator to learn only the output bits for which $\mu_i = 1$. Let $\mathcal{W}$ denote the uniform distribution of $m \times 3$ matrices of the above form (wire labels with the constraint on least-significant bits described above). Then the security condition we need is that there exists an efficient simulator $\mathcal{S}$ such that for all $f$, $x$, $D$, the following distributions are indistinguishable:

$$\left\{ E \leftarrow \mathcal{W};\ F \leftarrow \mathsf{Garble}(f, E, D) : (F, E|^*_x) \right\} \approx \mathcal{S}(f, D|^*_{f(x)})$$
To understand this definition, consider an evaluator who receives garbled circuit $F$ and wire labels $E|^*_x$, which encode its input $x$. The security definition ensures that the evaluator learns no more than the correct output wires $D|^*_{f(x)}$. Consider what happens when we apply this definition with $D$ chosen from $\mathcal{W}$ and against an adversary who is given only partial decoding information $(\mu, \tau(D) \wedge \mu)$. Such an adversary's view is then independent of $f(x) \wedge \overline{\mu}$, i.e., of the output bits it is not meant to learn. This gives us a combination of the privacy and obliviousness properties of [BHR12b]. Furthermore, the adversary's view is independent of the complementary wire labels $D|^*_{\overline{f(x)}}$, except possibly in their least significant bits (by the point-permute constraint). So the other wire labels are hard to predict, and we achieve an authenticity property similar to that of [BHR12b]. Finally, we require that it be possible to efficiently determine whether $F$ is in the range of $\mathsf{Garble}(f, E, D)$, given $(f, E, D)$. For efficiency improvements, one may also reveal a seed that was used to generate the randomness used in $\mathsf{Garble}$.
These security definitions can be easily achieved using typical garbling schemes used in practice (e.g., [KS08]). We note that the above arguments hold even when the distribution $\mathcal{W}$ is slightly different. For instance, when using the Free-XOR optimization [KS08], wire label matrices $E$ and $D$ are chosen from a distribution parameterized by a secret $\Delta$, where $E[i,0] \oplus E[i,1] = \Delta$ for all $i$. This distribution satisfies all the properties of $\mathcal{W}$ that were used above.
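To make the wire-label notation concrete, here is a small Python sketch. It assumes labels are Python integers whose least significant bit is the select bit, so that $\mathsf{lsb}(W[i,b]) = b$, which is what the decoding rule $\mathsf{lsb}(Y) \oplus \tau(D)$ requires. It only illustrates $W|_v$, $W|^*_x$, and (partial) output decoding; it is not a garbling scheme, and all function names are hypothetical.

```python
import secrets
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, int, int]  # (W[i,0], W[i,1], W[i,2]); W[i,2] is the bit tau_i


def random_wire_labels(m: int, k: int = 128) -> List[Row]:
    """Sample an m x 3 description with lsb(W[i,b]) = b (point-and-permute)."""
    rows = []
    for _ in range(m):
        w0 = secrets.randbits(k) & ~1   # select bit 0
        w1 = secrets.randbits(k) | 1    # select bit 1
        rows.append((w0, w1, secrets.randbits(1)))
    return rows


def tau(W: Sequence[Row]) -> List[int]:
    return [row[2] for row in W]


def by_select_bits(W: Sequence[Row], v: Sequence[int]) -> List[int]:
    """W|_v: the label of wire i with select bit v_i."""
    return [row[b] for row, b in zip(W, v)]


def by_truth_values(W: Sequence[Row], x: Sequence[int]) -> List[int]:
    """W|*_x := W|_{x XOR tau(W)}: the labels encoding the truth values x."""
    return by_select_bits(W, [xi ^ ti for xi, ti in zip(x, tau(W))])


def decode(Y: Sequence[int], tau_D: Sequence[int]) -> List[int]:
    """lsb(Y) XOR tau(D), component-wise."""
    return [(y & 1) ^ t for y, t in zip(Y, tau_D)]


def partial_decode(Y: Sequence[int], mu: Sequence[int],
                   masked_tau: Sequence[int]) -> List[Optional[int]]:
    """Given (mu, tau(D) AND mu), recover only the bits with mu_i = 1."""
    return [((y & 1) ^ t) if m else None for y, m, t in zip(Y, mu, masked_tau)]


D = random_wire_labels(4)
Y = by_truth_values(D, [1, 0, 1, 1])
mu = [1, 0, 0, 1]
revealed = partial_decode(Y, mu, [t & m for t, m in zip(tau(D), mu)])
```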
Conventions for wire labels. We garble only the ORAM circuit, whose inputs/outputs are partitioned into several logical values. When $W$ is a description of input wire labels for such a circuit, we let $\mathsf{st}(W)$, $\mathsf{rand}(W)$, $\mathsf{block}(W)$ denote the submatrices of $W$ corresponding to the incoming internal state, random tape, and incoming memory block. When $W$ describes output wires, we use $\mathsf{st}(W)$, $\mathsf{inst}(W)$, and $\mathsf{block}(W)$ to denote the outgoing internal state, output instruction (read/write/halt, and memory location), and outgoing memory data block. We use these functions analogously for vectors (not matrices) of wire labels.
(XOR-Homomorphic) Commitment
In addition to a standard commitment functionality $\mathcal{F}_{\mathsf{com}}$, one of our protocols requires an XOR-homomorphic commitment functionality $\mathcal{F}_{\mathsf{xcom}}$. This functionality allows $P_1$ to open the XOR of two or more committed messages without leaking any other information about the individual messages. The functionality is defined in Figure 1. Further details, including an implementation, can be found in [FJN+13].
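As an illustration, the ideal functionality $\mathcal{F}_{\mathsf{xcom}}$ can be modeled in Python as a trusted party that stores messages (here, integers) and, on an open request, reveals only their XOR. This sketches the ideal behavior only, not a commitment scheme; the class name is hypothetical, and we assume the index announced after a commit is the index under which the message was stored.

```python
from functools import reduce
from operator import xor
from typing import Dict, Iterable, Optional


class XorCommitFunctionality:
    """Toy model of F_xcom as a trusted party (not a real commitment scheme)."""

    def __init__(self) -> None:
        self.i = 1
        self.store: Dict[int, int] = {}

    def commit(self, m: int) -> int:
        """(commit, m) from P1: store m under a fresh index, announce that index."""
        idx = self.i
        self.store[idx] = m
        self.i += 1
        return idx  # models (committed, idx) sent to both parties

    def open(self, S: Iterable[int]) -> Optional[int]:
        """(open, S) from P1: P2 learns only the XOR of the messages indexed by S."""
        S = list(S)
        if any(j not in self.store for j in S):
            return None  # models sending ⊥ to P2
        return reduce(xor, (self.store[j] for j in S), 0)


f = XorCommitFunctionality()
a = f.commit(0b1010)
b = f.commit(0b0110)
```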
3 Batching Protocol
High-level Overview
Figure 1 (the XOR-homomorphic commitment functionality $\mathcal{F}_{\mathsf{xcom}}$): The functionality is initialized with internal value $i = 1$. It then repeatedly responds to commands as follows:
• On input $(\mathsf{commit}, m)$ from $P_1$, store $(i, m)$ internally, output $(\mathsf{committed}, i)$ to both parties, and set $i := i + 1$.
• On input $(\mathsf{open}, S)$ from $P_1$, where $S$ is a set of integers, for each $i \in S$ find $(i, m_i)$ in memory. If for some $i$ no such $m_i$ exists, send $\bot$ to $P_2$. Otherwise, send $(\mathsf{open}, S, \bigoplus_{i \in S} m_i)$ to $P_2$.

Roughly speaking, the LEGO technique of [NO09, FJN+13] is to generate a large quantity of garbled gates, perform a cut-and-choose on all gates to ensure their correctness, and finally assemble the gates together into a circuit that can tolerate a bounded number of faulty gates (since the cut-and-choose will not guarantee that all the gates are correct). More concretely, with $sN$ gates and a cut-and-choose phase that opens half of them and finds them all correct, a statistical argument shows that permuting the remaining gates into buckets of size $O(s/\log N)$ each ensures that each bucket contains a majority of correct gates, except with probability negligible in $s$. For each gate, the garbler provides a homomorphic commitment to its input/output wire labels, which is also checked in the cut-and-choose phase. This allows wires to be connected on the fly with a technique called soldering. A wire with labels $(w_0, w_1)$ (here 0 and 1 refer to the public select bits) can be soldered to a wire with labels $(w'_0, w'_1)$ as follows. If $w_0$ and $w'_0$ both encode the same truth value, then decommit to $\Delta_0 = w_0 \oplus w'_0$ and $\Delta_1 = w_1 \oplus w'_1$. Otherwise, decommit to $\Delta_0 = w_0 \oplus w'_1$ and $\Delta_1 = w_1 \oplus w'_0$. Then, when an evaluator obtains the wire label $w_b$ on the first wire, $w_b \oplus \Delta_b$ will be the correct wire label for the second wire. To prove that the garbler has not inverted the truth value of the wires by choosing the wrong case above, she must also decommit to the XOR of each wire's translation bit (i.e., $\beta \oplus \beta'$, where $w_\beta$ and $w'_{\beta'}$ both encode false).
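A minimal Python sketch of soldering a single wire, assuming labels are integers whose least significant bit is the public select bit and that each wire carries its translation bit $t$ (the select bit of its label encoding false). Both cases of the rule above collapse to $\Delta_b = w_b \oplus w'_{b \oplus d}$ with $d = t \oplus t'$, and $d$ is exactly the XOR of translation bits that the garbler decommits. The commitments and $P_2$'s checks of the decommitted values are not modeled, and the helper names are hypothetical.

```python
import secrets
from typing import Tuple

Wire = Tuple[int, int, int]  # (w_0, w_1, t): lsb(w_b) = b and w_t encodes false


def fresh_wire(k: int = 128) -> Wire:
    """Hypothetical helper: one wire's labels under the point-and-permute constraint."""
    return (secrets.randbits(k) & ~1, secrets.randbits(k) | 1, secrets.randbits(1))


def solder(a: Wire, a2: Wire) -> Tuple[int, Tuple[int, int]]:
    """Values P1 decommits to solder wire a onto wire a2.

    d = t XOR t' is the XOR of translation bits. If d = 0, w_0 and w'_0 encode the
    same truth value and Delta_b = w_b XOR w'_b; otherwise Delta_b = w_b XOR w'_{1-b}.
    Both cases equal Delta_b = w_b XOR w'_{b XOR d}.
    """
    d = a[2] ^ a2[2]
    return d, (a[0] ^ a2[d], a[1] ^ a2[1 ^ d])


def adjust(label: int, deltas: Tuple[int, int]) -> int:
    """Evaluator side: map w_b on the first wire to w_b XOR Delta_b on the second."""
    return label ^ deltas[label & 1]


def truth_value(label: int, t: int) -> int:
    return (label & 1) ^ t


A, B = fresh_wire(), fresh_wire()
d, deltas = solder(A, B)
```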
Next, an arbitrary gate within each bucket is chosen as the head. For each other gate, we solder its input wires to those of the head, and output wires to those of the head. Then an evaluator can transfer the input wire labels to each of the gates (by XORing with the appropriate solder value), evaluate the gates, and transfer the wire labels back. The majority value is taken to be the output wire label of the bucket. The cut-and-choose ensures that each bucket functions as a correct gate, with overwhelming probability. Then the circuit can be constructed by appropriately soldering together the buckets in a similar way.
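The head-and-majority evaluation of a bucket can be sketched in Python as follows, assuming each member is given by an evaluation function on label vectors together with its input solder values (from the head) and output solder values (back to the head); the head itself can be listed with all-zero solder values. Taking the majority over whole output vectors, rather than wire by wire, is a simplifying assumption of this sketch, and all names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Deltas = Tuple[int, int]
# One bucket member: (evaluate, input solders from the head, output solders to the head).
Member = Tuple[Callable[[List[int]], List[int]], Sequence[Deltas], Sequence[Deltas]]


def adjust(label: int, deltas: Deltas) -> int:
    """Translate a label across a soldered wire: w_b -> w_b XOR Delta_b, b = lsb(w_b)."""
    return label ^ deltas[label & 1]


def eval_bucket(head_inputs: Sequence[int], members: Sequence[Member]) -> List[int]:
    """Evaluate every member on the head's inputs and return the majority output."""
    outputs = []
    for evaluate, in_deltas, out_deltas in members:
        x = [adjust(lab, dl) for lab, dl in zip(head_inputs, in_deltas)]  # head -> member
        y = evaluate(x)
        outputs.append(tuple(adjust(lab, dl) for lab, dl in zip(y, out_deltas)))  # member -> head
    winner, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise ValueError("no majority: the bucket has too many faulty circuits")
    return list(winner)
```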
For our protocol we use a similar approach but work with buckets of circuits, not buckets of gates. Each bucket evaluates a single timestep of the RAM program. To transfer RAM memory and internal state between timesteps, we solder wires together appropriately (i.e., the state input of time $t$ is soldered to the state output of time $t-1$, and the memory-block input of time $t$ is soldered to the memory-block output of the previous timestep that wrote to the desired location). Additionally, the approach of using buckets also saves an asymptotic $\log T$ factor in the number of circuits needed for each timestep (i.e., the size of the buckets), where $T$ is the total running time of the ORAM, a savings that motivates similar work on batch pre-processing of garbled circuits [HKK+14, LR14].
We remark that our presentation of the LEGO approach above is a slight departure from the original papers [NO09, FJN+13]. In those works, all gates were garbled using the Free-XOR optimization, where $w_0 \oplus w_1$ is a secret constant shared by all wires. Hence, there is only one "solder" value $w_0 \oplus w'_0 = w_1 \oplus w'_1$. If the sender commits to only the "false" wire label of each wire, then the sender is prevented from inverting the truth value while soldering ("false" is always mapped to "false"). However, to keep the offset $w_0 \oplus w_1$ secret, only one of the 4 possible input combinations of each gate can be opened in the cut-and-choose phase, so the receiver has only a 1/4 probability of identifying a faulty gate. This approach does not scale to a cut-and-choose of entire circuits, where the number of possible input combinations is exponential. This motivates our approach of forgoing common wire offsets $w_0 \oplus w_1$ between circuits and instead committing to the translation bits. As a beneficial side effect, the concrete parameters for bucket sizes are improved, since the receiver detects faulty circuits with probability 1, not 1/4.

Returning to our protocol, $P_1$ generates $O(sT/\log T)$ garblings of the ORAM's next-instruction circuit and commits to the circuits and their wire labels. $P_2$ chooses a random half of these to be opened and aborts if any are found to be incorrect.
For each timestep $t$, $P_2$ picks a random subset of the remaining garbled circuits, and the parties assemble them into a bucket $B_t$ (this is the MkBucket subprotocol) by having $P_1$ open appropriate XORs of wire labels, as described above. We can extend the garbled-circuit evaluation function $\mathsf{Eval}$ to $\mathsf{EvalBucket}$ using the same syntax. Then $\mathsf{EvalBucket}$ inherits the correctness property of $\mathsf{Eval}$ with overwhelming probability, for each of the buckets created in the protocol.
After a bucket is created, $P_2$ needs to obtain garbled inputs on which to evaluate it. See Figure 3 for an overview. Let $X_t$ denote the vector of input wire labels to bucket $B_t$. We use $\mathsf{block}(X_t)$, $\mathsf{st}(X_t)$, $\mathsf{rand}(X_t)$ to denote the sets of wire labels for the input memory block, internal state, and shares of random tape, respectively. The simplest wire labels to handle are the ones for internal state, as they always come from the previous timestep. We solder the output internal state wires of bucket $B_{t-1}$ to the input internal state wires of bucket $B_t$. Then, if $Y_{t-1}$ denotes the output wire labels that $P_2$ obtained from bucket $B_{t-1}$, $P_2$ obtains $\mathsf{st}(X_t)$ by adjusting $\mathsf{st}(Y_{t-1})$ according to the solder values.
If the previous memory instruction was a read of a location that was last written to at time $t'$, then we need to solder the appropriate output wires from bucket $B_{t'}$ to the corresponding input wires of $B_t$. $P_2$ then obtains $\mathsf{block}(X_t)$ by adjusting the wire labels $\mathsf{block}(Y_{t'})$ according to the solder values. If the previous memory instruction was a read of an uninitialized block, or a write, then $P_1$ simply opens these input wire labels to all-zero values (see $\mathsf{GetInput}_{\mathsf{pub}}$).
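Because each instruction is decoded publicly after its timestep, both parties can determine where a bucket's memory-block input comes from. A minimal Python sketch of that bookkeeping, with hypothetical names and 1-indexed timesteps: it returns the time $t'$ of the last write to the location read at time $t-1$, or `None` when $P_1$ should instead open the block labels to $0^n$.

```python
from typing import Dict, List, Optional, Tuple

Inst = Tuple[str, object]  # ("read", loc), ("write", loc) or ("halt", z)


def block_source(insts: List[Inst], t: int) -> Optional[int]:
    """Where bucket B_t's memory-block input comes from.

    insts[u - 1] is the instruction output at timestep u. Returns t' (the bucket whose
    block outputs are soldered into B_t), or None if P1 opens the block labels to 0^n.
    """
    if t == 1 or insts[t - 2][0] != "read":   # no previous instruction, or not a read
        return None
    last_write: Dict[object, int] = {}
    for u, (op, loc) in enumerate(insts[: t - 2], start=1):
        if op == "write":
            last_write[loc] = u                # later writes overwrite earlier ones
    return last_write.get(insts[t - 2][1])     # None for an uninitialized block


insts = [("write", 3), ("write", 7), ("read", 3), ("read", 9), ("write", 3), ("read", 3)]
```

An implementation would more likely maintain the last-write map incrementally, with one update per timestep, instead of rescanning the history.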
To obtain the wire labels $\mathsf{rand}(X_t)$, we have $P_1$ open wire labels for its shares ($\mathsf{GetInput}_1$) and have $P_2$ obtain its wire labels via a standard OT ($\mathsf{GetInput}_2$).
At this point, $P_2$ can evaluate the bucket ($\mathsf{EvalBucket}$). Let $Y_t$ denote the output wire labels. $P_1$ opens the commitment to their translation values, so $P_2$ can decode and learn these outputs of the circuit. $P_2$ sends these labels back to $P_1$, who verifies them for authenticity. Since $P_2$ knows only the translation values and not the complementary output wire labels, it cannot lie about the circuit's output except with negligible probability.
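A Python sketch of the decoding and authenticity check, assuming integer labels with the select bit in the least significant bit and each row of $D$ stored as $(D[i,0], D[i,1], \tau_i)$. $P_1$ accepts only if the claimed labels are exactly the labels encoding the decoded bits; since $P_2$ never sees the complementary labels, it can produce them only by guessing. Names and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, int, int]  # (D[i,0], D[i,1], tau_i) with lsb(D[i,b]) = b


def decode(Y: Sequence[int], tau_D: Sequence[int]) -> List[int]:
    """Both parties: output bits are lsb(Y) XOR tau(D)."""
    return [(y & 1) ^ t for y, t in zip(Y, tau_D)]


def p1_check(D: Sequence[Row], claimed: Sequence[int]) -> List[int]:
    """P1 decodes P2's claimed labels and aborts unless they equal D|*_out."""
    out = decode(claimed, [row[2] for row in D])
    expected = [row[bit ^ row[2]] for row, bit in zip(D, out)]
    if list(claimed) != expected:
        raise RuntimeError("abort: claimed output wire labels are not authentic")
    return out


# Usage with toy 4-bit labels (hypothetical values).
D = [(2, 5, 1), (8, 3, 0)]
honest = p1_check(D, [2, 8])  # these labels encode the bits [1, 0]
```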
Detailed Protocol Description
Let $\Pi$ be the ORAM program to be computed. Define $\widehat{\Pi}(\mathsf{st}, \mathsf{block}, inp_1, inp_{2,1}, \ldots, inp_{2,n}) = \Pi(\mathsf{st}, \mathsf{block}, inp_1, \bigoplus_i inp_{2,i})$. Looking ahead, during the first timestep the parties will provide $inp_1 = x_1$ and $inp_2 = x_2$, while in subsequent timesteps they input their shares $r_1$ and $r_2$ of the RAM program's randomness. $P_2$'s input is further secret-shared to prevent a selective-failure attack on both $x_2$ and his random input $r_2$. We first define the following subroutines/subprotocols:

prot Solder$(A, A')$ // $A, A'$ are wire-label descriptions
    $P_1$ opens $\mathcal{F}_{\mathsf{xcom}}$-commitments to $\tau(A)$ and $\tau(A')$ so that $P_2$ receives $\tau = \tau(A) \oplus \tau(A')$
    for each position $i$ in $\tau$ and each $b \in \{0,1\}$:
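[Figure 3: how $P_2$ obtains the input wire labels of a bucket. Edge labels include Adjust$(\cdot, \Delta_{\mathsf{st}})$; for a read of a block last written at time $t'$, $\Delta_{\mathsf{block}} = $ Solder$(\cdot,\cdot)$ followed by Adjust$(\cdot, \Delta_{\mathsf{block}})$; for no read, or a read of an uninitialized block, $\mathsf{GetInput}_{\mathsf{pub}}(\cdot, 0^n)$; for random inputs, $\mathsf{GetInput}_1$ and $\mathsf{GetInput}_2$; outputs are decoded via $\tau(\cdot)$. Text above an edge refers to the entire set of wire labels; text below an edge refers to the wire labels visible to $P_2$ while evaluating.]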
prot $\mathsf{GetInput}_1(A, x)$ // $A$ describes wire labels; $P_1$ holds $x$
    $P_1$ opens the commitments of $A|^*_x$; return these values

prot $\mathsf{GetInput}_2(A, x)$ // $A$ describes wire labels; $P_2$ holds $x$
    for each position $i$ in $A$, the parties invoke an instance of $\mathcal{F}_{\mathsf{ot}}$:
        $P_1$ uses input $(A[i, A[i,2]],\ A[i, 1 \oplus A[i,2]])$
        $P_2$ uses input $x_i$
        $P_2$ stores the output as $X[i]$
    $P_2$ returns $X$

We now describe the main protocol for secure evaluation of $\Pi$. We let $s$ denote a statistical security parameter, and $T$ denote an upper bound on the total running time of $\Pi$.
1. [Pre-processing phase] Circuit garbling: $P_1$ and $P_2$ agree on the total number $N = O(sT/\log T)$ of garbled circuits to be generated. Then, for each circuit index $i \in \{1, \ldots, N\}$:
(a) $P_1$ chooses random input/output wire label descriptions $E^{(i)}, D^{(i)}$ and commits to each of these values component-wise under $\mathcal{F}_{\mathsf{xcom}}$.
2. [Pre-processing phase] Cut and choose: $P_2$ randomly picks a subset $S_c$ of $\{1, \ldots, N\}$ of size $N/2$ and sends it to $P_1$. $S_c$ will denote the set of check circuits and $S_e = \{1, \ldots, N\} \setminus S_c$ will denote the set of evaluation circuits. For each check circuit index $i \in S_c$:
(a) $P_1$ opens the commitments of $E^{(i)}$, $D^{(i)}$, and $GC^{(i)}$.
3. Online phase: For each timestep t:
(a) Bucket creation: $P_2$ chooses a random subset $B_t$ of $S_e$ of size $\Theta(s/\log T)$ and a random head circuit $hd_t \in B_t$. $P_2$ announces them to $P_1$. Both parties set $S_e := S_e \setminus B_t$.
(b) Garbled input: randomness: $P_1$ chooses random $r_1 \leftarrow \{0,1\}^n$, and $P_2$ chooses random $r_{2,1}, \ldots, r_{2,n} \leftarrow \{0,1\}^n$. $P_2$ sets
(c) Garbled input: state: If $t > 1$, then the parties execute:
and $P_2$ sets $\mathsf{st}(X_t) := \mathsf{Adjust}(\mathsf{st}(Y_{t-1}), \Delta_{\mathsf{st}})$. Otherwise, in the first timestep, let $x_1$ and $x_2$ denote the inputs of $P_1$ and $P_2$, respectively. For input wire labels $W$, let $\mathsf{st}_1(W), \mathsf{st}_2(W), \mathsf{st}_3(W)$ denote the groups of internal state wires corresponding to the initial state $x_1 \| x_2 \| 0^n$. To prevent selective abort attacks, we must have $P_2$ encode his input as $n$-wise independent shares, as above. $P_2$ chooses random $r_{2,1}, \ldots, r_{2,n} \in \{0,1\}^n$ such that $\bigoplus_{i=1}^{n} r_{2,i} = x_2$, and sets:
(d) Garbled input: memory block: If the previous instruction is $inst_{t-1} = (\mathsf{read}, \ell)$ and no previous $(\mathsf{write}, \ell)$ instruction has occurred, or if the previous instruction was not a read, then the parties set $\mathsf{block}(X_t) := \mathsf{GetInput}_{\mathsf{pub}}(\mathsf{block}(E^{(hd_t)}), 0^n)$. Otherwise, if $inst_{t-1} = (\mathsf{read}, \ell)$ and $t'$ is the largest time step with $inst_{t'} = (\mathsf{write}, \ell)$, then the parties execute:
Then $P_2$ sets $\mathsf{block}(X_t) := \mathsf{Adjust}(\mathsf{block}(Y_{t'}), \Delta_{\mathsf{block}})$.
(e) Construct bucket: $P_1$ and $P_2$ run subprotocol MkBucket$(B_t, hd_t)$ to assemble the circuits.
(f) Circuit evaluation: For each $i \in B_t$, $P_1$ opens the commitment to $GC^{(i)}$ and to $\tau(\mathsf{inst}(D^{(i)}))$.
(g) Output authenticity: $P_2$ sends $\tilde{Y} = \mathsf{inst}(Y_t)$ to $P_1$. Both parties decode the output $inst_t = \mathsf{lsb}(\tilde{Y}) \oplus \tau(\mathsf{inst}(D^{(hd_t)}))$. $P_1$ aborts if the claimed wire labels $\tilde{Y}$ do not equal the expected wire labels $\mathsf{inst}(D^{(hd_t)})|^*_{inst_t}$. If $inst_t = (\mathsf{halt}, z)$, then both parties halt with output $z$.
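The $n$-way XOR sharing that $P_2$ applies to its input in steps (b) and (c), and that $\widehat{\Pi}$ undoes by XORing the shares, can be sketched in Python as follows. Any $n-1$ of the shares are uniformly random, so a selective-failure attack that leaks fewer than $n$ of the share bits encoding a given bit of $x_2$ reveals nothing about that bit. The function names and the `bits` parameter are illustrative assumptions.

```python
import secrets
from functools import reduce
from operator import xor
from typing import List


def xor_shares(x: int, n: int, bits: int) -> List[int]:
    """Split a bits-long value x into n shares whose XOR is x; any n-1 are uniform."""
    shares = [secrets.randbits(bits) for _ in range(n - 1)]
    return shares + [reduce(xor, shares, x)]


def combine(shares: List[int]) -> int:
    """What the circuit computes from P2's shares: their XOR."""
    return reduce(xor, shares, 0)


x2 = 0b1011
shares = xor_shares(x2, n=5, bits=4)
```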
Security proof
Due to page limits, we give only an overview of the simulator S and security proof. The complete details are deferred to Appendix C.
Assumptions. The security of our protocol relies on the security of the underlying functionalities, i.e., $\mathcal{F}_{\mathsf{xcom}}$, $\mathcal{F}_{\mathsf{com}}$, $\mathcal{F}_{\mathsf{ot}}$, a garbling scheme satisfying the properties discussed in Section 2.2, and an ORAM scheme satisfying the standard properties discussed in Section 2.1. All the functionalities can be instantiated using standard number-theoretic assumptions and, for UC security, would be in the CRS model. The garbling scheme can be instantiated using a standard PRF, or using stronger assumptions such as correlation-secure hash functions to take advantage of free-XOR. As noted earlier, we do not require the garbling scheme to be adaptively secure, but if it is, we can simplify the protocol by not committing to the garbled circuits.
When $P_1$ is corrupted: The pre-processing phase does not depend on the parties' inputs, so it is trivial to simulate the behavior of an honest $P_2$. Moreover, $\mathcal{S}$ can obtain the values $P_1$ commits to for all circuits and wire labels, and hence can determine whether each of these circuits is correct.
In each timestep $t$ of the online phase, $\mathcal{S}$ can abort if a bucket is constructed with a majority of incorrect circuits; this happens with only negligible probability. $\mathcal{S}$ can abort just as an honest $P_2$ would if $P_1$ cheats in the Solder, $\mathsf{GetInput}_1$, or $\mathsf{GetInput}_{\mathsf{pub}}$ subprotocols. Using a standard argument from [LP07], $\mathcal{S}$ can also match (up to a negligible difference) the probability of an honest $P_2$ aborting due to cheating in the $\mathsf{GetInput}_2$ subprotocol. $\mathcal{S}$ can extract $P_1$'s input $x_1$ in timestep $t = 1$ by comparing the sent wire labels to the committed wire labels extracted in the offline phase. $\mathcal{S}$ can send $x_1$ to the ideal functionality and receive the output $z$. Then $\mathcal{S}$ generates a simulated ORAM memory-access sequence. In each execution of step (3g), $\mathcal{S}$ knows all of the relevant wire labels and can therefore send wire labels $\tilde{Y}$ chosen to encode the desired simulated ORAM memory instruction.
When $P_2$ is corrupted: In the pre-processing phase, $\mathcal{S}$ simulates commit messages from $\mathcal{F}_{\mathsf{com}}$. After receiving $S_c$ from $P_2$, it equivocates the openings of the check circuits to honestly garbled circuits and wire labels.
In each timestep $t$ of the online phase, $\mathcal{S}$ sends random wire labels in the $\mathsf{GetInput}_1$ and $\mathsf{GetInput}_{\mathsf{pub}}$ subprotocols, and also simulates random wire labels as the output of $\mathcal{F}_{\mathsf{ot}}$ in the $\mathsf{GetInput}_2$ subprotocols. These determine the wire labels that are "visible" to $P_2$. $\mathcal{S}$ also extracts $P_2$'s input $x_2$ from its select bits sent to $\mathcal{F}_{\mathsf{ot}}$. It sends $x_2$ to the ideal functionality and receives the output $z$. Then $\mathcal{S}$ generates a simulated ORAM memory-access sequence.
In the Solder steps, $\mathcal{S}$ equivocates soldering values chosen to map visible wire labels to their counterparts in other circuits, and chooses random soldering values for the non-visible wire labels. When it is time to open the commitment to the garbled circuit, $\mathcal{S}$ chooses a random set of visible output wire labels and equivocates to a simulated garbled circuit generated using only these visible wire labels. $\mathcal{S}$ also equivocates on the decommitment to the decoding information $\tau(\mathsf{inst}(D^{(i)}))$, chosen so that the visible output wires will decode to the next simulated ORAM memory instruction. Instead of checking $P_2$'s claimed wire labels in step (3g), the simulator simply aborts if these wire labels are not the pre-determined visible output wire labels.
Efficiency and Parameter Analysis
In the offline phase, the protocol is dominated by the generation of many garbled circuits, $O(sT/\log T)$ in all. In Appendix B we describe the computation of the exact constant. As an example, for $T = 10^6$ and statistical security $2^{-40}$, it is necessary to generate $10 \cdot T$ circuits in the offline phase.
In the online phase, the protocol is dominated by two factors: the homomorphic decommitments within the Solder subprotocol, and the oblivious transfers (in $\mathsf{GetInput}_2$) in which $P_2$ receives garbled inputs. For the former, we require one decommitment for each input and output wire label (to solder that wire to another wire) of the circuit $\widehat{\Pi}$. Hence the cost in each timestep is proportional to the input/output size of the circuit and the size of the buckets. Continuing our example from above ($T = 10^6$ and $s = 40$), buckets of size 5 are sufficient.
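As a quick consistency check on these example numbers, assuming exactly half of the $N = 10 \cdot T$ circuits are opened in the cut-and-choose (as in step 2 of the protocol), the evaluation circuits fill exactly one bucket of size 5 per timestep:

```latex
N = 10\,T = 10^{7}, \qquad
|S_e| = \frac{N}{2} = 5 \times 10^{6}, \qquad
\frac{|S_e|}{5} = 10^{6} = T .
```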
In Appendix B we additionally discuss parameter settings for when the parties open a different fraction (i.e., not 1/2) of circuits in the cut-and-choose phase. By opening a smaller fraction in the offline phase, we require fewer circuits overall, at the cost of slightly more circuits per timestep (i.e., slightly larger buckets) in the online phase.
We require one oblivious transfer per input bit of $P_2$ per timestep (independent of the size of buckets). $P_2$'s input is split into an $s$-way secret sharing so that failure probabilities do not depend on its input, leading to a total of $sn$ OTs per timestep (where $n$ is the number of random bits required by $\widehat{\Pi}$). However, online oblivious transfers are inexpensive (requiring only a few symmetric-key operations) when instantiated via OT extension [IKNP03, ALSZ13], where the more expensive "seed OTs" are done in the pre-processing phase. In Section 5 we suggest further ways to reduce the required number of OTs in the online phase.
Overall, the online overhead of this protocol (compared to the semi-honest setting) is dominated by the bucket size, which is likely at most 5 or 7 for most reasonable settings.
In terms of memory requirements, $P_1$ must store all pre-processed garbled circuits, and $P_2$ must store all of their commitments. For each bit of RAM memory, $P_1$ must store the two wire labels (and their decommitment information) corresponding to that bit, from the last write-time of that memory location. $P_2$ must store only a single wire label per memory bit.
Streaming Cut-and-choose Protocol
High-level Overview
In the standard cut-and-choose approach for evaluating a single circuit, the sender $P_1$ garbles $O(s)$ copies of the circuit, and the receiver $P_2$ requests half of them to be opened. If all opened circuits are correct, then with overwhelming probability (in $s$) a majority of the unopened circuits are correct as well.
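The following Python sketch computes the cheating probability in a simplified model, which is an assumption and not the exact analysis behind this paper's parameters: $P_1$ garbles $N$ copies of which $k$ are incorrect, $P_2$ opens a uniformly random half, and $P_1$ succeeds if no incorrect copy is opened while incorrect copies form a strict majority of the unopened half. The probability that all $k$ bad copies avoid the opened half is $\binom{N-k}{N/2} / \binom{N}{N/2}$, which decreases in $k$, so the smallest majority is the adversary's best choice. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def cheat_success_probability(N: int) -> Fraction:
    """Best probability, over the number k of bad copies, that no bad copy is opened
    while bad copies form a strict majority of the N/2 unopened ones (N even)."""
    half = N // 2
    k_min = half // 2 + 1  # smallest strict majority of the evaluated half
    return max(Fraction(comb(N - k, half), comb(N, half))
               for k in range(k_min, half + 1))


for N in (8, 16, 32, 64):
    print(N, float(cheat_success_probability(N)))
```

For instance, this model gives exactly $1/14$ for $N = 8$, and the probability shrinks rapidly as $N$ grows.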
When trying to apply this methodology to our setting, we face the challenge of feeding past outputs (internal state, memory blocks) into future circuits. Naïvely doing a separate cut-and-choose for each timestep of the RAM program leads to problems when reusing wire labels. Circuits that are opened and checked in time step $t$ must have wire labels independent of past circuits (so that opening these circuits does not leak information about past garbled outputs). Circuits used for evaluation must be garbled with input wire labels matching output wire labels of past circuits. But the security of cut-and-choose demands that $P_1$ cannot know, at the time of garbling, which circuits will be checked or used for evaluation.
Our alternative is to use a technique suggested by [MGFB14] to perform a single cut-and-choose that applies to all timesteps. We make O(s) independent threads of execution, where wire labels are directly reused only within a single thread. A cut-and-choose step at the beginning determines whether each entire thread is used for checking or evaluation. Importantly, this is done using an oblivious transfer (as in [KMR12, KsS12] ) so that P 1 does not learn the status of the threads.
More concretely, for each thread the parties run an oblivious transfer allowing P 2 to pick up either k check or k eval . Then at each timestep, P 1 sends the garbled circuit and also encrypts the entire set of wire labels under k check and encrypts wire labels for only her input under k eval . Hence, in check threads P 2 receives enough information to verify correct garbling of the circuits (including reuse of wire labels; see below), but learns nothing about P 1 's inputs. In evaluation threads, P 2 receives only P 1 's garbled input and the security property of garbled circuits applies. If P 1 behaves incorrectly in a check thread, P 2 aborts immediately. Hence, it is not hard to see that P 1 cannot cause a majority of evaluation threads to be faulty while avoiding detection in all check threads, except with negligible probability.
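To make "except with negligible probability" concrete, the sketch below computes, under a simplified model, the best probability with which P 1 can corrupt m of the S threads and still win. It assumes each b i is an independent fair coin, that P 1 is caught if any bad thread is a check thread, and, conservatively, that a tie between good and bad evaluation threads counts as a win for P 1 . The model and function names are our own illustration, not part of the protocol specification.

```python
from math import comb


def cheat_success_prob(S: int, m: int) -> float:
    """Chance that P1 wins with m bad threads out of S.

    Each b_i is an independent fair coin (0 = check, 1 = evaluation).
    P1 escapes detection only if all m bad threads are evaluation threads,
    and wins only if the g good evaluation threads are not a strict
    majority, i.e. g <= m. We count favorable outcomes of all S coins.
    """
    good = S - m
    favorable = sum(comb(good, g) for g in range(min(m, good) + 1))
    return favorable / 2 ** S


def best_attack(S: int) -> float:
    """Maximize the success probability over the number of bad threads m >= 1."""
    return max(cheat_success_prob(S, m) for m in range(1, S + 1))


for S in (40, 80, 120):
    print(S, best_attack(S))
```

Under this model the best attack corrupts roughly a third of the threads, which is why the number of threads has to be a constant multiple of s.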
Reusing wire labels is fairly straightforward since it occurs only within a single thread. The next circuit in the thread is simply garbled with input wire labels matching the appropriate output wire labels in the same thread (i.e., the state output of the previous circuit, and possibly the memory-block output wires of an earlier circuit). We point out that P 1 must know the previous memory instruction before garbling the next batch of circuits: if the instruction was (read, ℓ), then the next circuit must be garbled with wire labels matching those of the last circuit to write to memory location ℓ. Hence this approach is not compatible with batch pre-processing of garbled circuits. For enforcing consistency of P 1 's input, we use the approach of [sS13], where the very first circuit is augmented to compute a "hiding" universal hash of P 1 's input. For efficiency purposes, the hash is chosen as M · (x 1 ∥ r), where r is a random string chosen by P 1 and M is a random binary matrix of size s × (n + 2s + log s) chosen by P 2 . We prevent input-dependent abort based on P 2 's input using the XOR-tree approach of [LP07], also used in the previous protocol.
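The input-consistency hash M · (x 1 ∥ r) is simply a matrix-vector product over GF(2). A minimal Python sketch follows; the sizes, the rounding of log s up to an integer, and the helper names are illustrative assumptions. Because the map is linear, the check amounts to comparing the resulting s bits across threads.

```python
import math
import secrets


def random_bits(length: int) -> list[int]:
    return [secrets.randbits(1) for _ in range(length)]


def gf2_matvec(M: list[list[int]], v: list[int]) -> list[int]:
    """Multiply a binary matrix M by a binary vector v over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]


s, n = 40, 128                              # illustrative: s and n = |x1|
pad_len = 2 * s + math.ceil(math.log2(s))   # |r|, rounding log s up
x1 = random_bits(n)                         # P1's input
r = random_bits(pad_len)                    # P1's random padding
M = [random_bits(n + pad_len) for _ in range(s)]  # chosen by P2
digest = gf2_matvec(M, x1 + r)              # s-bit hash, compared across threads
```

The padding r is what makes the hash "hiding": without it, the s output bits would leak linear information about x 1 .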
We ensure authenticity of the output for P 1 using an approach suggested in [MR13] . Namely, wire labels corresponding to the same output wire and truth value are used to encrypt a random "output authenticity" key. Hence P 2 can compute these output keys only for the circuit's true output. P 2 is not given the information required for checking these ciphertexts until after he commits to the output keys. At the time of committing, he cannot guess complementary output keys, but he does not actually open the commitment until he receives the checking information and is satisfied with the check circuits.
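The relation w (t,j,0) ⊕ w (t,j,1) = ∆ t that P 2 later checks can be illustrated with a toy encryption. In the sketch below, each authenticity key is one-time-padded with SHA-256 of the corresponding output wire label; this hash-based encryption, the 16-byte lengths, and the omission of point-and-permute select bits are all simplifying assumptions rather than the scheme used in the protocol.

```python
import hashlib
import secrets

KEY_LEN = 16  # bytes; illustrative


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def enc_under_label(label: bytes, w: bytes) -> bytes:
    """Toy encryption: XOR w with a SHA-256-derived pad of the wire label."""
    return xor_bytes(w, hashlib.sha256(label).digest()[:KEY_LEN])


dec_under_label = enc_under_label  # an XOR pad is its own inverse

# P1, timestep t: one offset Delta_t, one key pair per inst-output wire j.
delta = secrets.token_bytes(KEY_LEN)
w0 = secrets.token_bytes(KEY_LEN)
w1 = xor_bytes(w0, delta)  # ensures w0 XOR w1 = Delta_t

# Output labels of wire j in one thread (false, true) and the ciphertexts.
label_false, label_true = secrets.token_bytes(16), secrets.token_bytes(16)
c_false = enc_under_label(label_false, w0)
c_true = enc_under_label(label_true, w1)

# P2 evaluated the circuit, obtaining only the label for the true output.
recovered = dec_under_label(label_true, c_true)

# Later, in a check thread, P2 learns both labels and verifies the offset.
assert xor_bytes(dec_under_label(label_false, c_false), recovered) == delta
```

Without the complementary label, the other key is hidden behind a pad P 2 cannot compute, which is why he cannot open his commitment to the wrong output.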
The adaptation of the input-recovery technique of Lindell [Lin13] is more involved and hence we discuss it separately in Section 4.5.
Detailed Protocol Description
We now describe the streaming cut-and-choose protocol for secure evaluation of Π, the ORAM program to be computed. Recall that Π̂(st, block, inp 1 , inp 2,1 , . . . , inp 2,n ) = Π(st, block, inp 1 ⊕ inp 2,1 ⊕ · · · ⊕ inp 2,n ). We let s denote a statistical security parameter, and T denote an upper bound on the total running time of Π. Here, we describe the majority-evaluation variant of the protocol and discuss how to integrate the input-recovery technique in Section 4.5.
1. Cut-and-choose. The parties agree on S = O(s), the number of threads (see discussion below). P 2 chooses a random string b ← {0, 1} S . Looking ahead, thread i will be a check thread if b i = 0 and an evaluation thread if b i = 1.
For each i ∈ {1, . . . , S}, P 1 chooses two symmetric encryption keys k (i,check) and k (i,eval) . The parties invoke an instance of F ot with P 2 providing input b i and P 1 providing input (k (i,check) , k (i,eval) ).
2. RAM evaluation. For each timestep t, the following are done in parallel for each thread i ∈ {1, . . . , S}:
(a) Wire label selection. P 1 determines the input wire labels E (t,i) for garbled circuit GC (t,i) as follows. If t = 1, these wire labels are chosen uniformly. Otherwise, we set st(E (t,i) ) = st(D (t−1,i) ) and choose rand 1 (E (t,i) ) and rand 2 (E (t,i) ) uniformly. If the previous instruction inst t−1 = (read, ℓ) and no previous (write, ℓ) instruction has happened, or if the previous instruction was not a read, then P 1 chooses block(E (t,i) ) uniformly at random. Otherwise, we set block(E (t,i) ) = block(D (t′,i) ), where t′ is the last timestep whose instruction wrote to memory location ℓ.
(b) Input selection. The parties choose shares of the randomness required for Π̂: P 1 chooses r 1 ← {0, 1}^n, and P 2 chooses r 2,1 , . . . , r 2,n ← {0, 1}^n.
(c) P 1 's garbled input transfer. P 1 sends the following wire labels, encrypted under k (i,eval) :
The following additional wire labels are also sent in the clear:
(d) P 2 's garbled input transfer. P 2 obtains garbled inputs via calls to OT. To guarantee that P 2 uses the same input in all threads, we use a single OT across all threads for each input bit of P 2 . For each input bit, P 1 provides the true and false wire labels for all threads as input to F ot , and P 2 provides his input bit as the OT select bit. Note that P 2 's inputs consist of the strings r 2,1 , . . . , r 2,n , as well as the string x 2 in the case t = 1.
(e) Input consistency. If t = 1, then P 2 sends a random s × (n + 2s + log s) binary matrix M to P 1 . P 1 chooses a random input r ∈ {0, 1}^{2s+log s} and augments the circuit for Π̂ with a subcircuit for computing M · (x 1 ∥ r).
(f) Circuit garbling. P 1 chooses output wire labels D (t,i) at random and computes GC (t,i) = Garble(Π̂, E (t,i) , D (t,i) ), where in the first timestep Π̂ also contains the additional subcircuit described above. P 1 sends GC (t,i) to P 2 , as well as τ (inst(D (t,i) )).
In addition, P 1 chooses a random ∆ t for this timestep and, for each inst-output bit j, he chooses random strings w (t,j,0) and w (t,j,1) (the same across all threads) to be used for output authenticity, such that w (t,j,0) ⊕ w (t,j,1) = ∆ t . For each thread i, output wire j, and select bit b corresponding to truth value b′, let v i,j,b denote the corresponding wire label.
(g) Garbled input collection. If thread i is an evaluation thread, then P 2 assembles input wire labels X (t,i) for GC (t,i) as follows: P 2 uses k (i,eval) to decrypt the wire labels sent by P 1 . Together with the wire labels sent in the clear and those obtained via OTs in GetInput 2 , these wire labels comprise rand(X (t,i) ); block(X (t,i) ) in the case of a write or an uninitialized read; and st(X (t,i) ) when t = 1.
Other input wire labels are obtained via:
where t′ is the last write time of the appropriate memory location, and Y denotes the output wire labels that P 2 obtained during previous evaluations.
(h) Evaluate and commit to output. If thread i is an eval thread, then P 2 evaluates the circuit via Y (t,i) = Eval(GC (t,i) , X (t,i) ) and decodes the output
For each inst-output wire j, P 2 decrypts the corresponding ciphertext c i,j,b , then takes w j to be the majority result across all threads i. P 2 commits to w j . If t = 1, then P 2 verifies that the output of the auxiliary function M · (x 1 ∥ r) is identical to that of all other threads; if not, he aborts.
(i) Checking the check threads. P 1 sends Enc k (i,check) (seed (t,i) ) to P 2 , where seed (t,i) is the randomness used in the call to Garble. Then if thread i is a check thread, P 2 checks the correctness of GC (t,i) as follows. By induction, P 2 knows all the previous wire labels in thread i, so can use seed (t,i) to verify that GC (t,i) is garbled using the correct outputs. In doing so, P 2 learns all of the output wire labels for GC (t,i) as well. P 2 checks that the wire labels sent by P 1 in the clear are as specified in the protocol, and that the c i,j,b ciphertexts and h i,j,b are correct and consistent. He also decrypts c i,j,b for b ∈ {0, 1} with the corresponding output label to recover w (t,j,b) and checks that w (t,j,0) ⊕ w (t,j,1) is the same for all j. Finally, P 2 checks that the wire labels obtained via OT in GetInput 2 are the correct wire labels encoding P 2 's provided input. If any of these checks fail, then P 2 aborts immediately.
(j) Output verification. P 2 opens the commitments to values w j and P 1 uses them to decode the output inst t . If a value w j does not match one of w (t,j,0) or w (t,j,1) , then P 1 aborts.
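The label bookkeeping in steps (a) and (h) can be sketched as follows. This is a hypothetical Python model of a single thread: each circuit's block-output labels are treated as one opaque byte string (rather than a pair of labels per bit), instructions are (op, location) pairs, and the class and function names are illustrative only. The majority step assumes the authenticity keys decrypted in all evaluation threads have been collected into a list.

```python
import secrets
from collections import Counter

LABEL_BYTES = 16  # illustrative; one opaque label string per block


class ThreadLabels:
    """Hypothetical per-thread bookkeeping for step (a)."""

    def __init__(self) -> None:
        self.block_out: dict[int, bytes] = {}  # t -> block(D(t, i))
        self.last_write: dict[int, int] = {}   # location -> last write timestep

    def record_output(self, t: int, block_labels: bytes,
                      inst: tuple[str, int]) -> None:
        """Remember circuit t's block-output labels and its instruction."""
        self.block_out[t] = block_labels
        op, loc = inst
        if op == "write":
            self.last_write[loc] = t

    def block_input(self, prev_inst: tuple[str, int]) -> bytes:
        """Choose block(E(t, i)) given the previous instruction inst_{t-1}."""
        op, loc = prev_inst
        if op == "read" and loc in self.last_write:
            return self.block_out[self.last_write[loc]]  # reuse block(D(t', i))
        return secrets.token_bytes(LABEL_BYTES)          # fresh uniform labels


def majority_key(decrypted: list[bytes]) -> bytes:
    """Step (h): the authenticity key decrypted by the most evaluation threads."""
    value, _count = Counter(decrypted).most_common(1)[0]
    return value
```

For example, after `record_output(1, labels, ("write", 7))`, a later `block_input(("read", 7))` returns `labels`, while a read of a never-written location yields fresh random labels.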
Security Proof
Again we only give a brief overview of the simulator, with the details deferred to Appendix D.
The security of the protocol relies on the functionalities F com and F ot , both of which can be instantiated under number-theoretic assumptions in the CRS model, a secure garbling scheme, and an ORAM scheme satisfying the standard properties discussed earlier. Better efficiency can be obtained by using a random oracle (RO) or correlation-secure hash functions, which allow the free-XOR technique for garbling (and faster input-consistency checks), or by using fast OT extension techniques.
When P 1 is corrupt: In the cut-and-choose step, the simulator S extracts both encryption keys k (i,eval) and k (i,check) . Just as P 2 , the simulator designates half of the threads to be check threads and half to be eval threads, and aborts if a check thread is ever found to be incorrect. However, the simulator can perform the same check for all threads, and keeps track of which eval threads are correct. A standard argument shows that if all check threads are correct, then a majority of eval threads are also correct, except with negligible probability. Without loss of generality, we can have S abort if this condition is ever violated.
Knowing both encryption keys, S can associate P 1 's input wire labels with truth values (at least in the correct threads). If P 1 provides disagreeing inputs x 1 among the correct eval threads, then S aborts, which is negligibly close to P 2 's abort probability (via the argument regarding the input-consistency of [sS13] ). Otherwise, this determines P 1 's input x 1 which S sends to the ideal functionality, receiving output z in return. S generates a simulated ORAM memory access pattern.
In the output commitment step, S simulates a commit message. Then after the check phase, S learns all of the output-authenticity keys. So S simply equivocates the opening of the output keys to be the ones encoding the next ORAM memory instruction.
When P 2 is corrupt: In the cut-and-choose phase, S extracts P 2 's selection of check threads and eval threads. In check threads, S always sends correctly generated garbled circuits, following the protocol specification and generates dummy ciphertexts for the encryptions under k (i,eval) . Hence, these threads can be simulated independently of P 1 's input.
In each eval thread, S maintains visible input/output wire labels for each circuit, choosing new output wire labels at random. S ensures that P 2 picks up these wire labels in the input collection step. S also extracts P 2 's input x 2 in this phase, from its select bit inputs to F ot . S sends x 2 to the ideal functionality and receives output z. Then S generates a simulated ORAM memory access pattern.
At each timestep, for each eval thread, S generates a simulated garbled circuit, using the appropriate visible input/output wire labels. It fixes the decoding information τ so that the visible output wire labels will decode to the appropriate ORAM instruction. In the output reveal step, S aborts if P 2 does not open its commitment to the expected output keys. Indeed, P 2 's view in the simulation is independent of the complementary output keys.
Efficiency and Parameter Analysis
At each timestep, the protocol is dominated by the generation of S garbled circuits (where S is the number of threads) as well as the oblivious transfers for P 2 's inputs. As before, using OT extension as well as the optimizations discussed in Section 5, the cost of the oblivious transfers can be significantly reduced. Other costs in the protocol include simple commitments and symmetric encryptions, again proportional to the number of threads. Hence the major computational overhead is simply the number of threads. An important advantage of this protocol is that we avoid the soldering and the "expensive" xor-homomorphic commitments needed for the inputs/outputs of each circuit in our batching solution. On the other hand, this protocol always requires O(s) garbled circuit executions per timestep regardless of the size of the RAM computation, while, as discussed earlier, our batching protocol can require significantly fewer garbled circuit executions when the running time T is large. The choice of which protocol to use would then depend on the running time of the RAM computation, the input/output size of the next-instruction circuits, and the practical efficiency of future xor-homomorphic commitment schemes.
Compared to our other protocol, this one has a milder memory requirement. Garbled circuits are generated on the fly and can be discarded after they are used, with the exception of the wire labels that encode memory values. P 1 must remember 2S wire labels per bit of memory (although in Section 5 we discuss a way to significantly reduce this requirement). P 2 must remember between S and 2S wire labels per bit of memory (1 wire label for evaluation threads, 2 wire labels for check threads).
Using the standard techniques described above, we require S ≈ 3s threads to achieve statistical security of 2 −s . Recently, techniques have been developed [Lin13] for the SFE setting that require only s circuits for security 2 −s (concretely, s is typically taken to be 40). We now discuss the feasibility of adapting these techniques to our protocol:
Integrating Cheating Recovery
The idea of [Lin13] is to provide a mechanism that detects inconsistency in the output wire labels encoding the final output of the computation. If P 2 receives output wire labels for two threads encoding different values, then a secondary computation allows him to recover P 1 's input (and hence compute the function himself). This technique reduces the number of circuits necessary by a factor of 3, since we need only a single honest thread among the set of evaluated threads (as opposed to a majority). We refer the reader to [Lin13] for more details. We point out that in some settings, recovering P 1 's input may not be enough. Rather, if P 2 is to perform the entire computation on his own in the case of a cheating P 1 , then he also needs to know the contents of the RAM memory!
Cheating recovery at each timestep. It is possible to adapt this approach to our setting by performing an input-recovery computation at the end of each timestep. But this would be very costly, since each input-recovery computation is a maliciously secure 2PC that requires expensive input-consistency checks for both parties' inputs, something we worked hard to avoid for the state/memory bits. Furthermore, each cheating-recovery garbled circuit contains non-XOR gates that need to be garbled/evaluated 3s times at each timestep. These additional costs can become a bottleneck in the computation, especially when the next-instruction circuit is small.
Cheating recovery at the end. It is natural to consider delaying the input-recovery computation until the last timestep, and only perform it once. If two of the threads in the final timestep (which also computes the final output of computation) output different values, the evaluator recovers the garbler's input. Unfortunately, however, this approach is not secure. In particular, a malicious P 1 can cheat in an intermediate timestep by garbling one or more incorrect circuits. This could either lead to two or more valid memory instruction/location outputs, or no valid outputs at all. It could also lead to a premature "halt" instruction. In either case, P 2 cannot yet abort since that would leak extra information about his private input. He also cannot continue with the computation because he needs to provide P 1 with the next instruction along with proof of its authenticity (i.e. the corresponding garbled labels) but that would reveal information about his input.
We now describe a solution that avoids the difficulties mentioned above and at the same time eliminates the need for input-consistency checks or garbling/evaluating non-XOR gates at each timestep. In particular, we delay the "proof of authenticity" by P 2 for all the memory instructions until after the last timestep. Whenever P 2 detects cheating by P 1 (i.e., two or more valid memory instructions, or none at all), instead of aborting, he pretends that the computation is going as planned and sends "dummy memory operations" to P 1 but does not (and cannot) prove the authenticity of the corresponding wire labels yet. For modern tree-based ORAM constructions ([SvDS + 13, CP13], etc.) the memory access pattern is always uniform, so it is easy for P 2 to switch from reporting the real memory access pattern to a simulated one. Note that in step (h) of the protocol, P 2 no longer needs to commit to the majority w j ; as a result, step (j) of the protocol becomes obsolete. Instead, in step (h), P 2 sends inst t in plaintext. This instruction is either the single valid instruction he has recovered or a dummy instruction (if P 1 has been caught cheating).
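The switch from real to dummy memory operations can be modeled as below. The sketch is hypothetical: an instruction is represented only by the ORAM leaf it accesses, a thread that produces no valid output is represented by None, and P 2 flags cheating as soon as the evaluation threads do not yield exactly one valid instruction. Once cheating is flagged, every later report is a uniformly random leaf.

```python
import secrets
from typing import Optional


def report_instruction(decoded: list[Optional[int]], num_leaves: int,
                       cheated: bool) -> tuple[int, bool]:
    """Hypothetical model of what P2 sends to P1 at one timestep.

    decoded[i] is the instruction decoded in evaluation thread i, modeled as
    the ORAM leaf to access, or None if thread i produced no valid output.
    Returns (leaf to report, whether cheating has been detected so far).
    """
    valid = {d for d in decoded if d is not None}
    if len(valid) != 1:
        cheated = True  # two or more valid instructions, or none at all
    if cheated:
        # Dummy operation: a uniformly random leaf, which is what an honest
        # access in a tree-based ORAM looks like to P1 anyway.
        return secrets.randbelow(num_leaves), True
    return valid.pop(), False
```

The key design point is that the report P 1 sees has the same distribution in both branches, so P 2 's reaction to cheating leaks nothing until the final input-recovery computation.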
After the evaluation of the final timestep, we perform a fully secure 2PC for an input-recovery circuit that has two main components. The first one checks if P 1 has cheated. If he has, it reveals P 1 's input to P 2 . The second one checks the proofs of authenticity of the inst instructions P 2 reveals in all timesteps and signals to P 1 to abort if the proof fails.
First cheating recovery, then opening the check circuits. For this cheating recovery method to work, we perform the evaluation steps (step (h)) for all time-steps first (at this stage, P 2 only learns the labels for the final output but not the actual value), then perform the cheating recovery as described above, and finally perform all the checks (step (i)) for all time-steps.
We now describe in more detail the cheating-recovery circuit, which consists of two main components.
• The first component is similar to the original cheating-recovery circuit of [Lin13]. P 2 's input is the XOR of two valid output authenticity labels for a wire j at a timestep t for which he has detected cheating (if there is more than one instance of cheating, he can use the first occurrence). Let us denote the output authenticity labels for the jth bit of block(Y (t,i) ) at timestep t by w (t,j,b) , b ∈ {0, 1}. Then P 2 inputs w (t,j,0) ⊕ w (t,j,1) to the circuit. If there is no cheating, he inputs garbage. Notice that w (t,j,0) ⊕ w (t,j,1) = ∆ t for valid output authenticity values, as described in the protocol (note that we assume that all output authenticity labels in timestep t use the same offset ∆ t ).
P 1 inputs his input x 1 . He also hardcodes ∆ t for every timestep t. For each timestep t (as shown in Figure 5), the circuit compares P 2 's input against the hardcoded ∆ t ; if they are equal, cheating is detected and the circuit outputs 1. To check whether P 2 's input equals at least one of the hardcoded ∆s, the circuit of Figure 6 computes the OR of all these outputs. Thus, if the output of this circuit is 1, P 1 has cheated in at least one timestep.
To reveal P 1 's input, we compute the AND of output of circuit of Figure 6 with each bit of P 1 's input as depicted in Figure 7 . This concludes the description of the first component for cheating recovery.
• In the second component, we check the authenticity of the memory instructions P 2 provided in all timesteps. In particular, he provides the hash of the concatenation of all output authenticity labels he obtained during the evaluation corresponding to inst in all timesteps (P 2 uses dummy labels if he does not have valid ones due to P 1 's cheating), while P 1 does the same based on the plaintext instructions he received from P 2 and the labels he knows. The circuit then outputs 1 if the two hash values match. The circuit structure is therefore identical to that of Figure 5, but the inputs are the hash values. An output of 0 means that P 2 does not have a valid proof of authenticity.
As shown in the final circuit of Figure 7, if P 1 was not already caught cheating by the first component and P 2 's proof of authenticity fails, the circuit outputs 1 to signal an abort to P 1 . This condition is crucial: it is important to ensure that P 1 did not cheat (the output of the circuit of Figure 6) before accusing P 2 of cheating, since if P 1 cheats, say in timestep t, P 2 may not be able to prove the authenticity of the instructions for timestep t or later.
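Stripped of garbling, the two components compute a simple Boolean function. The following plaintext Python model (a hypothetical illustration, not the garbled circuit itself, with names of our choosing) shows the logic of Figures 5 through 7: an equality test against each hardcoded ∆ t , an OR over timesteps, a bitwise AND with x 1 , and the abort signal that is raised only when P 1 was not caught.

```python
def recovery_circuit(p2_proof: bytes, deltas: list[bytes], x1: list[int],
                     h_p2: bytes, h_p1: bytes) -> tuple[list[int], int]:
    """Plaintext model of what the cheating-recovery circuit computes.

    p2_proof: P2's input w(t,j,0) XOR w(t,j,1), or garbage if no cheating.
    deltas:   the offsets Delta_t hardcoded by P1, one per timestep.
    h_p2/h_p1: the two parties' hashes of the authenticity labels.
    Returns (bits of x1 revealed to P2, abort bit for P1).
    """
    # Figures 5 and 6: cheating detected iff P2's input equals some Delta_t.
    cheated = int(any(p2_proof == d for d in deltas))
    # Figure 7: AND each bit of x1 with the detection bit.
    revealed = [bit & cheated for bit in x1]
    # Second component: abort only if P1 was NOT caught and the hashes differ.
    abort = int((not cheated) and h_p2 != h_p1)
    return revealed, abort
```

If P 1 never cheats, P 2 cannot produce any ∆ t (it is never revealed to him), so the revealed bits are all zero.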
Efficiency: Following the techniques of [Lin13], all the gates of Figures 5 and 6 can be garbled using non-cryptographic operations (XORs), and only the circuit of Figure 7 has non-XOR gates. More precisely, it requires |x 1 | AND gates and one NOT gate. Of course, the final circuit will be evaluated using a basic maliciously secure 2PC. Thus, we need to multiply the above numbers by a factor of 3s, which results in garbling a total of 3s(|x 1 | + 1) non-XOR gates, i.e., at most 12s(|x 1 | + 1) symmetric operations.
The input consistency checks are also done for P 1 's input x 1 and for P 2 's input, which consists of a proof of cheating of length |∆| and a proof of authenticity that is the output of a hash function (both on the order of the computational security parameter). We stress that the gain is significant, since both the malicious 2PC and the input consistency checks are done only once, at the end.
Optimizations
Here we present a collection of further optimizations compatible with our 2PC protocols:
Hide only the input-dependent behavior
Systems like SCVM [LHS + 14] use static program analysis to "factor out" as much input-independent program flow as possible from a RAM computation, leaving significantly less residual computation that requires protection from the 2PC mechanisms. The backend protocol currently implemented by SCVM achieves security only against semi-honest adversaries. However, our protocols are also compatible with their RAM-level optimizations, which we discuss in more detail:
Special-purpose circuits. For notational simplicity, we have described our RAM programs via a single circuit Π that evaluates each timestep. Then Π must contain subcircuits for every low-level instruction (addition, multiplication, etc) that may ever be needed by this RAM program.
Instruction-trace obliviousness means that the choice of low-level instruction (e.g., addition, multiplication) performed at each time t does not depend on private input. The SCVM system can compile a RAM program into an instruction-trace-oblivious one (though one does not need full instruction-trace obliviousness to achieve an efficiency gain in 2PC protocols). For RAM programs with this property, we need only evaluate a (presumably much smaller) instruction-specific circuit Π t at each timestep t.
It is quite straightforward to evaluate different circuits at different timesteps in our cut-and-choose protocol of Section 4. For the batching protocol of Section 3, enough instruction-specific circuits must be generated in the pre-processing phase to ensure a majority of correct circuits in each bucket. However, we point out that buckets at different timesteps could certainly be different sizes! One particularly interesting use-case would involve a very aggressive pre-processing of the circuits involved in the ORAM construction (i.e., the logic translating logical memory accesses to physical accesses), since these will dominate the computation and do not depend on the functionality being computed. The bucket size / replication factor for these timesteps could be very low (say, 5), while the less-aggressively pre-processed instructions could have larger buckets.
Along similar lines, we have for simplicity described RAM programs that require a random input tape at each timestep. This randomness leads to oblivious transfers within the protocol. However, if it is known to both parties that a particular instruction does not require randomness, then these OTs are not needed. For example, deterministic algorithms require randomness only for the ORAM mechanism. Concretely, tree-based ORAM constructions [SCSL11, SvDS + 13, CP13] require only a small amount of randomness, and only at input-independent steps.
Memory-trace obliviousness. Due to their general-purpose nature, ORAM constructions protect all memory accesses, even those that may already be input-independent (for example, sequential iteration over an array). One key feature of SCVM is detecting which memory accesses are already input-independent and not applying ORAM to them. Of course, such optimizations to a RAM program would benefit our protocols as well.
Reusing memory
We have described our protocols in terms of a single RAM computation on an initially empty memory. However, one of the "killer applications" of RAM computations is that, after an initial quasi-linear-time ORAM initialization of memory, future computations can use time sublinear in the total size of data (something that is impossible with circuits). This requires an ORAM-initialized memory to be reused repeatedly, as in [GKK + 12].
Our protocols are compatible with reusing garbled memory. In particular, this can be viewed as a single RAM computation computing a reactive functionality (one that takes inputs and gives outputs repeatedly).
Other Protocol Optimizations
Storage requirements for RAM memory. In our cut-and-choose protocol, P 1 chooses random wire labels to encode bits of memory, and then has to remember these wire labels when garbling later circuits that read from those locations. As an optimization, P 1 could instead choose wire labels via F k (t, j, i, b), where F is a suitable PRF, t is the timestep in which the data was written, j is the index of a thread, i is the bit-offset within the data block, and b is the truth value. Since memory locations are computed at run-time, P 1 cannot include the memory location in the computation of these wire labels. Hence, P 1 will still need to remember, for each memory location ℓ, the last timestep t at which location ℓ was written.
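A minimal sketch of this PRF-based label derivation is shown below. It assumes HMAC-SHA256 truncated to 128 bits as the PRF F (any PRF would serve) and a fixed-width byte encoding of (t, j, i, b) that we chose for illustration; the function name is hypothetical. P 1 then keeps only the last-write timestep per location and recomputes labels on demand.

```python
import hashlib
import hmac
import secrets
import struct


def wire_label(key: bytes, t: int, j: int, i: int, b: int) -> bytes:
    """F_k(t, j, i, b): label for bit i of the block written at timestep t
    in thread j, with truth value b. HMAC-SHA256 stands in for the PRF."""
    msg = struct.pack(">QIIB", t, j, i, b)  # fixed-width encoding of the tuple
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]


key = secrets.token_bytes(32)    # P1's PRF key
last_write: dict[int, int] = {}  # memory location -> last write timestep

# When location 5 is written at timestep 12, P1 records only the timestep...
last_write[5] = 12
# ...and can later regenerate, e.g., the "true" label of bit 3 in thread 0.
label = wire_label(key, last_write[5], j=0, i=3, b=1)
```

Compared with storing 2S labels per memory bit, this reduces P 1 's per-location state to a single timestep.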
Adaptive garbling. In the batching protocol, P 1 must commit to the garbled circuits and reveal them only after P 2 obtains the garbled inputs. This is due to a subtle issue of (non)adaptivity in standard security definitions of garbled circuits; see [BHR12a] for a detailed discussion. These commitments could be avoided by using an adaptively-secure garbling scheme.
Online/offline tradeoff. For simplicity we described our online/offline protocol in which P 1 generates many garbled circuits and P 2 opens exactly half of them. Lindell and Riva [LR14] also follow a similar approach of generating many circuits in an offline phase and assigning the remainder to random buckets; they also point out that changing the fraction of opened circuits results in different tradeoffs between the amount of circuits used in the online and offline phases. For example, checking 20% of circuits results in fewer circuits overall (i.e., fewer generated in the offline phase) but larger buckets (in our setting, more garbled circuits per timestep in the online phase).
Consider the following parameters. The number of actual data items stored in memory is denoted by N . In the level-0 tree of the ORAM, each node contains a constant number of blocks, Z. Each block consists of a metadata section of length D and a data section of the same size. Encrypting a block is implemented by AES-128. The security parameter (for key length and the length of the tag in authenticated encryption) is S . We also denote the Sender side storage for the ORAM by CS. For simplicity, we consider the case of a non-recursive ORAM. Therefore, CS is equal to N × D (i.e. Sender needs to store his share of metadata for all memory locations). Since we are assuming the use of cheating recovery technique, the number of threads is S = s.
To compare the efficiency of our approach with the naive transformation, we compare the overhead incurred by each approach. The overhead is computed in three aspects: 1) the number of extra gates necessary, 2) the extra input consistency checks, and 3) the extra storage requirement on the Sender's side. These extra costs are computed over the run-time (T) of the program. To clarify what we mean by "extra" overhead, consider the following.
If the size of a circuit (number of non-XOR gates) computing a semi-honest 2PC ORAM is denoted by SO, and it stores CS bytes of data on the Sender's side, then using cut-and-choose and cheating recovery we would need at least a circuit size of $MS = s \times SO$ for cut-and-choose and $3s \times |x_1|$ non-XOR gates for cheating recovery. We would also need $s \times CS$ bytes on the Sender's side. Moreover, we would require the usual input consistency checks on $x_1$. Therefore, over the run-time of the program, we would need $MS_T = MS \times T + 3s \times |x_1|$ non-XOR gates and $CS_s = s \times CS$ bytes of storage. Any cost other than $MS_T$, $CS_s$, and the input consistency checks on $x_1$ is considered an overhead. In what follows, we compute the overhead of the naive transformation approach.
For each invocation of ORAM, we have the following costs. We need to apply MACing and verification to 8D + 2CS bits. The authenticated encryption and decryption are each called on 3Z log N + Z blocks. We need to check input consistency on 2D + 3S + CS bits of data. Finally, the cost of cheating recovery for a circuit with input size M is 3s × M non-XOR gates. Thus, for an ORAM application with running time T and assuming the use of cheating recovery, the overhead for timesteps $t_1$ to $t_2$ with $t = t_2 - t_1$ (corresponding to a single ORAM call) is as follows.
• MACing: almost free.
• Verification: t · s × (2 × (8D + 2CS)) non-XOR gates.
• Authenticated encryption: t · s × (13,600 × (3Z log N + Z)) non-XOR gates.
• Authenticated decryption: t · s × (13,728 × (3Z log N + Z)) non-XOR gates.
• Cheating recovery: 3 · t · s × (8D + 2CS) non-XOR gates.
Note that during the run time of a program, many such ORAM calls are performed, so that T = t × (number of calls).
Given D = 64 (so that a block of 2D = 128 bits, metadata plus data, fits in a single AES block), $N = 2^{10}$, S = 128, s = 40, Z = 4, and CS = N × D, the total overhead is $T \times 154.36 \times 2^{20}$ non-XOR gates. We would also incur a computational overhead of O(T × IC × N D) for input consistency checks, where IC is the overhead of the input consistency check for one bit of data on s garbled circuits. The Sender storage has no overhead.
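The arithmetic behind the per-timestep figure of 154.36 × 2^20 non-XOR gates can be checked directly from the per-call costs listed above. The Python sketch below sums the verification, authenticated encryption, authenticated decryption, and cheating-recovery terms for one timestep and multiplies by s; the function name is ours, and MACing is treated as free, as in the text.

```python
import math


def naive_overhead_per_timestep(D: int, N: int, s: int, Z: int) -> int:
    """Non-XOR gates of overhead per timestep for the naive transformation."""
    CS = N * D                               # sender storage, non-recursive ORAM
    mac_bits = 8 * D + 2 * CS                # bits that are MACed and verified
    blocks = 3 * Z * int(math.log2(N)) + Z   # blocks encrypted/decrypted
    verification = 2 * mac_bits
    auth_enc = 13_600 * blocks
    auth_dec = 13_728 * blocks
    cheating_recovery = 3 * mac_bits
    return s * (verification + auth_enc + auth_dec + cheating_recovery)


per_step = naive_overhead_per_timestep(D=64, N=2**10, s=40, Z=4)
print(per_step / 2**20)  # prints 154.365234375
```

The two AES-based terms dominate: together they account for roughly 83% of the total with these parameters.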
A.2 Our approach
In our approach, we do not need to check the correctness of the state information using a MAC. We also do not need authenticated encryption and decryption. Moreover, we perform cheating recovery only once, at the end of the protocol. Therefore, our only overhead is introduced by the final cheating recovery, which costs 3s × (|x 1 | + 1) non-XOR gates (see Section 4.5), where x 1 is the input to the circuit in the first timestep. Notice that only 3s of this is considered "extra" overhead.
Our approach achieves the above at the cost of increasing the Sender's storage requirements. In our approach, the Sender needs to know, for each memory location and for each thread, which circuit updated that location (i.e., he needs to store the seed of the circuit, with |seed| = S) and when the last update was performed (i.e., he needs to store a timestep t, with |t| = log T). This results in an extra N × s × (S + log T) storage for the Sender. As for input consistency, note that we do not need any input consistency checks for the intermediate circuits, which are responsible for ORAM access.
Given the same concrete parameters as above, with the addition of $|x_1| = 128$, the overheads are as follows. Our approach needs only 120 extra non-XOR gates, at the cost of an extra $5\,\text{Mbit} + 40\,\text{Kbit} \times \log T$ (about $640\,\text{KB} + 5\,\text{KB} \times \log T$) of Sender storage. Table 1 provides a comparison of the overhead of the two approaches. Notice that as the running time increases, the circuit overhead of the previous approach, and hence our saving, grows linearly, while our storage requirement grows only logarithmically. As can be seen in this table, our approach saves orders of magnitude in circuit size (number of non-XOR gates) and removes the need for costly input consistency checks, while adding only a small overhead to the Sender's storage.
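As a quick check of these numbers, the following sketch evaluates the final cheating-recovery cost and the Sender's extra storage $N \times s \times (S + \log T)$ in bits. The helper names are illustrative, and $\log T$ is taken as $\lceil \log_2 T \rceil$, which is an assumption about how time-steps are encoded.

```python
def ceil_log2(T: int) -> int:
    """Bits needed to store a time-step index below T (assumes T >= 2)."""
    return (T - 1).bit_length()


def our_extra_gates(s: int, x1_len: int) -> tuple:
    """Final cheating recovery, 3s(|x1| + 1) non-XOR gates, and the 3s part
    that the text counts as extra."""
    return 3 * s * (x1_len + 1), 3 * s


def sender_extra_storage_bits(N: int, s: int, S: int, T: int) -> int:
    """N * s * (S + log T): one seed and one time-step per location and thread."""
    return N * s * (S + ceil_log2(T))


total, extra = our_extra_gates(s=40, x1_len=128)
print(total, extra)            # 15480 120
bits = sender_extra_storage_bits(N=2**10, s=40, S=128, T=2**20)
print(bits, bits // 8)         # 6062080 757760
```

With $N = 2^{10}$, $s = 40$ and $S = 128$, the fixed part is $5 \times 2^{20}$ bits and each unit of $\log T$ adds $40 \times 2^{10}$ bits.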
B Concrete Bounds for Batch Preprocessing Protocol
Here we compute the number of circuits ρ needed per bucket in the protocol of Section 3. Let T denote the total number of time steps taken by the RAM program.
In that protocol, $P_1$ generates $2\rho T$ circuits and exactly half of them are checked. The remaining ones are placed randomly into $T$ buckets of $\rho$ circuits each.
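The offline placement can be modelled directly. The sketch below (hypothetical helper names; Python's `random` module stands in for the parties' coins) checks a uniformly random half of the $2\rho T$ circuits, deals the rest into $T$ buckets, and gives a Monte Carlo estimate of how often some bucket receives at least $\rho/2$ bad circuits when $m$ of the evaluated circuits are bad.

```python
import random


def cut_and_choose(rho: int, T: int, rng: random.Random):
    """Check a uniformly random half of 2*rho*T circuits and deal the other
    half into T buckets of rho circuits each."""
    ids = list(range(2 * rho * T))
    rng.shuffle(ids)
    checked = set(ids[: rho * T])
    rest = ids[rho * T:]  # already in uniformly random order
    buckets = [rest[j * rho:(j + 1) * rho] for j in range(T)]
    return checked, buckets


def estimate_bad_bucket(rho: int, T: int, m: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the probability that some bucket holds at least
    rho/2 bad circuits, when m of the rho*T evaluated circuits are bad."""
    hits = 0
    for _ in range(trials):
        flags = [1] * m + [0] * (rho * T - m)  # 1 marks a bad circuit
        rng.shuffle(flags)
        if any(2 * sum(flags[j * rho:(j + 1) * rho]) >= rho for j in range(T)):
            hits += 1
    return hits / trials


checked, buckets = cut_and_choose(rho=3, T=4, rng=random.Random(7))
print(len(checked), [len(b) for b in buckets])  # 12 [3, 3, 3, 3]
```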
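Let $B(\rho, T, m)$ denote the probability that some bucket contains a minority of good circuits, when $m$ of the $\rho T$ circuits placed into buckets are bad. Then we have the following recurrence, with $B(\rho, 0, 0) = 0$:
$$B(\rho, T, m) = \sum_{i=0}^{\min(\rho, m)} \frac{\binom{m}{i}\binom{\rho T - m}{\rho - i}}{\binom{\rho T}{\rho}} \cdot \begin{cases} B(\rho, T-1, m-i) & \text{if } i < \rho/2,\\ 1 & \text{if } i \ge \rho/2.\end{cases}$$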
In this recurrence, i indexes the number of bad circuits in the first bucket. The fraction gives the probability of the first bucket receiving exactly i bad circuits. If i < ρ/2 then the condition is not yet met and it must further hold on the remaining T − 1 buckets; if i ≥ ρ/2 then the condition is met (hence 1).
Then let $B^*(\rho, T, m)$ denote the overall probability that an adversary succeeds by generating $m$ bad circuits. Since the bad circuits must survive the cut-and-choose, and then a minority-good bucket must be generated, we have:
$$B^*(\rho, T, m) = \frac{\binom{2\rho T - m}{\rho T}}{\binom{2\rho T}{\rho T}} \cdot B(\rho, T, m).$$
A value of $\rho$ is sufficient to achieve security $2^{-s}$ if we have
$$\max_{m} B^*(\rho, T, m) \le 2^{-s}.$$
Using these recurrences, we were able to compute exactly the minimal values of $\rho$ for $s = 40$ and selected values of $T$:
$T$ | minimum $\rho$ needed
100 | 13
250 | 11
500 | 9
5,000 | 7
100,000 | 7
500,000 | 5
This is admittedly a very small sample, though we can report that the points are closely fit ($r = 0.97$) by the linear regression $\rho = 1.86 \cdot (40/\log_2 T) + 1.46$. We note that the analyses of [HKK+14] are slightly different, in that they need only a single good circuit in each bucket (i.e., the adversary succeeds only by producing a bucket with no good circuits).
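The recurrences for $B$ and $B^*$ translate into a short dynamic program. The sketch below is one possible rendering: it fills the table bottom-up over the number of buckets rather than recursing, computes the survival probability as $\prod_{j=0}^{m-1} \frac{\rho T - j}{2\rho T - j}$ (equal to the binomial ratio in $B^*$), and maximises over $m$. Function names are our own, and double-precision floats are assumed to be accurate enough near $2^{-40}$.

```python
from math import comb


def minority_good_table(rho: int, T: int, m_max: int) -> list:
    """Return [B(rho, T, m) for m = 0..m_max], filled bottom-up over the
    number of buckets instead of by top-down recursion."""
    prev = [0.0] * (m_max + 1)  # B(rho, 0, .): with no buckets nothing fails
    for t in range(1, T + 1):
        total = rho * t               # circuits spread over t buckets
        denom = comb(total, rho)      # ways to fill the first bucket
        cur = [0.0] * (m_max + 1)
        for k in range(min(m_max, total) + 1):
            acc = 0.0
            for i in range(min(rho, k) + 1):  # i bad circuits in the first bucket
                ways = comb(k, i) * comb(total - k, rho - i)
                if ways == 0:
                    continue
                p = ways / denom
                # i >= rho/2: this bucket already lacks a good majority
                acc += p if 2 * i >= rho else p * prev[k - i]
            cur[k] = acc
        prev = cur
    return prev


def adversary_success(rho: int, T: int) -> float:
    """max over m of B*(rho, T, m) when half of 2*rho*T circuits are checked."""
    n_eval = rho * T
    table = minority_good_table(rho, T, n_eval)
    best, survive = 0.0, 1.0
    for m in range(1, n_eval + 1):
        # probability that all m bad circuits avoid the rho*T checked ones
        survive *= (n_eval - (m - 1)) / (2 * n_eval - (m - 1))
        best = max(best, survive * table[m])
    return best


def min_rho(T: int, s: int, rho_max: int = 64) -> int:
    """Smallest bucket size whose best attack succeeds with probability <= 2^-s."""
    for rho in range(1, rho_max + 1):
        if adversary_success(rho, T) <= 2.0 ** -s:
            return rho
    raise ValueError("no bucket size up to rho_max suffices")


print(minority_good_table(3, 2, 2)[2])  # 0.4
```

The inner loops cost roughly $T \cdot \rho T \cdot \rho$ operations per candidate $\rho$, so this pure-Python version is slow for the larger values of $T$ in the table; it is meant to make the recurrence concrete rather than to be fast.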
Checking a different fraction of circuits. In [LR14], it is suggested to check a different (i.e., not 1/2) fraction of circuits in the offline phase. Indeed, if the parties check a smaller fraction of circuits, then $P_1$ generates fewer circuits overall (in the offline phase) but $P_2$ evaluates more circuits per timestep in the online phase (i.e., buckets must be bigger).
Suppose that a $1 - \phi$ fraction of circuits is checked in the offline phase. In order to have $T$ buckets of $\rho$ circuits each, $P_1$ must generate $N = \rho T/\phi$ circuits in total and the parties must check $N - \rho T$ of them. Then the probability of $m$ bad circuits surviving the cut-and-choose is
$$\frac{\binom{N - m}{\rho T - m}}{\binom{N}{\rho T}}.$$
We note that [LR14] also prove a bound on the bucket size $\rho$; namely, if
$$\rho \ge \frac{2s + 2\log T - \log(-1.25\log\phi) - 1}{\log T + \log(-1.25\log\phi) - 2},$$
then the total probability of a majority-bad bucket is at most $2^{-s}$ when using buckets of size $\rho$. However, the exact bounds that we have computed are significantly tighter.
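To see how loose the [LR14] bound is relative to the exact values, the sketch below evaluates it for $\phi = 1/2$ next to the regression fit. We assume logarithms are base 2, which the bound as quoted does not state; the function names are ours.

```python
from math import log2

# Exact minimal bucket sizes for s = 40, from the table above.
EXACT_RHO = {100: 13, 250: 11, 500: 9, 5_000: 7, 100_000: 7, 500_000: 5}


def lr14_bucket_bound(s: int, T: int, phi: float) -> float:
    """Right-hand side of the [LR14] bound, taking logs base 2 (an assumption)."""
    c = log2(-1.25 * log2(phi))          # log(-1.25 log phi); requires 0 < phi < 1
    numerator = 2 * s + 2 * log2(T) - c - 1
    denominator = log2(T) + c - 2
    if denominator <= 0:
        raise ValueError("bound is not meaningful for such small T")
    return numerator / denominator


def regression_rho(T: int) -> float:
    """The linear fit rho = 1.86 * (40 / log2 T) + 1.46 reported for s = 40."""
    return 1.86 * (40 / log2(T)) + 1.46


for T, rho in EXACT_RHO.items():
    print(f"T={T:>7}  exact={rho:>2}  fit={regression_rho(T):5.2f}  "
          f"LR14(phi=1/2)={lr14_bucket_bound(40, T, 0.5):5.2f}")
```

For $T = 100$, for example, the bound evaluates to roughly 18.5, against the exact value 13.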
C Security Proof of Batching Protocol
In this section we prove the security of the batching protocol of Section 3.
Case 1: $P_1$ is corrupted. In this part, we construct a simulator $\mathcal{S}$ progressively, using a standard hybrid argument. Let $\pi_f$ denote the protocol of Section 3.2. We begin by describing the real view of $P_1$ during the protocol and then construct the simulator so that $\mathcal{S}$ can simulate the whole protocol independently of $P_2$'s input. We define $H_0$ to be the real protocol $\pi_f$, i.e., $P_1$ and $P_2$ follow the protocol while $\mathcal{S}$ does not change anything and acts exactly as $P_2$. During the execution of $\pi_f$, the view of $P_1$ consists of:
1. A random set $S_c$ of check circuits.
2. At each timestep, a random subset $B$ of $S_e$ of size $\Theta(s/\log T)$.
3. The view in the standard oblivious transfer protocols when running subprotocol $\mathsf{GetInput}_2$. Notice also that $P_2$ may abort during the execution of $\mathsf{GetInput}_{pub}$ and $\mathsf{GetInput}_2$; $\mathcal{S}$ needs to simulate these aborts with probabilities that are independent of $P_2$'s input.
4. At the end of $\pi_f$, $P_1$ receives a message $\tilde{Y} = \mathsf{inst}(Y_t)$.
We construct $\mathcal{S}$ to simulate all of $P_1$'s view above. Since items 1 and 2 do not depend on $P_2$'s input, $\mathcal{S}$ can simply behave as an honest $P_2$: for the cut-and-choose, $\mathcal{S}$ picks a random subset $S_c$ and sends it to $P_1$; if any check circuit in $S_c$ fails, $\mathcal{S}$ aborts the protocol. Also, at each timestep $t$, $\mathcal{S}$ chooses a random subset $B$ and announces it to $P_1$. We now describe the simulation of the rest of $P_1$'s view via a sequence of hybrid interactions:
Hybrid $H_0$: Ideal functionality: We define hybrid $H_0$ to be the same as the real interaction, where the simulator $\mathcal{S}$ plays the role of an honest $P_2$ and also honestly plays the role of the ideal functionalities $\mathcal{F}_{xcom}$, $\mathcal{F}_{com}$ and $\mathcal{F}_{ot}$. We highlight that $\mathcal{S}$ can extract $P_1$'s input and all wire labels from the ideal functionalities.
Hybrid $H_1$: Ensure good buckets: At each timestep $t$, in step (3f) of Circuit Evaluation, $\mathcal{S}$ learns all garbled circuits and wire labels from the ideal functionalities $\mathcal{F}_{com}$ and $\mathcal{F}_{xcom}$, even for evaluation circuits. So we define hybrid $H_1$ to be identical to $H_0$ except that $\mathcal{S}$ aborts if $B_t$ does not have a majority of good circuits. Here, by a "good" circuit we mean that the circuit would be accepted by $P_2$ in the checking phase if $P_1$ had opened it (along with its wire labels).
To show that $H_1 \approx H_0$, it suffices to show that the simulator aborts due to a bad bucket only with negligible probability.
In Appendix B, we define a value $B^*(\rho, T, m)$, which is the probability that the adversary generates $m$ malicious circuits, $P_2$ does not abort in the cut-and-choose phase, and yet some $B_t$ does not contain a majority of good circuits, when buckets have size $\rho$ and there are $T$ timesteps. This event corresponds exactly to the event that the simulator aborts in $H_1$. We assume that $\rho$ is chosen so that $B^*(\rho, T, m) < 2^{-s}$ for every $m$, which is negligible.
Hybrid $H_2$: Compute $\tilde{Y}$ differently: Define $H_2$ to be the same as $H_1$, except for the following changes. $\mathcal{S}$ extracts $P_1$'s plain input $x_1$ from the ideal functionalities in the first timestep, then executes the RAM program $\Pi$ on inputs $(x_1, x_2)$ as $\mathsf{RamEval}(\Pi, M, x_1, x_2)$.
At each "Circuit evaluation" step of the protocol, where $P_2$ performs $Y_t = \mathsf{EvalBucket}(B_t, X_t, hd_t)$, $\mathcal{S}$ instead computes $Y_t = D^{(hd_t)}|^*_{(st, inst, block)}$, where $(st, inst, block)$ denote the internal variables defined in $\mathsf{RamEval}(\Pi, M, x_1, x_2)$ for the corresponding timestep.
We then claim that $H_2 \equiv H_1$. This follows from the correctness condition of garbling schemes: evaluating a circuit garbled as $\mathsf{Garble}(\Pi, E, D)$ on the input labels $E|^*_v$ yields exactly the output labels $D|^*_{\Pi(v)}$.
Thus, if the majority of circuits in bucket $B_t$ are good (which is guaranteed in these hybrids), the correctness condition extends to $\mathsf{EvalBucket}$: whenever the garbled inputs $X_t$ encode $v$, $\mathsf{EvalBucket}(B_t, X_t, hd_t)$ outputs $D^{(hd_t)}|^*_{\Pi(v)}$.
One can then verify that at each timestep $t$, the garbled inputs $X_t$ to $\mathsf{EvalBucket}$ always encode the inputs to $\Pi$ within $\mathsf{RamEval}$, and the garbled outputs $Y_t$ of $\mathsf{EvalBucket}$ always encode the outputs of $\Pi$ within $\mathsf{RamEval}$.
Hybrid $H_3$: Selective abort: In subprotocol $\mathsf{GetInput}_2$, the parties invoke an instance of a standard oblivious transfer protocol $\mathcal{F}_{ot}$. However, $P_1$ can use malicious wire labels in the oblivious transfer and cause $P_2$ to abort while executing $\pi_f$, so that the probability of $P_2$ aborting depends on $P_2$'s input.
Our protocol uses the technique of [LP07] to deal with selective aborts: namely, we encode $P_2$'s input via $s$-way XOR shares. We define $H_3$ to be identical to $H_2$ except that $\mathcal{S}$ uses the technique of [LP07] to simulate the probability of $P_2$'s abort, by extracting $P_1$'s inputs to $\mathcal{F}_{ot}$. The analysis of [LP07] shows that $\mathcal{S}$ can simulate the probability of $P_2$'s abort to within $\ell \cdot 2^{-s+1}$, where $\ell$ denotes the length of the input and $s$ is the security parameter. Hence $H_3 \approx H_2$.
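The $s$-way XOR encoding can be illustrated in a few lines. The sketch below (hypothetical function names) encodes one bit of $P_2$'s input as $s$ shares and, for a simplified attack in which $P_1$ corrupts the OT label for share value 1 on some share positions, computes the exact abort probability by enumerating all encodings. It captures only the core intuition, not the full analysis of [LP07].

```python
import secrets
from itertools import product


def xor_encode(bit: int, s: int) -> list:
    """[LP07]-style encoding: s random bits whose XOR equals `bit`."""
    shares = [secrets.randbits(1) for _ in range(s - 1)]
    last = bit
    for b in shares:
        last ^= b
    return shares + [last]


def xor_decode(shares) -> int:
    """Recombine the shares (the circuit computes this XOR internally)."""
    out = 0
    for b in shares:
        out ^= b
    return out


def abort_probability(bit: int, s: int, corrupted: set) -> float:
    """Exact probability that P2 picks up a corrupted OT label, assuming
    (hypothetically) that P1 corrupted the label for share value 1 at the
    share positions in `corrupted`. Enumerates all 2^(s-1) encodings."""
    hits = total = 0
    for head in product((0, 1), repeat=s - 1):
        shares = list(head) + [xor_decode(head) ^ bit]
        total += 1
        hits += any(shares[j] == 1 for j in corrupted)
    return hits / total


print(abort_probability(0, 4, {0}), abort_probability(1, 4, {0}))  # 0.5 0.5
print(abort_probability(0, 4, {0, 1, 2, 3}),
      abort_probability(1, 4, {0, 1, 2, 3}))                       # 0.875 1.0
```

Corrupting fewer than all $s$ shares gives the same abort probability for both values of the bit, because any $s - 1$ shares are uniformly random; corrupting all of them changes the probability by only $2^{-(s-1)} = 2^{-s+1}$ (0.125 for $s = 4$).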
Hybrid $H_4$: Simulating ORAM memory accesses: Let $\mathcal{S}_{ORAM}$ be the simulator from the security definition of ORAMs (Section 2.1).
Notice that $H_3$ does not actually use all outputs of the RAM next-instruction circuit $\Pi$. Of the output of $\mathsf{RamEval}(\Pi, M, x_1, x_2)$, only $I(\Pi, M, x_1, x_2)$ is used in $H_3$, to generate $\tilde{Y}_t$, which is sent to $P_1$. Define $H_4$ to be identical to $H_3$ except that $\mathcal{S}$ uses the simulated access pattern $\mathcal{S}_{ORAM}(1^\lambda, f(x_1, x_2))$. By the security of ORAM, we have $H_4 \approx H_3$. The simulator $\mathcal{S}$ described in hybrid $H_4$ is now a valid simulator in the ideal world: it does not require $P_2$'s input $x_2$, only $f(x_1, x_2)$, which it can receive from the ideal functionality.
Case 2: $P_2$ is corrupted: We first give an overview of $P_2$'s real view in the protocol. Then we use a sequence of hybrids to construct $\mathcal{S}$ step by step until, eventually, $\mathcal{S}$ can carry out the protocol independently of $P_1$'s input. In the protocol, $P_2$'s view consists of:
1. Commitments to all garbled circuits and wire labels under $\mathcal{F}_{com}$ and $\mathcal{F}_{xcom}$.
2. The set of check circuits, of size $\rho T$.
3. The set of evaluation circuits, of size $\rho T$.
4. At each timestep $t$, the wire labels $P_2$ receives from $\mathsf{GetInput}_{pub}$ and $P_1$'s auxiliary input wire labels in subprotocol $\mathsf{GetInput}_1$.
5. At each timestep $t$, the auxiliary input wire labels $P_2$ receives from $\mathcal{F}_{ot}$ before he can evaluate the bucket $B_t$.
Notice that at the end of the protocol, $P_2$ sends the output $\tilde{Y} = \mathsf{inst}(Y_t)$ to $P_1$, and $P_1$ may abort if $\tilde{Y} \ne \mathsf{inst}(D^{(hd[B_t])})|^*_{inst_t}$.
We now describe the sequence of hybrids, starting from $H_0$, the real protocol $\pi_f$, and thereby formally describe the simulator $\mathcal{S}$.
Hybrid $H_0$: Ideal functionalities: We begin by letting $\mathcal{S}$ follow $\pi_f$ as an honest $P_1$, except that $\mathcal{S}$ also plays the role of all of the ideal functionalities.
Hybrid $H_1$: Circuits: From $P_2$'s view, we see that $P_2$ eventually receives a set of check circuits $S_c$ and a set of evaluation circuits $S_e$, both of size $\rho T$. In the real world, $P_1$ generates those garbled circuits and commits to all of them in step (1) of the pre-processing phase. We define $H_1$ to be the same as $H_0$ except that, instead of letting $\mathcal{S}$ generate all circuits at the very beginning, $\mathcal{S}$ simulates the commitment messages in the pre-processing phase and actually garbles a circuit (honestly) only when its associated commitments are about to be opened.
It is not hard to see that $H_1 \equiv H_0$, since we only delay the time at which circuits are constructed, and this construction is independent of $P_1$'s input.
Hybrid $H_2$: Visible wire labels: Next, we would like to generate simulated garbled circuits for the evaluation circuits, but before that we must know exactly which wire labels will be visible to $P_2$.
Recall that in hybrid $H_1$, $\mathcal{S}$ chooses random translation bits $\tau(E)$ for the wire labels. Then in subprotocol $\mathsf{GetInput}_2$, $P_2$ specifies certain inputs $v$ and receives $E|^*_v = E|_{\tau(E) \oplus v}$. Let $\lambda(E) = \tau(E) \oplus v$ denote these select bits, which become "visible" to $P_2$.
We define $H_2$ so that $\mathcal{S}$ first chooses $\lambda(E)$ at random and then arranges for $P_2$ to receive these wire labels from subprotocol $\mathsf{GetInput}_2$. At the same time, $\mathcal{S}$ still extracts $P_2$'s input $v$ and sets $\tau(E) = \lambda(E) \oplus v$ accordingly.
Similarly, in $H_1$, $\mathcal{S}$ chooses the translation bits $\tau(D)$ for output wire labels $D$ at random. In $H_2$, by contrast, at the time that $\mathcal{S}$ actually garbles a circuit, it already knows what the logical input to this circuit will be. Hence, it can simulate the steps of $\mathsf{RamEval}$ and predict the output $v$ of this circuit. It therefore chooses $\lambda(D)$ at random and sets $\tau(D) = \lambda(D) \oplus v$ accordingly.
Also note that in subprotocol $\mathsf{Solder}(A, A')$, $P_1$ is supposed to open a commitment to $\tau(A) \oplus \tau(A')$. In this hybrid we can replace $\tau(A) \oplus \tau(A')$ with $\lambda(A) \oplus \lambda(A')$, since the protocol only solders wires that carry the same logical value $v$, and then $\tau(A) \oplus \tau(A') = (\lambda(A) \oplus v) \oplus (\lambda(A') \oplus v) = \lambda(A) \oplus \lambda(A')$.
We have $H_1 \equiv H_2$, since all the distributions involved are identical.
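The bookkeeping behind $H_1 \equiv H_2$ is XOR arithmetic on one bit per wire. The sketch below (illustrative names; labels modelled as byte strings) checks exhaustively that choosing $\lambda$ first and setting $\tau = \lambda \oplus v$ yields the same $\tau$, that $P_2$ receives $E|_{\lambda(E)}$, and that the bit opened by $\mathsf{Solder}$ depends only on the $\lambda$ values.

```python
import secrets
from itertools import product


def select_label(labels, tau: int, v: int) -> bytes:
    """E|*_v = E|_{tau(E) xor v}: the label that encodes logical value v."""
    return labels[tau ^ v]


def visible_bit(tau: int, v: int) -> int:
    """lambda(E) = tau(E) xor v: the select bit that P2 actually sees."""
    return tau ^ v


def simulated_tau(lam: int, v: int) -> int:
    """H_2 ordering: choose lambda first, then set tau = lambda xor v."""
    return lam ^ v


def solder_opening(tau_a: int, tau_b: int) -> int:
    """The bit opened in Solder(A, A'): tau(A) xor tau(A')."""
    return tau_a ^ tau_b


labels = (secrets.token_bytes(16), secrets.token_bytes(16))
for tau_a, tau_b, v in product((0, 1), repeat=3):     # A and A' carry the same v
    lam_a, lam_b = visible_bit(tau_a, v), visible_bit(tau_b, v)
    assert simulated_tau(lam_a, v) == tau_a                 # same tau either way
    assert select_label(labels, tau_a, v) == labels[lam_a]  # P2 sees E|_lambda
    assert solder_opening(tau_a, tau_b) == lam_a ^ lam_b    # depends only on lambdas
print("all identities hold")
```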
Hybrid $H_3$: Simulated circuits: We define hybrid $H_3$ to be the same as $H_2$ except that $\mathcal{S}$ generates each evaluation circuit using the simulator $\mathcal{S}_{GC}$ from the security definition of garbling schemes. More concretely, for each evaluation circuit, instead of running $\mathsf{Garble}(\Pi, E, D)$, we run $\mathcal{S}_{GC}(\Pi, E|_{\lambda(E)}, D|_{\lambda(D)})$.
Then $H_3 \approx H_2$ by the security of the garbling scheme.
Hybrid $H_4$: Simulated access pattern: Observe that in $H_3$, the values $\lambda(A)$ are used to simulate the garbled circuits, but the corresponding $\tau(A)$ values are no longer used in the $\mathsf{Solder}$ subprotocol. The only place the $\tau(A)$ values are used is when $P_1$ reveals $\tau(\mathsf{inst}(D^{(hd_t)}))$.
Hence, as $\mathcal{S}$ simulates the steps of $\mathsf{RamEval}$, the only values it actually uses in $H_3$ are the access pattern $I(\Pi, M, x_1, x_2)$. We define $H_4$ to be identical to $H_3$, except that $\mathcal{S}$ uses the simulated access pattern $\mathcal{S}_{ORAM}(1^\lambda, f(x_1, x_2))$. Then $H_4 \approx H_3$ by the security of ORAM.
Finally, $H_4$ describes a valid simulator $\mathcal{S}$ for the ideal model. It does not use $P_1$'s input $x_1$ except to obtain $f(x_1, x_2)$ to provide as input to $\mathcal{S}_{ORAM}$.
D Security Proof of Streaming Cut-and-choose Protocol
We assume an adversary $\mathcal{A}$ that can control either of the two parties (but at most one party in any run of the protocol). In what follows, we consider two cases: the adversary controls party $P_1$, or it controls $P_2$.
1. $P_1$ is corrupted. The simulator $\mathcal{S}$ sets the simulated $P_2$'s input as follows. It sets $x_2$ to all zeros, since $P_2$'s input can be anything. It chooses the values $r_{2,1}, \ldots, r_{2,n}$ at random, as an honest $P_2$ would, since the security of the ORAM depends on these values being sampled randomly.
The simulator picks a random string $b$, as an honest $P_2$ would, and sets it as its input to $\mathcal{F}_{ot}$. The adversary chooses the two keys for each thread and sends them as his input to $\mathcal{F}_{ot}$. Since $\mathcal{S}$ is simulating $\mathcal{F}_{ot}$, it learns both the "eval" and "check" keys for all the threads. Later in the protocol, this enables it to extract $P_1$'s input.
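The thread-selection step can be pictured with an ideal OT. In the sketch below (hypothetical names; the convention 0 for "eval" and 1 for "check" is our assumption), $P_1$ supplies an (eval, check) key pair per thread and $P_2$ a random choice bit, and the receiver learns exactly one key per thread. A simulator that implements `ideal_ot` itself sees `pairs` in full, which is what the extraction argument relies on.

```python
import secrets


def ideal_ot(sender_pairs, choice_bits):
    """Ideal 1-out-of-2 OT, one instance per thread: the receiver learns
    sender_pairs[i][choice_bits[i]] and nothing about the other entry."""
    assert len(sender_pairs) == len(choice_bits)
    return [pair[b] for pair, b in zip(sender_pairs, choice_bits)]


def run_thread_selection(num_threads: int, key_bytes: int = 16):
    # P1's input: an (eval, check) key pair for every thread.
    pairs = [(secrets.token_bytes(key_bytes), secrets.token_bytes(key_bytes))
             for _ in range(num_threads)]
    # P2's input: a random string b, one bit per thread (0 = eval, 1 = check; assumed).
    b = [secrets.randbits(1) for _ in range(num_threads)]
    received = ideal_ot(pairs, b)
    return pairs, b, received


pairs, b, received = run_thread_selection(8)
print(["check" if bit else "eval" for bit in b])
```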
At each time-step:
• $\mathcal{S}$ receives $P_1$'s garbled input as described in the protocol. More specifically, for the first time-step $t = 1$, $\mathcal{S}$ receives $st_1(E^{(t,i)})|^*_{x_1}$ and $rand_1(E^{(t,i)})|^*_{r_1}$ encrypted under $k_{(i,eval)}$. Since the simulator already knows $k_{(i,eval)}$, it can decrypt them to extract the actual garbled values. To extract the actual input, however, the simulator needs to know the opening of the circuit, which $\mathcal{S}$ does not learn until the check phase, after the evaluation phase.
• $\mathcal{S}$ continues with the rest of the protocol as an honest $P_2$ would: choosing a random matrix $M$, gathering the garbled input, evaluating the "eval" circuits, checking the "check" circuits, and performing output verification. $\mathcal{S}$ aborts if an honest $P_2$ would have aborted.
• In the checking phase, the simulator receives the seeds encrypted under $k_{(i,check)}$. Since it already knows $k_{(i,check)}$ for all the threads, it can extract $P_1$'s input (for the first time-step, $t = 1$) as follows.
$\mathcal{S}$ reconstructs the circuits of all "eval" threads using the seeds it has recovered. Then, for the set of reconstructed eval circuits, it compares the input garbled values that it received earlier against their corresponding circuits. If the garbled values match the opened circuit, $\mathcal{S}$ can extract $P_1$'s input for that circuit. The simulator then sets $P_1$'s input to be the majority input among the "eval" threads.
The simulator aborts if either of the following events happens: (1) the majority of "eval" circuits are bad (the reconstructed circuits are not valid garblings of the function being computed); or (2) the majority of extracted inputs are invalid (the garbled input values do not match the reconstructed circuits), or the valid inputs are inconsistent.
The adversary can distinguish the simulator in the following cases. (1) The majority of the "eval" circuits are bad. In this case, an honest $P_2$ will not abort but $\mathcal{S}$ will. Following the standard cut-and-choose arguments, this event happens with negligible probability. (2) All "eval" circuits are correct and the output of the hash function $M$ is the same, but the inputs are inconsistent. In this case the honest $P_2$ will not abort but the simulator will. As discussed in [sS13], the probability of this event is negligible.
• The simulator passes the extracted input of $P_1$ to the TTP. It then resumes the protocol by performing the steps of the checking phase and following the protocol for the rest of the time-steps, behaving as an honest $P_2$ would.
• To ensure that $\mathcal{A}$ cannot distinguish the block output of each time-step from a real execution, $\mathcal{S}$ creates a sequence of simulated, random-looking RAM accesses and returns one of them in each time-step. Since the simulator has the seeds of all the eval circuits of each time-step, as described above, it can return the correct garbled values corresponding to the simulated RAM access that it wishes to return. By the security of ORAM, this simulated RAM access is indistinguishable from the actual execution.
• When the protocol finishes, $\mathcal{S}$ outputs whatever $\mathcal{A}$ outputs.
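The input-extraction step described above can be sketched as follows. Everything here is a simplifying assumption for illustration: input labels are derived from each thread's seed with HMAC-SHA256, the label index equals the bit value (no translation bits), and the two input-related abort conditions are folded into a single rule that a strict majority of "eval" threads must decode to the same input.

```python
import hashlib
import hmac
from collections import Counter

LABEL_BYTES = 16


def wire_labels(seed: bytes, wire: int):
    """Hypothetical derivation of both labels of an input wire from a circuit seed."""
    def prf(tag: str) -> bytes:
        return hmac.new(seed, tag.encode(), hashlib.sha256).digest()[:LABEL_BYTES]
    return prf(f"{wire}|0"), prf(f"{wire}|1")


def honest_labels(seed: bytes, bits):
    """Labels an honest P1 would send for `bits` in the circuit built from `seed`."""
    return [wire_labels(seed, w)[b] for w, b in enumerate(bits)]


def decode(seed: bytes, received):
    """Map received labels back to bits using the reopened circuit, or return
    None if some label matches neither candidate (an invalid input)."""
    bits = []
    for w, label in enumerate(received):
        zero, one = wire_labels(seed, w)
        if label == zero:
            bits.append(0)
        elif label == one:
            bits.append(1)
        else:
            return None
    return tuple(bits)


def extract_majority_input(threads):
    """threads: (seed, received_labels) per "eval" thread. Returns P1's input if
    a strict majority of threads decode to the same value, else None (abort)."""
    decoded = [decode(seed, received) for seed, received in threads]
    counts = Counter(d for d in decoded if d is not None)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if 2 * votes > len(threads) else None


x1 = (1, 0, 1, 1)
seeds = [b"seed-a", b"seed-b", b"seed-c"]
threads = [(seeds[0], honest_labels(seeds[0], x1)),
           (seeds[1], honest_labels(seeds[1], x1)),
           (seeds[2], [bytes(LABEL_BYTES)] * len(x1))]  # one cheating thread
print(extract_majority_input(threads))  # (1, 0, 1, 1)
```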
To prove indistinguishability, consider the following arguments.
• The simulator can abort in three cases: (1) if the outputs of the augmented circuits are not identical; (2) if $P_1$ fails the checking phase; or (3) if the inputs to the "eval" threads are invalid or inconsistent, or if the majority of "eval" circuits are bad. Cases (1) and (2) do not depend on $P_2$'s input. In case (3), as described above, $\mathcal{A}$ can distinguish the simulator, but only with negligible probability.
• By the security of ORAM and the hiding property of the commitment scheme used, the choice of $x_2$ has no distinguishable effect on the view of $\mathcal{A}$, since all he sees during the run of the protocol are the commitments regarding output authenticity and the memory access patterns. In particular, by the ORAM properties, memory access patterns look random in the view of the adversary and are indistinguishable regardless of $P_2$'s input value.
2. $P_2$ is corrupted. Similarly to the previous case, the simulator sets $x_1$ to all zeros and assigns a random value to $r_1$. The rest of the simulation is as follows.
• $\mathcal{S}$ chooses random values for $k_{(i,eval)}$ and $k_{(i,check)}$ for all $i \in \{1, \ldots, S\}$ and sets them as its input to $\mathcal{F}_{ot}$. By simulating $\mathcal{F}_{ot}$, $\mathcal{S}$ can extract $P_2$'s choices of cut-and-choose bits.
• The simulator follows the protocol as an honest $P_1$ would: it selects garbled values for the input wires and sends the encrypted garbled values corresponding to its inputs, as stated in the protocol.
• $\mathcal{S}$ uses the garbled values corresponding to $P_2$'s input wires as its input to $\mathcal{F}_{ot}$. As before, since $\mathcal{S}$ is simulating $\mathcal{F}_{ot}$, it receives $P_2$'s input when $P_2$ passes it to $\mathcal{F}_{ot}$. $\mathcal{S}$ then passes $P_2$'s input to the TTP and receives the result of the computation $z$.
• In time-step $t = 1$, as instructed by the protocol, $\mathcal{S}$ interacts with $P_2$ to receive the matrix $M$. It then chooses $r$ at random.
• Having the matrix $M$, $P_2$'s inputs, $P_2$'s choices of cut-and-choose bits, and the result of the computation $z$, $\mathcal{S}$ garbles the circuits as follows.
(a) The simulator creates the garbled circuits corresponding to checked threads as an honest $P_1$ would. It also creates the output authenticity values $w_{j,0}$ and $w_{j,1}$, and computes the values $c_{i,j,b}$ and $h_{i,j,b}$, $b \in \{0, 1\}$, for the "check" circuits as an honest $P_1$ would.
(b) For the "eval" circuits, $\mathcal{S}$ behaves differently. In each time-step except the last, the circuits should output some garbled value for the $st$ output wires (this can be any arbitrary value) and a valid garbled value for the $block$ output wires. In the last time-step, the $st$ output wires represent the output of the computation, so they cannot be arbitrary. $\mathcal{S}$ creates a series of random-looking memory access instructions that it intends to output at each time-step. It also knows the value $z$ of the last time-step's $st$ output wires. By the security of the garbling scheme, $\mathcal{S}$ can simulate garbled circuits that always output the garbled values corresponding to these predetermined values and leak nothing else.
• After garbling the circuits, $\mathcal{S}$ sends them along with the output authenticity checks described above.
• It continues the protocol to the end as an honest $P_1$ would and aborts accordingly.
The proof of indistinguishability is as follows.
• For the input consistency check circuits, since $P_1$ chooses the random values $r$ and feeds $x_1 \| r$ to the hash function $M$, following [sS13] the output of the sub-circuit computing the hash function $M$ looks random.
• For the evaluation circuits, by the security of the garbling scheme, $\mathcal{A}$ can guess the actual values of the garbled $st$ values only with negligible probability. By the security of the garbling scheme, if $\mathcal{A}$ knows one of the two garbled values of a wire, he can correctly guess the other value only with negligible probability. Therefore, even though $\mathcal{A}$ knows the truth value of the garbled value corresponding to the $block$ output wires, he cannot obtain the other garbled value. Hence, by the security of the encryption used, he cannot decrypt $c_{i,j,1-b}$, since he does not have access to the decryption key. As a result, $\mathcal{A}$ cannot distinguish the fake circuit from the correct circuit, except with negligible probability. For the last time-step, the same reasoning shows that the fake circuit that always outputs $z$ is indistinguishable from the actual circuit that computes $z$.
• Moreover, by the security of the ORAM, the randomly created access patterns are indistinguishable from those of a real run of the protocol.
• The check circuits are constructed correctly and, by the security of Yao's protocol, they do not leak any information regarding $P_1$'s input. Therefore, they do not affect the view of $\mathcal{A}$.
• In the rest of the simulation, $\mathcal{S}$ acts as an honest $P_1$ would and aborts accordingly.
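The hash used for input consistency is linear over GF(2), in the style of [sS13]. The sketch below (hypothetical names) samples a random binary matrix $M$ and computes $M \cdot (x_1 \| r)$; the dimensions are arbitrary, and Python's `secrets` module stands in for the parties' randomness.

```python
import secrets


def random_binary_matrix(rows: int, cols: int):
    """P2's random matrix M over GF(2)."""
    return [[secrets.randbits(1) for _ in range(cols)] for _ in range(rows)]


def gf2_hash(M, bits):
    """h(v) = M * v over GF(2): output bit j is the parity of the input bits
    selected by row j of M."""
    return [sum(m & v for m, v in zip(row, bits)) % 2 for row in M]


x1 = [1, 0, 1, 1, 0, 0, 1, 0]                     # P1's input bits (toy size)
r = [secrets.randbits(1) for _ in range(8)]       # P1's random padding
M = random_binary_matrix(rows=6, cols=len(x1) + len(r))
print(gf2_hash(M, x1 + r))                        # compared across all "eval" threads
```

Because the map is linear, $h(u \oplus w) = h(u) \oplus h(w)$ for any inputs $u, w$ of the right length.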
