Abstract. Private function evaluation (PFE) is a special case of secure multi-party computation (MPC) in which the function to be computed is known to only one party. PFE is useful in several real-life settings where an algorithm or a function itself needs to remain secret because it is classified or is intellectual property. In this work, we revisit the seminal PFE framework presented by Mohassel and Sadeghian at Eurocrypt'13. We show how to adapt the well-known half gates garbling technique (Zahur et al., Eurocrypt'15) to their constant-round 2-party PFE scheme. Compared to their scheme, our optimization considerably improves the efficiency of both the underlying Oblivious Evaluation of Extended Permutation (OEP) and the secure 2-party computation (2PC) protocol, and yields a more than 40% reduction in overall communication cost (the computation time is also slightly decreased, and the number of rounds remains unchanged).
Introduction
Suppose that one invents a novel and practical algorithm that can detect and identify criminals in crowds with a high degree of precision, based on information about their behavior obtained from street video recordings. Such an algorithm would clearly be valuable, and many governmental organizations would pay millions to apply it. The inventor has the right to keep the algorithm commercially confidential, and to offer only its use for a certain fee, since it is her own intellectual property. On the other hand, governmental organizations will generally be unwilling to reveal their databases and records to parties whom they do not sufficiently trust. This and similar problems are addressed by private function evaluation (PFE).
PFE is a special case of secure multi-party computation (MPC) in which $n$ participants jointly compute a function $f$ on their private inputs $x_1, \ldots, x_n$, and obtain the result $f(x_1, \ldots, x_n)$. PFE differs from the standard MPC setting in that the function $f$ is also a private input of one of the participants$^{4}$. A PFE solution would be more useful than conventional MPC in various applications, e.g., those where the function itself contains private information or reveals security weaknesses, or those where service providers prefer hiding their function or its specific implementation as their intellectual property. The task of designing efficient PFE protocols for special or generic purposes is addressed in several papers in the literature [1] [2] [3] [4] [5] [6].
Generic PFE solutions are mainly classified into two categories. The first one is the universal circuit [7] based approach, which works with any MPC protocol. The ideal functionality of MPC $F_{U_g}$ for a universal circuit $U_g$ takes as input a boolean circuit representation $C_f$ of the private function $f$ of a certain size $g$, together with the inputs of the parties $x_1, \ldots, x_n$ (i.e., $F_{U_g}(C_f, x_1, \ldots, x_n)$), and outputs $f(x_1, \ldots, x_n)$. The works based on this approach mainly aim to reduce the size of universal circuits, and to optimize their implementations using some MPC techniques [1, 2, 8, 9]. Unfortunately, existing universal circuit based schemes result in massive circuit sizes, which is the root cause of their inefficiency. Therefore, some recent PFE solutions avoid using universal circuits for the sake of practicality [4] [5] [6]. An early attempt in this second category is Paus, Sadeghi, and Schneider's work [4]. They introduce what they call semi-private function evaluation, in which the type of the gates is a secret of one party, but the circuit topology (i.e., the set of all connections of predecessors and successors of each gate) is public to both parties. Due to the weaker assumption of semi-privacy, their approach does not provide a complete PFE solution. Another significant improvement in this category comes from Katz and Malka's 2-party PFE (2-PFE) scheme, with a mechanism for hiding the circuit topology based on singly homomorphic encryption [5]. Although their scheme is favorable in terms of asymptotic complexity, its high computational cost makes it impractical.
In [6], Mohassel and Sadeghian present an efficient PFE framework that hides the topology of the circuit. It was originally designed for the semi-honest model, and later generalized to the malicious case by Mohassel, Sadeghian and Smart [10]. Their generic PFE framework can be applied to both arithmetic and boolean circuits. Considering the 2-party boolean circuit based scheme, they split the PFE task into two sub-functionalities: (1) Circuit topology hiding (CTH), (2) Private gate evaluation (PGE). Briefly speaking, in CTH, a series of procedures is performed: First, the function owner (say $P_1$) detaches the interconnections of the gates to obtain single gates, and keeps the topological mapping of the circuit private. Second, $P_1$ and the other party (say $P_2$) engage in an oblivious evaluation of switching network (OSN)$^{5}$ protocol, which consists of $O(g \lg g)$ oblivious transfer (OT) operations (throughout this paper, $g$ denotes the number of gates, and $\lg(\cdot)$ denotes the logarithm base 2). Next, in

1. Regarding the OSN phase: (1) We reduce the number of required OTs by $N = 2g$. Concretely, [6]

Among the above improvements, the foremost gain comes from the reduction in the input sizes of the OSN protocol. The overall communication cost of our scheme is $(6N \lg N + 0.5N + 3)\lambda$ bits$^{7}$, which is a significant improvement compared to [6], whose communication cost is $(10N \lg N + 4N + 5)\lambda$ bits. This means a saving of more than 40% in bandwidth. Table 1 compares the two schemes in terms of communication cost for various circuit sizes (see also Table 7 for a more detailed comparison).
Organization. In Section 2, we give preliminary information about oblivious transfer, Yao's garbled circuits, and the half gates optimization. In Section 3, we present the 2-PFE framework and scheme of [6] in detail. In Section 4, we introduce our 2-party PFE protocol and analyze its security in the semi-honest model. In Section 5, we analyze our protocol in terms of communication and computation complexities and compare it with the 2-PFE scheme in [6]. Finally, Section 6 concludes the paper and outlines future work.
Preliminaries
This section provides some background information on oblivious transfer, Yao's garbled circuits, and the state-of-the-art half gates optimization.
Oblivious transfer
A $k$-out-of-$m$ oblivious transfer protocol is a two-party protocol where one of the parties is the sender ($S$), who has a set of values $\{x_1, \ldots, x_m\}$, and the other one is the receiver ($R$), who has $k$ selection indices. At the end of the protocol, $R$ learns only the $k$ of $S$'s inputs specified by his selection indices, whereas $S$ learns nothing. Oblivious transfer (OT) is a critical underlying protocol used in many MPC constructions [17, 18].
OT extension is a way of obtaining many OTs from a small number of OT runs and cheap symmetric cryptographic operations. Ishai et al. constructed the first OT extension method [19], which reduces a given large number of required OTs to a number fixed by the security parameter. Later, several OT extension schemes based on [19] have been proposed to improve the efficiency [20, 21].
Yao's protocol
Yao's protocol is essentially a 2PC protocol secure in the semi-honest adversary model. It allows two parties, the garbler and the evaluator, to evaluate an arbitrary polynomial-sized function $f(x) = f(x_1, x_2)$, where $x_1$ is the garbler's private input and $x_2$ is the evaluator's private input, without leaking any information about their private inputs to each other beyond what is implied by the function output itself. The main idea is that the garbler prepares an encrypted version of $C_f$ (a boolean circuit representation of $f$). This encrypted version is called the garbled circuit $\hat{F}$ and is sent to the evaluator. The evaluator then computes the output from the garbled version of the circuit without obtaining the garbler's input bits or intermediate values.
Recently, several major optimizations have been proposed for Yao's protocol, mainly aiming at bandwidth efficiency and/or reduction of the garbling and evaluation costs (e.g., garbled row reduction 3 ciphertexts (GRR3) [22], free-XOR [23], garbled row reduction 2 ciphertexts [15], pipelining [24], fleXOR [14], half gates [16], and [25]).
Here, we briefly describe the garbling procedure of odd gates using the half gates technique, and refer the reader to [16] for further details and its security proof. Any odd gate type can be written as in Equation (2.1), where $\alpha_1$, $\alpha_2$ and $\alpha_3$ define the gate type, e.g., setting $\alpha_1 = 0$, $\alpha_2 = 0$, $\alpha_3 = 1$ results in a NAND gate [16]. Let $v_i$ denote the one-bit truth value on the $i$th wire in a circuit.
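As an illustration of the role of the three constants, the form that [16] gives for odd gates, $f(v_a, v_b) = (\alpha_1 \oplus v_a) \wedge (\alpha_2 \oplus v_b) \oplus \alpha_3$, can be enumerated in a few lines of Python. This is only a sketch: the names `odd_gate`, `truth_table` and `ALL_ODD_GATES` are hypothetical, and the code checks truth tables without performing any garbling.

```python
from itertools import product


def odd_gate(alpha1, alpha2, alpha3, va, vb):
    """Evaluate ((alpha1 XOR va) AND (alpha2 XOR vb)) XOR alpha3 on single bits."""
    return ((alpha1 ^ va) & (alpha2 ^ vb)) ^ alpha3


def truth_table(alpha1, alpha2, alpha3):
    """Gate outputs for the inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    return tuple(odd_gate(alpha1, alpha2, alpha3, va, vb)
                 for va, vb in product((0, 1), repeat=2))


# The eight choices of (alpha1, alpha2, alpha3) give eight distinct gates,
# each with an odd number of TRUE rows in its truth table.
ALL_ODD_GATES = {alphas: truth_table(*alphas)
                 for alphas in product((0, 1), repeat=3)}
```

For instance, `truth_table(0, 0, 1)` is the NAND truth table and `truth_table(0, 0, 0)` is AND.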
The garbler garbles an odd gate by following the steps for both half gates in Table 2. The tokens for FALSE and TRUE on the $i$th wire are denoted as $w_i^0$ and $w_i^1$, respectively. The global free-XOR offset is denoted as $R$. The garbler sets $R \in_R \{0,1\}^{\lambda-1}1$ globally, and $w_i^0 \in_R \{0,1\}^{\lambda}$ and $w_i^1 \leftarrow w_i^0 \oplus R$ for each wire. We have $\mathrm{lsb}(R) = 1$ so that $\mathrm{lsb}(w_i^0) \neq \mathrm{lsb}(w_i^1)$. $w_{Gc}^b$ and $w_{Ec}^b$ denote the tokens for the garbler and the evaluator half gate outputs for truth value $b$, respectively. $T_{Gc}$ and $T_{Ec}$ denote the $\lambda$-bit strings that need to be sent for the garbler and evaluator half gates, respectively. Let $w_i$ be a token on the $i$th wire obtained by the evaluator, who does not know its corresponding truth value $v_i$. For the $i$th wire, let $p_i := \mathrm{lsb}(w_i^0)$, a value known only to the garbler. If two symbols are appended, an AND operation is implied, i.e., $ab = a \wedge b$. $H : \{0,1\}^{\lambda} \times \mathbb{Z} \to \{0,1\}^{\lambda}$ denotes a hash function with circular correlation robustness for naturally derived keys$^{9}$, having the security parameter $\lambda$.
The token on the output wire of the odd gate for FALSE is $w_{Gc}^0 \oplus w_{Ec}^0$, since the output of the odd gate is the XOR of the half gate outputs. The two ciphertexts $T_{Gc}$ and $T_{Ec}$ must be sent to the evaluator for each gate.
2-Party PFE Framework
In [6], Mohassel and Sadeghian present a generic PFE framework for boolean and arithmetic circuits in the multi-party case. In this work, however, our focus is mainly on boolean circuits in the 2-party setting. Throughout this paper, the party who knows the private function is called $P_1$, also playing the evaluator role in 2PC, whereas the other party is called $P_2$, also playing the garbler role in 2PC. In order to achieve 2-PFE, Mohassel and Sadeghian show that it is required to hide the parties' private inputs, the topology of the boolean circuit representation $C_f$ of the private function, and the functionality of its gates. The framework is not concerned with hiding the numbers of gates, input/output wires, and the type of the gates of the circuit. The complete task of PFE is classified into two functionalities: (1) Circuit Topology Hiding (CTH), (2) Private Gate Evaluation (PGE).
In a nutshell, in CTH, P 1 extracts the topological mapping π f (kept private) from the circuit representation C f , and converts the whole circuit into a collection of single gates. Then P 1 and P 2 engage in an oblivious evaluation of switching network (OSN) protocol where P 2 obliviously obtains tokens on gate inputs. In PGE, a 2PC protocol is performed to obtain the final output.
In the rest of this section, we describe the notions related to CTH, and the 2-PFE scheme proposed in [6] .
Context of CTH
Let $g$, $n$ and $m$ denote the number of gates (size), the number of inputs and the number of outputs of $C_f$, respectively. $OW$ denotes the set of outgoing wires, which is the union of the input wires of the circuit and the output wires of its non-output gates (having $M = n + g - m$ elements in total, whose indices are chosen randomly): $\{ow_1, \ldots, ow_{n+g-m}\}$. Similarly, $IW$ denotes the set of incoming wires, which is the set of input wires of each gate in the circuit (having $N = 2g$ elements in total, whose indices are chosen randomly): $\{iw_1, \ldots, iw_{2g}\}$.
The full description of the topology of a boolean circuit $C_f$ can be accomplished by a mapping $\pi_f : OW \to IW$. The mapping $\pi_f$ maps $i$ to $j$ (i.e., $\pi_f(i) \to j$) if and only if $ow_i \in OW$ and $iw_j \in IW$ correspond to the same wire in the circuit $C_f$. Note that the mapping $\pi_f$ is not a function if an outgoing wire corresponds to more than one incoming wire, while its inverse $\pi_f^{-1}$ is always a function. Figure 1 shows an example circuit $C_f$ and its mapping $\pi_f$. From the inclusion-exclusion principle, we obtain Equation (3.2), which gives the number of possible mappings for the given $M$ and $N$ values.
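As a hedged illustration of such an inclusion-exclusion count, the sketch below assumes that every outgoing wire is connected to at least one incoming wire, so that $\pi_f^{-1}$ is a surjection from the $N$ incoming wires onto the $M$ outgoing wires. Under that assumption the count is $\sum_{i=0}^{M} (-1)^i \binom{M}{i} (M-i)^N$. The function name `count_mappings` is hypothetical, and the expression may be written differently from Equation (3.2).

```python
from math import comb


def count_mappings(M: int, N: int) -> int:
    """Number of ways to connect N incoming wires to M outgoing wires such
    that every outgoing wire is used at least once (inclusion-exclusion
    over the set of unused outgoing wires)."""
    return sum((-1) ** i * comb(M, i) * (M - i) ** N for i in range(M + 1))
```

For example, with $M = 2$ outgoing and $N = 3$ incoming wires, `count_mappings(2, 3)` counts the $2^3 - 2 = 6$ assignments that use both outgoing wires.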
In the context of CTH, ρ indicates the number of possible circuit topologies. Thus, the security of CTH is proportional to ρ.
In what follows, we describe the main elements of CTH functionality whose essential target is the oblivious application of the mapping π f .
Oblivious evaluation of mapping. A mapping π : {1, . . . , N } → {1, . . . , N } is a permutation if it is a bijection. We next define the extended permutation (EP) as follows:
Definition 1 (Extended permutation (EP)). Given the positive integers $M$ and $N$, a mapping $\pi : \{1, \ldots, M\} \to \{1, \ldots, N\}$ is called an EP if for all $y \in \{1, \ldots, N\}$, there exists a unique $x \in \{1, \ldots, M\}$ such that $\pi(x) = y$.
The 2-party oblivious evaluation of extended permutation (2-OEP) functionality is defined as follows:
Definition 2 (2-OEP functionality). The first party $P_1$'s inputs are an EP $\pi : \{1, \ldots, M\} \to \{1, \ldots, N\}$, and a blinding vector for incoming wires $T := [t_j \in_R \{0,1\}^{\lambda}]$ for $j = 1, \ldots, N$. The other party $P_2$'s input is a vector for outgoing wires $W := [w_i \in_R \{0,1\}^{\lambda}]$ for $i = 1, \ldots, M$. At the end, $P_2$ learns $S := [\sigma_j = w_{\pi^{-1}(j)} \oplus t_j]$ for $j = 1, \ldots, N$, while $P_1$ learns nothing.
We call any 2-party protocol construction realizing the 2-OEP functionality a 2-OEP protocol. Mohassel and Sadeghian have constructed a constant-round 2-OEP protocol by introducing the OSN structure. Since we also utilize their 2-OEP protocol in our scheme, we give some of its details here. They first construct an extended permutation using switching networks, and then provide a method using OTs for the oblivious evaluation of the resulting switching network. We refer the reader to [6] for the security proof and the application of this construction to various MPC protocols.
EP construction from switching networks. Each 2-switch takes two $\lambda$-bit strings and two selection bits as input, and outputs two $\lambda$-bit strings [6]. Each of the outputs may get the value of any of the input strings depending on the selection bits. This means that for input values $(x_0, x_1)$, there are four different switch output possibilities. The two selection bits $s_0$ and $s_1$ determine the switch output $(y_0, y_1)$. In particular, the switch outputs $y_0 = x_{s_0}$ and $y_1 = x_{s_1}$.
Unlike 2-switches, 1-switches have only one selection bit s. For an input (x 0 , x 1 ), a 1-switch outputs one of the two possible outputs: (x 0 , x 1 ) if s = 0, and (x 1 , x 0 ) otherwise.
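The behaviour of the two switch types can be sketched in the clear as follows. The function names `two_switch` and `one_switch` are hypothetical, and the inputs may be any Python values standing in for $\lambda$-bit strings.

```python
def two_switch(x0, x1, s0, s1):
    """2-switch: output y0 = x_{s0} and y1 = x_{s1}."""
    inputs = (x0, x1)
    return inputs[s0], inputs[s1]


def one_switch(x0, x1, s):
    """1-switch: pass the inputs straight through if s = 0, swap them otherwise."""
    return (x0, x1) if s == 0 else (x1, x0)
```

A 1-switch can only permute its inputs, whereas a 2-switch can also duplicate one of them, e.g., `two_switch("a", "b", 0, 0)` returns `("a", "a")`.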
Definition 3 (Switching Network (SN)).
A switching network SN is a collection of interconnected switches whose inputs are N λ-bit strings and a set of selection bits of all switches, and whose outputs are N λ-bit strings.
The mapping $\pi : \{1, \ldots, N\} \to \{1, \ldots, N\}$ related to an SN ($\pi(i) = j$) implies that when the SN is executed, the string on the output wire $j$ gets the value of that on the input wire $i$. A permutation network PN is a special SN whose mapping is a permutation of its inputs. In contrast to SNs, PNs are composed of 1-switches.
Waksman proposes an efficient PN construction in [26]. This work shows that a PN with $N = 2^{\kappa}$ inputs can be constructed with $N \lg N - N + 1$ switches.
In [6] , the authors propose the construction of an extended permutation by combining SNs and PNs. However, extended permutations differ from SNs in that the number of their inputs M and that of their outputs N need not be equal (M ≤ N ). N − M additional dummy inputs are added to the real inputs of an EP π : {1, . . . , M } → {1, . . . , N } in order to simulate it as an SN.
The SN design for extended permutation is divided into three components (see also Figure  2 ):
1. Dummy placement component. The dummy placement component takes $N$ input strings consisting of real and dummy ones. For each real input that $\pi$ maps to $k$ different outputs, the output of the dummy placement component is the real string followed by $k - 1$ dummy strings.
2. Replication component. The replication component takes the output of the dummy placement component as input. If a value is real, it passes through unchanged. If it is a dummy value, it is replaced by the real value which precedes it. This can be computed by a series of $N - 1$ 2-switches whose selection bits $(s_0, s_1)$ are either (0,0) or (0,1). If the selection bits are (0,0), then $x_1$ is a dummy, and $x_0$ goes to both outputs. If they are (0,1), then both inputs are real, and both are kept on the outputs in the same order. At the end of this step, all the dummy inputs are replaced by the necessary copies of the real inputs.
3. Permutation component. The permutation component takes the output wires of the replication component as input. It outputs a permutation of them so that each string is placed on its final location as prescribed by the mapping $\pi$.
An efficient implementation of both dummy placement and permutation blocks is via the use of a Waksman permutation network. Combining the three components, one gets a larger switching network, where the number of switches needed is 2(N lgN − N + 1) + N − 1 = 2N lgN − N + 1 [6] . The topology of the whole switching network is the same for all N input EPs, and the selection bits specify the input values appearing on the outputs.
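The three components can be illustrated with a small simulation in the clear. This is a sketch under stated assumptions, not the construction of [6] itself: it builds the output of the dummy placement directly instead of routing through a Waksman network, it assumes that every real input is used by at least one output, and the names `apply_extended_permutation`, `source_of` and `DUMMY` are hypothetical.

```python
DUMMY = object()  # hypothetical marker standing in for a dummy string


def apply_extended_permutation(real_inputs, source_of):
    """Apply an EP in the clear through the three components.

    real_inputs: list of the M real values.
    source_of:   list of length N; source_of[j] is the 0-based index of the
                 real input that output j must carry.
    Assumes every real input feeds at least one output.
    """
    M, N = len(real_inputs), len(source_of)
    fanout = [source_of.count(i) for i in range(M)]
    assert all(k >= 1 for k in fanout) and sum(fanout) == N

    # 1. Dummy placement: each real value is followed by k - 1 dummies.
    placed, first_slot = [], []
    for value, k in zip(real_inputs, fanout):
        first_slot.append(len(placed))
        placed.append(value)
        placed.extend([DUMMY] * (k - 1))

    # 2. Replication: a chain of N - 1 two-switches. A dummy is overwritten
    #    by the value that precedes it; a real value passes unchanged.
    replicated = list(placed)
    for pos in range(1, N):
        if replicated[pos] is DUMMY:
            replicated[pos] = replicated[pos - 1]

    # 3. Permutation: route each copy to its final output position.
    next_slot = list(first_slot)
    outputs = []
    for i in source_of:
        outputs.append(replicated[next_slot[i]])
        next_slot[i] += 1
    return outputs
```

For example, with real inputs `["a", "b", "c"]` and `source_of = [2, 0, 0, 1, 2]`, the placement step yields `a, dummy, b, c, dummy`, replication yields `a, a, b, c, c`, and the permutation step yields `c, a, a, b, c`.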
Oblivious evaluation of SN construction (OSN).
We continue with describing Mohassel and Sadeghian's method for oblivious evaluation of switching networks using OTs.
Adapting the switching network construction to the 2-OEP functionality, $P_1$ produces the selection bits of the switching network using $\pi$, and has a blinding vector $T$. $P_2$ has an input vector for outgoing wires $W$. At the end, $P_2$ learns the switching network's blinded output vector for incoming wires $S$, and $P_1$ learns $\perp$. We describe the oblivious evaluation of one of its building blocks, i.e., a single 2-switch $u$.
Let the input wires of the 2-switch be $a$ and $b$, and its output wires be $c$ and $d$. Each of the four wires of the switch has a uniformly random string assigned by $P_2$ as her share of that wire in the preparation stage, namely, $r_a, r_b, r_c, r_d \in_R \{0,1\}^{\lambda}$ for $a, b, c, d$, respectively. $P_1$ has the strings $w_1 \oplus r_a$ and $w_2 \oplus r_b$ as his shares for the two input wires. The purpose is to enable $P_1$ to obtain his output shares according to his selection bits. There are four possibilities for $P_1$'s output shares depending on his selection bits $s_{0u}$ and $s_{1u}$ (see Table 3).

Table 3. $P_1$ learns one of these rows according to his selection bits.

$P_2$ prepares a table with four rows using $r_a, r_b, r_c, r_d$ (see Table 4). $P_1$ and $P_2$ engage in a 1-out-of-4 OT in which $P_2$ inputs the four rows that she has prepared, and $P_1$ inputs his selection bits for the switch $u$. At the end, $P_1$ learns one of the rows of the table as the output. Assume that $P_1$'s selection bits are (1,0). This means $P_1$ retrieves the third row, i.e., $(r_b \oplus r_c, r_a \oplus r_d)$. According to his selection bits, $P_1$ XORs his input share $w_2 \oplus r_b$ with $r_b \oplus r_c$, as well as his other input share $w_1 \oplus r_a$ with $r_a \oplus r_d$, and obtains his output shares $w_2 \oplus r_c$ and $w_1 \oplus r_d$.
Table 4. $P_1$ gets one of these rows by engaging in 1-out-of-4 OT with $P_2$.

The oblivious evaluation of the entire SN for EP goes as follows. In an offline stage, $P_2$ assigns a uniformly random $\lambda$-bit string to each wire in the switching network. $P_2$ blinds each element of her input vector $W$, and the dummy strings which she assigned for the $N - M$ dummy inputs of the switching network, with her corresponding shares for the input wires (an XOR operation is involved in each blinding). $P_2$ prepares tables for each switch in the switching network similar to Table 3 and Table 4. However, both tables for each switch in this scenario have two rows, since each switch in fact has two possible outputs$^{10}$. This means each switch in the entire switching network can be evaluated by running a 1-out-of-2 OT. Moreover, the construction permits parallel OT runs and/or the use of OT extension, resulting in a constant-round scheme. $P_2$ needs to send her blinded inputs to $P_1$, which can be done during her turn in OT extension in order not to increase the round complexity unnecessarily. Once $P_1$ gets $P_2$'s blinded inputs, which are also his input shares, and the outputs of all OTs, he evaluates the entire switching network in topological order, obtaining his output shares. $P_1$ blinds his output shares with the corresponding elements of $T$ (again, an XOR operation is involved in each blinding), and sends the resulting vector to $P_2$. $P_2$ unblinds each element using her shares for the output wires, and obtains the OEP output $S$. The extended permutation in this construction includes $2N \lg N - N + 1$ switches in total, requiring $2N \lg N - N + 1$ OTs for their oblivious evaluation.
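The share arithmetic for a single 2-switch described above can be sketched as follows. The 1-out-of-4 OT is only simulated by a dictionary lookup, which is an assumption made to keep the example self-contained (a real protocol would hide the selection bits from $P_2$ and the other rows from $P_1$). The names `p2_switch_table`, `p1_evaluate_switch` and `LAMBDA_BYTES` are hypothetical.

```python
import secrets

LAMBDA_BYTES = 16  # 128-bit strings; an assumed value of lambda


def rand_string():
    """Sample a uniformly random lambda-bit string."""
    return secrets.token_bytes(LAMBDA_BYTES)


def xor(a, b):
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def p2_switch_table(r_a, r_b, r_c, r_d):
    """P2's four OT messages for one 2-switch, indexed by (s0, s1).
    The row for (s0, s1) is (r_{in s0} XOR r_c, r_{in s1} XOR r_d)."""
    r_in = (r_a, r_b)
    return {(s0, s1): (xor(r_in[s0], r_c), xor(r_in[s1], r_d))
            for s0 in (0, 1) for s1 in (0, 1)}


def p1_evaluate_switch(share_a, share_b, s0, s1, ot_row):
    """P1 XORs the input share chosen by each selection bit with the
    matching entry of the row he received through the OT."""
    shares = (share_a, share_b)
    return xor(shares[s0], ot_row[0]), xor(shares[s1], ot_row[1])
```

With selection bits (1,0), `p1_evaluate_switch` returns $(w_2 \oplus r_c, w_1 \oplus r_d)$, matching the example in the text.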
Mohassel and Sadeghian's 2-PFE scheme
Here we provide an outline of Mohassel and Sadeghian's 2-PFE construction, and refer the reader to their work for detailed information and its security proof [6]. Their protocol is as follows. $P_2$ first randomly generates tokens $w_i^0, w_i^1 \in_R \{0,1\}^{\lambda}$ for each $ow_i \in OW$, corresponding to FALSE and TRUE, respectively. $P_1$ also generates random blinding strings $t_j^0, t_j^1 \in_R \{0,1\}^{\lambda}$ for each $iw_j \in IW$. Then $P_1$ and $P_2$ engage in an OSN protocol slightly modified from their 2-OEP protocol, where at the end, $P_2$ learns $[\sigma_j^0 = w^0_{\pi_f^{-1}(j)} \oplus t_j^0,\ \sigma_j^1 = w^1_{\pi_f^{-1}(j)} \oplus t_j^1]$ for $j = 1, \ldots, N$. $P_2$ garbles each gate by encrypting the tokens $w_c^0, w_c^1$ on its outgoing wire with the blinded strings $\sigma_a^0, \sigma_a^1, \sigma_b^0, \sigma_b^1$ on its incoming wires according to its truth table. $P_2$ sends the garbled gates and her garbled input tokens to $P_1$. $P_1$ gets his garbled input tokens using OT, which can be done in an earlier stage together with the other OTs so as not to increase the round complexity. Using the circuit mapping, his blinding strings, the garbled gates and the garbled inputs, $P_1$ evaluates the whole garbled circuit, and obtains the tokens of the output bits of $f(x)$. In [6], a gate hiding mechanism is not provided for the 2-PFE scheme; instead, all gates in the circuit are fixed to be NAND gates. Mohassel and Sadeghian's scheme involves the oblivious evaluation of a switching network made of $2N \lg N + 1$ switches, i.e., $N$ switches more than in their EP construction. The oblivious evaluation of this switching network requires $2N \lg N + 1$ OTs [6]. All of the OTs in the protocol can be combined into just one invocation of OT extension.
Our More Efficient 2-Party PFE Scheme
In what follows, we describe our scheme in detail (see also Figure 3). $P_1$ compiles $f$ into a boolean circuit $C_f$, and extracts the mapping $\pi_f$ and the template of private circuit $\tilde{C}_f$. $P_1$ sends $\tilde{C}_f$ to $P_2$. $P_1$ randomly generates the vector $T$. $P_2$ randomly generates the vector $W^0$. They engage in a 2-OEP protocol where $P_2$ learns $S^0$ as the output. With the knowledge of $W^0$, $S^0$ and $\tilde{C}_f$, $P_2$ garbles each gate and sends the garbled circuit to $P_1$. With the knowledge of $\pi_f$, $\tilde{C}_f$, $T$, the garbled circuit and the garbled inputs, $P_1$ evaluates the whole garbled circuit.
In the preparation stage, $P_1$ compiles the function into a boolean circuit $C_f$ consisting of only NAND gates$^{11}$, and extracts the circuit mapping $\pi_f$ by randomly assigning incoming and outgoing wire indices. Both parties need to know in advance the template of private circuit $\tilde{C}_f$, defined as follows:
Definition 4 (Template of Private Circuit ($\tilde{C}_f$)). A template of private circuit $\tilde{C}_f$ is some information about a circuit $C_f$ which consists of: (1) the number of each party's input bits, (2) the number of output bits, (3) the total numbers of incoming ($N$) and outgoing ($M$) wires, (4) the incoming and outgoing wire indices which belong to the same gates, (5) the outgoing wire indices corresponding to each party's inputs, and (6) the incoming wire indices belonging to output gates.
We continue with describing the main parts of our scheme, namely 2-OEP and 2PC garbling protocols. Our complete 2-party PFE protocol is provided in Appendix A.
Use of 2-OEP protocol
Let $w_i^0$ and $w_i^1$ be the tokens for FALSE and TRUE on the $i$th outgoing wire $ow_i \in OW$, respectively, and $R$ be the global free-XOR offset [23] throughout the circuit. $P_2$ sets $w_i^0 \in_R \{0,1\}^{\lambda}$ for each $ow_i$. The blinding string on the $j$th incoming wire $iw_j \in IW$ is denoted as $t_j$. $P_1$ sets $t_j \in_R \{0,1\}^{\lambda}$ for each $iw_j$. $P_1$ and $P_2$ engage in a 2-OEP protocol where $P_1$'s inputs are $\pi_f$ and a blinding vector for incoming wires $T := [t_j]$ for $j = 1, \ldots, N$, and $P_2$'s input is a token vector for FALSE on outgoing wires $W^0 := [w_i^0]$ for $i = 1, \ldots, M$. At the end, $P_2$ learns the vector of blinded strings for FALSE on incoming wires $S^0 := [\sigma_j^0 = w^0_{\pi_f^{-1}(j)} \oplus t_j]$ for $j = 1, \ldots, N$, while $P_1$ learns $\perp$.

$^{11}$ Any functionally complete gate can be used to rule out the need for a gate hiding mechanism as in [6].
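The input/output behaviour of this 2-OEP invocation can be sketched as an ideal functionality, assuming that its output has the form $\sigma_j^0 = w^0_{\pi_f^{-1}(j)} \oplus t_j$. The function name `ideal_2oep` and the 0-based list `pi_inverse` are hypothetical, and $\lambda$-bit strings are modelled as Python integers.

```python
def ideal_2oep(pi_inverse, T, W0):
    """Ideal 2-OEP output for P2.

    pi_inverse[j] is the 0-based index of the outgoing wire that feeds
    incoming wire j; T holds P1's blinding strings and W0 holds P2's
    FALSE tokens. Returns S0 with S0[j] = W0[pi_inverse[j]] XOR T[j].
    """
    return [W0[pi_inverse[j]] ^ T[j] for j in range(len(T))]
```

Because TRUE tokens differ from FALSE tokens by the same offset $R$ on every wire, XORing a TRUE token $w_i^0 \oplus R$ with $t_j$ gives $\sigma_j^0 \oplus R$, so one blinding string per incoming wire covers both truth values.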
Table 5. Adapting the half gates technique to our 2-PFE for garbling an odd gate. The table has two columns: the garbler half gate ($p_b$ known to the garbler) and the evaluator half gate ($p_b \oplus v_b$ known to the evaluator). $P_2$ sends $T_{Gc}$, $T_{Ec}$, and $\psi_c$. Here, $\alpha_1$, $\alpha_2$ and $\alpha_3$ define the gate type (e.g., $\alpha_1 = 0$, $\alpha_2 = 0$ and $\alpha_3 = 1$ for a NAND gate, see Equation (2.1)).

Since our protocol allows all wires in the circuit to have the same offset $R$, unlike [6], $P_1$ needs only a single blinding string $t_j$ for each wire, and $P_2$ does not need to input both tokens $w_i^0$ and $w_i^1$ to the 2-OEP protocol. This leads to a considerable decrease in communication cost compared to [6], in which two blinding strings $t_j^0$ and $t_j^1$ are used for each wire, and both $w_i^0$ and $w_i^1$ are inputs to the OSN protocol (a slightly modified 2-OEP protocol).
Our garbling scheme for 2-PFE
This section presents our garbling scheme based on the half gates technique [16]. Similar to the half gates technique, $P_2$ sets $R \in_R \{0,1\}^{\lambda-1}1$, $w_i^1 \leftarrow w_i^0 \oplus R$ for TRUE on each $ow_i$, and $\sigma_j^1 \leftarrow \sigma_j^0 \oplus R$ for TRUE on each $iw_j$. We have $\mathrm{lsb}(R) = 1$ so that $\mathrm{lsb}(w_i^0) \neq \mathrm{lsb}(w_i^1)$ and $\mathrm{lsb}(\sigma_j^0) \neq \mathrm{lsb}(\sigma_j^1)$. $P_2$ follows the steps in Table 5 in order to garble each odd gate.
We now give some necessary notation. Let $w_c^0$ and $w_c^1$ denote both tokens on an outgoing wire, while $\sigma_a^0, \sigma_a^1, \sigma_b^0, \sigma_b^1$ denote the blinded strings on incoming wires. Let also $v_j$ denote the one-bit truth value on the $j$th incoming wire in a circuit. Further, $w_{Gc}^b$ and $w_{Ec}^b$ denote the tokens for the garbler and the evaluator half gate outputs for truth value $b$, respectively. $T_{Gc}$ and $T_{Ec}$ denote the $\lambda$-bit strings that need to be sent for the garbler and evaluator half gates, respectively. $\psi_c$ denotes the additional $\lambda$-bit string that needs to be sent in order to map to the specific output token. $w_i$ and $\sigma_j$ are the token on the $i$th outgoing wire and the blinded string on the $j$th incoming wire obtained by $P_1$ while evaluating the garbled circuit, respectively. For the $j$th incoming wire, let $p_j := \mathrm{lsb}(\sigma_j^0)$ be a value known only to $P_2$. If two symbols are appended, we imply an AND operation, i.e., $ab = a \wedge b$. $H : \{0,1\}^{\lambda} \times \mathbb{Z} \to \{0,1\}^{\lambda}$ denotes a hash function with circular correlation robustness for naturally derived keys, having the security parameter $\lambda$.
Fig. 4. Our complete half gate based garbling scheme for 2PC. $Gb_{NAND}$ and $Gb^{*}_{NAND}$ are the original half gate and our modified NAND garbling procedures, respectively.
Following the framework$^{12}$ of [27], Figure 4 depicts our complete garbling scheme, composed of the following procedures. The garble procedure $Gb$ takes $1^{\lambda}$, $\tilde{C}_f$, $S^0$ and $W^0$ as input, and outputs $(\hat{F}, \hat{e}, \hat{d})$, where $\hat{F}$ is the garbled version of $\tilde{C}_f$, $\hat{e}$ is the encoding information, and $\hat{d}$ is the decoding information. $Gb$ calls two private gate garbling procedures: (1) $Gb^{*}_{NAND}$ garbles non-output NAND gates, and outputs $(T_G, T_E, \psi)$, (2) $Gb_{NAND}$ garbles output NAND gates, and outputs $(T_G, T_E, Y^0)$. $En$ is the encode algorithm that takes the plaintext input $\hat{x}$ of the circuit and $\hat{e}$ as input, and outputs a garbled input $\hat{X}$. $Ev$ is the evaluate procedure that takes the inputs $\hat{F}$, $\hat{X}$, $\pi_f$ and $T$, and outputs the garbled output $\hat{Y}$. $De$ is the decode algorithm that takes $\hat{Y}$ and $\hat{d}$ as input, and outputs the plaintext output $\hat{y}$ of the circuit.
We highlight that the main difference between our garbling scheme and the half gates technique is that the former requires an additional ciphertext $\psi_c$ per gate. This is required because of the nature of 2-PFE, in which the tokens on an outgoing wire are predetermined values, while in half gates they are a function of the input strings. Since in our scheme the output tokens of output gates are not predetermined, these gates can be garbled with the half gates technique. Each output gate is then garbled with two ciphertexts.
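The role of the additional ciphertext can be illustrated with a toy garbling of a single non-output NAND gate. This is a minimal sketch under several assumptions, not the procedure of Table 5: the half gate equations follow the AND-gate construction of [16], NAND is obtained by folding the offset $R$ into $\psi$, truncated SHA-256 stands in for $H$ (it is not claimed to be circular correlation robust), and all names (`garble_nand`, `evaluate_nand`, `gate_id`) are hypothetical.

```python
import hashlib
import secrets

LAMBDA = 128  # assumed security parameter in bits


def H(token: int, index: int) -> int:
    """Stand-in hash: SHA-256 truncated to LAMBDA bits (an assumption)."""
    data = token.to_bytes(LAMBDA // 8, "big") + index.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[: LAMBDA // 8], "big")


def lsb(x: int) -> int:
    return x & 1


def garble_nand(sigma_a0, sigma_b0, w_c0, R, gate_id):
    """P2 garbles one non-output NAND gate whose FALSE output token w_c0 is
    predetermined. Returns the three ciphertexts (T_G, T_E, psi)."""
    jg, je = 2 * gate_id, 2 * gate_id + 1
    p_a, p_b = lsb(sigma_a0), lsb(sigma_b0)
    # Garbler half gate (p_b known to the garbler).
    T_G = H(sigma_a0, jg) ^ H(sigma_a0 ^ R, jg) ^ (R if p_b else 0)
    w_G0 = H(sigma_a0, jg) ^ (T_G if p_a else 0)
    # Evaluator half gate (p_b XOR v_b known to the evaluator).
    T_E = H(sigma_b0, je) ^ H(sigma_b0 ^ R, je) ^ sigma_a0
    w_E0 = H(sigma_b0, je) ^ ((T_E ^ sigma_a0) if p_b else 0)
    # w_G0 ^ w_E0 encodes AND = FALSE, so NAND = FALSE is that value XOR R.
    # psi links this derived token to the predetermined token w_c0.
    psi = w_G0 ^ w_E0 ^ R ^ w_c0
    return T_G, T_E, psi


def evaluate_nand(sigma_a, sigma_b, T_G, T_E, psi, gate_id):
    """P1 evaluates the gate from one blinded string per incoming wire."""
    jg, je = 2 * gate_id, 2 * gate_id + 1
    w_G = H(sigma_a, jg) ^ (T_G if lsb(sigma_a) else 0)
    w_E = H(sigma_b, je) ^ ((T_E ^ sigma_a) if lsb(sigma_b) else 0)
    return w_G ^ w_E ^ psi
```

For every pair of input truth values, `evaluate_nand` returns `w_c0` when the NAND output is FALSE and `w_c0 ^ R` when it is TRUE, so the evaluator lands on the token that was fixed before the 2-OEP protocol.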
Note also that P 1 gets his own garbled inputs by means of OT. This can also be done in an earlier stage together with other OTs in 2-OEP protocol (if OSN construction is used) in order not to increase round complexity. For this setting, P 2 needs to pick R and compute the tokens for TRUE on P 1 's input wires before 2-OEP protocol. This setting is compatible with our protocol as well.
Fig. 5. Modification of our garbling scheme in Figure 4 for achieving the authenticity property defined in [27].
Security of the proposed protocol
Security of our protocol basically relies on the main line of the security proofs in [5, 6].

Table 6. Analysis of communication costs for 2-party PFE schemes (see Section 3.1 for details of transfers in the OSN phases).

Theorem 1. Given that the 2-OEP protocol used in our protocol is secure in the presence of semi-honest adversaries, and that our garbling scheme based on the half gates technique uses a hash function $H$ with circular correlation robustness for naturally derived keys [16], our two-party PFE protocol is secure in the presence of semi-honest adversaries.
Proof sketch. In accordance with [5, 6], we divide our scheme into two phases: (1) generation of random strings for the wires of each gate (via the 2-OEP protocol), (2) execution of a Yao based 2PC scheme (via our half gates based protocol) using those strings. In the first phase, we utilize the 2-OEP protocol in a black-box manner. The second phase essentially utilizes our half gates based garbling. Under the following two conditions, the security of our protocol depends on the security of the half gates garbling scheme against semi-honest adversaries:
1. The collection of $3g - o$ $\lambda$-bit strings used in garbling (i.e., $2g$ for FALSEs on gate inputs and $g - o$ for FALSEs on non-output gate outputs) must be indistinguishable to $P_2$ from the uniformly random distribution $U_{(3g-o)\lambda}$.
2. $P_1$ cannot have any further knowledge other than a specific linear constraint set between a subset of keys (the tokens on the outgoing wires and the blinded strings on the incoming wires). This set of constraints cannot provide any advantage in obtaining any key values.
In [6] , it is shown that the first phase of the 2-PFE meets the two conditions mentioned above for the set of generated keys. Hence, the security of our overall protocol relies on the security of our half gates based garbling scheme. In what follows, we provide a reduction to the security of half gates technique which is already proven in [16] .
The main difference between our garbling scheme and [16] is the additional third ciphertext $\psi_i$ for each garbled non-output gate. $\psi_i$ is the linear constraint between the output token of a non-output gate ($w_i$), which is chosen uniformly at random, and the half gate output tokens ($w_{Gi}$ and $w_{Ei}$), whose values depend on the input blinded strings. In the view of $P_1$, who does not have any prior information about $w_i$, $\psi_i$ is indistinguishable from $U_{\lambda}$ due to the security of one-time pad encryption. Hence, $\psi_i$ does not give any information about the input tokens of the gate. Therefore, the security of our scheme reduces to that of the half gates technique in terms of privacy and obliviousness [27].
Since ψ_i is not included in an output gate, it does not affect the authenticity of half gates garbling. In accordance with [16], our garbling scheme in Figure 4 can be modified as in Figure 5 to achieve the authenticity property [27].
Complexity Analysis
We analyze the complexity of our protocol and compare it with Mohassel and Sadeghian's 2-party PFE scheme [6]. For a fair comparison, we assume that the 2-OEP protocol of our scheme is also realized by the OSN construction in [6], and that the OSN phases in both protocols are optimized with the general OT extension scheme of Asharov, Lindell, Schneider, and Zohner [21]. Similar results can be obtained with other OT extension schemes based on Ishai et al. [19, 20] as well.
Regarding the OSN phase, the total number of OTs in our 2-PFE protocol is 2N lg N − N + 1, while it is 2N lg N + 1 in [6] (see Section 3.1). Moreover, our protocol requires only one of the tokens on a wire entering the OSN phase, so the size of the rows in Table 4 which enter each OT is reduced by a factor of two [6], further resulting in a significant decrease in communication cost.
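The two OT counts above can be compared numerically with a short sketch. The function names are hypothetical, and the code assumes that N is a power of two so that lg N is an integer; the example size N = 2048 corresponds to a 1024-gate circuit under the assumption N = 2g.

```python
def lg(n: int) -> int:
    """Exact base-2 logarithm; n is assumed to be a power of two."""
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two, at least 2")
    return n.bit_length() - 1


def ot_count_ms13(n: int) -> int:
    """Number of OTs in the OSN phase of [6]: 2N lg N + 1."""
    return 2 * n * lg(n) + 1


def ot_count_ours(n: int) -> int:
    """Number of OTs in the OSN phase of our protocol: 2N lg N - N + 1."""
    return 2 * n * lg(n) - n + 1


if __name__ == "__main__":
    n = 2048  # assumption: N = 2g for a circuit with g = 1024 gates
    saved = ot_count_ms13(n) - ot_count_ours(n)
    print(n, ot_count_ms13(n), ot_count_ours(n), saved)
```

The difference between the two counts is always exactly N, so the relative saving in the number of OTs is roughly 1/(2 lg N). The larger part of the communication saving in this phase comes from the halved row size rather than from the OT count itself.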
Regarding the 2PC phase, our scheme garbles each non-output gate with three ciphertexts, and each output gate with two ciphertexts. This yields more than 25% reduction compared to the same phase in the scheme in [6].

Table 7. Communication cost comparison of 2-party PFE schemes in terms of λ bits.

Table 6 shows the number of strings and their corresponding lengths sent in each turn in both schemes (see also Section 3.1 for details of transfers in the OSN phases). We omit the OTs for P_1's garbled input, the transfers for decoding the garbled output, and the base OTs in the OT extension scheme [21]. The strings sent by P_2 during the OT extension in [6], in fact, consist of 4N lg N − 2N + 2 λ-bit strings and 2N λ-bit strings. The data sent by P_2 before the OT extension can also be sent during P_2's turn in the OT extension to save rounds. Table 7 reflects the communication cost reduction resulting from our 2-PFE protocol for circuits with 2^6, 2^10, 2^14, 2^18, and 2^20 gates.
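The 2PC-phase reduction stated above can be checked with a small sketch. It assumes a circuit with g gates, of which o are output gates, and a baseline of four ciphertexts per garbled gate in [6]; the function names are hypothetical.

```python
from fractions import Fraction


def ciphertexts_ours(g: int, o: int) -> int:
    """Three ciphertexts per non-output gate, two per output gate: 3g - o in total."""
    return 3 * (g - o) + 2 * o


def ciphertexts_baseline(g: int) -> int:
    """Assumed baseline of [6]: four ciphertexts per garbled gate."""
    return 4 * g


def reduction_2pc(g: int, o: int) -> Fraction:
    """Relative saving in the 2PC phase, which simplifies to (g + o) / (4g)."""
    base = ciphertexts_baseline(g)
    return Fraction(base - ciphertexts_ours(g, o), base)
```

Since the saving equals 1/4 + o/(4g), it exceeds 25% whenever the circuit has at least one output gate, and it grows with the fraction of output gates.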
Recently, in [13], Wang and Malluhi attempt to improve the 2-PFE scheme in [6] by removing only one ciphertext from each garbled gate while leaving the cost of the OSN phase unchanged. However, the share of the 2PC phase of [6] in the overall communication cost is already quite low (see Table 7). Reducing the bandwidth use in the 2PC phase by 25% results in less than 1% reduction in the total cost. For instance, given a circuit with 1024 gates, their optimization reduces the communication cost of the 2PC phase from 4,096 λ-bit strings to 3,072 of them, while the OSN phase cost remains 229,38 λ-bits. Therefore, the overall gain from their optimization for this setting is ∼0.4%.
Besides the bandwidth improvements, our protocol also slightly improves on [6] in terms of computational complexity. In the 2PC phase, our scheme requires one additional cryptographic operation per gate in the Ev procedure compared to the garbling scheme in [6], resulting in 0.5N additional operations in total. On the other hand, our protocol requires N fewer OTs in the OSN phase, resulting in 2N fewer symmetric encryptions by P_2, assuming that the smaller input strings to an OT do not change the cost of the cryptographic operations involved. In fact, these costs are asymptotically low compared to the total computation requirements of both schemes, since both require O(N lg N) cryptographic operations.
The round complexity of our scheme does not differ from the 2-PFE scheme in [6] since our protocol still consists of a constant round OT extension scheme in OSN phase, and our half gate based garbling scheme in 2PC phase consists of the same number of rounds as in the basic garbling scheme used in [6] .
Conclusion
In this paper, we proposed an efficient and secure protocol for 2-PFE. The motivation behind our work is that channel bandwidth is the main constraint for many secure computation applications, including those for PFE. Our optimization significantly improves Mohassel and Sadeghian's 2-PFE scheme [6] in both the OSN and 2PC phases in terms of communication complexity. In particular, in the OSN phase, our protocol reduces the number of required OTs and the data sizes entering the protocol. In the 2PC phase, our half gates based scheme garbles each non-output gate with three ciphertexts, and each output gate with two ciphertexts. All in all, our protocol improves the state of the art by saving more than 40% of the overall communication cost.
We conclude with the following two open questions:
1. Although the 2-OEP protocol in [6], which we utilize in our protocol, is quite efficient for many circuit sizes, it fails to be so for large circuits due to its O(g lg g) complexity. This raises the following question: Can we have a 2-OEP protocol that has linear asymptotic complexity while also being efficient for small circuit sizes?
2. Our 2-PFE protocol permits only one gate functionality (e.g., NAND or NOR) in a boolean circuit. This yields another important future challenge: Can we have a gate hiding mechanism in 2-PFE schemes permitting the use of various gates in logic circuit representations?
