Abstract. This paper deals with the differential power attack on a pairing cryptoprocessor. The cryptoprocessor is designed for pairing computations on elliptic curves defined over finite fields of large prime characteristic. The work pinpoints the vulnerabilities of such pairing computations to side-channel attacks. By exploiting the power consumption, the paper experimentally demonstrates this vulnerability on an FPGA platform. A suitable countermeasure is also suggested to overcome it.
Introduction
Bilinear pairing, or simply pairing, is a new and increasingly popular way of constructing cryptographic protocols. This has resulted in the development of pairing-based schemes such as identity-based encryption (IBE), which are ideally suited to identity-aware devices. The security of such devices depends on the security of the pairing computations. In the last decade, an increasingly popular form of attack known as the side-channel attack (SCA) [3, 4], which exploits weaknesses in implementations, has developed. SCA breaks a cryptosystem by analyzing information that can be measured through some covert channel of a cryptoprocessor, such as power consumption, timing, electromagnetic radiation, or faults.
Pairing can be computed over fields of different characteristic, such as binary ($\mathbb{F}_{2^m}$), ternary ($\mathbb{F}_{3^m}$), and large prime ($\mathbb{F}_p$). The security of pairing computations over the first two fields against differential power analysis (DPA) attacks has been described in [9] and [5], respectively. However, the security of pairing computations over prime fields against side-channel attacks has not been considered before.
This paper explores the side-channel vulnerability of pairing computations on an FPGA platform. One of the popular pairing-friendly elliptic curves defined over $\mathbb{F}_p$ is the Barreto-Naehrig curve (BN curve) [11]. A dual-core pairing cryptoprocessor for BN curves has been developed on an FPGA platform. The paper proposes an optimized parallel scheduling of the underlying finite field operations for Tate pairing computations on the cryptoprocessor. It further examines the mathematical formulae of the different steps of the pairing computation and pinpoints their vulnerability to side-channel attacks. The paper then describes a differential power analysis (DPA) technique based on this vulnerability. The DPA attack has been mounted on an FPGA platform and recovers the secret parameter of the pairing computation. Finally, the paper proposes a suitable computation technique for counteracting the vulnerability.
The paper is organized as follows: section 2 provides the mathematical background of pairing computation. The pairing cryptoprocessor over a prime field is described in section 3. The vulnerability of pairing computation over prime fields is pointed out in section 4. The proposed DPA attack and its countermeasure are described in section 5. The paper is concluded in section 6.
Mathematical Background
A pairing is a bilinear map that takes a pair of elements of a group (say $G_1$) to an element of another group (say $G_2$). Pairings for cryptographic applications use an additive group defined over elliptic or hyperelliptic curves as $G_1$ and a multiplicative group defined over a finite field as $G_2$ [7]. The map also satisfies two important properties, bilinearity and non-degeneracy. Sometimes the pairing is computed on two elements from two different additive groups (say $G_1$ and $G_2$) and maps to an element of a multiplicative group $G_3$. In that case the groups $G_1$ and $G_2$ are in general formed by an elliptic curve over $\mathbb{F}_q$ and $\mathbb{F}_{q^k}$, respectively, where $k$ is known as the embedding degree of the elliptic curve. The security of a pairing is based on the difficulty of solving the discrete logarithm problem in $G_1$, $G_2$, and $G_3$.
The computational efficiency of such a bilinear map is also an important factor for cryptographic applications. Cryptographic pairings are efficiently computed by Miller's algorithm [1, 2], which is shown in Alg. 1. More specifically, this algorithm shows the computation of the Tate pairing. Several optimizations of this algorithm have been presented in [8]. The resulting algorithm proposed in [8] is called the BKLS algorithm for Tate pairing computation. Other pairings such as ate and R-ate are computed in a similar way, using parameters other than $r$ and interchanging the input points [13].
The underlying elliptic curve plays an important role in achieving computational efficiency and security of a pairing computation. Finding such pairing-friendly elliptic curves is an active area of research. One of the most popular families of pairing-friendly elliptic curves is the Barreto-Naehrig curves (BN curves) [11]. A BN curve is defined over a large prime field and has embedding degree 12. Thus $G_1$ and $G_2$ in Alg. 1 are additive elliptic curve groups defined over $\mathbb{F}_p$ and $\mathbb{F}_{p^{12}}$, respectively. The pairing value $t_r(P, Q) = f \in G_3$, where $G_3$ is a multiplicative group defined over $\mathbb{F}_{p^{12}}$. For achieving 128-bit security the BN curve is defined over a 256-bit prime field.
Algorithm 1: Computing the Tate pairing.
Input: $P \in G_1$ and $Q \in G_2$.
Output: $t_r(P, Q)$.
The BN curves also admit a sextic twist [13], which means that the point $Q$ in Alg. 1 is mapped to a point $Q'$ defined over $\mathbb{F}_{p^2}$. Thus, the line functions $l_{T,T}(Q)$ and $l_{T,P}(Q)$ are computed over $\mathbb{F}_{p^2}$ instead of $\mathbb{F}_{p^{12}}$. The values of the line functions are represented as :
, and a quadratic non-residue W over F p 2 . The Miller function f is computed over F p 12 , which is represented as :
, and f · l T,P (Q) are performed on F p 12 . Whereas all other computations are performed on F p and F p 2 .
The detailed procedure of pairing computation, including the final exponentiation on BN curves, is described in [13] and [14]. Another efficient way of computing the final exponentiation is described in [15]. We use Jacobian coordinates for performing elliptic curve operations, where a point $(X, Y, Z)$ corresponds to the point $(x, y)$ in affine coordinates with $x = X/Z^2$ and $y = Y/Z^3$.
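The coordinate correspondence above can be illustrated with a minimal Python sketch. The function names and the toy prime used in the check are assumptions for illustration only; the cryptoprocessor works over a 256-bit prime.

```python
def affine_to_jacobian(x, y, p, Z=1):
    """Lift affine (x, y) to Jacobian (X, Y, Z) with X = x*Z^2, Y = y*Z^3."""
    return x * Z * Z % p, y * pow(Z, 3, p) % p, Z % p


def jacobian_to_affine(X, Y, Z, p):
    """Map Jacobian (X, Y, Z) to affine (x, y) with x = X/Z^2, y = Y/Z^3."""
    if Z % p == 0:
        raise ValueError("the point at infinity has no affine representation")
    z_inv = pow(Z, -1, p)            # modular inverse, Python 3.8+
    z_inv2 = z_inv * z_inv % p
    x = X * z_inv2 % p               # X / Z^2
    y = Y * z_inv2 * z_inv % p       # Y / Z^3
    return x, y
```

Only one field inversion is needed for the conversion, which is why intermediate points are usually kept in Jacobian form until the end.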
Pairing Cryptoprocessor (PCP)
The major operations of a pairing computation are point doubling (PD), point addition (PA), line computation ($l(Q)$), $f^2$, and $f \cdot l(Q)$. In the case of the Tate pairing on a BN curve, PA and PD are performed in $\mathbb{F}_p$. Similarly, the operation $l(Q)$ is performed in $\mathbb{F}_{p^2}$, while the other two operations are performed in $\mathbb{F}_{p^{12}}$. However, the operations in these extension fields consist of a set of operations in the underlying $\mathbb{F}_p$.
The current work explores the side-channel vulnerabilities of a pairing cryptoprocessor (PCP). Therefore, instead of designing a new architecture from scratch, we implement the pairing cryptoprocessor proposed in [16] on an FPGA platform. We first implement a programmable core for computing all necessary $\mathbb{F}_p$ operations. Based on this programmable core we design a cryptoprocessor for pairing computation on the FPGA platform. The design consists of two programmable cores, which exploit the parallelism of Miller's algorithm. Each programmable core can perform operations in $\mathbb{F}_p$ and $\mathbb{F}_{p^2}$.
We follow the formulae and algorithms for the computation of the asymmetric Tate pairing given in [13]. The major steps in the pairing algorithm (Alg. 1) are the Miller function and the final exponentiation. The Miller function consists of two major steps, namely the doubling step and the addition step. Here, we discuss the computation of these steps for the Tate pairing over a BN curve on our dual-core PCP.
The Tate pairing ($t_r$) over a BN curve takes input points $P$ and $Q$ over $\mathbb{F}_p$ and $\mathbb{F}_{p^2}$, respectively. The parameter $r$ is a 256-bit prime of Hamming weight 91. Thus, the Miller algorithm runs for 255 iterations, with 255 doubling steps and 90 addition steps. There are sufficient independent operations within the doubling and addition steps that can be performed in parallel. Our dual-core PCP has a fixed number of functional units. Therefore, the schedule can be optimized for the available functional units and the operations. In the following subsections, we describe an optimized scheduling of the above steps on the PCP.
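The relation between the binary expansion of $r$ and the number of doubling and addition steps can be sketched as follows. This is only a skeleton of a left-to-right Miller loop that records which step runs; the function names are hypothetical, and the polynomial $r(z) = 36z^4 + 36z^3 + 18z^2 + 6z + 1$ is the standard BN parameterisation, assumed here rather than taken from the text. The value of $z$ is the one quoted in the final exponentiation subsection.

```python
def miller_loop_schedule(r):
    """List the steps of a left-to-right Miller loop over the bits of r."""
    bits = bin(r)[2:]
    steps = []
    for bit in bits[1:]:            # the leading 1 only initialises T = P, f = 1
        steps.append("double")      # f <- f^2 * l_{T,T}(Q),  T <- 2T
        if bit == "1":
            steps.append("add")     # f <- f * l_{T,P}(Q),    T <- T + P
    return steps


def bn_group_order(z):
    """Assumed BN parameterisation of the group order r(z)."""
    return 36 * z**4 + 36 * z**3 + 18 * z**2 + 6 * z + 1


if __name__ == "__main__":
    r = bn_group_order(0x6000000000001F2D)
    steps = miller_loop_schedule(r)
    print("bit length:", r.bit_length())
    print("doubling steps:", steps.count("double"))
    print("addition steps:", steps.count("add"))
```

For an $n$-bit $r$ of Hamming weight $h$, the skeleton performs $n - 1$ doubling steps and $h - 1$ addition steps, which matches the counts quoted above for $n = 256$ and $h = 91$.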
Computation of Doubling Step
The doubling step consists of the following computations.
• The point doubling (2T ) operation.
• The computation of tangent line at point T (l T,T (Q)).
• The squaring of Miller function (f 2 ).
• The multiplication of Miller function with line function (f · l T,T (Q)).
The computations of $2T$, $l_{T,T}(Q)$, and $f^2$ are performed in parallel on our PCP. In Jacobian coordinates, doubling a point $T = (X, Y, Z)$ gives $2T = (X_3, Y_3, Z_3)$, and the tangent line $l_{T,T}$ is evaluated at $Q = (x, y)$. In the case of Tate pairing computation on BN curves, $x, y \in \mathbb{F}_{p^2}$ and $X, Y, Z, X_3, Y_3, Z_3 \in \mathbb{F}_p$. Let $x$ and $y$ be represented as $x_0 + x_1U$ and $y_0 + y_1U$, where $x_0, x_1, y_0, y_1 \in \mathbb{F}_p$ and $U$ is an indeterminate. The above operations are performed by one of the programmable cores of the dual-core PCP in the following way.
In the above scheduling, the nonlinear $\mathbb{F}_p$ operations are performed in instructions 1, 2, 8, 10, 13, and 14. If we assume that an $\mathbb{F}_p$ squaring ($s$) costs about the same as an $\mathbb{F}_p$ multiplication ($m$), then the cost of the above operations is $6m$ on one programmable core of our dual-core PCP. At the same time the other core starts the computation of $f^2$.
The equivalent representations of f are :
The computation of $c = f^2$ is performed in $\mathbb{F}_{p^{12}}$ using the complex method in the following way.
where $v, c_0, c_1 \in \mathbb{F}_{p^6}$ and $\beta$ is a quadratic non-residue in $\mathbb{F}_{p^6}$. It requires two $\mathbb{F}_{p^6}$ multiplications. One $\mathbb{F}_{p^6}$ multiplication is performed in the tower field $\mathbb{F}_{(p^2)^3}$ using the Karatsuba technique with six multiplications in $\mathbb{F}_{p^2}$. Let an element $a_i \in \mathbb{F}_{p^2}$ be represented as $a_i = a_{i0} + a_{i1}U$, $a_{ij} \in \mathbb{F}_p$. The computation of $v = f_0 \cdot f_1$ on a programmable core is as follows: The result $v \in \mathbb{F}_{p^6}$ is represented as :
2 , where v ij ∈ F p . In the above computation, steps 1, 2, 3, 6, 13, 20 perform multiplications in F p 2 . Thus the cost of v = f 0 · f 1 is 6m, which is computed in parallel with 2T , l T,T (Q) by the proposed PCP.
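The complex squaring method referred to above can be sketched in a toy setting. The code below assumes $f = f_0 + f_1 w$ with $w^2 = \beta$ and uses the small prime field $\mathbb{F}_{103}$ as a stand-in for $\mathbb{F}_{p^6}$, with $\beta = -1$; these choices, and the function names, are illustrative assumptions. It shows that two multiplications, $v = f_0 f_1$ and $(f_0 + f_1)(f_0 + \beta f_1)$, are enough to obtain $c = f^2 = c_0 + c_1 w$.

```python
P = 103          # toy base field, stand-in for F_{p^6} (assumption)
BETA = P - 1     # beta = -1 is a quadratic non-residue because P = 3 (mod 4)


def square_schoolbook(f0, f1):
    """(f0 + f1*w)^2 with w^2 = BETA, computed directly."""
    c0 = (f0 * f0 + BETA * f1 * f1) % P
    c1 = (2 * f0 * f1) % P
    return c0, c1


def square_complex(f0, f1):
    """Same result using only two base-field multiplications."""
    v = f0 * f1 % P                          # first multiplication
    t = (f0 + f1) * (f0 + BETA * f1) % P     # second multiplication
    c0 = (t - v - BETA * v) % P              # f0^2 + beta*f1^2
    c1 = 2 * v % P                           # 2*f0*f1
    return c0, c1
```

The remaining operations are additions and multiplications by the constant $\beta$, which are cheap compared with a full multiplication in the base field of the extension.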
The second F p 6 multiplication, i.e., the computation of (f 0 + f 1 )(f 0 + βf 1 ) is performed by both the programmable cores, which costs only 3m in the PCP. Therefore, the total cost of computing 2T , l T,T (Q), and f 2 by the PCP is 9m. The l(Q) is represented as :
The computation of $f \cdot l(Q)$ is performed in the tower field $\mathbb{F}_{((p^2)^3)^2}$ in the following way.
The top most extension is quadratic. Thus the computation of f · l(Q) is done by three F p 6 multiplications, which are identified as :
One multiplication in F p 6 using Karatsuba method requires 18 F p multiplications. However, due to the sparse representation of l(Q) the cost of computing t 1 i , 1 ≤ i ≤ 3 is lesser than the actual costs of three F p 6 multiplications. Both the equations for t 1 1 and t 1 3 require only 14 F p multiplications. In our parallel cryptoprocessor the above two equations are computed in parallel on two programmable cores, which costs 5m. The computation of t 1 2 requires only nine F p multiplications, which is performed on both the cores and it costs only 2m. Therefore, the computation of f ·l(Q) requires 37 F p multiplications, which costs only 7m in our PCP. Therefore, the total cost for computing the doubling step (the computation of 2T, l T,T (Q), f 2 , and f ·l(Q)) of the Miller algorithm for Tate pairing on BN curve is 9m + 7m = 16m.
Computation of Addition Step
The addition step consists of the computations of T + P , l T,P (Q), and f · l T,P (Q). The formulae for mixed Jacobian-affine addition are the following: if T = (X 1 , Y 1 , Z 1 ) is in Jacobian coordinates and P = (X 2 , Y 2 ) is in affine coordinates, then T + P = (X 3 , Y 3 , Z 3 ) where
During the addition step of the Miller algorithm we compute the above operations in parallel on both cores. There are only a limited number of independent operations in this step. Therefore, there is scope for optimizing the scheduling of operations on the $\mathbb{F}_p$ arithmetic units to reduce the additional registers and related wiring. The respective scheduling is shown here.
In the above scheduling, the nonlinear operations (multiplication and squaring) in F p are performed in steps 1, 2, 4, 5, and 8. Thus, the cost of computing T + P , l T,P (Q) is 5m in the PCP. This computation is followed by f · l(Q), which costs 7m. Therefore, the cost for evaluating the addition step is 5m + 7m = 12m in the PCP.
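As a rough tally, the per-step costs derived in this section can be combined with the step counts of the Miller algorithm. The sketch below is only arithmetic on figures already stated (16m per doubling step, 12m per addition step, 255 and 90 steps); the function name is hypothetical, the unit $m$ is the cost unit used in the text, and the final exponentiation is not included.

```python
DOUBLING_STEP_COST = 9 + 7   # (2T, l_{T,T}(Q), f^2) plus f*l(Q), in units of m
ADDITION_STEP_COST = 5 + 7   # (T+P, l_{T,P}(Q)) plus f*l(Q), in units of m


def miller_loop_cost(doublings=255, additions=90):
    """Total Miller-loop cost in units of m, excluding the final exponentiation."""
    return doublings * DOUBLING_STEP_COST + additions * ADDITION_STEP_COST


if __name__ == "__main__":
    print(miller_loop_cost(), "m")   # 255*16 + 90*12 = 5160
```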
Computation of Final Exponentiation
The final exponentiation is computed in the following way. It follows the optimization of factoring $(p^{12} - 1)/r$ into three parts [14] and computes $f^{(p^{12}-1)/r}$ as $f^{(p^{12}-1)/r} = \bigl(f^{(p^6-1)(p^2+1)}\bigr)^{(p^4-p^2+1)/r}$.
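The three-part split of the exponent can be checked numerically. The sketch below assumes the standard BN parameterisation $p(z) = 36z^4 + 36z^3 + 24z^2 + 6z + 1$ and $r(z) = 36z^4 + 36z^3 + 18z^2 + 6z + 1$, which is not spelled out in the text, and uses the factorisation $p^{12} - 1 = (p^6 - 1)(p^2 + 1)(p^4 - p^2 + 1)$; the function names are hypothetical.

```python
def bn_params(z):
    """Assumed BN polynomials for the field prime p and the group order r."""
    p = 36 * z**4 + 36 * z**3 + 24 * z**2 + 6 * z + 1
    r = 36 * z**4 + 36 * z**3 + 18 * z**2 + 6 * z + 1
    return p, r


def exponent_parts(p, r):
    """Split (p^12 - 1)/r into two easy parts and one hard part."""
    easy1 = p**6 - 1
    easy2 = p**2 + 1
    hard, remainder = divmod(p**4 - p**2 + 1, r)
    if remainder != 0:
        raise ValueError("r does not divide p^4 - p^2 + 1")
    return easy1, easy2, hard


if __name__ == "__main__":
    p, r = bn_params(0x6000000000001F2D)
    easy1, easy2, hard = exponent_parts(p, r)
    print(easy1 * easy2 * hard * r == p**12 - 1)
```

The first two parts are called easy because powers of $p$ act as Frobenius maps; only the last part requires a genuine exponentiation.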
The computation is done by following way:
where z is a BN parameter and we choose z = 6000000000001F2D (in hexadecimal). Table 1 lists the operation costs of final exponentiation on the PCP. The power of (p
2 is an easy exponentiation, which is performed by a conjugation (Frobenius) and a division [15, 10] . The operation f
Thus, f p
Side-channel Vulnerability
Page and Vercauteren [5] presented SPA and DPA attacks on pairing computations performed by the Duursma-Lee algorithm [6] and the BKLS algorithm [8] over $\mathbb{F}_{3^m}$. A power analysis attack on the $\eta_T$ pairing computation over $\mathbb{F}_{2^m}$ is described by Kim et al. in [9]. However, the same has not been studied so far in the case of $\mathbb{F}_p$. This section investigates the security of pairing computations over $\mathbb{F}_p$ against power analysis attacks.
Weakness of Pairing Computations in F p
In the decryption step of identity-based encryption schemes, a dominant operation is $e(U, S_{ID})$, where $S_{ID}$ is the fixed secret key and $U$ is a part of the ciphertext [17]. In this case a side-channel attack may try to extract the secret key from the pairing computation by repeatedly manipulating $U$. The Tate pairing over $\mathbb{F}_p$ consists of elliptic curve group operations (ECD and ECA), the line functions, and the Miller function [13]. The line functions, as defined by Chatterjee et al. [12], use both the public point $U$ and the private point $S_{ID}$. The formulae of the line functions are built from the underlying $\mathbb{F}_p$ primitives. During the addition step of the Tate pairing computation the formula of the line function is l(x, y) [12]. In pairing-based cryptographic schemes, the point $T = (X_1, Y_1, Z_1)$ is the intermediate result of the current point doubling operation, the point $U = (X_2, Y_2)$ is used as a public parameter (it could be the plaintexts or messages), and $S_{ID} = (x, y)$ is used as the private key. The resulting point $T + U$ is represented by $(X_3, Y_3, Z_3)$. Therefore, in such a scheme the operations $(x - X_2)$ and $(y - Y_2)$ could be exploited through side-channel attacks to find the $x$- and $y$-coordinates of the secret point.
Proposed DPA Attack
In this section, we investigate a differential power analysis (DPA) attack against the subtraction $(x - X_2)$ used in the Tate pairing on elliptic curves over $\mathbb{F}_p$, where $x$ is secret and $X_2$ is public and known to, or even chosen by, the attacker. The subtraction $(x - X_2)$ in $\mathbb{F}_p$ is computed by first computing $S = x - X_2$ and then reducing the result (if required) by adding $p$ to $S$. Let us assume that all operations are performed on 2's complement numbers. Therefore, the subtraction $S = x - X_2$ could be performed as:
$$S = x + \bar{X}_2 + 1 = \sum_{i=0}^{k-1} x_i 2^i + \sum_{i=0}^{k-1} \bar{X}_{2i} 2^i + 1,$$
where $k$ represents the bit length of the operands $(x, X_2)$ and $\bar{X}_{2i}$ corresponds to the 1's complement of $X_{2i}$. The subtraction starts from the least significant bit (LSB) and computes the sum and carry bits iteratively. The formula for the $i$-th carry bit is: $c_i = x_i \bar{X}_{2i} \oplus x_i c_{i-1} \oplus \bar{X}_{2i} c_{i-1}$, with $c_{-1} = 1$. Similarly, the $i$-th sum bit is computed as: $s_i = x_i \oplus \bar{X}_{2i} \oplus c_{i-1}$.
The proposed DPA attack works in the following way. The attacker first collects the power consumption traces for $n$ randomly chosen public points $U$.
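The bit-serial subtraction described in this paragraph can be sketched in Python. The function name is hypothetical; the sum bit is taken to be the usual full-adder sum $s_i = x_i \oplus \bar{X}_{2i} \oplus c_{i-1}$, and the initial carry $c_{-1} = 1$ supplies the "+1" of the 2's complement, consistent with the experiment described later.

```python
def subtract_bit_serial(x, X2, k):
    """Compute S = x - X2 (mod 2^k) as x + ~X2 + 1, one bit at a time.

    Returns (S, carry_out); carry_out is 1 exactly when x >= X2.
    """
    carry = 1                                   # c_{-1} = 1
    s = 0
    for i in range(k):
        xi = (x >> i) & 1
        nxi = 1 - ((X2 >> i) & 1)               # 1's complement bit of X2
        si = xi ^ nxi ^ carry                   # i-th sum bit
        carry = (xi & nxi) ^ (xi & carry) ^ (nxi & carry)   # i-th carry bit
        s |= si << i
    return s, carry
```

Each sum bit depends only on $x_i$, the public bit $\bar{X}_{2i}$, and a carry determined by the lower bits, which is what allows the secret to be recovered one bit at a time starting from the LSB.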
We consider the simplified Hamming weight model for power leakage [18]. In this model, power consumption depends on the Hamming weight of the data being processed. Thus, we can express the power consumption $W$ as:
$$W = \epsilon H + \eta,$$
where $H$, $\epsilon$, and $\eta$ represent the Hamming weight of the intermediate data, the incremental amount of power for each extra 1 in the Hamming weight, and the noise, respectively. We assume that the average of the noise $\eta$ is zero. Let $W$ be the power consumption associated with the subtraction operation $(x - X_2)$. We start from the LSB and iteratively find all bits of the $x$-coordinate of the secret point $S_{ID} = (x, y)$. To recover the $i$-th bit of $x$, we guess that $x_i = 0$ and divide the power consumptions into two sets according to $\bar{X}_{2i} \oplus c_{i-1}$: the set $P_1$ contains the traces with $\bar{X}_{2i} \oplus c_{i-1} = 1$ and the set $P_0$ contains those with $\bar{X}_{2i} \oplus c_{i-1} = 0$.
Thus, the differential power consumption is:
$$\Delta = \langle P_1 \rangle - \langle P_0 \rangle.$$
If the guess is correct ($x_i = 0$), then the averages of $P_1$ and $P_0$ are $\epsilon(M + 1)/2$ and $\epsilon(M - 1)/2$, respectively, where $M$ corresponds to the bit length of $S$. Thus, if $\Delta > 0$, we know that $x_i = 0$. Otherwise, the averages of $P_1$ and $P_0$ are $\epsilon(M - 1)/2$ and $\epsilon(M + 1)/2$, so $\Delta < 0$ indicates that $x_i = 1$. There should be a positive peak when $x_i = 0$ and a negative peak when $x_i = 1$.
In summary, since the subtraction operation (x − X 2 ) of line function in pairing computation is vulnerable to the proposed attack, we can recover x. Next, we can obtain the value of y-coordinate of the secret point S ID by solving the curve equation.
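The bit-by-bit recovery can be illustrated with a small simulation. This is a sketch under stated assumptions, not the measurement setup of the paper: traces are single values generated from the Hamming weight model $W = \epsilon H + \eta$ with Gaussian noise, the leaking intermediate is the $k$-bit result $S = x - X_2 \bmod 2^k$ (the conditional addition of $p$ is ignored), and the function names, the 16-bit toy secret, and the noise level are all hypothetical.

```python
import random


def hamming_weight(v):
    return bin(v).count("1")


def simulate_traces(x, k, n, eps=1.0, sigma=1.0, seed=1):
    """Return n pairs (X2, W) with W = eps*HW(x - X2 mod 2^k) + noise."""
    rng = random.Random(seed)
    mask = (1 << k) - 1
    traces = []
    for _ in range(n):
        X2 = rng.getrandbits(k)                    # random public operand
        S = (x - X2) & mask                        # leaking intermediate
        W = eps * hamming_weight(S) + rng.gauss(0.0, sigma)
        traces.append((X2, W))
    return traces


def recover_secret(traces, k):
    """Recover x from the LSB upward using the difference of means."""
    known = 0                                      # bits of x found so far
    for i in range(k):
        low = (1 << i) - 1
        p1, p0 = [], []
        for X2, W in traces:
            # carry into bit i, computed from the already recovered low bits
            carry_in = ((known & low) + (~X2 & low) + 1) >> i
            not_x2_i = 1 - ((X2 >> i) & 1)
            # selection bit: the sum bit s_i under the guess x_i = 0
            if not_x2_i ^ carry_in:
                p1.append(W)
            else:
                p0.append(W)
        delta = sum(p1) / len(p1) - sum(p0) / len(p0)
        if delta < 0:                              # negative peak: x_i = 1
            known |= 1 << i
    return known


if __name__ == "__main__":
    secret = 0xB3C5
    traces = simulate_traces(secret, k=16, n=2000)
    print(hex(recover_secret(traces, 16)))
```

The same set of traces is reused for every bit; only the partition changes, because the carry into bit $i$ is recomputed from the bits already recovered.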
Mounting the DPA on FPGA Platform
We perform the actual DPA attack on the aforementioned pairing cryptoprocessor (PCP). The PCP is implemented on a customized FPGA board for power analysis. We put a one-ohm resistor between the VCCint pin of the FPGA chip and the on-board voltage regulator. We measure the current drawn through that resistor during the pairing computation with a current probe. The probe is a Tektronix current probe (serial number B014316). We use the probe with a TCPA300 power amplifier in standby mode. The measured power is displayed and stored on a Tektronix TDS5032B Digital Phosphor Oscilloscope. We developed software tools to automate the whole process for varying inputs. The power consumption is measured in mV and varies around ±5 mV. The power signal is sampled at 12.5 MS/s.
We choose an $x$ with $x_0 = 0$ and perform $(x - X_2)$ 2000 times with 2000 different randomly chosen $X_2$. The respective power consumptions are stored in 2000 one-dimensional vectors. We then partition the power vectors into two sets, namely $P_1$ and $P_0$. A vector is in set $P_1$ if $\bar{X}_{20} \oplus c_{-1} = 1$, i.e., $X_{20} = 1$. Otherwise, the vector is in set $P_0$. To compute the differential power consumption we subtract the average (mean) of the $P_0$ vectors from the average of the $P_1$ vectors. We call this differential power consumption vector the difference-of-means, represented by $\Delta$. Then we accumulate the samples of $\Delta$ and plot them. The respective difference-of-means is depicted in Fig. 1(a), which shows a positive peak as expected for $x_0 = 0$. The same experiment has been repeated for another $x$ with $x_0 = 1$. The difference-of-means in this case is plotted in Fig. 1(b). In this case the expectation of $\langle P_1 \rangle - \langle P_0 \rangle$ is negative, and we obtained the expected result with 2000 random $X_2$.
The above experimental result shows that an attacker can easily mount the DPA attack on pairing computation over $\mathbb{F}_p$. After finding the LSB, the DPA can be performed for the second LSB, and so on. The same power traces can be reused for finding all secret bits. The partitioning of the power vectors into two sets depending on the current value of $(\bar{X}_{2i} \oplus c_{i-1})$, up to the generation of the difference-of-means, is repeated for each of the secret bits. Thus, the above DPA attack iteratively finds all bits of the $x$-coordinate of the secret $S_{ID}$. After obtaining the $x$-coordinate, the value of the $y$-coordinate can be obtained easily by solving the underlying elliptic curve equation.
Proposed Counteracting Technique
In the pairing computation, the secret point is only used for computing the line functions. The formula of the line function during doubling step of the Miller algorithm over F p is as follows:
where $T = (X, Y, Z)$ is the intermediate result point of the Miller algorithm and $2T = (X_3, Y_3, Z_3)$ [12]. The formula of $l_{T,T}(x, y)$ uses the secret point $S_{ID} = (x, y)$ of identity-based encryption (IBE) [17], but it does not use the public point $U = (X_2, Y_2)$. Therefore, this function cannot be exploited by the side-channel attack described above.
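For reference, one standard form of the tangent line in Jacobian coordinates can be derived from the affine slope. The derivation below is a sketch that assumes a curve of the form $y^2 = x^3 + b$ (as for BN curves) and writes $x_T = X/Z^2$, $y_T = Y/Z^3$; the exact arrangement of terms used in [12] may differ.

```latex
\lambda = \frac{3x_T^2}{2y_T} = \frac{3X^2}{2YZ}, \qquad Z_3 = 2YZ,
\\[4pt]
l_{T,T}(x, y) = Z_3 Z^2 \bigl( y - y_T - \lambda (x - x_T) \bigr)
             = Z_3 Z^2\, y - 2Y^2 - 3X^2 \bigl( Z^2 x - X \bigr).
```

In this form the secret coordinates $x$ and $y$ are combined only with quantities derived from the intermediate point $T$, and the coordinates $X_2$, $Y_2$ of the public point do not appear.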
The second line function l T,P (x, y) is computed during the addition step of the Miller algorithm. In IBE scheme P is replaced by U . The formula of l T,P (x, y) is:
where $T = (X_1, Y_1, Z_1)$ is the intermediate result of the doubling step and $(X_3, Y_3, Z_3)$ represents the result of the addition $T + U$. In this line computation formula both the public point $U = (X_2, Y_2)$ and the private point $S_{ID} = (x, y)$ are used. The computation of $l_{T,U}(x, y)$ is the main weakness of pairing computation over $\mathbb{F}_p$ against side-channel attacks. The DPA attack described above can easily find the $x$- and $y$-coordinates of the private point $S_{ID}$ by exploiting the above formula.
The main drawback of the above formula is that the public and private parameters are directly combined in a single $\mathbb{F}_p$ operation. A side-channel attack thus exploits the respective $\mathbb{F}_p$ operation to find the secret bits by manipulating the public parameter $U$. To protect this computation against side-channel attacks, it could be computed in the following way.
The above computation technique does not have any $\mathbb{F}_p$ primitive that is performed on one public parameter and one private parameter. The attacker may try to exploit the power consumption of the cryptoprocessor during the computation of $l_{T,P}(x, y)$. The private parameter $x$ in the above formula is multiplied by an unknown parameter $(Y_2 Z_1^3 - Y_1)$. Therefore, no difference-of-means can be computed for identifying the secret bits of $x$.
The second secret parameter $y$ is multiplied by $Z_3$ in the modified computation of $l_{T,P}(x, y)$. The parameter $Z_3$ is computed as $Z_3 = Z_1(X_2 Z_1^2 - X_1)$, so $Z_3$ is unknown because the temporary point $T = (X_1, Y_1, Z_1)$ is unknown. Therefore, no difference-of-means value can be computed based on specific bits of $Z_3$ for identifying the secret bits of $y$. Thus, the proposed countermeasure protects both the $x$- and $y$-coordinates of the secret point $S_{ID}$, which secures the pairing computation against the DPA attack.
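The rearrangement argued for above can be sketched as follows. The two expressions in the code are inferred from the surrounding description (the operations $(x - X_2)$ and $(y - Y_2)$ in the vulnerable form, and the products of $x$ with $(Y_2 Z_1^3 - Y_1)$ and of $y$ with $Z_3$ in the protected form) together with the slope of the line through $T$ and $U$; they are an assumption about the form of the line function, not a transcription of it. The toy prime and the function names are likewise illustrative.

```python
P = 10007   # toy prime standing in for the 256-bit field prime (assumption)


def _shared_terms(X1, Y1, Z1, X2, Y2):
    """Quantities built only from the intermediate point T and the public point U."""
    Z3 = Z1 * (X2 * Z1 * Z1 - X1) % P           # Z3 = Z1*(X2*Z1^2 - X1)
    R = (Y2 * pow(Z1, 3, P) - Y1) % P           # R  = Y2*Z1^3 - Y1
    return Z3, R


def line_vulnerable(x, y, X1, Y1, Z1, X2, Y2):
    """(y - Y2)*Z3 - R*(x - X2): secret and public operands meet in a subtraction."""
    Z3, R = _shared_terms(X1, Y1, Z1, X2, Y2)
    return ((y - Y2) * Z3 - R * (x - X2)) % P


def line_protected(x, y, X1, Y1, Z1, X2, Y2):
    """Expanded form: each secret coordinate is first multiplied by Z3 or R."""
    Z3, R = _shared_terms(X1, Y1, Z1, X2, Y2)
    t1 = y * Z3 % P      # secret y times Z3
    t2 = Y2 * Z3 % P     # public operands only
    t3 = x * R % P       # secret x times (Y2*Z1^3 - Y1)
    t4 = X2 * R % P      # public operands only
    return (t1 - t2 - t3 + t4) % P
```

Both functions return the same field element, so the rearrangement changes only which operands meet in each primitive operation. Whether this is sufficient in practice rests on the argument made in the text that $Z_3$ and $(Y_2 Z_1^3 - Y_1)$ are unknown to the attacker.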
Conclusion
This paper has demonstrated an optimized scheduling of Tate pairing computation over a BN curve on a dual-core pairing cryptoprocessor. The computation cost for one Tate pairing achieving 128-bit security on the FPGA platform is 35.3 ms. The paper further analyzes the effect of the covert power channel of the pairing cryptoprocessor on physical security. The paper has pinpointed the vulnerability
