• It is well suited for identity-based cryptography − which has gained a lot of importance in recent times
• As a natural consequence, implementations of pairings are also extremely important
• This paper broadly addresses design techniques of a pairing cryptoprocessor with high security level
Introduction cont…
• Cryptographic pairings are computed on elliptic or hyperelliptic curves − defined over suitably large finite fields − having small embedding degree [2, 3].
• The security of a pairing depends on the underlying algebraic curves and respective field types − Example of a 128-bit secure pairing: $\eta_T$ pairing computed on a supersingular elliptic curve defined over $\mathbb{F}_{2^{1223}}$ and having embedding degree k = 4.
• NIST recommendation: 128-bit symmetric security is essential beyond 2030 − it is of importance to explore the efficient implementation techniques of 128-bit secure pairings on different platforms.
[2] Hoffstein, J., Pipher, J., and Silverman, J. 
Existing works
• Hardware implementation of 128-bit secure pairings was introduced in 2009, independently by Kammler et al. [4] and Fan et al. [5].
− Described hardware implementation techniques for computing 128-bit secure pairings over Barreto-Naehrig curves (BN curves) [6] .
• Thereafter, the designs in [7, 8, 9, 10] appeared in the literature − they compute 128-bit secure pairings in 2.3 ms, 16.4 ms, 3.5 ms, and 1.07 ms, respectively.
• High-speed software implementations are reported in [11, 12] − they compute 128-bit secure pairings in 0.832 ms and 1.87 ms, respectively.
Major contributions of the paper are:
It explores area-time tradeoff designs of a Karatsuba multiplier over the $\mathbb{F}_{2^{1223}}$ field.
It further explores a high-speed architecture for computing the $\eta_T$ pairing on supersingular elliptic curves.
It provides the first hardware implementation result of a 128-bit secure pairing on elliptic curves over characteristic-two fields.
The proposed design achieves the fastest computation (190 µs) of a 128-bit secure pairing.
Karatsuba multiplication is an efficient and popular technique for fields like $\mathbb{F}_{q^m}$.
• It is a divide-and-conquer algorithm
• An m-bit multiplication is divided recursively into several m/k-bit multiplications with small k ∈ {2, 3}.
The $\mathbb{F}_{2^{1223}}$-Multiplier (Cont…)
CHES 2011, Nara, Japan
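Multiplication for k = 2 in $\mathbb{F}_{q^m}$ could be computed as follows. Writing $a = a_1 x^{m/2} + a_0$ and $b = b_1 x^{m/2} + b_0$:
$$ab = a_1 b_1 x^{m} + \left[(a_1 + a_0)(b_1 + b_0) - a_1 b_1 - a_0 b_0\right] x^{m/2} + a_0 b_0$$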
An m-bit multiplication can be performed by:
- three m/2-bit multiplications
- four m-bit and two m/2-bit additions.
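The k = 2 step above can be sketched in software for the binary case. This is a minimal illustration, assuming polynomials over F_2 packed into Python integers (bit i is the coefficient of x^i); the names `clmul` and `karatsuba_gf2` and the `base` cutoff are hypothetical, and reduction modulo the field polynomial is deliberately left out.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less (XOR) multiplication of two binary polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, m: int, base: int = 8) -> int:
    """Multiply two binary polynomials of at most m bits with k = 2 Karatsuba.

    Hypothetical helper: below `base` bits it falls back to schoolbook
    multiplication. The result is the unreduced product.
    """
    if m <= base:
        return clmul(a, b)
    h = (m + 1) // 2                      # width of each half
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h             # a = a1 * x^h + a0
    b0, b1 = b & mask, b >> h             # b = b1 * x^h + b0
    low = karatsuba_gf2(a0, b0, h, base)              # a0 * b0
    high = karatsuba_gf2(a1, b1, h, base)             # a1 * b1
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, h, base)    # (a0 + a1)(b0 + b1)
    # In characteristic two, addition and subtraction are both XOR.
    return (high << (2 * h)) ^ ((mid ^ low ^ high) << h) ^ low
```

Each call performs three half-width multiplications, two half-width additions for the operand sums, and four full-width additions to form and place the middle term, matching the counts listed above.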
Implementation could be performed in several ways. The synthesis tool estimates 95324 LUTs.
− It is feasible to implement on a single high-end FPGA device − a full pairing hardware demands much more circuitry than a single multiplier − it may be infeasible to fit in a single FPGA.
It consists of a fully parallel 306-bit Karatsuba multiplier
• Nine 306-bit multiplications are performed serially to compute one multiplication in $\mathbb{F}_{2^{1223}}$
The computation is performed as:
Multiplier Architecture
• The operands of the nine multiplications are stored in two sets of nine 306-bit parallel shift registers.
306-bit hybrid-parallel Karatsuba multiplier
• The registers are automatically reloaded by synchronous shift operations.
- ensures that the two correct operands are in the $a_{00}$ and $b_{00}$ registers.
• Multiplier latency: one clock cycle.
• Partial results of 1223-bit multiplication are accumulated accordingly.
(Algorithm is provided in Appendix)
• Latency of one 1223-bit multiplication: 10 clock cycles
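The serial scheme described above can be modelled in software as a rough sketch. It assumes two levels of k = 2 Karatsuba (1223 → 612 → 306 bits), which yields exactly nine 306-bit products; the operand ordering, the helper names (`split3`, `combine`, `operand_queue`, `mul_1223`), and the use of a schoolbook `clmul` as a stand-in for the one-cycle hardware multiplier are assumptions, not the algorithm from the appendix. The product is left unreduced.

```python
W = 306  # width of the base multiplier, as in the slides


def clmul(a: int, b: int) -> int:
    """Stand-in for one use of the 306-bit multiplier (carry-less product)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def split3(x: int, w: int):
    """Karatsuba operands (low, high, low + high) of a value up to 2w bits."""
    lo, hi = x & ((1 << w) - 1), x >> w
    return lo, hi, lo ^ hi


def combine(p_lo: int, p_hi: int, p_mid: int, w: int) -> int:
    """Recombine three partial products of w-bit halves."""
    return (p_hi << (2 * w)) ^ ((p_mid ^ p_lo ^ p_hi) << w) ^ p_lo


def operand_queue(x: int):
    """Nine 306-bit operands in the (assumed) order they would be shifted in."""
    return [part for half in split3(x, 2 * W) for part in split3(half, W)]


def mul_1223(a: int, b: int) -> int:
    """Unreduced 1223-bit product using nine serial base multiplications."""
    products = [clmul(x, y) for x, y in zip(operand_queue(a), operand_queue(b))]
    # Accumulate the partial results level by level.
    mids = [combine(*products[i:i + 3], W) for i in (0, 3, 6)]
    return combine(*mids, 2 * W)
```

With one base multiplication per clock cycle, the nine products account for nine of the ten cycles quoted above; treating the remaining cycle as load or accumulate overhead is an assumption.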
Multiplier on FPGA
Demands ≈30k LUTs.
• affordable to implement on a medium-range FPGA
Demands low resources: 16231 LUTs
Requires 27 serial uses
Latency of one 1223-bit multiplication: 151 ns
• 2.5 times slower than the 306-bit parallel multiplier
• The A⋅T value is 2.46.
- it is 1.2 times higher than that of the 306-bit parallel multiplier
Serial use of 306-bit parallel multiplier provides the most optimized design.
The pairing computation consists of two major operations:
1. the non-reduced pairing (Miller's algorithm)
2. the final exponentiation
Main Features:
- Consists of a common datapath for both operations
- Adequate parallelism is applied to achieve high speed
Major difference with the existing architecture of [13]: in contrast to two separate coprocessors, the current design has one processing unit for both operations.
Final Exponentiation:
- the output of Miller's algorithm is raised to the power of $(2^{2446} - 1)(2^{1223} - 2^{612} + 1)$.
- powering $G = g_0 + g_1 u + g_2 v + g_3 uv$ in the extension field is easy
- further, one inversion followed by one multiplication in the extension field.
- in total, the cost of final exponentiation is: 98M + 1842S + 135A.
-the proposed cryptoprocessor computes final exponentiation in 2922 clock cycles.
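As a sanity check on the exponent quoted above, the following sketch uses Python big integers to confirm that it divides $2^{4 \cdot 1223} - 1$, the order of the multiplicative group of the degree-4 extension field, with complementary factor $2^{1223} + 2^{612} + 1$. The function names are hypothetical, and reading the complementary factor as the order of the target subgroup is an interpretation rather than a statement from the slides.

```python
M = 1223  # extension degree of the base field F_{2^1223}


def final_exponent(m: int) -> int:
    """(2^(2m) - 1)(2^m - 2^((m+1)/2) + 1), the exponent quoted in the slides."""
    return (2 ** (2 * m) - 1) * (2 ** m - 2 ** ((m + 1) // 2) + 1)


def complementary_factor(m: int) -> int:
    """2^m + 2^((m+1)/2) + 1, the cofactor of the exponent in 2^(4m) - 1."""
    return 2 ** m + 2 ** ((m + 1) // 2) + 1


# The exponent times its cofactor is the full multiplicative group order,
# so raising to the exponent lands in a subgroup whose order divides the cofactor.
assert (M + 1) // 2 == 612
assert final_exponent(M) * complementary_factor(M) == 2 ** (4 * M) - 1
```

The first factor also suggests why one inversion and one multiplication appear in the cost: $G^{2^{2446} - 1} = G^{2^{2446}} \cdot G^{-1}$, where the power of two is a Frobenius-type map.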
Total clock cycle count for computing a 128-bit secure $\eta_T$ pairing is 47610 on our proposed cryptoprocessor.
Experimental results
• The whole design has been done in Verilog (HDL).
• Results from the place-and-route report of the Xilinx ISE Design Suite are shown here:
• It finishes computation of one 128-bit secure $\eta_T$ pairing in 190 µs on a Virtex-6 FPGA.
• Pairing cryptoprocessor
o A common datapath for both non-reduced pairing and final exponentiation.
o Reduces the overall logic cells
o It computes the $\eta_T$ pairing in a characteristic-two field with higher security (128:105) in half the area.
o It achieves an eightfold speedup and provides the best area × time product compared to the existing designs.
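The cycle counts and the 190 µs figure reported in the talk can be related with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the pairing runs strictly sequentially on the shared datapath, so that the Miller loop takes the total minus the final exponentiation, and that 190 µs corresponds to exactly 47610 cycles. The constant and function names are hypothetical.

```python
TOTAL_CYCLES = 47610        # full eta_T pairing, from the slides
FINAL_EXP_CYCLES = 2922     # final exponentiation, from the slides
PAIRING_TIME_S = 190e-6     # reported pairing time on Virtex-6


def miller_cycles() -> int:
    """Cycles left for the non-reduced pairing under the sequential assumption."""
    return TOTAL_CYCLES - FINAL_EXP_CYCLES


def implied_clock_hz() -> float:
    """Clock frequency implied by the cycle count and the reported time."""
    return TOTAL_CYCLES / PAIRING_TIME_S


def final_exp_share() -> float:
    """Fraction of the total cycle count spent in the final exponentiation."""
    return FINAL_EXP_CYCLES / TOTAL_CYCLES


if __name__ == "__main__":
    print(f"Miller loop cycles: {miller_cycles()}")
    print(f"Implied clock: {implied_clock_hz() / 1e6:.1f} MHz")
    print(f"Final exponentiation share: {final_exp_share():.1%}")
```

Under these assumptions the implied clock is roughly 250 MHz, and the final exponentiation accounts for a small share of the total cycle count.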
