Experimental Comparison of Crypto-processors Architectures for Elliptic and Hyper-Elliptic Curves Cryptography by Gallin, Gabriel et al.
Experimental Comparison of Crypto-processors
Architectures for Elliptic and Hyper-Elliptic Curves
Cryptography
Gabriel Gallin, Arnaud Tisserand, Nicolas Veyrat-Charvillon
To cite this version:
Gabriel Gallin, Arnaud Tisserand, Nicolas Veyrat-Charvillon. Experimental Comparison of Crypto-processors Architectures for Elliptic and Hyper-Elliptic Curves Cryptography. CryptArchi: 13th International Workshops on Cryptographic Architectures Embed-
Submitted on 11 Sep 2015
Experimental Comparison of Crypto-processors
Architectures for Elliptic and Hyper-Elliptic Curves
Cryptography
Gabriel GALLIN, Arnaud TISSERAND and
Nicolas VEYRAT-CHARVILLON
CNRS – IRISA – University Rennes 1 – CAIRN
HAH Project
13th CryptArchi Workshop: June 29-30th, 2015
Summary Context & Motivations Crypto-processor(s) Experiments & Comparisons Conclusion
Summary
1 Context & Motivations
2 Proposed Crypto-processor(s)
3 Experiments & Comparisons
4 Conclusion & Perspectives
G.Gallin - A.Tisserand - N.Veyrat-Charvillon Comparison of Architectures ECC/HECC CryptArchi, Jun. 29-30, 2015 2 / 18
Asymmetric Cryptography: ECC and HECC
Cryptographic primitives for protocols such as digital signatures,
secret-key exchange and some specific encryption schemes
Elliptic Curve Cryptography (ECC):
- Current standard for public-key cryptosystems
- Better performance and lower cost than RSA
Hyper-Elliptic Curve Cryptography (HECC):
- Evolution of ECC targeting a larger set of curves
- Studied for future generations of asymmetric cryptosystems
Operations for (H)ECC
HAH Project, IRISA–IRMAR
Hardware and Arithmetic for Hyperelliptic Curves Cryptography
$E : y^2 = x^3 + 4x + 20$ over GF(1009)
Points on $E$: $P, Q = (x, y)$ or $(x, y, z)$
Coordinates: $x, y, z \in \mathrm{GF}(\cdot)$
GF($p$), GF($2^m$), $t$: 160–600 bits
$k = (k_{t-1} k_{t-2} \ldots k_1 k_0)_2 \in \mathbb{N}$
Scalar multiplication operation
for $i$ from $0$ to $t - 1$ do
    if $k_i = 1$ then $Q = \mathrm{ADD}(P, Q)$
    $P = \mathrm{DBL}(P)$
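The loop above can be sketched in Python on the poster's example curve. This is a minimal sketch, not the processor's implementation: it uses affine coordinates with one modular inversion per operation for brevity, whereas the poster refers to (x, y, z) coordinates. The helper names (`add`, `dbl`, `scalar_mul`, `on_curve`) and the base point (1, 5) are assumptions chosen for illustration.

```python
P_MOD = 1009   # field prime of the poster's example curve
A, B = 4, 20   # E: y^2 = x^3 + 4x + 20
INF = None     # point at infinity (neutral element)


def inv(x):
    """Modular inverse in GF(P_MOD) via Fermat's little theorem."""
    return pow(x % P_MOD, P_MOD - 2, P_MOD)


def on_curve(pt):
    """Check that pt satisfies the curve equation (INF counts as on the curve)."""
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def add(p1, p2):
    """Affine point addition (ADD); also handles doubling and inverses."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                      # P + (-P)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD   # tangent slope
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P_MOD          # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def dbl(pt):
    """Point doubling (DBL)."""
    return add(pt, pt)


def scalar_mul(k, pt):
    """Right-to-left double-and-add: scan the bits k_0 ... k_{t-1}."""
    q = INF
    while k > 0:
        if k & 1:           # k_i = 1
            q = add(pt, q)
        pt = dbl(pt)
        k >>= 1
    return q


base = (1, 5)               # 5^2 = 25 = 1 + 4 + 20, so (1, 5) lies on E
result = scalar_mul(123, base)
```

The `if k & 1` branch makes the ADD/DBL sequence depend on the key bits, which is exactly the kind of pattern the side-channel discussion on the poster is concerned with.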
Point addition/doubling operations
sequence of finite field operations
DBL: $v_1 = z_1^2$, $v_2 = x_1 - v_1$, ...
ADD: $w_1 = z_1^2$, $w_2 = z_1 \times w_1$, ...
GF($p$) or GF($2^m$) operations
operation modulo a large prime (GF($p$))
or an irreducible polynomial (GF($2^m$))
2. Side Channel Attacks (SCAs)
[Figure: trace annotated with DBL and ADD operations]
- Differential analysis (statistics)
- Templates and learning
3. Protections & Counter-Measures Against SCAs
- Uniform computation durations
- Add noise (!)
Random recoding: $\forall i,\ [R_i(k)]P = [k]P$
4. From ECC to HECC
[Figure: examples of computation expressions for projective coordinates]
Cost: 38M + 6S
5. HAH Project Objectives
- Efficient algorithms and representations for HECC
- HECC protections against SCAs (passive and active)
- Fast, low-power and secure hardware implementations (open-source hardware code and programming tools)
- Intensive security evaluation using our SCA setup
- Arithmetic Units (AUs): ±, ×, ÷ over GF($p$)/GF($2^m$), various configurations (area vs speed, internal protection)
- Various key recoding methods (and dedicated units)
- Configuration: field size, internal word size, #AUs, type(AUs)
- Circuit/architecture level protections
8. Implementation Results on FPGA
XC6SLX75 FPGA, GF($p$), 256-bit ECC or 128-bit HECC, internal word size $w = 32$ bits
Recoding units:
Recoding               BIN               NAF-2             NAF-3             NAF-4
area, slices (FF/LUT)  565 (1321/1461)   570 (1340/1479)   571 (1344/1495)   503 (1348/1489)
freq. (MHz)            225               228               237               217
Area/speed trade-offs for ECC and HECC configurations (area in slices (FF/LUT), freq. in MHz):
      #mult.  BRAM  mult. 1 col.             mult. 2 col.             mult. 4 col.
                    area              freq.  area              freq.  area              freq.
ECC   1       2     503 (1348/1489)   217    626 (1450/1643)   230    694 (1649/1891)   211
      2       2     689 (1744/1894)   219    754 (1948/2208)   234    931 (2345/2712)   220
      3       2     809 (2146/2245)   205    942 (2449/2704)   222    1105 (3046/3436)  222
HECC  1       2     522 (1344/1405)   228    520 (1434/1535)   217
      2       2     634 (1746/1786)   226    689 (1926/2055)   220
      4       2     852 (2552/2531)   201    917 (2912/3045)   195
      8       2     1347 (4145/3882)  204    1601 (4865/4928)  209
9. Algorithms and Architecture Impacts on SCAs
Activity traces from CABA* simulations (after filtering) for several
* Cycle Accurate Bit Accurate (i.e. simulations close to real power measurements)
http://h-a-h.inria.fr/
Metric for algorithm efficiency: number of multiplications (M) and squarings (S) in $\mathbb{F}_p$
ECC versus HECC
        size of $\mathbb{F}_p$ & scalar                    ADD         DBL         source
ECC     $\ell_{ECC}$                                       12M + 2S    7M + 3S     [1]
HECC    $\ell_{HECC} \approx \frac{1}{2} \ell_{ECC}$       40M + 4S    38M + 6S    [3]
ECC:
- Size of $\mathbb{F}_p$ and scalar 2× larger
- Simpler ADD and DBL operations
HECC:
- Smaller $\mathbb{F}_p$ and scalar
- More operations in $\mathbb{F}_p$ for ADD and DBL
In theory, HECC should naturally perform better than ECC
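The "in theory" argument can be made concrete with a rough operation count. The sketch below is only a back-of-the-envelope model under stated assumptions that are not in the slides: a squaring costs the same as a multiplication, a field multiplication on n words costs n² word multiplications, and a plain double-and-add performs one DBL per scalar bit and one ADD on half of the bits. It takes at face value the slide's statement that both the field elements and the scalar are about half as long for HECC (256 bits for ECC, 128 bits for HECC, 32-bit words).

```python
from math import ceil


def field_mul_cost(field_bits, word_bits=32):
    """Word multiplications per field multiplication (assumed quadratic)."""
    n_words = ceil(field_bits / word_bits)
    return n_words * n_words


def scalar_mul_cost(scalar_bits, field_bits, add_ops, dbl_ops):
    """Estimated word multiplications for one [k]P with double-and-add."""
    # One DBL per bit, one ADD on about half of the bits (assumption).
    field_ops = scalar_bits * dbl_ops + (scalar_bits // 2) * add_ops
    return field_ops * field_mul_cost(field_bits)


# Operation counts from the table, with S counted as M (assumption).
ecc_cost = scalar_mul_cost(256, 256, add_ops=12 + 2, dbl_ops=7 + 3)
hecc_cost = scalar_mul_cost(128, 128, add_ops=40 + 4, dbl_ops=38 + 6)
ratio = ecc_cost / hecc_cost

print(f"ECC:  {ecc_cost} word multiplications")
print(f"HECC: {hecc_cost} word multiplications")
print(f"ECC/HECC ratio: {ratio:.2f}")
```

Under these assumptions HECC needs roughly half the word multiplications of ECC, even though each ADD and DBL involves many more field operations.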
Objectives
Design hardware crypto-processors based on a customisable architecture
Implementation of small circuits on FPGA and ASIC
Study various trade-offs between:
- Computation speed
- Area cost
- Energy consumption and protection against physical attacks
Experimental comparisons of different architecture configurations
→ choice of the architecture parameters for the crypto-processor
[Figure: crypto-processor architecture; labels: 25-bit instruction address control signals; scalar; word digit w-bit data word]
Customizable number of arithmetic units over $\mathbb{F}_p$: ±, ×, ÷
→ $n_M$ multipliers of size $n_B$
$w$: size of data words
Architecture Parameters
w = 32 bits, for small processors
1 adder/subtracter (±), small and fast
1 inverter (÷), sufficient for the computations
$n_M$ multipliers (×):
- Based on the Montgomery algorithm for modular multiplication [5]
- $n_B$: number of parallel active words in the multiplier
- 3-stage pipeline
Classical key recoding techniques from the literature:
→ standard binary, window λNAF methods with λ ∈ {2, 3, 4}
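A word-serial Montgomery multiplication with 32-bit words can be sketched as follows. This is only an illustrative software model: it consumes one word of the first operand per iteration and says nothing about how the multiplier in the processor schedules its parallel words or its pipeline. The function names and the test modulus 2^255 − 19 are assumptions, not taken from the slides; `pow(p, -1, m)` requires Python 3.8 or later.

```python
W = 32                    # word size w from the slide
MASK = (1 << W) - 1


def mont_params(p, n_words):
    """Return (R, p') with R = 2^(W*n_words) and p' = -p^(-1) mod 2^W."""
    r = 1 << (W * n_words)
    p_prime = (-pow(p, -1, 1 << W)) & MASK
    return r, p_prime


def mont_mul(a, b, p, n_words, p_prime):
    """Return a * b * R^(-1) mod p for odd p < R and 0 <= a, b < p."""
    t = 0
    for i in range(n_words):
        a_i = (a >> (W * i)) & MASK          # i-th word of a
        t += a_i * b                         # partial product
        m = ((t & MASK) * p_prime) & MASK    # makes t + m*p divisible by 2^W
        t = (t + m * p) >> W                 # exact division by 2^W
    return t - p if t >= p else t            # single final subtraction


# Example with a 256-bit operand size (8 words of 32 bits).
p = 2**255 - 19           # illustrative odd modulus (assumption)
n_words = 8
R, p_prime = mont_params(p, n_words)

a, b = 0x1234567890ABCDEF, 2**200 + 12345
a_m, b_m = a * R % p, b * R % p                          # Montgomery form
prod_m = mont_mul(a_m, b_m, p, n_words, p_prime)
prod = mont_mul(prod_m, 1, p, n_words, p_prime)          # back to normal form
```

Keeping operands in Montgomery form throughout a scalar multiplication avoids any trial division; only the conversions at the start and end need the factor R.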
Details of the Experiments
Objective: compare versions of the processor for various (nM, nB)
Implementation on Xilinx Spartan 6 LX75 FPGA
No DSP blocks (for ASIC compatibility)
Design tools:
- VHDL and assembly code generation: Python scripts
- Design implementation: Xilinx design environment ISE 14.6
- Simulations of complete scalar multiplications with Modelsim
Theoretical validation with SAGE (with more than 10k vectors)
Implementation: translate, map, place and route of full processor
Same optimisation efforts for ECC and HECC
Impact of Key Recoding Unit
Various recoding techniques proposed to reduce the number of curve
operations:
- BIN: standard binary from left to right
- NAF: non-adjacent form
- λNAF: window methods with λ ∈ {3, 4}
Implementation results for an ECC processor ($n_M = 1$, $n_B = 1$):
recoding               BIN               NAF               3NAF              4NAF
area, slices (FF/LUT)  517 (1347/1433)   536 (1366/1445)   560 (1370/1454)   547 (1374/1460)
frequency, MHz         229               234               235               231
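For readers unfamiliar with these recodings, the textbook width-λ NAF computation can be sketched as below. It is a software illustration of the standard algorithm, not a description of the hardware recoding unit; the function name `wnaf` and the sample scalar are assumptions. With `width=2` it yields the ordinary NAF.

```python
def wnaf(k, width):
    """Return the width-NAF digits of k >= 0, least significant digit first.

    Non-zero digits are odd, lie in (-2^(width-1), 2^(width-1)), and any
    `width` consecutive digits contain at most one non-zero digit.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = k % (1 << width)            # k mod 2^width
            if d >= 1 << (width - 1):
                d -= 1 << width             # pick the signed residue
            k -= d                          # now k is divisible by 2^width
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def nonzero_count(digits):
    """Number of ADD operations implied by a recoding."""
    return sum(1 for d in digits if d != 0)


k = 0xDEADBEEF                               # arbitrary sample scalar
binary_adds = bin(k).count("1")
naf_adds = {w: nonzero_count(wnaf(k, w)) for w in (2, 3, 4)}
```

Fewer non-zero digits means fewer ADD operations per scalar multiplication, at the price of precomputed odd multiples of P for λ > 2.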
Results for ECC (256 bits) and HECC (128 bits) (1/2)
Area in slices (FF/LUT), frequency in MHz:
      nM      nB = 1                   nB = 2                   nB = 4
              area              freq.  area              freq.  area              freq.
ECC   1   3   547 (1374/1460)   231    573 (1476/1625)   233    673 (1674/1875)   233
      2   3   722 (1776/1903)   220    811 (1979/2210)   227    942 (2377/2701)   220
      3   3   810 (2174/2236)   221    915 (2480/2698)   215    1130 (3077/3430)  214
      4   3   952 (2569/2656)   215    1100 (2977/3282)  217    1512 (3771/4293)  216
HECC  1   4   514 (1336/1374)   235    549 (1434/1513)   234
      2   4   646 (1716/1783)   220    737 (1912/2055)   234
      3   4   732 (2092/2075)   224    826 (2386/2485)   225
      4   4   870 (2476/2424)   218    1022 (2868/2987)  214
      5   4   976 (2865/2773)   219    1115 (3355/3465)  210
      6   4   1089 (3233/3092)  203    1240 (3821/3908)  208
      7   4   1145 (3601/3426)  213    1372 (4287/4365)  205
      8   4   1281 (3981/3809)  191    1552 (4765/4890)  183
      9   4   1379 (4363/4051)  202    1691 (5245/5277)  199
      10  4   1543 (4739/4435)  196    1856 (5719/5801)  198
      11  4   1547 (5114/4750)  189    1936 (6192/6240)  198
      12  4   1738 (5499/5128)  191    2100 (6675/6771)  188
Results for ECC (256 bits) and HECC (128 bits) (2/2)
Average computation time (ms) for 50 [k]P:
      nB \ nM  1     2     3     4     5     6     7     8     9     10    11    12
HECC  1        15.6  8.6   5.7   4.7   3.9   3.7   3.3   3.6   3.4   3.5   3.6   3.6
      2        11.9  6.2   4.5   3.6   3.2   2.8   2.8   3.0   2.7   2.7   2.8   2.9
ECC   1        28.1  15.3  12.4  12.4  12.7
      2        17.7  9.6   8.3   8.0   8.4
      4        11.1  6.2   5.4   5.1   5.3
Standard deviation for 1000 [k]P:
configuration (nM, nB)  ECC (1,1)  ECC (3,4)  HECC (1,1)  HECC (6,2)
average time [ms]       28.2       5.3        15.5        2.8
std. deviation [ms]     0.289      0.056      0.324       0.045
[Figure: plot over the (nM, nB) configurations 1,1 1,2 1,4 2,4 3,4 4,4 and 1,1 1,2 2,1 3,1 3,2 5,2 8,2]
Comparison with Literature
Source   FPGA       area (slices / DSP)  freq. (MHz)  [k]P duration (ms)
ECC 1,2  Spartan 6  573 / 0              233          17.7
ECC 1,4  Spartan 6  673 / 0              233          11.1
ECC 2,4  Spartan 6  942 / 0              220          6.2
ECC 3,4  Spartan 6  1130 / 0             214          5.4
[4]      Virtex-5   1725 / 37            291          0.38
[4]      Virtex-4   4655 / 37            250          0.44
[2]      Virtex-4   13661 / 0            43           9.2
[2]      Virtex-4   20123 / 0            43           7.7
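One common way to read such a table is the area-time product (slices × ms, lower is better). The sketch below applies it only to the four Spartan 6 configurations listed above, since the designs from [4] also use DSP blocks and their slice counts are therefore not directly comparable. The metric choice and the variable names are assumptions, not something the slides state.

```python
# (slices, [k]P duration in ms) for the ECC configurations in the table.
configs = {
    "ECC 1,2": (573, 17.7),
    "ECC 1,4": (673, 11.1),
    "ECC 2,4": (942, 6.2),
    "ECC 3,4": (1130, 5.4),
}

# Area-time product in slices * ms.
area_time = {name: slices * ms for name, (slices, ms) in configs.items()}
best = min(area_time, key=area_time.get)

for name, at in sorted(area_time.items(), key=lambda item: item[1]):
    print(f"{name}: {at:.1f} slices*ms")
print("lowest area-time product:", best)
```

With this metric, adding a second multiplier pays off, while the third one buys less speed than it costs in area.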
Conclusion
Implementation of ECC/HECC processors ⇒ cost/performance trade-offs
We are able to select the appropriate number of multipliers and their size
Experimental results: HECC always performs better than ECC
Perspectives
Improve arithmetic units (especially by adding DSP blocks)
Study resistance to physical attacks (side-channel attacks (SCA), fault injection) of processors with good time/area trade-offs
This work is partly funded by
PAVOIS project (ANR-12-BS02-002-01) http://pavois.irisa.fr/
HAH project http://h-a-h.inria.fr/
Thank you for your attention
References
[1] D. J. Bernstein and T. Lange.
Explicit-formulas database.
http://hyperelliptic.org/EFD/.
[2] S. Ghosh, M. Alam, D. Roychowdhury, and I.S. Gupta.
Parallel crypto-devices for GF(p) elliptic curve multiplication resistant against side channel attacks.
Computers and Electrical Engineering, 35(2):329–338, March 2009.
[3] T. Lange.
Formulae for Arithmetic on Genus 2 Hyperelliptic Curves.
Applicable Algebra in Engineering, Communication and Computing, 15(5):295–328, February 2005.
[4] Y. Ma, Z. Liu, W. Pan, and J. Jing.
A high-speed elliptic curve cryptographic processor for generic curves over GF(p).
In Proc. 20th International Workshop on Selected Areas in Cryptography (SAC), volume 8282 of LNCS, pages 421–437.
Springer, August 2013.
[5] P. L. Montgomery.
Modular multiplication without trial division.
Mathematics of Computation, 44(170):519–521, April 1985.
Implementation Flow
[Figure: excerpt of the HAH project poster (IRISA–IRMAR, Hardware and Arithmetic for Hyperelliptic Curves Cryptography), same poster as on the "Operations for (H)ECC" slide; http://h-a-h.inria.fr/]
Assembly Language: a Basic Example
read   fu_mul 0, 0, 1        read operands a and b
launch fu_mul 0              launch ab
read   fu_mul 1, 3, 4        read operands d and e
launch fu_mul 1              launch de
wait   fu_mul 0              wait until the end of ab
write  fu_mul 0, 5           write the result of ab
set    OPMODE, 0             set addition (+) mode
read   fu_add_sub 0, 5, 2    read ab and operand c
launch fu_add_sub 0          launch (ab) + c
wait   fu_mul 1              wait until the end of de
write  fu_mul 1, 6           write the result of de
wait   fu_add_sub 0          wait until the end of (ab) + c
write  fu_add_sub 0, 5       write the result of (ab) + c
read   fu_add_sub 0, 5, 6    read (ab) + c and de
launch fu_add_sub 0          launch ((ab) + c) + (de)
wait   fu_add_sub 0          wait until the end of ((ab) + c) + (de)
write  fu_add_sub 0, 5       write ((ab) + c) + (de)
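To make the data flow of this listing easier to follow, here is a small Python model of it. Everything about the model is an assumption inferred from the comments on the slide: registers 0 to 4 hold a, b, c, d, e, registers 5 and 6 are temporaries, `read` latches two register values into a unit, `launch` computes, `wait` is a no-op because the model is sequential, and `write` stores a unit's result in a register. The prime 1009 and the tuple encoding of instructions are illustrative choices, not the processor's actual format.

```python
P = 1009  # small prime chosen for illustration only

# The 17 instructions of the listing, encoded as tuples.
PROGRAM = [
    ("read", "mul", 0, 0, 1),
    ("launch", "mul", 0),
    ("read", "mul", 1, 3, 4),
    ("launch", "mul", 1),
    ("wait", "mul", 0),
    ("write", "mul", 0, 5),
    ("set", "OPMODE", 0),
    ("read", "add_sub", 0, 5, 2),
    ("launch", "add_sub", 0),
    ("wait", "mul", 1),
    ("write", "mul", 1, 6),
    ("wait", "add_sub", 0),
    ("write", "add_sub", 0, 5),
    ("read", "add_sub", 0, 5, 6),
    ("launch", "add_sub", 0),
    ("wait", "add_sub", 0),
    ("write", "add_sub", 0, 5),
]


def run(program, regs, p=P):
    """Execute the program on a copy of regs and return the final registers."""
    regs = list(regs)
    latched = {}    # (unit, index) -> operand pair captured by `read`
    results = {}    # (unit, index) -> value produced by `launch`
    opmode = 0
    for instr in program:
        op = instr[0]
        if op == "set":
            opmode = instr[2]
        elif op == "read":
            _, unit, idx, ra, rb = instr
            latched[(unit, idx)] = (regs[ra], regs[rb])
        elif op == "launch":
            _, unit, idx = instr
            x, y = latched[(unit, idx)]
            if unit == "mul":
                results[(unit, idx)] = (x * y) % p
            elif opmode == 0:
                results[(unit, idx)] = (x + y) % p
            else:
                raise NotImplementedError("only addition mode (0) appears on the slide")
        elif op == "wait":
            pass    # sequential model: results are available immediately
        elif op == "write":
            _, unit, idx, rd = instr
            regs[rd] = results[(unit, idx)]
    return regs


a, b, c, d, e = 3, 5, 7, 11, 13
final = run(PROGRAM, [a, b, c, d, e, 0, 0])
print(final[5])  # value of ((ab) + c) + (de) mod P
```

In the real processor the two multiplications overlap with each other and with the first addition, which is what the interleaved `launch` and `wait` instructions express; the sequential model only reproduces the final value.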
FPGA characteristics
FPGA              Spartan 6  Virtex-4 LX200 [2]  Virtex-5 LX110T [4]  Virtex-4 LX100 [4]
number of slices  11662      89088               17280                49152
number of FF      93296      178176              69120                98304
number of LUT     46648      178176              69120                98304
