Secure hardware design against side-channel attacks by Park, Jungmin
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations
2016




Part of the Computer Engineering Commons
This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University
Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University
Digital Repository. For more information, please contact digirep@iastate.edu.
Recommended Citation
Park, Jungmin, "Secure hardware design against side-channel attacks" (2016). Graduate Theses and Dissertations. 15786.
https://lib.dr.iastate.edu/etd/15786
Secure hardware design against side-channel attacks
by
Jungmin Park
A dissertation submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Major: Computer Engineering
Program of Study Committee:








Copyright © Jungmin Park, 2016. All rights reserved.
DEDICATION
I would like to dedicate this thesis to my wife Mihyun, to my daughter Clare and to
my sons Kevin and Kaden, without whose support I would not have been able to complete this
work. I would also like to thank my friends and family for their loving guidance and financial
assistance during the writing of this work.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER 1. INTRODUCTION
1.1 Contribution
1.2 Summary
CHAPTER 2. SIDE-CHANNEL ANALYSIS ATTACKS
2.1 Introduction
2.2 Differential Power Analysis (DPA) Attack
2.3 Profiling Attacks
2.3.1 Naïve Bayes classifier
2.3.2 Linear discriminant analysis
2.3.3 Quadratic discriminant analysis
2.3.4 Support vector machine
2.4 Side-channel Based Disassembler of AVR microcontroller
2.4.1 Preliminary Experiments
2.4.2 SVM
CHAPTER 3. SECURITY METRICS
3.1 Introduction
3.2 Basic Definition and Lemma
3.3 Power Model Using Renewal Process and Linear Regression
3.3.1 Renewal process
3.3.2 Graph based analysis
3.3.3 Linear regression
3.4 SCA Security Metrics
3.4.1 Kullback-Leibler divergence
3.4.2 Mutual information
3.5 Recognition Rate Using Maximum Likelihood Estimation
3.6 Experiment
3.7 Conclusion
CHAPTER 4. SECURE LOGIC STYLE
4.1 Introduction
4.2 Sense Amplifier Based Logic (SABL)
4.3 Wave Dynamic Differential Logic (WDDL)
4.4 t-private Private Circuit
4.4.1 Ishai's t-private circuit
4.4.2 The modified t-private circuit
4.5 Design of Secure logic style
4.5.1 Design of SABL-NAND
4.5.2 Design of WDDL
4.5.3 Design of t-private logic cells
4.5.4 Comparison of t-private NAND, SABL-NAND and WDDL-NAND
4.5.5 SCA attacks of t-private logic circuit
4.6 Conclusion
CHAPTER 5. FPGA IMPLEMENTATION AND ASIC IMPLEMENTATION
5.1 Introduction
5.2 FPGA Implementation
5.2.1 The tail recursive t-private circuit
5.2.2 Mapping into k-LUTs with unlimited number of inputs
5.2.3 Mapping into k-LUTs with limited number of inputs
5.2.4 Implementation of t-private full adder
5.3 ASIC Implementation
5.3.1 t-private Logic synthesis
5.3.2 Design Flow
5.3.3 Technology Library
5.3.4 Verification of robustness
5.4 Example: SBOX design
5.5 Conclusion
CHAPTER 6. t-PRIVATE SYSTEMS: UNIFIED PRIVATE MEMORIES AND COMPUTATION
6.1 Introduction
6.2 Assumptions and Notation
6.3 t-Private Memory: Schemas, Architecture, and Analysis
6.3.1 Original memory scheme without secrecy
6.3.2 t-private memory scheme
6.3.3 t-private memory scheme using a random matrix T
6.3.4 Hybrid memory scheme
6.3.5 Comparison
6.4 New Approach
6.5 New Computable And t-private Logic Schema And Gates
6.5.1 AND operation
6.5.2 OR operation
6.5.3 NOT operation
6.5.4 The perfect secrecy
6.6 Hardware Implementation
6.7 Conclusion
CHAPTER 7. CONCLUSION AND FUTURE WORK
7.1 Conclusion
7.2 Future Work
APPENDIX A. THE ADVANCED ENCRYPTION STANDARD [FIPS (2001)]
A.1 Algorithm
A.1.1 SubBytes
A.1.2 ShiftRows
A.1.3 MixColumns
A.1.4 AddRoundKey
A.1.5 Key Schedule
APPENDIX B. TOOL SCRIPTS
B.1 Setup (FreePDK45)
B.2 RTL Compiler Tcl Script
B.3 Encounter Script
B.3.1 Configuration file (encounter.conf)
B.3.2 Tcl file (encounter.tcl)
BIBLIOGRAPHY
LIST OF TABLES
Table 1.1 Proposed Security Metrics and Solution at each Design Abstraction level
Table 2.1 Successful recognition rate (SR) of instructions according to classifiers
Table 2.2 SR of instructions using LS-SVM and QDA classifiers
Table 4.1 Secure logic style
Table 4.2 Comparison between t-private AND circuits
Table 4.3 Power consumption of SABL NAND (45 nm process)
Table 4.4 Power consumption of WDDL NAND (45 nm process)
Table 4.5 Power consumption of NAND2X1t1 (45 nm process)
Table 4.6 Power consumption of AND2X1t1 (45 nm process)
Table 4.7 Comparison of t-private NAND, SABL-NAND and WDDL-NAND
Table 4.8 Successful recognition rate of t-private circuits using LS-SVM and QDA classifiers
Table 5.1 Area, power and delay estimation of each t-private logic cell after logic synthesis
Table 5.2 Power consumption of NAND2X1t1 (45 nm process)
Table 5.3 Power consumption of AND2X1t1 (45 nm process)
Table 5.4 Power consumption of NOR2X1t1 (45 nm process)
Table 5.5 Power consumption of OR2X1t1 (45 nm process)
Table 5.6 Power consumption of XOR2X1t1 (45 nm process)
Table 5.7 Power consumption of XNOR2X1t1 (45 nm process)
Table 5.8 Comparison of insecure and secure S-Box
Table 6.1 Variables used in this chapter
Table 6.2 The storage overhead and the success probability of the 4 architectural schemes
Table 6.3 Number of Random Bits Used for an AND Gate and for an N-gate Circuit
Table 6.4 Hardware Implementation on FPGA
Table A.1 ShiftRows: shift offsets for different block lengths
LIST OF FIGURES
Figure 2.1 Side-channel analysis attacks
Figure 2.2 Separation of power traces of ADD and SUB
Figure 2.3 Kernel density estimation depending on instructions at a specific sampling point
Figure 2.4 Hierarchical classification of registers and successful recognition rate
Figure 2.5 LS-SVM vs QDA
Figure 3.1 Renewal process of logic network
Figure 3.2 Renewal process caused by triggering two inputs
Figure 3.3 Different transition counts according to logic gate and δ
Figure 3.4 Logic network graphs of basic logic gates
Figure 3.5 Reduction of Logic network graph
Figure 3.6 The failure probability PrF: Overlapping coefficient of two normal distributions
Figure 3.7 Successful recognition rate according to α (a) when Pr[Tc1 > Tc2] > Pr[Tc1 > Tc2] (b) when Pr[Tc1 > Tc3] > Pr[Tc1 > Tc2]
Figure 3.8 Scatter plot and linear regression ($\hat{\beta} = 0.085$, $\hat{\alpha} = 1.05$) of 1000 random samples
Figure 3.10 Correlation Power Analysis attack of AES SBOX (N = 1000)
Figure 3.11 Success probability according to the number of samples (N)
Figure 3.12 CPA attack of AES SBOX
Figure 4.1 Schematic of an n-type SABL cell
Figure 4.2 Schematic of a combinational WDDL cell
Figure 4.3 Ishai's t-private circuits (t = 1)
Figure 4.4 An AND-XOR network with a random bit
Figure 4.5 An expanded AND-XOR network
Figure 4.6 Schematic of SABL-NAND gate
Figure 4.8 Input a = 0, b = 0
Figure 4.9 Input a = 0, b = 1
Figure 4.10 Input a = 1, b = 0
Figure 4.11 Input a = 1, b = 1
Figure 4.12 Waveform of SABL NAND gate
Figure 4.13 Schematic of WDDL-NAND gate
Figure 4.15 Input a = 0, b = 0
Figure 4.16 Input a = 0, b = 1
Figure 4.17 Input a = 1, b = 0
Figure 4.18 Input a = 1, b = 1
Figure 4.19 Waveform of WDDL NAND gate
Figure 4.20 Schematic of NAND2X1t1
Figure 4.21 Schematic of AND2X1t1
Figure 5.1 Transformation into LUT-based t-private circuit
Figure 5.2 Full adder cell schematic
Figure 5.3 (t = 1)-private full adder cell schematic
Figure 5.4 LUT costs of various t-private adders
Figure 5.5 Delay costs of various t-private adders
Figure 5.6 The design flow of the ASIC implementation
Figure 5.8 Schematic of AND2X1t1
Figure 5.9 Verilog description of AND2X1t1
Figure 5.10 Synthesized logic design
Figure 5.11 Layout of AND2X1t1
Figure 5.12 The steps to create AND2X1t1
Figure 5.14 Peak currents of NAND2X1t1
Figure 5.15 Powers of NAND2X1t1
Figure 5.16 Distribution of powers and peak currents of NAND2X1t1
Figure 5.17 Layout of the secure AES S-Box
Figure 6.2 The original memory scheme
Figure 6.3 The t-private memory scheme
Figure 6.4 The t-private memory scheme with a random matrix
Figure 6.5 The hybrid memory scheme
Figure 6.6 4 architectural memory schemes
Figure 6.8 The success probability
Figure 6.9 The storage overhead
Figure 6.10 Comparison between t-private scheme, t-private scheme with a random matrix and the hybrid scheme when p = 0.9, k = 128, n = 10, $t_i$ = 10
Figure 6.11 t-Private: (Left) Encoding; (Right) Decoding
Figure 6.12 The proposed memory scheme
Figure 6.13 The success probability according to m reused random bits when p = 0.9, t = 91
Figure 6.15 The success probability
Figure 6.16 The number of random bits (t) when $P_{succ}$ = 0.0078
Figure 6.17 Performance comparison between proposed scheme and t-private schemes
Figure 6.18 An output of AND operation for the perfect secrecy
Figure A.1 SubBytes() applies the S-box to each byte of the State
Figure A.2 ShiftRows() cyclically shifts the last three rows in the State
Figure A.3 MixColumns() operates on the State column-by-column
Figure A.4 AddRoundKey() XORs each column of the State with a word from the key schedule
ACKNOWLEDGEMENTS
I would like to take this opportunity to express my thanks to those who helped me with
various aspects of conducting research and the writing of this thesis. First and foremost, I
thank Dr. Akhilesh Tyagi for his guidance, patience and support throughout this research and
the writing of this thesis. His insights and words of encouragement have often inspired me and
renewed my hopes for completing my graduate education.
ABSTRACT
Embedded systems such as smart cards and IoT devices should be protected from side-channel
analysis (SCA) attacks. A secure hardware implementation requires SCA security metrics that
quantify the robustness of the implementation against SCA attacks at every abstraction level,
from the logic level to the layout level. In our design flow, the first security
test is executed at the logic level. If the implementation does not satisfy the threshold of the
SCA security metric based on Kullback-Leibler divergence, the module can be re-synthesized
with secure logic styles such as WDDL or t-private logic circuits. In the final security test, we
use machine learning techniques such as LDA, QDA, SVM and naive Bayes to check the
distinguishability of the side-channel leakage depending on inputs or outputs. These techniques
apply to an ASIC in characterizing the secret data leakage.
In this thesis, t-private logic circuits are implemented with the FreePDK45 45 nm process
design kit. The SCA security metric, as well as the delay and power consumption, is
characterized. All of this characterization data is stored in the standard Liberty format (.lib)
so that general CAD tools can use it. The t-private logic package, which includes the general
digital logic gates, can be exploited for secure VLSI design. In addition, various classifiers
such as LDA, QDA, SVM and naive Bayes are used to emulate a real SCA environment. Based on
this SCA simulator, the threshold of the SCA security metric can be estimated and the security
can be verified more accurately. The secure logic cell package and the SCA simulator support
the methodology of secure hardware implementation.
CHAPTER 1. INTRODUCTION
Most modern electronic devices are connected through the Internet. Private information
and secret data go back and forth between devices and servers. If sensitive information
such as usernames, passwords and credit card numbers falls under the control of adversaries,
vast monetary damage results. Information security has therefore become a major issue in many
IT fields. Cryptography plays a significant role in secure communication and the protection of
information, and modern ciphers such as AES and RSA are considered secure against known
cryptanalytic attacks. Despite the strength of modern cryptography, electronic devices can leak
information through unintended side channels, or physical channels. Common side-channel attacks
use the power or current at the Vdd pin [Kocher et al. (1999)] or electromagnetic radiation
[Quisquater and Samyde (2001)] to reveal a secret key. Power side-channel attacks are based on
the fact that the power consumption depends on intermediate values that are correlated with
both some controlled inputs and some secret data embedded in the crypto-block.
Differential power analysis (DPA) has been shown to be an especially effective side-channel
attack: it finds the secret key by exploiting the correlation between the power consumption
and the processed data [Kocher et al. (1999)]. Since this attack needs little knowledge of the
implementation of the cryptographic algorithm and can be performed with relatively cheap
equipment, it poses a major threat to cryptographic devices such as smart cards and embedded
systems. The adversary builds a hypothetical power consumption model (or leakage model) from
an intermediate value that depends on the key; if the key guess is correct, the model is
correlated with the measured power consumption. The attack can succeed or fail depending on
the selected leakage model. Assuming that power consumption depends solely on the number of
switched bits or on the number of 1's in the intermediate value, the Hamming distance model or
the Hamming weight model is chosen, respectively [Alioto et al. (2010), Mangard et al. (2007),
Messerges et al. (2002)]. However, in a real circuit based on ASICs, the assumption that power
is a function of Hamming distance or Hamming weight derived from the secret may not hold.
Instead, model profiles (or templates) may be a better choice, at the higher cost of a more
complex modeling effort. An even more powerful adversary is the Bayesian side-channel adversary,
which uses templates to select the key guess $k^*$ that maximizes the probability that the
guess is correct given the observed leakage $l$, that is, $\arg\max_{k^*} \Pr[k^* \mid l]$
[Standaert et al. (2009)]. Side-channel attacks by the Bayesian side-channel adversary should
also be considered major threats.
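The role of the leakage model can be illustrated with a small simulation. The following Python sketch is not taken from the thesis: it assumes a toy device whose single power sample equals the Hamming weight of (plaintext XOR key) plus Gaussian noise, and an adversary who ranks key guesses by the Pearson correlation between that Hamming weight model and the traces. The names simulate_traces and recover_key, the noise level and the key value are hypothetical, and a real attack would normally target a nonlinear intermediate value such as an S-box output.

```python
import random

def hamming_weight(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_traces(key, n, noise_sd, rng):
    """Toy device (assumption): one sample per run, HW(p XOR key) plus noise."""
    plaintexts = [rng.randrange(256) for _ in range(n)]
    traces = [hamming_weight(p ^ key) + rng.gauss(0.0, noise_sd)
              for p in plaintexts]
    return plaintexts, traces

def recover_key(plaintexts, traces):
    """Return the key guess whose Hamming weight model correlates best."""
    def score(guess):
        model = [hamming_weight(p ^ guess) for p in plaintexts]
        return pearson(model, traces)
    return max(range(256), key=score)

rng = random.Random(1)          # fixed seed so the run is repeatable
secret_key = 0x3C               # hypothetical key byte
pts, trs = simulate_traces(secret_key, 2000, 1.0, rng)
recovered = recover_key(pts, trs)
```

If the simulated device leaked something other than the Hamming weight, the same adversary could rank a wrong key first, which is the sense in which the attack succeeds or fails with the chosen model.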
Many research efforts have targeted techniques to prevent side-channel attacks. Countermeasures
against DPA attacks are categorized into two groups: hiding and masking [Mangard
et al. (2007)]. Hiding countermeasures make the power consumption of cryptographic devices
independent of the intermediate values by making the power consumption either random or
identical for all data values. Masking countermeasures achieve the independence of power
consumption from the intermediate values by randomizing the intermediate values themselves,
which also masks the logic behavior. However, earlier countermeasures have been suitable only
for specific hardware implementations. For example, the method used to randomize logic behavior
has to change with hardware constraints such as the critical path, timing or power
consumption. This ad hoc approach lowers the productivity of secure design.
Moreover, it is not known how much these countermeasures improve DPA security. To the best of
our knowledge, no CAD tool exists that integrates such a DPA resistance computation and suggests
an appropriate hiding or masking countermeasure to improve a vulnerable design.
A new paradigm is needed to satisfy both productivity and security.
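A minimal sketch of why masking removes the data dependence, assuming first-order Boolean masking with a uniformly random 8-bit mask and a Hamming weight leakage (both are illustrative assumptions, not the thesis's construction): the distribution of the leakage of the masked value is the same for every unmasked value.

```python
from collections import Counter

def hamming_weight(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")

def leakage_distribution(value: int, bits: int = 8) -> Counter:
    """Histogram of HW(value XOR mask) over all equally likely masks."""
    return Counter(hamming_weight(value ^ mask) for mask in range(1 << bits))

values = (0x00, 0x0F, 0xFF)

# Without masking, the leakage identifies the value.
unmasked = {v: hamming_weight(v) for v in values}

# With a uniform mask, every value produces the same leakage histogram.
masked = {v: leakage_distribution(v) for v in values}
```

Here unmasked maps the three values to Hamming weights 0, 4 and 8, while the three histograms in masked are identical, so a single leakage sample of the masked value carries no information about the value itself.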
1.1 Contribution
There are four main threads in the unified secure design methodology. First, security
against SCA attacks is included as a constrained resource, along with delay and area, in the
secure hardware implementation of the cryptographic system. The SCA security is quantified
using (1) the normalized variance metric (or the coefficient of variation) [Basel Halak (2013)],
(2) the Kullback-Leibler divergence [S. Kullback and R. A. Leibler (1951)], and (3) the
information theoretic metric of the profiled power distribution [Mac et al. (2007)]. In our
design flow, SCA vulnerability should be verified with these metrics at all implementation
abstraction levels, from the logic (or gate) level to the layout level. We estimate the
Kullback-Leibler (KL) divergence from the power distribution gathered from an approximate and
quick logic level simulation based on a renewal process.
The KL divergence is closely related to vulnerability against side-channel attacks.
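As an illustration of how a KL divergence can be obtained from power distributions, the sketch below assumes that the power consumption for each of two secret-dependent cases is approximately normal, and uses the standard closed form for the KL divergence between two univariate normal distributions. The function names and the Gaussian fit from samples are assumptions for illustration; the estimation procedure actually used in the thesis is developed in Chapter 3.

```python
import math
import statistics

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) in nats for P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def kl_from_samples(samples_p, samples_q):
    """Fit a normal distribution to each sample set, then apply kl_gaussian."""
    return kl_gaussian(statistics.mean(samples_p), statistics.stdev(samples_p),
                       statistics.mean(samples_q), statistics.stdev(samples_q))

# Identical distributions are indistinguishable: divergence 0.
same = kl_gaussian(1.0, 2.0, 1.0, 2.0)

# Unit-variance distributions whose means differ by 1: divergence 0.5 nats.
shifted = kl_gaussian(0.0, 1.0, 1.0, 1.0)
```

A larger divergence between the power distributions of two secret values means that an adversary can tell them apart with fewer observations.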
Once the SCA metric at the higher logic abstraction level is within safe bounds, the design
flow can enter the next abstraction level refinement. This abstraction refinement (for example,
from logic level to netlist level) introduces details that may create new SCA vulnerabilities.
Hence an acceptable SCA metric value at higher abstraction layers still necessitates SCA metric
computation at lower levels. The mutual information metric is computed at the layout level
with multiple SPICE level circuit simulations. The acceptable thresholds for the SCA security
metrics are defined theoretically. If any combinational module has a value larger than the
threshold, it is flagged as a vulnerable module. Such a hierarchical filter not only results in
a more efficient assessment of SCA vulnerabilities, but also allows the countermeasures to be of
variable granularity to match the abstraction level (logic or netlist). Arguably, the corrective
steps taken at the logic level are more effective even though the accuracy of the metric at that
level is lower.
The logic level filter uses the classical switching probability computation to estimate power,
which depends on the secret data (key) or a correlated intermediate result. Even though
simulation based verification can be performed at the logic level, the probabilistic estimation
method for power is more efficient. Statistical Monte Carlo power estimation techniques [Najm
(1994)] are better suited than the BDD based power estimators [Sentovich et al. (1992), Monteiro
et al. (1997)] due to the need for model parametrization with the secret key. The statistical
power estimation model is based on the fact that power consumption depends on the transition
probability and capacitance of the output node of logic gates [Najm (1994)]. Since the transition
probability of the output node is influenced by input transition patterns, it can be modeled as a
normal distribution. The mean $\hat{\mu}$ and standard deviation $\hat{\sigma}$ can be estimated
by sampling a large enough space of the input patterns and computing power over each input
pattern. The more distinguishable and identifiable the power consumption is for different
inputs, the more vulnerable the design is to SCA. The SCA security metric can be computed as
$\hat{\sigma}/\hat{\mu}$. This analytical method can be applied to combinational circuits. Note
that the SCA security metric for multiple implementations of the same behavior can vary even
though the logic level boolean equations specifying the arithmetic function are the same. The
normalized variance metric can be used to compare the SCA vulnerability of multiple
implementations, but it does not provide a safety threshold to flag a vulnerable implementation.
Instead, the SCA metric based on KL divergence plays a critical role in distinguishing vulnerable
implementations. If the SCA security metric of any computing block has a large value or is above
a threshold, it should be reduced significantly, possibly to zero, by the proposed resynthesis at
the logic level.
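The Monte Carlo estimate of $\hat{\mu}$, $\hat{\sigma}$ and the ratio $\hat{\sigma}/\hat{\mu}$ can be sketched as follows. This is an illustrative approximation rather than the estimator of [Najm (1994)] or of this thesis: it assumes a zero-delay gate model, unit capacitance on every node, and a one-bit full adder as the circuit under test, and it uses the number of toggling nodes per input transition as a proxy for power.

```python
import random
import statistics

def full_adder_nodes(a, b, cin):
    """Values of the five gate outputs of a one-bit full adder."""
    s1 = a ^ b          # first XOR
    c1 = a & b          # first AND
    c2 = s1 & cin       # second AND
    return (s1, s1 ^ cin, c1, c2, c1 | c2)

def toggle_count(prev_inputs, next_inputs):
    """Number of nodes that change value (power proxy, unit capacitance)."""
    before = full_adder_nodes(*prev_inputs)
    after = full_adder_nodes(*next_inputs)
    return sum(x != y for x, y in zip(before, after))

def normalized_deviation(n_samples, rng):
    """Estimate mean, standard deviation and their ratio by random sampling."""
    samples = []
    for _ in range(n_samples):
        prev_inputs = tuple(rng.randrange(2) for _ in range(3))
        next_inputs = tuple(rng.randrange(2) for _ in range(3))
        samples.append(toggle_count(prev_inputs, next_inputs))
    mu_hat = statistics.mean(samples)
    sigma_hat = statistics.stdev(samples)
    return mu_hat, sigma_hat, sigma_hat / mu_hat

mu_hat, sigma_hat, metric = normalized_deviation(5000, random.Random(0))
```

Two implementations of the same boolean function generally have different internal nodes and therefore different toggle statistics, which is why the metric can differ between them.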
Once the logic level design has an acceptable variance metric, it can be synthesized into a
transistor level netlist. The information theoretic metric of mutual information can be
computed both at the transistor netlist level and at the physical (or layout) level. The mutual
information $I(K;L)$ between the secret data $K$ and the corresponding leakage $L$, the third
SCA security metric, quantifies the amount of information about the secret data in the leakage
channel (power). If the mutual information indicates that a significant fraction of the n secret
key bits is leaking through L (power), the design needs to be reinforced.
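For a discrete leakage, $I(K;L)$ can be computed directly from the joint distribution. The sketch below is a hypothetical example, not a result from the thesis: it assumes a uniformly distributed 2-bit key and a noise-free leakage equal to the Hamming weight of the key.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(K;L) in bits, for joint given as {(k, l): probability}."""
    p_k = defaultdict(float)
    p_l = defaultdict(float)
    for (k, l), p in joint.items():
        p_k[k] += p
        p_l[l] += p
    return sum(p * math.log2(p / (p_k[k] * p_l[l]))
               for (k, l), p in joint.items() if p > 0)

# Uniform 2-bit key, leakage = Hamming weight of the key (assumed model).
hw_joint = {(k, bin(k).count("1")): 0.25 for k in range(4)}
leaked_bits = mutual_information(hw_joint)

# Leakage independent of the key: nothing is learned.
independent_joint = {(k, l): 0.125 for k in range(4) for l in range(2)}
no_leak = mutual_information(independent_joint)
```

In this toy model leaked_bits evaluates to 1.5, meaning that 1.5 of the 2 key bits are revealed per observation, while the independent leakage yields 0.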
The second thread consists of a design schema to reduce the SCA vulnerability at the
netlist level. This is done through the so-called technology mapping or cell binding phase.
SCA secure versions of the t-private [Ishai et al. (2003)] cells, as well as sense amplifier
based logic (SABL) and wave dynamic differential logic (WDDL) cells, for AND, OR, NAND, NOR,
XNOR and XOR logic gates are to be provided in the technology library. The t-private cell
primitives are based on Ishai's t-private circuits, which are robust against t-th order
side-channel (or probing) attacks [Ishai et al. (2003)]. They can be verified as SCA secure
using our SCA security metrics at all design abstraction levels. The parts of the cryptographic
system determined to be vulnerable by the KL divergence based SCA security metric at the logic
level can be synthesized with these t-private, SABL or WDDL cells.
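The behavior of a t-private AND gate can be sketched in software. The code below follows the published construction of [Ishai et al. (2003)] as we understand it: each bit is split into shares whose XOR equals the bit, and the AND gadget refreshes the cross products with fresh random bits. The helper names, the use of Python's secrets module as the randomness source, and the share count of 2t + 1 used in the loop are assumptions for illustration; this is a functional model, not the cell-level circuit of Chapter 4.

```python
import secrets
from functools import reduce
from operator import xor

def share(bit, n_shares):
    """Split one bit into n_shares bits whose XOR equals the bit."""
    shares = [secrets.randbits(1) for _ in range(n_shares - 1)]
    shares.append(reduce(xor, shares, bit))
    return shares

def unshare(shares):
    """Recombine the shares into the encoded bit."""
    return reduce(xor, shares, 0)

def private_and(a, b):
    """AND gadget on two sharings of equal length."""
    n = len(a)
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = secrets.randbits(1)               # fresh random bit
            r[j][i] = (r[i][j] ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return [reduce(xor, (r[i][j] for j in range(n) if j != i), a[i] & b[i])
            for i in range(n)]

def private_xor(a, b):
    """XOR is linear, so it is computed share by share."""
    return [x ^ y for x, y in zip(a, b)]

def private_not(a):
    """Complementing one share complements the encoded bit."""
    return [a[0] ^ 1] + list(a[1:])

results = []
for t in (1, 2):
    m = 2 * t + 1                                       # assumed share count
    for x in (0, 1):
        for y in (0, 1):
            z = private_and(share(x, m), share(y, m))
            results.append((x, y, unshare(z)))
```

Every triple in results satisfies z = x AND y regardless of the random bits drawn, while individual shares and intermediate wires vary from run to run.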
Third, a t-private logic synthesis method is proposed in order to prevent side-channel
attacks at the logic (or gate) level. After logic synthesis, vulnerable sub-logic can be
determined through the SCA security metrics. It should be synthesized into the following
reduced area version of t-private circuits. The boolean functions of insecure parts are
represented by the exclusive-OR sum-of-products (ESOP), and then the products are masked with
random bits. The masked products are replaced with t-private circuits. Exclusive-ORs are also
replaced with t-private XOR circuits. We call this t-private logic synthesis. Since t-private
XOR and XNOR primitives have significantly smaller area and better delay than the original
t-private circuits, the ESOP representation may have both area and delay advantages. Table 1.1
summarizes the proposed security metrics and side-channel leakage estimation methods.

Table 1.1: Proposed Security Metrics and Solution at each Design Abstraction level

Abstraction level        Security metrics   Leakage estimation method   Solution
logic level              KL divergence      renewal process             t-private logic synthesis
transistor level         all                simulation                  balance matching
physical (layout) level  all                simulation                  balance matching
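To make the ESOP representation used in t-private logic synthesis concrete, the sketch below derives one particular ESOP of a boolean function: the positive-polarity Reed-Muller form (algebraic normal form), computed from the truth table with the binary Möbius transform. This form is chosen only because it is easy to compute; it is generally not a minimal ESOP, the thesis does not state that it uses this form, and the masking of the products with random bits is not modeled. All function names are hypothetical.

```python
def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table of length 2**n to ANF coefficients."""
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for x in range(len(coeffs)):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
    return coeffs

def esop_terms(truth_table):
    """Products as tuples of variable indices; the empty tuple is constant 1."""
    n = len(truth_table).bit_length() - 1
    return [tuple(i for i in range(n) if (m >> i) & 1)
            for m, c in enumerate(anf_coefficients(truth_table)) if c]

def evaluate(terms, x):
    """XOR of the AND-products, with bit i of x as the value of variable i."""
    result = 0
    for term in terms:
        result ^= int(all((x >> i) & 1 for i in term))
    return result

# Carry-out (majority) of a full adder on inputs x0, x1, x2.
majority = [1 if bin(x).count("1") >= 2 else 0 for x in range(8)]
terms = esop_terms(majority)
```

For the carry function this yields the three products x0x1, x0x2 and x1x2, so the function is an XOR of three two-input ANDs, each of which is a candidate for replacement by a t-private AND cell.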
Finally, the fourth thread targets secure memory modules. Memories also leak information.
Private data, including cryptographic keys, are committed to memory. This data-at-rest is
open to attacks based on physical access. These attacks slice the silicon with a Focused Ion
Beam (FIB) until individual transistors are exposed; an electron microscope is then used to
examine the silicon. Halderman et al. [Halderman et al. (2008)] proposed the "cold-boot attack",
a method to recover a significant fraction of the data stored in a powered-off memory (e.g.
DRAM) by cooling the chip to around −50 °C, at which temperature the data persists for several
minutes with minimal error. Ishai's [Ishai et al. (2003)] t-private coding can be used for memory
as well. Recently, Valamehr et al. [Valamehr et al. (2012)] developed more general and more
efficient masking methods to prevent such memory attacks. However, their more efficient
memory coding methods require private data-at-rest, such as a key, to be decoded before
it can be used in computation. We propose coding methods that are as efficient as those of
Valamehr et al. [Valamehr et al. (2012)] for memory coding, but that can at the same time use
the encoded data-at-rest for computing in flight as is. We call such coding systems t-private
systems.
1.2 Summary
The thesis is organized into the following chapters. Chapter 1 gives an introduction to the
background and contribution of the thesis.
Chapter 2 presents an overview of side-channel analysis attacks. As an example, a side-channel
based AVR disassembler is proposed.
Security metrics are proposed in Chapter 3. This chapter is based on the papers published in
VLSID 2016 [Park and Tyagi (2016)] and ISVLSI 2014 [Park and Tyagi (2014b)].
Chapter 4 presents secure logic styles such as t-private logic circuits, SABL and WDDL. These
secure logic cells are implemented at various abstraction levels (from the logic gate level to the
layout level). Also, the SCA vulnerability of these secure logic styles is verified by simulating
SCA attacks.
Chapter 5 presents the methodology of SCA secure FPGA and ASIC implementation. This
chapter is based on the paper published in HOST 2012 [Park and Tyagi (2012)].
Chapter 6 presents t-private memory and systems, with a focus on probing-resistant memories.
This chapter is based on the paper published in SPACE 2014 [Park and Tyagi (2014a)].
In the final chapter, Chapter 7, the thesis is concluded with a discussion of future work.
CHAPTER 2. SIDE-CHANNEL ANALYSIS ATTACKS
2.1 Introduction
Common side-channel analysis attacks use a current path at Vdd or gnd pin or electromag-
netic radiation of a specific location in the chip to reveal a secret key. Power based side-channel
attacks are based on the observation of general CMOS switching characteristic that the power
consumption depends on input signals. Simple power analysis (SPA) attack [Kocher et al.
(1999)] is a technique to directly interpret power consumption measurements collected during
cryptographic operations. SPA attack requires detailed knowledge about the implementation of
the cryptographic algorithm executed by the device under attack. A skilled adversary monitors
only one trace or a few traces of power consumption during cryptographic operations and then
reveals the secret key. This scenario is not practical since it is very difficult to obtain detailed
information of the modern complex hardware implementation such as effective capacitance and
resistance of internal nodes.
But profiling makes the scenario practical. In the profiling phase, an adversary can estimate
the probability distribution of power consumption given any secret key by recording many power
traces at the specific times when cryptographic operations with intermediate values related to
the secret key are performed. The more power traces are exploited for the profiling, the more
accurately the probability distributions are estimated. The correct secret key can be extracted
with various classifiers (or distinguishers) based on the estimated probability distributions and
a maximum-likelihood (ML) decision rule. Machine learning techniques such as linear discrim-
inant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression classifier or
support vector machine (SVM) can be utilized.
As a non-profiling attack, the differential power analysis (DPA) attack has been shown to
be especially effective in finding the secret key by exploiting the correlation between estimated
power consumption and the processed data. Since this attack needs little knowledge of the
implementation of the cryptographic algorithm and can be performed with relatively cheap
equipment, it is known to be a major threat to cryptographic devices such as smart cards
or embedded systems. The adversary's hypothetical power consumption model (or leakage model),
which is based on an intermediate value that depends on the key, correlates with the measured
power consumption if the key guess is correct. The adversary’s leakage model and the classifier
determine the success or failure of the attack. Fig. 2.1 shows the diagram of the side-channel
analysis attacks.
Figure 2.1: Side-channel analysis attacks
The chapter is organized as follows. The next section presents the general method of
differential power analysis attack. Section 2.3 describes profiling attacks with various machine
learning classifiers such as LDA, QDA, naïve Bayes classifier and SVM. As an example of an SCA
application, an SCA based disassembler for AVR is proposed in Section 2.4.
2.2 Differential Power Analysis (DPA) Attack
There exists a general attack strategy that is used by all DPA attacks. The first step of the
DPA attack is to determine the intermediate value of the cryptographic algorithm executed by
the device under attack, which is denoted by $v_i = f(d_i, k^*)$, where $d_i$ is the $i$th plaintext or
ciphertext and $k^*$ is the secret key.
The second step is to measure the power consumption of the cryptographic device while it
encrypts or decrypts $D$ different data blocks, including the function selected in the first step.
We denote the power trace corresponding to data block $d_i$ as
$\vec{t}_i = (t_{i,1}, t_{i,2}, \ldots, t_{i,t^*}, \ldots, t_{i,P})^T$,
where $P$ denotes the length of the trace and $t_{i,t^*}$ is the power consumption when the
function selected in the first step is performed. An adversary measures a trace for each of
the $D$ data blocks, and hence the traces can be written as a matrix $T$ of size $D \times P$:
$T = (\vec{t}_1, \vec{t}_2, \ldots, \vec{t}_{t^*}, \ldots, \vec{t}_P)$, where $\vec{t}_j$ for $j = 1, \ldots, P$ is a column vector of size $D \times 1$.
The third step is to calculate a hypothetical intermediate value for all possible $k$:
$v_{i,j} = f(d_i, k_j)$ for $i = 1, \ldots, D$ and $j = 1, \ldots, K$.
The fourth step is to map the hypothetical intermediate values to the hypothetical power
consumption values: $h_{i,j} = g(v_{i,j}) = g(f(d_i, k_j))$ for $i = 1, \ldots, D$ and $j = 1, \ldots, K$. The most
commonly used power consumption models are the Hamming-distance and the Hamming-weight
model. The $D \times K$ matrix $H$ is built at this step: $H = (\vec{h}_1, \ldots, \vec{h}_K)$, where $\vec{h}_i$ for $i = 1, \ldots, K$
is a vector of size $D \times 1$.
The fifth step is to compare the hypothetical power consumption model with the measured
power traces. In order to measure the linear relationship between the two vectors $\vec{h}_i$ and $\vec{t}_j$ for
$i = 1, \ldots, K$ and $j = 1, \ldots, P$, the correlation coefficient is calculated:
\[
r_{i,j} = \frac{\sum_{d=1}^{D} (h_{d,i} - \bar{h}_i)(t_{d,j} - \bar{t}_j)}
               {\sqrt{\sum_{d=1}^{D} (h_{d,i} - \bar{h}_i)^2 \, \sum_{d=1}^{D} (t_{d,j} - \bar{t}_j)^2}}
\]
where $\bar{h}_i$ and $\bar{t}_j$ denote the mean values of the vectors $\vec{h}_i$ and $\vec{t}_j$, respectively. If $r_{k^*,t^*}$ for the
correct key $k^*$ and the specific time $t^*$ has a distinct peak value, the DPA attack is successful.
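The five steps above can be sketched on simulated data. This is a minimal illustration and not the measurement setup of this thesis: the traces are synthetic, the intermediate function $f$ is assumed to be a random 8-bit permutation applied to $d_i \oplus k$ (a stand-in for a cipher S-box), the leakage model $g$ is assumed to be the Hamming weight, and all function names and parameters are hypothetical.

```python
import numpy as np

def hamming_weight(values):
    """Hamming weight of each 8-bit value in an integer array."""
    as_bytes = np.asarray(values).astype(np.uint8)
    bits = np.unpackbits(as_bytes[..., None], axis=-1)
    return bits.sum(axis=-1)

def correlation_matrix(H, T):
    """Pearson correlation r[i, j] between column i of H (D x K) and column j of T (D x P)."""
    Hc = H - H.mean(axis=0)
    Tc = T - T.mean(axis=0)
    num = Hc.T @ Tc
    den = np.sqrt((Hc ** 2).sum(axis=0)[:, None] * (Tc ** 2).sum(axis=0)[None, :])
    return num / den

def simulate_and_attack(secret_key=0x3C, D=500, P=50, t_star=17, noise=1.0, seed=0):
    rng = np.random.default_rng(seed)
    sbox = rng.permutation(256)            # assumed nonlinear f; not a real cipher S-box
    d = rng.integers(0, 256, size=D)       # step 1: known data blocks d_i
    # Step 2: synthetic traces; only sample t_star leaks HW(f(d_i, k*)), the rest is noise.
    T = rng.normal(0.0, noise, size=(D, P))
    T[:, t_star] += hamming_weight(sbox[d ^ secret_key])
    # Step 3: hypothetical intermediate values v[i, j] for every key guess k_j.
    keys = np.arange(256)
    V = sbox[d[:, None] ^ keys[None, :]]
    # Step 4: hypothetical power consumption h[i, j] under the Hamming-weight model.
    H = hamming_weight(V).astype(float)
    # Step 5: correlate each hypothesis column with each time sample and take the peak.
    R = correlation_matrix(H, T)
    k_hat, t_hat = np.unravel_index(np.argmax(np.abs(R)), R.shape)
    return int(k_hat), int(t_hat)
```

Calling `simulate_and_attack()` returns the key guess and the time index at which the largest absolute correlation occurs; with real measurements the matrix `T` would come from the oscilloscope instead of the simulation.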
2.3 Profiling Attacks
Assuming that the adversary performs a Bayesian attack, s/he first carries out many experiments
to measure power consumption in order to model the conditional probability distribution
of side-channel power given all possible keys $k$ for $k = 1, \ldots, K$, denoted by $\Pr[\vec{l} \mid k]$. We call
this process the profiling step. After the profiling step, the posterior probability that the secret
key is equal to $k$ given any measured power ($\vec{l}_j$) can be computed using Bayes’ theorem:
\[
\Pr[k \mid \vec{l}_j] = \frac{\Pr[\vec{l}_j \mid k] \, \Pr[k]}{\sum_{k'=1}^{K} \Pr[\vec{l}_j \mid k'] \, \Pr[k']}
\]
Using the maximum-likelihood estimation, the best guess key is the key $k$ that leads to the
maximum probability:
\[
\hat{k} = \arg\max_{k} \Pr[k \mid \vec{l}_j] = \arg\max_{k} \Pr[\vec{l}_j \mid k] \, \Pr[k] \qquad (2.1)
\]
If the prior probability $\Pr[k]$ for $k = 1, \ldots, K$ is uniformly distributed, Eq. (2.1) is equal
to the following:
\[
\hat{k} = \arg\max_{k} \Pr[\vec{l}_j \mid k] \qquad (2.2)
\]
The likelihood probability $\Pr[\vec{l}_j \mid k]$ in Eq. (2.2) determines the kind of classifier. A
successful classifier selects the correct key: $\hat{k} = k^*$.
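As a small numerical illustration of the decision rule, the sketch below computes the posterior from given likelihood values and picks the best key guess. The likelihood and prior numbers are made up for the example, keys are indexed from 0 for convenience, and the function names are hypothetical.

```python
import numpy as np

def posterior(likelihoods, priors):
    """Posterior Pr[k | l] from the likelihoods Pr[l | k] and priors Pr[k] (Bayes' theorem)."""
    joint = np.asarray(likelihoods, dtype=float) * np.asarray(priors, dtype=float)
    return joint / joint.sum()

def best_guess(likelihoods, priors=None):
    """Index of the key with the largest posterior; a uniform prior is assumed if none is given."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    if priors is None:
        priors = np.full(likelihoods.shape, 1.0 / likelihoods.size)
    return int(np.argmax(posterior(likelihoods, priors)))

# Made-up likelihoods of one measurement under four candidate keys.
example_likelihoods = [0.05, 0.20, 0.10, 0.15]
uniform_choice = best_guess(example_likelihoods)
skewed_choice = best_guess(example_likelihoods, priors=[0.1, 0.1, 0.7, 0.1])
```

With a uniform prior the choice depends on the likelihood alone, whereas a strongly non-uniform prior can change the selected key.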
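2.3.1 Naïve Bayes classifier
Assuming that $\vec{l}_j \in \mathbb{R}^t$ with $\vec{l}_j = (l_{j,1}, \ldots, l_{j,t})^T$ where $1 \le t \le P$ and each $l_{j,i}$ is conditionally
independent of every other $l_{j,m}$ for $i \ne m$ given the key $k$, the classifier is defined as
\[
\hat{k} = \arg\max_{k} \prod_{i=1}^{t} \Pr[l_{j,i} \mid k] \qquad (2.3)
\]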
K(−u) = K(u) for all values of u.






























i=1(Xi − X¯)2 and Srod =





2) denotes the median
of the sample. The classifier of Eq. (2.3) is called the naïve Bayes classifier.
2.3.2 Linear discriminant analysis
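If the likelihood probability $\Pr[\vec{l}_j \mid k]$ for $k = 1, \ldots, K$ is the multivariate Gaussian density
function with mean vector $\vec{\mu}_k$ and common covariance matrix $\Sigma$ of size $t \times t$, that is,
\[
\Pr[\vec{l}_j \mid k] = \frac{1}{(2\pi)^{t/2} |\Sigma|^{1/2}}
\exp\left( -\frac{1}{2} (\vec{l}_j - \vec{\mu}_k)^T \Sigma^{-1} (\vec{l}_j - \vec{\mu}_k) \right),
\]
then the classifier is the following:
\[
\begin{aligned}
\hat{k} &= \arg\max_{k} \Pr[\vec{l}_j \mid k] \\
        &= \arg\min_{k} \, (\vec{l}_j - \vec{\mu}_k)^T \Sigma^{-1} (\vec{l}_j - \vec{\mu}_k) \\
        &= \arg\max_{k} \left( \vec{l}_j^{\,T} \Sigma^{-1} \vec{\mu}_k - \frac{1}{2} \vec{\mu}_k^T \Sigma^{-1} \vec{\mu}_k \right)
\end{aligned}
\]
This classifier is the linear discriminant analysis (LDA) classifier.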
2.3.3 Quadratic discriminant analysis
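The quadratic discriminant analysis (QDA) classifier results from the assumption that each
class is drawn from a multivariate Gaussian distribution with a class-specific mean vector
$\vec{\mu}_k$ and a class-specific covariance matrix $\Sigma_k$. The QDA classifier is the following:
\[
\begin{aligned}
\hat{k} &= \arg\max_{k} \Pr[\vec{l}_j \mid k] \\
        &= \arg\max_{k} \left( -\frac{1}{2} \log |\Sigma_k|
           - \frac{1}{2} (\vec{l}_j - \vec{\mu}_k)^T \Sigma_k^{-1} (\vec{l}_j - \vec{\mu}_k) \right)
\end{aligned}
\]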
2.3.4 Support vector machine
Support vector machines (SVMs) were introduced by Vapnik [Vapnik (1995)]. They have become more
important and popular in recent years since extensions to general nonlinear SVMs were
made [Vapnik (1995), Vapnik (1998)].
2.3.4.1 Linear SVM classifier : separable case
Consider a given training set $\{\vec{x}_i, y_i\}_{i=1}^{N}$ with input patterns $\vec{x}_i \in \mathbb{R}^d$ and output patterns $y_i \in \mathbb{R}$
with class labels $y_i \in \{+1, -1\}$. We define a unique separating hyperplane. We would like to
find $\vec{w}$ and $b$ such that
\[
\vec{w}^T \vec{x}_i + b \ge +1 \ \text{if } y_i = +1, \qquad
\vec{w}^T \vec{x}_i + b \le -1 \ \text{if } y_i = -1
\]
which can be rewritten as
\[
y_i(\vec{w}^T \vec{x}_i + b) \ge 1, \quad i = 1, \ldots, N. \qquad (2.4)
\]
The optimal separating hyperplane is the one that maximizes the distance between the hyperplane
and the nearest points on either side. The distance of $\vec{x}_i$ to the discriminant is
\[
\frac{|\vec{w}^T \vec{x}_i + b|}{\|\vec{w}\|} = \frac{y_i(\vec{w}^T \vec{x}_i + b)}{\|\vec{w}\|}
\]
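which we would like to be at least some value $\rho$, called the margin:
\[
\frac{y_i(\vec{w}^T \vec{x}_i + b)}{\|\vec{w}\|} \ge \rho, \quad i = 1, \ldots, N.
\]
Since rescaling $\vec{w}$ and $b$ leaves the hyperplane unchanged, we fix $\rho \|\vec{w}\| = 1$. Maximizing the
margin then amounts to minimizing $\|\vec{w}\|$, which gives the optimization problem
\[
\min_{\vec{w}, b} \ \frac{1}{2} \vec{w}^T \vec{w} \quad \text{subject to } y_i(\vec{w}^T \vec{x}_i + b) \ge 1 \text{ for } i = 1, \ldots, N.
\]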
This is a standard quadratic optimization problem, whose complexity depends on d, the di-
mensionality of the training data. We can convert the optimization problem to a form whose
complexity depends on N , the number of training instances, and not on d. The advantage of
this new formulation is that it will allow us to rewrite the basis functions in terms of kernel
functions [Alpaydin (2010)].
The Lagrangian for this problem is
\[
L_p = \frac{1}{2} \vec{w}^T \vec{w} - \sum_{i=1}^{N} \alpha_i \{ y_i(\vec{w}^T \vec{x}_i + b) - 1 \}
\]
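with Lagrange multipliers $\alpha_i \ge 0$ for $i = 1, \ldots, N$. Since the objective is convex and the
linear constraints are also convex, this is a convex quadratic optimization problem. Therefore,
we can equivalently solve the dual problem, making use of the Karush-Kuhn-Tucker conditions.
The dual is to maximize $L_p$ with respect to $\vec{\alpha}$, subject to the constraints that the gradients of
$L_p$ with respect to $\vec{w}$ and $b$ are 0 and also that $\alpha_i \ge 0$. The solution is given by the saddle
point of the Lagrangian:
\[
\max_{\vec{\alpha}} \ \min_{\vec{w}, b} \ L_p(\vec{w}, b; \vec{\alpha})
\]
with
\[
\frac{\partial L_p}{\partial \vec{w}} = 0 \rightarrow \vec{w} = \sum_{i=1}^{N} \alpha_i y_i \vec{x}_i, \qquad
\frac{\partial L_p}{\partial b} = 0 \rightarrow \sum_{i=1}^{N} \alpha_i y_i = 0.
\]
Substituting these conditions into $L_p$ gives the dual problem
\[
\max_{\vec{\alpha}} \ -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \vec{x}_i^T \vec{x}_j + \sum_{i=1}^{N} \alpha_i
\quad \text{subject to } \sum_{i=1}^{N} \alpha_i y_i = 0, \ \alpha_i \ge 0 \text{ for } i = 1, \ldots, N,
\]
and the classifier
\[
y(\vec{x}) = \mathrm{sign}\left[ \sum_{i=1}^{N} \alpha_i y_i \vec{x}_i^T \vec{x} + b \right]. \qquad (2.5)
\]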
Note that this problem is solved in $\vec{\alpha}$, not in $\vec{w}$. Once we solve for $\vec{\alpha}$, most elements of $\vec{\alpha}$
vanish with $\alpha_i = 0$ and only a few elements are greater than 0. The data points related to nonzero
$\alpha_i$ are called support vectors, and only these data points contribute to the sum in the classifier model
of Eq. (2.5).
2.3.4.2 Linear SVM classifier : non-separable case
If the two classes are not linearly separable, so that no hyperplane perfectly
separates the data, we search for the hyperplane that incurs the least error. The inequality
of Eq. (2.4) is modified into the following:
\[
y_i(\vec{w}^T \vec{x}_i + b) \ge 1 - \xi_i \quad \text{for } i = 1, \ldots, N
\]
with slack variables $\xi_i \ge 0$ such that the original inequalities can be violated for certain points
if needed. The optimization problem becomes
\[
\min_{\vec{w}, \vec{\xi}} \ T(\vec{w}, \vec{\xi}) = \frac{1}{2} \vec{w}^T \vec{w} + c \sum_{i=1}^{N} \xi_i
\]
\[
\text{subject to } y_i(\vec{w}^T \vec{x}_i + b) \ge 1 - \xi_i \text{ for } i = 1, \ldots, N, \qquad
\xi_i \ge 0 \text{ for } i = 1, \ldots, N.
\]
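The Lagrangian for this problem is
\[
L_p(\vec{w}, b, \vec{\xi}; \vec{\alpha}, \vec{\nu}) = T(\vec{w}, \vec{\xi})
- \sum_{i=1}^{N} \alpha_i \{ y_i(\vec{w}^T \vec{x}_i + b) - 1 + \xi_i \}
- \sum_{i=1}^{N} \nu_i \xi_i
\]
with Lagrange multipliers $\alpha_i \ge 0$, $\nu_i \ge 0$ for $i = 1, \ldots, N$. The solution is given by the saddle
point of the Lagrangian:
\[
\max_{\vec{\alpha}, \vec{\nu}} \ \min_{\vec{w}, b, \vec{\xi}} \ L_p(\vec{w}, b, \vec{\xi}; \vec{\alpha}, \vec{\nu})
\]
with
\[
\begin{aligned}
\frac{\partial L_p}{\partial \vec{w}} &= 0 \rightarrow \vec{w} = \sum_{i=1}^{N} \alpha_i y_i \vec{x}_i \\
\frac{\partial L_p}{\partial b} &= 0 \rightarrow \sum_{i=1}^{N} \alpha_i y_i = 0 \\
\frac{\partial L_p}{\partial \xi_i} &= 0 \rightarrow 0 \le \alpha_i \le c, \quad i = 1, \ldots, N.
\end{aligned}
\]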
2.3.4.3 Nonlinear SVM classifiers
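If the problem is nonlinear, we can map the problem to a high dimensional feature space
($\mathbb{R}^{n_h}$) by a nonlinear transformation using suitably chosen basis functions. After the
nonlinear mapping $\varphi(\vec{x}) : \mathbb{R}^n \rightarrow \mathbb{R}^{n_h}$, the linear separating hyperplane is constructed
in this high dimensional feature space. The optimization problem becomes
\[
\min_{\vec{w}, \vec{\xi}} \ T(\vec{w}, \vec{\xi}) = \frac{1}{2} \vec{w}^T \vec{w} + c \sum_{i=1}^{N} \xi_i
\]
\[
\text{subject to } y_i(\vec{w}^T \varphi(\vec{x}_i) + b) \ge 1 - \xi_i \text{ for } i = 1, \ldots, N, \qquad
\xi_i \ge 0 \text{ for } i = 1, \ldots, N.
\]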
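One constructs the Lagrangian:
\[
L_p(\vec{w}, b, \vec{\xi}; \vec{\alpha}, \vec{\nu}) = T(\vec{w}, \vec{\xi})
- \sum_{i=1}^{N} \alpha_i \{ y_i(\vec{w}^T \varphi(\vec{x}_i) + b) - 1 + \xi_i \}
- \sum_{i=1}^{N} \nu_i \xi_i
\]
with Lagrange multipliers $\alpha_i \ge 0$, $\nu_i \ge 0$ for $i = 1, \ldots, N$. The solution is given by the saddle
point of the Lagrangian:
\[
\max_{\vec{\alpha}, \vec{\nu}} \ \min_{\vec{w}, b, \vec{\xi}} \ L_p(\vec{w}, b, \vec{\xi}; \vec{\alpha}, \vec{\nu})
\]
with
\[
\begin{aligned}
\frac{\partial L_p}{\partial \vec{w}} &= 0 \rightarrow \vec{w} = \sum_{i=1}^{N} \alpha_i y_i \varphi(\vec{x}_i) \\
\frac{\partial L_p}{\partial b} &= 0 \rightarrow \sum_{i=1}^{N} \alpha_i y_i = 0 \\
\frac{\partial L_p}{\partial \xi_i} &= 0 \rightarrow 0 \le \alpha_i \le c, \quad i = 1, \ldots, N.
\end{aligned}
\]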
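We make use of the Mercer condition by choosing a kernel
\[
K(\vec{x}_k, \vec{x}_l) = \varphi(\vec{x}_k)^T \varphi(\vec{x}_l).
\]
By applying this theorem one can avoid computations in the high dimensional feature space.
The dual problem and the resulting nonlinear SVM classifier are expressed in terms of the kernel only:
\[
\max_{\vec{\alpha}} \ -\frac{1}{2} \sum_{k=1}^{N} \sum_{l=1}^{N} y_k y_l K(\vec{x}_k, \vec{x}_l) \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k
\quad \text{subject to } \sum_{k=1}^{N} \alpha_k y_k = 0, \ 0 \le \alpha_k \le c \text{ for } k = 1, \ldots, N,
\]
\[
y(\vec{x}) = \mathrm{sign}\left[ \sum_{i=1}^{\#SV} \alpha_i y_i K(\vec{x}, \vec{x}_i) + b \right]
\]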
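where #SV denotes the number of support vectors.
Several common kernels $K(\cdot, \cdot)$ are the following:
\[
\begin{aligned}
K(\vec{x}, \vec{x}_i) &= \vec{x}_i^T \vec{x} && \text{(linear kernel)} \\
K(\vec{x}, \vec{x}_i) &= (\vec{x}_i^T \vec{x} + 1)^d && \text{(polynomial kernel of degree } d\text{)} \\
K(\vec{x}, \vec{x}_i) &= \exp(-\|\vec{x} - \vec{x}_i\|^2 / \sigma^2) && \text{(RBF kernel)} \\
K(\vec{x}, \vec{x}_i) &= \tanh(\kappa \vec{x}_i^T \vec{x} + \theta) && \text{(MLP kernel)}
\end{aligned}
\]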
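The kernels listed above and the kernel form of the SVM classifier can be written compactly as follows. This is only an illustrative sketch: the multipliers, labels, bias and support vectors are assumed to come from an already solved dual problem, the parameter defaults are arbitrary, and the function names are hypothetical.

```python
import numpy as np

def linear_kernel(x, xi):
    return float(np.dot(xi, x))

def polynomial_kernel(x, xi, d=2):
    return float((np.dot(xi, x) + 1.0) ** d)

def rbf_kernel(x, xi, sigma=1.0):
    diff = np.asarray(x, dtype=float) - np.asarray(xi, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / sigma ** 2))

def mlp_kernel(x, xi, kappa=1.0, theta=0.0):
    return float(np.tanh(kappa * np.dot(xi, x) + theta))

def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """Sign of sum_i alpha_i * y_i * K(x, x_i) + b over the support vectors.

    A sum of exactly zero is mapped to +1 here; that tie-break is an arbitrary choice.
    """
    s = sum(a * y * kernel(x, sv) for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1

# Toy model with two hand-picked support vectors (not a trained classifier).
toy_svs = [[1.0, 0.0], [-1.0, 0.0]]
toy_alphas = [1.0, 1.0]
toy_labels = [+1, -1]
label_right = svm_decision([0.9, 0.0], toy_svs, toy_alphas, toy_labels, 0.0, rbf_kernel)
label_left = svm_decision([-2.0, 0.0], toy_svs, toy_alphas, toy_labels, 0.0, rbf_kernel)
```

Only the kernel function changes between the linear and nonlinear classifiers; the decision rule itself never evaluates the feature map explicitly.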
2.4 Side-channel Based Disassembler of AVR microcontroller
The main focus of the side-channel based disassembler is to extract assembly level code
along with the control flow graph from the side-channel leakage. The significant difference
between side-channel analysis attacks and side-channel based disassembler is the number of
required power sample traces to succeed assuming that both use profiled templates. Side-
channel analysis attacks for secret data leakage have more flexibility in the number of required
sample traces because the adversary can control the plaintext input of the target device. But
side-channel disassembler does not have similar controllability of the target device. It should
recognize a power or EM trace of each executed instruction. In other words, side-channel dis-
assembler should estimate which instruction is executing, which register is used, or what value
is processed with only one sample. This makes side-channel disassembler a more challenging
problem. It requires more advanced estimation techniques.
There exist many challenging problems in complete disassembly. Identification of the destination
register Rd and the source register Rs for register transfer instructions, or of the data for load or
store instructions, is difficult. For a more complete monitoring of programs, many variables
such as register names, register data, memory addresses and values for load/store instructions
should be estimated. Moreover, recent embedded microcontrollers such as the ARM Cortex-M or
Cortex-A series have more complex architectures with deeper pipeline stages and larger instruction
sets. Their system clock frequency also approaches a few hundred MHz or about
1 GHz. It becomes more difficult to disassemble programs on these recent embedded devices.
Lastly, the acquisition of power or EM emanations with oscilloscopes is
stretched significantly by the higher system clock frequencies of the target devices (of the
order of 1 GHz). High sampling rate oscilloscopes (over 5 GS/s) are needed to collect power or
EM leakage information generated at 1 GHz without loss of fidelity. For profiling,
multiple data samples are needed, which may be a few billion ($2^{32}$) in the case of 32-bit instruction
sets. This can make the profiling process significantly time consuming. A high-bandwidth link
between the oscilloscope and the desktop or laptop that stores the sampled data is also required.
Oscilloscopes with high sampling rate, high vertical resolution and fast bandwidth are fairly
expensive (over $20K).
In this section, we propose a power side-channel based disassembler for AVR using a hierarchical
quadratic discriminant analysis (QDA) classifier and an SVM classifier. Even though the AVR
microcontroller is not a state-of-the-art device, we believe that our method can be a starting
point for disassembling recent embedded microcontrollers. Our disassembler estimates
which registers are used and what values the registers hold, as well as which instructions are
executed. Also, we compare the QDA classifier with other classifiers such as the naïve Bayes classifier,
the LDA classifier and the SVM classifier.
2.4.1 Preliminary Experiments
We conducted preliminary experiments to check whether instructions of a similar style on the AVR
ATmega328p µC can be disassembled through power analysis. We considered 6
instructions (add, sub, and, mov, or, eor) operating from the source register (Rs16 ∼ Rs25) to the
destination register (Rd16 ∼ Rd25). The goal of this experiment is to identify which instruction
is executed and which Rs and Rd are used.
The AVR µC has 2 pipeline stages and a clock frequency of 16 MHz. A Tektronix
DPO-4032 oscilloscope is used to sample the power pin at 1.25 GS/s with 20 MHz bandwidth, 1000
sample points and 128-average mode. Using this oscilloscope, the voltage across the shunt resistor
between the GND pin and ground is measured. Each power trace is measured with the following
program segment template: sbi, 5 nops, targeted profiled instruction, 5 nops. The
sbi instruction is executed for the trigger signal. In order to remove the power consumption of the sbi
instruction and electrical noise, we compute the difference between each power trace and the
reference power trace of the sbi and 10 nops sequence. For profiling, 3000 power traces per
instruction with randomly selected Rs and Rd (the values of Rs and Rd are also randomly
distributed) are sampled. We also measure 3000 power traces per Rd with randomly
selected instruction and Rs, and 3000 power traces per Rs with randomly selected instruction
and Rd. These training data are used for the classification. There exist 3 different class
groups. The first class group represents the instruction: $C_{int} = \{c_{add}, c_{sub}, c_{and}, c_{mov}, c_{or}, c_{eor}\}$.
The second class group and third class group represent the destination register and the source
register, respectively: $C_{Rd} = \{c_{rd16}, \ldots, c_{rd25}\}$, $C_{Rs} = \{c_{rs16}, \ldots, c_{rs25}\}$.
Before the training, the measured traces should be preprocessed in order to remove noise and
to make different classes more distinguishable. The continuous wavelet transform (CWT)
is used to extract distinct features among all classes in both the frequency and the time domain.
Principal components (time and frequency) are extracted from the wavelet transform of
the collected traces. Only the principal time and frequency region features are kept, and all
the other time and frequency domain signals are zeroed. An inverse CWT of the shaped time
and frequency signal contains only the principal features in the time domain. The next step
is feature selection, which looks for the specific times that are significant. The total number
of sampling points per inverse-CWT power trace is 160. Assuming that each sampling
point has a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$ for $i = 1, \ldots, 160$, the
probability distribution of each class is a multivariate (160-dimensional) normal distribution.
Working with such a distribution is computationally very expensive and not practical. Thus, dimensionality
reduction or feature selection is required.
The Kullback-Leibler (KL) divergence is a useful metric for feature selection. The larger the KL
divergence between two random variables, the more distinguishable they are. The
selected sampling points should have a large KL divergence value. Also, the selected sampling
points should not depend on each other (collinearity). To satisfy both conditions, the sampling
points at which the KL divergence has a local maximum are selected. As a result, the dimensionality
can be reduced from 160 to about 10. Fig. 2.2 shows the preprocessing for the separation of power traces of and and sub.
The scatter plots of the raw power traces overlap. After CWT analysis and feature
selection, the scatter plots no longer overlap.
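The selection rule described above can be sketched as follows, under explicit assumptions: each sampling point of each class is modeled as a univariate Gaussian, the two directed KL divergences are summed to obtain a symmetric score (a choice made for this sketch), and a local maximum is taken to be a point larger than its left neighbor and at least as large as its right neighbor. The function names are hypothetical.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL divergence KL(P || Q) between univariate Gaussians P and Q."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def kl_per_point(traces_a, traces_b):
    """Symmetrized Gaussian KL score at every sampling point (arrays: traces x samples)."""
    mu_a, var_a = traces_a.mean(axis=0), traces_a.var(axis=0, ddof=1)
    mu_b, var_b = traces_b.mean(axis=0), traces_b.var(axis=0, ddof=1)
    return gaussian_kl(mu_a, var_a, mu_b, var_b) + gaussian_kl(mu_b, var_b, mu_a, var_a)

def select_points(kl, n_points=10):
    """Indices of the n_points largest interior local maxima of the KL curve, in time order."""
    kl = np.asarray(kl, dtype=float)
    interior = np.arange(1, len(kl) - 1)
    is_peak = (kl[interior] > kl[interior - 1]) & (kl[interior] >= kl[interior + 1])
    peaks = interior[is_peak]
    strongest_first = np.argsort(kl[peaks])[::-1]
    return np.sort(peaks[strongest_first[:n_points]])
```

Picking only local maxima skips neighboring samples that carry nearly the same information as the peak, which is one simple way to limit collinearity among the selected points.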
3000 power traces with the selected sample points per class are used for training
each classifier. Linear discriminant analysis (LDA), quadratic discriminant analysis
(QDA) and the naïve Bayes method are executed. Each classifier makes a different assumption. LDA
assumes that the distribution of each class is a multivariate normal distribution with the same
covariance matrix ($\Sigma$). QDA has more flexibility than LDA since it assumes that the distribution
of each class is a multivariate normal distribution with a different covariance matrix
($\Sigma_i \ne \Sigma_j \ \forall i \ne j$). The naïve Bayes classifier assumes that the selected sampling points
of each class are distributed independently of each other, each with an arbitrary distribution. The marginal
probability distribution of power traces at a specific sampling point resembles the normal distribution,
and the marginal probability distribution of each class has a different variance. Fig.
2.3 shows the kernel density estimation of each instruction at a specific sampling point. Since
the characteristics of the power traces satisfy the assumption of QDA, the QDA classifier has the
best performance among the three classifiers (LDA, QDA, naïve Bayes classifier). The successful
recognition rates (SR) of instructions (add, sub, and, mov, or, eor) according to classifiers
are shown in Table 2.1.
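The effect of the covariance assumption can be illustrated on synthetic data. The sketch below is not the experiment reported here: it draws two made-up two-dimensional Gaussian classes with equal means and different variances, assumes uniform class priors, and compares a shared-covariance (LDA) rule with a per-class-covariance (QDA) rule. All names are hypothetical.

```python
import numpy as np

def fit_gaussians(X, y):
    """Per-class means and covariances, plus the pooled covariance used by LDA."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    covs = np.array([np.cov(X[y == c], rowvar=False) for c in classes])
    weights = np.array([np.sum(y == c) - 1 for c in classes], dtype=float)
    pooled = np.tensordot(weights, covs, axes=1) / (len(y) - len(classes))
    return classes, means, covs, pooled

def lda_predict(X, classes, means, pooled):
    """Linear discriminant: x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k, uniform priors assumed."""
    prec = np.linalg.inv(pooled)
    scores = np.array([X @ prec @ mu - 0.5 * mu @ prec @ mu for mu in means])
    return classes[np.argmax(scores, axis=0)]

def qda_predict(X, classes, means, covs):
    """Quadratic discriminant: -0.5 log|S_k| - 0.5 Mahalanobis distance, uniform priors assumed."""
    scores = []
    for mu, cov in zip(means, covs):
        diff = X - mu
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        scores.append(-0.5 * np.linalg.slogdet(cov)[1] - 0.5 * maha)
    return classes[np.argmax(np.array(scores), axis=0)]

def demo(seed=0, n=1000):
    """Training-set accuracy of LDA and QDA on two classes that differ only in variance."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 1.0, size=(n, 2)),
                   rng.normal(0.0, 3.0, size=(n, 2))])
    y = np.repeat([0, 1], n)
    classes, means, covs, pooled = fit_gaussians(X, y)
    acc_lda = float(np.mean(lda_predict(X, classes, means, pooled) == y))
    acc_qda = float(np.mean(qda_predict(X, classes, means, covs) == y))
    return acc_lda, acc_qda
```

When the classes differ mainly in variance, a linear boundary has little to work with, whereas the quadratic rule can separate a narrow class from a wide one.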
Figure 2.2: Separation of power traces of ADD and SUB
Figure 2.3: Kernel density estimation depending on instructions at a specific sampling point
The registers from Rd16 (or Rs16) to Rd25 (or Rs25) can be grouped into 4 classes depending
on the Hamming weight of the binary address of the register. The Hamming weight of the
register address is closely related to the power consumption during the fetch and decoding of the
instruction since the addresses of the registers occupy 10 bits of the 16-bit instruction code.
The classification of registers (Rd, Rs) can be executed hierarchically. The Hamming weight
of the address of the register is identified first, and then the address of the register within that Hamming
weight class is recognized. Fig. 2.4 shows the hierarchical classification of the registers (Rd,
Rs) using the QDA classifier and the successful recognition rates of the Hamming weight class
and the address. The successful recognition rates of the Hamming weight of Rd and Rs are 80%
and 69.6%, respectively. The addresses of the registers Rd and Rs with Hamming weight 2 are
recognized at rates of 77.8% and 67.5%, respectively. The addresses of the registers Rd and Rs
with Hamming weight 3 are recognized at rates of 83% and 73.6%, respectively.
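The first level of the hierarchy can be made concrete with a short sketch that groups the register numbers 16 to 25 by Hamming weight. It assumes that the register address is simply the binary encoding of the register number; the function names are hypothetical.

```python
from collections import defaultdict

def hamming_weight(n):
    """Number of 1 bits in the binary representation of n."""
    return bin(n).count("1")

def group_registers_by_hw(first=16, last=25):
    """Map each Hamming weight to the list of register numbers that have it."""
    groups = defaultdict(list)
    for addr in range(first, last + 1):
        groups[hamming_weight(addr)].append(addr)
    return dict(groups)

register_groups = group_registers_by_hw()
```

Under this assumption the ten registers fall into four Hamming-weight classes; the classes for weights 1 and 4 contain a single register each, so the second-level address classification is only needed for weights 2 and 3.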
2.4.2 SVM
LS-SVM (Least Squares Support Vector Machine) [Leuven (2011)] is used to classify instructions.
Fig. 2.5 shows the successful recognition rates of LS-SVM and QDA when classifying
measured power traces into two classes. LS-SVM mostly outperforms the QDA classifier in terms of
the successful recognition rate. In the case of $C = \{c_{exor}, c_{mov}\}$, the recognition rate of LS-SVM is
8.5 percentage points higher than that of the QDA classifier. Table 2.2 shows the successful recognition rates of the LS-SVM and
QDA classifiers for various class sets. For the 6 instructions (add, sub, and, mov, or, eor), LS-SVM
increases the successful recognition rate by about 12 percentage points compared with the QDA result.
naïve Bayes 37.1 %
Figure 2.4: Hierarchical classification of registers and successful recognition rate
Figure 2.5: LS-SVM vs QDA
Table 2.2: SR of instructions using LS-SVM and QDA classifiers
classes                                   LS-SVM     QDA
add vs and                                88.97 %    89.26 %
add vs sub                                89.88 %    89.58 %
add vs exor                               81.46 %    85.46 %
add vs or                                 93.10 %    89.21 %
add vs mov                                89.28 %    87.13 %
sub vs mov                                95.17 %    94.05 %
sub vs or                                 91.80 %    93.28 %
sub vs and                                94.67 %    95.25 %
sub vs exor                               94.13 %    91.57 %
exor vs or                                91.5 %     88.12 %
exor vs mov                               92.92 %    84.42 %
exor vs and                               84.92 %    86.25 %
or vs mov                                 89.63 %    92.4 %
or vs and                                 91.85 %    90.37 %
mov vs and                                88.76 %    86.92 %
add vs and vs exor                        86.83 %    78.56 %
add vs and vs mov                         82.73 %    79.93 %
add vs and vs or                          83.64 %    82.91 %
add vs and vs sub                         85.83 %    85.16 %
add vs exor vs mov                        85.23 %    77.31 %
add vs exor vs or                         82.57 %    79.82 %
add vs exor vs sub                        86.01 %    81.6 %
add vs and vs exor vs mov vs or vs sub    82 %       70.1 %
CHAPTER 3. SECURITY METRICS
3.1 Introduction
In this chapter, we focus on SCA metrics to flag insecure combinational modules within a
complete cryptographic system. We assume that the adversary is powerful enough to estimate
power consumption accurately to account for the number of switching transitions including
glitches in a complete cryptographic system. From the designer's point of view, this assumption
bases security on an all-powerful adversary. Even though simulation based profiling can be
performed at the logic level, it should be avoided for efficiency reasons: the number of input
vectors of the simulation increases exponentially in the number of input bits, denoted by n.
The power consumption can be estimated more efficiently by the Monte Carlo probabilistic
methods. The Monte Carlo probabilistic power estimation model is based on the fact that
power consumption depends on the transition probability and capacitance of the output node
of logic gates [Najm (1994)]. But the probabilistic power estimation model does not consider
glitches caused by the gate delay.
First, we propose a new stochastic power estimation method using renewal process and
linear regression which includes the dynamic power caused by the glitching phenomenon. This
method is used at the logic level design for efficient power profiling. Given any input transitions
of the combinational circuit, the normal power distribution with the mean µ and variance σ2
can be obtained.
Second, security metrics to capture SCA vulnerability with the power estimation are defined
and computed. The CAD design flow includes SCA metric estimation and optimization
just as it includes area and delay estimation and optimization. The SCA security is quantified using
(1) the normalized variance metric (or the coefficient of variance) [Basel Halak (2013)], (2) the
Kullback-Leibler divergence [S. Kullback and R. A. Leibler (1951)] and (3) the information
theoretic metric of the profiled power distribution. In our design flow, SCA vulnerability should
be verified with these metrics at all implementation abstraction levels from logic (or gate) to
layout level. We estimate Kullback-Leibler divergence from the power distribution gathered
from the approximate and quick renewal process based logic level simulation. Once the SCA
metric at the higher logic abstraction level is within safe bounds, the design flow can enter the
next abstraction level refinement. This abstraction refinement (as in logic level to netlist level)
introduces details that may develop new SCA vulnerabilities. Hence an acceptable SCA metric
value at higher abstraction layers still necessitates SCA metric computation at lower levels. The
mutual information metric is computed at the layout level with multiple SPICE level circuit
simulations. The acceptable thresholds for SCA security metric are defined theoretically. If
any combinational module has a value larger than the threshold, it is flagged as a vulnerable
module. The vulnerable modules should be transformed into a secure module. One of the
methods to accomplish this is to use a secure logic design style such as t-private circuits [Ishai
et al. (2003)] or masked dual-rail dynamic logic [Mangard (2005)].
The chapter is organized as follows. The next section presents the basic definitions and
lemmas for power estimation. We develop the power leakage model using renewal process and
linear regression in Section 3.3. The SCA security metrics are presented in Section 3.4. The
recognition rate using maximum likelihood estimation is defined in Section 3.5. The recognition
rate is closely related to the KL divergence. Experimental results are presented in Section 3.6. Finally,
Section 3.7 concludes the chapter.
3.2 Basic Definition and Lemma
In this section, the basic definitions and lemmas for stochastic power estimation of combi-
national circuits are presented.
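Definition 1 (Boolean difference). [Mohyuddin et al. (2008)] The partial Boolean difference
of $f(x_0, x_1, \ldots, x_i, \ldots, x_{n-1})$ with respect to one of its variables $x_i$, or to a subset of its
variables, is defined as:
\[
\frac{\partial f}{\partial x_i} = f_{x_i} \oplus f_{x_i'}
\]
\[
\frac{\partial f}{\partial (x_{i_1} x_{i_2} \cdots x_{i_k})} = f_{x_{i_1} x_{i_2} \cdots x_{i_k}} \oplus f_{x_{i_1}' x_{i_2}' \cdots x_{i_k}'}
\]
where $f_{x_i} = f(x_0, x_1, \ldots, 1, \ldots, x_{n-1})$ and $f_{x_i'} = f(x_0, x_1, \ldots, 0, \ldots, x_{n-1})$. The total Boolean
difference of $f(x_0, x_1, \ldots, x_i, \ldots, x_{n-1})$ with respect to a $k$-variable subset of its inputs is
defined as: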
df



















i2 · · ·x′in−1xik
...
















i2 · · ·x∗ik(x∗i = xi or x′i).
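Definition 2 (Observability). The observability of $x_i$ is the probability that $x_i$ is observable
at the output $y = f(x_0, x_1, \cdots, x_i, \cdots, x_{n-1})$ when the polarity of $x_i$ is changed. Using the
partial Boolean difference, it is expressed as
\[
Ob_y(x_i) = \Pr\left[ \frac{\partial f}{\partial x_i} \right] = \Pr[f_{x_i} \oplus f_{x_i'}].
\]
In general the $k$th order observability of a subset of inputs $(x_{i_1} x_{i_2} \cdots x_{i_k})$ at the output
$y = f(x_0, x_1, \cdots, x_{n-1})$ is defined as:
\[
Ob_y(x_{i_1}, x_{i_2}, \cdots, x_{i_k}) = \Pr\left[ \frac{df}{d(x_{i_1} x_{i_2} \cdots x_{i_k})} \right].
\]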
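For small functions, the first-order observability can be computed by exhaustive enumeration. The sketch below covers only the single-variable case and assumes that all inputs are independent and equiprobable, which the definition itself does not require. The function names are hypothetical.

```python
from itertools import product

def partial_boolean_difference(f, n, i):
    """Truth table of f with x_i = 1 XOR f with x_i = 0, over all n-bit input vectors."""
    table = {}
    for x in product((0, 1), repeat=n):
        hi = x[:i] + (1,) + x[i + 1:]
        lo = x[:i] + (0,) + x[i + 1:]
        table[x] = f(*hi) ^ f(*lo)
    return table

def observability(f, n, i):
    """Fraction of input vectors for which flipping x_i flips the output (uniform inputs assumed)."""
    table = partial_boolean_difference(f, n, i)
    return sum(table.values()) / len(table)

ob_and = observability(lambda a, b: a & b, 2, 0)              # flip visible only when b = 1
ob_xor = observability(lambda a, b: a ^ b, 2, 0)              # flip always visible
ob_ao = observability(lambda a, b, c: (a & b) | c, 3, 0)      # visible when b = 1 and c = 0
```

An XOR gate propagates every input change, while an AND gate masks the change whenever its other input is 0, which is why the two observabilities differ.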













Definition 3 (Logic network graph). [ Micheli (1994)] The logic network graph G(V,E,W (E))
is a directed acyclic weighted graph with the vertex set V which is in one-to-one correspondence
with the primary inputs, local functions and primary outputs and the weight set W (E) =
{w((vi, vj))|(vi, vj) ∈ E}. We denote a path P from the vertex v1 to another vertex vn by an
alternating sequence of distinct vertices and edges such as the following equation :
P = {v1, (v1, v2), v2, (v2, v3), . . . , vn}.
Definition 4 (Reconvergent node). Two distinct directed paths are reconvergent if they start
at a common vertex (va) and terminate at another common vertex (vb). The vertex va is called
a reconvergent fanout and the vertex vb is called a reconvergent node.
Definition 5 (Effective capacitance). We define the effective capacitance Cy(xi) as the average
of total switched capacitances of all logic gates on the path from the input xi to the output y when
the input xi is switched. We use the lumped-C model which describes the effective capacitance
Cy(xi) as a lumped capacitance containing the intrinsic and the extrinsic capacitance of all logic
gates.
The effective capacitance of a CMOS logic gate depends on the diffusion capacitance $C_d$ of
the logic gate, the wiring capacitance $C_w$ and the gate capacitance $C_g$ of the following logic gates
[Weste and Harris (2010)]. The effective capacitance of a logic gate $u$ is given by the following
equation:
\[
C(u) = C_d + C_w + \sum_{i=1}^{n} C_g(u_i)
\]
where $n$ is the number of logic gates $u_i$ driven by the logic gate $u$ and $C_g(u_i)$ is the gate capacitance
of each of the following logic gates. These capacitances $C_d$, $C_w$ and $C_g$ depend on the
physical properties of the process technology.
Assumption 1. For technology independent estimates at the logic level, we assume that
the mobility of the nMOS transistors is two times the mobility of the pMOS transistors and
that the transistor widths are chosen to achieve balanced rising and falling transition delays
[Weste and Harris (2010)]. We also assume that a unit transistor has the same gate capacitance
$C$ as the source/drain diffusion capacitance ($C = C_g = C_d$) and that $C_w$ is equal to zero.
Lemma 1. The power consumption Py(xi) of logic gates on the path from the input xi to the
output y caused by switching the input xi depends on the output observability of the input xi and





where f is the frequency and VDD is the supply voltage.
Lemma 2. Given a logic network graph G(V,E,W (E)), where the weight set W = {w((vi, vj))|
w((vi, vj)) = Obj(i), (vi, vj) ∈ E}, the path observability Obian(a0) of the input a0 at the output








Generally, there exist various paths since the path Pi may have the reconvergent fanout rf
and reconvergent node rn. The observability of the input rf at the output rn is approximately





where m is the number of paths.
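Proof. By the Shannon expansion, the output y0 is expressed by the following equation:
\[ y_0 = f(a_0, \ldots) = a_0 f_{a_0} + a_0' f_{a_0'}. \tag{3.4} \]
Assuming f is decomposed into $a_1 = f^1(a_0, \ldots)$ and $y_0 = f^r(a_1, \ldots)$,
\[ y_0 = a_1 f^r_{a_1} + a_1' f^r_{a_1'}
       = \{a_0 f^1_{a_0} + a_0' f^1_{a_0'}\} f^r_{a_1} + \{a_0 f^1_{a_0} + a_0' f^1_{a_0'}\}' f^r_{a_1'}. \tag{3.5} \]
Using (3.5), the cofactors $f_{a_0}$ and $f_{a_0'}$ in (3.4) are the following:
\[ f_{a_0} = f(1, \ldots) = f^1_{a_0} f^r_{a_1} + \{f^1_{a_0}\}' f^r_{a_1'}, \]
\[ f_{a_0'} = f(0, \ldots) = f^1_{a_0'} f^r_{a_1} + \{f^1_{a_0'}\}' f^r_{a_1'}. \]
The Boolean difference of $f(a_0, \ldots)$ with respect to the variable $a_0$ is
\[ \frac{\partial f}{\partial a_0} = f_{a_0} \oplus f_{a_0'}
   = [f^1_{a_0} f^r_{a_1} + \{f^1_{a_0}\}' f^r_{a_1'}] \oplus [f^1_{a_0'} f^r_{a_1} + \{f^1_{a_0'}\}' f^r_{a_1'}] \]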







a1 + {f1a0}′f ra′1 ][{f
1
a′0














= (f1a0 ⊕ f1a′0)(f
r







Similarly, the logic function, f r is decomposed into n− 1 logic functions, ai = f i(ai−1, . . .)
















































Figure 3.1: Renewal process of logic network
Lemma 3. Let $C_{a_i}(a_{i-1})$ for $i = 1, \ldots, n$ be the effective capacitance of each local logic
function $a_i = f^i(a_{i-1}, \ldots)$. The effective capacitance $C_{y_0}(a_0)$ of the complete
















If y0 and a0 are a reconvergent node and fanout pair, respectively and there exists m paths












where aji is the node in the jth path.
3.3 Power Model Using Renewal Process and Linear Regression
3.3.1 Renewal process
We propose a new power estimation model based on the renewal process and linear regression
in this section. The switching behavior of logic circuits can be modeled as a renewal process:
the transition (switching) of each logic gate is regarded as a renewal. When switching events
propagate through connected logic networks, the input transition events cause renewals at
output nodes sequentially, with the renewal intervals between successive logic gates corresponding
to the gate delays. The expected number of renewals includes normal transitions as well as
unintended glitches due to variable delays, and can be used for accurate power estimation. The
accuracy and computational complexity of power estimation depend on the probability density
function of the renewal intervals, Xi.
Suppose there exists a path P from the vertex v1 to another vertex vn in the logic network
G(V,E,W(E)). Note that the (i−1)th logic gate must be triggered for the ith logic gate to
switch. Logic gates are triggered in sequential order, starting with the switching of the primary
input x at time t0 with probability p0. The logic gates v1, v2, ..., vn are triggered at times
t1, t2, ..., tn with probabilities p1, p2, ..., pn, respectively. The renewal process [Nelson (1995)]
can be used to model this behavior of the logic network. The transition points ti are renewal points.
Let Xn be the renewal interval between successive renewal points, tn − tn−1. Fig. 3.1 describes
the renewal process of the logic network.
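We define S0 = 0 and
\[ S_n \stackrel{\mathrm{def}}{=} \sum_{i=1}^{n} X_i, \quad n = 1, 2, \ldots, \qquad
   N(t) \stackrel{\mathrm{def}}{=} \max\{n : S_n \le t\}. \]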
Recall that Sn is the time of the nth renewal and N(t) is the number of renewals that occur
within the interval (0, t]. We let Fn be the distribution of the sum of n independent random
variables distributed as Xi. Fn is defined as the n-fold convolution of the distributions FXi, that is,
\[ F_n(x) \stackrel{\mathrm{def}}{=} F_{X_1} * F_{X_2} * \cdots * F_{X_n} \]
and let fn(x) be the corresponding density function. We are concerned with properties of N(t).
Using Fn(x), the probability mass function of N(t) can be derived as
\[ \Pr[N(t) = n] = \Pr[N(t) \le n] - \Pr[N(t) \le n-1]
   = \Pr[S_{n+1} > t] - \Pr[S_n > t]
   = F_n(t) - F_{n+1}(t). \]












Note that Xi is modeled in one of two ways. In the first scenario, which occurs with
probability pi, Xi is the time from the previous switching event to the next transition event,
a random variable following a normal distribution with the mean µi and the variance σi².
In the second scenario, which occurs with probability 1 − pi, Xi is the time to the clock edge,
T − ti−1, where T is the period of the clock cycle. In other words, if the logic node vi
transitions (with probability pi), Xi is a random value which includes the logic gate delay and
the wire delay. Otherwise, Xi is the time remaining until the end of the clock cycle.
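The two scenarios above can be explored numerically. The following Python sketch is only an illustration under simplifying assumptions that are not part of the original model description: a single path, independent gates, negative sampled delays clipped to zero, and no further switching once a gate fails to transition. The function names and the example parameters are hypothetical.

```python
import random


def simulate_path_transitions(gates, clock_period, rng):
    """Count the transitions on one path during a single clock cycle.

    gates: list of (p_i, mu_i, sigma_i) tuples in path order, where p_i is the
    transition probability and (mu_i, sigma_i) describe the normal delay.
    """
    t = 0.0
    count = 0
    for p, mu, sigma in gates:
        if rng.random() >= p:
            # Second scenario: no transition, the interval lasts until the
            # clock edge, so nothing further switches in this cycle.
            break
        # First scenario: gate plus wire delay (clipped at 0 as an assumption).
        t += max(0.0, rng.gauss(mu, sigma))
        if t > clock_period:
            break
        count += 1
    return count


def expected_transitions(gates, clock_period, trials=20000, seed=1):
    """Monte Carlo estimate of the expected number of transitions R(T)."""
    rng = random.Random(seed)
    total = sum(simulate_path_transitions(gates, clock_period, rng)
                for _ in range(trials))
    return total / trials
```

When the clock period is much longer than the total path delay, the estimate should approach the sum of the cumulative products of the transition probabilities, for example 0.5 + 0.25 + 0.125 for three gates with pi = 0.5.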
We define the probability density of Xi as follows:
\[ f_{X_i}(t) = (1 - p_i)\,\delta\big(t - (T - t_{i-1})\big) + p_i\, n(t; \mu_i, \sigma_i) \]












, δ(t) is the impulse function or Dirac delta function,




































R(T) is the expected number of switched signals on the path P caused by triggering an input
during a clock cycle. If multiple inputs are triggered, there exist multiple paths from the inputs
to the outputs. Let P1 and P2 be the paths from the inputs a and b to the output y, respectively.
If the two paths share a common subpath from the node vi to the output, the number of transitions
caused by triggering the inputs a and b varies according to the type of the node vi and the arrival
times at the node vi. If the node vi is an XOR gate and the difference between the two arrival
times at the node vi, denoted by δ, is greater than 0, then a glitch occurs at the output of the
XOR gate and propagates to the output through the shared path. Otherwise, there are no transitions
from the node vi. The probability that δ is equal to 0, denoted by pδ=0, can be obtained as
follows:
















where Sn1 and Sn2 are the time of the n1th and n2th renewal of each path from each input to
the node vi. The expected number of transitions R(T ) is equal to
















where R1(t) and R2(t) are the expected number of transitions on each path from each input
to the node vi. R3(T ) is the expected number of transitions when only one switching event
is injected due to one of the incoming paths. Note that the term (1 − pδ=0) captures the
probability that glitching occurs.
Similarly, if the node vi is a gate other than XOR, such as NAND or NOR, the term
2 pδ=0 R3(T) is changed into (1/2) pδ=0 R3(T) + (1 − pδ=0) R3(T). Fig. 3.3 shows why the
term changes, based on the truth tables of the different gates.
Figure 3.2: Renewal process caused by triggering two inputs
Figure 3.3: Different transition counts according to logic gate and δ
3.3.2 Graph based analysis
The logic network graph of the combinational logic circuit can be simplified through node
collapsing using the properties in Eq. (3.2), (3.3), (3.6) and (3.7). This method is called graph
based analysis. Let the corresponding logic network graph be G(V,E,W(E),W(V)) with the
edge weight set $W(E) = \{w((v_i, v_j)) \mid w((v_i, v_j)) = Ob_j(i), (v_i, v_j) \in E\}$ and the vertex weight
vector set $W(V) = \{\vec{w}(v) \mid \vec{w}(v) = [w_i(v)], w_i(v) = C(v) \text{ for } i = 0, \ldots, \mathrm{Indegree}(v) - 1, v \in V\}$.
For example, given a logic network graph of y = g(a, b) = a · b, there exist four vertices
va, vb, vg, vy and three edges (va, vg), (vb, vg), (vg, vy). The components of W are the following:
\[ w((v_a, v_g)) = Ob_g(a) = \Pr[f_a \oplus f_{a'}] = \Pr[b] \]
\[ w((v_b, v_g)) = Ob_g(b) = \Pr[f_b \oplus f_{b'}] = \Pr[a] \]
\[ w((v_g, v_y)) = 1 \]
\[ \vec{w}(v_g) = [w_0(v_g) \;\; w_1(v_g)] = [C(g) \;\; C(g)]. \]
The power consumption caused by switching an input is given by the following equations:
\[ P_y(a) = \alpha \cdot w((v_a, v_g)) \cdot w((v_g, v_y)) \cdot w_0(v_g) = \alpha \Pr[b]\, C(g) \]
\[ P_y(b) = \alpha \cdot w((v_b, v_g)) \cdot w((v_g, v_y)) \cdot w_1(v_g) = \alpha \Pr[a]\, C(g) \]
where $\alpha = 0.5\, V_{DD}^2 f$. Other logic gates such as OR, NAND, NOR or XOR also correspond
to a logic network graph. Fig. 3.4 shows the logic network graphs of these basic logic gates. The
logic network graph G(V,E,W(E),W(V)) can be simplified or reduced through node and edge
reduction primitives by using Lemma 2 and Lemma 3.
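As a small numerical illustration of the AND gate example, the following Python sketch evaluates the two power expressions directly. The function name and the numeric values used to call it are hypothetical; only the formulas come from the text above.

```python
def and_gate_power(pr_a, pr_b, cap, vdd, freq):
    """Return (P_y(a), P_y(b)) for y = a AND b.

    pr_a, pr_b: signal probabilities Pr[a] and Pr[b]
    cap: effective capacitance C(g) of the gate
    """
    alpha = 0.5 * vdd ** 2 * freq
    ob_a = pr_b      # Ob_g(a) = Pr[b]: a is observable only when b = 1
    ob_b = pr_a      # Ob_g(b) = Pr[a]
    w_out = 1.0      # w((vg, vy)) = 1
    return alpha * ob_a * w_out * cap, alpha * ob_b * w_out * cap
```

For instance, with Pr[a] = 0.5 and Pr[b] = 0.25, switching the input a contributes half as much power as switching the input b, because a is observable at the output only half as often.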
Node Reduction: Two gate vertices vg1 and vg2 connected by an edge (vg1, vg2) can
be merged into a vertex vg1g2 with Indegree(vg1g2) = Indegree(vg1) + Indegree(vg2) − 1 and
Outdegree(vg1g2) = Outdegree(vg1) + Outdegree(vg2) − 1 after removing the edge (vg1, vg2).
The weights of the incoming edges of vg1 are multiplied by w((vg1, vg2)), as given by Eq.
(3.2). The weights of the incoming edges of vg2 are not changed. The weight vector of the merged
vertex vg1g2 becomes ~w(vg1g2) = [wi(vg1g2)], where wi(vg1g2) = c(g1)/w((vg1, vg2)) + c(g2)
for i = 0, . . . , Indegree(vg1) − 1 and wi(vg1g2) = c(g2) for i = Indegree(vg1), . . . , Indegree(vg1) +
Indegree(vg2) − 2, by Eq. (3.6). This vertex reduction can be repeated until only the pairs of
reconvergent fanout vrf and reconvergent node vrn with two or more edges between them remain,
along with the primary input and output nodes. The multiple edges of such a pair can be reduced
by the edge reduction described next. A pair of reconvergent fanout and node with a reduced
edge can then be reduced by this node reduction, except that the weight of the vertex vrf,rn is
derived by Eq. (3.7). Note that each node reduction step reduces the node count by at least 1.
Figure 3.4: Logic network graphs of basic logic gates
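The weight updates of a single node reduction step can be sketched as follows. This is a minimal illustration that only handles the input edges of vg1 and assumes w((vg1, vg2)) is nonzero; the function names are hypothetical and the full bookkeeping of the graph is omitted.

```python
def reduce_node(in_weights_g1, w12, c1, c2):
    """Merge gate g1 into its successor g2.

    in_weights_g1: weights of the incoming edges of g1
    w12: weight of the removed edge (g1, g2), assumed nonzero
    c1, c2: effective capacitances c(g1) and c(g2)
    Returns the updated edge weights and the merged vertex weight
    seen from the former inputs of g1.
    """
    merged_in_weights = [w * w12 for w in in_weights_g1]
    merged_cap = c1 / w12 + c2
    return merged_in_weights, merged_cap


def power_factor(edge_weight, vertex_weight):
    """Edge weight times vertex weight; multiply by alpha to obtain power."""
    return edge_weight * vertex_weight
```

The division by w((vg1, vg2)) compensates for the multiplication of the edge weight, so the product of the merged edge weight and the merged vertex weight equals the sum of the contributions of g1 and g2 before the merge.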
Edge Reduction: The edges between the reconvergent fanout vrf and the reconvergent node vrn
are reduced into a single edge with the weight Obrn(rf) given by Eq. (3.3). Finally, the simplified
network graph G′(V,E,W(E),W(V)) has only a single logic (function) node, the primary input
nodes, and the primary output node. Fig. 3.5 shows the vertex reduction in the logic network graph.
Algorithm 1 presents the reduction method, which reduces a logic graph into a singleton graph so
that the power consumption of the combinational circuit can be computed trivially.
If all effective capacitances in the logic network are set to 1, the expected number of switched
signals on each path is equal to the weight of the corresponding edge in the simplified network.
Figure 3.5: Reduction of Logic network graph
Algorithm 1 Reduction of G(V,E,W (E),W (V ))
Input : A logic network graph, G(V,E,W (E),W (V ))
    V = {Vpi, Vpo, Vg|pi: primary inputs, po : primary outputs, g : local functions}
    W (E) = {w(vi,vj)|w(vi,vj) = Obj(i), (vi, vj) ∈ E}
    W (V ) = {~w(v)|~w(v) = [wi(v)], wi(v) = C(v)
        for ∀ i = 0, . . . , Indegree(v)− 1, v ∈ V }
Output : A simplified network graph,
    G′(V,E,W (E),W (V ))
for k = 1→ |Vg| − 1 do
    Select two connected vertices vgi, vgj for ∀ vgi, vgj ∈ Vg
    if vgi = Reconvergent fanout and vgj = Reconvergent node with two or more edges then
        Reduction edge (vgi, vgj), Reduction vertex (vgi, vgj)
    else




The number of signal transitions in a hardware implementation is highly correlated with
the dynamic power consumption [Mangard et al. (2005)]. Linear regression is therefore used to
estimate the power consumption from the number of transitions. Let X and Y be the random
variables of the number of transitions and the power consumption, respectively. The estimator of
Y, denoted by $\hat{Y}$, is the following:
\[ \hat{Y} = \hat{\alpha} + \hat{\beta} X, \quad \hat{\beta} = \frac{S_{xy}}{S_{xx}}, \quad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \tag{3.9} \]
where $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})$, $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{X})^2$, and $\bar{X}$ and $\bar{Y}$ are the sample means
of X and Y, respectively. Thus, the probability density function of the power can be modeled
as $n(x; \mu_{\hat{Y}}, \sigma_{\hat{Y}})$, where $\mu_{\hat{Y}} = \hat{Y}$ and $\sigma_{\hat{Y}} = \sqrt{\hat{\beta}\sigma_X^2}$, using only a small number of samples. This leakage
distribution will be used to derive power based SCA security metrics in the following section.
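The least-squares coefficients of Eq. (3.9) can be computed directly from the sample pairs. The following Python sketch is a plain implementation of those formulas; the function name and the sample data in the usage note are hypothetical.

```python
def fit_line(xs, ys):
    """Return (alpha_hat, beta_hat) of the least-squares line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def estimate_power(alpha_hat, beta_hat, transitions):
    """Estimated power for a given number of transitions."""
    return alpha_hat + beta_hat * transitions
```

For transition counts [0, 1, 2, 3] and power samples [1, 3, 5, 7], which lie exactly on a line, the fit returns an intercept of 1 and a slope of 2.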
3.4 SCA Security Metrics
Power based SCA security metrics were defined in [Basel Halak (2013)] in order to measure the
effectiveness (the inverse of robustness or resistance) of side-channel attacks on a target Boolean
function $\vec{v} = f(\vec{k}, \vec{x})$, where $\vec{k}$ is a part of the secret data and $\vec{x}$ is related to the plaintext or
ciphertext. The more distinguishable the power consumption is for different inputs,
the more vulnerable the target Boolean function is to SCA. In order to quantify
SCA effectiveness, the normalized standard deviation was used in [Basel Halak (2013)]. The








where yi is a random sample of power consumption given any input pattern from the sample
space with the mean µ and the variance σ2, y is the sample mean of yi. As n goes to infinity,
the normalized standard deviation is equal to σ/µ.
Note that if we allow constant current components in the circuit, this metric is flawed. Given
a circuit C0 with mean µ0 and standard deviation σ0, the metric is altered from (σ0/µ0) to
(σ0/(µ0+µ1)) when another isolated, disconnected circuit C1 with constant current (σ1 = 0 and
mean µ1) is added. This indicates a quantitative reduction in the SCA effectiveness for no good
reason. Also, this metric has large value for countermeasure circuits with randomly independent
power consumption even if the circuits have robustness against SCA attacks. Additionally, there
is no obviously justifiable mechanism to determine a safety threshold for this metric to flag a
circuit as vulnerable when the metric exceeds the threshold. For these reasons, we propose a
new SCA security metric using Kullback-Leibler divergence in the following subsection.
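The first objection above can be checked numerically. The following Python sketch uses a hypothetical three-sample power trace and a hypothetical constant current offset; it only illustrates that the ratio σ/µ shrinks when a data-independent constant is added, although the leakage itself is unchanged.

```python
import statistics


def normalized_std(samples):
    """Normalized standard deviation sigma / mu of the power samples."""
    return statistics.pstdev(samples) / statistics.mean(samples)


def add_constant_circuit(samples, mu1):
    """Power samples after adding an isolated circuit drawing constant power mu1."""
    return [y + mu1 for y in samples]
```

With samples [1, 2, 3] and a constant offset of 8, the standard deviation is the same in both cases while the mean grows from 2 to 10, so the metric drops by a factor of five.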
3.4.1 Kullback-Leibler divergence
Consider the failure probability, that is, the probability that the adversary makes an incorrect
inference using standard power based SCA attacks. A circuit with a high failure probability is
more secure than a circuit with a low failure probability. First, we assume that the adversary
wants to learn only one output bit Y through SCA attacks. We also assume that Pr[Y = 0] = Pr[Y = 1]
as a starting point to define the new SCA security metric. Let Pr[l|y0] be the probability density
function of the power leakage given that the output Y is 0 and Pr[l|y1] be the probability
density function of the power leakage given that the output Y is 1. Suppose that the conditional
probability density functions are normal distributions with the means µ0, µ1 (assuming that
µ1 > µ0) and the same variance σ0². Assuming that the adversary knows the conditional




















The adversary should choose 0 output if the a posteriori probability Pr[y0|l] is greater than the
a posteriori probability Pr[y1|l]. Otherwise, s/he should choose 1.
The failing probability of the adversary is defined as the sum of the probability that given
the output 0, the hypothesis test H1 is selected and the probability that given the output 1,
the hypothesis test H0 is selected. That is





































where Q(x) is called the complementary error function. SCA security metric using the normal-






















Comparing Eq. (3.10) with Eq. (3.11), the earlier SCA security metric for normalized variance does
not match the failing probability (µ is not required). In this case, σ/σ0 is a better choice as
the SCA security metric. For example, if the circuit designer wants the SCA failing probability













3.4.1.1 Two normal distributions with different means and variances
Suppose that the two conditional probability density functions Pr[l|y0] and Pr[l|y1] have different
means and variances (µ0 ≠ µ1, σ0 ≠ σ1). In this case, the failing probability PrF is equal



















The above equation cannot be simplified in the way Eq. (3.10) can, and its exact value is
difficult to obtain. In order to get a simple, approximate value of PrF in general cases, we
assume in the following subsection that all conditional probability density functions have the
same variance σ0².
3.4.1.2 N normal distributions with the same variance σ0
The power consumption of any circuit can be classified into N normal distributions with means
µ0, µ1, . . . , µN−1 and the same variance σ0², based on the number of outputs. The failing
probability PrF of the adversary is equal to the overlapping coefficient of the N normal
distributions. PrF is larger than the smallest overlapping coefficient between any two of the N
normal distributions, which is that of the two distributions with the smallest and the largest
mean. This smallest overlapping coefficient is selected as the threshold of the failing
probability, denoted as PrFth. If the designer sets PrFth to some value, the failing probability
should be larger than that value. In this case, the threshold of the failing probability and the
SCA security metric are given by the following equations:
\[ \mathrm{PrF}_{th} = 2Q\left(\frac{\sup_{i \neq j} |\mu_i - \mu_j|}{2\sigma_0}\right) \]
\[ \text{SCA security metric} = \frac{\sup_{i \neq j} |\mu_i - \mu_j|}{2\sigma_0} \]
Generally, the N normal distributions have different means and variances. In this general case, it
is difficult to compute the threshold of the failure probability and to define an SCA security
metric. Kullback-Leibler divergence is therefore used to define a new SCA security metric.
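For the equal-variance case, the threshold and the metric can be evaluated with a few lines of Python. This sketch assumes that Q(x) denotes the upper tail probability of the standard normal distribution, computed here through `math.erfc`; the function names are hypothetical.

```python
import math


def q_function(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def sca_metric_equal_variance(means, sigma0):
    """sup |mu_i - mu_j| / (2 sigma0) over all pairs of class means."""
    return (max(means) - min(means)) / (2.0 * sigma0)


def failure_threshold(means, sigma0):
    """Threshold PrF_th = 2 Q(metric) of the failing probability."""
    return 2.0 * q_function(sca_metric_equal_variance(means, sigma0))
```

When all class means coincide the threshold is 1, meaning the adversary cannot distinguish the classes at all; the threshold falls as the largest gap between means grows relative to σ0.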
3.4.1.3 Kullback-Leibler divergence
Let fX(x) and fY(x) be the probability density functions of the random variables X and Y,
respectively. Kullback-Leibler divergence is defined as the following equation [S. Kullback and







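If X and Y are normally distributed with parameters (µ0, σ0²) and (µ1, σ1²), respectively, the
Kullback-Leibler divergence is
\[ D_{KL}(X \| Y) = \int_{-\infty}^{\infty} n(x; \mu_0, \sigma_0)\big(\log n(x; \mu_0, \sigma_0) - \log n(x; \mu_1, \sigma_1)\big)\,dx
   = \frac{(\mu_0 - \mu_1)^2 + \sigma_0^2 - \sigma_1^2}{2\sigma_1^2} + \ln\frac{\sigma_1}{\sigma_0} \tag{3.13} \]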
Kullback-Leibler divergence of two random variables with the normal distribution can be
computed easily. The maximum of Kullback-Leibler divergence for allowable failure probability
can be obtained. For example, if we want the failure probability of more than 0.9, the Kullback-
Leibler divergence should be less than 0.03.
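The closed form of Eq. (3.13) and the quoted threshold can be cross-checked with a short Python sketch. The mapping from divergence to failure probability below is an assumption that holds only for two classes with equal variance, where the divergence equals d²/(2σ0²) for a mean gap d and the failure probability is 2Q(d/(2σ0)); the function names are hypothetical.

```python
import math


def kl_normal(mu0, sigma0, mu1, sigma1):
    """Kullback-Leibler divergence between N(mu0, sigma0^2) and N(mu1, sigma1^2)."""
    return (((mu0 - mu1) ** 2 + sigma0 ** 2 - sigma1 ** 2) / (2.0 * sigma1 ** 2)
            + math.log(sigma1 / sigma0))


def failure_probability_equal_variance(kl):
    """Failure probability 2 Q(sqrt(kl / 2)) for two equal-variance classes.

    Uses 2 Q(x) = erfc(x / sqrt(2)) with x = sqrt(kl / 2).
    """
    return math.erfc(math.sqrt(kl) / 2.0)
```

Under this assumption a divergence of 0.03 maps to a failure probability slightly above 0.9, which is consistent with the threshold stated above.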
Also, Kullback-Leibler divergence is related to the number of traces N that is necessary to
assert with a confidence of (1 − α) that the two normal distributions X and Y are different.
The number of traces N is a significant contributor in quantifying a lower bound on the attack
complexity. The smallest number of traces satisfying
$\Pr[|\bar{X} - \bar{Y} - (\mu_X - \mu_Y)| < \epsilon] = 1 - \alpha$ is
\[ N \ge \frac{(\sigma_0 + \sigma_1)^2}{\epsilon^2}\, z_{1-\alpha/2}^2 \]
where $z_{1-\alpha/2}$ is the quantile of the standard normal distribution whose cumulative probability
is 1 − α/2. Comparing with Eq. (3.13), as the Kullback-Leibler divergence of the two
random variables increases, the number of traces N decreases. Ideally, we would like to be able
to show that N has a non-trivial, super polynomial lower bound in n, the number of bits in
the secret.
3.4.1.4 SCA security metric using Kullback-Leibler divergence
In general, suppose that the adversary knows N normal probability density functions with
different means and variances. We define the SCA security metric as the maximum Kullback-Leibler
divergence between any two of the N random variables:
\[ \text{SCA security metric} = \max_{i \neq j} D_{KL}(X_i \| X_j), \quad X_i \sim \mathcal{N}(\mu_i, \sigma_i^2) \]
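A direct way to evaluate this metric is to compute the divergence for every ordered pair of classes and keep the largest value. The Python sketch below assumes each class is described by a hypothetical (mean, standard deviation) tuple and uses the closed form for two normal distributions.

```python
import math
from itertools import permutations


def kl_normal(mu0, sigma0, mu1, sigma1):
    """Closed-form Kullback-Leibler divergence between two normal distributions."""
    return (((mu0 - mu1) ** 2 + sigma0 ** 2 - sigma1 ** 2) / (2.0 * sigma1 ** 2)
            + math.log(sigma1 / sigma0))


def sca_security_metric(classes):
    """Maximum pairwise divergence over all ordered pairs of (mu, sigma) classes."""
    return max(kl_normal(m0, s0, m1, s1)
               for (m0, s0), (m1, s1) in permutations(classes, 2))
```

Ordered pairs are used because the divergence is not symmetric when the variances differ.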




The second security metric for quantifying DPA effectiveness is the mutual information
[Standaert et al. (2009)]:
\[ I(\vec{K}; \vec{L}) = H[\vec{K}] - H[\vec{K}|\vec{L}] \tag{3.15} \]
where $\vec{K}$ is a variable containing a part of the secret data and $\vec{L}$ is a leakage observation, such as
the power consumption, obtained through the side channel. The entropy of $\vec{K}$, denoted by $H[\vec{K}]$, is $\log_2|\vec{K}|$
assuming that $\vec{K}$ is uniformly distributed. The conditional entropy $H[\vec{K}|\vec{L}]$ is given by the following
equation:






where
\[ \Pr[k|l] = \frac{\Pr[l|k]\Pr[k]}{\sum_{k^* \in \vec{K}} \Pr[l|k^*]\Pr[k^*]}. \]
In order to compute the mutual information, the conditional probability Pr[l|k] must be
estimated. Using simulation tools such as SPICE, the power consumption can be measured,
resulting in a sampled estimate of the probability distribution. Since simulation-based
power estimation requires significant time due to the detailed circuit level SPICE simulation, it
is important to determine the minimum number of sample measurements required to obtain a
statistically significant probability distribution. Assuming the probability distribution is the
normal distribution with the mean µ and the variance σ², the smallest number of measurements
N satisfying $\Pr[|\bar{X} - \mu| < \epsilon] = 1 - \alpha$ is $N = \frac{\sigma^2}{\epsilon^2} \cdot z_{1-\alpha/2}^2$. The mutual information based SCA
analysis will be exploited for more realistic and accurate verification at the physical transistor
or layout level.
Figure 3.6: The failure probability PrF : Overlapping coefficient of two normal distributions
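The sample-size rule can be evaluated with the Python standard library. The sketch below assumes the usual two-sided normal confidence interval for a sample mean, with ε the acceptable estimation error; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def min_samples(sigma, epsilon, alpha):
    """Smallest N such that the sample mean is within epsilon of the true
    mean with confidence 1 - alpha, for a normal distribution with
    standard deviation sigma."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # quantile z_{1 - alpha/2}
    return math.ceil((sigma * z / epsilon) ** 2)
```

For example, with σ = 1, ε = 0.1 and α = 0.05 the rule asks for 385 simulated measurements, and doubling σ roughly quadruples that number.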
3.5 Recognition Rate Using Maximum Likelihood Estimation
The maximum likelihood estimator of c is defined as the following:






where Tci is the test statistic for the class ci and fL|ci is the probability density function of the
side-channel leakage L given a class ci. For the MLE to classify the side-channel leakage l
into the correct class c∗, the log-likelihood of c∗ must be larger than that of every other class.
The successful recognition rate is defined as the probability that the test statistic for the
correct class c∗, Tc∗, is larger than all {T{c}−c∗} [Fei et al. (2014)]:
SR = Pr[Tc∗ > {T{c}−c∗}] (3.16)
We first consider the recognition rate when there exist two classes, c1 and c2. Assuming
that c1 is the correct class c∗, the successful recognition rate SR is equal to the following:
SR = Pr[Tc1 > Tc2 ] = Pr[Tc1 − Tc2 > 0] = Pr[∆c1,c2 > 0]
where





[ln fL|c1( ~lm)− ln fL|c2( ~lm)].
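In general, the disassembler exploits only one leakage observation $\vec{l}_1$. For a single leakage
observation, the mean and variance of $\Delta_{c_1,c_2}$ are given by the following:
\[ \mu_{\Delta_{c_1,c_2}} = E_{L|c_1}[\ln f_{L|c_1}(\vec{l}_1) - \ln f_{L|c_2}(\vec{l}_1)] \tag{3.17} \]
\[ \sigma^2_{\Delta_{c_1,c_2}} = \mathrm{Var}_{L|c_1}[\ln f_{L|c_1}(\vec{l}_1) - \ln f_{L|c_2}(\vec{l}_1)]. \tag{3.18} \]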
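The two-class recognition rate can be estimated by simulation. The following Python sketch assumes one-dimensional normal leakage for both classes and a single leakage observation per decision; the function names, the sample count and the seed are hypothetical choices made for illustration.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """Natural logarithm of the normal density n(x; mu, sigma)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def recognition_rate(mu1, sigma1, mu2, sigma2, trials=20000, seed=7):
    """Monte Carlo estimate of SR = Pr[Delta > 0] when c1 is the correct class."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        leak = rng.gauss(mu1, sigma1)  # observation drawn from the correct class
        delta = log_normal_pdf(leak, mu1, sigma1) - log_normal_pdf(leak, mu2, sigma2)
        if delta > 0:
            hits += 1
    return hits / trials
```

With equal variances the decision reduces to a threshold halfway between the two means, so for means 0 and 2 with unit variance the estimate should be close to 0.84.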
Definition 6 (Noncentral chi-square distribution). The random variable Y is said to have a
noncentral chi-square distribution [Mathai and Provost (1992)] with k degrees of freedom and
noncentrality parameter δ if Y has the density


















where 0 < x <∞, k = 1, 2, . . . , δ ≥ 0, which is denoted as Y ∼ χ2k,δ.
Theorem 4. If we assume that fL|ci is the normal density function with the mean µci and the
variance σci, then ∆c1,c2 has the linear transformed noncentral chi-square distribution with one







































































































where l is the realization of a random variable L which has the normal distribution with
the mean µ1 and the variance σ
2








The probability density function of Y is given by
fY(y; k, δ), which is the noncentral chi-square density function with k = 1 degree of freedom
and the noncentrality parameter $\delta = \frac{b^2}{4a^2}$. By the transformation technique, the probability











































































Theorem 5. If we assume that fL|ci is the D-dimensional normal density function with the
mean ~µci and the variance Σci, then ∆c1,c2 has the linear transformed noncentral chi-square





















(~l − ~µ1)TΣ−11 (~l − ~µ1) +
1
2
















~ZTPP T ~Z +
1
2







(P T ~Z)T (P T ~Z) +
1
2


























































where fx(; k, δ) is the noncentral chi-square density function with the k degree of freedom and








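$P = [\vec{p}_1, \vec{p}_2, \ldots, \vec{p}_n]$, $PP^T = I$, $\vec{p}_i$ is the eigenvector corresponding to $\lambda_i$, $\vec{U} = P^T\vec{Z}$,
$E[\vec{Z}] = \vec{0}$, $\mathrm{Cov}[\vec{Z}] = I$, $E[\vec{U}] = \vec{0}$, $\mathrm{Cov}[\vec{U}] = I$, and $\vec{b} = P^T\Sigma_1^{-1}(\vec{\mu}_1 - \vec{\mu}_2)$.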
Theorem 6. If there exist three classes c1, c2, c3 and the correct class of a sample is c1, the
range of successful recognition rate is the following :
min{Pr[∆c1,c2 > 0],Pr[∆c1,c3 > 0]} ≤ SR ≤ max{Pr[∆c1,c2 > 0],Pr[∆c1,c3 > 0]}
Proof. Let Tci be the test statistic for the class ci : Tci = ln fL|ci( ~lm). Since we assume that the
sample belongs to the class c1, the successful recognition rate SR is equal to Pr[max{Tc1 , Tc2 , Tc3} =
Tc1 ] or Pr[Tc1 > Tc2 , Tc1 > Tc3 ].
Given that Tc2 > Tc3 , Pr[max{Tc1 , Tc2 , Tc3} = Tc1 ] = Pr[Tc1 > Tc2 ]. Otherwise, that is,
given that Tc3 ≥ Tc2 , Pr[max{Tc1 , Tc2 , Tc3} = Tc1 ] = Pr[Tc1 > Tc3 ]. Thus, the successful
recognition rate is equal to the following:
SR = Pr[Tc2 > Tc3 ]Pr[Tc1 > Tc2 ] + Pr[Tc3 ≥ Tc2 ]Pr[Tc1 > Tc3 ]
= αPr[Tc1 > Tc2 ] + (1− α)Pr[Tc1 > Tc3 ]
= {Pr[Tc1 > Tc2 ]− Pr[Tc1 > Tc3 ]}α+ Pr[Tc1 > Tc3 ]
where α is Pr[Tc2 > Tc3 ] .
If Pr[Tc1 > Tc2 ] > Pr[Tc1 > Tc3 ], SR has the minimum of Pr[Tc1 > Tc3 ] at α = 0 and the
maximum of Pr[Tc1 > Tc2 ] at α = 1. If Pr[Tc1 > Tc2 ] < Pr[Tc1 > Tc3 ], SR has the minimum
of Pr[Tc1 > Tc2 ] at α = 1 and the maximum of Pr[Tc1 > Tc3 ] at α = 0. Thus, SR lies
between Pr[Tc1 > Tc2 ] and Pr[Tc1 > Tc3 ]. Fig. 3.7 shows the successful recognition rate
as a function of α in both cases.
Figure 3.7: Successful recognition rate according to α (a) when Pr[Tc1 > Tc2 ] > Pr[Tc1 > Tc3 ]
(b) when Pr[Tc1 > Tc3 ] > Pr[Tc1 > Tc2 ]
3.6 Experiment
We implemented the AES SBOX based on the composite finite field proposed by Satoh et al. [Satoh
et al. (2001)] to compute our SCA security metrics. We used Cadence RTL Compiler as the logic
synthesizer and the OSU standard cell library based on the AMI C5N 0.6 µm process as the
technology library. The calculator of the expected number of transitions depending
on the triggered input bits is programmed in Perl/Tk. It generates the logic network graph from
the synthesized netlist and searches all paths from the triggered inputs to the outputs. Shared
paths are split off, and the number of transitions on those paths is calculated differently
depending on the starting node of the shared path and the difference between the arrival times at
that node. The sum of the numbers of transitions on all paths, denoted by R(T), is the total
number of transitions in the SBOX during computation.
In order to compute the coefficients of Eq. (3.9), 1000 random pairs (xi, yi), which represent
the number of transitions and the average power during computation, respectively, are sampled,
using the Cadence Spectre analog simulator for yi and a transition counter for xi.
Fig. 3.8 shows the scatter plot of the 1000 sample pairs and the linear regression line. In this case,
$\hat{\beta}$ and $\hat{\alpha}$ are computed as 0.085 and 1.05, respectively. That is, the mean of the estimated power
for any input vector, $\mu_{\hat{Y}}$, is equal to $\hat{\alpha} + \hat{\beta}R(T)$ and the variance $\sigma^2_{\hat{Y}}$ is $\hat{\beta}\sigma^2_R$. The probability
density function of the estimated power is the normal distribution with the mean $\mu_{\hat{Y}}$ and the
variance $\sigma^2_{\hat{Y}}$. The 256 normal distributions that result from all possible input vectors ($2^8$) can be
obtained by 1000 simulations and renewal process based estimation. The SCA security metric
using KL divergence is 3.72, which corresponds to about 17% failure probability.
Using the simulated samples, a correlation power analysis (CPA) attack on this SBOX was
executed. We assume that the correct key value is 19. The correlation coefficient ρ of the key
guess 19 has the highest value (0.53), so the key guess is correct. Also, the success probability
was measured depending on the number of samples (N). The success probability is over 95% when
N is more than 220 samples. Fig. 3.12 shows the result of the CPA attack. Thus, this SBOX should be
protected against power based SCA attacks.
Figure 3.8: Scatter plot and linear regression ($\hat{\beta}$ = 0.085, $\hat{\alpha}$ = 1.05) of 1000 random samples
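The principle of the CPA attack can be illustrated on a toy target. The Python sketch below does not reproduce the experiment above: it assumes a 4-bit S-box (the PRESENT cipher S-box, used here only as a small stand-in for the AES SBOX), a Hamming weight leakage model, and additive Gaussian noise. All function names and parameters are hypothetical.

```python
import math
import random

# 4-bit PRESENT S-box, a small stand-in for the AES SBOX (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_traces(key, count, noise_sigma, rng):
    """Simulated leakage: Hamming weight of the S-box output plus noise."""
    plaintexts = [rng.randrange(16) for _ in range(count)]
    traces = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise_sigma)
              for p in plaintexts]
    return plaintexts, traces


def cpa_attack(plaintexts, traces):
    """Return the key guess with the highest correlation and all scores."""
    scores = {}
    for guess in range(16):
        model = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores[guess] = pearson(model, traces)
    return max(scores, key=scores.get), scores
```

A typical use is to generate traces for a secret key with `simulate_traces`, run `cpa_attack` on them, and check that the returned guess matches the key; increasing the noise or reducing the number of traces lowers the margin between the correct guess and the others.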
3.7 Conclusion
We have (1) developed a quantitative metric to capture the SCA resistance of a combinational
circuit, and (2) developed and implemented a stochastic power estimation method using the
renewal process and linear regression, which is more efficient than simulation based methods.
As an example, we applied our metric and estimation method to an AES SBOX implementation
at the logic level. We will apply these techniques to many unprotected and protected
cryptographic implementations and develop secure implementations with DKL < 0.03 (which means
that the threshold of the failing probability is greater than 90%).
Figure 3.10: Correlation Power Analysis attack of AES SBOX (N = 1000)
Figure 3.11: Success probability according to the number of samples (N)
Figure 3.12: CPA attack of AES SBOX
CHAPTER 4. SECURE LOGIC STYLE
4.1 Introduction
In order to remove the dependency between the power consumption and the intermediate values
of the executed cryptographic algorithm, cryptographic hardware can be implemented with secure
primitive logic cells such as Sense Amplifier Based Logic (SABL) [Tiri et al. (2002)], Wave
Dynamic Differential Logic (WDDL) [Tiri and Verbauwhede (2005)] and t-private logic circuits
[Ishai et al. (2003)]. These secure logic styles use different methods to make the power
consumption independent of the performed operation and the processed data values. SABL and
WDDL consume an equal amount of power in each clock cycle, whereas t-private logic circuits
randomize the amount of power consumed in each clock cycle.
In other words, SABL and WDDL implement the hiding countermeasure and t-private logic
circuits implement the masking countermeasure.
All of these secure cells are robust against side-channel attacks, but only the t-private
logic circuit protects against probing attacks, in which an adversary can observe at most t
internal nodes in each clock cycle. From the viewpoint of design implementation, t-private
logic circuits and WDDL can be implemented with a general CMOS digital cell library, whereas each
SABL cell must be fully custom designed. The t-private logic circuit has the largest area among
these secure logic styles, but its power consumption is the smallest. Since SABL and WDDL have
two phases (the precharge phase and the evaluation phase) in each clock cycle, and signals are
switched in both phases, the power consumption of SABL and WDDL is larger than that of the
t-private logic circuit. Table 4.1 summarizes these secure logic styles.
The Oklahoma State University (OSU) digital cell library based on the FreePDK45 technology
library is used to implement the secure logic styles. For logic synthesis and physical
layout, commercial EDA tools from Cadence and Synopsys are used. Our logic
cell library consists of the implemented secure logic cells, OSU digital cells and FreePDK45 analog
cells. This cell library defines the cell function, area, delay and power dissipation in the Liberty
file format.
This chapter is organized as follows. Section 4.2 presents Sense Amplifier Based Logic (SABL).
Section 4.3 describes Wave Dynamic Differential Logic (WDDL). t-private logic circuits are
presented in Section 4.4. Section 4.5 presents the implementation of the secure logic styles.
Finally, Section 4.6 summarizes this chapter.
4.2 Sense Amplifier Based Logic (SABL)
Sense Amplifier Based Logic was introduced by Tiri et al. [Tiri et al. (2002), Mangard
et al. (2007)]. SABL cells are specially designed to have a constant internal power consumption
independent of the processed logic values. SABL is implemented as a dual-rail precharge logic
style, which means that each input is encoded as a pair of wires carrying the original
signal and the inverted signal, and all logic signals alternate between precharge values and
evaluated values. In the precharge phase, both complementary wires are set to the precharge
value. During the evaluation phase, the values on the complementary wires are set to (0, 1) or
(1, 0) according to the processed data. Assuming that the precharge value is 0 and that
half of the clock cycle corresponds to the evaluation phase, one complementary output
performs the transitions 0 → 1 → 0 during a clock cycle and the other complementary output
has no transition. This means that SABL always performs the same transitions at its outputs
during each clock cycle, independent of its inputs.

Table 4.1: Secure logic style
                     SABL          WDDL          t-private logic
SCA resistance       ✓             ✓             ✓
Probing resistance   ✗             ✗             ✓
Method               Hiding        Hiding        Random masking
Design               Full custom   Semi custom   Semi custom
Area                 Medium        Low           High
Power                Medium        High          Low
Fig. 4.1 shows the transistor schematic of a generic n-type SABL cell. The n-type SABL
cell consists of the differential pull-down network (DPDN), which is made of NMOS transistors,
and the cross-coupled inverters I1 and I2, in which the output of each inverter is connected to
the input of the other.
An n-type SABL cell is in the precharge phase when the clock signal is 0. During the
precharge phase, the PMOS transistors M3 and M4 are turned on and all internal nodes
of the n-type SABL cell are set to 1. As a result, the inverters I3 and I4 produce a precharge
value of (0, 0) at the complementary outputs.
When the clock signal is 1, the n-type SABL cell is in the evaluation phase. During the
evaluation phase, the input signals $in_1, \overline{in_1}, \ldots, in_n, \overline{in_n}$ are set to complementary values. The
NMOS transistor M2 is turned on and the PMOS transistors M3 and M4 are turned off. Thus,
the nodes n3 and n4 of the DPDN are set to 0. One of the nodes n1 and n2 is connected to one
of the nodes n3 and n4, as determined by the structure of the DPDN. If n1 is connected
to 0 via the DPDN, the inverter I1 is operational. Since the input signal n6 of the inverter I1
is still 1, the output signal n5 of the inverter I1 is switched to 0. The node n5 also serves as
the input of the inverter I2, and thus the output signal n6 of the inverter I2 stays at 1. The
complementary outputs $out$ and $\overline{out}$ are set to (1, 0). If n2 is connected to 0 via the DPDN, the
inverter I2 is operational. Since the node n5 is at 1, the node n6 is switched to 0. The
complementary outputs $out$ and $\overline{out}$ are set to (0, 1).
In order for n-type SABL cells to consume constant power, the DPDN in the cell must
satisfy the following requirements, and the internal structure of the cells must be balanced.
DPDN requirements:
1) Every internal node of the DPDN should be connected to one of the four output nodes
n1, n2, n3 or n4. Together with the NMOS pass-transistor M1, this structure ensures that all
internal nodes of the DPDN are discharged to 0 during the evaluation phase and charged to 1
during the precharge phase.
2) Every possible conducting path in the DPDN should have the same resistance.
3) Both wires of every complementary input wire pair must be connected to the same number
of gate terminals of transistors with identical parameters. This ensures that the capacitances
of the complementary inputs of SABL cells are pairwise balanced.
Figure 4.1: Schematic of an n-type SABL cell
4.3 Wave Dynamic Differential Logic (WDDL)
Wave Dynamic Differential Logic was also introduced by Tiri et al. [Tiri and Verbauwhede
(2004), Mangard et al. (2007)]. WDDL cells can be built from general logic
cells in a standard cell library. The structure of WDDL cells is much simpler than that of
SABL cells. This generally leads to less complex and significantly smaller circuits. Another
advantage of WDDL cells is that they can also be realized on FPGAs.
Fig. 4.2 shows the schematic of a combinational WDDL cell. A combinational WDDL cell
basically consists of two circuits that realize the Boolean functions $f_1$ and $f_2$ such that
$f_1(in_1, \ldots, in_n) = out$
$f_2(\overline{in_1}, \ldots, \overline{in_n}) = \overline{out}$
$f_1(in_1, \ldots, in_n) = \overline{f_2(\overline{in_1}, \ldots, \overline{in_n})}$,
where $(in_1, \overline{in_1}, \ldots, in_n, \overline{in_n})$ are complementary input signals and
$(out, \overline{out})$ are complementary output signals. These Boolean functions must be positive
monotonic in order to achieve the same transitions at the output signals during each clock cycle
for all possible transitions of the input signals. Positive monotonic means that if any input
signals change in one direction (0 → 1 or 1 → 0), either $out$ or $\overline{out}$ must switch in
the same direction.
Figure 4.2: Schematic of a combinational WDDL cell
Assuming that the precharge value is set to 0, all complementary input signals are set to 0
in the precharge phase. Since these 1 → 0 transitions of the input signals result in exactly one
1 → 0 transition of an output signal, both complementary output signals are set to 0. In the
evaluation phase, all input signals are set to complementary values such as (0, 1) or (1, 0). A
0 → 1 transition must occur at either the $out$ or the $\overline{out}$ node because of the 0 → 1
transitions of the input signals. As a result, either $out$ or $\overline{out}$ always changes as
0 → 1 → 0, and the other output signal stays at 0 during a clock cycle.
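The precharge and evaluation behaviour described above can be illustrated with a small Python sketch. It assumes the usual WDDL AND cell, in which $f_1$ is an AND of the true rails and $f_2$ is an OR of the complementary rails; the function names are illustrative and are not taken from the thesis.

```python
from itertools import product

def wddl_and(a, a_n, b, b_n):
    """WDDL AND cell: f1 is an AND of the true rails, f2 an OR of the complement rails."""
    return a & b, a_n | b_n

def clock_cycle(a, b):
    """Return the (out, out_n) pairs of the precharge and evaluation phases."""
    precharge = wddl_and(0, 0, 0, 0)            # every rail is precharged to 0
    evaluation = wddl_and(a, 1 - a, b, 1 - b)   # rails carry complementary values
    return precharge, evaluation

for a, b in product((0, 1), repeat=2):
    precharge, evaluation = clock_cycle(a, b)
    # Count the rails that rise between the precharge and the evaluation phase.
    rising = sum(e - p for p, e in zip(precharge, evaluation))
    print(f"a={a} b={b} precharge={precharge} evaluation={evaluation} rising={rising}")
```

For every input pair exactly one of the two output rails rises during evaluation, which is the data-independent transition count the text relies on.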
4.4 t-private Private Circuit
We assume that an adversary can observe only a limited number of internal nodes per clock
cycle. In other words, this adversary has bandwidth limitations. This is the t-observation
limited, interactive adversary of Ishai et al. [Ishai et al. (2003)]. We adopt a variant of Agrawal
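and Aggarwal [Agrawal and Aggarwal (2001)] who provide an entropy based definition of privacy.
Definition 7. Privacy of a single variable $X$ is defined as
$h(X) = -\int_{\Omega_X} f_X(x) \log f_X(x) \, dX$.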
Note that ΩX is the domain of X and x is a value in ΩX . This is the classical information
theoretic definition of entropy for a variable X viewed as a random variable.
If this variable X’s privacy were to be enhanced by applying a perturbing variable R, we
can capture conditional entropy of X as follows.
Definition 8. Conditional privacy of a single variable $X$ perturbed by a variable $R$ is defined as
$h(X|R) = -\int_{\Omega_{X,R}} f_{X,R}(x, r) \log f_{X|R}(x|r) \, dX \, dR$.
The loss of privacy for X resulting from the exposure of R is the key definition of privacy
developed in Agrawal and Aggarwal [Agrawal and Aggarwal (2001)].
Definition 9. The privacy loss for variable X resulting from the exposure of a perturbing






Note that if R is a random variable chosen independently from X (as is the case in [Messerges
(2000)] and [Ishai et al. (2003)]), the privacy loss is 0 since h(X|R) = h(X). Now we can define
the notion of privacy as used in Ishai et al. [Ishai et al. (2003)].
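The statement h(X|R) = h(X) for an independent mask can be checked numerically for single bits. The Python sketch below is only an illustration under stated assumptions: it uses discrete Shannon entropy in place of the differential entropy of the definitions, and it measures the loss as the difference h(X) − h(X|S) for an observed value S, which is a stand-in and not necessarily the exact form of Definition 9. All names are hypothetical.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a distribution {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(X|S) in bits for a joint distribution {(x, s): probability}."""
    marginal_s = defaultdict(float)
    for (_, s), p in joint.items():
        marginal_s[s] += p
    return -sum(p * log2(p / marginal_s[s]) for (_, s), p in joint.items() if p > 0)

def joint_distribution(observe):
    """Joint distribution of a uniform bit x and observe(x, r) for a uniform mask bit r."""
    joint = defaultdict(float)
    for x in (0, 1):
        for r in (0, 1):
            joint[(x, observe(x, r))] += 0.25
    return joint

h_x = entropy({0: 0.5, 1: 0.5})
# Entropy difference when the adversary sees the mask, the masked share, or x itself.
loss_mask = h_x - conditional_entropy(joint_distribution(lambda x, r: r))
loss_share = h_x - conditional_entropy(joint_distribution(lambda x, r: x ^ r))
loss_plain = h_x - conditional_entropy(joint_distribution(lambda x, r: x))
print(loss_mask, loss_share, loss_plain)
```

Observing either the mask r or the masked share x ⊕ r alone leaves the entropy of x unchanged, whereas observing x directly removes all of it.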
Definition 10 (t-private circuits:). A variable x is designed to be t-private if when perturbed
by k ≤ t variables rx1 , rx2 , . . . , rxk , the privacy loss for x resulting from the exposure of any
subset of up to t perturbing variables is 0.
Note that this definition of privacy insists on maintaining 0 correlation between the pro-
tected variable x and any subset of its perturbing variables. In these schemes, x is represented
by at least t + 1 physical variables, also known as its shares xs0 , xs1 , . . . , xst . In other words,
almost all the shares of x carry 0 information about x in these schemes. We call such privacy
schemes information isolating schemes or information isolating shares.
Messerges [Messerges (2000)] splits each variable x into two shares rx (a random bit) and
rx ⊕ x. He calls this scheme a masking scheme. He also introduces a similar arithmetic mask-
ing variant. Ishai et al. generalize this scheme to split x into t + 1 shares xs0 = rx1 , xs1 =
rx2 , . . . , xst−1 = rxt , xst = rx1 ⊕ rx2 ⊕ · · · ⊕ rxt ⊕x. They then provide a transformation for the
Boolean basis of a NOT gate and an AND gate where each operand is a t+ 1 bit value.
4.4.1 Ishai’s t-private circuit
Definition 11 (Input Encoder I). Each input xi is split into t + 1 shares: First, t random
binary values, rx1 , rx2 , . . . , rxt are chosen for xs0 , xs1 , . . . , xst−1 using t random-bit gates. And
then xst is encoded into x⊕ rx1 ⊕ rx2 ⊕ · · · ⊕ rxt. The circuit I computes the encoding of each
input bit independently in this way.
Definition 12 (Output Decoder O). Each output of a circuit has t + 1 bits, ys0 , ys1 , . . . , yst,
which are decoded into ys0 ⊕ ys1 ⊕ · · · ⊕ yst in order to obtain real output.
Definition 13 (t-private NOT circuit). Only one wire of the split inputs, xs0 , xs1 , . . . , xst, is
connected to a NOT gate.
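Definitions 11 to 13 can be sketched in a few lines of Python. This is a software illustration only, assuming that Python's `secrets.randbits` stands in for the random-bit gates; the function names `encode`, `decode` and `private_not` are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor

def encode(x, t):
    """Input encoder I: t random bits plus one bit that fixes the XOR of all shares to x."""
    shares = [secrets.randbits(1) for _ in range(t)]
    shares.append(reduce(xor, shares, x))
    return shares

def decode(shares):
    """Output decoder O: XOR of all t + 1 shares."""
    return reduce(xor, shares, 0)

def private_not(shares):
    """t-private NOT: invert exactly one share (here, the first one)."""
    return [shares[0] ^ 1] + list(shares[1:])

shares = encode(1, t=3)
print(shares, decode(shares), decode(private_not(shares)))
```

Inverting a single share flips the XOR of all shares, so the decoded value is complemented while the remaining wires are left untouched.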
Definition 14 (t-private AND circuit). Consider an AND gate with inputs a, b and output c.
Let the input shares of a and b be $a_i$, $b_i$ for $0 \le i \le t$, respectively, and the output
shares of c be $c_i$ for $0 \le i \le t$. In the transformation of an AND gate, we first compute
intermediate values $z_{i,j}$ for $i \neq j$. For each $0 \le i < j \le t$, $z_{i,j}$ is a random bit
and $z_{j,i}$ is equal to $(z_{i,j} \oplus a_i b_j) \oplus a_j b_i$. Now, we compute the output bits
$c_0, c_1, \ldots, c_t$ as
$c_i = a_i b_i \oplus \bigoplus_{j \neq i} z_{i,j}$.
Definition 15 (t-private OR circuit). Consider an OR gate with inputs a, b and output c. Let
the input shares of a and b be $a_i$, $b_i$ for $0 \le i \le t$, respectively, and the output shares
of c be $c_i$ for $0 \le i \le t$. One share $a_i$ and one share $b_j$ are inverted. In the
transformation of an OR gate, we first compute intermediate values $z_{i,j}$ for $i \neq j$. For each
$0 \le i < j \le t$, $z_{i,j}$ is a random bit and $z_{j,i}$ is equal to
$(z_{i,j} \oplus a_i b_j) \oplus a_j b_i$. Now, we compute the output bits $c_0, c_1, \ldots, c_t$ as
$c_i = a_i b_i \oplus \bigoplus_{j \neq i} z_{i,j}$,
where one $c_i$ is connected to a NOT gate.
Fig. 4.3 shows Ishai's t-private AND and OR circuits for t = 1. The area and
energy overhead is of the order of $t^2$ [Tyagi (2005)]. We develop the following schema with
smaller overhead.
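The AND and OR transformations of Definitions 14 and 15 can be written as a short Python sketch. It assumes that each output share is $c_i = a_i b_i \oplus \bigoplus_{j \neq i} z_{i,j}$, which agrees with the t = 2 outputs listed for Ishai's circuit in Table 4.2, and it realizes the OR gate through De Morgan's law by inverting one share of each input and one share of the output. The function names are illustrative, and `secrets.randbits` stands in for the random-bit gates.

```python
import secrets
from functools import reduce
from itertools import product
from operator import xor

def unmask(shares):
    """XOR of all shares, i.e. the value the shares encode."""
    return reduce(xor, shares, 0)

def ishai_and(a, b):
    """t-private AND on two lists of t + 1 shares."""
    n = len(a)
    z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            z[i][j] = secrets.randbits(1)                           # fresh random bit
            z[j][i] = (z[i][j] ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return [reduce(xor, (z[i][j] for j in range(n) if j != i), a[i] & b[i])
            for i in range(n)]

def ishai_or(a, b):
    """t-private OR: invert one share of each input, AND, invert one output share."""
    inverted_a = [a[0] ^ 1] + list(a[1:])
    inverted_b = [b[0] ^ 1] + list(b[1:])
    c = ishai_and(inverted_a, inverted_b)
    return [c[0] ^ 1] + c[1:]

# Exhaustive functional check over all share patterns for t = 2.
ok = all(unmask(ishai_and(a, b)) == unmask(a) & unmask(b) and
         unmask(ishai_or(a, b)) == unmask(a) | unmask(b)
         for a in product((0, 1), repeat=3) for b in product((0, 1), repeat=3))
print(ok)
```

The sketch uses t(t + 1)/2 random bits per AND gate, one for each pair i < j, which is the quadratic cost that motivates the modified circuit of the next subsection. It checks functional correctness only, not the probing security of intermediate values.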
Figure 4.3: The Ishai’s t-private circuits (t = 1).
4.4.2 The modified t-private circuit
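Theorem 7 (AND-XOR network with a random bit). The AND-XOR network with a random bit shown
in Fig. 4.4 has perfect secrecy for all inputs and the intermediate value:
$\Pr_{z|x}(i|j) = \Pr_z(i) = 0.5$
for $i, j \in \{0, 1\}$, $x \in \{x_1, x_2, x_{i1}\}$.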
Theorem 8 (Expanded AND-XOR network). We can expand an AND-XOR network with a
random bit by XORing it with another AND gate. This expanded network can be expanded
continuously using the same structure. These expanded AND-XOR networks also have perfect
secrecy for all inputs and any intermediate value. In other words, $\Pr_{z|x}(i|j)$ is equal to
$\Pr_z(i)$ for $i, j \in \{0, 1\}$.
Figure 4.4: An AND-XOR network with a random bit.
Figure 4.5: An expanded AND-XOR network.
We modified Ishai’s t-private circuit [Ishai et al. (2003)] into a simpler t-private circuit using
the expanded AND-XOR network. This also requires fewer random bits.
Definition 16 (Modified t-private AND circuit). i)When t is an odd number,






for i = 0, 1, . . . , t
ii) When t is an even number,






for i = 0, 1, . . . , t− 1
ct = (a0bt ⊕ zt mod t+2
2







Table 4.2: Comparison between t-private AND circuits
                    Modified t-private AND circuit         Ishai's t-private AND circuit
outputs (t = 2)     c0 = (a0b0 ⊕ z0) ⊕ a1b1 ⊕ a2b2         c0 = a0b0 ⊕ z0,1 ⊕ z0,2
                    c1 = (a0b1 ⊕ z1) ⊕ a1b2 ⊕ a2b0         c1 = a1b1 ⊕ z1,2 ⊕ {(a0b1 ⊕ z0,1) ⊕ a1b0}
                    c2 = (a0b2 ⊕ z0 ⊕ z1) ⊕ a1b0 ⊕ a2b1    c2 = a2b2 ⊕ {(a0b2 ⊕ z0,2) ⊕ a2b0} ⊕ {(a1b2 ⊕ z1,2) ⊕ a2b1}
# of random bits    ⌈(t+1)/2⌉ = O(t)                       t(t+1)/2 = O(t^2)
# of XOR gates      Ishai's model has t(t+1) − 2⌈(t+1)/2⌉ additional XOR gates compared to the modified t-private model.
where zj is a random bit.
Table 4.2 compares the modified t-private AND circuit with Ishai's
t-private AND circuit. The modified t-private circuit needs fewer random bits and
XOR gates and has almost the same delay.
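The t = 2 output equations of Table 4.2 can be checked exhaustively. The Python sketch below transcribes both sets of equations and confirms that, for every choice of input shares and random bits, the XOR of the three output shares equals the AND of the decoded inputs. It verifies functional correctness only and says nothing about secrecy; the function names are illustrative.

```python
from itertools import product

def modified_and_t2(a, b, z):
    """Output shares of the modified t-private AND circuit for t = 2 (Table 4.2)."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    z0, z1 = z
    c0 = ((a0 & b0) ^ z0) ^ (a1 & b1) ^ (a2 & b2)
    c1 = ((a0 & b1) ^ z1) ^ (a1 & b2) ^ (a2 & b0)
    c2 = ((a0 & b2) ^ z0 ^ z1) ^ (a1 & b0) ^ (a2 & b1)
    return c0, c1, c2

def ishai_and_t2(a, b, z):
    """Output shares of Ishai's t-private AND circuit for t = 2 (Table 4.2)."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    z01, z02, z12 = z
    c0 = (a0 & b0) ^ z01 ^ z02
    c1 = (a1 & b1) ^ z12 ^ (((a0 & b1) ^ z01) ^ (a1 & b0))
    c2 = (a2 & b2) ^ (((a0 & b2) ^ z02) ^ (a2 & b0)) ^ (((a1 & b2) ^ z12) ^ (a2 & b1))
    return c0, c1, c2

def computes_and(circuit, n_random):
    """True if the decoded output equals the AND of the decoded inputs in every case."""
    for a in product((0, 1), repeat=3):
        for b in product((0, 1), repeat=3):
            expected = (a[0] ^ a[1] ^ a[2]) & (b[0] ^ b[1] ^ b[2])
            for z in product((0, 1), repeat=n_random):
                c = circuit(a, b, z)
                if c[0] ^ c[1] ^ c[2] != expected:
                    return False
    return True

print(computes_and(modified_and_t2, 2), computes_and(ishai_and_t2, 3))
```

The modified circuit is called with two random bits and Ishai's circuit with three, matching the ⌈(t+1)/2⌉ and t(t+1)/2 entries of the table for t = 2.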
4.5 Design of Secure logic style
4.5.1 Design of SABL-NAND
Fig. 4.6 shows the transistor schematic of an n-type SABL NAND gate, drawn with the Virtuoso
schematic editor and the NCSU FreePDK45 technology library [NCSU (2011)]. It consists of
the differential pull-down network (DPDN) and the cross-coupled inverters. The DPDN is
made of NMOS transistors. The DPDN satisfies all requirements. First, all internal nodes for
complementary input signals are connected to one of the four output nodes of the DPDN.
Second, every conducting path goes through two NMOS transistors and thus has the same
resistance, since all NMOS transistors in the DPDN are equally sized. Third, all complementary
input wires are connected to the same number of transistors.
4.5.1.1 Simulation of SABL-NAND
We simulate the SABL NAND gate using Cadence Spectre. Fig. 4.12 shows the waveforms
of inputs, outputs and currents. One of the output signals switches as 0 → 1 → 0.
The current waveforms for all possible inputs have the same form and the power consumption
is almost constant. Table 4.3 shows the power consumption and peak current for all possible
inputs.
Figure 4.6: Schematic of SABL-NAND gate
Table 4.3: Power consumption of SABL NAND (45 nm process)
Input a   Input b                  Power consumption (nW)   Peak current (mA)
0         0                        5757.87                  0.2544
0         1                        5752.88                  0.2536
1         0                        5760.58                  0.2544
1         1                        5753.98                  0.2526
-         Average (µ)              5756.33                  0.2538
-         Standard deviation (σ)   3.5524                   0.0008
-         σ/µ                      0.0006                   0.0034
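The summary rows of Table 4.3 can be recomputed from its four measurements. The Python sketch below assumes the sample standard deviation (n − 1 in the denominator), which is consistent with the tabulated value of 3.5524 nW but is not stated in the text; the variable and function names are illustrative.

```python
from statistics import mean, stdev

# Measurements copied from Table 4.3 (inputs 00, 01, 10, 11).
power_nw = [5757.87, 5752.88, 5760.58, 5753.98]
peak_ma = [0.2544, 0.2536, 0.2544, 0.2526]

def coefficient_of_variation(samples):
    """Ratio of the sample standard deviation to the mean."""
    return stdev(samples) / mean(samples)

print(round(mean(power_nw), 2), round(stdev(power_nw), 4))
print(round(coefficient_of_variation(power_nw), 4),
      round(coefficient_of_variation(peak_ma), 4))
```

A smaller ratio means that the measured quantity depends less on the processed inputs, which is how the later comparison between logic styles uses it.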
Figure 4.12: Waveform of SABL NAND gate. Panels: Figure 4.8, input a = 0, b = 0; Figure 4.9,
input a = 0, b = 1; Figure 4.10, input a = 1, b = 0; Figure 4.11, input a = 1, b = 1.
4.5.2 Design of WDDL
Oklahoma State University digital cell library based on the FreePDK45 technology library
[OSU (2008)] is used to design WDDL cells. Fig. 4.13 shows the schematic of a WDDL-NAND
gate using Virtuoso schematic editor.
4.5.2.1 Simulation of WDDL-NAND
We simulate the WDDL-NAND gate using Cadence Spectre. Fig. 4.19 shows the waveforms
of inputs, outputs and currents. One of the output signals switches as 1 → 0 → 1.
The current waveforms for all possible inputs have the same form and the power consumption
is almost constant. Table 4.4 shows the power consumption and peak current for all possible
inputs.
Table 4.4: Power consumption of WDDL NAND (45 nm process)
Input a   Input b                  Power consumption (nW)   Peak current (mA)
0         0                        5939.15                  0.4086
0         1                        5927.37                  0.4065
1         0                        5885.04                  0.4286
1         1                        5872.73                  0.4137
-         Average (µ)              5906.07                  0.4144
-         Standard deviation (σ)   32.15                    0.0099
-         σ/µ                      0.0054                   0.024
4.5.3 Design of t-private logic cells
Fig. 4.20 and 4.21 show the schematics of the (t = 1)-private NAND circuit and the (t = 1)-private
AND circuit, drawn with the Virtuoso schematic editor and the OSU FreePDK45 cell library [OSU (2008)].
4.5.3.1 Simulation of t-private logic circuit
We simulate the (t = 1)-private NAND circuit and the (t = 1)-private AND circuit using Cadence
Spectre. Tables 4.5 and 4.6 show the power consumption and peak currents for all possible output
transitions of the (t = 1)-private NAND and AND circuits, respectively.
Figure 4.13: Schematic of WDDL-NAND gate
Figure 4.19: Waveform of WDDL NAND gate. Panels: Figure 4.15, input a = 0, b = 0; Figure 4.16,
input a = 0, b = 1; Figure 4.17, input a = 1, b = 0; Figure 4.18, input a = 1, b = 1.
4.5.4 Comparison of t-private NAND, SABL-NAND and WDDL-NAND
Table 4.7 shows the mean (µ) and the standard deviation (σ) of power consumption for
all possible transitions of output signals, the average peak current, the number of PMOS
transistors and the number of NMOS transistors. The (t = 1)-private NAND circuit has the
smallest power consumption even though it occupies the largest area. Since SABL-NAND and
WDDL-NAND go through both a precharge phase and an evaluation phase during a clock cycle,
and transitions of input and output signals occur in each phase, the power consumption of
SABL-NAND and WDDL-NAND is larger than that of the t-private NAND circuit. However,
the peak current of the t-private NAND is the highest. Based on the normalized variance metric,
the SCA vulnerability of SABL-NAND is the lowest. Note that only the t-private logic circuit is
robust against both SCA attacks and probing attacks, even though it has the largest area
and peak current. In order to protect against both SCA attacks and probing attacks, t-private
logic circuits should be utilized.
Figure 4.20: Schematic of NAND2X1t1
Figure 4.21: Schematic of AND2X1t1
Table 4.5: Power consumption of NAND2X1t1 (45 nm process)
Transition of output     Power consumption (nW)   Peak current (mA)   Number of transitions
0 → 0                    4194.55                  0.719               64
0 → 1                    4173.27                  0.745               192
1 → 0                    4194.40                  0.668               192
1 → 1                    4178.08                  0.701               576
Average (µ)              4185.08                  0.709               -
Standard deviation (σ)   121.73                   0.001               -
σ/µ                      0.029                    0.0014              -
Table 4.6: Power consumption of AND2X1t1 (45 nm process)
Transition of output     Power consumption (nW)   Peak current (mA)   Number of transitions
0 → 0                    4209.30                  0.717               576
0 → 1                    4230.54                  0.657               192
1 → 0                    4207.52                  0.725               192
1 → 1                    4225.71                  0.701               64
Average (µ)              4215.76                  0.699               -
Standard deviation (σ)   77.15                    0.0009              -
σ/µ                      0.018                    0.0013              -
Table 4.7: Comparison of t-private NAND, SABL-NAND and WDDL-NAND
                                       t-private NAND   SABL-NAND   WDDL-NAND
Average (µ) of power (nW)              4185.08          5756.33     5906.07
Standard deviation (σ) of power (nW)   121.73           3.55        32.15
Average of peak current (mA)           0.709            0.2538      0.4144
σ/µ                                    0.029            0.0006      0.0054
Number of PMOS                         36               6           6
Number of NMOS                         36               12          6
4.5.5 SCA attacks of t-private logic circuit
In order to verify the SCA vulnerability of t-private logic circuits, profiling SCA attacks are
performed. Simulation results from Cadence Spectre are used for profiling (or training). The
LS-SVM and QDA classifiers assign each power trace to one of 4 classes, which correspond to the
0 → 0, 0 → 1, 1 → 0 and 1 → 1 transitions of the output signal. The QDA classifier has the largest
successful recognition rate (31.44%), obtained for the t-private NAND circuit. This value is only
about 6 percentage points above the 25% rate expected from random guessing. In the other cases,
the successful recognition rates are at or below approximately 25%. As a result, these t-private
logic circuits are mostly secure against SCA attacks.
Table 4.8: Successful recognition rate of t-private circuits using LS-SVM and QDA classifiers
             LS-SVM    QDA
NAND2X1t1    19.26 %   31.44 %
AND2X1t1     24.85 %   25.09 %
OR2X1t1      14.81 %   15.52 %
NOR2X1t1     15.50 %   16.21 %
4.6 Conclusion
In this chapter, SABL cells, WDDL cells and t-private logic cells have been implemented.
These secure logic styles are necessities for SCA robust hardware implementation. They are
included in the technology library. We verify SCA vulnerability of t-private logic circuits using
machine learning technique such as LS-SVM and QDA.
CHAPTER 5. FPGA IMPLEMENTATION AND ASIC IMPLEMENTATION
5.1 Introduction
In this chapter, we propose a methodology of secure hardware implementation on two
different hardware platforms, FPGA and ASIC. An FPGA chip is made up of a finite number of
configurable logic blocks (CLBs) with programmable interconnects to implement a reconfigurable
digital circuit. The CLBs are the basic logic units of an FPGA and are made up of two basic
components: flip-flops and lookup tables (LUTs). On the other hand, an ASIC is implemented with
standard cells, which consist of digital logic gates such as AND, OR, NAND, INVERTER, XOR,
flip-flops, buffers and so on. All kinds of secure logic styles can be utilized in an ASIC
implementation, but the secure logic styles that can be synthesized on an FPGA are WDDL cells
and t-private logic circuits.
We focus on t-private logic circuits for both FPGA and ASIC design. For a more suitable and
efficient design on FPGA, t-private logic circuits are modified. The modified version is called
tail-recursive t-private circuits. In Section 5.2, the tail-recursive t-private circuit is defined
and we deal with how to map the secure logic style onto an FPGA. A typical ASIC design flow
requires the standard cell library for logic synthesis, place & route, physical layout and timing
verification. t-private logic circuits as well as general digital logic cells should be included
in the standard cell library. In Section 5.3, we propose a method to build the secure logic cell
library and to implement a secure ASIC design. Finally, Section 5.5 concludes this chapter.
5.2 FPGA Implementation
5.2.1 The tail recursive t-private circuit
Ishai [Ishai et al. (2003)] describes a transformation that is best applied in topological order
with input bits transformed first. An alternate way would be to apply the recursion at the
output node. This is what we call tail-recursive private circuits.
Consider a function $f(x_1, x_2, \ldots, x_n)$ of n bits. If we want t-privacy, we first determine
t random shares just as in Ishai's schema. However, these random shares are at the granularity
of function truth tables. Hence we generate
$f^r_i(x_1^0, \ldots, x_1^t, x_2^0, \ldots, x_2^t, \ldots, x_n^0, \ldots, x_n^t)$ as a random truth
table $[t^{f_i}_0, t^{f_i}_1, \ldots, t^{f_i}_{2^{nt}-1}]$ for $i = 0, 1, \ldots, t-1$. The (t + 1)st
share is derived from the other t random shares so that
$f_t(x_1^0, \ldots, x_1^t, x_2^0, \ldots, x_2^t, \ldots, x_n^0, \ldots, x_n^t)$ has the truth table
$[t^{f_t}_0, t^{f_t}_1, \ldots, t^{f_t}_{2^{nt}-1}]$, where
$t^{f_t}_j = t^{f}_j \oplus t^{f_0}_j \oplus t^{f_1}_j \oplus t^{f_2}_j \oplus \cdots \oplus t^{f_{t-1}}_j$
for $0 \le j \le 2^{nt} - 1$.
For perfect secrecy, each function $f^r_i$, $f_t$ should meet the following condition:
$\Pr_{f_i|x_j}(p|q) = \Pr_{f_i}(p)$ for $p, q \in \{0, 1\}$.
Definition 17 (The tail recursive t-private circuit). Let an original function with n inputs be




m, . . . , x
t−1
m , and an encoded bit,









where I = {0, 1, . . . , t− 1},
M ⊆ {1, 2, . . . , n},
1 ≤ |M | ≤ n.
Note that M is a random subset of {1, . . . , n} and f ri in (5.1) is a random function.
Proof. Since $x^i_m$ for $i \in I$ is a random variable,
$\Pr_{f^r_i|x^i_m}(p|q) = \Pr_{f^r_i}(p) = 0.5$, where $p, q \in \{0, 1\}$.
Thus, $f^r_i$ has perfect secrecy for all inputs.
Let us verify whether $f_t$ has perfect secrecy. An n-variable Boolean function f can be expressed
in the following canonical Reed-Muller expansion [Reed (1954)] of $2^n$ terms:
$f(x_1, x_2, \ldots, x_n) = a_0 \oplus a_1 x_1 \oplus a_2 x_2 \oplus \cdots \oplus a_{2^n-1} x_1 x_2 \cdots x_n$,
where $a_i \in \{0, 1\}$.
If we substitute $a_i(x_i^0 \oplus x_i^1 \oplus \cdots \oplus x_i^t)$ for $a_i x_i$, then
$f(x_1, \ldots, x_n) = a_0 \oplus a_1(x_1^0 \oplus x_1^1 \oplus \cdots \oplus x_1^t)$
$\oplus \, a_2(x_2^0 \oplus x_2^1 \oplus \cdots \oplus x_2^t) \oplus \cdots$
















1 ⊕ a2x12 ⊕ · · · ⊕ anx1n
)





























1 ⊕ a2x12 ⊕ · · · ⊕ anx1n ⊕ f r1
)























$= f^r(x_1^0, \ldots, x_1^{t-1}, \ldots, x_n^0, \ldots, x_n^{t-1}) \oplus f(x_1, \ldots, x_n)$ (5.3)
$= f_t(x_1^0, \ldots, x_1^t, \ldots, x_n^0, \ldots, x_n^t)$. (5.4)
Since $\Pr_{f^r|x^i_m}(p|q) = \Pr_{f^r}(p) = 0.5$ and $\Pr_{x_m|x^i_m}(p|q) = \Pr_{x_m}(p) = 0.5$ in (5.3),
$\Pr_{f_t|x^i_m}(p|q) = \Pr_{f_t}(p) = 0.5$, $p, q \in \{0, 1\}$ in (5.4).
Thus, $f_t$ also has perfect secrecy for all inputs $x^i_m$.
5.2.2 Mapping into k-LUTs with unlimited number of inputs
FPGAs have k-LUT granularity truth tables built into their architecture. From the power
probing point of view each LUT is a black box. This is because SRAMs precharge both the bit and
$\overline{bit}$ lines for all bits, and exactly one bit line discharges. Hence, the proposed
tail-recursive private circuits are ideally suited for FPGA architectures.
Each of the randomized truth tables can be mapped to its own LUT. Hence, all the t
function level shares are isolated. The key assumption is that the t probes an adversary can
use do not go inside a truth table. If we also assume that k-LUTs have sufficiently many inputs
so that $|x^i_m| = nt \le k$ for $m \in \{1, \ldots, n\}$, $i \in \{0, \ldots, t\}$, the number of
k-LUTs increases to t times the number needed for the original functions. With the LUT black-box
assumptions, we get t-privacy at a somewhat lower area and delay cost.
Figure 5.1: Transformation into LUT-based t-private circuit
Lemma 9. We assume that an adversary cannot probe internal nodes of LUTs and k ≥ nt.
The number of LUTs used increases linearly with t to achieve t-privacy. In other words,
the complexity of the LUT-based t-private circuits is O(t) and the depth of this circuit is O(1).
Fig. 5.1 shows a function f mapped into a k-LUT and then transformed into t+ 1 k-LUTs
in order to make it secure.
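The function-level sharing behind Fig. 5.1 can be sketched in Python. This is a simplified illustration under stated assumptions: the encoding of the inputs into shares is ignored, the index j simply stands for whatever input pattern addresses the LUTs, and `secrets.randbits` stands in for the source of random truth-table entries. The names `share_truth_table` and `evaluate` are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor

def share_truth_table(table, t):
    """Split a truth table into t random tables plus one derived table (t + 1 LUTs)."""
    random_tables = [[secrets.randbits(1) for _ in table] for _ in range(t)]
    # The last table is chosen so that the XOR of all t + 1 entries at index j is table[j].
    derived = [reduce(xor, (rt[j] for rt in random_tables), bit)
               for j, bit in enumerate(table)]
    return random_tables + [derived]

def evaluate(shared_tables, j):
    """Look up entry j in every LUT and XOR the results."""
    return reduce(xor, (lut[j] for lut in shared_tables), 0)

# Truth table of the sum output of a full adder (inputs a, b, carry-in).
full_adder_sum = [0, 1, 1, 0, 1, 0, 0, 1]
luts = share_truth_table(full_adder_sum, t=2)
print([evaluate(luts, j) for j in range(len(full_adder_sum))])
```

Each of the t + 1 tables would occupy its own LUT, so the LUT count grows linearly with t, in line with Lemma 9.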
5.2.3 Mapping into k-LUTs with limited number of inputs
Most commercial FPGAs have from 4-LUT to 6-LUT granularity. With this choice for k,
most LUTs utilize all their inputs after technology mapping. Given this practical constraint
on k-LUT granularity, our assumption should be changed to k < nt.
Lemma 10. We assume that an adversary cannot probe internal nodes of LUTs and k < nt.
The complexity of LUT-based t-privacy is $O(t + \log_k t)$ and the depth of this circuit is
$O(\log_k t)$.
Figure 5.2: Full adder cell schematic
5.2.4 Implementation of t-private full adder
We synthesized adders in Ishai's framework and in the LUT-based tail-recursive model
using the Xilinx ISE tools. The target device is a Xilinx Virtex-5 FPGA
(XC5VFX70T-3FF1136). Fig. 5.2 shows a reference full adder. Fig. 5.3 shows a schematic
for the modified Ishai's (t = 1)-private full adder. Fig. 5.4 compares various
adder implementations with respect to the number of LUTs (n-bit ripple carry adder based
on Ishai's model with t = 1, 2 and 3; and the tail-recursive LUT based model with t = 1, 2).
Fig. 5.5 shows the critical path delay for the same set of adders. The key point to note here is
that the tail-recursive design takes approximately 50% of the area of Ishai's scheme for similar
privacy. The delay advantage of the tail-recursive scheme is about 33%.
Figure 5.3: (t = 1)-private full adder cell schematic
Figure 5.4: LUT costs of various t-private adders
Figure 5.5: Delay costs of various t-private adders
5.3 ASIC Implementation
We introduce ASIC design implementation of t-private system using commercial EDA tools
such as Cadence’s tools and Synopsys’s tools.
5.3.1 t-private Logic synthesis
After general logic synthesis, modules with low SCA resistance are flagged by the graph
based SCA analysis. The flagged modules should be resynthesized at the logic level so that
KL divergence metric is zero or almost zero. We call this re-synthesis t-private logic synthesis
since t-private logic [Ishai et al. (2003)] will be employed. These t-private logic circuits have
SCA resistance, which means that the normalized standard deviation of t-private logic circuits
is almost zero. We will verify that these primitives of t-private logic synthesis have SCA
robustness at the physical layout level in the following subsection. If each gate is replaced with
the corresponding t-private logic circuit in such a way that AND gate is replaced with t-private
AND circuit, the area of the module increases significantly [Park and Tyagi (2012)]. In order
to reduce area, t-private XOR or XNOR are a better choice since these circuits have smaller




$y = f(x_0, x_1, \ldots, x_{n-1}) = \sum\oplus \, x_0^* x_1^* \cdots x_{n-1}^*$, (5.5)
where $x_i^*$ can be 1, $x_i$ or $x_i'$ and $\sum\oplus$ represents the EXOR sum-of-products (ESOP)
[Sasao and Fujita (1996)]. The products and the EXOR sum of Eq. (5.5) can be replaced with
t-private AND and XOR circuits, respectively.
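As an illustration of an XOR-of-products form, the Python sketch below computes the positive-polarity Reed-Muller expansion of a truth table, which is the special case of an ESOP in which every literal $x_i^*$ is either 1 or $x_i$. It does not minimize the number of products and is not the synthesis procedure used in this work; the function names and the bit ordering (bit k of the table index is the value of $x_k$) are assumptions of the sketch.

```python
def reed_muller_coefficients(truth_table):
    """Positive-polarity Reed-Muller coefficients of a truth table of length 2**n.

    Bit k of the table index is taken as the value of variable x_k.
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    for k in range(n):
        step = 1 << k
        for idx in range(len(coeffs)):
            if idx & step:
                # Butterfly step: fold in the entry that has x_k cleared.
                coeffs[idx] ^= coeffs[idx ^ step]
    return coeffs

def product_terms(coeffs):
    """Products with coefficient 1, each given as a tuple of variable indices."""
    n = len(coeffs).bit_length() - 1
    return [tuple(k for k in range(n) if (m >> k) & 1)
            for m, c in enumerate(coeffs) if c]

# Carry output of a full adder (majority of x0, x1, x2).
carry = [0, 0, 0, 1, 0, 1, 1, 1]
print(product_terms(reed_muller_coefficients(carry)))
```

Each tuple would map to one AND term over the listed variables, and the terms are then combined by XOR gates.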
Lemma 11. If the Boolean function
$y = f(x_0, x_1, \ldots, x_{n-1}) = \sum\oplus \, x_0^* x_1^* \cdots x_{n-1}^*$ can be synthesized
with t-private AND and XOR circuits, the SCA effectiveness of the resulting combinational
circuit is zero.
Proof. First, consider the observability of a 2-input XOR gate $c = a \oplus b$.
$Ob_c(a) = Ob_c(b) = \Pr[f_a \oplus f_{a'}] = 1$
$Ob_c(a, b) = \Pr[(a \oplus b) \oplus (a' \oplus b')] = 0$.
Thus, $P_c(a)$ is equal to $P_c(b)$ and the SCA effectiveness is zero. The observability of $x_i$
at the primary output y is 0.5 because it is the product of the observability of the input at a
t-private AND circuit and the observabilities of the input at the XOR gates:
0.5 × 1 × · · · × 1. The effective capacitances $C_y(x_i)$ are almost equal. Consequently,
$\mathrm{Var}[P_y(x_i)]$ is zero. This means that the combinational circuit is robust against SCA
attacks.
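The observability values used in the proof can be reproduced by enumeration. The Python sketch below assumes that $Ob_c(a) = \Pr[f_a \oplus f_{a'}]$ is the probability, over uniformly distributed remaining inputs, that the two cofactors of the gate function with respect to a differ, and that $Ob_c(a, b)$ is the probability that the output changes when all inputs are complemented together. Both readings and all names are assumptions of this sketch.

```python
from itertools import product

def observability(f, n_inputs, var):
    """Probability that f with input `var` set to 1 differs from f with it set to 0."""
    others = list(product((0, 1), repeat=n_inputs - 1))
    hits = 0
    for rest in others:
        low = rest[:var] + (0,) + rest[var:]
        high = rest[:var] + (1,) + rest[var:]
        hits += f(*low) ^ f(*high)
    return hits / len(others)

def joint_observability(f, n_inputs):
    """Probability that the output changes when all inputs are complemented at once."""
    patterns = list(product((0, 1), repeat=n_inputs))
    hits = sum(f(*p) ^ f(*(1 - v for v in p)) for p in patterns)
    return hits / len(patterns)

def xor2(a, b):
    return a ^ b

def and2(a, b):
    return a & b

print(observability(xor2, 2, 0), joint_observability(xor2, 2), observability(and2, 2, 0))
```

The XOR gate propagates every single-input change and the AND gate propagates half of them, which matches the factors 1 and 0.5 in the product above.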
5.3.2 Design Flow
The design flow of the ASIC design of a t-private system is shown in Fig. 5.6. All design
procedures except for t-private logic synthesis are the same as in the general ASIC design process.
First, the cryptographic system is designed at the behavioral level using an HDL such as
Verilog or VHDL.
Second, the behavioral design is transformed into a technology dependent gate-level design by a
logic synthesizer such as Cadence's RTL Compiler or Synopsys's Design Compiler with the technology
library. We call this process logic synthesis. We use RTL Compiler and the OSU standard cell
library based on the NCSU FreePDK 45nm process as the logic synthesizer and the technology
library, respectively. The technology library is in the liberty file format with the file
extension .lib, which is the semiconductor industry's most widely supported library standard.
Up to this second step, there is no difference from the general design flow.
The third process is to transform the design that is vulnerable according to the security metrics
into a t-private logic design which is robust against the t-th order side-channel attacks. We call
this procedure t-private logic synthesis. This process is divided into two sub-steps. The first
sub-step is to change each general gate into the matching t-private gate, e.g., an AND
gate is changed into a t-private AND circuit. It is performed automatically by a Perl script. The
second sub-step is to optimize the t-private logic depending on the timing constraints using RTL
Compiler. For this logic synthesis, we use our technology library including t-private logic cells
such as AND2X1t1, NAND2X1t2, XOR2X1t1 and so on. The name of a t-private logic
cell represents the operation function, the number of inputs, the drive strength and the t
parameter, in that order. For example, AND2X1t1 means that this cell is an X1 2-input AND with
t = 1. After the t-private logic synthesis, the structural Verilog file consisting of t-private
logic cells is generated.
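The cell naming convention described above can be made concrete with a small parser. The Python sketch below is only an illustration and is not the Perl script mentioned in the text: the regular expression, the `CellName` record and the helper `to_t_private` are hypothetical, and the number of inputs is treated as optional because names such as INVX2t1 and BUFX2t1 do not carry one.

```python
import re
from typing import NamedTuple, Optional

class CellName(NamedTuple):
    function: str           # operation, e.g. "AND"
    inputs: Optional[int]   # number of inputs, None if the name does not state it
    drive: int              # drive strength, e.g. 1 for X1
    t: int                  # privacy parameter t

# function, optional input count, "X" + drive strength, "t" + privacy parameter
_PATTERN = re.compile(r"^([A-Z]+?)(\d*)X(\d+)t(\d+)$")

def parse_cell_name(name):
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a t-private cell name: {name!r}")
    function, inputs, drive, t = match.groups()
    return CellName(function, int(inputs) if inputs else None, int(drive), int(t))

def to_t_private(generic_name, t):
    """Name of the t-private counterpart of a generic cell such as AND2X1."""
    return f"{generic_name}t{t}"

print(parse_cell_name(to_t_private("AND2X1", 1)))
```

A gate-replacement pass could use such a mapping to look up the t-private counterpart of each generic cell; rewiring the shares and random bits of each instance is outside the scope of this sketch.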
The back-end design starts from the fourth process for the final physical layout. We use the
SOC Encounter tool from Cadence for the floorplan, place and route. The required files are our
technology library file (.lib), the cell abstract information file (.lef), the structural Verilog
file (.v) and the delay constraint information file (.sdc); the last two files are outputs of the
previous process. The generated layout should pass DRC and LVS and is saved in the gds file format
(.gds).
Finally, we verify whether our implementation is secure against the t-th order side-channel
attacks based on power simulation using the Spectre analog simulator from Cadence.
5.3.3 Technology Library
In order to perform t-private logic synthesis and physical layout, our own technology library
is required. The technology library defines the cell function, area, delay and power
dissipation of each t-private logic cell. The cell definition of AND2X1t1 in the liberty file
format is shown in Listing 5.1. The following steps are required to generate our technology
library:
1) Draw the schematic of each t-private logic cell using Virtuoso schematic editor like Fig. 5.8.
2) Make a structural Verilog file based on the schematic like Fig. 5.9.
3) Synthesize the t-private logic cell using RTL Compiler like Fig. 5.10.
4) Generate a layout of the t-private logic cell using SOC Encounter like Fig. 5.11.
5) Check DRC and LVS.
6) Extract timing and power characteristics of the t-private logic cell using Spectre Analog
Environment.
Since t-private logic cells are made of gates of the OSU standard digital cell library, the OSU
standard cell library is used for logic synthesis and layout. After Step 3, the generated Verilog
may differ from the structural Verilog of Step 2. Power and area can be estimated after
logic synthesis. Table 5.1 shows the area, power and delay time estimation of six (t = 1)-private
logic cells. We generate layouts and liberty descriptions of 8 basic (t = 1)-private logic
cells (AND2X1t1, NAND2X1t1, OR2X1t1, NOR2X1t1, XOR2X1t1, XNOR2X1t1,
BUFX2t1, INVX2t1) through the above method.
Figure 5.6: The design flow of the ASIC implementation
Table 5.1: Area, power and delay estimation of each t-private logic cell after logic synthesis
Cell         Area   Leakage power (nW)   Dynamic power (nW)   Delay (ps)
NAND2X1t1    31     1.19                 4185.08              55
AND2X1t1     31     1.19                 4112.97              55
NOR2X1t1     32     1.052                4418.91              66
OR2X1t1      32     1.052                4407.66              66
XNOR2X1t1    10     0.36                 4433.56              14
XOR2X1t1     10     0.361                4439.20              13
5.3.4 Verification of robustness
After finishing the layout of the basic t-private logic cells, we also verify their robustness
against power analysis attacks. For the verification, we measured the power and current of the
logic cells using Spectre Analog Environment with the analog extracted view of the cell, which
includes all parasitic capacitances. The power consumption of logic gates in general standard cell
libraries depends on transitions of the output. For example, the power consumption of NAND2X1
of the OSU standard cells varies according to how the output changes. When a transition of the
output occurs, significantly more supply power is dissipated than when there is no transition.
There is also a difference between the transition from 0 to 1 and the transition from 1 to 0.
This NAND2X1 is not robust against power analysis attacks since its power consumption depends
on the processed data.
The basic t-private logic cells are simulated for all possible input patterns and the corresponding
power and peak current are measured in each case. Two-input t-private logic cells except for XOR
and XNOR have $4^{2(t+1)+r}$ possible input patterns, where $r = \lceil (t+1)/2 \rceil$ is the
number of random bits required for perfect secrecy of internal nodes. Since t-private XOR and
XNOR do not require additional random bits for perfect secrecy, the number of all possible input
patterns is $4^{2(t+1)}$. The measured powers and peak currents were classified according to the
Listing 5.1: A sample liberty description of AND2X1t1
cell (AND2X1t1) {
  area : 3168;
  cell_leakage_power : 1.19;
  pin (A0) {
    direction : input;
    capacitance : 0.021674;
    rise_capacitance : 0.021579;
    ...
    direction : output;
    capacitance : 0;
    rise_capacitance : 0;
    fall_capacitance : 0;
    max_capacitance : 0.924889;
    function : "(A0*B0 ^ R ^ A1*B1)";
    timing () {
      related_pin : "A0";
      timing_sense : positive_unate;
      cell_rise (delay_template_5x5) {
        index_1 ("0.05, 0.1, 0.2, 0.6, 1.2");
        index_2 ("0.06, 0.18, 0.42, 0.6, 1.2");
        values ( \
          ...
      }
      rise_transition (delay_template_5x5) {
        ...
      }
      cell_fall (delay_template_5x5) {
        ...
      }
      ...
    timing () {
      related_pin : "A1";
      ...
    }
    internal_power () {
      related_pin : "A0";
      rise_power (energy_template_5x5) {
        ...
      }
      ...
Figure 5.8 Schematic of AND2X1t1
Figure 5.9 Verilog description of AND2X1t1
Figure 5.10 Synthesized logic design
Figure 5.11 Layout of AND2X1t1
Figure 5.12: The steps to create AND2X1t1
83
Figure 5.14 Peak currents of NAND2X1t1
Figure 5.15 Powers of NAND2X1t1
Figure 5.16: Distribution of powers and peak currents of NAND2X1t1
output transition (0 → 0, 0 → 1, 1 → 0 and 1 → 1), and the powers and peak currents in each
group were averaged. If the power consumption does not depend on the input pattern, the logic
gate resists power analysis attacks. In other words, if the averaged powers and peak currents
of the groups are difficult to distinguish, the logic gate is robust. Table 5.2 shows the
averaged power consumption, the peak current and the number of cases of NAND2X1t1 in each group
according to the output transition. The powers and peak currents are almost equal across the
groups, so the groups are difficult to distinguish. To quantify this dependency (or robustness),
we use the ratio of the standard deviation (σ) to the average (µ), called the coefficient of
variation. The larger the value, the larger the dependency on the output transition (or input
pattern) and the smaller the robustness against power analysis attacks. The coefficient of
variation of NAND2X1t1 is much smaller than that of NAND2X1. Fig. 5.16 shows the distribution
of the power consumption and peak currents of NAND2X1t1. Tables 5.3, 5.4, 5.5, 5.6 and 5.7 show
the power consumption and peak current of AND2X1t1, NOR2X1t1, OR2X1t1, XOR2X1t1 and XNOR2X1t1,
respectively.
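The robustness metric described above can be illustrated with a minimal sketch. The measurement
values below are hypothetical and are not taken from the tables; the use of the population
standard deviation is also an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Return sigma/mu for a list of measurements (population std. dev. assumed)."""
    mu = mean(samples)
    if mu == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(samples) / mu


# Hypothetical per-pattern power measurements in nW (illustrative only).
data_independent = [4190.0, 4185.0, 4180.0, 4187.0]   # nearly flat: little leakage
data_dependent = [1200.0, 4300.0, 2500.0, 3900.0]     # strongly data dependent

cv_flat = coefficient_of_variation(data_independent)
cv_leaky = coefficient_of_variation(data_dependent)
print(f"flat cell: {cv_flat:.4f}, data-dependent cell: {cv_leaky:.4f}")
```

A smaller ratio indicates that the measurements are harder to tell apart, which is the sense in
which the metric is used to compare cells.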
Table 5.2: Power consumption of NAND2X1t1 (45 nm process)
Transition of output     Power consumption (nW)   Peak current (mA)   Number of transitions
0 → 0                    4194.55                  0.719               64
0 → 1                    4173.27                  0.745               192
1 → 0                    4194.40                  0.668               192
1 → 1                    4178.08                  0.701               576
Average (µ)              4185.08                  0.709               -
Standard deviation (σ)   121.73                   0.001               -
σ/µ                      0.029                    0.0014              -
Table 5.3: Power consumption of AND2X1t1 (45 nm process)
Transition of output     Power consumption (nW)   Peak current (mA)   Number of transitions
0 → 0                    4209.30                  0.717               576
0 → 1                    4230.54                  0.657               192
1 → 0                    4207.52                  0.725               192
1 → 1                    4225.71                  0.701               64
Average (µ)              4215.76                  0.699               -
Standard deviation (σ)   77.15                    0.0009              -
σ/µ                      0.018                    0.0013              -
5.4 Example : SBOX design
We implemented the AES S-Box with our proposed SCA-secure design methodology as a preliminary
validation. The S-Box operation in the first or last round of AES encryption or decryption is
especially vulnerable to DPA attacks [Mangard et al. (2005), Prouff and Rivain (2007)]. The
vulnerable AES S-Box is therefore synthesized with t-private primitives into a secure layout
using our design flow. As a baseline, an insecure AES S-Box based
Figure 5.17: Layout of the secure AES S-Box
Table 5.4: Power consumption of NOR2X1t1 (45 nm process)
Transition of output     Power consumption (nW)   Peak current (mA)   Number of transitions
0 → 0                    4807.01                  0.711               576
0 → 1                    4836.72                  0.709               192
1 → 0                    4786.24                  0.699               192
1 → 1                    4864.25                  0.712               64
Average (µ)              4823.55                  0.708               -
Standard deviation (σ)   34.13                    0.0059              -
σ/µ                      0.007                    0.008               -
Table 5.5: Power consumption of OR2X1t1 (45 nm process)
Transition of output     Power consumption (nW)   Peak current (mA)   Number of transitions
0 → 0                    4894.25                  0.703               64
0 → 1                    4786.24                  0.711               192
1 → 0                    4836.72                  0.698               192
1 → 1                    4807.01                  0.722               576
Average (µ)              4831.06                  0.709               -
Standard deviation (σ)   46.95                    0.104               -
σ/µ                      0.009                    0.014               -
on the composite finite field proposed by Satoh et al. [Satoh et al. (2001)] is implemented. It
is then re-synthesized with t-private logic cells using RTL Compiler. After t-private synthesis,
the cell area and critical path delay are compared to the reference baseline design. The cell
area increases by a factor of 5.77 and the delay by a factor of 1.69 relative to the reference
design. The layout result shows that the die size of the secure S-Box is 4.37 times larger.
However, the DPA security metric (σ/µ) is reduced by 59% and the design is robust against the
first-order probing attack. Table 5.8 compares the secure and insecure S-Box designs.
Fig. 5.17 shows the layout of the secure AES S-Box.
5.5 Conclusion
In this chapter, an SCA-resistant hardware implementation for FPGA and ASIC design has
been proposed using t-private logic circuits. The standard cell library including t-private logic
circuits can be used for logic synthesis, place & route and physical layout. Vulnerable modules
flagged by SCA security metrics should be re-synthesized with t-private logic cells. After
Table 5.6: Power consumption of XOR2X1t1 (45 nm process)
Transition of output     Power consumption (nW)   Peak current (mA)   Number of transitions
0 → 0                    1078.51                  0.358               64
0 → 1                    1077.23                  0.339               64
1 → 0                    1076.82                  0.327               64
1 → 1                    1077.46                  0.379               64
Average (µ)              1077.51                  0.351               -
Standard deviation (σ)   0.72                     0.023               -
σ/µ                      0.0007                   0.065               -
Table 5.7: Power consumption of XNOR2X1t1 (45 nm process)
Transition of output     Power consumption (nW)   Peak current (mA)   Number of transitions
0 → 0                    997.12                   0.388               64
0 → 1                    997.14                   0.375               64
1 → 0                    997.62                   0.328               64
1 → 1                    998.33                   0.370               64
Average (µ)              997.55                   0.365               -
Standard deviation (σ)   0.567                    0.026               -
σ/µ                      0.0005                   0.071               -
the physical layout, SCA vulnerability of the hardware implementation can be verified by
security metrics and simulating attacks.
Table 5.8: Comparison of insecure and secure S-Box
Design     Cell area (µm²)   Delay (ns)   σ/µ
insecure   332.23            0.427        0.48
secure     1919.96           0.723        0.07
CHAPTER 6. t-PRIVATE SYSTEMS: UNIFIED PRIVATE MEMORIES
AND COMPUTATION
6.1 Introduction
The goal of countermeasures against side channel attacks is to significantly reduce or remove
the correlation between side channel leakage and the data or state processed by the
computational system. A representative approach to counteract side channel attacks is to mask
intermediate values with random bits at the gate level. Ishai et al. [Ishai et al. (2003)]
proposed t-private circuits using such a masking method. They assume that an adversary can
probe or observe up to t nodes in the circuit, and that the adversary is perfect, i.e., able to
probe the circuit state of the logic with 100% certainty. Ishai’s t-private circuits need at
least t random bits to ensure zero correlation between the t probed nodes in each clock cycle.
This makes the information loss to the adversary equal to 0.
t-private logic only targets the privacy of computation. However, cryptographic systems
also include memory, in particular memories that hold private keys, which are typically
Read Only Memory (ROM). Many secret keys associated with a cryptographic system are
stored in ROMs. For instance, hundreds of 1024-bit RSA private keys are not uncommon for a
Trusted Platform Module (TPM) [Group (2013)]. ROMs are especially vulnerable to Ishai’s
t-probing adversary since, unlike computation, their state does not change over time. Moreover,
these keys in memory can be targeted directly by physical attacks [Samyde et al. (2002)]. An
adversary with physical access to the secret key part of the chip can succeed even if power has
been turned off. Such physical attacks slice the silicon with a Focused Ion Beam (FIB) until
individual transistors are exposed, and an electron microscope is then used to examine the
silicon. Halderman et al. [Halderman et al. (2008)] proposed the “cold-boot attack”, a
method to extract a significant fraction of the data stored in a powered-off memory (e.g., DRAM)
by cooling the chip to around −50 °C. Valamehr et al. [Valamehr et al. (2012)] developed
several masking methods to prevent such memory attacks. The simplest of them is Ishai’s
[Ishai et al. (2003)] t-private coding applied to memory resident data. The key idea is that
a secret key bit ($x_i$) does not need to be stored in the memory in its original form. Instead,
a $(t+1)$-tuple $[r_1, r_2, \dots, r_t, x_i \oplus r_1 \oplus \dots \oplus r_t]$ is stored. We call
this memory masking with t random bits a t-private memory. An adversary must learn all the t
random bits and the encoded bit in order to reveal even a single bit of the secret key. The
adversary attack model for ROM is based on the persistent physical access attack, not the
transient probing attack for computational logic. The memory attack has statistical observation
limitations. Therefore, Valamehr et al. [Valamehr et al. (2012)] assume that it succeeds only
with probability p for each bit. Unlike Ishai’s perfect secrecy analysis model, they define the
success probability $P_{succ}$ of this memory attack as a new figure of merit. It captures the
event that at least one bit of the secret key has been learned. Even though such a success event
does not by itself break a cryptographic system, the possible key space can be reduced
considerably when other side channel attacks are combined.
Practical computing systems consist of both memory and computational logic components.
In order to build a t-private system, we need a t-private memory and t-private logic that
integrate seamlessly. Ishai’s t-private scheme is not the most efficient one when applied to
memory protection. Most of Valamehr’s memory protection schemes [Valamehr et al. (2012)] are
not computable, in the sense that no computational logic schema exists within the coded
domain (unlike Ishai’s scheme). The stored coded keys have to be decoded before being
used for computation, which exposes them to probing attacks. This is a significant weakness. In
this chapter, we develop a unified computable coding scheme applicable to both memory and
computation logic. This scheme is more efficient than Valamehr’s schemes in their memory analysis
framework. It also shows zero information loss in Ishai’s analysis framework. We believe
that our proposed coding scheme is an ideal candidate to build t-private systems unifying the
memory and computing logic. In summary, this chapter makes the following contributions:
1) We analyze the storage overhead and the success probability (Psucc) of various t-private
memory schemas within a unified framework that is easier to understand than Valamehr’s.
However, it may overestimate Psucc. We also quantify and describe a trade-off between these
two attributes – storage overhead and Psucc.
2) We introduce a new notion of computable encoding method for t-private memories to cap-
ture the schemes which can compute with the encoded keys using a complementary t-private
logic. We also propose a new, computable, t-private, inspection resistant memory with a
corresponding computable encoding method. This new approach requires new t-private logic
combinational gates which are more efficient than Ishai’s [Ishai et al. (2003)] t-private circuits
in their use of random bits without any loss of privacy.
3) We propose new combinational logic circuits suitable for our new memory scheme.
We define our adversary model and the notation (variables/parameters used) in Section 6.2.
Our new, more general analysis of t-private memories is presented in Section 6.3. Section 6.4
develops our proposed t-private memory scheme. The logic schema for our proposed memory is
presented in Section 6.5. Hardware implementation results are presented in Section 6.6. Finally,
Section 6.7 concludes the chapter.
6.2 Assumptions and Notation
We assume that the memory leaks information in contrast to Micali’s paper [Micali and
Reyzin (2003)] in which they assume that only computation leaks information. An adversary
conducts experiments to reveal the bits stored in the memory with a measurement apparatus.
Let L be the leakage function selected by an adversary. The value of leakage of any bit $x_i$
in the memory M is converted to the finite field GF(2) based on the ability of an adversary:
$f : L(x_i) \to \{0, 1\}$ for $x_i \in M$.
We assume that an adversary has limited capability to learn any memory resident bit exactly
due to noisy measurement apparatus. Hence, we define the limited leakage probability of a bit
as $\Pr[f(L(x_i)) = x_i] = p \;\; \forall x_i \in M$.
Table 6.1: Variables used in this chapter
Variable                          Meaning
k                                 key length
p                                 leakage probability for 1 bit
$P_{succ}$                        probability of successful attack
$r_i$                             random bit
$x_i$                             one-bit secret key
t                                 the number of random bits
$t_p$                             the number of probing nodes per clock cycle
n                                 the number of keys
c                                 the number of bits to be stored per key
T                                 random bit matrix
$T_{ij}$                          the ith row and jth column element of T
$\vec{a} = [a_1, \dots, a_t]$     a binary vector
$\bar{x}$                         complement of x
$\wedge$                          bit-wise AND operation
This p is a characteristic of the memory (encoding) schema. If the adversary’s target is a
computational circuit C, our assumption is the same as Ishai’s adversary model [Ishai et al.
(2003)]. In other words, an adversary can probe $t_p$ nodes every cycle:
$\Pr[f(L(y_i)) = y_i] = 1 \;\; \forall y_i \in Y$, $Y \subset C$, $|Y| = t_p$.
A memory attack is a set of such experiments that are possibly adaptively controlled. We
assume that the goal of a memory attack is to reveal at least one bit in the memory with
probability 1. The success probability of a memory attack captures this goal.
Definition 18 (success probability). We define the success probability Psucc of a memory attack
as the probability that at least one bit of the original secret key has been revealed.
Memory may store multiple keys with the same key length k. The parameters/variables of
the memory schema, adversary experiments, and memory attacks are defined in Table 6.1. If
not otherwise stated, these variables hold for the rest of the chapter.
6.3 t-Private Memory: Schemas, Architecture, and Analysis
The k raw bits of a key [xk, xk−1, . . . , x1] can be stored in memory in many ways. The
t-privacy schemes could conceivably be transistor level schemes. However, encoding schemes
applied at the write-port of a memory are more obvious and effective. A memory schema is a
pair of encoding & decoding functions for memory. The base case is to do nothing - just store
and retrieve the raw bits - with a schema of the identity function. All the following memory
schemas except for t-private system are from Valamehr et al. [Valamehr et al. (2012)]. The
unified analysis is ours.
A bit $x_i$ of the secret key can be hidden by creating t + 1 random shares using t random bits,
$[r_1, r_2, \dots, r_t, x_i \oplus r_1 \oplus r_2 \oplus \dots \oplus r_t]$, where the $r_i$’s are
random bits. The t random bits constitute t shares. The (t + 1)st share is derived by an XOR of
the t random bits and the original bit $x_i$.
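The share construction can be sketched as follows. This is a minimal illustration rather than
the hardware write-port logic; the function names `encode_bit` and `decode_bit` are hypothetical,
and Python's `secrets` module stands in for whatever random bit source a real design would use.

```python
import secrets
from functools import reduce
from operator import xor


def encode_bit(x, t):
    """Split bit x into t+1 shares: t random bits plus x XOR-ed with all of them."""
    randoms = [secrets.randbits(1) for _ in range(t)]
    masked = reduce(xor, randoms, x)      # x ^ r_1 ^ ... ^ r_t
    return randoms + [masked]


def decode_bit(shares):
    """Recover the original bit by XOR-ing all t+1 shares."""
    return reduce(xor, shares, 0)


shares = encode_bit(1, t=4)
print(shares, "->", decode_bit(shares))
```

Any t of the t + 1 shares are jointly uniform, so an observer needs all of them to learn the bit.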
The easiest memory architecture for the secrecy is to store all the t+ 1 share bits of a raw
bit of the secret key. Therefore the total number of stored bits for a secret key of length k is
k(t+ 1). In this schema, each key bit uses a different set of t random bits. The set of random
bits can be re-used or shared between various key bits. Depending on this reuse and sharing
of random bits, the storage overhead and the success probability of the memory attack can
vary. There are four memory schemes in [Valamehr et al. (2012)] which will be analyzed in
this section (all except the dynamic matrix scheme using hash function). Fig. 6.6 shows these
architectural memory schemes.
6.3.1 Original memory scheme without secrecy
Original memory refers to raw memory without any protection against memory attacks.
The total number of bits stored for the n secret keys with key length k is nk. This value is
the storage reference/baseline. We define the storage overhead as the ratio of the number of
bits used for the secret keys storage to the storage reference. The success probability
$P_{succ}$ of memory attacks is $1 - (1 - p)^k$, where $(1 - p)^k$ is the probability of the
adversary experiments failing on all of the k key bits.
Figure 6.2 The original memory scheme
Figure 6.3 The t-private memory scheme
Figure 6.4 The t-private memory scheme with a random matrix
Figure 6.5 The hybrid memory scheme
Figure 6.6: 4 architectural memory schemes
6.3.2 t-private memory scheme
Each bit $x_i$ of the secret key is represented by t random bits and the encoded bit
$e_i = x_i \oplus r_1 \oplus \dots \oplus r_t$, which are stored in the memory. Each key bit uses
its own set of t random bits. The total number of bits stored for n secret keys is
$cn = (t+1)k \cdot n$ and therefore the storage overhead is t + 1. The success probability is
$$P_{succ} = 1 - (1 - p')^k \qquad (6.1)$$
where $p' = p^{t+1}$ is the probability that an adversary learns the t random bits and the
encoded bit needed to reveal $x_i$. $p'$ is no greater than p since $0 \le p \le 1$. As noted
earlier, this scheme mirrors the t-private circuits introduced in Ishai et al. [Ishai et al. (2003)].
6.3.3 t-private memory scheme using a random matrix T
The straightforward t-private memory requires t random bits per key bit. This may be
an unreasonably large random bit overhead. This scheme attempts to reduce the number of
random bits needed for the entire schema. For each key bit, $t_i$ random bits
$R_i = \{r_j \mid r_j \in R, |R_i| = t_i\}$, randomly selected from a set of t random bits
$R = \{r_1, r_2, \dots, r_t\}$, are used to encode $x_i$ as
$e_i = x_i \oplus \bigoplus_{r_j \in R_i} r_j$. The positions/indices j of the randomly selected
$t_i$ random bits are stored in a fixed $t \times k$ random matrix T. For example, if
$r_1, r_2, r_5$ are randomly selected for encoding $x_1$, the first column $T_1$ of the random
matrix T is $[1, 1, 0, 0, 1, 0, \dots]^T$. The random matrix T is used for decoding:
$$x_i = e_i \oplus \left[ \bigoplus_{j=1}^{t} r_j \cdot T_{ji} \right].$$
In this case, c is t + k and the total number of bits stored for n secret keys, including the
$t \times k$ random matrix table, is equal to $(t + k)n + tk$. The storage overhead is
$$\frac{(t + k)n + tk}{nk}.$$
In order to reveal a single secret key bit $x_i$, all of the t random bits and the ith column
$T_i$ of the random matrix T are required:
$$x_i = e_i \oplus \left[ \bigoplus_{j=1}^{t} r_j \cdot T_{ji} \right], \quad \text{where } r_j \in R,\ T_{ji} \in T_i.$$
The failing cases of our memory attack scenario are divided into two cases. In the first case,
the adversary does not know all the random bits. In the second case, the adversary does not
know the ith column of the random matrix T even though all the random bits are known. Note that
we assume that the leakage probability of each bit of the matrix T is also p and is
independently distributed. Thus, the failure probability $P_{fail}$ of this attack is equal to
the sum of the probabilities of the two cases. The success probability $P_{succ}$ is given by
the following equations:
$$P_{succ} = 1 - P_{fail} = 1 - \{ \underbrace{1 - p^t}_{\text{the first case's probability}} + \underbrace{p^t (1 - p^{t+1})^k}_{\text{the second case's probability}} \}$$
$$= p^t \{ 1 - (1 - p^{t+1})^k \}. \qquad (6.2)$$
Compared with Eq. (6.1), the success probability of the t-private memory scheme using a random
matrix is smaller by a factor of $p^t$ than that of the t-private scheme for the same t.
6.3.4 Hybrid memory scheme
The hybrid scheme is a combination of the t-private memory scheme and the t-private memory
scheme using a fixed random matrix. This scheme is devised in [Valamehr et al. (2012)] in
order to minimize $P_{succ}$ per random bit. Intuitively, it uses a few of the t bits to reduce
p with the classical t-private scheme. The rest of the t private bits are used in a random
matrix schema. The details of the hybrid schema and its analysis in [Valamehr et al. (2012)] are
ambiguous. In the following, we have chosen one of many possible designs for the hybrid schema.
The number of random bits $t_i$ used to encode each secret key bit $x_i$ with the t-private
scheme is a parameter individualized to each $x_i$. We let the set of these random bits be
$R'_i = \{r_{i1}, r_{i2}, \dots, r_{it_i}\}$. Another set of random bits per secret key,
$R = \{r_1, r_2, \dots, r_t\}$, is required for the encoding method with a $t \times k$ random
matrix T. Each secret key bit $x_i$ can be encoded by the following equation:
$$e_i = x_i \oplus \left[ \bigoplus_{j=1}^{t_i} r_{ij} \right] \oplus \left[ \bigoplus_{r_j \in R_i} r_j \right] \quad \text{for } 1 \le i \le k$$
where $R_i$ is a randomly selected subset of $R = \{r_1, \dots, r_t\}$.























The failing cases for an adversary are again divided into two cases, as in the t-private scheme
using a random matrix. In the first case, the adversary does not know all of the t random
bits $\{r_1, r_2, \dots, r_t\}$ used to encode with the random matrix. In the second case, the
adversary does not know the ith column of the random matrix T and all $t_i$ random bits for the
t-private encoding, even though (conditioned on) all the random bits $\{r_1, r_2, \dots, r_t\}$
are known. The success probability $P_{succ}$ is
$$P_{succ} = 1 - P_{fail} = 1 - \{ \underbrace{1 - p^t}_{\text{the first case's probability}} + \underbrace{p^t \prod_{i=1}^{k} (1 - p^{t_i+t+1})}_{\text{the second case's probability}} \}$$
$$= p^t \{ 1 - \prod_{i=1}^{k} (1 - p^{t_i+t+1}) \}. \qquad (6.3)$$
The t-private memory scheme with a random matrix is the special case of this hybrid memory
scheme in which $t_i = 0$ for all $1 \le i \le k$. Compared to the t-private memory scheme with
a random matrix for the same t and with all $t_i$’s equal, the success probability of the
hybrid scheme decreases slightly, since $p^{t+1}$ in Eq. (6.2) is larger than $p^{t_i+t+1}$ in
Eq. (6.3). However, the storage overhead increases by $t_i$.
6.3.5 Comparison
Table 6.2 shows the storage overhead and the success probability of the 4 architectural
schemes. We assume that the key length k is 128 bits and the number of secret keys n is 10
and the leakage probability of each bit p is 0.9. Fig. 6.10 shows the storage overhead and the
success probability of the t-private scheme, the t-private scheme with a random matrix and the
hybrid memory scheme with ti = 10 parametrized by the number of random bits t. Compared
to the t-private memory scheme with a random matrix, the hybrid memory scheme does not
have any advantage since the storage overhead is larger without a significant reduction in the
success probability. In the following sections, our proposed memory scheme will be compared
to the t-private memory scheme with a random matrix.
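The trade-off between storage overhead and success probability can be explored numerically with
a short sketch. It evaluates the closed forms for the original scheme, Eq. (6.1) and Eq. (6.2)
with the parameters stated above (k = 128, n = 10, p = 0.9). The hybrid scheme is left out, and
the function names and the chosen values of t are illustrative assumptions.

```python
def original(k, n, p):
    """Unprotected memory: overhead 1, attack succeeds if any key bit leaks."""
    return 1.0, 1 - (1 - p) ** k


def t_private(k, n, p, t):
    """Classical t-private memory, Eq. (6.1), storage overhead t + 1."""
    return float(t + 1), 1 - (1 - p ** (t + 1)) ** k


def t_private_matrix(k, n, p, t):
    """t-private memory with a t x k random matrix, Eq. (6.2)."""
    overhead = ((t + k) * n + t * k) / (n * k)
    psucc = p ** t * (1 - (1 - p ** (t + 1)) ** k)
    return overhead, psucc


K, N, P = 128, 10, 0.9   # parameters used in the comparison
print("original", original(K, N, P))
for t in (10, 40, 80):   # illustrative values of t
    print(t, t_private(K, N, P, t), t_private_matrix(K, N, P, t))
```

For the same t, the random matrix variant trades a much smaller storage overhead for the extra
factor of $p^t$ in the success probability.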
6.4 New Approach
Note that all the encoding schemes in Section 6.3 except for the classical t-private memory
scheme require the stored keys to be decoded before they can be used in a cryptographic
computation (such as AES encryption). A more secure and private system can be designed
if the computation with the key is also implemented as private logic (along the lines of Ishai
Table 6.2: The storage overhead and the success probability of the 4 architectural schemes
Original t-private t-private with T Hybrid





















Figure 6.8 The success probability
Figure 6.9 The storage overhead
Figure 6.10: Comparison between t-private scheme, t-private scheme with a random matrix
and the hybrid scheme when p = 0.9, k = 128, n = 10, ti = 10
Figure 6.11: t-Private: (Left) Encoding; (Right) Decoding
scheme [Ishai et al. (2003)]). A memory encoding scheme that does not require the key to be
decoded, so that the key can participate directly in a computation implemented with private
logic, is called a computable encoding or schema. In such cases, a private logic family
consistent with the memory encoding must exist. In a memory schema that is not computable, the
decoded key can be attacked dynamically in flight. The only attacks that a non-computable memory
schema protects against are static memory attacks, such as chip slicing based observation of
transistor fatigue.
t-private encoding is obviously a computable schema. The t-private storage can be used
directly in the t-private encryption/decryption implementation without additional decoding.
Hence, the t-private memory scheme should be selected in order to prevent the adversary from
attacking the raw key at the decoding step even though it does not have the best success
probability and storage overhead tradeoff.
Basic Encoding Scheme: t-private implementations require many random bits, since they
do not share/reuse random bits (unlike the random matrix schema). They pose a $t^2$ factor
area overhead and a factor t delay overhead. Our goal was to come up with a computable
version of the random matrix method. Put differently, we need a scheme that reuses random bits
in a t-private logic implementation. We propose a computable and t-private encoding with
these properties. We can use an addition-like invertible function together with the t-private
masking method to reduce the success probability. Note that such a function is not commutative
in the bits of its operand. In other words, unlike the t random bits in Ishai’s t-privacy
schema, the order of these bits within the coding operand matters. Each ordering of the t random
bits gives a different seed and hence a different encoding. This allows any permutation of the
t random bits to serve as a different random seed from the encoding perspective. This results in
a possibility of $t!/(a!\,b!) \approx t!/((t/2)! \cdot (t/2)!)$ reuses of t random bits, where a
is the number of 1’s and b is the number of 0’s of the t random bits.
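The reuse count can be checked with a small sketch that compares the closed form
$t!/(a!\,b!)$ against a brute-force enumeration of distinct orderings. The function names are
hypothetical, and the brute-force version is only practical for small t.

```python
from itertools import permutations
from math import comb


def distinct_orderings(bits):
    """Closed form t!/(a! b!) with a ones and b zeros, i.e. C(t, a)."""
    t = len(bits)
    a = sum(bits)              # number of 1's
    return comb(t, a)


def distinct_orderings_bruteforce(bits):
    """Enumerate all permutations and count the distinct bit vectors."""
    return len(set(permutations(bits)))


example = [1, 1, 0, 0]
print(distinct_orderings(example), distinct_orderings_bruteforce(example))
```

The count is largest when the vector is balanced (a close to t/2), which is the case the
approximation in the text refers to.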
Fig. 6.11 shows the basic idea. We add two (t + 1)-bit words for encoding. One operand is
derived by concatenating the bit to be encoded, x, with t random bits $r_t, r_{t-1}, \dots, r_1$.
This word is added to a random constant c (either one c per chip or one c per x). Note that
different permutations of the t random bits $r_{i_t}, r_{i_{t-1}}, \dots, r_{i_1}$ lead to
different encoded results when added to c. Decoding consists of simply subtracting c from the
encoded word $e_{t+1} e_t \dots e_1$. The most significant bit of the decoded word is x.
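A minimal sketch of this basic scheme is given below. It assumes that the addition is performed
modulo $2^{t+1}$ (the carry out of the most significant bit is discarded), which the text does
not state explicitly; the function names are hypothetical.

```python
def encode_basic(x, r_bits, c):
    """Form the word [x, r_t, ..., r_1] (r_bits listed MSB-first) and add c."""
    t = len(r_bits)
    word = x
    for r in r_bits:
        word = (word << 1) | r
    return (word + c) % (1 << (t + 1))   # assumption: arithmetic modulo 2^(t+1)


def decode_basic(e, c, t):
    """Subtract c and return the most significant bit, which is x."""
    word = (e - c) % (1 << (t + 1))
    return word >> t


c = 5                                     # hypothetical constant
e1 = encode_basic(1, [1, 0, 0], c)
e2 = encode_basic(1, [0, 0, 1], c)        # same random bits, different order
print(e1, e2, decode_basic(e1, c, 3), decode_basic(e2, c, 3))
```

The two encodings differ although they use the same multiset of random bits, which is the
property that allows permutations of a random vector to be reused.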
Refined Encoding Schema: The basic encoding schema has some flaws that expose the
bit x when forming complex entangling gates such as AND and OR, as discussed in Section 6.5.
To fix this, instead of x at the MSB of the arithmetic word with random bits, we use the
Ishai code $x \oplus r_t \oplus \dots \oplus r_1$.
We define the computable and t-private encoding for $x_i$ (the bit to be coded) as follows:
$$\vec{e}_i = \mathrm{Encode}(x_i) = [x_i \oplus r^i_t \oplus r^i_{t-1} \oplus \dots \oplus r^i_1, \vec{r}_i] + \vec{c}_i$$
where $\vec{r}_i$ and $\vec{c}_i$ denote the random bits $[r^i_t, r^i_{t-1}, \dots, r^i_1]$ and the
constant bits $[c^i_{t+1}, c^i_t, \dots, c^i_1]$, respectively. Note that this schema uses a
constant word per $x_i$. We form an arithmetic word comprising t random bits and $x_i$. By
placing $x_i$ at the most significant end, we allow all the t random bits to affect its
encoding. A simpler encoding would have added $[x_i, r_t, \dots, r_1]$ to a constant vector per
chip or per computation session. Note that since the constant vector $\vec{c}$ is constant over
longer periods (an entire computation session or an entire boot-up phase), to be conservative,
it may not contribute to the entropy of the encoding. We must assume that the adversary knows
such a persistent $\vec{c}$.
The decoding can then be done as follows:
$$\vec{d}_i = \mathrm{Decode}(\vec{e}_i) = \vec{e}_i - \vec{c}_i = [x_i \oplus r^i_t \oplus \dots \oplus r^i_1, r^i_t, \dots, r^i_1].$$
The most significant bit of $\vec{d}_i$ is $x_i \oplus r^i_t \oplus \dots \oplus r^i_1$. The
decoded vector $\vec{d}_i$ can be directly connected to t-private encryption/decryption logic.
This computable and t-private encoding method does not reveal the original key bit after this
decoding process. Algorithm 2 represents our computable t-private encoding/decoding method. Note
that this algorithm creates all m reuses of each bit within the encoding of the same key. Such
a localized reuse may not be optimal in practice; it is presented in the algorithm for its
simplicity. In practice, for CAD, we will likely incorporate global randomized reuse. Also note
that we have used a random instance of a permutation of t bits, $\pi_r$, picked uniformly from
the space of t! permutations. $\pi_r(i) = j$ maps the ith bit position to the jth bit position.
Fig. 6.12 shows our proposed computable and t-private memory scheme.
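The refined encoding and its decoding can be sketched in a few lines. As before, arithmetic
modulo $2^{t+1}$ is an assumption, the function names are hypothetical, and the permutation and
reuse of random vectors are omitted for brevity.

```python
from functools import reduce
from operator import xor


def encode_refined(x, r_bits, c):
    """Encode [x ^ r_t ^ ... ^ r_1, r_t, ..., r_1] + c (r_bits listed MSB-first)."""
    t = len(r_bits)
    word = reduce(xor, r_bits, x)         # masked MSB (Ishai code)
    for r in r_bits:
        word = (word << 1) | r
    return (word + c) % (1 << (t + 1))    # assumption: arithmetic modulo 2^(t+1)


def decode_refined(e, c, t):
    """Return (masked MSB, random bits); the raw bit x is never produced."""
    word = (e - c) % (1 << (t + 1))
    msb = word >> t
    r_bits = [(word >> i) & 1 for i in range(t - 1, -1, -1)]
    return msb, r_bits


msb, r = decode_refined(encode_refined(1, [1, 0, 1], c=6), c=6, t=3)
print(msb, r)   # msb equals 1 ^ 1 ^ 0 ^ 1; the raw bit stays masked
```

The decoder output has the form of a masked bit plus its masks, which is what allows it to feed
private logic without exposing the key bit.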





Algorithm 2 Computable t-private memory encoding/decoding scheme
Encoding
Input : A k-bit secret key $\vec{x} = [x_k, x_{k-1}, \dots, x_i, \dots, x_1]$;
  $g = \lceil k/m \rceil$ distinct t-bit random vectors
  $\vec{r}^0 = [r^0_t, r^0_{t-1}, \dots, r^0_1], \dots, \vec{r}^{g-1} = [r^{g-1}_t, r^{g-1}_{t-1}, \dots, r^{g-1}_1]$;
  constant vector (per chip or per computation session) $\vec{c} = [c_{t+1}, c_t, \dots, c_1]$
Output : Encoded secret key bit vectors $\vec{e}_i$ for $i = 1, 2, \dots, k$ such that
  $e(\vec{x}) = \vec{e}_k \vec{e}_{k-1} \dots \vec{e}_1$
for i = 1 → k do
  j ← i % g
  Key bit $x_i$ is XORed with the t random bits in the jth random vector :
    $y_i = x_i \oplus r^j_t \oplus r^j_{t-1} \oplus \dots \oplus r^j_1$

end for
Decoding
Input : Encoded secret key vectors $\vec{e}_i$ for $i = 1, 2, \dots, k$; constant vector $\vec{c}$
Output : Decoded secret key vectors $\vec{d}_i = [y_i, r_t, \dots, r_1]$ for $i = 1, 2, \dots, k$
for i = 1 → k do
  $\vec{d}_i = [e^i_{t+1}, e^i_t, \dots, e^i_1] - \vec{c} = [y_i, r^j_t, r^j_{t-1}, \dots, r^j_1]$ for j = i % g
end for
Constant vector $\vec{c}$ storage/routing: The constant vector $\vec{c}_i$ need not be stored in
memory. Its lifetime is only from the producer gate to the consumer gate. It can be hardwired in
the routing of wires from the producer gate to the consumer gate. For a per-chip or per-session
constant $\vec{c}$, similar hardwiring will work with a boot-up or session-startup
initialization step. For a random choice of $\vec{c}_i$ per $x_i$, we assume that the adversary
learns each bit with probability 0.5 randomly. This requires the adversary to conduct all
$2^{t+1}$ possible $\vec{c}_i$ experiments to reveal
Figure 6.12: The proposed memory scheme
a key bit. The success probability $P_{succ}$ then is
$$P_{succ} = \frac{1}{2^{t+1}} \left( 1 - (1 - p^{t+1})^k \right). \qquad (6.4)$$
However, since the goal of this chapter is to save on random bits, henceforth we assume that
$\vec{c}$ is a constant per chip or per computation session. Furthermore, the adversary
knows $\vec{c}$. Hence we cannot use the entropy of $\vec{c}$ in our security analysis, and the
success probability becomes
$$P_{succ} = 1 - (1 - p^{t+1})^k. \qquad (6.5)$$
If we instead assume the memory attack model, with probability p to reveal each bit of
$\vec{c}_i$, then the success probability is $P_{succ} = p^{t+1} \times (1 - (1 - p^{t+1})^k)$.
Similarly, if we assume that the constant vector is fixed for the chip design or for each
boot-up session, we give the benefit of the doubt to the adversary, leading to
$P_{succ} = 1 - (1 - p^{t+1})^k$. Effectively, this gives us two types of t-private systems:
(1) ones with a constant $\vec{c}$, with higher success probability but a lower random bit
requirement (which is the one analyzed in the following), and (2) ones with a constant
$\vec{c}_i$ per $x_i$, with lower success probability at the cost of a higher number of random bits.
When a permutation of a vector of t random bits is reused up to m times for encoding
other information/key bits, we need to consider two cases for revealing the coded bits. In
the earlier analysis, we have assumed probability p for a slicing attack to succeed at revealing a
Figure 6.13: The success probability according to m reused random bits when p = 0.9, t = 91
specific coded bit $b_i$. The other possibility due to reuse is that another bit $a_l$ might be
revealed through a slicing attack with probability p, and that it is reused at the bit position
of $b_i$. Eq. (6.5) should be changed into the following equation to account for such reuse:
$$P_{succ\_reuse} = 1 - \left( 1 - (p + (1 - p)q)^{t+1} \right)^k \qquad (6.6)$$
where q is the probability that a reused bit $a_l$ is revealed through a slicing attack and is
routed to the position of $b_i$.

In Eq. (6.7), $p/t$ is the probability that a reused bit $b_i$ is revealed by a slicing attack
on another bit $a_l$. It results from the leakage/slicing attack success probability p of
another bit $a_l$ and the probability that the reused bit $a_l$ is routed to $b_i$’s position.
Note that a random permutation $\pi_r$ maps a bit position i to another bit position j with
probability 1/t over all t! permutations. When the slicing memory inspection of a bit fails,
which happens with probability (1 − p), the event that a reuse reveals the bit needs to be
considered, so the per-bit reveal probability in $P_{succ\_reuse}$ increases by the term
$(1 - p)q$.
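The effect of reuse on the success probability can be sketched by evaluating Eq. (6.5) and
Eq. (6.6) side by side. The reuse probability q is passed in as a parameter here rather than
derived from the reuse factor m, and the key length k = 128 is an assumption borrowed from
Section 6.3.5.

```python
def psucc_constant_c(p, t, k):
    """Eq. (6.5): success probability with a known constant vector."""
    return 1 - (1 - p ** (t + 1)) ** k


def psucc_reuse(p, q, t, k):
    """Eq. (6.6): each bit leaks directly (p) or through a reused copy ((1 - p) q)."""
    per_bit = p + (1 - p) * q
    return 1 - (1 - per_bit ** (t + 1)) ** k


P, T, K = 0.9, 91, 128          # p and t as in Fig. 6.13; k is an assumed value
for q in (0.0, 0.02, 0.05):     # illustrative values of q
    print(q, psucc_constant_c(P, T, K), psucc_reuse(P, q, T, K))
```

With q = 0 the two expressions coincide, and the success probability grows monotonically as q
increases.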
Fig. 6.13 shows the success probability parametrized by reuse factor m when p is 0.9 and
t is 91. The success probability is 0.1 when the reuse factor m is 30. For m = 86, the
success probability goes up to 0.9. Fig. 6.17 shows the success probability of our proposed
Figure 6.15 The success probability
Figure 6.16 The number of random bits (t) when Psucc = 0.0078
Figure 6.17: Performance comparison between proposed scheme and t-private schemes
memory scheme and t-private schemes. Our proposed schema requires only 5 random bits for
Psucc = 0.0078 as in Fig. 6.17.(b).
Now let us consider the complexity of the (t + 1)-bit ripple carry adders used for encoding
and decoding in terms of the number of logic gates. Since one of the adder operands is a
constant, a full adder bit-slice can be made simpler than the typical full adder. If the
constant bit $b_0$ is 0, the carry-out bit $c_1$ is $a_0 c_0$, where $a_0$ and $c_0$ are the
input bit and the carry-in bit, respectively, and the sum bit $s_0$ is $a_0 \oplus c_0$. If the
constant bit $b_0$ is 1, the carry-out bit $c_1$ is $a_0 + c_0$ (logical OR) and the sum bit
$s_0$ is $(a_0 \oplus c_0)'$. Only 2 logic gates are needed for a specialized full adder, so the
total number of logic gates for the (t + 1)-bit adder is 2(t + 1).
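The specialized bit-slice can be checked against ordinary modular addition with a short sketch.
The helper names are hypothetical and the model is purely functional, not a gate-level netlist.

```python
def to_bits(value, n):
    """Little-endian bit list of an n-bit value."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def add_constant(a_bits, b_bits):
    """Ripple carry addition where b_bits is a constant known at design time."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        if b == 0:
            s, carry = a ^ carry, a & carry          # XOR + AND slice
        else:
            s, carry = 1 - (a ^ carry), a | carry    # XNOR + OR slice
        out.append(s)
    return out                                       # final carry is discarded


print(from_bits(add_constant(to_bits(11, 4), to_bits(6, 4))))   # (11 + 6) mod 16
```

Each slice uses exactly two gates, whichever value the constant bit takes, which is where the
2(t + 1) gate count comes from.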
6.5 New Computable And t-private Logic Schema And Gates
Consider an inverter $y = \bar{x}$. If x is encoded with our schema, the incoming (t + 1)-tuple
represents the encoding $(x, \vec{r}_x, \vec{c}_x)$. The inverter, however, needs to recode the
output with respect to the vector $(y, \vec{r}_y, \vec{c}_y)$. This requires first decoding the
incoming (t + 1)-tuple and then recoding it. Had we used the basic encoding schema, this would
have temporarily exposed x in the open, where it is vulnerable to a probing attack. No bit $x_i$
should be in flight in raw form, even momentarily, since this creates a weak link. We overcome
this by using $x_i \oplus r^i_1 \oplus r^i_2 \oplus \dots \oplus r^i_t$ as the MSB in the addition.
With this scheme, the decoded vector
$\vec{d}_i = [x_i \oplus r_{it} \oplus \cdots \oplus r_{i1}, r_{it}, \ldots, r_{i1}]$ is
identical to the Ishai encoding of private circuits [Ishai et al. (2003)], and hence can be
connected to Ishai's t-private combinational logic gates. The classical t-private scheme has
$t^2$ area and t time overhead, so adopting this approach by itself saves only on random bits.
We therefore propose more efficient combinational logic that operates on the decoded vectors and
has the same functionality as the traditional logic operations with lower overhead.
6.5.1 AND operation
Let two encoded bit vectors read from the memory be
$\vec{e}_1 = [x_1 \oplus r_{1t} \oplus \cdots \oplus r_{11}, \vec{r}_1] + \vec{c}_1$ and
$\vec{e}_2 = [x_2 \oplus r_{2t} \oplus \cdots \oplus r_{21}, \vec{r}_2] + \vec{c}_2$.
The decoder turns them into the decoded vectors $\vec{d}_1$ and $\vec{d}_2$. First,
consider the simple case in which t is 1. The two decoded bit vectors are
$\vec{d}_1 = [x_1 \oplus r_1, r_1]$ and $\vec{d}_2 = [x_2 \oplus r'_1, r'_1]$. The result of the
AND operation should be $[x_1 \cdot x_2 \oplus r''_1, r''_1]$. How can we obtain this result and
$r''_1$? Let us perform the following computation:
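\[
\begin{aligned}
\vec{d}_1 \wedge \vec{d}_2 &= [(x_1 \oplus r_1) \cdot (x_2 \oplus r'_1),\ r_1 \cdot r'_1] \\
&= [x_1 \cdot x_2 \oplus r_1 \cdot x_2 \oplus x_1 \cdot r'_1 \oplus r_1 \cdot r'_1,\ r_1 \cdot r'_1]
\end{aligned}
\]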
The first component, $x_1 \cdot x_2 \oplus r_1 \cdot x_2 \oplus x_1 \cdot r'_1 \oplus r_1 \cdot r'_1$,
should instead equal $x_1 \cdot x_2 \oplus r_1 \cdot r'_1$ to give the desired result, so
additional computations are required to remove $r_1 \cdot x_2 \oplus x_1 \cdot r'_1$.
We define the AND operation in this case (t = 1) by the following equations:
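\[
\begin{aligned}
\mathrm{AND}(\vec{d}_1, \vec{d}_2) &= [x_1 \oplus r_1,\ r_1]\ \mathrm{AND}\ [x_2 \oplus r'_1,\ r'_1] \\
&= [(x_1 \oplus r_1) \cdot (x_2 \oplus r'_1) \oplus \underbrace{(x_1 \oplus r_1) \cdot r'_1 \oplus (x_2 \oplus r'_1) \cdot r_1}_{\text{additional computations}},\ r_1 \cdot r'_1] \\
&= [x_1 \cdot x_2 \oplus r''_1,\ r''_1]
\end{aligned}
\]
where $r''_1$ is equal to $r_1 \cdot r'_1$.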
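The t = 1 definition can be verified by brute force over all 16 combinations of the two secret bits and the two random bits. The sketch below is only a functional check in Python (names such as `masked_and_t1` are chosen for this example); it confirms that the result decodes to x1 · x2 but says nothing about probing security.

```python
from itertools import product


def masked_and_t1(d1, d2):
    """AND of two decoded vectors [x ^ r, r] for t = 1."""
    m1, r1 = d1                      # m1 = x1 ^ r1
    m2, r2 = d2                      # m2 = x2 ^ r1'
    msb = (m1 & m2) ^ (m1 & r2) ^ (m2 & r1)
    r_out = r1 & r2                  # r1'' = r1 . r1'
    return msb, r_out


def check_t1():
    """True if the output always decodes to x1 AND x2."""
    for x1, x2, r1, r2 in product((0, 1), repeat=4):
        msb, r_out = masked_and_t1((x1 ^ r1, r1), (x2 ^ r2, r2))
        if msb ^ r_out != (x1 & x2):
            return False
    return True
```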
Let us now increase the value of t to develop our intuition. The two decoded vectors are
$\vec{d}_1 = [x_1 \oplus \bigoplus r_j, \vec{r}]$ and
$\vec{d}_2 = [x_2 \oplus \bigoplus r'_j, \vec{r}']$. In this case, the AND operation is given by
the following equation:
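\[
\mathrm{AND}(\vec{d}_1, \vec{d}_2) = \Big[\, x_1 \oplus \bigoplus \cdots,\ \big(\textstyle\bigoplus r'_j\big) \cdot \vec{r} \,\Big] \qquad (6.8)
\]
where $\bigoplus r_j = r_1 \oplus r_2 \oplus \cdots \oplus r_t$ and
$(\bigoplus r'_j) \cdot \vec{r} = [(r'_1 \oplus \cdots \oplus r'_t) r_1, \ldots, (r'_1 \oplus \cdots \oplus r'_t) r_t]$. The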
number of gates required is t + 7: t + 1 AND gates and 6 additional operations. Thus, the
area/gate complexity of this AND operation is O(t). This is more efficient than Ishai's t-private
model, which has an area complexity of $O(t^2)$ [Ishai et al. (2003)]. Moreover, this computation
can be performed in O(log t) time, as opposed to O(t) in the original private circuits.
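As a functional illustration for general t, the Python sketch below extends the t = 1 construction by replacing r1 and r′1 with the XOR of all random bits in each vector, and uses (⊕ r′j) · r⃗ as the output random vector. That first component is an assumption made for this example and may differ in form from the thesis's own equation; the sketch checks only that the output decodes to x1 · x2, not gate count or probing security.

```python
from functools import reduce
from itertools import product
from operator import xor


def xor_all(bits):
    return reduce(xor, bits, 0)


def encode(x, r):
    """Decoded-vector form [x ^ XOR(r), r]."""
    return (x ^ xor_all(r), tuple(r))


def decode(d):
    return d[0] ^ xor_all(d[1])


def masked_and(d1, d2):
    m1, r = d1
    m2, rp = d2
    big_r, big_rp = xor_all(r), xor_all(rp)
    # Assumed first component: the t = 1 expression with XOR-ed random bits.
    msb = (m1 & m2) ^ (m1 & big_rp) ^ (m2 & big_r)
    # Output random vector: (XOR of r'_j) AND each r_j.
    r_out = tuple(big_rp & rj for rj in r)
    return msb, r_out


def check(t):
    """True if the output decodes to x1 AND x2 for every input."""
    for x1, x2 in product((0, 1), repeat=2):
        for r in product((0, 1), repeat=t):
            for rp in product((0, 1), repeat=t):
                out = masked_and(encode(x1, r), encode(x2, rp))
                if decode(out) != (x1 & x2):
                    return False
    return True
```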
6.5.2 OR operation
We define the OR operation as follows:
\[
\mathrm{OR}(\vec{d}_1, \vec{d}_2) = \Big[\, x_1 \oplus \bigoplus \cdots \,\Big] \qquad (6.9)
\]
An OR gate is the logic dual of an AND gate. Hence, the OR operation logic also has the same […]
6.5.3 NOT operation
The NOT operation is modeled by the following equations:




6.5.4 The perfect secrecy
The original secret bit $x_i$ must not be revealed when the adversary probes $t_p \le t$ nodes in
a t-private logic circuit. The t-privacy parameter bounds the number of probes under which perfect
secrecy holds. In Ishai's privacy model, there is no grey-zone analysis: either perfect secrecy
holds (p = 0) or the circuit is unacceptably compromised. We develop a t-private circuit





r1j in the proposed AND or OR logic circuit exactly, x1 is leaked easily.
Assuming that the adversary can access any circuit node with equal likelihood and 100% certainty,












where n is the total number of nodes. Since n is generally much larger than t, Psucc is very low.
For example, when n and t are 100 and 10, respectively, Psucc is $2.6 \times 10^{-12}$. In order to make




r′j), which consists of two terms in Eq. (6.8) or Eq. (6.9),
can be resolved into $\bigoplus_j \{(x_i \oplus \bigoplus_k r_k) \cdot r'_j\}$, which consists of t terms.
Figure 6.18: An output of AND operation for the perfect secrecy
A perfectly secret circuit is defined as a circuit that appears like a pseudo-random number
generator. There is no appreciable correlation between inputs and outputs for an adversary that
is polynomially bounded or subject to whatever other restrictions are placed on it. Given any
input, the probability of any output vector should be the same; it does not depend on the input:
\[
\Pr[y \mid x_i] = \Pr[y] \quad \forall x_i,
\]
where $x_i$ is the input and y is the output. This is the same property required of encryption
functions. For example, the traditional AND gate does not have perfect secrecy, since its
output depends on its inputs. An AND-XOR network with a random bit provides perfect secrecy for
the inputs of AND gates [Park and Tyagi (2012)]. Fig. 6.18 shows the schematic of the first bit of
the vector term in Eq. (6.8), which needs perfect secrecy. To achieve it, additional
XOR gates and new random bits are inserted. The numbers in the logic circuit represent the
probability that each node is one. The probability that the output is one is always equal to 0.5
and does not depend on the inputs. Also, the vector $(\bigoplus r'_j) \cdot \vec{r}$ in Eq. (6.8)
should be changed into
\[
\Big[ \big(\textstyle\bigoplus r'_j\big) r_t \oplus r''_1,\ \big(\bigoplus r'_j\big) r_{t-1} \oplus r''_1,\ \ldots,\ \big(\bigoplus r'_j\big) r_2 \oplus r''_{\lceil t/2 \rceil},\ \big(\bigoplus r'_j\big) r_1 \oplus r''_{\lceil t/2 \rceil} \Big]
\]
for perfect secrecy.
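The property Pr[y | x] = Pr[y] can be checked by enumeration for small gates. The Python sketch below compares a plain AND gate with an AND gate whose output is XOR-ed with one fresh uniform random bit. This is a simplified stand-in chosen for illustration, not the exact network of Fig. 6.18, and the helper name `prob_one` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_one(gate, fixed_inputs, n_random):
    """Pr[output = 1] for fixed inputs and n_random uniform random bits."""
    outcomes = [
        gate(*fixed_inputs, *rnd)
        for rnd in product((0, 1), repeat=n_random)
    ]
    return Fraction(sum(outcomes), len(outcomes))


def plain_and(a, b):
    return a & b


def masked_and(a, b, r):
    return (a & b) ^ r               # fresh random bit r masks the output


# Output-one probability for every input pair.
plain = {ab: prob_one(plain_and, ab, 0) for ab in product((0, 1), repeat=2)}
masked = {ab: prob_one(masked_and, ab, 1) for ab in product((0, 1), repeat=2)}
```

In this sketch `plain` differs across input pairs, whereas every entry of `masked` is 1/2, so the masked output is statistically independent of the inputs.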
This technique can also be applied to the OR logic circuit for perfect secrecy in a similar
manner. We compare the number of intermediate random bits needed for perfect secrecy by three
t-private AND circuits: Ishai's t-private model, our earlier modified t-private model
[Park and Tyagi (2012)], and the computable t-private model. Table 6.3 compares the
number of intermediate random bits per AND/OR gate for our HOST scheme, Ishai's t-private
scheme, the proposed computable t-private scheme without perfect secrecy, and the proposed
computable t-private scheme with perfect secrecy. The last row shows the total number of random
bits used by these private schemes for a circuit with N gates.
Table 6.3: Number of Random Bits Used for an AND Gate and for an N-gate Circuit
# of random bits | Modified t-private (HOST) | Ishai's t-private | Computable t-private | Computable t-private - perfect secrecy
AND gate | ⌈(t+1)/2⌉ = O(t) | t(t+1)/2 = O(t^2) | 2 | ⌈t/2⌉
N-gate circuit | Nt | Nt^2 | N ∗ ((t/m) + 2) | N ∗ ((t/m) + ⌈t/2⌉ + 2)
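The entries of Table 6.3 can be evaluated for concrete parameters. The Python sketch below simply encodes the table's formulas; the function and key names are made up for this example, and m is used exactly as it appears in the table's t/m term.

```python
from fractions import Fraction


def ceil_div(a, b):
    return -(-a // b)


def per_gate_random_bits(t):
    """Random bits per AND gate, following the AND-gate row of Table 6.3."""
    return {
        "host": ceil_div(t + 1, 2),
        "ishai": t * (t + 1) // 2,
        "computable": 2,
        "computable_perfect": ceil_div(t, 2),
    }


def circuit_random_bits(n_gates, t, m):
    """Random bits for an N-gate circuit, following the last row of Table 6.3."""
    reuse = Fraction(t, m)           # the t/m term of the table
    return {
        "host": n_gates * t,
        "ishai": n_gates * t * t,
        "computable": n_gates * (reuse + 2),
        "computable_perfect": n_gates * (reuse + ceil_div(t, 2) + 2),
    }
```

For instance, `circuit_random_bits(100, 4, 2)` evaluates to 400 bits for the HOST scheme, 1600 for Ishai's scheme, 400 for the computable scheme, and 600 for the computable scheme with perfect secrecy.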
6.6 Hardware Implementation
Table 6.4: Hardware Implementation on FPGA
Parameter | t-private | t-private with R.M | Proposed computable and t-private
# keys | 10 | 10 | 10
# bits of a key | 128 | 128 | 128
t | 63 | 19 | 4
Psucc | 0.14 | 0.135 | 0.016
Block RAM | 1024 * 80 | (35*80) + (304*8) | 80*80
# decoded bits per 1 clock | 16 | 16 | 16
Input bits of decoder | 64*16 = 1024 | 19+16+(19*16) = 339 | 80
# LUTs | 208 | 25 | 16
Delay (ns) | 1.926 | 1.998 | 0.931
We implemented t-private memories, including the random-matrix method, and our proposed
computable and t-private memory. We used Xilinx ISE tools for synthesis, and the target
device is a Xilinx Virtex-5 FPGA (XC5VFX70T-3FF1136). Table 6.4 shows the parameters, the
number of Block RAMs and LUTs used, and the delay of each decoder. In the case of the t-private
memory, 63 random bits are required for Psucc = 0.14. The encoded keys stored in memory total
nk(t + 1) = 10 ∗ 128 ∗ (63 + 1) bits. Since the width of a Block RAM in the FPGA is limited to
1152 bits, we set the width of the Block RAM to 1024. Thus, 16 decoded bits (1024 / 64) can be
generated per clock, and 8 clock cycles are needed to decode one key. This is the reference
clock count used to compare the LUTs and delays of the decoders of the t-private memories. For
the t-private memory with a random matrix, since we also set the total number of clock cycles for
decoding a key to 8, 35 bits (19 random bits and the 16 encoded bits of 16 secret-key bits) are
read from one Block RAM, and 304 bits (16 × 19) are simultaneously output from another Block RAM
for the random matrix.
Our proposed memory scheme has much lower storage needs (80 ∗ 80 bits versus 1024 ∗ 80 bits, about
7.8% of the t-private memory), and its success probability is also lower (0.016 versus 0.14). In
addition, the decoder of our proposed memory has lower area and time overhead. Specifically, it
requires 92% less area and 51% less delay than the t-private memory, and 36% less area and 53%
less delay than the t-private memory with a random matrix.
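The storage and overhead figures quoted above follow directly from Table 6.4. The short Python sketch below redoes the arithmetic; the variable names are illustrative and the numbers are copied from the table.

```python
n_keys, key_bits = 10, 128

# Storage in bits for each scheme (Block RAM row of Table 6.4).
t_private_bits = n_keys * key_bits * (63 + 1)    # nk(t + 1)
random_matrix_bits = 35 * 80 + 304 * 8
proposed_bits = 80 * 80

# Decoding throughput of the t-private memory.
decoded_bits_per_clock = 1024 // (63 + 1)
clocks_per_key = key_bits // decoded_bits_per_clock


def percent_less(new, old):
    """Relative reduction of `new` with respect to `old`, in percent."""
    return 100.0 * (1.0 - new / old)


storage_percent = 100.0 * proposed_bits / t_private_bits
lut_vs_t_private = percent_less(16, 208)
delay_vs_t_private = percent_less(0.931, 1.926)
lut_vs_random_matrix = percent_less(16, 25)
delay_vs_random_matrix = percent_less(0.931, 1.998)
```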
6.7 Conclusion
Side channel attacks and static inspection attacks on silicon chips have necessitated
techniques to make circuit implementations resistant (private) to these probes and inspections.
t-private circuits protect the privacy of the data in flight during computation. Memories
(on-chip or off-chip), however, are not protected by t-private circuits.
Valamehr et al. [Valamehr et al. (2012)] introduced a few memory protection schemes.
We introduce a unified analysis framework to compare these schemes. Effectiveness metrics for
these schemes include area/gate count overhead, time overhead, number of random bits needed,
and adversary success probability per random bit. In this chapter, we specifically analyzed the
storage overhead and the success probability of t-private memories, t-private memories with
random matrix (for random bits reuse), and a hybrid private memory.
Ideally, we would like to design a private computing circuit with unified private memory.
In such a computing system, data and keys never appear in their raw form, thereby protecting
privacy of data and keys. We consider a memory scheme to be computable if the encoded
stored keys can be directly used in t-private computations.
Most of the memory schemes presented in Valamehr et al. [Valamehr et al. (2012)] are not
computable. The main new technique they develop is to judiciously reuse random
bits while still limiting the adversary to a low success probability. We develop a new memory
schema that is computable and yet reuses many random bits by bringing an arithmetic
function into the encoding. We present the computable and t-private encoding method and the
corresponding logic operations (AND, OR and NOT) suitable for our memory scheme. The new
private circuits are more efficient than Ishai's t-private model (only t area overhead compared
to the t^2 area overhead of Ishai). By implementing our memory model on an FPGA, we verified that
it has advantages in performance (the success probability and delay) and in area cost.
CHAPTER 7. CONCLUSION AND FUTURE WORK
7.1 Conclusion
In this thesis, a methodology for implementing secure hardware designs against side-channel
attacks has been proposed. Unsafe modules in the cryptographic system are identified by SCA
security metrics based on normalized standard deviation, KL divergence or mutual information.
If the security metric of a module falls outside the boundary range or threshold, the module
is vulnerable to side-channel attacks. To find the boundary or threshold, the security
metrics are compared with the results of simulated side-channel attacks, such as the
success probability or the successful recognition rate. The range between 0 and the allowable
successful recognition rate is mapped onto the range of the security metrics. To make the
boundary stricter, various side-channel attacks using LDA, QDA, naïve Bayes classifier and SVM
are performed.
Vulnerable modules are transformed into secure modules by re-synthesizing them with secure
logic styles such as SABL, WDDL or t-private logic cells. These secure logic styles satisfy
the security condition based on the security metrics. Designers can select the secure logic
style that suits the hardware specification and constraints.
Memories should also be protected from physical access, such as probing, that reveals secret
information stored in the memory. For this protection, we develop a new computable t-private
memory schema that reuses many random bits by bringing an arithmetic function into the
encoding. The computable and t-private encoding method can also be applied to combinational
logic operations. The new private circuits are more efficient than Ishai's t-private model (only
t area overhead compared to the t^2 area overhead of Ishai). Consequently, a secure logic package
that includes secure logic styles and private memories is required to implement ASIC
or FPGA hardware systems that are secure against SCA attacks.
7.2 Future Work
Several challenging problems remain for future work in the area of secure hardware
implementation. Our graph-based power estimation method using renewal theory and
linear regression may be too time-consuming for estimating the power of large digital modules,
even though it is faster than SPICE simulation. For fast and reliable security testing,
high-performance computing using GPUs or hardware accelerators is a possible solution; the
graph-based algorithm can be mapped onto a GPU.
We do not deal with how to generate random bits in this thesis. t-private logic circuits
require a large number of random bits for perfect security. PUF-based random number generators
would be a good choice. The distribution of random bits to t-private logic circuits is also a
significant issue. Ideally, refreshed random bits must be provided to every private circuit in
each clock cycle, but this causes large power consumption and a large area increase. Efficient
temporal and spatial distribution of random numbers should be researched.
APPENDIX A. THE ADVANCED ENCRYPTION STANDARD [FIPS (2001)]
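A.1 Algorithm
Cipher(byte in[4 ∗ Nb], byte out[4 ∗ Nb], word w[Nb ∗ (Nr + 1)])
begin
  byte state[4, Nb]
  state = in
  AddRoundKey(state, w[0, Nb − 1])
  for round = 1 step 1 to Nr − 1
    SubBytes(state)
    ShiftRows(state)
    MixColumns(state)
    AddRoundKey(state, w[round ∗ Nb, (round + 1) ∗ Nb − 1])
  end for
  SubBytes(state)
  ShiftRows(state)
  AddRoundKey(state, w[Nr ∗ Nb, (Nr + 1) ∗ Nb − 1])
  out = state
end
// Nr: the number of rounds, Nb: the number of columns (32-bit words) comprising the State
Algorithm 3 Pseudo Code for AES encryption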
A.1.1 SubBytes
The SubBytes step is the only non-linear transformation of the cipher. SubBytes is a
bricklayer permutation consisting of an S-box applied to the bytes of the state. Fig. A.1
illustrates the effect of the SubBytes step on the state. The S-box function should satisfy
the following conditions:
1. The maximum input-output correlation amplitude must be as small as possible.
2. The maximum difference propagation probability must be as small as possible.
3. The algebraic expression of the S-box in GF(2^8) has to be complex.
Figure A.1: SubBytes( ) applies the S-box to each byte of the State
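The S-box is defined by the following equations:
\[
\mathrm{Sbox}(a) = f(g(a))
\]
\[
g : a \rightarrow b = a^{-1} \pmod{x^8 + x^4 + x^3 + x + 1} \ \text{in } GF(2^8)
\]
\[
f : b \rightarrow c, \qquad
\begin{bmatrix} c_7 \\ c_6 \\ c_5 \\ c_4 \\ c_3 \\ c_2 \\ c_1 \\ c_0 \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} b_7 \\ b_6 \\ b_5 \\ b_4 \\ b_3 \\ b_2 \\ b_1 \\ b_0 \end{bmatrix}
\oplus
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
\]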
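The S-box construction (multiplicative inverse in GF(2^8) followed by the affine map) can be reproduced in a few lines. The Python sketch below is a slow, illustrative reference only: the helper names are chosen for this example, the inverse is computed as a^254, and 0 is mapped to 0 as in FIPS 197.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254; 0 is mapped to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(b):
    """Affine map: c_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ const_i."""
    const = 0x63
    out = 0
    for i in range(8):
        bit = (const >> i) & 1
        for k in (0, 4, 5, 6, 7):
            bit ^= (b >> ((i + k) % 8)) & 1
        out |= bit << i
    return out


def sbox(a):
    return affine(gf_inv(a))


SBOX = [sbox(a) for a in range(256)]
```

Here `SBOX[0x53]` evaluates to `0xED`, the substitution example given in FIPS 197.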

























A.1.2 ShiftRows
The ShiftRows step is a byte transposition that cyclically shifts the rows of the state over
different offsets. Row 0 is shifted over C0 bytes, row 1 over C1 bytes, row 2 over C2 bytes and
row 3 over C3 bytes, so that the byte at position j in row i moves to position (j − Ci) mod Nb.
The shift offsets C0, C1, C2 and C3 depend on the value of Nb. Table A.1 shows the shift offsets
for each value of Nb. Fig. A.2 illustrates the ShiftRows transformation.
Figure A.2: ShiftRows( ) cyclically shifts the last three rows in the State
Table A.1: ShiftRows: shift offsets for different block lengths
Nb | C0 | C1 | C2 | C3
4  | 0  | 1  | 2  | 3
5  | 0  | 1  | 2  | 3
6  | 0  | 1  | 2  | 3
7  | 0  | 1  | 2  | 4
8  | 0  | 1  | 3  | 4
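The index rule "(j − Ci) mod Nb" amounts to a left rotation of row i by Ci positions. The following Python sketch illustrates it with the offsets from Table A.1; the state is represented as four row lists, which is a representation chosen for this example.

```python
# Shift offsets (C0, C1, C2, C3) for each block length Nb, from Table A.1.
SHIFT_OFFSETS = {
    4: (0, 1, 2, 3),
    5: (0, 1, 2, 3),
    6: (0, 1, 2, 3),
    7: (0, 1, 2, 4),
    8: (0, 1, 3, 4),
}


def shift_rows(state, nb=4):
    """Move the byte at (row i, column j) to column (j - Ci) mod Nb."""
    offsets = SHIFT_OFFSETS[nb]
    out = [[0] * nb for _ in range(4)]
    for i in range(4):
        for j in range(nb):
            out[i][(j - offsets[i]) % nb] = state[i][j]
    return out


# Example state for Nb = 4 with entry 4*i + j at row i, column j.
example = [[4 * i + j for j in range(4)] for i in range(4)]
shifted = shift_rows(example)
```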
A.1.3 MixColumns
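The MixColumns step is a bricklayer permutation operating on the state column by column.
The columns are considered as polynomials over GF(2^8) and multiplied modulo x^4 + 1 with a
fixed polynomial a(x), given by
\[
a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}.
\]
This can be written as a matrix multiplication on each column c of the state:
\[
\begin{bmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{bmatrix}
\]
Fig. A.3 illustrates the MixColumns transformation.
Figure A.3: MixColumns( ) operates on the State column-by-column
Figure A.4: AddRoundKey( ) XORs each column of the State with a word from the key schedule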
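The column multiplication needs only multiplication by {02} (a shift with conditional reduction) and XOR. The Python sketch below applies the MixColumns matrix to one column; the function names are chosen for this example, and the test column is a commonly quoted MixColumns test vector.

```python
def xtime(a):
    """Multiply by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mul3(a):
    """Multiply by {03} = {02} + {01}."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the MixColumns matrix to one 4-byte column."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```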
A.1.4 AddRoundKey
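The key addition is denoted AddRoundKey. In this transformation, the state is modified by
combining it with a round key using the bitwise XOR operation. Each round key consists of Nb
words from the key schedule, which are added to the columns of the state such that
\[
[S'_{0,c}, S'_{1,c}, S'_{2,c}, S'_{3,c}] = [S_{0,c}, S_{1,c}, S_{2,c}, S_{3,c}] \oplus [w_{round \ast Nb + c}] \quad \text{for } 0 \le c < Nb.
\]
Fig. A.4 illustrates the AddRoundKey operation.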
A.1.5 Key Schedule
The key schedule consists of two components: the key expansion and the round key selection.
Alg. 4 gives pseudo code for the key expansion. SubWord() is a function that takes a four-byte
input word and applies the S-box to each of the four bytes to produce an output word. The
function RotWord() takes a word [a0, a1, a2, a3] as input, performs a cyclic permutation, and
returns the word [a1, a2, a3, a0]. The round constant word array, Rcon[i], contains the value
given by [x^(i−1), {00}, {00}, {00}], with x^(i−1) being powers of x (x is denoted as {02}) in
the field GF(2^8).
KeyExpansion(byte key[4 ∗ Nk], word w[Nb ∗ (Nr + 1)], Nk)
begin
  word temp
  i = 0
  while i < Nk do
    w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
    i = i + 1
  end while
  i = Nk
  while i < Nb ∗ (Nr + 1) do
    temp = w[i-1]
    if i mod Nk = 0 then
      temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
    else if Nk > 6 and i mod Nk = 4 then
      temp = SubWord(temp)
    end if
    w[i] = w[i-Nk] xor temp
    i = i + 1
  end while
end
// Note that Nk = 4, 6 or 8 when key lengths are 128, 192 or 256 bits, respectively
Algorithm 4 Pseudo Code for Key Expansion
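Algorithm 4 can be exercised against the key expansion example in FIPS 197. The Python sketch below is a direct, unoptimized transcription for illustration: words are represented as 4-byte lists, the S-box is generated on the fly from the field inverse and the affine map (written here in its rotate-and-XOR form), and all function names are choices made for this example.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox(a):
    """Inverse in GF(2^8) (0 maps to 0) followed by the affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, a)

    def rotl(v, n):
        return ((v << n) | (v >> (8 - n))) & 0xFF

    return inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3) ^ rotl(inv, 4) ^ 0x63


SBOX = [sbox(a) for a in range(256)]


def sub_word(word):
    return [SBOX[b] for b in word]


def rot_word(word):
    return word[1:] + word[:1]


def key_expansion(key, nb=4):
    """Return the expanded key as a list of Nb * (Nr + 1) four-byte words."""
    nk = len(key) // 4               # 4, 6 or 8
    nr = nk + 6                      # 10, 12 or 14 rounds
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01                      # x^(i-1), starting at {01}
    for i in range(nk, nb * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = sub_word(rot_word(temp))
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 0x02)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w


expanded = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
```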
APPENDIX B. TOOL SCRIPTS
B.1 Setup (FreePDK45)
1. Download FreePDK45 design kit
at http://www.eda.ncsu.edu/wiki/FreePDK45:Contents.
2. Make setup script.
#!/bin/bash
#######################################################
# FreePDK Setup Script
# 3/21/2016 by Jungmin Park (jmpark00@iastate.edu)
#######################################################
# Set the CDK_DIR variables
export CDK_DIR=/usr/local/cadence/iclocal/ncsu-cdk-1.6.0.beta
# Set the PDK_DIR variables to the root directory of the FreePDK distribution
export PDK_DIR=$PWD/FreePDK45
# Set CDSHOME to the root directory of the Cadence ICOA installation
export CDSHOME=$IC
if [ ! -f "$PWD/.cdsenv" ]
then
  cp /remote/ncsu_oa/local/cdssetup/cdsenv $PWD/.cdsenv
fi
if [ ! -f "$PWD/.cdsinit" ]
then
  cp $PDK_DIR/ncsu_basekit/cdssetup/cdsinit $PWD/.cdsinit
fi
if [ ! -f "$PWD/cds.lib" ]
then
  cp $PDK_DIR/ncsu_basekit/cdssetup/cds.lib $PWD/cds.lib
fi
if [ ! -f "$PWD/lib.defs" ]
then
  cp $PDK_DIR/ncsu_basekit/cdssetup/lib.defs $PWD/lib.defs
fi
if [ ! -f "$PWD/.runset.calibre.drc" ]
then
  cp $PDK_DIR/ncsu_basekit/cdssetup/runset.calibre.drc $PWD/.runset.calibre.drc
fi
if [ ! -f "$PWD/.runset.calibre.lvs" ]
then
  cp $PDK_DIR/ncsu_basekit/cdssetup/runset.calibre.lvs $PWD/.runset.calibre.lvs
fi
if [ ! -f "$PWD/.runset.calibre.lfd" ]
then
  cp $PDK_DIR/ncsu_basekit/cdssetup/runset.calibre.lfd $PWD/.runset.calibre.lfd
fi
if [ ! -f "$PWD/.runset.calibre.pex" ]
then
  cp $PDK_DIR/ncsu_basekit/cdssetup/runset.calibre.pex $PWD/.runset.calibre.pex
fi
export present=$PYTHONPATH
# Quote the variable so the test also works when PYTHONPATH is empty
if [ "$present" = "" ]
then
  export PYTHONPATH=$PDK_DIR/ncsu_basekit/techfile/cni
else
  export PYTHONPATH=$PYTHONPATH:$PDK_DIR/ncsu_basekit/techfile/cni
fi
export MGC_CALIBRE_DRC_RUNSET_FILE=./.runset.calibre.drc
export MGC_CALIBRE_LVS_RUNSET_FILE=./.runset.calibre.lvs
export MGC_CALIBRE_PEX_RUNSET_FILE=./.runset.calibre.pex
3. Modify cds.lib file
DEFINE analogLib $CDSHOME/tools/dfII/etc/cdslib/artist/analogLib
DEFINE US_8ths $CDSHOME/tools/dfII/etc/cdslib/sheets/US_8ths
DEFINE basic $CDSHOME/tools/dfII/etc/cdslib/basic
DEFINE cdsDefTechLib $CDSHOME/tools/dfII/etc/cdsDefTechLib
DEFINE NCSU_TechLib_FreePDK45 $PDK_DIR/ncsu_basekit/lib/NCSU_TechLib_FreePDK45
DEFINE NCSU_Devices_FreePDK45 $PDK_DIR/ncsu_basekit/lib/NCSU_Devices_FreePDK45
DEFINE NCSU_Analog_Part $CDK_DIR/lib/NCSU_Analog_Parts
DEFINE OSU $PDK_DIR/osu_soc/lib/freepdk45cells
4. Modify .cdsenv file
;--------------------------------------------------------------------------
; spectre environment variables
;--------------------------------------------------------------------------
spectre.envOpts modelFiles string "$PDK_DIR/osu_soc/lib/files/gpdk45nm.m"
spectre.envOpts controlMode string "batch"
5. Execute Cadence virtuoso.
$ source setup.sh
$ virtuoso &
B.2 RTL Compiler Tcl Script
#############################################
# Script for Cadence RTL Compiler synthesis
# Use with syn-rtl -f <rtl-script>
#############################################
# Set the search paths to the libraries and the HDL files
# Remember that "." means your current directory
set_attribute hdl_search_path {../functional} ;
set_attribute lib_search_path {../libdir} ;
set_attribute library [list gscl45nm.lib] ;
set_attribute information_level 6 ; # See a lot of warnings.
set myFiles [list verilog.v] ;
set basename AND2X1t1 ; # top module
set runname RTL ;
#set myPeriod_ps 10000
#set myInDelay_ps 250
#set myOutDelay_ps 250
#############################################
# below here shouldn't need to be changed
#############################################
# Analyze and Elaborate the HDL files
read_hdl ${myFiles}
elaborate ${basename}
# Apply Constraints and generate clocks
# set clock [define_clock -period ${myPeriod_ps} -name ${myClk} [clock_ports]]
# external_delay -input $myInDelay_ps -clock ${myClk} [find / -port ports_in/*]
# external_delay -output $myOutDelay_ps -clock ${myClk} [find / -port ports_out/*]
# Sets transition to default values for Synopsys SDC format,
# fall/rise 400 ps
# dc::set_clock_transition .4 $myClk
# check that the design is OK so far
check_design -unresolved
report timing -lint
# Synthesize the design to the target library
synthesize -to_mapped
# Write out the reports
report timing > ${basename}_${runname}_timing.rep
report gates > ${basename}_${runname}_cell.rep
report power > ${basename}_${runname}_power.rep
report area > ${basename}_${runname}_area.rep
# Write out the structural Verilog and sdc files
write_hdl -mapped > ../encounter/${basename}_${runname}.v
write_sdc > ../encounter/${basename}_${runname}.sdc
B.3 Encounter Script
B.3.1 Configuration file (encounter.conf)
################################################
#                                              #
# FirstEncounter Input configuration file      #
#                                              #
################################################
# Specify the name of your toplevel module
set my_toplevel AND2X1t1
set RTL RTL
################################################
# No changes required below
################################################
global env
#set OSU_FREEPDK $env(PDK_DIR)/osu_soc
global rda_Input
set rda_Input(ui_netlist) $my_toplevel$RTL.v
set rda_Input(ui_timingcon_file) $my_toplevel$RTL.sdc
set rda_Input(ui_topcell) $my_toplevel
set rda_Input(ui_netlisttype) {Verilog}
set rda_Input(ui_ilmlist) {}
set rda_Input(ui_settop) {1}
set rda_Input(ui_celllib) {}
set rda_Input(ui_iolib) {}
set rda_Input(ui_areaiolib) {}
set rda_Input(ui_blklib) {}
set rda_Input(ui_kboxlib) ""
set rda_Input(ui_timelib) "../libdir/gscl45nm.tlf"
set rda_Input(ui_smodDef) {}
set rda_Input(ui_smodData) {}
set rda_Input(ui_dpath) {}
set rda_Input(ui_tech_file) {}
set rda_Input(ui_io_file) ""
set rda_Input(ui_buf_footprint) {buf}
set rda_Input(ui_delay_footprint) {buf}
set rda_Input(ui_inv_footprint) {inv}
set rda_Input(ui_leffile) "../libdir/gscl45nm.lef"
set rda_Input(ui_core_cntl) {aspect}
set rda_Input(ui_aspect_ratio) {1.0}
set rda_Input(ui_core_util) {0.7}
set rda_Input(ui_core_height) {}
set rda_Input(ui_core_width) {}
set rda_Input(ui_core_to_left) {}
set rda_Input(ui_core_to_right) {}
set rda_Input(ui_core_to_top) {}
set rda_Input(ui_core_to_bottom) {}
set rda_Input(ui_max_io_height) {0}
set rda_Input(ui_row_height) {}
set rda_Input(ui_isHorTrackHalfPitch) {0}
set rda_Input(ui_isVerTrackHalfPitch) {1}
set rda_Input(ui_ioOri) {R0}
set rda_Input(ui_isOrigCenter) {0}
set rda_Input(ui_exc_net) {}
set rda_Input(ui_delay_limit) {1000}
set rda_Input(ui_net_delay) {1000.0ps}
set rda_Input(ui_net_load) {0.5pf}
set rda_Input(ui_in_tran_delay) {120.0ps}
set rda_Input(ui_captbl_file) {}
set rda_Input(ui_cap_scale) {1.0}
set rda_Input(ui_xcap_scale) {1.0}
set rda_Input(ui_res_scale) {1.0}
set rda_Input(ui_shr_scale) {1.0}
set rda_Input(ui_time_unit) {none}
set rda_Input(ui_cap_unit) {}
set rda_Input(ui_sigstormlib) {}
set rda_Input(ui_cdb_file) {}
set rda_Input(ui_echo_file) {}
set rda_Input(ui_qxtech_file) {}
set rda_Input(ui_qxlib_file) {}
set rda_Input(ui_qxconf_file) {}
set rda_Input(ui_pwrnet) {vdd}
set rda_Input(ui_gndnet) {gnd}
set rda_Input(flip_first) {1}
set rda_Input(double_back) {1}
set rda_Input(assign_buffer) {0}
set rda_Input(ui_pg_connections) [list \
    {PIN:vdd:} \
    {PIN:gnd:} \
]
set rda_Input(PIN:vdd:) {vdd}
set rda_Input(PIN:gnd:) {gnd}
B.3.2 tcl file (encounter.tcl)
###################################
# Run the design through Encounter
###################################
# Setup design and create floorplan
loadConfig ./encounter.conf
#commitConfig
# Create Initial Floorplan
floorplan -r 1.0 0.85 0 0 0 0
# Create Power structures
#addRing -spacing_bottom 5 -width_left 5 -width_bottom 5 -width_top 5 -spacing_top 5 -layer_bottom metal5 -width_right 5 -around core -center 1 -layer_top metal5 -spacing_right 5 -spacing_left 5 -layer_right metal6 -layer_left metal6 -nets { gnd vdd }
# Place standard cells
amoebaPlace
# Route power nets
sroute -noBlockPins -noPadRings
# Perform trial route and get initial timing results
trialroute
#buildTimingGraph
#setCteReport
#reportTA -nworst 10 -net > timing.rep.1.placed
# Run in-place optimization
# to fix setup problems
#setIPOMode -mediumEffort -fixDRC -addPortAsNeeded
#initECO ./ipo1.txt




#reportTA -nworst 10 -net > timing.rep.2.ipo1
# Run Clock Tree Synthesis
#createClockTreeSpec -output encounter.cts -bufFootprint buf -invFootprint inv
#specifyClockTree -clkfile encounter.cts
#ckSynthesis -rguide cts.rguide -report report.ctsrpt -macromodel report.ctsmdl -fix_added_buffers
# Output Results of CTS
#trialRoute -highEffort -guide cts.rguide
#extractRC
#reportClockTree -postRoute -localSkew -report skew.post_troute_local.ctsrpt
#reportClockTree -postRoute -report report.post_troute.ctsrpt
# Run Post-CTS Timing analysis




#reportTA -nworst 10 -net > timing.rep.3.cts
# Perform post-CTS IPO
#setIPOMode -highEffort -fixDrc -addPortAsNeeded -incrTrialRoute -restruct -topomap
#initECO ipo2.txt
#setExtractRCMode -default -assumeMetFill
#extractRC
#fixSetupViolation -guide cts.rguide
# Fix all remaining violations
#setExtractRCMode -detail -assumeMetFill
#extractRC
#if {[isDRVClean -maxTran -maxCap -maxFanout] != 1} {




# Run Post IPO-2 timing analysis
#buildTimingGraph
#setCteReport
#reportTA -nworst 10 -net > timing.rep.4.ipo2
# Add filler cells
addFiller -cell FILL -prefix FILL -fillBoundary
# Connect all new cells to VDD/GND
globalNetConnect vdd -type tiehi
globalNetConnect vdd -type pgpin -pin vdd -override
globalNetConnect gnd -type tielo
globalNetConnect gnd -type pgpin -pin gnd -override
# Run global Routing
globalDetailRoute
# Get final timing results




#reportTA -nworst 10 -net > timing.rep.5.final
# Output GDSII
streamOut final.gds2 -mapFile ../libdir/gds2_encounter.map -stripes 1 -units 1000 -mode ALL
saveNetlist -excludeLeafCell final.v
# Output DSPF RC Data
rcout -spf final.dspf
# Run DRC and Connection checks
verifyGeometry
verifyConnectivity -type all
win
puts "**************************************"
puts "* Encounter script finished          *"
puts "*                                    *"
puts "* Results:                           *"
puts "* --------                           *"
puts "* Layout: final.gds2                 *"
puts "* Netlist: final.v                   *"
puts "* Timing: timing.rep.5.final         *"
puts "*                                    *"





Agrawal, D. and Aggarwal, C. C. (2001). On the design and quantification of privacy preserving
data mining algorithms. In Symposium on Principles of Database Systems.
Alioto, M., Poli, M., and Rocchi, S. (2010). A general power model of differential power analysis
attacks to static logic circuits. IEEE Trans. Very Large Scale Integr. Syst., 18(5):711–724.
Alpaydin, E. (2010). Introduction to Machine Learning. The MIT Press, 2nd edition.
Halak, B., Murphy, J., and Yakovlev, A. (2013). Power balanced circuits for leakage-power-attacks
resilient design. Cryptology ePrint Archive, Report 2013/048. http://eprint.iacr.org/.
Fei, Y., Ding, A. A., Lao, J., and Zhang, L. (2014). A statistics-based fundamental model for
side-channel attack analysis. Cryptology ePrint Archive, Report 2014/152. http://eprint.
iacr.org/.
FIPS (2001). Federal information processing standards publication (FIPS 197). Advanced
Encryption Standard (AES).
Group, T. C. (2013). Trusted Platform Module Specification and Architecture. Online at
http://www.trustedcomputinggroup.org/resources/tpm_main_specification/.
Halderman, J. A., Schoen, S. D., Heninger, N., Clarkson, W., Paul, W., Calandrino, J. A., Feldman,
A. J., Appelbaum, J., and Felten, E. W. (2008). Lest we remember: Cold boot attacks on encryption
keys. In USENIX Security Symposium.
Ishai, Y., Sahai, A., and Wagner, D. (2003). Private circuits: Securing hardware against probing
attacks. In Advances in Cryptology - CRYPTO 2003, 23rd Annual International Cryptology
Conference, Santa Barbara, California, USA, August 17-21, 2003, Proceedings, volume 2729
of Lecture Notes in Computer Science, pages 463–481. Springer.
Kocher, P., Jaffe, J., and Jun, B. (1999). Differential power analysis. In Proceedings of the
19th Annual International Cryptology Conference on Advances in Cryptology, CRYPTO ’99,
pages 388–397. Springer-Verlag.
Leuven, K. (2011). Ls-svmlab v1.8. Online at http://www.esat.kuleuven.be/sista/
lssvmlab/.
Macé, F., Standaert, F.-X., and Quisquater, J.-J. (2007). Information theoretic evaluation of
side-channel resistant logic styles. In Paillier, P. and Verbauwhede, I., editors, CHES, volume
4727 of Lecture Notes in Computer Science, pages 427–442. Springer.
Mangard, S. (2005). Masked dual-rail pre-charge logic: DPA-resistance without routing
constraints. In Cryptographic Hardware and Embedded Systems - CHES 2005, 7th International
Workshop, pages 172–186. Springer.
Mangard, S., Oswald, E., and Popp, T. (2007). Power Analysis Attacks: Revealing the Secrets of
Smart Cards (Advances in Information Security). Springer-Verlag New York, Inc., Secaucus,
NJ, USA.
Mangard, S., Pramstaller, N., and Oswald, E. (2005). Successfully attacking masked AES
hardware implementations. In Cryptographic Hardware and Embedded Systems - CHES 2005,
7th International Workshop, Edinburgh, UK, August 29 - September 1, 2005, Proceedings,
volume 3659 of Lecture Notes in Computer Science, pages 157–171. Springer.
Mathai, A. and Provost, S. (1992). Quadratic Forms in Random Variables. Statistics: A Series
of Textbooks and Monographs. Taylor & Francis.
Messerges, T. S. (2000). Securing the AES finalists against power analysis attacks. In Fast
Software Encryption, 7th International Workshop, FSE 2000, New York, NY, USA, April
10-12, 2000, Proceedings, Lecture Notes in Computer Science, pages 150–164. Springer.
Messerges, T. S., Dabbish, E. A., and Sloan, R. H. (2002). Examining smart-card security
under the threat of power analysis attacks. IEEE Transactions on Computers,
51:541–552.
Micali, S. and Reyzin, L. (2003). Physically observable cryptography. In TCC 2004, LNCS,
pages 278–296. Springer.
Micheli, G. D. (1994). Synthesis and Optimization of Digital Circuits. McGraw-Hill Higher
Education, 1st edition.
Mohyuddin, N., Pakbaznia, E., and Pedram, M. (2008). Probabilistic error propagation in logic
circuits using the Boolean difference calculus. In Computer Design, 2008. ICCD 2008. IEEE
International Conference on, pages 7–13.
Monteiro, J. C., Devadas, S., Ghosh, A., Keutzer, K., and White, J. K. (1997). Estimation of
average switching activity in combinational logic circuits using symbolic simulation. IEEE
Trans. on CAD of Integrated Circuits and Systems, 16(1):121–127.
Najm, F. N. (1994). A survey of power estimation techniques in VLSI circuits. IEEE Trans.
Very Large Scale Integr. Syst., 2(4):446–455.
NCSU (2011). Version 1.4 of FreePDK45 kit. Online at
http://www.eda.ncsu.edu/wiki/FreePDK45:Contents.
Nelson, R. D. (1995). Probability, stochastic processes, and queueing theory - the mathematics
of computer performance modeling. Springer.
OSU (2008). OSU FreePDK45 kit. Online at http://vlsiarch.ecen.okstate.edu/flow/#.
Park, J. and Tyagi, A. (2012). t-private logic synthesis on FPGAs. In HOST, pages 63–68. IEEE.
Park, J. and Tyagi, A. (2014a). t-private systems: Unified private memories and computation.
In Security, Privacy, and Applied Cryptography Engineering - 4th International Conference,
SPACE 2014, Pune, India, October 18-22, 2014. Proceedings, pages 285–302.
Park, J. and Tyagi, A. (2014b). Towards making private circuits practical: DPA resistant
private circuits. In IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2014,
Tampa, FL, USA, July 9-11, 2014, pages 528–533.
Park, J. and Tyagi, A. (2016). Security metrics for power based SCA resistant hardware
implementation. In 29th International Conference on VLSI Design and 15th International
Conference on Embedded Systems, VLSID 2016, Kolkata, India, January 4-8, 2016, pages
541–546.
Prouff, E. and Rivain, M. (2007). A generic method for secure Sbox implementation.
Information Security Applications, pages 227–244.
Quisquater, J.-J. and Samyde, D. (2001). Electromagnetic analysis (EMA): Measures and
counter-measures for smart cards. In Proceedings of the International Conference on Research
in Smart Cards: Smart Card Programming and Security, E-SMART ’01, pages 200–210,
London, UK. Springer-Verlag.
Reed, I. (1954). A class of multiple-error-correcting codes and the decoding scheme. Information
Theory, IRE Professional Group on, 4(4):38–49.
S. Kullback and R. A. Leibler (1951). On Information and Sufficiency. The Annals of
Mathematical Statistics, 22(1):79–86.
Samyde, D., Skorobogatov, S., Anderson, R., and Quisquater, J.-J. (2002). On a new way to
read data from memory. In Proceedings of the First International IEEE Security in Storage
Workshop, SISW ’02, pages 65–, Washington, DC, USA. IEEE Computer Society.
Sasao, T. and Fujita, M., editors (1996). Representations of Discrete Functions. Kluwer
Academic Publishers, Norwell, MA, USA.
Satoh, A., Morioka, S., Takano, K., and Munetoh, S. (2001). A compact Rijndael hardware
architecture with S-box optimization. In Boyd, C., editor, ASIACRYPT, volume 2248 of
Lecture Notes in Computer Science, pages 239–254. Springer.
Sentovich, E., Singh, K., Lavagno, L., Moon, C., Murgai, R., Saldanha, A., Savoj, H., Stephan,
P., Brayton, R. K., and Sangiovanni-Vincentelli, A. L. (1992). SIS: A system for sequential
circuit synthesis. Technical report, EECS Department, University of California, Berkeley.
Standaert, F.-X., Malkin, T. G., and Yung, M. (2009). A unified framework for the analysis of
side-channel key recovery attacks. In Proceedings of the 28th Annual International
Conference on Advances in Cryptology: The Theory and Applications of Cryptographic Techniques,
EUROCRYPT ’09, pages 443–461, Berlin, Heidelberg. Springer-Verlag.
Tiri, K., Akmal, M., and Verbauwhede, I. (2002). A dynamic and differential CMOS logic with
signal independent power consumption to withstand differential power analysis on smart
cards. In Solid-State Circuits Conference, 2002. ESSCIRC 2002. Proceedings of the 28th
European, pages 403–406.
Tiri, K. and Verbauwhede, I. (2004). A logic level design methodology for a secure DPA resistant
ASIC or FPGA implementation. In Proceedings of the Conference on Design, Automation and
Test in Europe - Volume 1, DATE ’04, pages 10246–, Washington, DC, USA. IEEE Computer
Society.
Tiri, K. and Verbauwhede, I. (2005). A VLSI Design Flow for Secure Side-Channel Attack
Resistant ICs. In Proceedings of the Conference on Design, Automation and Test in Europe
- Volume 3, DATE ’05, pages 58–63, Washington, DC, USA. IEEE Computer Society.
Tyagi, A. (2005). Energy-privacy trade-offs in VLSI computations. In Progress in Cryptology -
INDOCRYPT 2005, 6th International Conference on Cryptology in India, Bangalore, India,
December 10-12, 2005, Proceedings, volume 3797 of Lecture Notes in Computer Science, pages
361–374. Springer. A version titled Energy-Privacy-Time Tradeoffs in VLSI Computations
under revision for IEEE Trans. on Computers.
Valamehr, J., Chase, M., Kamara, S., Putnam, A., Shumow, D., Vaikuntanathan, V., and
Sherwood, T. (2012). Inspection resistant memory: architectural support for security from
physical examination. In Proceedings of the 39th Annual International Symposium on Computer
Architecture, ISCA ’12, pages 130–141, Washington, DC, USA. IEEE Computer Society.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer-Verlag New York,
Inc., New York, NY, USA.
Vapnik, V. N. (1998). Statistical Learning Theory. Wiley-Interscience.
Wasserman, L. (2006). All of Nonparametric Statistics (Springer Texts in Statistics). Springer-
Verlag New York, Inc., Secaucus, NJ, USA.
Weste, N. and Harris, D. (2010). CMOS VLSI Design: A Circuits and Systems Perspective.
Addison-Wesley Publishing Company, USA, 4th edition.
