On Physical Disorder Based Hardware Security Primitives by Vijayakumar, Arunkumar
University of Massachusetts Amherst 
ScholarWorks@UMass Amherst 
Doctoral Dissertations Dissertations and Theses 
November 2016 
On Physical Disorder Based Hardware Security Primitives 
Arunkumar Vijayakumar 
 Part of the VLSI and Circuits, Embedded and Hardware Systems Commons 
Recommended Citation 
Vijayakumar, Arunkumar, "On Physical Disorder Based Hardware Security Primitives" (2016). Doctoral 
Dissertations. 763. 
https://scholarworks.umass.edu/dissertations_2/763 
ON PHYSICAL DISORDER BASED HARDWARE
SECURITY PRIMITIVES
A Dissertation Presented
by
ARUNKUMAR VIJAYAKUMAR
Submitted to the Graduate School of the
University of Massachusetts Amherst in partial fulfillment
of the requirements for the degree of
DOCTOR OF PHILOSOPHY
September 2016
Electrical and Computer Engineering
© Copyright by Arunkumar Vijayakumar 2016
All Rights Reserved
Approved as to style and content by:
Sandip Kundu, Chair
Maciej Ciesielski, Member
Daniel Holcomb, Member
Hava Siegelmann, Member
Christopher V. Hollot, Department Head
Electrical and Computer Engineering
DEDICATION
To my parents
ACKNOWLEDGMENTS
First and foremost, I am grateful to my advisor Prof. Sandip Kundu for his
guidance during my stay at UMass. I am forever indebted for the engineering skills
and the general knowledge he has passed through his guidance. His dedication in
teaching and research has motivated me immensely and will always be a great source
of inspiration. I would like to thank Prof. Maciej Ciesielski, Prof. Dan Holcomb and
Prof. Hava Siegelmann for agreeing to be part of my committee. Their constructive
inputs have played a great part in shaping this research.
I thank my collaborators for their contribution on various projects. Special thanks
to Prof. Christof Paar and Prof. Dan Holcomb for their inputs in various projects. I
am greatly indebted to Dr. Charles Prado of Inmetro, Brazil. I had a very productive
period during his temporary visit to our lab. I would like to thank members of Intel
Circuit Research Labs (CRL), Sanu Mathew and Ram Krishnamurthy in particular,
for their involvement in this work. Collaboration with Intel CRL helped us to direct
this research towards practical applications. Also, I would like to thank my lab-mate
Vinay Patil for his contributions in various projects. Technical discussions with him
have been a great learning curve. I also thank Georg Becker for our discussions on
Strong PUFs.
Thanks to all my current and past lab members, who have created a great work
culture and ambiance in our lab. I will forever cherish my experiences and the friends
I made at Amherst. Without their support and insights, my Amherst experience
would not be the same. I would also like to thank my friends from undergraduate
days for their constant support.
My graduate life would not have been possible without the sacrifices of my parents.
I will always be thankful for the love and support of my parents and sister.
ABSTRACT
ON PHYSICAL DISORDER BASED HARDWARE
SECURITY PRIMITIVES
SEPTEMBER 2016
ARUNKUMAR VIJAYAKUMAR
B.E., ANNA UNIVERSITY, INDIA
M.S., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST
Directed by: Professor Sandip Kundu
As CMOS scaling pushes transistors into the nanometer regime, manufacturing
process variations affect modern IC design. Fortunately, such variations have
enabled an emerging hardware security primitive: the Physically Unclonable
Function. Physically Unclonable Functions (PUFs) are hardware primitives that
use the disorder from manufacturing variations for their core functionality. In
contrast to insecure roots-of-trust based on non-volatile key storage, PUFs
promise a favorable feature: no attacker, not even the PUF manufacturer, can
clone the disorder, and any attempt at invasive attack will upset that disorder.
Despite a decade of research, certain practical problems impede the widespread
adoption of PUFs. This dissertation addresses the important problems of (i)
post-manufacturing testing, (ii) secure design and (iii) cost efficiency of
PUFs, with the aim of making PUFs practical and of learning the hardware design
limitations of disorder-based systems.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1. INTRODUCTION
1.1 Disorder based Security Primitives
1.2 Classification and Application of PUFs
1.2.1 Weak PUFs
1.2.2 Strong PUFs
1.2.3 Other PUFs
1.2.4 Application of PUFs
1.3 Ideal Properties of PUFs
1.3.1 Security/Unclonability
1.3.2 Uniqueness
1.3.3 Reliability
1.4 Scope of this Work
1.5 Dissertation Outline
2. IMPROVING UNIQUENESS OF PUFS
2.1 Introduction
2.2 Related Works
2.2.1 Metrics for Uniqueness
2.2.2 Arbiter PUF for Analysis
2.3 Testing Strong PUFs
2.3.1 Problem Statement
2.3.2 CRPs for Uniqueness Evaluation
2.3.3 Fast Search Problem
2.4 Proposed Solution
2.4.1 Main Idea
2.4.2 Multi-Index Hashing for Rapid Search
2.4.3 Improving Test Quality
2.4.4 Uniqueness Test Procedure
2.4.5 Associated DFT
2.5 Experimental Results
2.5.1 Test Quality
2.5.2 Yield Loss
2.5.3 Impact of Systematic Faults
2.6 Testing Weak PUFs
2.7 Conclusion
3. IMPROVING MACHINE LEARNING RESISTANCE OF STRONG PUFS
3.1 Introduction
3.2 Modeling attacks on Strong PUFs
3.3 Proposed PUF Circuit
3.3.1 Main idea
3.3.2 Operation of the circuit
3.3.3 Experimental Settings
3.3.4 SVM Machine Learning Resistance
3.4 Reliability Evaluation
3.4.1 Circuit modifications for Reliability Enhancement
3.5 Analysis of PUF Properties
3.5.1 PUF Metrics
3.5.1.1 Uniqueness
3.5.1.2 Uniformity
3.5.1.3 Reliability
3.5.2 Other Metrics
3.6 Fast PUF simulation methodology
3.6.1 Gradient Boosting Machine Learning
3.6.2 Results
3.7 Conclusion
4. INVESTIGATING MACHINE LEARNING RESISTANCE OF STRONG PUFS
4.1 Introduction
4.2 Methodology
4.2.1 Evolutionary Algorithms
4.2.2 Bagging and Boosting
4.2.2.1 Bagging
4.2.2.2 Boosting
4.2.3 Logistic Regression and Support Vector Machine
4.3 Foundational Principles for Modeling-Attack Resistance
4.3.1 Abstract model: Function Composition Model
4.3.2 Effectiveness of Cascading Block Architecture
4.3.3 Characteristics of the Circuit Sources
4.3.4 Impact of Digital Non-Linearity
4.3.4.1 XOR vs Cardinality of Function
4.3.4.2 Impact of Feed-forward Loops
4.4 Discussion and Hardware Implementation
4.4.1 Experimental Observations
4.4.2 Hardware Implementation
4.4.3 Practical Evaluation of a Non-linear PUF
4.5 Exploring Alternative Structure
4.5.1 Definition - Entropy Budget
4.5.2 Log-switch Structure - Function Composition
4.5.2.1 Equally weighted challenges
4.5.2.2 Feed-forward loops
4.5.2.3 Operation
4.5.3 Machine Learning Resistance
4.5.3.1 Comparison to Cascaded Switch architecture
4.5.3.2 Entropy Allocation
4.5.4 Additive Log Structure
4.5.4.1 Entropy Allocation
4.5.5 Summary and Discussion
4.6 Conclusion
5. IMPROVING RELIABILITY OF PUFS
5.1 Introduction
5.2 Background
5.2.1 SRAM PUF and Reliability
5.2.2 Related Literature
5.2.2.1 Error-Correcting Codes and Fuzzy Extractor
5.2.2.2 Circuit and Manufacturing Technology Solutions
5.3 Proposed Up/Down counter (UDC) based Technique
5.3.1 Harnessing Statistical Bias for Improving Reliability
5.3.2 Temporal Majority Voting
5.3.3 New Voter Design
5.3.4 Circuit Design
5.3.5 Error Rate from Simulation
5.4 Analysis of UDC based Design
5.4.1 Operation of The Proposed Voter as Random Walk
5.4.2 Error rate
5.4.3 UP/DOWN Counter vs TMV
5.4.4 DFT based on Trials to Settlement
5.5 UDC technique - Results and Case Studies
5.5.1 Case Study: Redundancy to Improve Yield and Error Rate
5.5.2 Area and Performance Comparisons
5.6 Circuit Design Alternatives
5.6.1 Modeling Process Variation
5.6.2 Thermal Noise Errors
5.6.3 Simple 6T SRAM-based Weak PUF (Reference circuit)
5.6.4 Study of various Cell Designs
5.6.4.1 Simple active loads (D1)
5.6.4.2 Stacked active loads (D2)
5.6.4.3 Parallel active loads (D3,D4)
5.6.4.4 Current Mirror loads (D5)
5.7 Circuit Design Alternatives - Results and Discussion
5.7.1 Error rate Comparison
5.7.2 Flipping point based Analysis
5.7.3 Reducing ECC Circuitry Overhead
5.8 Conclusion
6. CONCLUSION
BIBLIOGRAPHY
LIST OF TABLES
2.1 Experimental results for Uniqueness test
3.1 PUF Metrics
4.1 Truth table for 2-bit Log-switch architecture example
4.2 Summary of Results on Machine Learning Resistance of Strong PUFs
5.1 Definition of Symbols
5.2 Area estimates of proposed voting scheme using Nangate Cell Library [52]
5.3 Low process variation results for various SRAM cell configurations
5.4 Area estimates of proposed circuit alternatives using Nangate Cell Library [52]
LIST OF FIGURES
1.1 Primary classification of PUFs
1.2 SRAM PUF [26]
1.3 Strong PUF authentication: after challenges 1 and 2 are applied, they are deleted from the server
1.4 Arbiter PUF [40]
2.1 Arbiter PUF used for analysis [40]
2.2 HD estimate error with 100-bit and 1000-bit response
2.3 HD estimate for $d_{100}$ when $d_{1000} \le 10\%$
2.4 Illustration of testing procedure for Strong PUF
2.5 Yield loss and process variation
2.6 Effect of fault size on Uniqueness
2.7 Illustration of uniqueness testing procedure for Weak PUF
3.1 Non-linear Voltage Transfer Characteristic under process variation
3.2 Proposed circuit. (a) Non-linear VTC block and (b) Complete circuit diagram of a 64-bit PUF.
3.3 Voltage difference of VTC blocks at each stage in a 64-bit PUF
3.4 Machine learning resistance of proposed VTC PUF and Arbiter PUF [40]
3.5 Response errors due to voltage variation
3.6 Reliability enhanced circuit: (a) Circuit modification and (b) bias circuit
3.7 Reliability of modified circuit: (a) Response errors with supply voltage variation and (b) Response errors with temperature changes
3.8 Machine learning resistance of Reliability enhanced PUF
3.9 Histogram of PUF metrics: (a) Uniqueness, (b) Uniformity and (c) Reliability
3.10 Methodology for PUF simulation: (a) Generation of VTC curves database (b) Generation of CRPs
3.11 Circuit of VTC block
3.12 Distribution of Gradient Boosting prediction rate
4.1 Overview of PUF Machine Learning Attack
4.2 PUF Machine Learning Analyses
4.3 Function Composition Representation of Strong PUFs
4.4 Representation of PUF using Non-Linear Tables
4.5 Results for various machine learning attacks on cascaded switch architecture
4.6 Prediction rate for 4 bit table Function Composition structure
4.7 Results from GA ML attack for various table sizes and training sets
4.8 Change in Modeling accuracy with Gradient Boosting on Cascaded switch architecture against Table size
4.9 Comparison of Gradient Boosting ML attack on Uniform and Normally distributed values for a table size of 16
4.10 Comparison of Gradient Boosting ML attack on Uniform and biased distribution for table size of 16
4.11 Comparison of XOR and Cardinality of function
4.12 Impact of XOR: Results from Gradient Boosting ML attack
4.13 Modeling accuracy of Feed-Forward non-linear PUF and Table size S=8 with Gradient Boosting ML
4.14 PDF of non-linear VTC PUF
4.15 ML attack comparison between Gradient Boosting and SVM [75] for non-linear VTC PUF
4.16 Log-switch Strong PUF architecture
4.17 2-bit Log-switch architecture example
4.18 Comparison of prediction distribution (Gradient boosting) for Table size S = 16: (a) Cascaded switch structure and (b) Log-switch structure
4.19 Log-switch architecture with additive delay elements
4.20 Prediction rate distribution for 64-bit Additive log structure with high entropy allocation
5.1 Typical Weak PUF based key generation setup
5.2 Traditional 6T SRAM Cell
5.3 Enrollment procedure for SRAM PUF
5.4 UP/DOWN counter based voter scheme
5.5 Modified SRAM cell for multiple evaluations
5.6 Error rate results from simulation
5.7 Markov Chain model for the voter scheme
5.8 Comparison of error rate reduction for TMV and UP/DOWN counter (UDC)
5.9 Expected number of trials to reach saturation (decision) in a 4-bit UP/DOWN counter
5.10 Testing/DFT method for identifying high error-rate cells: (a) Testing/Enrollment and (b) Operation
5.11 Histogram of number of times SRAM cell was read before saturation
5.12 Simple SRAM with cross-coupled inverters
5.13 SRAM cell modified for simulating multiple power-ups
5.14 SRAM cell with only pull-down network and active resistive loads
5.15 SRAM cell with stacked active loads
5.16 SRAM cell with parallel active loads
5.17 SRAM cell with current mirror loads
5.18 Flipping voltage comparison between Reference and D5
CHAPTER 1
INTRODUCTION
1.1 Disorder based Security Primitives
Worldwide spending on information technology already exceeds 3.5 trillion dollars
per year as our lives become increasingly dependent on electronic systems [60].
This growing reliance makes security breaches more costly in terms of financial
loss, loss of privacy and safety. Many applications, ranging from financial,
health care and social administration systems to cyber-physical systems, need to
securely authenticate and identify users or system components. Because the
security of many such systems relies primarily on the hardware they run on,
hardware roots-of-trust are often assumed to be available. For example, a typical
authentication system consists of a server and a low-cost token, such as a smart
card or RFID tag, that stores one or more digital keys. The server is typically
resource rich and physically secure. The token, by contrast, is lightweight, and
its security relies on the secret key stored in it.
Traditionally, non-volatile memory (NVM) was used to store the secret keys.
Unfortunately, various security vulnerabilities of hardware primitives based on
non-volatile key storage have been exposed. It has been demonstrated that an
adversary with physical access to such a token can mount various forms of attack
on it, including side-channel attacks such as power measurement, semi-invasive
attacks such as fault injection by over-clocking, and even invasive attacks such
as de-capsulation, de-layering and probing to access information stored in the
token [2] [21]. In his keynote speech at the 2011 CRYPTO conference, Ron Rivest
said “merely calling a bit-string secret, does not make it so; rather it
identifies it as an interesting target for an adversary” [61]. Indeed, history is
replete with public disclosures of secret keys. The attack on the secret key
stored in the Infineon TPM chip for Xbox is one such example [51].

Figure 1.1: Primary classification of PUFs
As an alternative to NVM based secret storage, silicon Physically Unclonable
Functions (PUFs) were proposed [21]. One of the main motivations for the
development of PUFs is their promise to protect secret keys. In principle, PUFs
rely on a physically disordered system. It is assumed that no one, not even the
PUF manufacturer, can clone or duplicate such disorder, and that any attempt at
invasive attack will upset that disorder. This follows from the complexity of
current integrated circuit (IC) fabrication, whose process variations give each
PUF its unclonable property. Therefore, at least in theory, every PUF is unique,
distinct from all others, and cannot be deconstructed by invasive attack.
Definition: A PUF $P$ implements a unique function $f_P(c)$ that maps an $m$-bit
input challenge $c \in \{0,1\}^m$ to an $n$-bit response $r$, where
$r \in \{0,1\}^n$. The tuple $(c, r)$ is called a challenge-response pair (CRP)
of the PUF $P$. A set of CRPs for PUF $P$ defines a CRP Table.
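To make the definition concrete, the following is a minimal Python sketch of a toy PUF model. It is not a circuit model from this dissertation: the class name `ToyPUF`, the `device_seed` parameter (a stand-in for device-specific manufacturing disorder) and the use of a seeded pseudo-random generator are all assumptions made purely for illustration.

```python
import random


class ToyPUF:
    """Toy stand-in for a PUF P: a fixed, device-specific function f_P
    mapping m-bit challenges to n-bit responses (illustrative only)."""

    def __init__(self, m, n, device_seed):
        self.m = m                # challenge width in bits
        self.n = n                # response width in bits
        self._seed = device_seed  # hypothetical proxy for manufacturing disorder

    def respond(self, challenge):
        """Return the n-bit response r = f_P(c) as an integer."""
        if not 0 <= challenge < 2 ** self.m:
            raise ValueError("challenge must be an m-bit value")
        # A deterministic per-(device, challenge) seed gives a repeatable response.
        rng = random.Random(self._seed * 2 ** self.m + challenge)
        return rng.getrandbits(self.n)


def build_crp_table(puf, challenges):
    """Collect challenge-response pairs (c, r) into a CRP table."""
    return {c: puf.respond(c) for c in challenges}


puf_a = ToyPUF(m=4, n=8, device_seed=1)
puf_b = ToyPUF(m=4, n=8, device_seed=2)
table_a = build_crp_table(puf_a, range(16))
table_b = build_crp_table(puf_b, range(16))
```

Here `table_a` and `table_b` play the role of CRP tables for two nominally identical devices. A real PUF derives its behavior from physical disorder and is noisy, which this noise-free model deliberately ignores.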
1.2 Classification and Application of PUFs
The principle of using random intrinsic physical features for authentication is
not new; biometric authentication dates back to the 18th century [24].
Unclonability figures prominently in the development of currency notes [4, 66].
Synthesizing these two principles led to the development of unclonable physical
random functions [56]. Since then, a number of researchers have explored the
construction of PUFs that exploit physical disorder. Optical PUF [56], coating
PUF [72], RF COA [14], LC-PUF [23], SRAM PUF [26], Arbiter PUF [40], butterfly
PUF [36], reconfigurable PUF [49], controlled PUF [20] and phosphor PUF [30] are
a few examples. Silicon PUFs were first proposed by Gassend et al. [21]. In the
last 12 years, a large number of silicon based PUFs and their applications have
been proposed. Despite this, certain practical issues still hinder large scale
deployment of PUFs.
1.2.1 Weak PUFs
PUFs with a limited challenge set (in many cases just one response) are
classified as Weak PUFs. Weak PUFs are primarily used to create digital secret
keys analogous to non-volatile memory (NVM) based keys, but with stronger
security properties. As mentioned before, secret keys generated from Weak PUFs
are destroyed by any invasive attack, which makes them a far more secure form of
key storage than NVM. Because Weak PUFs have few CRPs, they cannot be used
directly in authentication systems: an adversary can easily mine the entire CRP
table. Hence, applications that deploy Weak PUFs assume that the key generated by
the PUF is kept secret.
Example - SRAM PUF: The SRAM PUF is a widely studied Weak PUF [26, 39].
It is based on the SRAM cells that make up embedded memories, which typically
consist of cross-coupled inverters connected by access transistors. Figure 1.2 shows
a typical 6-transistor SRAM cell. Due to intrinsic process variations, an SRAM cell
on start-up typically settles to either logic-0 or logic-1. The settlement
state is determined by the process-variation-induced mismatch between the cell transistors.
Settlement to consistent yet random states allows values from multiple cells to be
collected for use as a key or identifier. The key generated during the enrollment phase is
registered as the ideal key. An SRAM PUF is expected to reproduce this key at
every power-up. Unfortunately, noise during start-up can affect
the settlement state of the PUF, resulting in unreliability. Specifically, cells with low
mismatch (between the cross-coupled inverters) are more sensitive
to noise than cells with greater mismatch, which produce
sufficient differential drive to overcome the noise. Along with various noise
sources, variations in ambient conditions and supply voltage and parametric changes
due to aging of the transistors also affect reliability.
Figure 1.2: SRAM PUF [26]
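The interplay between cell mismatch and start-up noise can be illustrated with a minimal simulation sketch. The Gaussian mismatch and noise models, the function names and all parameter values below are assumptions made for illustration; they are not taken from the cited SRAM PUF work.

```python
import random

def make_sram_puf(num_cells, seed):
    """Draw a fixed mismatch per cell (assumed Gaussian); its sign is the preferred state."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_cells)]

def power_up(mismatch, noise_sigma, rng):
    """One power-up: a cell settles to 1 if mismatch plus fresh noise is positive."""
    return [1 if m + rng.gauss(0.0, noise_sigma) > 0 else 0 for m in mismatch]

def flip_rate(mismatch, noise_sigma, trials, seed):
    """Average fraction of bits that differ from the noise-free (enrolled) key."""
    rng = random.Random(seed)
    enrolled = [1 if m > 0 else 0 for m in mismatch]
    flips = 0
    for _ in range(trials):
        response = power_up(mismatch, noise_sigma, rng)
        flips += sum(a != b for a, b in zip(enrolled, response))
    return flips / (trials * len(mismatch))

if __name__ == "__main__":
    cells = make_sram_puf(2000, seed=1)
    for sigma in (0.1, 0.3, 1.0):
        print(sigma, flip_rate(cells, sigma, trials=20, seed=2))
```

In this toy model only cells whose mismatch is comparable to the noise amplitude flip, so the flip rate grows with the assumed noise level.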
IntrinsicID, an industrial manufacturer of SRAM PUFs, reports that approximately
10% to 20% of SRAM bits power up to a different state [65]. This makes the
PUF response noisy. An analogous situation also occurs in biometrics with noisy data.
To address this problem in biometrics, the secure sketch was developed as a reconstruction
tool. The area overhead needed to implement such reconstruction increases with
the error rate of the silicon PUF. Hence key extraction can involve significant logic.
Depending on the required performance, it can easily range from 5,000 to 10,000 gate
equivalents (the area of a standard 2-input NAND gate) [74]. Hence reliable digital output
generation from Weak PUFs is of paramount importance.
#     Challenge   Response
1     c1          r1
2     c2          r2
3     c3          r3
4     c4          r4
...   ...         ...
Figure 1.3: Strong PUF authentication: after challenges 1 and 2 are applied, they are
deleted from the server
1.2.2 Strong PUFs
In contrast to Weak PUFs, Strong PUFs offer an extremely large number of CRPs
and ideally have a complex mapping between challenges and responses. As Strong
PUFs have an exponential number of CRPs, an adversary who has temporary access to
the PUF cannot store the complete CRP table. In addition, the authentication agent
does not repeat a challenge once the PUF is deployed. This prevents a direct
replay attack by a man-in-the-middle attacker. In this model, since the adversary
does not know in advance the challenge(s) that will be issued during verification and
cannot store the complete CRP table, the adversary cannot
build a forged PUF that supplies the correct response. An implicit assumption in
this protocol is that the adversary cannot predict the response of the PUF to a
random challenge. However, many researchers have shown that
machine learning algorithms trained on a limited set of CRPs [63] can build a prediction
model that can be simulated to produce a response. If the prediction
accuracy is sufficiently high, then a forged PUF can indeed be built from a good PUF,
which breaks an authentication system with a large probability of success. This is a
critical issue as such attacks indirectly remove the “unclonable” property of PUFs.
Hence a large number of works have focused on solving this problem, while other
researchers have used machine learning techniques to attack PUFs [35, 63, 48, 33, 75].
Figure 1.4: Arbiter PUF [40]
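The one-time use of CRPs in this authentication model can be sketched as follows. This is a minimal illustration only: `ToyPUF` is a hypothetical software stand-in (a keyed hash) for a physical Strong PUF, responses are assumed noise-free, and the class and method names are not from the cited protocols.

```python
import hashlib
import secrets

class ToyPUF:
    """Hypothetical stand-in for a physical Strong PUF (a keyed hash, assumed noise-free)."""
    def __init__(self, device_secret: bytes):
        self._secret = device_secret

    def respond(self, challenge: bytes) -> bytes:
        return hashlib.sha256(self._secret + challenge).digest()[:8]

class AuthServer:
    """Holds a CRP table collected at enrollment; every CRP is used at most once."""
    def __init__(self):
        self._crp_table = {}

    def enroll(self, puf, num_crps):
        for _ in range(num_crps):
            challenge = secrets.token_bytes(8)
            self._crp_table[challenge] = puf.respond(challenge)

    def remaining(self):
        return len(self._crp_table)

    def next_challenge(self):
        if not self._crp_table:
            raise RuntimeError("CRP table exhausted; re-enrollment needed")
        return next(iter(self._crp_table))

    def verify(self, challenge, response):
        # The CRP is deleted whether or not verification succeeds, so an
        # eavesdropped challenge-response pair cannot be replayed later.
        expected = self._crp_table.pop(challenge, None)
        return expected is not None and secrets.compare_digest(expected, response)

if __name__ == "__main__":
    device = ToyPUF(b"chip-A")
    server = AuthServer()
    server.enroll(device, 4)
    challenge = server.next_challenge()
    print(server.verify(challenge, device.respond(challenge)))  # genuine device
    print(server.verify(challenge, device.respond(challenge)))  # replayed CRP
```

A deployed scheme would additionally tolerate a few noisy response bits, which this sketch omits.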
Example - Arbiter PUF: The Arbiter PUF [40] is one of the earliest proposed Strong
PUFs; it is well studied and is used in the construction of other PUFs. The Arbiter PUF
shown in Figure 1.4 has multiple delay elements, challenge signals and an Arbiter at
the end which produces a 0/1 digital signal. When a challenge is applied, two
paths are selected depending on the challenge, and a common signal is allowed to race
through these paths. As the delay of each element is set by process variation, each
instance of the PUF has a unique challenge-response mapping. The challenge bits
also create an exponential number of path configurations leading to the Arbiter. The Arbiter at the
end resolves the response to logic-0 or logic-1 depending on which signal arrives first.
As the final delay of each signal that arrives at the Arbiter is a linear sum of individual
delays, machine learning techniques were able to model the PUF easily [63]. Hence
it is vitally important that the challenge-response mapping is complex enough to
protect against modeling attacks.
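Why a linear sum of delays is easy to learn can be illustrated with a small sketch. It assumes the additive delay model (each stage adds a straight or cross differential delay drawn from a standard normal distribution, and a crossed stage negates the accumulated difference) and swaps in a plain perceptron as the learner; the attacks reported in the literature use other algorithms, and all names and sizes here are illustrative assumptions.

```python
import random

def make_arbiter_puf(stages, seed):
    """Draw (straight, cross) differential delays for each stage from N(0, 1)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(stages)]

def response(puf, challenge):
    """Race the two signals; return 1 if the final delay difference is positive."""
    delta = 0.0
    for (straight, cross), bit in zip(puf, challenge):
        # A crossed stage swaps the racing paths, negating the accumulated difference.
        delta = delta + straight if bit == 0 else -delta + cross
    return 1 if delta > 0 else 0

def parity_features(challenge):
    """Suffix products of (1 - 2*bit); the delay difference is linear in these."""
    features, product = [1.0], 1.0
    for bit in reversed(challenge):
        product *= 1 - 2 * bit
        features.append(product)
    return features

def random_crps(puf, count, seed):
    """Collect CRPs for uniformly random challenges."""
    rng = random.Random(seed)
    crps = []
    for _ in range(count):
        challenge = [rng.randint(0, 1) for _ in puf]
        crps.append((challenge, response(puf, challenge)))
    return crps

def score(weights, challenge):
    return sum(w * x for w, x in zip(weights, parity_features(challenge)))

def train_perceptron(crps, stages, epochs=20):
    """Fit a linear model on observed CRPs with the classic perceptron rule."""
    weights = [0.0] * (stages + 1)
    for _ in range(epochs):
        for challenge, bit in crps:
            target = 1 if bit == 1 else -1
            if target * score(weights, challenge) <= 0:  # misclassified: nudge weights
                for i, x in enumerate(parity_features(challenge)):
                    weights[i] += target * x
    return weights

def accuracy(weights, crps):
    hits = sum((score(weights, c) > 0) == (r == 1) for c, r in crps)
    return hits / len(crps)

if __name__ == "__main__":
    puf = make_arbiter_puf(16, seed=1)
    model = train_perceptron(random_crps(puf, 1000, seed=2), stages=16)
    print(accuracy(model, random_crps(puf, 500, seed=3)))
```

Because the response is the sign of a linear function of the parity features, the training data is linearly separable and even this simple learner predicts unseen challenges well.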
1.2.3 Other PUFs
In order to address the hard constraints on creating modeling-resistant Strong
PUFs, other types of PUFs were proposed [20] [49]. Controlled PUFs rely on Strong
PUFs but aim at preventing man-in-the-middle attacks, in which an attacker who
can observe the CRPs can build a model if the Strong PUF is not complex enough
to resist modeling. A Controlled PUF protects the Strong PUF through an access algorithm [20].
The Public PUF (PPUF) is another PUF proposal in which the model of the Strong PUF
is made public. It operates under the constraint that no numerical model can create a
response as fast as the original Strong PUF, thereby creating a timed-authentication
protocol [49]. As these PUF variants still rely on Strong PUFs, improvements in
Strong PUFs would directly improve these variants.
1.2.4 Application of PUFs
Weak PUFs are envisioned as a natural replacement for NVM key storage [21] but
with higher security. Various other applications, such as prevention of
overproduction of ICs [1], software licensing and IP protection [67], have also been proposed
for Weak PUFs. Strong PUFs, as mentioned before, have been proposed for authentication
and identification, but key exchange protocols based on Strong PUFs have
also been proposed [56] [73]. Various security techniques and protocols based on PUFs
assume that the underlying PUF meets certain properties, which are discussed next.
1.3 Ideal Properties of PUFs
PUFs are typically created by designing circuits which harness manufacturing
variations. The number of challenge-response pairs (CRPs) for each PUF depends
on the physical implementation. Irrespective of the physical implementation, PUF
circuits are expected to exhibit high Uniqueness, Reliability and Unpredictability
(security).
1.3.1 Security/Unclonability
Unclonability is the most important requirement for PUFs: it should be
practically impossible to clone a PUF. In Weak PUFs, this implies that it should
be impossible to guess the key generated in a PUF. In Strong PUFs, the security of
the PUF depends on the complexity of modeling the challenge-response mapping. If the
mapping is easy to model, a software model can be created which can act as a rogue
PUF that is indistinguishable from the original PUF.
1.3.2 Uniqueness
Each PUF is expected to have a unique input-output mapping. Low uniqueness
among a population of PUFs means that the probability of finding similar mappings
in the population increases. This directly reduces the security of any system
deploying PUFs as the promised property - “unclonability” is affected. In addition,
a semiconductor manufacturer can suffer high yield loss in order to manufacture PUFs with
high uniqueness. This can stem from systematic bias and from the limited ability of the PUF
circuit to harness process variations. Hence, uniqueness is implicitly related to
security.
1.3.3 Reliability
As circuit characteristics vary with environmental conditions, the mapping of the
PUF could also vary. This impacts the usability of the PUF circuit. Hence PUFs
are expected to provide stable responses. Reliability is important in Weak PUFs
as unreliable keys have to be corrected with area-expensive error correction,
thereby increasing the cost. Unreliability in the responses of Strong PUFs means that
the authentication threshold for protocols using Strong PUFs has to
be lower than 100% accuracy. This reduces the accuracy requirement for modeling attacks
and thereby reduces the security.
Along with these PUF-related metrics, other standard metrics such as area, power
and ease of integration into a standard design flow should also be satisfied by the PUF
circuit.
1.4 Scope of this Work
Building secure protocols is often interdisciplinary, with contributions from
mathematicians and cryptographers to hardware designers working with the necessary abstractions
to build successful systems. This abstraction allows hardware researchers and designers
to optimize the implementation of security primitives. Our work is of this nature,
improving PUFs, an emerging hardware primitive, through various hardware design
techniques. Even within hardware design, modern IC design relies heavily on design
abstraction, from architecture specification and RTL design to circuit design,
manufacturing and testing. This necessitates improvements in various aspects of hardware
design to make a new concept practical. In addition, PUFs are based on physical disorder
and are in principle different from conventional number-theory based cryptographic
primitives [62]. Since they rely on process variation for their core functionality,
hardware design and implementation are of paramount importance for secure design.
In this spirit, we tackle a variety of problems, from design and testing to security
analysis of PUFs. In particular, we aim at improving security, reliability and uniqueness
of Weak and Strong PUFs. We believe this research is of merit, as it aims at solving
core problems and also at understanding the limitations of PUFs.
1.5 Dissertation Outline
This dissertation is organized as follows. In Chapter 2, we discuss the
problem of testing PUFs for uniqueness and present an efficient testing technique to
improve the uniqueness of a population of PUFs. In Chapter 3, we present and analyze
a new PUF design which has orders of magnitude higher machine learning resistance than
traditional Strong PUFs. We further investigate the problem of machine learning
resistance in Chapter 4 and present related experimental results. In Chapter 5, we
present techniques to improve the reliability of PUFs. We conclude the work in
Chapter 6 with insights into future work.
CHAPTER 2
IMPROVING UNIQUENESS OF PUFS
2.1 Introduction
PUFs are expected to have high uniqueness to satisfy the promise of “unclonability”.
Such a property cannot be ensured by design alone. Since PUFs rely on
manufacturing process variations, there is no guarantee that two PUFs will never
have identical properties. Therefore, testing becomes necessary to screen out PUFs
that violate the above property. Despite decades of research on PUFs, there has been
scant attention to the problem of testing PUFs for uniqueness.
A manufacturing plant can produce millions of PUFs. Along with the random process
variations which give PUFs their core functionality, systematic variations can reduce
the uniqueness of PUFs. Lithographic lens de-focus is one well-known such issue
[37]. In addition, as manufacturing processes are typically tuned to minimize variations
for the benefit of other regular circuits, the amount of process variation reduces with
each iteration of a particular fabrication process. Hence some of the PUF circuits
produced are expected to have lower uniqueness.
A straightforward method to ensure high uniqueness in a population of PUF chips
is to collect responses from all PUFs and perform an offline comparison.
This process requires multi-socketing, where the chips are tested multiple times through
the testing production line. This increases the testing cost and thereby the cost of
each chip. Previous works on evaluating PUF characteristics have primarily focused
on design-time evaluation [46], which can be used for multi-socketing directly. They
are not suitable for high-volume testing of manufactured PUFs, as the decision to
accept or reject a part, or to sort the parts into separate bins, should be made at the
tester. Hence new testing techniques and design-for-testability methods tailored for
high-volume testing are needed. These techniques should have minimal testing time
to minimize cost. To address this problem, we propose methodologies for testing
uniqueness and design-for-test (DFT) mechanisms to support these methods. Our
primary contributions are:
• We propose a technique to estimate the uniqueness of the PUF circuit under test.
We use multi-index hashing along with a hybrid scheme for a time- and yield-efficient
test.
• We propose a method to test Weak PUFs using pre-existing hardware
cryptographic blocks.
• We present an experimental analysis of the impact of process variation and
manufacturing faults on uniqueness to understand the sources of low uniqueness.
2.2 Related Works
Systematic methods and metrics to evaluate PUFs were proposed by Maiti et al.
[46]. Majzoobi et al. have proposed techniques to evaluate PUFs and have shown
that the security of various PUFs such as linear PUFs and feed-forward PUFs may not
be adequate [48]. These metrics and evaluation techniques are suitable for analysis
of PUFs during the design phase or the pre-high-volume manufacturing phase. Due to the
computation and data intensity involved, the techniques are suited for benchmarking
and offline evaluation. As tester time in high-volume manufacturing directly
contributes to cost, the above metrics and techniques cannot be practically employed to
detect manufacturing faults. Hence methods and DFT techniques tailored
for hardware testers are required and are the objective of our work. In a related
context, Built-in-self-test (BIST) based techniques to test the Fuzzy Extractor of Weak PUFs
have also been proposed [13]. Hussain et al. have proposed an online built-in-self-test
scheme to evaluate the unpredictability and reliability of PUFs [28] but do not address
the problem of testing the uniqueness of PUFs.
Figure 2.1: Arbiter PUF used for analysis [40]
2.2.1 Metrics for Uniqueness
Various metrics to determine the quality of a PUF have been proposed previously
[46]. As testing a PUF involves assessing those qualities, similar metrics can be used.
A Hamming distance (HD) based metric can be used to evaluate uniqueness. Inter-class
HD (dinter) is a quality metric that averages the normalized Hamming
distance between the responses of every pair of chips [46] and is given by the equation below:
$$d_{\mathrm{inter}} = \frac{2}{m(m-1)} \sum_{p=1}^{m-1} \sum_{q=p+1}^{m} \frac{\mathrm{HammingDistance}(R_p, R_q)}{k} \qquad (2.1)$$
where m and k are the number of PUF instances and number of challenge bits used,
respectively. Rp and Rq represent responses from a pair of PUF instances.
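Equation (2.1) can be evaluated directly over a small set of responses. The sketch below is a minimal illustration; the function names are hypothetical, and responses are assumed to be equal-length bit strings whose length plays the role of k.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions at which two equal-length responses differ."""
    return sum(x != y for x, y in zip(a, b))

def inter_class_hd(responses):
    """Average normalized Hamming distance over all pairs of PUF instances (Eq. 2.1)."""
    m = len(responses)        # number of PUF instances
    k = len(responses[0])     # bits per response
    total = sum(hamming_distance(rp, rq) / k for rp, rq in combinations(responses, 2))
    return 2.0 * total / (m * (m - 1))

if __name__ == "__main__":
    # Pairwise distances are 2, 4 and 2 out of 4 bits, so the average is 2/3.
    print(inter_class_hd(["0000", "0011", "1111"]))
```

An ideal population would give a value close to 0.5, meaning that on average half of the response bits differ between any two chips.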
2.2.2 Arbiter PUF for Analysis
The test methods we propose are agnostic to the specifics of the Strong PUF design.
However, we base our analysis on the Arbiter PUF shown in Figure 2.1, which was discussed
in Chapter 1. Without loss of generality, we focus our discussion on a 64-input,
single-output PUF. The proposed techniques are easily adapted to other classes of
PUFs such as the light-weight PUF [47] and the XOR Arbiter PUF [68], which use the Arbiter
PUF as the core building block.
2.3 Testing Strong PUFs
In this section we discuss the problem of uniqueness testing, with emphasis on
Strong PUFs.
2.3.1 Problem Statement
A single manufacturing facility can manufacture millions of PUFs; they are re-
quired to be tested for uniqueness. For illustration of the problem, let us assume that
Npassed is the population size of the chips that have been tested for uniqueness and
passed by the tester. Let us also assume that n-bit responses were collected from each
chip to evaluate the uniqueness. Let Ri be the n-bit response of the current chip, i,
under test. The problem at hand is (i) how to compare the response Ri of the current
chip with responses from the previously passed Npassed chips in real-time and (ii) how
to make a pass/reject decision for uniqueness based on the responses. In a reasonable
solution, the tester has to search for any other previous response which is within the
Hamming radius, r, from the current response, where r is the similarity threshold for
rejection. If there already exists a passed chip within similarity threshold, the current
chip has to be rejected. This is a search problem in Hamming space.
2.3.2 CRPs for Uniqueness Evaluation
The search time of the above problem depends on the length of the responses
collected from the chips. This topic merits a discussion.
Arbiter PUFs have an exponential number of input combinations, which makes
exhaustive challenge application for any evaluation impractical. For practical evaluation
of metrics using simulation, 10,000-13,000 CRPs have been suggested as a statistically
significant subset [34]. However, evaluation using CRPs on the order of 10,000s is not
practical for manufacturing testing. Increased test application time, storage requirements
at the tester end, Built-in-self-test (BIST) area overhead, and searching through
large volumes of data in real time are some of the practical constraints. To understand
the required number of CRPs for evaluation, we present some experimental
results below.
First, we simulate 1000 pairs of chips and evaluate their Hamming distance for
variable response lengths. Let n be the length of the response of the PUF circuits (n-bit
response). Assuming a 64-input Arbiter PUF with a 1-bit response, n 64-bit challenges
have to be applied to obtain an n-bit response. As 10,000 CRPs are statistically
significant, we take the Hamming distance computed with N = 10,000 as the reference.
Next, we evaluate the error in the Hamming distance estimated using N = 1000 and
N = 100 CRPs. Random challenges were used to generate the results. As shown in
Figure 2.2, the mean error for N = 1000 is as low as 2.47%, whereas for N = 100 it is
around 8.1%. Hence, if just 100 bits are used for Hamming distance (HD) evaluation,
the above error can translate into yield loss or faulty chips being passed (a faulty
chip is defined as a chip with low uniqueness in a chip population). Due to the smaller
estimation error, N = 1000 bits can be chosen as an empirical response length for practical
tester evaluation. This can result in significant search time, as explained in subsequent
subsections.
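As a rough plausibility check of these error figures (an added back-of-the-envelope estimate, not part of the simulation above), suppose the bits of two responses differ independently with probability p, so that the N-bit HD estimate is a scaled binomial variable. Assuming p = 0.5, a normal approximation, and that the reported error is the mean absolute error relative to the true HD, one obtains:

```latex
\sigma_N = \sqrt{\frac{p(1-p)}{N}}, \qquad
\frac{\mathbb{E}\,\lvert \hat{d}_N - p \rvert}{p} \approx \sqrt{\frac{2}{\pi}}\,\frac{\sigma_N}{p}
% N = 100:  \sigma_N = 0.05,    relative error \approx 0.798 \times 0.05 / 0.5   \approx 8.0\%
% N = 1000: \sigma_N = 0.0158,  relative error \approx 0.798 \times 0.0158 / 0.5 \approx 2.5\%
```

Under these assumptions the estimates are of the same order as the simulated 8.1% and 2.47%, which suggests that the error is dominated by sampling noise and shrinks roughly as the inverse square root of N.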
2.3.3 Fast Search Problem
Consider an example where each test chip is characterized by a 1000-bit response.
Every time the tester collects these 1000 bits, it has to compare them against the
existing database and check for any other 1000-bit response previously collected (from
other chips which have been passed). As low-cost testing ideally requires single socketing,
where each chip is tested just once, the decision is a one-time process. Also, the
test application time should be minimized to minimize production cost. For example,
let us assume there are Npassed chips which have been passed and whose responses are
stored in the database. One trivial solution would be to compare the current 1000 bits
with all the Npassed responses, which would require O(Npassed) run-time. This run-time
grows with the number of chips tested and can become prohibitively large as the number of
manufactured chips is in the millions.
Figure 2.2: HD estimate error with 100-bit and 1000-bit response (panels (a) and (b))
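The trivial linear scan described above can be sketched as follows. Responses are represented as Python integers used as bit vectors; the function names and the tiny 4-bit example are illustrative assumptions.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two responses stored as integers."""
    return bin(a ^ b).count("1")

def is_unique_linear_scan(response: int, passed: list, radius: int) -> bool:
    """Compare against every previously passed response: O(Npassed) per chip."""
    return all(hamming(response, previous) > radius for previous in passed)

if __name__ == "__main__":
    passed = [0b0000, 0b1111]
    print(is_unique_linear_scan(0b0001, passed, radius=1))  # too close to 0b0000
    print(is_unique_linear_scan(0b0011, passed, radius=1))  # distance 2 from both
```

Each new chip costs one pass over the whole database, so the total work grows quadratically with the number of chips tested.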
Hash table/dictionary based storage and retrieval can also be employed, but is
efficient only for exact pattern matching. For example, for a 1000-bit response,
all the responses which are within 10% HD of it differ from it in at most
100 bits. Probing every such pattern would require on the order of $\binom{1000}{100}$
look-ups, which is a prohibitively large number
($\approx 10^{139}$). A similar issue exists even when we use only a 100-bit response directly with
a hash table ($\approx 10^{13}$ look-ups for 10% HD). Our solution aims at reducing the run-time
of this testing problem.
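The look-up counts quoted above can be checked with exact integer arithmetic. The helper names below are hypothetical; the second helper also counts the patterns at smaller distances, which the quoted figures leave out.

```python
from math import comb

def lookups_exact_radius(n_bits: int, r: int) -> int:
    """Number of n_bits-wide patterns at Hamming distance exactly r from a query."""
    return comb(n_bits, r)

def lookups_within_radius(n_bits: int, r: int) -> int:
    """Number of patterns at Hamming distance at most r from a query."""
    return sum(comb(n_bits, i) for i in range(r + 1))

if __name__ == "__main__":
    print(f"{lookups_exact_radius(1000, 100):.3e}")  # order of 1e139
    print(f"{lookups_exact_radius(100, 10):.3e}")    # order of 1e13
    print(f"{lookups_within_radius(100, 10):.3e}")
```

Either way of counting leads to the same conclusion: direct probing of a hash table is infeasible at these radii.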
2.4 Proposed Solution
In this section, we present our proposed solution for uniqueness testing. All
experimental results presented in this work were generated using an Arbiter PUF with
a linear delay model [63]. The Arbiter PUF consists of switch components connected
serially. Each switch component contributes one of two differential delays: the
differential delay of the straight connection or the differential delay of the cross connection.
The differential delay values of the straight and cross connections are drawn
independently from a normal distribution with a mean of 0 and a standard deviation of 1 [63].
2.4.1 Main Idea
Our method has two main objectives: (i) to determine the uniqueness of a chip
while testing it and make a pass/fail decision by comparing the chip's response to an
existing population of chip responses, where this decision should be fast to minimize test
time; and (ii) to achieve acceptable yield loss and false accepts (the number of faulty
chips that are passed).
For the first objective, we use multi-index hashing [54] to reduce search time. This
search is performed on fewer bits than were collected (100 bits in our case).
However, using just 100 bits can cause yield loss/false accepts, as explained in
Section 2.3.2. So, we exploit the correlation between the HD estimates obtained using
N = 100 bits and N = 1000 bits. We use this correlation data to guide the search
radius.
2.4.2 Multi-Index Hashing for Rapid Search
Multi-index hashing (MIH) is a fast search technique for finding nearest neighbors
in Hamming space [54]. The discussion of MIH in this subsection is based on the
previous work by Norouzi et al. [54]. MIH primarily relies on dividing the binary
code into multiple disjoint sub-strings and creating multiple hash tables to speed up
the search. MIH search has sub-linear run-time for binary codes which have a uniform
distribution.
The working principle of MIH search is described below with an example from
our PUF uniqueness testing problem. For a 100-bit chip response, we would like
to find all neighbors which are within Hamming distance 10 of the 100-bit response
(a threshold of 10%). In MIH search, the 100-bit response is divided into multiple
disjoint sub-strings. Let us assume that each 100-bit response is divided into five
20-bit sub-strings. If two 100-bit responses differ by at most 10 bits, at least one
pair of corresponding sub-strings differs by at most 10/5 = 2 bits. This can be
generalized as follows: if two binary strings of length N differ by at most r bits, and if we divide
them into b sub-strings, then at least one pair of corresponding sub-strings differs by at most
$\lfloor r/b \rfloor$ bits.
The proof of this lemma follows from the pigeonhole principle [54]. Now, 5 hash tables
are created, each keyed by one of the five sub-strings. Each query is
likewise divided into sub-strings, and the 5 hash tables are searched for neighbors within a
sub-string radius of 2. So, instead of $\binom{100}{10} \approx 1.7 \times 10^{13}$ lookups, the number of lookups
reduces to $5 \times \binom{20}{2} = 950$. Along with the look-ups, each matched sub-string
has to be verified to determine whether it belongs to a true r-distant neighbor. This
divide-and-conquer approach results in tremendous speedup, with the ability to search millions of
records of 128-bit codes within a second for a search radius of 30 bits. Also, the
algorithm returns exact results without any approximation, which results in low yield
loss from the test technique. Further details on theoretical analysis and performance
estimates can be found in related work [54].
Figure 2.3: HD estimate for d100 when d1000 ≤ 10%
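The sub-string search described above can be sketched in a few dozen lines. This is a simplified illustration of the idea and not the optimized algorithm of [54]; the class and function names are hypothetical, and codes are stored as Python integers.

```python
from collections import defaultdict
from itertools import combinations

def hamming(a, b):
    """Hamming distance between two codes stored as integers."""
    return bin(a ^ b).count("1")

def neighbors_within(value, n_bits, radius):
    """Yield every n_bits-wide value at Hamming distance <= radius from value."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = value
            for pos in positions:
                flipped ^= 1 << pos
            yield flipped

class MultiIndexHash:
    """One hash table per disjoint sub-string of the code."""
    def __init__(self, n_bits, num_tables):
        assert n_bits % num_tables == 0, "sketch assumes equal-length sub-strings"
        self.num_tables = num_tables
        self.sub_bits = n_bits // num_tables
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.codes = []

    def _substrings(self, code):
        mask = (1 << self.sub_bits) - 1
        return [(code >> (i * self.sub_bits)) & mask for i in range(self.num_tables)]

    def add(self, code):
        index = len(self.codes)
        self.codes.append(code)
        for table, sub in zip(self.tables, self._substrings(code)):
            table[sub].append(index)

    def query(self, code, radius):
        """Return indices of all stored codes within `radius` bits of `code`."""
        sub_radius = radius // self.num_tables   # pigeonhole bound floor(r/b)
        candidates = set()
        for table, sub in zip(self.tables, self._substrings(code)):
            for probe in neighbors_within(sub, self.sub_bits, sub_radius):
                candidates.update(table.get(probe, ()))
        # A sub-string match is only a candidate; verify on the full code.
        return sorted(i for i in candidates if hamming(code, self.codes[i]) <= radius)

if __name__ == "__main__":
    index = MultiIndexHash(n_bits=100, num_tables=5)
    index.add(0)
    print(index.query(0b1111111111, radius=10))
```

Note that this sketch probes sub-string distances 0, 1 and 2, which gives 5 x 211 = 1055 probes per query for the example above, slightly more than the 950 obtained by counting only patterns at distance exactly 2.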
2.4.3 Improving Test Quality
Even though MIH search ensures exact search in Hamming space, as we are using
only 100 bits for evaluating the PUF's uniqueness, there could be substantial yield
loss and false accepts. Instead, we perform an r-radius search using MIH, where r
is larger than the required threshold. Once candidates are shortlisted, the complete
1000 bits are used to filter out false matches. The rationale for choosing a search radius
larger than the required threshold is explained next.
Let us consider an example in which 1000 pairs of Arbiter PUF circuits are simulated
under process variation. The goal is to estimate an acceptable search radius
for creating the hybrid approach. We estimate two Hamming distances, one with 100
random challenges and another with 1000 random challenges. Let d100 and d1000
be the corresponding HDs. In Figure 2.3, we plot the distribution of d100
for cases when d1000 ≤ 10%. Even though many of the samples of d100 are greater than
10%, the distribution is still bounded by a maximum estimate of 20%. This primarily
arises from the fact that there is a limited number of inputs in the Arbiter PUF (64 in our
case) and from the quantization at the Arbiter in the circuit.
Hence, a search radius of 20 bits can be used instead of 10 bits as a guide in the
fast search to shortlist the candidates. These candidates are further evaluated using
the complete 1000-bit response, as there may exist false positives (cases where the
HD is more than 10%). This evaluation reduces the yield loss and the number of faulty parts to
acceptable levels. This empirical guidance creates a hybrid approach where test
time is minimized without impacting test quality. Using just 100 bits for searching
ensures a fast search, and using the complete 1000 bits of information ensures the quality
of the test on the data shortlisted by the fast search.
To generalize the solution, the first step in testing involves searching through the
database using MIH with a search radius larger than the required HD threshold.
This search is performed on a reduced response size (100 bits in our example). This
step yields a list of possible candidates for further evaluation. Next, the true HD
threshold is used to compare the complete response (1000 bits in our example) to
determine whether a similar chip has already been passed. The pass/reject decision is
made based on this step.
2.4.4 Uniqueness Test Procedure
The complete procedure for testing uniqueness is illustrated in Figure 2.4.
• Step 1 : The required number of patterns is applied (through the tester or BIST). We
use 1000 patterns in our experiments.
• Step 2 : The results are collected and scanned out to the tester.
• Step 3 : The tester uses a substring of the response bits to perform the MIH search.
The substring length is 100 in our example.
• Step 4 : The shortlisted candidates are analyzed for an accurate estimate of HD.
The complete responses of the chips are used.
• Step 5 : If there exists a passed chip whose response is within the decision
threshold of the current response, the current chip is rejected. Otherwise, the chip is
accepted and its response is added to the database.
Figure 2.4: Illustration of testing procedure for Strong PUF
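Steps 3 to 5 of the procedure can be sketched as a small tester-side routine. This is a minimal illustration under assumptions: the 100-bit substring is taken to be the low-order 100 bits of the response, it is split into five 20-bit pieces, and the class name, constants and the inclusive comparison against the thresholds are illustrative choices rather than the implementation used in the experiments.

```python
from collections import defaultdict
from itertools import combinations

SHORT_BITS = 100        # substring used for the fast search (Step 3)
TABLES = 5              # number of MIH hash tables
SUB_BITS = SHORT_BITS // TABLES
SEARCH_RADIUS = 20      # widened radius on the 100-bit substring
REJECT_RADIUS = 100     # 10% of the complete 1000-bit response

def hamming(a, b):
    return bin(a ^ b).count("1")

def probes(value, n_bits, radius):
    """Yield every n_bits-wide value within `radius` bit flips of value."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            probe = value
            for pos in positions:
                probe ^= 1 << pos
            yield probe

class UniquenessTester:
    def __init__(self):
        self.tables = [defaultdict(list) for _ in range(TABLES)]
        self.passed = []    # complete responses of accepted chips

    @staticmethod
    def _short(response):
        return response & ((1 << SHORT_BITS) - 1)

    @staticmethod
    def _pieces(short):
        mask = (1 << SUB_BITS) - 1
        return [(short >> (i * SUB_BITS)) & mask for i in range(TABLES)]

    def test_chip(self, response):
        """Return True (accept) or False (reject) for a complete response."""
        short = self._short(response)
        pieces = self._pieces(short)
        # Step 3: MIH search on the substring with the widened radius.
        shortlist = set()
        for table, piece in zip(self.tables, pieces):
            for probe in probes(piece, SUB_BITS, SEARCH_RADIUS // TABLES):
                shortlist.update(table.get(probe, ()))
        shortlist = {i for i in shortlist
                     if hamming(short, self._short(self.passed[i])) <= SEARCH_RADIUS}
        # Step 4: accurate HD on the complete responses of shortlisted chips.
        if any(hamming(response, self.passed[i]) <= REJECT_RADIUS for i in shortlist):
            return False    # Step 5: a similar chip was already passed
        index = len(self.passed)
        self.passed.append(response)
        for table, piece in zip(self.tables, pieces):
            table[piece].append(index)
        return True
```

Only accepted chips are added to the hash tables, so each new chip is compared against the passed population and never against rejected parts.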
2.4.5 Associated DFT
In order to create responses from the circuit under test, a common set of random patterns
must be applied across all the chips. These common random patterns can be generated
at the tester and applied through a scan chain. Since only 1000 patterns are
required for evaluation, the test application time is low. For example, to apply 1000
random patterns through a single scan chain, only around 65,000 cycles are required.
This corresponds to less than 1 ms for a test clock of 100 MHz. Alternatively, multiple
scan chains can be employed depending on the time/cost/pin-count trade-offs.
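The cycle count can be reproduced with a short calculation. It assumes 64 shift cycles to load each 64-bit challenge plus one capture cycle per pattern, with the response shifted out while the next pattern is loaded; this breakdown is an assumption consistent with the quoted figure, not a detail stated above.

```latex
N_{\text{cycles}} \approx 1000 \times (64 + 1) = 65{,}000, \qquad
t_{\text{test}} = \frac{65{,}000}{100 \times 10^{6}\ \text{Hz}} = 0.65\ \text{ms}
```

With s parallel scan chains the shift portion would shrink roughly by a factor of s, at the cost of additional test pins.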
2.5 Experimental Results
2.5.1 Test Quality
In order to test the effectiveness of the proposed method, we performed the following
experiment. We simulated a low-yield process (a process resulting in low
uniqueness) by controlling the standard deviation of the normal distribution used to generate
the differential delay values of the Arbiter PUF. For the experiments, we assume
10% Hamming distance as the uniqueness threshold. We control the manufacturing
process in the simulation to yield an average HD of 10%. This results in a case
where, on average, half of the chips are rejected due to low uniqueness, thus
simulating a low-yield process. We implemented the MIH search in Python and simulated
the problem for 100,000 chips. The statistics of the test result are tabulated in
Table 2.1. The test technique has low yield loss due to the hybrid scheme proposed.
It also has a very low false-acceptance ratio, i.e., very few chips with low uniqueness were
passed incorrectly. The empirical run-time for our 100,000 chip simulation was 0.55
seconds. As the implementation was written in a scripting language, this is a very
pessimistic estimate. As the run-time efficiency of the MIH search has been analyzed
in previous work [54], we do not focus on a time-efficient implementation (using faster
languages like C/C++). Previous work has also shown that MIH search
scales well even for data sets of a billion values, with search times below 0.15 second.
Table 2.1: Experimental results for Uniqueness test
Case                              % of chips
Bad and rejected                  47.55
Good and accepted                 52.44
Good but rejected (yield loss)    0.00
Bad but accepted (faulty chip)    0.01
2.5.2 Yield Loss
In this subsection, we show the relation between low process variation and yield
loss during uniqueness testing. The results were generated for 1000 chips with 1000
random patterns. The standard deviation of the differential delay of the PUF is
varied to simulate the process variation.
Metrics like inter-class Hamming distance are typically used to determine the
uniqueness of a population. Nevertheless, in practice yield loss is an important metric
in high-volume manufacturing. Hence, to study this, the following experiment
was performed. Consider an example where a population of Arbiter PUF chips is
produced and the chips are tested sequentially using the proposed method. Let the
criterion for rejecting a chip be an HD of less than 10% to a previously passed chip. We simulated this
experiment and calculated the yield loss for different amounts of process variation.
The result is plotted in Figure 2.5. As expected, the yield loss reduces with
increasing process variation. Similar experimental analysis can be performed
to test the efficiency of the testing scheme and to understand other sources of yield
loss.
Figure 2.5: Yield loss and process variation
2.5.3 Impact of Systematic Faults
Similar to low process variation, systematic faults can also cause low uniqueness.
Consider the Arbiter PUF shown in Figure 2.1. Let Di,a and Di,b be the differential
delay of element i when its challenge bit is 0 and 1, respectively. Let us assume
that there exists a systematic error in the manufacturing system which creates large-delay
faults. A large-delay fault is said to occur when the delays Di,a and Di,b are far larger
than the typical delay of the other elements of the Arbiter PUF. The impact of such a delay on the
uniqueness of a set of chips is shown in Figure 2.6. We plot the change in inter-class
HD with fault size. The results were generated for 1000 chips with 1000 random
patterns. As shown in Figure 2.6, the inter-class Hamming distance of the population
decreases with increasing fault size. This directly implies an increase in yield loss
with the fault size.
Figure 2.6: Effect of fault size on Uniqueness
2.6 Testing Weak PUFs
The usage models for Strong and Weak PUFs are different. Weak PUFs are primarily
used for key generation while Strong PUFs are used for authentication. Consequently,
low uniqueness in Strong and Weak PUFs leads to different concerns.
Unlike Strong PUFs, if two Weak PUFs result in similar but not identical keys,
it is not a concern. Since the keys are used along with cryptographic algorithms, the
resulting output will not be similarity preserving. For a 128/256-bit key, even in a
low process variation manufacturing process, the probability of the exact same key
in two chips is fairly low. Hence, two Weak PUF chips can be considered distinct
unless their keys are identical. In contrast to Strong PUFs, the search problem in Weak PUFs
has different constraints. Weak PUFs have the requirement that the key should
not be brought out during testing, as the key can be compromised in any section
of the supply chain. Considering these requirements and constraints, we propose a
uniqueness testing method for Weak PUFs below.
Figure 2.7: Illustration of uniqueness testing procedure for Weak PUF
The primary aim of the method is to check whether the manufactured Weak PUF
keys are unique while preserving the secrecy of the keys. We use the existing hardware
structures to achieve this. As Weak PUFs are primarily used with cryptographic
blocks, we can use the cryptographic blocks for testing. The proposed scheme is
shown in Figure 2.7. During testing, the tester sends a number to the chip under
test. The number is encrypted/hashed with the cryptographic block already present
in the chip using the secret key. This response is sent back to the tester and the tester
checks whether the same response exists in the database. If the same response exists,
the chip is rejected; otherwise it is accepted and the new response is added to the database.
The search problem is exact string matching rather than searching in the Hamming
space as is the case for Strong PUFs. This search can be done in O(1) expected time
using a hash table/dictionary. As pre-existing encryption hardware blocks are used for testing,
the area overhead for any DFT is negligible.
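The tester-side flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the on-chip cryptographic block is stood in for by HMAC-SHA256, the names chip_response and UniquenessTester are hypothetical, and the same test number is assumed to be sent to every chip so that identical keys always yield identical responses.

```python
import hashlib
import hmac

def chip_response(secret_key: bytes, test_number: bytes) -> bytes:
    """Stand-in for the on-chip crypto block: a keyed hash of the tester's number."""
    return hmac.new(secret_key, test_number, hashlib.sha256).digest()

class UniquenessTester:
    """Tester-side database of responses, backed by a hash set."""

    def __init__(self, test_number: bytes):
        # Assumption: every chip receives the same number, so equal keys
        # always produce equal responses.
        self.test_number = test_number
        self.seen_responses = set()

    def accept(self, respond) -> bool:
        """Query one chip; reject it if its response is already on record."""
        response = respond(self.test_number)
        if response in self.seen_responses:   # expected O(1) lookup
            return False
        self.seen_responses.add(response)
        return True

# Illustrative keys only; a real Weak PUF key never leaves the chip.
key_a = bytes(16)
key_b = bytes([1] * 16)

tester = UniquenessTester(test_number=b"uniqueness-test")
results = [
    tester.accept(lambda n: chip_response(key_a, n)),
    tester.accept(lambda n: chip_response(key_b, n)),
    tester.accept(lambda n: chip_response(key_a, n)),  # same key as the first chip
]
print(results)  # [True, True, False]
```

Only the keyed response crosses the chip boundary in this sketch, which mirrors the requirement that the key itself is never brought out during testing.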
2.7 Conclusion
PUFs are promising as a hardware root-of-trust. High volume manufacturing (HVM)
of PUFs requires test methods to evaluate their uniqueness. Current solutions are for
offline analysis. They are not suited for HVM, which requires real-time comparison
against all previously tested PUFs. In this work, we have proposed a scalable test
solution for uniqueness testing of Strong and Weak PUFs in a high-volume manufacturing
setting and demonstrated its practicality.
CHAPTER 3
IMPROVING MACHINE LEARNING RESISTANCE OF
STRONG PUFS
3.1 Introduction
Strong PUFs are a subclass of PUFs which have an exponential number of challenge-response
pairs and are aimed at authentication applications. It has been shown that many
Strong PUF designs are vulnerable to machine learning (ML) attacks, where a model
can be built to predict the PUF response to any input after training with the
observations. These attacks have necessitated the design of ML attack-resistant Strong PUFs
and machine learning attack analyses to ensure sufficiently secure designs. In this
chapter, we propose an ML attack-resistant PUF design based on a circuit block that
implements a non-linear voltage transfer function. The proposed circuit is simple and
exhibits high uniqueness and randomness. Further improvements are proposed to
enhance PUF reliability. The simulation results indicate a significant improvement
in ML attack resistance in comparison to traditional PUFs. Along with the new
design, we also propose a fast simulation methodology and analyses based on the Gradient
Boosting algorithm. This methodology facilitates analyzing the security of the PUF
against potent machine learning attacks.
3.2 Modeling attacks on Strong PUFs
Despite the inherent advantages of PUFs, machine learning based modeling
attacks have exposed the vulnerability of Strong PUF circuits [63]. A machine learning
model trained with a certain number of responses from a PUF circuit can predict
future PUF responses with a high degree of success. Arbiter PUFs were initially shown
to be vulnerable to ML attacks [41]. Digital modifications were proposed to increase
the machine learning resistance, but machine learning techniques such as Support
Vector Machines (SVM), Logistic Regression and Evolution Strategies have been used
to mount attacks with increasing success [63]. Kalyanaraman et al. have proposed a
machine learning resistant PUF based on the non-linear operation of the leakage current of
MOSFETs [33]. The circuit they proposed relies on the difference between two arrays of
transistors operating in the sub-threshold region to generate responses. The exponential
dependence of leakage current on supply voltage and temperature is well known. Hence
these circuits have reliability issues with variations in temperature or supply
voltage. Kumar et al. have presented a circuit that relies on non-linear current mirrors
to generate a machine learning resistant PUF [35]. The current sources used in the
simulation were assumed to be ideal, whereas in a practical circuit they can
experience voltage and temperature variations. So the impact of using ideal current
sources for simulation on the reliability metric is not clear. This motivates the need
for further investigation into the design of modeling attack tolerant PUF circuits that are
also robust with respect to variations in environmental conditions.
Due to the threat of machine learning attacks, the strength of a Strong PUF is
defined by the complexity of modeling the mapping from input challenge to output response.
But there are still no Strong PUFs that are complex enough to be ML-resilient and remain so
over a range of environmental conditions such as voltage and temperature. In this
work we propose a new PUF to solve this problem. In addition to the novel design, we
also propose an efficient analysis methodology using a fast software model based on SPICE
simulations. Taking advantage of the software model, we perform further exploration
into the machine learning resistance of the proposed PUF and present novel insights.
Our primary contributions in this work are:
• We propose and perform in-depth analysis of a promising non-linear PUF - the
Voltage Transfer Characteristics (VTC) PUF.
• We propose an efficient analysis methodology for such PUFs by building a fast
software model based on HSPICE simulations.
• Using the fast software model we perform machine learning analysis for a pop-
ulation of this PUF using potent Gradient Boosting meta-ensemble algorithms.
3.3 Proposed PUF Circuit
In this section, we describe the proposed circuit and discuss its machine learning
resistance against the SVM algorithm.
3.3.1 Main idea
Traditional delay-based PUFs rely on the delays of two paths to create the
challenge-response set [40]. The total delay in each path is the sum of the delays of its
delay elements. The delay of one element does not directly affect the delay of any other
element in the path. This linear delay model of the PUF circuit makes it vulnerable to machine
learning attacks. In our case, we aim at creating a PUF by cascading circuit blocks
which have a non-linear Voltage Transfer Characteristic (VTC). The basic idea is
that, as the input and output of each block are voltage signals and as each block has a
non-linear VTC, cascading them creates a complex input-output mapping.
For example, a non-linear VTC for the basic block is shown in Figure 3.1. The
multiple plot-lines in the figure represent the VTC under different process variation
corners. The VTC shown in Figure 3.1 can be realized by the simple 3-transistor circuit
shown in Figure 3.2. For our discussion let us assume that the supply rails are at
Vdd and 0V. In the circuit, the transistors M1 and M2 act similar to an inverter.
The transistor M3 acts as a feedback transistor whose gate is connected to the node
out. The PMOS M3 ensures that the VTC curve does not saturate to 0V when the input
voltage nears the supply voltage Vdd. If the VTC curve saturated to 0V, the VTC would
be similar to that of an inverter and cascading multiple blocks would saturate the final
output to either 0V or Vdd. For example, as the input voltage tends to Vdd, the output
decreases towards 0V due to the inverter transistors M1 and M2, but the
current through PMOS M3 increases as the output voltage tends towards 0V, thereby
increasing the output voltage. This VTC is similar to the VTC of a pseudo-NMOS circuit,
but using the feedback transistor M3 along with the inverter gives better control of the
slope of the curve.
Figure 3.1: Non-linear Voltage Transfer Characteristic under process variation
3.3.2 Operation of the circuit
The VTC of the circuit is sensitive to process variations occurring in the
transistors. For example, the variation of the VTC curve under different process variation
instances is shown in Figure 3.1. Both the slope and shape of the curve vary with
process variation. When such blocks are cascaded, the final output becomes highly sensitive
to the process variation in each block.
Figure 3.2: Proposed circuit. (a) Non-linear VTC block and (b) Complete circuit
diagram of a 64-bit PUF
The complete circuit of the proposed PUF is shown in Figure 3.2. The circuit
presented has a 64-bit input challenge and a single-bit output. Each block consists of
the three-transistor circuit shown in Figure 3.2. The outputs of a pair of such blocks
are connected to a 2-input switch. For example, for stage i, the outputs of the pair
of blocks are xi and yi. Depending on the challenge input Ci, the outputs xi and
yi are connected to the inputs of the blocks in stage i+1. The switches are created by a
simple transmission-gate-based circuit. Thus, by cascading these blocks, a PUF circuit
with input challenge bits of any length can be created. The input node for the blocks
in the first stage is connected to Vdd/2. Such an input signal can be easily created with a
voltage divider circuit. Variation in creating the initial input signal does not affect
the normal operation of the PUF circuit. The differential output of the blocks in the
last stage is measured to create a single-bit output signal. A voltage sense amplifier
is used to determine the final output. For example, if the differential output at the final
stage is positive the output of the sense amplifier resolves to logic-1, and vice versa.
Figure 3.3: Voltage difference of VTC blocks at each stage in a 64-bit PUF
Since expressing the output in a closed-form equation is hard, we present a graphical
illustration of the PUF characteristic. Consider Figure 3.3, in which the differential
output xi-yi is plotted for each stage i of a 64-stage PUF circuit (the plot-lines are
displayed as continuous lines for better readability). The differential output
evolves in a complex manner from stage to stage. The two different plot-lines
represent two different process variation instances for the same challenge input. The
dissimilarity between the two plots represents the dependence of the PUF circuit on
process variation. As a result, the sense amplifier creates a response of logic-0 for
one process instance and logic-1 for the other. The complex evolution of this differential
signal increases the machine learning resistance of the circuit and its sensitivity to
process variation.
Figure 3.4: Machine learning resistance of proposed VTC PUF and Arbiter PUF [40]
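The stage-by-stage evolution described above can be mimicked with a toy model. The sketch below is only illustrative: toy_vtc is a hypothetical analytic stand-in for the SPICE-characterized VTC (an inverter-like curve that levels off above 0 V), process variation is reduced to a random horizontal shift per block, and all parameter values are assumptions rather than values from this work.

```python
import math
import random

VDD = 1.0

def toy_vtc(v_in, shift, floor=0.15, width=0.2):
    """Hypothetical inverter-like curve that levels off at floor*VDD, not 0 V."""
    return VDD * (floor + (1.0 - floor) / (1.0 + math.exp((v_in - 0.5 - shift) / width)))

def make_instance(n_stages, rng, sigma=0.05):
    """One random (top, bottom) curve shift per stage, standing in for process variation."""
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_stages)]

def evaluate(instance, challenge):
    """Propagate Vdd/2 through the chain; return (response bit, x_i - y_i per stage)."""
    x = y = VDD / 2
    diffs = []
    for (shift_top, shift_bottom), c in zip(instance, challenge):
        out_top, out_bottom = toy_vtc(x, shift_top), toy_vtc(y, shift_bottom)
        # Challenge bit selects a straight or crossed connection to the next stage.
        x, y = (out_bottom, out_top) if c else (out_top, out_bottom)
        diffs.append(x - y)
    # The sign of the last differential mimics the sense amplifier decision.
    return (1 if diffs[-1] > 0 else 0), diffs

rng = random.Random(1)
challenge = [rng.randint(0, 1) for _ in range(64)]
chip_a = make_instance(64, random.Random(10))
chip_b = make_instance(64, random.Random(11))
bit_a, diffs_a = evaluate(chip_a, challenge)
bit_b, diffs_b = evaluate(chip_b, challenge)
print(bit_a, bit_b, diffs_a[-1], diffs_b[-1])
```

Plotting diffs_a and diffs_b against the stage index gives a picture in the spirit of Figure 3.3, although the toy curve makes no claim to reproduce the actual circuit behavior.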
3.3.3 Experimental Settings
In this section we describe the core experimental settings used in this work. The
circuits were simulated using a 45 nm predictive technology model [77]. The process
variation is modeled as threshold voltage variation with a normal distribution
consistent with ITRS [11]. The circuits were operated at a nominal supply voltage of 1 V
and a temperature of 25 C.
3.3.4 SVM Machine Learning Resistance
Several machine learning techniques such as logistic regression, evolutionary
techniques and support vector machines (SVM) have been used to attack PUFs [63].
We first use SVM, both for comparison with prior work and because of its favorable
properties in modeling non-linear problems. Support vector machines are a non-probabilistic
classification technique for binary classification problems; the classifier is linear in the
feature space induced by the chosen kernel. The SVMlight machine learning
tool was used in our experiments [31]. SVM tools rely on choosing an appropriate
kernel for the problem. In our case we have chosen the radial basis function
(RBF) kernel as it is better suited to modeling non-linear problems [33][35].
In order to model the PUF circuit for SVM machine learning, the parity vectors
have to be derived [41]. As the switch selection architecture of our circuit is
similar to that of the traditional Arbiter PUF, the parity vector derivation remains the same
as for the Arbiter PUF. The mapping of sample space to vector space has been derived in
detail in previous publications and is omitted here for the sake of brevity [41]. The
machine learning resistance of the proposed circuit is shown in Figure 3.4. The reduction
in prediction error with the number of training samples is shown. The training set
(challenges and collected responses) was chosen randomly. To estimate the
prediction error, a set of 50,000 randomly chosen challenges was used. For comparison, the
prediction error of a 64-bit Arbiter PUF is also plotted [40]. From the figure it is
evident that the proposed PUF is orders of magnitude more resistant to modeling
attacks than the delay-based Arbiter PUF. Even with a training set of 100,000 samples
the prediction error is as high as 20.8%. From the above results and discussion, the
PUF circuit displays an excellent improvement in machine learning resistance, which was
the primary design motivation. We further explore the ML resistance in Section 3.6
and in Chapter 4. Other metrics to assess uniqueness and reliability are presented
next.
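For readers unfamiliar with the parity vectors mentioned above, the following sketch shows one common form of the transformation used for the additive Arbiter PUF delay model. Sign and indexing conventions differ between publications, so this is not necessarily the exact form derived in [41]; the function names and the example weights are hypothetical.

```python
def parity_features(challenge):
    """Map challenge bits c_1..c_n to n+1 features; feature i is the product
    of (1 - 2*c_j) over j >= i, and the last feature is a constant 1."""
    features = []
    product = 1
    for bit in reversed(challenge):
        product *= 1 - 2 * bit
        features.append(product)
    features.reverse()
    features.append(1)
    return features

def linear_model_response(weights, challenge):
    """Additive delay model: the response is the sign of a weighted feature sum."""
    total = sum(w * f for w, f in zip(weights, parity_features(challenge)))
    return 1 if total > 0 else 0

print(parity_features([0, 0, 0]))  # [1, 1, 1, 1]
print(parity_features([1, 0, 0]))  # [-1, 1, 1, 1]
print(parity_features([0, 0, 1]))  # [-1, -1, -1, 1]
```

In this feature space the Arbiter PUF response is a linear threshold function, which is why linear classifiers model it well, whereas the cascaded non-linear blocks of the proposed circuit do not admit such a linear description.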
3.4 Reliability Evaluation
In this section we evaluate the reliability of the proposed circuit and present a
reliability enhancement technique to compensate for the response errors due to supply
voltage variations.
Figure 3.5: Response errors due to voltage variation
PUF circuits are expected to provide a stable response to the challenges over a
range of temperature and supply voltage variation. If the circuit is sensitive to
temperature or voltage noise it can result in authentication failure. The circuit proposed
above relies on cascaded non-linear VTC blocks. As the shape of the VTC of a
circuit is sensitive to supply voltage, any supply variation can impact the circuit's
operation. Also, due to the non-linearity described above, the voltage variation can
reduce the reliability of the circuit considerably. For example, the percentage of
errors with voltage variation is shown in Figure 3.5. To generate the data, a random
set of 100 challenges was used and their responses were collected over a
supply voltage range of +/- 10% around nominal. The responses were then compared to the
reference responses, which were characterized at a supply of 1 V and at 25 C. As shown in
Figure 3.5, the reliability of the circuit under supply voltage variation is low. The error
is as high as 33% for a supply variation of 10% of the nominal supply voltage. A simple
circuit change to correct for this is discussed below.
Figure 3.6: Reliability enhanced circuit: (a) Circuit modification and (b) bias circuit
3.4.1 Circuit modifications for Reliability Enhancement
Consider the circuit shown in Figure 3.6. This circuit is similar to the original
circuit except for the extra footer transistor M4. The gate of the NMOS M4 is connected
to a bias signal which is a linear function of Vdd. The bias signal can be easily generated
by the simple resistive divider shown in the same figure. In the original circuit, whenever
there is a drop in supply voltage, the output voltage also reduces (in comparison to the
output under nominal Vdd). In order to compensate for the drop at the output,
the footer transistor is added. As the bias signal reduces with Vdd, it increases the
resistance of transistor M4, thereby stabilizing the output of the block through negative
feedback. The bias generation circuit is sized to reduce the impact of process variation.
The bias generation circuit can be common to all the blocks or can be distributed
among the blocks. A centralized bias generator would be less effective at
correcting high-frequency supply noise, but would make the impact of process
variation easier to control. It would also result in lower area and power. We name the
modified circuit the reliability enhanced/compensated circuit.
The percentage error with temperature and voltage variation for an instance of
the reliability enhanced circuit is shown in Figure 3.7. The reliability has significantly
improved, with a maximum error rate of only 4% even for voltage variation as high
as 10% and an industrial temperature range of 0 to 85 C. Even though we have not
compensated for temperature variation explicitly, the circuit inherently exhibits tolerance.
In comparison, the Arbiter PUF has been demonstrated with an error of 4.8% over half the
temperature range of our simulation [40]. More analysis of reliability with the intra-class
Hamming distance metric is presented in the next section. Thus simple feedback changes
in the circuit can be used to enhance the reliability. In Figure 3.8, the machine learning
resistance of the reliability-enhanced circuit is shown along with that of the unreliable
circuit and the Arbiter PUF circuit. The machine learning resistance of the
reliability-enhanced circuit is reduced only negligibly and remains significantly better than
that of the Arbiter PUF.
Figure 3.7: Reliability of modified circuit: (a) Response errors with supply voltage
variation and (b) Response errors with temperature changes
Figure 3.8: Machine learning resistance of Reliability enhanced PUF
3.5 Analysis of PUF Properties
3.5.1 PUF Metrics
In previous sections, the machine learning resistance and reliability were analyzed.
In this section we present results of other PUF metrics for the proposed circuit. The
reliability-enhanced circuit is used for all the results presented from here on. Uniqueness,
uniformity and reliability metrics are plotted in Figure 3.9 and the average values are
tabulated in Table 3.1.
3.5.1.1 Uniqueness
A PUF design should create a different response in each chip instance. This property
is known as uniqueness and we use the inter-class Hamming Distance (HD) as a metric
to assess it. 100 different PUF instances with 10,000 randomly chosen
challenges were used to evaluate the uniqueness.
3.5.1.2 Uniformity
Uniformity measures the ratio of zeros (or ones) to the total number of bits measured for
multiple challenges. Bias towards one or zero reduces the randomness and makes the
output predictable and easier to attack. Ideally, the numbers of zeros and ones should be
equal, giving an ideal uniformity of 0.5. We evaluate the uniformity with 10,000 different
challenges over 100 different PUF process instances. The histogram and results are
presented in Figure 3.9 and Table 3.1 respectively.
3.5.1.3 Reliability
In the previous section only one instance of the PUF circuit was used to evaluate the
reliability under temperature and supply voltage variation. Here we analyze the reliability
of the compensated circuit in detail. The PUF circuit was first characterized at a
supply voltage of 1 V and a temperature of 25 C. The responses for 100 random bit
sequences were collected. 25 different operating conditions were simulated by varying
the supply voltage between 0.9 and 1.1 V and the temperature over the set 0, 25, 50, 75, and
85 C, and the responses were collected. This process was repeated for 100 different
PUF instances. The reliability is calculated using the intra-class Hamming
distance metric. Over this wide range of operating conditions, the circuit has an excellent
intra-class Hamming distance of 0.021 (ideal value is 0.0).
The distributions of the metrics discussed above and their ideal and mean values are
presented in Figure 3.9 and Table 3.1 respectively. From the distributions and the table we
can conclude that the PUF circuit exhibits high uniqueness and reliability. In our experiments
we have assumed that the sense amplifier's process variation is minimal. High bias in the
sense amplifier reduces the uniformity, but special comparator circuits based on offset
cancellation have already been used to tackle this issue [28].
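The three metrics above can be computed from collected response bits as sketched below. The helper names are hypothetical, the bit lists are tiny made-up examples rather than simulation data, and uniformity is taken here as the fraction of ones.

```python
from itertools import combinations

def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def uniformity(response_bits):
    """Fraction of ones in the responses of one PUF instance (ideal 0.5)."""
    return sum(response_bits) / len(response_bits)

def inter_class_hd(responses_per_chip):
    """Mean pairwise distance between chips for the same challenges (ideal 0.5)."""
    pairs = list(combinations(responses_per_chip, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)

def intra_class_hd(reference, remeasurements):
    """Mean distance of one chip's re-measured responses to its reference (ideal 0.0)."""
    return sum(hamming_fraction(reference, r) for r in remeasurements) / len(remeasurements)

chips = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
print(uniformity(chips[0]))                                          # 0.5
print(inter_class_hd(chips))                                         # (0.5 + 0.5 + 1.0) / 3
print(intra_class_hd(chips[0], [[0, 1, 1, 0], [0, 1, 0, 0]]))        # 0.125
```

In the reliability experiment of this section, each re-measurement would correspond to one of the 25 voltage and temperature conditions, with the nominal-condition responses as the reference.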
3.5.2 Other Metrics
In this section we present other metrics evaluated on the proposed circuit. The
machine learning resistance of the circuit was discussed in previous sections. With
100,000 CRPs for training, the prediction error is as high as 21% for the 64-bit
proposed PUF. This is comparable to a previously published PUF which uses 80 stages
and has a 30% error rate for 100,000 CRPs [35]. The proposed circuit consumes 100
micro-Watts of dynamic power operating at a frequency of 100 MHz. Similar to [35],
this circuit also consumes static current. Power gating can be employed to minimize
static current, as authentication is infrequent. Accurate area estimation based on a
silicon implementation is part of proposed future work. However, for reference, each
stage of the proposed circuit features 20 transistors in contrast to 32 transistors in
[35].
Figure 3.9: Histogram of PUF metrics: (a) Uniqueness, (b) Uniformity and (c) Reliability
Table 3.1: PUF Metrics
Metric            Ideal value    Mean value for proposed circuit
Uniformity        0.5            0.501
Inter-class HD    0.5            0.498
Intra-class HD    0.0            0.021
3.6 Fast PUF simulation methodology
Many previous works have used a single instance of a PUF to analyze the machine
learning resistance [33, 35]. Unfortunately, analyzing a single PUF does not reveal the
true machine learning resistance, as the modeling
difficulty of each PUF varies due to process variation. Hence, it is necessary to
analyze the machine learning resistance for a population of the PUF rather than a single
instance. However, simulating PUFs at the SPICE level to collect a large number of
CRPs for machine learning analysis can be computationally expensive. In
particular, analog PUFs with non-linearity have higher SPICE simulation times than simple
PUFs. Due to these factors, analyzing a population of PUFs for machine learning
is impractical unless a fast methodology is used. This warrants the development of
methodologies for fast simulation of the VTC PUF. Our methodology aims at
achieving this and is discussed next.
Our proposed methodology is shown in Figure 3.10. The aim of the methodology
is to build a software model which can be used for fast generation of a large number of
CRPs. Our modeling relies on the fact that the VTC blocks of the PUF are cascaded,
where the output of one block is the input of another block. Hence, we can create a
software program to simulate the cascading process where each block's VTC curve
varies with process variation. The fundamental step is therefore to model the variation
of the VTC curve of each block with respect to process variations. To achieve this,
we pre-characterize the VTC curves using SPICE simulation across a wide range of
process parameters and use them to create a software PUF model.
Figure 3.10: Methodology for PUF simulation: (a) Generation of VTC curves
database (b) Generation of CRPs
Consider the three-transistor VTC circuit shown in Figure 3.11. Since process
variation is modeled as threshold voltage variation, each circuit block has three threshold
voltage parameters corresponding to the transistors M1, M2 and M3. The
threshold voltage variation is typically modeled as a normal distribution with a standard
deviation of 53 mV and a mean of 0 mV [11] for 45 nm technology. We characterize
threshold voltage variation over a wide range of around +/- 3 standard deviations.
We vary the threshold voltage in steps of 5 mV over the range of -150 to 150 mV
for each transistor. Due to the 5 mV quantization, each threshold voltage parameter
can take 61 possible values between -150 mV and 150 mV. This results in 226,981
(= 61^3) possible VTC curves for the range of process variation. In our simulation
methodology, we run SPICE simulations and characterize these 226,981 VTC curves.
Figure 3.11: Circuit of VTC block
For each curve, the input-output relation of the VTC is mapped at a resolution of 1 mV
over the range of 0 mV to 1000 mV (as the supply voltage is 1000 mV). These VTC
curves for each threshold corner are pre-characterized using SPICE simulations and
stored in HDF5 [70] format with the three threshold voltages as the query values.
The power supply was set at 1 V for the SPICE simulations to generate the database.
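The size of the characterization grid follows directly from the step sizes quoted above and can be checked with a few lines of Python. The variable names are hypothetical, and the count of input points per curve assumes that both end points (0 mV and 1000 mV) are included.

```python
STEP_MV = 5

# Threshold-voltage offsets characterized per transistor: -150, -145, ..., 150 mV.
offsets_mv = list(range(-150, 151, STEP_MV))

# Three transistors (M1, M2, M3) per block, each swept independently.
n_curves = len(offsets_mv) ** 3

# Input sweep of each curve at 1 mV resolution (assumption: end points inclusive).
samples_per_curve = len(range(0, 1001))

print(len(offsets_mv))      # 61
print(n_curves)             # 226981
print(samples_per_curve)    # 1001
```

A database keyed by the three offsets therefore holds 226,981 curves, each a short one-dimensional array of output voltages.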
In order to simulate a single 64-bit VTC PUF, first we generate process variations
for the 128 blocks in each PUF (2 blocks per challenge bit). The threshold voltage
variation values are rounded to multiples of 5 mV and the corresponding VTC curves are
fetched from the VTC database. As the operation of the VTC PUF is a cascaded process, as
explained before, we use a Python script to mimic this process. The initial block of the PUF
is started with an input value of 500 mV, as in the case of the actual circuit. The
blocks are connected depending on the challenge applied and the output values of
each block are propagated through the cascaded chain. The final values at the outputs of
the last two blocks are digitized to mimic the sense amplifier. The quantization step size
of 5 mV for the threshold voltage variation can lead to inaccuracy.
Hence, we use linear interpolation to improve the accuracy. The accuracy of the
model in comparison to SPICE simulation is around 99%. The simulation time to
generate 150,000 CRPs for a PUF was around 95 min, in comparison to days for
SPICE simulation. Hence this model is suited to creating a large number of CRPs for a
population of PUFs for machine learning analyses.
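The first step of this flow, drawing threshold voltage offsets for every block and rounding them onto the 5 mV characterization grid, might look as follows. This is a sketch under stated assumptions: the function names are hypothetical, samples outside +/- 150 mV are clipped to the edge of the grid (the text does not say how such samples are handled), and the interpolation step is omitted.

```python
import random

SIGMA_MV = 53.0   # standard deviation of the threshold voltage variation
STEP_MV = 5       # grid step of the VTC database
LIMIT_MV = 150    # characterized range is -150 mV .. 150 mV

def quantize_mv(offset_mv):
    """Clip to the characterized range (an assumption) and round to the 5 mV grid."""
    clipped = max(-LIMIT_MV, min(LIMIT_MV, offset_mv))
    return int(round(clipped / STEP_MV)) * STEP_MV

def sample_puf_keys(n_bits, rng):
    """Return 2*n_bits database keys, each (dVth_M1, dVth_M2, dVth_M3) in mV."""
    return [
        tuple(quantize_mv(rng.gauss(0.0, SIGMA_MV)) for _ in range(3))
        for _ in range(2 * n_bits)
    ]

keys = sample_puf_keys(64, random.Random(0))
print(len(keys))   # 128 blocks for a 64-bit PUF
print(keys[0])     # one lookup key, e.g. for the first block of stage 1
```

Each key would then be used to fetch the pre-characterized curve for that block, after which the cascade is evaluated by repeated table lookup instead of SPICE simulation.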
3.6.1 Gradient Boosting Machine Learning
Along with SVM, we tested the machine learning resistance of the proposed PUF
against the Gradient Boosting algorithm. We present the results in this subsection. The
performance of the Gradient Boosting algorithm in comparison to other machine learning
algorithms is discussed in detail in Chapter 4. Boosting falls under the class of
ensemble meta-algorithms. In problems of classification or recognition, an ensemble
meta-algorithm combines the predictions of a set of individually trained
classifiers into a single classifier. Generally, the resultant classifier is more
accurate than any of the individual classifiers that belong to the ensemble. Bagging
(Bootstrap Aggregating) and Boosting are two popular techniques for creating
accurate ensembles [5, 17, 64]. Boosting algorithms rely on learning an ensemble of
weak classifiers. The algorithm converges towards a strong classifier in an iterative
manner by combining the weak classifiers and assigning a weight to each.
The parameters and weights are assigned by analyzing the data and the prediction rate
of the classifiers. This iterative process results in a final classifier which is
stronger than each weak classifier. Gradient Boosting and Adaptive Boosting are
popular Boosting methods. In this work we use Gradient Boosting as it
performed better than Adaptive Boosting for the proposed PUF.
As mentioned before, due to process variations each PUF instance has a different
learning difficulty. This necessitates analyzing a population of PUFs for any practical
estimate of the ML resistance of the PUF. Hence we perform this experiment
for a population of PUFs. We used the scikit-learn tools [57] for the Gradient Boosting
implementation in this work. The learning rate was fixed at 0.08 and the number of estimators
was fixed at 512, as this gave the best results empirically.
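To make the roles of the learning rate and the number of estimators concrete, the sketch below implements least-squares gradient boosting with one-split regression stumps from scratch. It is a toy written for illustration and is not the scikit-learn implementation used in this work: practical classifiers usually optimize a classification loss and use deeper trees, and the data here is a made-up one-dimensional example rather than PUF CRPs. In scikit-learn's gradient boosting estimators the two settings correspond to the learning_rate and n_estimators parameters.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regressor on 1-D inputs under squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_estimators, learning_rate):
    """Fit weak learners to the residuals, adding each with a small step size."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_estimators):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = list(range(10))
ys = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
baseline = lambda x: sum(ys) / len(ys)
model = gradient_boost(xs, ys, n_estimators=512, learning_rate=0.08)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

A small learning rate makes each weak learner contribute only a fraction of its fit, so many estimators are needed; the two settings are therefore normally tuned together.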
3.6.2 Results
In order to evaluate a population of the PUF, we used the software model discussed
previously to create 500 different PUF instances. The process variation for each PUF
instance and its blocks was generated for 45 nm technology [77]. We generated
100,000 CRPs for training and an independent set of 50,000 CRPs for testing each PUF.
The training time was 12 min. The distribution of the prediction rate for the 500 PUFs
is shown in Figure 3.12. The mean prediction rate of the population was 91.7%.
The maximum prediction rate was as high as 99.92% for the 500 PUFs. For this
set of 500 randomly generated PUFs, we were able to model 9 PUFs (1.8% of the
population) with a prediction rate higher than 98%. This is significant because the
intra-class Hamming distance for the VTC PUF is 0.021 (about 2%, Table 3.1). This
indicates that the authentication threshold employing this PUF cannot be more than 98%
without any error correction. This signifies that these PUFs in this population will be
attack prone if used in a system directly. Also, from the results, it is evident that Gradient
Boosting performs significantly better at modeling the proposed VTC PUF than SVM.
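The argument above amounts to a simple count, sketched here. The function name is hypothetical, the intra-class Hamming distance is rounded to 0.02 as in the 98% threshold used in the text, and the list of prediction rates is a placeholder that only reproduces the count of 9 out of 500, not the measured distribution of Figure 3.12.

```python
def attack_prone(prediction_rates, intra_class_hd):
    """Count models whose prediction rate exceeds the highest usable
    authentication threshold, 1 - intra-class Hamming distance."""
    threshold = 1.0 - intra_class_hd
    count = sum(1 for rate in prediction_rates if rate > threshold)
    return threshold, count, count / len(prediction_rates)

# Placeholder rates, not the measured distribution.
rates = [0.99] * 9 + [0.917] * 491
threshold, count, fraction = attack_prone(rates, intra_class_hd=0.02)
print(threshold, count, fraction)
```

A genuine chip reproduces only about 98% of its enrolled response bits across operating conditions, so any software model that predicts more than 98% of the bits is indistinguishable from the chip at that threshold.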
Figure 3.12: Distribution of Gradient Boosting prediction rate
3.7 Conclusion
Strong PUF circuits have been shown to be vulnerable to modeling attacks using
machine learning techniques. We proposed a novel PUF circuit which relies on a
non-linear voltage transfer characteristic to improve machine learning attack resistance.
To improve reliability against power supply noise, we further proposed a simple circuit
alteration that is shown to be highly effective. Simulation results indicate excellent
PUF properties including uniqueness, reliability and high tolerance against SVM
machine learning attacks. We also investigated the machine learning resistance against the
Gradient Boosting algorithm. Results indicate that Gradient Boosting was able to model
the PUF with far higher accuracy than SVM, despite the fact that the proposed PUF
is orders of magnitude more ML attack-resistant than the widely discussed Arbiter PUF.
We investigated the machine learning resistance further and this topic is presented in
Chapter 4.
CHAPTER 4
INVESTIGATING MACHINE LEARNING RESISTANCE
OF STRONG PUFS
4.1 Introduction
As discussed in Chapter 3, Strong PUFs, despite their promise, have not lived
up to expectations for their intended use in authentication. If an attacker in
possession of the CRPs can create a software model which is indistinguishable from
the original PUF, the PUF can be forged, compromising its security. Even though
Strong PUFs have a practically intractable CRP table size, such a software model,
given sufficient prediction accuracy, eliminates the need for an attacker to mine the
entire CRP set to break the PUF. Machine learning based software models have
typically been used to mount such model building attacks on Strong PUFs [63]. As shown
in Figure 4.1, an attacker with access to a subset of the CRPs of a Strong PUF typically
creates a machine learning model and predicts the future responses of the PUF (outside
the CRP set possessed). The attack model assumes that the
attacker gets hold of this subset of CRPs either through (i) a security flaw in the
supply chain, where a subset of CRPs is mined through temporary possession of the
PUF, or (ii) by observing an insecure channel through which an authentication agent
and the PUF communicate in a system where the PUF is deployed.
Seminal work by Rührmair et al. deployed machine learning techniques to model
various classes of Strong PUFs successfully [63]. Research efforts to increase the
modeling-attack resistance of Strong PUFs have been ad hoc, as was the VTC PUF proposed in
Chapter 3. Circuit designers do not receive any useful feedback from successful
ML attacks about aspects of their circuit in need of improvement. In this work, our goal
is to gain that insight. Typically, Strong PUF research has progressed in a way where
designers and attackers design and attack PUFs independently in a sequential
manner. Instead, we aim to realize insights with the aid of modeling, experiments
and design, both from the designer's and the attacker's point of view, as shown in Figure 4.2.
Figure 4.1: Overview of PUF Machine Learning Attack
In this chapter, first we present results on machine learning experiments on an ab-
stract PUF model using Support Vector Machines (SVM), Logistic Regression (LR),
Bagging, Boosting and Evolutionary techniques to establish criteria for machine learn-
ing resistant Strong PUF design. With the aid of these experiments, we show that
if certain randomness properties can be met, cascaded switch structure based Strong
PUFs can indeed be made machine learning (ML) attack resistant against known ML
attacks. Next, we investigate a new structure for Strong PUFs which shows significant
improvement in machine learning resistance for a given entropy budget in comparison
to the traditional cascaded structure PUFs.
The main contributions of this work are:
Figure 4.2: PUF Machine Learning Analyses
• We establish the foundational principles that guide ML-attack resistance in
Strong PUFs that are based on cascaded structures. We present results illus-
trating that they can be made ML-attack resistant under certain randomness
criteria of the circuit fabric.
• We systematize construction and analysis of Strong PUFs based on the principle
of function composition that aids in the analysis of entropy source of a PUF
and relate that to its ML-attack resistance.
• We demonstrate ensemble meta-algorithms based machine learning (Bagging
and Boosting) to be more effective than traditional Logistic Regression and
SVM based attacks on Strong PUFs.
• We propose a new architecture, the Log-switch architecture, which significantly
increases machine learning resistance compared to the cascaded switch architecture
for a given entropy budget.
4.2 Methodology
In this section, we briefly discuss the machine learning algorithms used in this
work. Selection of a machine learning algorithm is important, as history is replete
with cases where a PUF which was initially thought to be modeling-attack resistant,
later crumbled under a different ML algorithm. While most previous studies rely
on Logistic Regression (LR), Support Vector Machines (SVM), or Evolutionary Al-
gorithms (EA) [63] [33] [75], we also deploy a new class of ML algorithms, namely,
ensemble meta-algorithms. As our results will show, Bagging and Boosting ensemble
meta-algorithms are powerful tools for mounting modeling attacks. For completeness, we
detail the various ML parameters used in our study.
4.2.1 Evolutionary Algorithms
Evolutionary Algorithms (EA) are a set of population-based meta-heuristic opti-
mization algorithms. In this work, we make use of Genetic Algorithm (GA), a subset
of EA, to attack Strong PUFs. Other works on modeling attacks on PUF have used
Evolution Strategies (ES) for the same purpose [63]. ES represent the possible solu-
tions to a problem as vectors of real numbers. Since, in our implementation, PUFs
are modeled as tables of integers (to be detailed in next section), we take advantage
of GA which is designed to handle integer and binary string solutions. GA mimics
biological evolution similar to ES and utilizes analogous concepts like reproduction,
mutation, recombination/crossover and selection.
In this work, we use an open-source toolkit, Pyevolve [58], for the GA experiments.
We set the number of generations to 100, the crossover rate to 80% and the parent
selection rate to 80 for each generation. The mutation rate, τ, is fixed for GA at
$\sqrt{n}$, where n is the total number of parameters (table entries in our case,
as explained later) in the PUF.
4.2.2 Bagging and Boosting
Bagging (Bootstrap Aggregating) and Boosting are machine learning algorithms
that belong to ensemble meta-algorithm approaches [5, 17, 64]. Ensemble learning
represents a technique of combining the predictions of several classifiers to generate
a robust classifier.
4.2.2.1 Bagging
The Bagging algorithm extracts multiple versions of the input training set using
bootstrapping and uses these versions as the new learning sets. Several estimators are
designed independently and then, their results are averaged (for regression) or voted
on (for classification). This method tries to reduce over-fitting and variance leading
to smaller test errors and more stable predictions. Thus, it is best suited for strong
and complex models.
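As a minimal sketch of the bootstrap-and-vote procedure described above (not the Matlab tree-based setup used in this work), the following pure-Python example draws bootstrap samples, fits one simple learner per sample and combines the learners by majority vote. The threshold "stump" base learner, the function names and the toy data are assumptions made purely for illustration.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a single-feature threshold classifier (hypothetical base learner)."""
    best = None
    for j in range(len(samples[0][0])):
        for threshold in sorted({x[j] for x, _ in samples}):
            for polarity in (0, 1):
                # Predict `polarity` at or above the threshold, its complement below.
                errors = sum(
                    1 for x, y in samples
                    if (polarity if x[j] >= threshold else 1 - polarity) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, polarity)
    _, j, threshold, polarity = best
    return lambda x: polarity if x[j] >= threshold else 1 - polarity

def bagging_fit(samples, n_estimators=25, seed=0):
    """Train one estimator per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    estimators = []
    for _ in range(n_estimators):
        # Bootstrap: sample with replacement, same size as the original set.
        bootstrap = [rng.choice(samples) for _ in samples]
        estimators.append(fit_stump(bootstrap))
    return estimators

def bagging_predict(estimators, x):
    """Classification by majority vote over the independently trained estimators."""
    votes = Counter(estimator(x) for estimator in estimators)
    return votes.most_common(1)[0][0]

# Toy data (assumed): label is 1 when the first feature is at least 5.
toy_samples = [((i, 0), int(i >= 5)) for i in range(10)]
toy_model = bagging_fit(toy_samples)
```

Because each estimator sees a different resampling of the data, individual mistakes tend to be outvoted, which is the variance reduction the text refers to.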
4.2.2.2 Boosting
In contrast to Bagging, the Boosting algorithm iteratively learns several weak
classifiers and assigns weights to them based on their learning accuracy, adding them
towards a final strong classifier. Once a weighted weak classifier is added to the final
classifier, the data is re-analyzed: misclassified data points are given greater weights
while correctly classified ones lose weight. Hence, future weak classifiers focus on the
previously misclassified data. With each iteration, the final classifier gets stronger and
predicts results for the test data more accurately. Adaptive Boosting (AdaBoost)
and Gradient Boosting are two popular variants of this method.
In this work, we deployed Bagging using Matlab tools with the Tree classification
method. The number of iterations was set at 300. All implementations of
Gradient Boosting were developed and evaluated using the scikit-learn tools [57]. The
number of estimators was set at 128 and the learning rate at 0.01.
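The reweighting loop described above can be sketched in a few lines. The example below is a minimal AdaBoost-style illustration with threshold stumps as weak classifiers; it is not the scikit-learn Gradient Boosting configuration used in this work, and the helper names and the labels in {-1, +1} are assumptions chosen for brevity.

```python
import math

def best_stump(samples, weights):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(samples[0][0])):
        for t in sorted({x[j] for x, _ in samples}):
            for sign in (1, -1):
                # The stump predicts `sign` at or above t and `-sign` below it.
                err = sum(
                    w for (x, y), w in zip(samples, weights)
                    if (sign if x[j] >= t else -sign) != y
                )
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost_fit(samples, n_rounds=10):
    """samples: list of (feature_tuple, label) with labels in {-1, +1}."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []  # entries are (alpha, feature, threshold, sign)
    for _ in range(n_rounds):
        err, j, t, sign = best_stump(samples, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak classifier
        ensemble.append((alpha, j, t, sign))
        # Misclassified points gain weight; correctly classified points lose weight.
        weights = [
            w * math.exp(-alpha * y * (sign if x[j] >= t else -sign))
            for (x, y), w in zip(samples, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(a * (s if x[j] >= t else -s) for a, j, t, s in ensemble)
    return 1 if score >= 0 else -1
```

Gradient Boosting follows the same additive idea but fits each new weak learner to the gradient of a loss function rather than to reweighted samples.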
4.2.3 Logistic Regression and Support Vector Machine
Logistic Regression (LR) and Support Vector Machine (SVM) have traditionally been
used for testing the modeling-attack resistance of Strong PUFs [63] [33] [35].
Hence, we discuss only the settings used in this work and refer readers to previous
works for further details. We use the open-source Python package scikit-learn [57] for
both LR and SVM. We set the inverse of regularization strength to $10^{-5}$ for
LR. Using radial basis function (RBF) kernel machines, SVM algorithms can model
non-linearly separable functions as linearly separable in higher dimensions. We use
this approach for modeling the PUF using SVM, as it allows for greater response
classification accuracy.
4.3 Foundational Principles for Modeling-Attack Resistance
In this section, we investigate the foundational principles of modeling-attack
resistance of Strong PUFs. Strong PUF research has a history of convivial rivalry between
the “makers”, who design the PUFs, and the “breakers”, who break them via modeling.
While this is an important aspect of security research, where the strength of
any solution must be vetted openly, we approach the problem from a slightly different
angle. We ask, “what knowledge can we transfer from machine learning studies to
the PUF design?”, thus reversing the order. Specifically, we are interested in answers
to the following questions and experiments:
(A) Given that the underlying electrical bases of various Strong PUFs are all different
(some compute on the basis of delay, some on the basis of voltage and some on the basis
of ON or leakage current), can we extract any useful abstraction that gives a general
understanding of how to increase the modeling-attack resistance?
(B) Given that nearly all PUFs derived from the general Arbiter PUF structure have been
broken, is the cascading block structure itself fundamentally limiting?
(C) What are the preferred characteristics of the circuit sources to enable
modeling-attack resistance?
(D) In the framework of the abstract model, we also evaluate the benefits of digital
non-linearities such as XORs and feed-forward loops vis-à-vis native sources of entropy.
We conduct a number of experiments around these questions, which turn up some new insights.
Figure 4.3: Function Composition Representation of Strong PUFs
4.3.1 Abstract model: Function Composition Model
Many of the previously proposed Strong PUFs, such as Arbiter PUF [40], non-
linear current PUF [35] and non-linear VTC PUF [75], share a common cascade
structure that is interspersed with switches to facilitate challenge to response map-
ping. We define this structure as cascaded switch structure. The circuit blocks that
are cascaded harness process variations – a property that is central to uniqueness of
PUFs. The switches perform selection, while the blocks provide a transfer function
determined uniquely by process variation.
Cascading of blocks provides a method for composing the overall function from a
set of sub-functions or blocks. For example, consider the non-linear current and VTC
based PUFs, which in principle, provide function composition with block selection.
Figure 4.4: Representation of PUF using Non-Linear Tables
Consider Figure 4.3, which represents the above two PUFs in an abstract model.
The function $f_i(x)$ represents a process-variation-dependent voltage/current curve.
The initial value is a pre-determined current or voltage source. The functions are
composed depending on the challenges applied. For example, if the PUF has two
stages, then the function is composed such that the output is
$f_1 \circ f_3 - f_2 \circ f_4$ for the challenges {0, 0}. The final Arbiter is a
voltage or current comparator which resolves the output to logic-0 or logic-1
depending on which path has the greater value. The Arbiter PUF can also be represented
using this function composition model, where each function is the sum of the signal
arrival time and the process-variation-dependent delay of the switch.
This abstract model to represent many Strong PUFs facilitates analysis of the
sources and techniques for modeling-attack resistance.
4.3.2 Effectiveness of Cascading Block Architecture
We perform the following experiments to understand the effectiveness of cascading
blocks architecture and qualities of the function needed to improve modeling attack
resistance. For the analysis, we first make the function f(•) a discrete function, as it
is easier to analyze. For example, each function can be defined using a look-up table.
The input to the table is the address to the table entry and the value in the table
entry is the output. Consider Figure 4.4, where each table is of size 4. The table has
values from 0 . . . 3 or 00 . . . 11, if expressed as binary values. The initial address is a
fixed value (for example, address 01). This is akin to starting with a fixed voltage
or current for the initial stage in the non-linear analog PUFs. The values in the
table represent the process variation in each function of the PUF instances. These
values vary from table to table and from chip to chip representing the overall process
variations.
The values in the table are assumed to be sampled from a uniform distribution.
For example, if the values are represented in decimal, for a table size of 4, we assume
that each table entry is chosen from uniformly distributed values in {0, 1, 2, 3}. If
the decimal numbers are represented in binary, we assume that the probability of
0s and 1s are equal. We assume uniform distribution as it assigns equal probability
to each value in the set, representing PUF circuits with high uniqueness. Thus, the
discrete function in the table represents a non-linear function whose elements in the
co-domain are equi-probable.
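The table-based abstraction described above can be simulated directly. The following is a minimal sketch, not the code used for the experiments in this chapter: it assumes two look-up tables per stage, a switch convention of 0 = straight and 1 = crossed, and a final comparator that outputs logic-1 when the upper path holds the greater value (ties resolve to 0). All function names are hypothetical.

```python
import random

def make_puf(n_stages, table_size, rng):
    """One PUF instance: two uniformly populated look-up tables per stage."""
    return [
        ([rng.randrange(table_size) for _ in range(table_size)],
         [rng.randrange(table_size) for _ in range(table_size)])
        for _ in range(n_stages)
    ]

def evaluate(puf, challenge, initial=1):
    """Compose the stage tables along two paths and compare the final values."""
    top = bottom = initial  # fixed initial address, e.g. 01
    for (table_a, table_b), bit in zip(puf, challenge):
        a, b = table_a[top], table_b[bottom]
        # Assumed switch convention: 0 = straight, 1 = crossed.
        top, bottom = (a, b) if bit == 0 else (b, a)
    # Assumed arbiter convention: logic-1 when the top value is greater.
    return 1 if top > bottom else 0

def generate_crps(puf, n_crps, rng):
    """Draw random challenges and record the corresponding responses."""
    crps = []
    for _ in range(n_crps):
        challenge = [rng.randrange(2) for _ in range(len(puf))]
        crps.append((challenge, evaluate(puf, challenge)))
    return crps

rng = random.Random(1)
puf = make_puf(n_stages=64, table_size=4, rng=rng)
crps = generate_crps(puf, n_crps=1000, rng=rng)
```

Challenge-response pairs generated this way can be split into training and test sets and passed to any of the learners discussed in Section 4.2.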
We are interested in studying the effect of increasing the amount of information
in each PUF, i.e., the size of each table. This experiment verifies
whether the cascading architecture is modeling-attack resistant, given sufficiently
large entropy in the PUF. A table of length S can be represented using $\log_2 S$ bits
per entry. If $n = \log_2 S$, then each table has $n \cdot 2^n$ ($= S \log_2 S$) bits. The results
of the Bagging, SVM, LR and Gradient Boosting algorithms for different values of S are
plotted in Figure 4.5. The number of CRPs for testing is fixed at 50,000. The
Gradient Boosting algorithm has the maximum prediction rate across S = 4, 32 and
128. When the table size is small (S = 4), the Gradient Boosting technique is able to
predict with accuracy > 99% for training sets of 80,000 CRPs and above. However,
for S = 128 the prediction rate is always less than 55%, even for training sets as large
as 100,000 CRPs. The Genetic Algorithm (GA) based evolutionary method generally has a
higher prediction rate than the other methods presented. As the variance of GA is
higher and due to the random nature of its convergence, we repeat the experiment 10 times
for each training set and report the best case, as shown in Figure 4.7. The prediction
rate for GA is around 60% for S = 128. The case with S = 128 represents a highly ML
resistant Strong PUF tested across a set of ML algorithms.
Figure 4.5: Results for various machine learning attacks on cascaded switch architecture:
(a) Table size = 4, (b) Table size = 32, (c) Table size = 128
Figure 4.6: Prediction rate for 4 bit table Function Composition structure
The increase in cardinality of the discrete non-linear function increases the ma-
chine learning resistance. This arises from the fact that increase in size of table
increases the entropy. Shannon’s entropy is defined as
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (4.1)$$
where $p_i$ is the probability of an output value.
Figure 4.7: Results from GA ML attack for various table sizes and training sets
If the table size S = 4, then for uniformly distributed values $p_i = 1/4$, which results
in
$$H = -\sum_{i=1}^{4} \frac{1}{4} \log_2\left(\frac{1}{4}\right) = \log_2(4) = 2 \qquad (4.2)$$
Similarly, for table size S = 128, the entropy H = 7. From the above results, we
make two major observations. Our first observation is that the cascading block
architecture is indeed effective in creating an ML resistant PUF, given sufficient entropy
in the PUF. Our next observation is that the Boosting and Bagging ML algorithms are
superior to conventional Logistic Regression and SVM based modeling, particularly for
highly non-linear PUFs. The change in prediction rate of the Gradient Boosting method
with the increase in table size is shown in Figure 4.8. The Shannon entropy for each
table size is also shown. The Gradient Boosting method has also been consistently effective,
with lower variance in prediction rates compared to Genetic Algorithms. Hence, these
algorithms can potentially be used as the platform of choice for evaluating non-linear
PUFs.
Figure 4.8: Change in Modeling accuracy with Gradient Boosting on Cascaded switch
architecture against Table size
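The entropy values quoted above for uniformly populated tables (H = 2 for S = 4 and H = 7 for S = 128) follow directly from Equation 4.1 and can be checked numerically. The short sketch below is only an illustration; the function names are not from the original work.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits, as in Equation 4.1 (zero-probability terms skipped)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def uniform_table_entropy(table_size):
    """Entropy of a table entry drawn uniformly from `table_size` values."""
    return shannon_entropy([1.0 / table_size] * table_size)

for size in (4, 32, 128):
    print(size, uniform_table_entropy(size))
```

For a uniform distribution the result reduces to $\log_2 S$, so doubling the table size adds exactly one bit of entropy per entry.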
4.3.3 Characteristics of the Circuit Sources
Previously, we assumed that the values of the function (the table entries) were
sampled from a uniform distribution to represent an ideal non-linear function. To
study the impact of a non-uniform distribution of the values, we conduct the following
experiment: we compare the machine learning resistance of uniformly distributed
values against values from a normal distribution. The mean of the distribution is fixed
at the midpoint of the table size and a truncated Normal distribution is generated so
that the 3σ values lie within {0, S − 1}. The impact of the distributions is plotted in
Figure 4.9. We also investigate a case where the values in the table are represented
as binary values with bias. We simulate a biased distribution such that the probability of
zeros is 0.3 and the probability of ones is 0.7. It is clear that uniformly distributed data
has higher machine learning resistance than biased data. This study illustrates that
bias in the implemented function reduces the machine learning resistance. Hence,
we make the following observations: (i) non-linear functions increase the machine
learning resistance; (ii) the function implemented by the circuit in each block should
be such that its outputs are, ideally, equi-probable for high ML resistance; and (iii)
increasing the cardinality/entropy (or range) of the non-linear function increases the
modeling-attack resistance.
Figure 4.9: Comparison of Gradient Boosting ML attack on Uniform and Normally
distributed values for a table size of 16
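To make the three value distributions concrete, the sketch below populates tables from a uniform distribution, a truncated normal distribution and a biased-bit distribution, and estimates the per-entry entropy of each empirically. It is an illustration under stated assumptions: the mean is taken as (S − 1)/2, σ as (S − 1)/6 so that ±3σ spans the table range, out-of-range samples are rejected, and every function name is hypothetical.

```python
import math
import random
from collections import Counter

def uniform_table(size, rng):
    """Each entry drawn uniformly from {0, ..., size - 1}."""
    return [rng.randrange(size) for _ in range(size)]

def truncated_normal_table(size, rng):
    """Entries from a normal distribution truncated to [0, size - 1]."""
    mean = (size - 1) / 2.0   # assumed midpoint of the value range
    sigma = (size - 1) / 6.0  # assumed so that mean +/- 3 sigma spans the range
    table = []
    while len(table) < size:
        value = round(rng.gauss(mean, sigma))
        if 0 <= value <= size - 1:  # reject samples outside the range
            table.append(value)
    return table

def biased_bit_table(size, rng, p_one=0.7):
    """Entries built bit by bit, each bit being 1 with probability p_one."""
    bits_per_entry = size.bit_length() - 1  # size assumed to be a power of two
    table = []
    for _ in range(size):
        value = 0
        for _ in range(bits_per_entry):
            value = (value << 1) | (1 if rng.random() < p_one else 0)
        table.append(value)
    return table

def empirical_entropy(values):
    """Shannon entropy (bits) of the observed value frequencies."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

rng = random.Random(0)
for make_table in (uniform_table, truncated_normal_table, biased_bit_table):
    pooled = [v for _ in range(500) for v in make_table(16, rng)]
    print(make_table.__name__, empirical_entropy(pooled))
```

Pooling many tables gives a reasonable estimate of the per-entry entropy; the non-uniform sources should come out below the 4 bits available to a 16-entry uniform table, which is consistent with observation (ii).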
4.3.4 Impact of Digital Non-Linearity
4.3.4.1 XOR vs Cardinality of Function
Using an XOR function to create the output from the responses of multiple PUF
instances has been shown to provide greater ML resistance [68]. In this section, we
compare an XOR implementation with multiple PUFs containing
tables of a given size against an area-equivalent unmodified PUF featuring the same
total number of bits. Specifically, we study two implementations. The first
implementation has a table size of $2^2$ (= 4), and the outputs of three such PUFs are
passed through an XOR function to create the final output. The second implementation has a
table size of $2^3$ (= 8). Both are shown in Figure 4.11. Both designs require the
same number of bits (3072 bits) to express the discrete functions (tables). We perform
machine learning using Gradient Boosting and the results, plotted in Figure 4.12, show
that the XOR implementation offers lower ML resistance than the PUF implementation with
increased table size. Hence, we infer that given sufficient non-linearity and uniform
distribution of the functions, a PUF with a larger table size has better resistance than
one using the XOR function.
Figure 4.10: Comparison of Gradient Boosting ML attack on Uniform and biased
distribution for table size of 16
4.3.4.2 Impact of Feed-forward Loops
Feed-forward loops in Arbiter PUFs reportedly increase ML resistance by
increasing the non-linearity [41]. We performed similar experiments using the above
function composition model with uniformly distributed non-linear functions. The
experiments were performed on a 64-challenge PUF with 4 feed-forward loops, giving
68 stages (64 external challenges and 4 feed-forward internal challenges). Gradient
Boosting was used to study the resistance and the results are shown
in Figure 4.13. It can be observed that the addition of feed-forward loops has minimal
impact on the ML resistance. This is primarily because the functions
already provide significant non-linearity and the feed-forward loops do not increase
the non-linearity any further.
Figure 4.11: Comparison of XOR and Cardinality of function: (a) XOR, (b) Increased Cardinality
Figure 4.12: Impact of XOR: Results from Gradient Boosting ML attack
4.4 Discussion and Hardware Implementation
In this section, we report the major experimental findings from this research.
Then, we discuss hardware design principles to be followed to increase modeling-
attack resistance of PUFs. We also analyze the building block of an analog non-linear
PUF to understand the source of modeling-attack resistance.
4.4.1 Experimental Observations
First we present a summary of the general observations.
Figure 4.13: Modeling accuracy of Feed-Forward non-linear PUF and Table size S=8
with Gradient Boosting ML
(i) Non-linear functions increase the machine learning resistance. Composing non-
linear functions in cascaded switch architecture is shown to be effective in previous
work [35, 75] and this work.
(ii) Increasing the cardinality/entropy (or range) of the non-linear function
increases the modeling-attack resistance. Smaller tables (or if the range is continuous)
are still easy to model with current machine learning techniques.
(iii) If sufficient entropy is ensured, it is indeed possible to create Strong PUFs.
Hardware design to achieve such functions needs further investigation, as initial
designs show promising results [35, 75].
(iv) With sufficient entropy from non-linear functions, the cascaded switch archi-
tecture with function composition construction ensures modeling-attack resistance.
(v) Bagging and Gradient Boosting algorithms have higher prediction accuracy
than SVM and LR to model non-linear Strong PUFs.
4.4.2 Hardware Implementation
From a design perspective, the function implemented in each stage of the PUF
can be either analog or digital. In either case, the function used
in the PUF should ideally be non-linear. Also, the functions are required to be
non-monotonic to prevent saturation of the PUF output. Digital (or discrete)
non-linear functions, which are step functions, can offer higher modeling-attack resistance
without suffering from the saturation problem, as they can be bounded. Digital functions
can be implemented using digital sources of entropy.
For example, consider a hypothetical case in which a 64-stage PUF is built using
non-linear tables. The values for the tables are collected from a large SRAM PUF array.
Let us assume the values generated from the SRAM array are uniformly distributed.
Let each table size be 32, so that each table entry has 5 bits. Therefore, each stage of the
PUF has 32 ∗ 5 ∗ 2 = 320 bits. Hence, a 64-stage PUF would require 64 ∗ 320 = 20,480
(about 20K) bits from the SRAM PUF array with no bias in the number of 1s or 0s. If such
an SRAM PUF is built with high reliability, then it is possible to construct this PUF,
which has a machine learning prediction error rate as high as 27% (or higher), as shown
in Figure 4.5b, across a variety of machine learning algorithms. Such a hypothetical
digital PUF would have an area of about 20K SRAM cells, but ensures high modeling-attack
resistance. Analog implementations can offer lower area footprints compared to digital ones
for the same amount of entropy. We investigate the characteristics of a non-linear analog
function proposed for machine learning resistance [75] in the next subsection.
4.4.3 Practical Evaluation of a Non-linear PUF
In this subsection, we present a practical analysis of a PUF design with respect to
modeling-attack resistance. We analyze the non-linear VTC PUF that was
proposed previously [75] and discussed in Chapter 3 to understand the source of its ML
resistance. We simulate the unit analog cell of the PUF using SPICE with
the same experimental settings used in Chapter 3. The voltages are tabulated from
the VTC curve in steps of 1 mV. The process is repeated for 10,000 cell instances. The
data is used to generate a probability density function (PDF) of the output voltages,
as shown in Figure 4.14.
Figure 4.14: PDF of non-linear VTC PUF
The PDF is not uniform and has an inherent bias towards
some voltage values. This should make the design less resistant to modeling in
comparison to a uniformly distributed function, as discussed in previous sections. Hence,
the Gradient Boosting algorithm was able to model the PUF with higher accuracy, as
discussed in Chapter 3. The result for an instance of the VTC PUF, shown in
Figure 4.15, illustrates that the PUF can easily be modeled with 92% accuracy with
Boosting, in contrast to the approximately 80% accuracy reported. Hence, non-linear
PUFs that do not possess a uniform distribution can still be susceptible to ML attacks.
We believe that the analysis methodology presented here can be used by PUF circuit
designers to improve modeling-attack resistance.
Figure 4.15: ML attack comparison between Gradient Boosting and SVM [75] for
non-linear VTC PUF
4.5 Exploring Alternative Structure
In previous sections, we investigated the basic principles for increasing machine
learning resistance using the cascaded switch architecture. Using these lessons, we
present an alternative structure, the Log-switch architecture, to increase machine
learning resistance. The new structure still relies on cascading non-linear
functions, but arranges them in a novel structure to improve the ML resistance. We first
describe the construction of the Log-switch architecture and then present results which
indicate the advantage of using this architecture in comparison to the traditional
cascaded switch architecture. We also present optimizations specific to the structure
that improve its efficiency. The goal of this investigation is not to build a new PUF,
but to explore an alternative structure which can be used to build new Strong PUFs.
4.5.1 Definition - Entropy Budget
We define the term entropy budget to aid our discussion. As described in the
previous section, using discrete functions for the blocks used in abstract model of the
PUFs is helpful in analyzing various structures. Hence, in theory, the whole discrete
non-linear table can be harvested from a digital entropy source such as SRAM PUF
[26, 39] as digital values. Hence, we define “entropy budget” as the total number of
bits required to construct the whole PUF as discrete tables. For example, if we use
a 4-value table, we would need 4 ∗ 2 = 8 bits per table. This translates to 8 ∗ 2 = 16
bits per stage and thereby 16 ∗ 64 = 1024 bits for a 64-bit PUF. The probability of
each such bit being 0 or 1 is assumed equal (0.5) unless stated otherwise. This creates a
non-linear discrete function (represented as a table) with equi-probable outputs due to
process variation.
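The entropy budget arithmetic for the cascaded switch structure can be written as a small helper. This is a sketch under the assumptions stated in the text (two tables per stage, 64 stages, table sizes that are powers of two, full tables in every stage); the function names are hypothetical.

```python
def bits_per_entry(table_size):
    """Bits needed for one entry of a table whose size is a power of two."""
    return table_size.bit_length() - 1

def cascaded_entropy_budget(table_size, n_stages=64, tables_per_stage=2):
    """Total bits needed to express every table of a cascaded switch PUF."""
    bits_per_table = table_size * bits_per_entry(table_size)
    return bits_per_table * tables_per_stage * n_stages

four_value_budget = cascaded_entropy_budget(4)       # the 4-value example above
xor_of_three_budget = 3 * cascaded_entropy_budget(4)  # three XORed 4-entry PUFs
eight_value_budget = cascaded_entropy_budget(8)       # single 8-entry PUF
print(four_value_budget, xor_of_three_budget, eight_value_budget)
```

With these assumptions the helper reproduces the 1024 bits derived above, and it also confirms that the two designs compared in Section 4.3.4.1 (three XORed PUFs with 4-entry tables versus one PUF with 8-entry tables) need the same 3072 bits.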
4.5.2 Log-switch Structure - Function Composition
The proposed Log-switch architecture is shown in Figure 4.16. As in the
analysis of the cascaded switch architecture, we use discrete non-linear
functions to construct and analyze this structure. The basic building
block, the unit cell, is shown in Figure 4.16 and consists of two non-linear function
blocks and a selector. Depending on the challenge applied, the signals are connected
straight or crossed. This unit cell is exactly the same as each stage of the cascaded
switch structure. These unit cells are arranged in a logarithmic structure, as shown in
Figure 4.16. This is in contrast to the cascaded structure, where they are cascaded
back-to-back as the name implies. The 64-bit challenge PUF structure is levelized,
with the levels named level 1 through level 7. All the challenges are applied at level 1.
The input values for the selectors in the unit cells in level 2 to level 7 are generated
internally as feed-forward values. Feed-forward values are generated by digitizing half of
the signals in the previous level. For example, among the generated signals, the
odd-numbered ones (s1, s3) are propagated as inputs to the functions in the next level and
the even-numbered ones (s2, s4) are digitized (using a comparator) into feed-forward values.
The number of unit cells in a given level is half of the number in the previous level,
thereby creating a logarithmic structure.
Figure 4.16: Log-switch Strong PUF architecture
The structure relies on (i) equi-probable challenge-to-output transitions, (ii)
feed-forward structures and (iii) structure-specific optimization to increase the machine
learning resistance, which are elaborated below.
4.5.2.1 Equally weighted challenges
In the traditional cascaded switch architecture, the challenges control the flow of
the signal in a cascaded way. This increases the probability of an output transition when
challenges at the right end flip (with appropriate transformations for controllability)
in PUFs based on this architecture [47, 48]. For example, if C63 flips, it has a higher
probability of changing the output response than C0. In the proposed Log-switch
structure, by contrast, the challenges are arranged such that the probability that the
output flips when a challenge changes is equal for all challenges. This alleviates any bias
in challenge-response transition that comes from the structure and thereby increases
the machine learning resistance.
4.5.2.2 Feed-forward loops
Feed-forward loops in the Arbiter PUF have been shown to be an effective way of
increasing the non-linearity and thereby the machine learning resistance [41]. For example,
it has been demonstrated that when more than 2 feed-forward loops are added to an
Arbiter PUF, machine learning using SVM and LR is impractical [63]. The prediction
rate for Evolution Strategies when trained with 50,000 CRPs is around
95% [63]. This indicates higher resistance than the plain Arbiter PUF, which LR models
at the same prediction rate with just 640 CRPs [63].
In Section 4.3.4, we discussed that feed-forwarding does not increase the machine
learning resistance of the cascaded switch architecture when given sufficiently high
entropy. Nevertheless, feed-forward creates internal signals which are not directly
visible to attackers. These signals can be used as internal challenges. This hardens
the problem of feature set selection and transformation for machine learning. Hence,
we use feed-forward as a natural way of reducing the number of functional
blocks in each level to create the Log structure.
4.5.2.3 Operation
All the challenges are applied at Level 1 and the blocks are crossed or connected
straight in Level 1. The corresponding functions are selected for each challenge bit.
A fixed initial value is used as starting value and the values are propagated using
function look-up similar to the abstract model described in section 4.3. Once the
values are generated in level 1, the internal feed-forward values are generated through
the comparators. These values in turn decide the connections of the unit cells in
level 2. This operation extends till the final level 7. The final output is created by
comparing the outputs of unit cell in final level.
To explain the operation and the structure in detail, consider the 2-bit PUF shown
in Figure 4.17. Since the PUF has only 2 inputs, it has two levels. The challenges
C0 and C1 are applied at level 1. Each 4-valued table, representing a function, has
values populated from a uniform distribution and represented as binary values. The
operation of the PUF starts with a common initial value of 01. For example, consider
the case when the challenge C0,C1 is 0, 0, which enables straight connections in
the corresponding switches. The output of the left-most table is 01 for the input
value 01. As C0 = 0, this value is fed to X0. Similarly, X1, Y0 and Y1 take the values
00, 10 and 11 respectively. The values Y0 and Y1 are fed as inputs to the Arbiter. The
Arbiter creates an output value of 0 for the signal Cint by comparing Y0 and Y1.
As Cint is 0, the straight connection is enabled for the responses z0 and z1, creating
values of 11 and 10 (which are generated from inputs X0 and X1). These two values
are compared and finally the output value of 1 is created. Similarly, outputs are
generated for the other combinations of the input challenges. The output and internal
values for all the challenges are shown in Table 4.1. A larger 64-bit PUF operates
similarly to this example, with 7 levels of unit cells.
Figure 4.17: 2-bit Log-switch architecture example
In the subsequent sections, we explore two variants of the log structure and per-
form experiments to identify structure-specific entropy allocation to increase machine
learning resistance.
Table 4.1: Truth table for 2-bit Log-switch architecture example
Challenge   Intermediates                     Output
C0  C1      x0  x1  y0  y1  Cint  z0  z1
0   0       01  00  10  11  0     11  10      1
0   1       01  11  10  00  1     01  11      0
1   0       10  00  01  11  0     10  10      1
1   1       10  11  01  00  1     01  10      0
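The 2-bit example can be replayed in code. The sketch below is a reconstruction rather than the original implementation: the table contents of Figure 4.17 are not reproduced in the text, so only the entries that can be inferred from the rows of Table 4.1 are filled in. The comparator conventions (Cint = 1 when y0 > y1, output = 1 when z0 ≥ z1) and the placement of the tables before the switch are likewise inferred from those rows, and all identifiers are hypothetical.

```python
# Level-1 unit cells: outputs of the two tables in each cell for the fixed
# initial value 01, as inferred from Table 4.1.
LEVEL1_OUTPUTS = {
    "cell0": (0b01, 0b10),  # straight connection gives x0 = 01, y0 = 10
    "cell1": (0b00, 0b11),  # straight connection gives x1 = 00, y1 = 11
}

# Level-2 tables: only the entries exercised in Table 4.1 are known.
TABLE_A = {0b01: 0b11, 0b10: 0b10}  # applied to x0
TABLE_B = {0b00: 0b10, 0b11: 0b01}  # applied to x1

def switch(a, b, select):
    """Selector of a unit cell: 0 = straight, 1 = crossed."""
    return (a, b) if select == 0 else (b, a)

def log_switch_2bit(c0, c1):
    """Return (x0, x1, y0, y1, Cint, z0, z1, output) for challenge (c0, c1)."""
    x0, y0 = switch(*LEVEL1_OUTPUTS["cell0"], c0)
    x1, y1 = switch(*LEVEL1_OUTPUTS["cell1"], c1)
    c_int = 1 if y0 > y1 else 0       # inferred feed-forward comparator
    z0, z1 = switch(TABLE_A[x0], TABLE_B[x1], c_int)
    output = 1 if z0 >= z1 else 0     # inferred final comparator (tie gives 1)
    return x0, x1, y0, y1, c_int, z0, z1, output

for c0 in (0, 1):
    for c1 in (0, 1):
        x0, x1, y0, y1, c_int, z0, z1, out = log_switch_2bit(c0, c1)
        print(c0, c1, *(format(v, "02b") for v in (x0, x1, y0, y1)),
              c_int, format(z0, "02b"), format(z1, "02b"), out)
```

Under these inferred conventions, the four printed rows match the rows of Table 4.1, including the tie case for challenge (1, 0).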
4.5.3 Machine Learning Resistance
In this subsection, we describe experimental results and entropy allocation for
function composition based Log-switch structure for improving the efficiency.
We tested the machine learning resistance of the Log-switch structure using the
techniques discussed above, and the Gradient Boosting ensemble meta-algorithm had the best
prediction rate. Hence, we present the results based on the prediction rate of Gradient
Boosting. The non-linear tables were populated with values from uniform
distributions. The table size was kept at 16, which requires 16 ∗ 4 = 64 bits for each table.
We generated 20 different instances of such a PUF and tested them using Gradient
Boosting. The distribution of the machine learning prediction rate using Gradient Boosting
for 100K CRPs is shown in Figure 4.18 and has an average prediction rate of around
62%. This is significant due to the high prediction error and the reduction in entropy
budget, as explained below.
4.5.3.1 Comparison to Cascaded Switch architecture
The Log-switch structure achieves significant machine learning resistance, but
with a far lower entropy budget in comparison to the cascaded architecture. For example,
with table size S = 16, around 62% modeling accuracy for Gradient Boosting was
achieved for a 64-bit Log-switch architecture. For a comparable prediction rate, the
cascaded switch architecture needed a table size of S = 64 for the discrete function.
The two associated prediction rate distributions are shown in Figure 4.18. Comparing
the number of bits required to generate tables for the PUF structures demonstrates
the reduction in area that is possible. The equivalent number of bits for the cascaded
switch architecture would be 6 ∗ 2 (for the first stage) + 63 ∗ 2 ∗ 6 ∗ 64 (for the other
stages) = 48,396 bits. The first stage does not require the whole table, as the input value
is fixed for the PUF. In comparison, the Log-switch PUF requires 64 ∗ 8 (for the first
level) + 63 ∗ 64 ∗ 2 = 8576 bits. Hence, the Log-switch architecture reduces the entropy
budget by 5.6×. This directly corresponds to a reduction in area for generating entropy.
Figure 4.18: Comparison of prediction distribution (Gradient boosting) for Table size
S = 16: (a) Cascaded switch structure and (b) Log-switch structure
4.5.3.2 Entropy Allocation
In our experiments we observe that allocating high entropy in first layer and
low/no entropy in other layers improves the design efficiency. For example, the Log-
switch structure is modified such that a fixed non-linear discrete table is used for all
stages except the first stage. This common table is sampled from uniform distribution,
but the same table is used for all levels through level 2 to level 7. For example, this
single table can be a single entropy source and can be used sequentially to generate the
PUF responses. The first level has 64 ∗ 2 = 128 different non-linear tables, as shown
in Figure 4.16. The mean machine learning resistance of this modified structure is
also around 62% for 20 instances. This implies that area for entropy generation can
be further reduced by using this optimization without significant reductions in ML
resistance. Additionally, the first level needs only a single table entry rather than
a full table, because the input to the first stage is a fixed value. Together, these
changes reduce the entropy budget significantly. For example, the number of bits
required for this modified structure is 64 ∗ 8 (first stage) + 64 = 576 bits. This is a
significant 84× reduction in entropy in comparison to the cascaded switch architecture.
This entropy budget reduction translates to large area savings.
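As a quick check of the figures quoted above, the following sketch evaluates the modified budget and its ratio to the cascaded switch budget from Section 4.5.3.1. The reading of the trailing 64 as one shared table is an assumption made for the comments only; the arithmetic itself is taken directly from the text.

```python
# Entropy budget of the modified Log-switch structure (expressions from the text).
cascaded_bits = 6 * 2 + 63 * 2 * 6 * 64  # cascaded switch reference, S = 64
modified_bits = 64 * 8 + 64              # first stage + 64 bits, assumed to be the shared table

reduction = cascaded_bits / modified_bits

print(modified_bits)     # 576
print(round(reduction))  # 84
```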
Figure 4.19: Log-switch architecture with additive delay elements
4.5.4 Additive Log Structure
Based on the entropy budget allocation we show another practical example using
Log PUF with additive delay elements instead of non-linear discrete function. This
additive delay element can be viewed as a simple case of function composition. For
example, consider the circuit shown in Figure 4.19. The non-linear table is replaced by
delay values. The delay process variation values were drawn from a normal distribution
with mean 10 and standard deviation 1. Arbiters used in Arbiter PUF [40] are
used to generate the feed-forward values. This Arbiter compares the arrival time of
two signals at the input and creates a digital output depending on which of its input
arrives first.
The operation of the structure is similar to the Arbiter PUF [40], but has internally
generated values. A transition is launched at the input and, depending on the process
variation values, the signal propagates along the network and produces a zero or
one at the output. The delay elements (DE) ensure that the signal does not propagate
further before the digital value for the feed-forward network is resolved. Similar to
the previous function composition based Log structure, the internal challenges are
created during operation.
The mean prediction rate for this case under Gradient boosting technique was
87% for a training set of 100,000 CRPs. This is comparable to the machine learning
resistance of VTC PUF and non-linear current PUFs [35, 75] (but using simple delay
elements). For 128-bit PUF, the mean modeling accuracy is 74% for 20 instances for
Gradient boosting technique, as tabulated in Table 4.2.
4.5.4.1 Entropy Allocation
As explained in the previous section, allocating low or no entropy in the lower stages
reduces the entropy budget. Furthermore, we observe that increasing the entropy in the
first layer also increases the machine learning resistance. We assign the first-level
delay values from a normal distribution with a mean of 100 and a standard deviation of
3.16. These parameters correspond to 10 delay elements (each with mean 10 and standard
deviation 1) added sequentially: the sum is normally distributed with mean 10 × 10 = 100
and standard deviation √10 ≈ 3.16, and represents the case of increased entropy. As a
result of increasing the entropy tenfold under the linear additive model, machine
learning resistance improves: the prediction rate drops to 72.84% for the 64-bit
case and 63.30% for the 128-bit case. The prediction rate distribution for 100 instances
of the circuit is shown in Figure 4.20. These results show significant promise for a
practical implementation of a machine learning resistant Strong PUF using the Log-
switch architecture. All relevant results are tabulated in Table 4.2.
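The choice of mean 100 and standard deviation 3.16 follows from summing ten independent delays with mean 10 and standard deviation 1. The sketch below is a minimal numerical check of that relationship, assuming independent Gaussian delay elements; the sample count and seed are arbitrary choices for illustration and are not taken from the experiments.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

MU, SIGMA, N_ELEMENTS = 10.0, 1.0, 10

# Analytical result for a sum of independent Gaussian delays.
mu_sum = N_ELEMENTS * MU
sigma_sum = math.sqrt(N_ELEMENTS) * SIGMA

# Empirical check: add ten element delays, repeated many times.
samples = [
    sum(random.gauss(MU, SIGMA) for _ in range(N_ELEMENTS))
    for _ in range(20000)
]
sample_mean = statistics.mean(samples)
sample_std = statistics.stdev(samples)

print(mu_sum, round(sigma_sum, 2))  # 100.0 3.16
print(sample_mean, sample_std)      # close to the analytical values
```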
Figure 4.20: Prediction rate distribution for 64-bit Additive log structure with high
entropy allocation
4.5.5 Summary and Discussion
In this subsection, we present a summary of the general observations of the Log-
switch structure.
(i) Logarithmic arrangement of non-linear functions along with feed-forward signals
shows promise in improving machine learning resistance with a far lower entropy
budget than the cascaded switch structure.
(ii) Allocating low or no entropy in lower layers does not impact machine learning
resistance, but reduces entropy requirements.
(iii) Increasing entropy of first layer increases the machine learning resistance.
This was observed in both function composition and simple additive delay model.
(iv) Log-switch structure based PUF can indeed be made machine learning resis-
tant provided sufficient entropy from the non-linear function (with far lower entropy
requirements than cascaded switch structure).
Table 4.2: Summary of Results on Machine Learning Resistance of Strong PUFs

| Structure | Prediction rate (%) (Gradient Boosting) | Comments |
|---|---|---|
| Cascaded switch, S = 4 | 98.74% | Broken, high accuracy |
| Cascaded switch, S = 16 | 76.06% | Single instance measurement |
| Cascaded switch, biased table and S = 16 | 81.5% | Bias increases prediction rate |
| Cascaded switch, S = 128 | 55.06% | ML resistant, low prediction rate |
| VTC PUF [75] | 92.80% | SVM prediction rate was 80% [75] |
| Log-switch structure, S = 16 | 62.58% | ML resistant, low prediction rate, 20 instances |
| Log-switch structure, common values for levels 2 to 7 | 61.10% | ML resistant, low prediction rate, 20 instances |
| Log-switch structure, Gaussian additive delay µ = 10, σ = 1 | 87.07% (64-bit), 74.62% (128-bit) | Average of 100 instances |
| Log-switch structure, additive delay µ = 100, σ = 3.16 for level 1, other levels fixed at common values | 72.84% (64-bit), 63.30% (128-bit) | High entropy in level 1 increases ML resistance |
4.6 Conclusion
Due to their functional uniqueness, low cost of implementation, and resistance to
invasive attacks, Strong PUFs have been envisioned for low-cost IC authentication.
Unfortunately, despite a decade of research, security of most Strong PUFs remains
inadequate as they are breakable by machine learning. This raises a fundamental
question of whether design research for Strong PUFs is futile. In this work, we
have investigated the machine learning attack resistance of Strong PUFs through
experimental analysis. Our study is based on an abstract genotype PUF model that
corresponds to a large class of Strong PUF implementations. Our results indicate
that if certain randomness requirements can be met by basic design blocks, machine
learning attack resistant Strong PUFs are feasible for known attacks. As an aside, we
also find that meta-ensemble techniques are most potent in model building attack.
We make a case that machine learning attack resistant Strong PUF design will
possibly require a basic design block that implements a high entropy source. This
entropy source can be a digital source such as SRAM PUF or analog functions. We
believe this systematic design approach through generic PUF model will accelerate
both the design and analysis of Strong PUFs with respect to machine learning attacks.
We hope that such a design will emerge soon.
We also propose a new PUF architecture called Log PUF for efficient use of entropy
sources while significantly increasing ML resistance. Preliminary evaluation of Log
PUFs shows promise and we observe that this structure-specific optimization can
further make PUFs area efficient.
CHAPTER 5
IMPROVING RELIABILITY OF PUFS
5.1 Introduction
PUF circuit characteristics are affected by environmental variations, noise and
aging. This impacts repeatability of the response which in PUF context is called reli-
ability. Reliability of PUFs is a key design concern in Weak PUFs as the responses are
typically used for cryptographic key generation (or identification) where the responses
are expected to be 100% reliable. A typical setup to extract reliable key from Weak
PUFs is shown in Figure 5.1. The Weak PUF circuits typically generate multiple bits
of output. These bits are post-processed to correct errors using techniques such as
voting, error-correction codes (ECC), fuzzy extraction, and a stable key is derived.
This work focuses on improving the reliability of Weak PUFs with contribution
both in improving the error correction schemes and circuit reliability through design
alterations. First, we propose a new voter based technique based on up/down counter
(UDC) to reduce the error rate of SRAM-based PUF responses by harnessing the
statistical bias. We also present analytical results on error rate pertaining to the voter
design. To enable a complete solution, we then propose a Design for Test
(DFT)/testing based approach that capitalizes on the voter based characterization to
improve overall reliability of the system. We have simulated example designs of the
proposed method to measure error rates, area, performance, and yield of the proposed
method and compare it against prior approaches to demonstrate the benefits of this
solution. Second, we study alternative circuit designs of SRAM-based PUFs that
can provide a greater sensitivity to intrinsic process variations and thereby, enhance
Figure 5.1: Typical Weak PUF based key generation setup
the mismatch between the coupled elements. Enhancing the imbalance not only
makes the design less susceptible to noise, but reduces the ancillary circuit resources
required to improve reliability. Hence, we can obtain significant savings over many of
the previously proposed reliability enhancement techniques.
5.2 Background
5.2.1 SRAM PUF and Reliability
SRAM cells that are constituent of embedded memories, typically consist of cross
coupled inverters connected by access transistors. Figure 5.2 shows a typical 6-
Transistor SRAM cell.
Due to intrinsic process variations, a SRAM cell on start-up would typically settle
in either of logic-0 or logic-1 value consistently. The settlement state is determined
by mismatch in process variations in the cell transistors. Settlement to consistent
Figure 5.2: Traditional 6T SRAM Cell
yet random states allow values from multiple cells to be collected for use as a key
or identifier. A SRAM PUF is expected to produce this key each and every time in
power-up operation. Unfortunately, noise during start up can impact the settlement
state of the PUF resulting in unreliability. Specifically, cells with low mismatch due
to process variations are more sensitive to noise than cells with greater mismatch.
Cells with greater mismatch produce sufficient differential drive to overcome any
impact of noise. Beyond noise, the reliability of a PUF is influenced by ambient
conditions, supply voltage and aging of the transistors. Among the various noise
sources, power supply noise, crosstalk, thermal noise, shot noise and random
telegraph noise play a role. Because the properties of a PUF depend on process
variations, long-term device degradation due to negative-bias temperature instability
(NBTI), hot-carrier injection (HCI) and time-dependent dielectric breakdown (TDDB)
decreases the lifetime reliability of a PUF. A PUF is considered unreliable whenever
it produces a response that differs from the enrolled ideal response.
As mentioned earlier, PUF circuits can be broadly classified as Weak and Strong
PUFs. Weak PUFs such as SRAM PUFs typically have few CRPs, while Strong
PUFs such as Arbiter PUFs have an exponentially large number of CRPs. A Strong
PUF circuit may produce errors for certain CRPs. Similarly, a Weak PUF may
produce an identifier in which only a few bits have a high error rate. Due
to the difference in CRP count and usage model of Strong and Weak PUFs, the
errors can be handled differently. As Strong PUFs have exponentially many CRPs and
are used for authentication applications, multiple challenges are applied and the
responses are obtained. Authentication using these error-prone responses can be
carried out successfully by setting a success threshold for responses. For example,
a Strong PUF with a 2% constant intrinsic error rate can be successfully authenticated
if 98% of the responses are correct. In contrast, Weak PUF responses are typically used
for cryptographic key generation (or identification) where the responses are expected
to be 100% reliable. Hence, solutions such as error-correcting codes are required for
Weak PUFs. As the reliability requirements of Weak PUFs are more demanding, we focus
on improving reliability of SRAM PUFs in the remainder of this work.
5.2.2 Related Literature
To derive a stable key from a string of bits in which a few locations are
unreliable, a number of technology and algorithmic solutions have been proposed
to improve the reliability of the key produced from an SRAM PUF. Since 100% accuracy
is costly, a low error rate such as 10^-6 is typically used as the
design target [50]. These solutions are discussed below.
5.2.2.1 Error-Correcting Codes and Fuzzy Extractor
Error-correcting Codes (ECC) and Fuzzy Extractors have been proposed to cor-
rect errors in SRAM PUF [16, 43, 42, 15, 45]. Typically, these techniques create a
helper data which is made public. The helper data is used to recover a key from
the original data while minimizing the probability of error. To derive a stable key
of a certain length, one must start with a string that is longer in length, introduc-
ing an overhead. In fuzzy extractor, this overhead along with processing overhead
increases rapidly with rising intrinsic bit error rate of SRAM PUF. Invariably, as
technology matures, process mismatch decreases, thereby increasing intrinsic bit
error rate. Since low cost systems are produced in mature technologies rather than
in leading edge processes, the overhead of fuzzy extractor, ironically, can become
greater in low-cost systems. Apart from this overhead concern, such schemes should
also ensure minimal information leakage from the helper data. These concerns have
spurred an alternative line of research, which is described next. A SRAM PUF array
with inherently low reliability requires more area and computation for error reduction
using above approaches. Hence, reduction in error rate of the raw PUF cells would
benefit in total area reduction.
5.2.2.2 Circuit and Manufacturing Technology Solutions
Several researchers have proposed technology and circuit solutions to improve
the reliability of Weak PUFs. Mathew et al. have proposed a solution based on
Temporal Majority Voting (TMV) and dark bits evaluation [50]. They also employ
burn-in and aging effects to improve the PUF. Design changes to enable voting, along
with uniqueness improvement through synchronous design have also been proposed.
However, majority voting scheme proposed can only correct up to error rates of less
than 8% stand-alone (with a 15-way voter) and additional techniques are mandatory
to achieve practical application. Garg et al. have also proposed a method to improve
Figure 5.3: Enrollment procedure for SRAM PUF
the reliability of the PUF cells by aging the cells and increasing the mismatch [19].
Similarly, Bhargava et al. have proposed a technique to improve reliability through
hot carrier injection (HCI) aging [7]. Maes et al. also investigate the effect of aging
in SRAM PUFs and the effectiveness of data-dependent strategies to improve reliability
[44]. Hoffer et al. have proposed an alternative to error correction by pre-selecting
the bits with greater mismatch to reduce error rate [25]. Cortez et al. have proposed
a method by adapting voltage ramp-up time to ambient temperature to reduce the
error rate of memory PUFs [12]. Though this technique improves the reliability, the
auxiliary circuits needed for voltage ramp-up can be area intensive. Also, shaping
supply voltage is expensive for designs with large power delivery network.
Ganta and Nazhandali explore alternate configurations of the inverters in the
SRAM cell to improve the stability of SRAM cells with respect to variations in tem-
perature and reduce the number of unreliable bits to save on ECC area [18]. In
contrast, our goal is to study the SRAM cell performance at a given temperature
in the presence of thermal noise that can cause errors in its output. We seek to
explore alternative configurations of the SRAM cell that increase process sensitivity
and hence, provide greater mismatch between the cross-coupled elements.
5.3 Proposed Up/Down counter (UDC) based Technique
In this section, we describe the preliminaries for the proposed voting method.
Then, we present our approach and the design of associated circuits. We conclude
this section with observation on the simulated reliability improvement.
5.3.1 Harnessing Statistical Bias for Improving Reliability
A SRAM PUF cell is expected to produce reliable value for key generation. If the
relative drive strength of the transistors is low, then any noise present in the cell will
determine the outcome of the PUF. This can result in changing cell values on each
start-up. When a SRAM cell exhibits changing behavior during start-up, it is possible
that if the cell is powered-up several times, it will exhibit a statistical bias. That is,
if we assume that the mean value of noise that gets coupled to the PUF cells is close
to zero, multiple evaluations of the PUF's response would give the true statistical bias
of the PUF's response. Temporal Majority Voting (TMV) is one of the well known
techniques to extract such bias in the presence of noise. If the statistical bias is strong, it
may be detected using only a limited number of experiments. On the other hand, if
the bias is weak then a much larger number of start-up experiments may be necessary
to detect the true bias. A problem with larger number of start-up experiments is that,
the associated circuits for book-keeping will grow in size. Our proposed technique
avoids this problem while extracting such bias. Through simulation and analysis we
show that this technique is superior to traditional TMV.
5.3.2 Temporal Majority Voting
A simple way to reduce the error rate is using a Temporal Majority Voting scheme
[50, 76]. For example, a simple 4-bit counter based TMV counts from 0 to 15 and
hence can be used for 15-way voting. If the count of ones after 15 evaluations of the
cell's response is at least 8, the final value is classified as 1; otherwise it is
classified as 0. The concept of TMV has been discussed extensively in previous works
by Mathew et al. and Xiao et al. [50, 76]. The mathematical model of the TMV is
a binomial counting process and, hence, the reduction in error rate can be calculated
analytically. For example, the error rate of a PUF cell whose per-trial error rate is
1 − p reduces to
\[ P_e(N) = 1 - \sum_{m=k}^{N} \binom{N}{m} p^{m} (1-p)^{N-m} \tag{5.1} \]
where k = (N + 1)/2 (N is odd) when an N-way TMV is used [38]; the sum is the
probability that at least k of the N trials return the correct value. The circuit
implementation of the TMV typically consists of an n-bit counter where N = 2^n − 1.
The counter counts the number of ones; it is incremented by 1 if and only if the
response from the SRAM cell is 1.
A major disadvantage of TMV is the high cost when the statistical bias of a SRAM
PUF cell is weak. This is discussed further in subsequent sections.
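The binomial model of TMV is straightforward to evaluate numerically. The sketch below is one way to do so, written in terms of the per-trial error probability of the cell: the vote is wrong when at least k = (N + 1)/2 of the N trials are wrong. The function name and the sample error rates are illustrative choices, not values from the experiments.

```python
from math import comb


def tmv_error_rate(cell_error: float, n_votes: int) -> float:
    """Residual error of an N-way temporal majority vote (N odd).

    cell_error is the per-trial error probability of the cell (1 - p in the text).
    The vote is wrong when at least k = (N + 1) / 2 of the N trials are wrong.
    """
    if n_votes % 2 == 0:
        raise ValueError("n_votes must be odd")
    k = (n_votes + 1) // 2
    return sum(
        comb(n_votes, m) * cell_error**m * (1 - cell_error) ** (n_votes - m)
        for m in range(k, n_votes + 1)
    )


# A 4-bit counter gives N = 2**4 - 1 = 15 votes.
for e in (0.02, 0.05, 0.08):
    print(e, tmv_error_rate(e, 15))
```

Because the model is a plain binomial tail, doubling the counter width roughly doubles N, which is why weakly biased cells make TMV expensive.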
5.3.3 New Voter Design
In this subsection, we present our voter design. Even though we demonstrate the
proposed technique using SRAM-based PUFs, the technique is generic to use with
other classes of Weak PUFs to improve the reliability. The proposed voter is based
on an UP/DOWN counter. A simple counter starts at an initial count of 0 and
counts upwards. By contrast, the n-bit counter used in this design starts at an initial
value of (2^n − 2)/2. The counter value is increased if the response from the current
trial is 1 and decreased if it is 0. When the counter reaches a terminal value of 0 (or
2^n − 2), the counter saturates and retains the terminal value.
Figure 5.4: UP/DOWN counter based voter scheme
The complete setup of the proposed voter scheme along with the SRAM PUF
is shown in Figure 5.4. The output of the PUF cell is used as input to the n-bit
UP/DOWN counter. In the figure, we show a setup where an UP/DOWN counter
is shared with 4 SRAM cells, as an example. Starting at an initial counter value of
(2^n − 2)/2, multiple trials are conducted until the n-bit UP/DOWN counter reaches
a terminal value of 0 or 2^n − 2. Unlike TMV, where the number of trials is fixed, trials
may in principle continue indefinitely in an UP/DOWN counter until a terminal value is
reached. When the counter reaches a terminal value of 0 (or 2^n − 2), the value of the
SRAM PUF cell is resolved as logic-0 (or logic-1) and the trials are terminated. In
practice, the trials are continued for at most a pre-determined number of times, so it
is possible that no decision is reached when the trials are terminated. The optimal
values for n for varying error
rates will be discussed in greater detail in Section 5.4.3. Multiple PUF cells can share
a single UP/DOWN counter or each cell can be assigned an UP/DOWN counter
as done in previous designs [50]. The multiplexer is appropriately chosen (4 : 1 in
Figure 5.5: Modified SRAM cell for multiple evaluations
Figure 5.4). As expected, sharing the counter across multiple cells would increase the
run-time, but reduces the area overhead.
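The behavior of the voter can be summarized with a short behavioral model. The sketch below is a software illustration only, assuming a single cell per counter and a caller-supplied function that returns one PUF evaluation per call; the names `udc_vote` and `sample_bit`, the trial limit, and the 0.8 bias are hypothetical and not part of the circuit design.

```python
import random


def udc_vote(sample_bit, n_bits=4, max_trials=1000):
    """Run one saturating UP/DOWN counter vote.

    sample_bit: zero-argument callable returning 0 or 1 (one PUF evaluation).
    Returns (decision, trials); decision is 1, 0, or None if unresolved.
    """
    top = 2**n_bits - 2   # upper terminal value, 2^n - 2
    count = top // 2      # initial value, (2^n - 2) / 2
    for trial in range(1, max_trials + 1):
        count += 1 if sample_bit() else -1
        if count == top:
            return 1, trial
        if count == 0:
            return 0, trial
    return None, max_trials  # no decision within the trial budget


# Hypothetical noisy cell that reads logic-1 with probability 0.8.
rng = random.Random(1)
decision, trials = udc_vote(lambda: 1 if rng.random() < 0.8 else 0)
print(decision, trials)
```

A noiseless cell resolves in exactly (2^n − 2)/2 trials, while a cell whose output alternates never resolves, which is the case the pre-determined trial limit guards against.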
5.3.4 Circuit Design
The voting process described above needs multiple evaluations of the PUF output,
but multiple power-ups of the circuit are inefficient. Instead, the SRAM cell can be
modified with minimal changes to implement the scheme. Instead of using start-up
values during power-up, the circuit can be converted to a pre-charge/discharge
circuit. This modified circuit is shown in Figure 5.5. The clock signal first enables
the paths from Vdd to out and its complement (out-bar) and pre-charges the nodes to
supply voltage.
During evaluation, depending on the process variation, the output will settle in logic-
1 or logic-0 due to the mismatch in the strength of discharge paths. This design is
similar to changes required for enabling TMV in related work [50], Sense-Amplifier
PUF [6] and also, similar to meta-stability based TRNGs [69]. The circuit changes
above have minimal impact on the cell area. The UP/DOWN counter circuit can
be implemented using flip-flops along with the required logic for initialization and
saturation detection.
5.3.5 Error Rate from Simulation
We defer description of the details of simulation settings to Section 5.5. Here, we
describe the methodology for obtaining error rate from simulation. We simulate a
noisy SRAM PUF cell at SPICE level in 45nm technology [77] by randomly varying
supply noise. The power supply noise distribution is varied to control the error rate of
the SRAM cell. 1 million samples were collected from the cell. The outputs were fed
into a 4-bit UP/DOWN counter and the new, more reliable output bits were obtained.
The new error rate was calculated for these bits to obtain the reliability improvement.
As shown in Figure 5.6, the error rate of the proposed technique obtained through
simulation is on the order of 10^-6 or less for initial error rates ≤ 0.16. We defer
the comparison of reliability improvement against traditional TMV to the next section.
Since the error rate obtained through simulation is on the order of 10^-6, a large
number of simulations may be required to get an accurate estimate. In order to
estimate the error rate with higher accuracy, analysis of the proposed technique is
presented in the next section.
5.4 Analysis of UDC based Design
In the following subsections, we describe the basics of random walk model and
use it to analytically evaluate the UP/DOWN counter scheme. Also, we compare our
approach to Temporal Majority Voting (TMV) and present a DFT scheme to improve
overall reliability.
Figure 5.6: Error rate results from simulation
5.4.1 Operation of The Proposed Voter as Random Walk
Let (U_1, U_2, . . . ) be a sequence of independent random variables from the set
{1, −1}. Let the probability of value 1 from a trial be p, where p ∈ [0, 1]. Then, the
probability of value −1 is 1 − p. If X_N represents the sum of such a sequence after
N trials, then
\[ X_N = \sum_{i=1}^{N} U_i \tag{5.2} \]
where N is the number of trials. The path traced by X_N is called a simple random
walk [29]. This is an elementary
1-dimensional random walk on integer number line. The properties of random walk
and associated problems are well studied and are related to Markov process.
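As an illustration of the random walk defined above, the sketch below draws the steps U_i and sums them. It is a minimal simulation under the stated independence assumption; the bias p = 0.6, the walk length, and the number of repetitions are arbitrary values chosen for the example.

```python
import random


def random_walk(p, n_trials, rng):
    """Return X_N: the sum of n_trials steps of +1 (prob. p) or -1 (prob. 1 - p)."""
    return sum(1 if rng.random() < p else -1 for _ in range(n_trials))


rng = random.Random(0)
p, n_trials = 0.6, 100
walks = [random_walk(p, n_trials, rng) for _ in range(5000)]
mean_position = sum(walks) / len(walks)

# The empirical mean should be close to N * (2p - 1), the expected value of X_N.
print(mean_position, n_trials * (2 * p - 1))
```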
The UP/DOWN counter based scheme proposed above can be modeled and ana-
lyzed using random walk based models. For the purpose of analysis, without any loss
in generality, let us assume that in absence of any noise the PUF cell settles at logic-1
due to process variation. In presence of noise, there is a possibility that the cell will
settle at logic-0 which is opposite to the inherent process variation of the cell.
In Table 5.1, we describe the notations for our analysis of the UP/DOWN counter.
In this analysis, we assume the noise experienced by the cell has zero mean; hence,
for a statistical bias towards logic-1, p must be > 0.5. If the cell has a strong bias
towards logic-1, then p ≫ 0.5; otherwise, for weak bias, p has a value slightly above
0.5.
As mentioned earlier, the n-bit UP/DOWN counter is initialized at k and multiple
trials are conducted. The state transition diagram to better illustrate this process
is shown in Figure 5.7. If the PUF cell produces logic-1, it is represented by U_i = +1;
otherwise, U_i = −1 in (5.2). However, due to the absorbing saturation (decision)
barriers at the ends, as shown in Figure 5.7, (5.2) cannot be used directly to model
the UP/DOWN counter. Nevertheless, the UP/DOWN counter is related to the well-studied
Gambler's ruin problem [29] and hence, the metrics of interest can be determined. For
the proposed voter scheme, we are concerned with three probabilities: (i) probability of
reaching the correct state, Ps (probability of success); (ii) probability of reaching
the wrong state, Pe (probability of error); and (iii) probability of unresolved output,
1 − (Pe + Ps). In our working example, as the cell is biased towards logic-1,
the probability of reaching the end state 2^n − 2 is the probability of success Ps.
Similarly, the probability of the trials leading to the end state 0 is the probability
of error Pe. The last case may arise when the counter has not reached a
saturating state after a given number of trials. As the circuit is designed to resolve
to a decision, we are concerned with Pe and Ps in a limited number of trials.
5.4.2 Error rate
A random walk with absorbing barriers is akin to the random walk in the Gambler's
ruin problem, as mentioned earlier. Instead of deriving probability of error in T trials,
the probability of error occurring in infinite number of trials can be derived [29] and
is given by
Figure 5.7: Markov Chain model for the voter scheme
Table 5.1: Definition of Symbols

| Symbols | Definitions |
|---|---|
| p | Probability of logic-1 from a PUF cell |
| q = 1 − p | Probability of logic-0 from a PUF cell |
| n | Length of the UP/DOWN counter |
| T | Total number of trials |
| k = (2^n − 2)/2 | Initialization state. It is also the number of steps from initialization to end states |
| Ps | Probability of logic-1 from UP/DOWN counter (probability of success) |
| Pe | Probability of logic-0 from UP/DOWN counter (probability of error) |
\[ P_e = 1 - \frac{1 - \left(\frac{q}{p}\right)^{k}}{1 - \left(\frac{q}{p}\right)^{2k}} \tag{5.3} \]
This value serves as an upper bound on the worst case error rate after using
the UP/DOWN counter. The above expression is directly related to the ruin probability
in the Gambler's ruin problem [29] and the derivation is omitted for brevity. As
(5.3) gives the probability of error under infinite trials, the probability of error
is lower under a fixed number of trials. The expression therefore gives a
meaningful bound for the design: the resultant error rate is always
less than the value given by (5.3). This is shown in Figure 5.6, where the error
rates of our scheme from experimental and analytical results are plotted. Thus,
the equation gives insight into the benefits of using the proposed voting scheme.
Similar to probability of error, the probability of reaching correct state (probability
of success) under infinite trials can be derived as
Ps = 1− Pe (5.4)
From (5.3) and (5.4) we can infer that under infinite trials the UP/DOWN counter
saturates at one of the saturating ends when p ≠ 0.5. In reality, due to the
limited number of trials, the counter value can be stuck at a value between the
saturating values. As this primarily occurs in cells with high error rate, such cells
can be neglected for key generation if the design has redundant cells. A testing/DFT
scheme using this method is discussed in Section 5.4.4.
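The bound in (5.3) is easy to evaluate for a given counter width. The sketch below uses the algebraically equivalent form r^k / (1 + r^k) with r = q/p, which follows from 1 − r^(2k) = (1 − r^k)(1 + r^k) and avoids the 0/0 case at p = 0.5. The function name and the sampled error rates are illustrative assumptions.

```python
def udc_error_bound(p: float, n_bits: int) -> float:
    """Infinite-trial error probability of the UP/DOWN voter, as in (5.3).

    p is the per-trial probability of the correct value; q = 1 - p.
    Uses 1 - (1 - r^k) / (1 - r^(2k)) = r^k / (1 + r^k) with r = q / p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    k = (2**n_bits - 2) // 2  # initialization state, (2^n - 2) / 2
    r = (1.0 - p) / p
    return r**k / (1.0 + r**k)


# Bound for a 4-bit counter at a few initial (per-trial) error rates.
for cell_error in (0.05, 0.10, 0.16):
    print(cell_error, udc_error_bound(1 - cell_error, n_bits=4))
```

For an initial error rate of 0.16 the value falls between 10^-6 and 10^-5, in line with the order of magnitude reported from simulation in Section 5.3.5.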
5.4.3 UP/DOWN Counter vs TMV
In Figure 5.8, the error rates after using the TMV and UP/DOWN counter schemes
are plotted against the initial error rate of the cell. Analytical results
were used for both schemes. The expression for error rate of TMV can be found in
related publications [50]. For comparison, the error rate reduction by using 4, 5 and
Figure 5.8: Comparison of error rate reduction for TMV and UP/DOWN
counter(UDC)
6-bit TMV counters and UP/DOWN counters are plotted. For a target error rate
on the order of 10^-6, the new voter design offers a significant advantage over a
traditional TMV. The 4-bit UP/DOWN counter is capable of handling twice the initial
error rate compared to a 4-bit 15-way TMV to get a final error rate on the order of
10^-6. Similarly, 5-bit and 6-bit UP/DOWN counters offer 2× and 1.8× improvement.
Note also that the improvement in reliability when using the UP/DOWN counter
is a conservative estimate, as (5.3) is an upper bound and in reality better gains are
expected. The UP/DOWN counter has an area penalty of ∼10% to ∼15% compared
to a similar simple counter used by TMV in the 45nm Nangate open cell library
[52]. This area increase is acceptable for the significant improvement offered in error
rate reduction.
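The comparison can be approximated numerically by finding, for each scheme, the largest initial error rate that still meets the target. The sketch below does this with a simple bisection, assuming a target of exactly 10^-6, the binomial model for TMV, and the infinite-trial bound (5.3) for the UP/DOWN counter; function names and the bisection settings are illustrative, and the printed ratios are only an approximation of the curves in Figure 5.8.

```python
from math import comb

TARGET = 1e-6  # assumed target final error rate


def tmv_error(e, n_bits):
    """Binomial tail: at least k of N = 2^n - 1 trials are wrong."""
    n = 2**n_bits - 1
    k = (n + 1) // 2
    return sum(comb(n, m) * e**m * (1 - e) ** (n - m) for m in range(k, n + 1))


def udc_error(e, n_bits):
    """Infinite-trial bound (5.3) written as r^k / (1 + r^k), r = q / p."""
    k = (2**n_bits - 2) // 2
    r = e / (1 - e)
    return r**k / (1 + r**k)


def max_tolerable_error(scheme, n_bits, target=TARGET):
    """Bisection for the largest e in (0, 0.5) with scheme(e, n_bits) <= target."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if scheme(mid, n_bits) <= target:
            lo = mid
        else:
            hi = mid
    return lo


for n_bits in (4, 5, 6):
    e_tmv = max_tolerable_error(tmv_error, n_bits)
    e_udc = max_tolerable_error(udc_error, n_bits)
    print(n_bits, round(e_tmv, 4), round(e_udc, 4), round(e_udc / e_tmv, 2))
```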
Figure 5.9: Expected number of trials to reach saturation (decision) in a 4-bit
UP/DOWN counter
5.4.4 DFT based on Trials to Settlement
Equation (5.4) quantifies the probability of success when infinite trials are applied
to each cell. Yet, in reality, the UP/DOWN counter may settle in a value between the
end states due to the limited number of trials. This is related to the inherent error
rate of each cell. The expected number of trials needed by the UP/DOWN counter
to reach a decision state can be derived [29] as
\[ d_k = \frac{k}{p - q}\left[ 2\,\frac{1 - \left(\frac{q}{p}\right)^{k}}{1 - \left(\frac{q}{p}\right)^{2k}} - 1 \right] \tag{5.5} \]
where k = (2^n − 2)/2. This expected value for a 4-bit UP/DOWN counter is plotted
in Figure 5.9 for different initial error rates. As illustrated, the average number of trials
for reaching the saturation increases with the error rate. The exact distributions for
probability of reaching the correct value under given number of trials can be derived
using probability generating functions. Such explicit expressions for number of trials
Figure 5.10: Testing/DFT method for identifying high error-rate cells: (a) Testing/Enrollment
and (b) Operation
and probability of not reaching any state can be used to improve the design, but are
beyond the scope of this work.
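The expected number of trials in (5.5) can be tabulated directly. The sketch below is a minimal evaluation of that expression; the special case for p = q uses the standard symmetric random walk result k^2, which is the limit of (5.5) and is added here only to avoid a division by zero. The function name and the sampled error rates are illustrative.

```python
def expected_trials(p: float, n_bits: int) -> float:
    """Expected trials to saturation per (5.5), starting from state k.

    p is the per-trial probability of logic-1, with p in (0, 1].
    """
    k = (2**n_bits - 2) // 2
    q = 1.0 - p
    if abs(p - q) < 1e-9:
        return float(k * k)  # limit of (5.5) for an unbiased cell
    r = q / p
    reach_top = (1 - r**k) / (1 - r ** (2 * k))
    return k / (p - q) * (2 * reach_top - 1)


# 4-bit counter: expected trials grow with the initial error rate.
for cell_error in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(cell_error, round(expected_trials(1 - cell_error, n_bits=4), 1))
```

A noiseless cell needs exactly k = 7 trials for a 4-bit counter, and the expectation rises toward k^2 = 49 as the cell approaches no bias.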
As the average number of trials to reach the end state is related to inherent error
rate of the cell, this information can be used during the trials to filter out cells with
high error rate. For example, consider the test setup shown in Figure 5.10. The PUF
array has redundant cells so that during trials the cells with high error rate can be
discarded for operation. Hence, the aim of the test is to generate mask information
which indicates which of the cells in the PUF array should be considered for real
time operation. One simple way to achieve this is to set an empirical or analytical
threshold based on (5.5) (or using tighter bound expressions) for the number of trials
to apply to the PUF array. If a PUF cell does not reach the end state within the
target number of trials, the cell is marked invalid. As this mask does not reveal
any information about the PUF values, it can be made public. During real-time
operation, the mask is combined with the raw PUF response to filter out the
responses of cells with high error rates. The resultant identifier is used as a key. This
technique may also be combined with ECC or other post-processing techniques to
reduce the probability of error further. Bhargava et al. have proposed a similar mask
generation scheme, but one based on adjusting the supply voltage [8]. This technique can
also be used to determine whether a PUF chip is reliable enough for operation. For
example, if the mask implies that the number of reliable cells in an array is less than
the key length the design targets, the chip can be marked as unreliable and hence
rejected during testing.
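The enrollment and operation flow described above can be sketched in software. This is only a behavioral illustration under stated assumptions: each cell is reduced to a hypothetical per-read error rate, the counter saturates k = 7 steps from its start (4-bit counter), and the trial budget of 50 reads is a placeholder for the threshold derived from (5.5). None of the function names come from the original work.

```python
import random

def cell_is_valid(error_rate, k=7, max_trials=50, rng=random):
    """Return True if the up/down count reaches +k or -k within max_trials reads."""
    count = 0
    for _ in range(max_trials):
        # A read agrees with the preferred value with probability 1 - error_rate.
        count += 1 if rng.random() >= error_rate else -1
        if abs(count) >= k:
            return True
    return False

def build_mask(error_rates, k=7, max_trials=50, seed=0):
    """Enrollment: one public mask bit per cell (True = usable)."""
    rng = random.Random(seed)
    return [cell_is_valid(e, k, max_trials, rng) for e in error_rates]

def apply_mask(raw_bits, mask, key_len):
    """Operation: keep only masked-in bits; None means the chip is rejected."""
    kept = [b for b, ok in zip(raw_bits, mask) if ok]
    if len(kept) < key_len:
        return None
    return kept[:key_len]

if __name__ == "__main__":
    rates = [0.02, 0.10, 0.48, 0.05]      # hypothetical per-cell error rates
    mask = build_mask(rates)
    print(mask, apply_mask([1, 0, 1, 1], mask, key_len=3))
```

The mask depends only on whether a cell settled within the budget, not on the value it settled to, which is why it can be stored publicly.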
Thus, even though the UP/DOWN scheme differs only subtly from the TMV scheme, it
offers a significant improvement in error rate and also provides leverage to identify
high error rate cells.
5.5 UDC technique - Results and Case Studies
In this section, first we perform a case study to illustrate how yield and error rates
can be improved using redundancy with the proposed voter based design. Later, we
make comparisons in terms of area and performance with other related works.
5.5.1 Case Study: Redundancy to Improve Yield and Error Rate
In this subsection, we demonstrate how combining redundancy of SRAM PUF
cells with the proposed testing technique can be used to improve the reliability and
yield of a chip using an SRAM based PUF. We also estimate how many redundant
cells are required to guarantee high yield.
Let us assume that under a particular manufacturing process and SRAM design,
a fraction x of the cells is unreliable beyond correction by the UP/DOWN voter scheme.
We name such cells bad cells. Hence, we can assume that each cell has a probability x
of turning out to be a bad cell after manufacturing. We are interested in calculating,
for a particular value of x, how many redundant cells are required to create a stable
128-bit key with high probability. We aim at reducing the probability of not finding 128
reliable cells to $10^{-6}$. This can be calculated using a binomial counting process as
given below:
\[
P_{\text{yield loss}} = \sum_{m=N-127}^{N} \binom{N}{m} x^{m} (1-x)^{N-m} \tag{5.6}
\]
where $P_{\text{yield loss}}$ is defined as the probability of not finding 128 stable cells
in a design with N cells, and m counts the bad cells (a key cannot be formed when more
than N − 128 cells are bad). For example, if a particular manufacturing process has
x = 0.05, with just 151 cells we can guarantee that 128 stable bits can be found with high
probability ($1 - 10^{-6}$). This ensures that the expected yield loss is around 1 in a
million chips for the above example.
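The redundancy estimate can be reproduced with a short calculation. The sketch below assumes the binomial tail reading of (5.6), in which yield is lost when more than N − 128 of the N cells are bad; the function names are illustrative.

```python
from math import comb

def yield_loss(n_cells: int, x: float, key_bits: int = 128) -> float:
    """Probability that fewer than key_bits good cells exist among n_cells.

    Assumption: cells are independently bad with probability x.
    """
    max_bad = n_cells - key_bits          # largest tolerable number of bad cells
    return sum(comb(n_cells, m) * x**m * (1.0 - x)**(n_cells - m)
               for m in range(max_bad + 1, n_cells + 1))

def min_cells(x: float, key_bits: int = 128, target: float = 1e-6) -> int:
    """Smallest array size whose yield loss does not exceed the target."""
    n = key_bits
    while yield_loss(n, x, key_bits) > target:
        n += 1
    return n

if __name__ == "__main__":
    print(min_cells(0.05))                # array size for x = 0.05
    print(yield_loss(151, 0.05))          # residual yield loss at that size
```

For x = 0.05 this search returns 151 cells, which matches the example in the text.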
5.5.2 Area and Performance Comparisons
In this subsection, we present a design example along with estimates on other
metrics. The 45 nm Nangate open cell library [52] is used for all area
calculations. We use a 4-bit UP/DOWN counter for the estimation, which can correct an
initial error rate of 15% to less than the order of $10^{-6}$.
Figure 5.11: Histogram of number of times SRAM cell was read before saturation
First, we obtain the statistics on the number of trials required for a decision using
the UP/DOWN counter in the presence of noise (which is related to the analysis
presented in (5.5)). 10000 SRAM cells were instantiated with PVT variations modeled
as Gaussian threshold voltage (Vth) variations with a 3σ value of 150 mV. The voltage
supplies connected to the sources of M5 and M6 in Figure 5.5 were each used as a noise
source. 100 such outputs were generated with varying supply noise, and the outputs
were fed to a 4-bit UP/DOWN counter. The histogram of the number of outputs
each SRAM cell needed to reach saturation in the presence of supply noise is shown in
Figure 5.11. The results indicate that the majority of the cells (99.7%, or 3σ) take
≤ 50 counts, for a 4-bit UP/DOWN counter, to reach a decision. Hence, if a 4-bit
UP/DOWN counter is used, 50 counts are adequate to cover the majority of the cells.
This empirical number of trials is used in the key generation time calculations below.
To make a comparison with a fuzzy extractor based implementation, we chose the fuzzy
extractors proposed by Bösch et al. [10] for generating a stable key from SRAM PUFs,
given a 15% error probability. The authors make use of concatenated codes and show
results of using repetition codes. Based on the results presented in the work, we
estimated the areas for Golay (G23) and Reed Muller (RM) decoder implementations where
the code parameters were set to yield a final error rate of around $10^{-6}$. The initial
numbers of SRAM cells needed were 3105 and 5040 for the Golay and Reed Muller
implementations, respectively, to generate 171 stable bits. For this comparison, we
do not consider the hashing function area from the work. Considering that 5% of the total
cells are beyond correction by our scheme, our implementation requires a minimum of
180 SRAM PUF cells to get 171 stable bits. We consider 192 cells (a multiple of 16) for
ease of implementing our multiplexing solutions.
The total area of the PUF comprises the area of the PUF cells, the
UP/DOWN counter, the selection multiplexers and the clock generator/controller logic.
We calculate the areas using different multiplexing options, built using 2 : 1 muxes
from the Nangate library, for our solution. Table 5.2 tabulates all the results and shows
that our implementation improves area by 1.4× to 2.3×. The penalty
for the fuzzy extractor implementations comes from the large number of SRAM PUF cells
needed to generate reliable keys. However, for high reliability systems, our voting scheme
can be combined with fuzzy extraction and ECC based schemes to create efficient
hybrid techniques.
Table 5.2: Area estimates of proposed voting scheme using Nangate Cell Library [52]
Implementation                                   Area (µm^2)
repetition[9,1,9]; Golay[23,12,7] [10]           7648
repetition[9,1,9]; Reed Muller[16,5,8] [10]      7945
UP/DOWN counter scheme, 4 : 1 Mux                5380
UP/DOWN counter scheme, 8 : 1 Mux                4108
UP/DOWN counter scheme, 16 : 1 Mux               3472
Bhargava et al. [8] have shown significant timing improvements compared to
fuzzy extractor based works with data obtained from a test chip. Hence, we compare our
work against theirs for key generation time. The authors load mask data, termed the
reliability map, and generate the key. It is stated that a total of 286 cycles were needed
to generate 171 bits. Similar to their work, we consider loading mask information at
4 bits/cycle. Hence, for 192 bits we need 48 cycles. Taking 50 cycles as the
maximum number of times each SRAM PUF cell is read, we get a total of 200 cycles for
the 4 : 1 mux implementation to obtain all the required bits. So, for the smallest
multiplexer implementation we require a total of 248 cycles to generate 171 reliable
bits. When clocked at 2 GHz this corresponds to ≤ 0.2 µs of key generation time.
Higher order multiplexing options benefit from lower area, but there is a trade-off in
terms of time taken to generate the final key (e.g., 848 cycles for the 16 : 1 mux option).
These results indicate the efficiency of the proposed solution.
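The cycle counts quoted above follow from simple arithmetic. The sketch below assumes, as one reading of the text, that an m : 1 multiplexer lets m cells share one counter and that those cells are read one after another, each for at most 50 reads. The function and parameter names are illustrative.

```python
def key_generation_cycles(mux_ratio: int, reads_per_cell: int = 50,
                          cells: int = 192, mask_bits_per_cycle: int = 4) -> int:
    """Total cycles = mask load cycles + sequential read cycles per shared counter."""
    mask_cycles = cells // mask_bits_per_cycle      # 192 / 4 = 48
    read_cycles = reads_per_cell * mux_ratio        # cells behind one mux are serialized
    return mask_cycles + read_cycles

def key_generation_time_s(cycles: int, clock_hz: float = 2e9) -> float:
    """Convert a cycle count to seconds at the given clock frequency."""
    return cycles / clock_hz

if __name__ == "__main__":
    for ratio in (4, 8, 16):
        cycles = key_generation_cycles(ratio)
        print(ratio, cycles, key_generation_time_s(cycles))
```

With these assumptions the 4 : 1 option takes 248 cycles (about 0.124 µs at 2 GHz) and the 16 : 1 option takes 848 cycles, which mirrors the area versus time trade-off described in the text.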
5.6 Circuit Design Alternatives
In this section we discuss the circuit design alternatives to improve the reliability
of SRAM-based PUFs. We study alternative cross-coupled designs that can provide a
greater sensitivity to intrinsic process variations and thereby, enhance the mismatch
between the coupled elements. Enhancing the imbalance not only makes the design
less susceptible to noise, but also reduces the ancillary circuit resources required
to improve reliability. Hence, we can obtain significant savings over many of the
previously proposed reliability enhancement techniques. First, we describe the process
variation and noise modeling parameters that will be utilized across all the designs
considered. Next, we discuss the standard simple SRAM cell design consisting of
two cross-coupled inverters. Later, we explore alternatives to the simple cell that
have a greater sensitivity to process variation.
5.6.1 Modeling Process Variation
We model manufacturing induced process variations as random parametric variation
applied individually to each transistor in a circuit. The parameters are also random
across PUF instances. Generally, threshold voltage ($V_{TH}$) and channel length
deviations are modeled to represent process variation. The values are drawn from
a normal distribution, $N(\mu, \sigma^2)$, where the mean (µ) and standard deviation (σ)
are determined based on the technology node being considered. The geometry of a
transistor decides the susceptibility of a device to process variations, with larger
devices experiencing smaller fluctuations. In terms of threshold voltage, the mean is the
default transistor model value and the standard deviation is given by
\[
\sigma_{V_{TH}} = \frac{\sigma_{V_{TH0}}}{\sqrt{\dfrac{W \cdot L}{W_{min} \cdot L_{min}}}} \tag{5.7}
\]
where ($W_{min}$, $L_{min}$) are the minimum possible width and length of a device,
respectively, and (W, L) are the sizes used in the design. $\sigma_{V_{TH0}}$ is the
standard deviation of threshold voltage for the minimum sized device.
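The area scaling in (5.7) is easy to evaluate directly. The following is a minimal sketch; the 50 nm channel length used in the example is a placeholder chosen for illustration and is not a value taken from the text.

```python
from math import sqrt

def sigma_vth(sigma_vth0: float, w: float, l: float,
              w_min: float, l_min: float) -> float:
    """Threshold voltage standard deviation of a W x L device, following (5.7)."""
    return sigma_vth0 / sqrt((w * l) / (w_min * l_min))

if __name__ == "__main__":
    # Hypothetical example: doubling the width of a minimum sized device
    # reduces sigma by a factor of sqrt(2).
    print(sigma_vth(5e-3, w=180, l=50, w_min=90, l_min=50))
```

Only the ratio of device area to minimum area matters, so any consistent length unit can be used for the sizes.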
In this work, we instantiate the SRAM cell designs in 45 nm using NCSU FreePDK45
models [53]. Typically, for this particular technology node, a standard deviation of
53 mV for threshold voltage and 10 % variation in channel length are considered [3].
However, our goal is to analyze the design performance for a low process variation
corner and hence, we chose to model only threshold voltage changes with a low
$\sigma_{V_{TH0}}$ (5 mV). This approach amplifies the error rates and helps showcase the
designs with high process sensitivity that will perform better in mature technology nodes
with low manufacturing process variations.
5.6.2 Thermal Noise Errors
Thermal noise in a transistor occurs due to the random motion of the charge carriers
from thermal excitation and can create a random voltage fluctuation in conductors
[32, 55]. Thermal noise has no correlation among different samples across time and
has a near uniform power spectral density. In advanced CMOS technology nodes,
short channel effects [22] have increased the effect of thermal noise and significantly
impact transistor noise performance under particular conditions [71]. The magnitude
of thermal noise at any given node is determined by the device temperature and the
node capacitance. This can be represented in terms of a normal distribution with zero
mean and standard deviation given by [27]
\[
\sigma_{NOISE}(T) = \sqrt{\frac{k_B \cdot T}{C}} \tag{5.8}
\]
where $k_B$ is the Boltzmann constant, T is the absolute temperature (Kelvin) and C
is the node capacitance (Farads).
Figure 5.12: Simple SRAM with cross-coupled inverters
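To give a feel for the magnitudes involved, (5.8) can be evaluated for a representative node. The sketch below assumes a 1 fF node capacitance purely for illustration; the capacitances actually used in this work are extracted per design and are not reproduced here.

```python
from math import sqrt

K_B = 1.380649e-23  # Boltzmann constant in J/K

def thermal_noise_sigma(temp_k: float, cap_f: float) -> float:
    """Standard deviation of thermal noise voltage on a node, following (5.8)."""
    return sqrt(K_B * temp_k / cap_f)

if __name__ == "__main__":
    # Hypothetical 1 fF node at 25 degrees Celsius (298.15 K).
    print(thermal_noise_sigma(298.15, 1e-15))
```

For this assumed capacitance the result is roughly 2 mV, so smaller node capacitances lead to larger noise voltages.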
5.6.3 Simple 6T SRAM-based Weak PUF (Reference circuit)
A simple SRAM cell consists of two cross-coupled inverters and two access transistors,
as shown in Figure 5.12. The sizing of the transistors depends on various
design requirements related to speed of read/write, noise tolerances and so on. The
positive feedback due to the cross-coupling enables the SRAM to store stable states
(logic-0 or logic-1). A write operation can force the SRAM cell to a particular desired
state during operation. However, in an ideal case, the cell exists in a metastable state
on device power-up until the first write is performed. In practical scenarios, due to
manufacturing process variations, there is a mismatch between the strengths of the
two cross-coupled inverters. The power-up process can trigger the feedback loop and
the SRAM cell settles into a stable state depending on the differential mismatch of
inverter strengths. This makes an SRAM cell appealing for use in Weak PUFs.
Figure 5.13: SRAM cell modified for simulating multiple power-ups
We expect an SRAM cell to behave identically across multiple power-ups and in the
presence of common mode noise, since the circuit is differential. However, thermal
noise is differential in nature and therefore causes errors. Across multiple power-ups,
the SRAM cell can end up with different outputs and hence introduce errors when
used in Weak PUFs. This problem is particularly critical when the mismatch between
the cross-coupled inverters is not strong enough to be resistant to noise.
We modify the SRAM cell by implementing additional pre-charge/discharge circuitry,
as shown in Figure 5.13. The modification has two uses: (a) it allows the
thermal noise to be modeled with ease and (b) the circuit can be used for multiple
evaluations to reduce errors, as shown in previous works [50]. The enable signal, EN
(at logic-0), first pre-charges OUT and its complement, $\overline{OUT}$, to Vdd while
keeping the footer NMOS (M7) in the OFF state. The evaluation phase begins by setting EN
to logic-1 and, depending on process variation, the SRAM cell will settle into a
particular state. To mimic thermal noise effects, we pre-charge OUT and $\overline{OUT}$
to two different values, Vdd,1 and Vdd,2. The pre-charge voltages are obtained from a
Gaussian distribution with a mean of Vdd and standard deviation given by (5.8).
5.6.4 Study of various Cell Designs
Here, we explore alternative SRAM cell designs that are more susceptible to pro-
cess variations and can result in a greater mismatch between the two cross-coupled
elements of the SRAM cell. The salient features and primary drawbacks of each
alternative are also discussed.
5.6.4.1 Simple active loads (D1)
The first alternative we consider modifies the SRAM cell pull-up network of the
previous inverter configuration by connecting a DC source, Vbias, to the PMOS gate
terminals. This converts the PMOS transistors into active loads. The resultant
cross-coupled circuit is shown in Figure 5.14. The resulting configuration is
similar to a 4T SRAM cell [59] where the input only controls the pull-down network.
In a simple inverter configuration, the currents in both the pull-up and pull-down
networks are affected by the inputs. In a low process variation scenario, thermal
noise will affect both networks in a differential manner and has a greater chance
of introducing error. By removing the input dependence in one of the networks of
each cross-coupled element (M1 or M3 in Figure 5.14), we make the current through
that network purely dependent on the process variation. The constant source, Vbias,
is common to both cross-coupled elements in the circuit and any noise associated
with this source will become common mode noise to the circuit. The complementary
network, also affected by process variation, is connected to the input and used to
drive the feedback loop. Hence, the circuit overall becomes more process sensitive.
Figure 5.14: SRAM cell with only pull-down network and active resistive loads
One disadvantage of this approach is the high static power, as the pull-up
network is always conducting current. The footer NMOS helps with this, as the SRAM
cell can be cut off when not in use.
5.6.4.2 Stacked active loads (D2)
The next alternative we explore replaces the simple active load, discussed previously,
with a stack of such loads, as shown in Figure 5.15. Although this approach
does provide the same advantages as the previous alternative, we note that the current
in the pull-up network will be limited by the weakest active load in the stack. The
stacked structure itself does not provide any extra beneficial effect in terms of
sensitizing the SRAM cell to process variations. The additional transistors also
increase the area of the cell.
Figure 5.15: SRAM cell with stacked active loads
5.6.4.3 Parallel active loads (D3,D4)
We study the case of using parallel active loads, as shown in Figure 5.16, instead
of just a single load. Applying Kirchhoff's Current Law (KCL) at the output node,
OUT, we see that the currents through the active loads add up. Taking a case of
two active loads in parallel at the OUT node, let the currents in the two loads have
normal distributions with unique means ($\mu_1$, $\mu_2$) and variances ($\sigma_1^2$,
$\sigma_2^2$), due to process variations, across a population of SRAM cells. The addition
of such currents at OUT realizes a current with a variance that is the sum of the two
load variances ($\sigma_1^2 + \sigma_2^2$), assuming the two load currents vary
independently. This concept can be extended to multiple parallel loads. Hence, we see
that this approach, theoretically, provides a greater sensitivity to process variation
than just a single active load.
Figure 5.16: SRAM cell with parallel active loads
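The variance argument above can be written out explicitly. The short derivation below is a sketch that assumes the load currents are independent Gaussian random variables; the symbols $I_1$, $I_2$ and $I_{OUT}$ are introduced here only for illustration.

```latex
\begin{align*}
I_{OUT} &= I_1 + I_2, \qquad I_1 \sim N(\mu_1, \sigma_1^2),\; I_2 \sim N(\mu_2, \sigma_2^2) \\
\mathrm{E}[I_{OUT}] &= \mu_1 + \mu_2 \\
\mathrm{Var}[I_{OUT}] &= \sigma_1^2 + \sigma_2^2 + 2\,\mathrm{Cov}(I_1, I_2)
                       = \sigma_1^2 + \sigma_2^2 \quad \text{(independent loads)} \\
\mathrm{Var}\Big[\textstyle\sum_{i=1}^{n} I_i\Big] &= \sum_{i=1}^{n} \sigma_i^2
\end{align*}
```

If the loads were correlated, the covariance term would no longer vanish, so the simple sum should be read as the independent case only.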
Having a number of parallel PMOS loads may mean that the input connected NMOS
transistors need to be larger to effectively sink enough current
to drive one of the outputs to logic-0. Also, multiple active loads in parallel might
require complex biasing of the active loads to further reduce the current
that the pull-down network needs to handle. The multiple parallel loads and the need
for a larger pull-down transistor add to the area overhead compared to the simple
case.
Figure 5.17: SRAM cell with current mirror loads
5.6.4.4 Current Mirror loads (D5)
A current mirror has the property of providing a multiplier effect based on the
sizing of the mirror transistors. Hence, we explore replacing the simple load with
a current mirror load, as shown in Figure 5.17. The current mirror provides two
avenues to accentuate the effects of process variations on the SRAM cell: (a) the
process variation in the mirror transistors (say Mx1, Mx2 in Figure 5.17) affects the
multiplying factor across the current mirror; (b) the process variation of the bias
transistor (Mx3) of the current mirror influences the base current that will be mirrored.
These combined effects can help improve the mismatch between the cross-coupled
elements of the SRAM cell and hence reduce error.
Aside from the extra transistors added to realize the current mirrors, we have to
generate an appropriate biasing voltage. However, the significant advantages that
can be accrued by using the current mirror loads help justify the overhead.
5.7 Circuit Design Alternatives - Results and Discussion
In this section, we will first discuss the reliability performance of various alter-
natives explored in section 5.6.4 in comparison to the base case of a simple inverter
cross-coupled SRAM cell using error rate as the metric. Lastly, we comment on the
area savings with respect to previous enhancement techniques.
5.7.1 Error rate Comparison
Our primary goal is to compare the reliability of each proposed alternative with
respect to the simple SRAM cell. For this purpose, we assume room temperature
(25 ◦C) and obtain the input capacitance for the cross-coupled elements of each design,
with no process variation, and calculate the standard deviation of thermal noise to be
used for simulation according to (5.8). The thermal noise values are generated from
a normal distribution with 0 mean and the calculated standard deviation and added
to the pre-charge sources, (Vdd,1, Vdd,2), shown in Figure 5.13.
To obtain the error rate of a single instance, we produce 1000 thermal noise
value pairs and simulate the instance for all the noise values. This is equivalent to
simulating 1000 power-ups of the SRAM cell instance. The output voltages are
then classified as either logic-0 or logic-1 based on a certain threshold, $V_{dd}/2$ in
our case. The error rate for the instance is the number of times the logic output of the
cell flipped compared to the base case of no thermal noise, expressed as a percentage
of the simulated power-ups. We instantiate 10000 instances for each design with
the process variation expressed as threshold voltage variation. The threshold voltage
is varied with the base value provided by transistor models as the mean and the
standard deviation set at 5 mV, representing a low process variation design corner.
The threshold voltage variation for each transistor is further scaled according to its
size as given by (5.7).
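The bookkeeping behind the per-instance error rate can be sketched as follows. This is not the circuit simulation used in this work: `toy_cell_output` is a hypothetical behavioral stand-in that resolves the cell from the sign of the mismatch plus the differential noise, and the 1.0 V supply is an assumed value. Only the classification at Vdd/2 and the flip counting follow the procedure described above.

```python
import random

def classify(v_out: float, vdd: float) -> int:
    """Map an output voltage to logic-1 or logic-0 using a Vdd/2 threshold."""
    return 1 if v_out > vdd / 2 else 0

def error_rate_percent(base_bit: int, noisy_bits) -> float:
    """Percentage of power-ups whose logic value differs from the noiseless case."""
    flips = sum(1 for b in noisy_bits if b != base_bit)
    return 100.0 * flips / len(noisy_bits)

def toy_cell_output(mismatch_v, noise_1, noise_2, vdd=1.0):
    """Hypothetical stand-in for the simulator: the sign of the net imbalance wins."""
    return vdd if mismatch_v + (noise_1 - noise_2) > 0 else 0.0

def instance_error_rate(mismatch_v, sigma_noise, power_ups=1000, vdd=1.0, seed=0):
    """Simulate power_ups noisy evaluations of one instance and count flips."""
    rng = random.Random(seed)
    base = classify(toy_cell_output(mismatch_v, 0.0, 0.0, vdd), vdd)
    bits = []
    for _ in range(power_ups):
        n1 = rng.gauss(0.0, sigma_noise)   # noise added to the first pre-charge source
        n2 = rng.gauss(0.0, sigma_noise)   # noise added to the second pre-charge source
        bits.append(classify(toy_cell_output(mismatch_v, n1, n2, vdd), vdd))
    return error_rate_percent(base, bits)

if __name__ == "__main__":
    for mismatch in (0.0005, 0.002, 0.01):  # hypothetical mismatch values in volts
        print(mismatch, instance_error_rate(mismatch, sigma_noise=0.002))
```

In this toy model a cell whose mismatch is much larger than the noise never flips, while a nearly balanced cell approaches a 50% error rate, which is the qualitative behavior the designs in this section aim to exploit.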
The operation of the SRAM cell has been described in section 5.6.3 and only the
cross-coupled elements are varied to generate the results. The simple SRAM cell is
termed as Reference while the rest of the designs are designated D1 . . . D6 (as listed
in Table 5.3). For the parallel active loads case, we do not consider more than 3
parallel loads as the pull-down network will become overpowered and we will need
large NMOS transistors to effectively sink the current. Also, the area overhead is
large compared to the benefits obtained.
The pass transistors and the footer NMOS are considered ideal for our simulation
to sensitize the outputs to only the variations in the transistors of the proposed
alternate designs. They are sized to allow for proper operation of the circuit. The
pull-up and pull-down transistor sizes for each design configuration are listed in
Table 5.3. The bias voltage, Vbias, is set to 0 V for designs (D1, D2, D6) and to
$V_{dd}/2$ for the rest.
Results: The means of the error rate distributions obtained from simulating 10000
instances of the reference SRAM cell and of each alternative are tabulated in Table 5.3.
As the results show, all the design alternatives perform significantly better than the
simple cross-coupled inverter SRAM cell, with gains ranging from 4X to 20X. The
stacked active loads design (D2) does not provide any additional benefit over the
single active load (D1) case. Also, the parallel load alternatives (D3, D4) showcase
the improvements obtained by increasing the number of parallel transistors. However,
we have to consider the pull-down network sizing issue discussed previously when picking
the number of parallel loads for a design.
SRAM cells can be implemented in various styles and thus we cannot accurately
estimate the area of the various designs. An approximate calculation of only the
active cross-coupled elements yielded a 50 % area overhead for D4 and D5. Hence,
we will use this overhead in further discussions below.
Table 5.3: Low process variation results for various SRAM cell configurations
Configuration (Identifier)    Mean Error Rate (%)    PMOS Sizing (nm)       NMOS Sizing (nm)
Simple (Reference)            20.79                  90                     90
Single Active Load (D1)       3.14                   90                     180
2 Stacked Loads (D2)          5.27                   90                     90
2 Parallel Loads (D3)         1.56                   90                     180
3 Parallel Loads (D4)         1.04                   90                     180
Current Mirror Load (D5)      2.3                    180; M(x,y)3: 90       Mn(1,2): 180
5.7.2 Flipping point based Analysis
To further illustrate the improvements afforded by the new designs over the simple
SRAM cell, we compare the reference design against one of the alternatives, D5, in
terms of the amount of noise needed to change the base state of a cell. First, we
find the state of the cell outputs without the presence of noise. Choosing the side
that settled to logic-0, we add a DC offset to its corresponding pre-charge supply, in
steps, up to a certain threshold and simulate the cell in HSPICE. Then, we parse the
results and find the offset that caused the output states to flip from the base case and
record the result, termed as flipping voltage. We repeat the experiment for each design
across 1000 cell instances for the low process variation corner considered and a 300 mV
maximum offset threshold. The histograms of the recorded flipping voltages for both
the Reference and D5 designs are shown in Figure 5.18. The distributions highlight that,
on average, a much larger noise perturbation is needed to affect the output states
of D5 than those of the simple SRAM cell design. We should note that this experiment
offers more of a qualitative insight into the noise resilience of the circuits, as it is
extremely unlikely that the thermal noise values will exceed 10 mV for any of the
instantiated designs.
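The offset sweep described above can be outlined as a small search loop. In this sketch `simulate_cell` is a hypothetical callable standing in for one circuit simulation run at a given DC offset, and the 5 mV step is an assumed value since the text only states that the offset is applied in steps.

```python
def find_flipping_voltage(simulate_cell, base_state, step_v=0.005, max_offset_v=0.3):
    """Return the smallest swept offset that flips the cell, or None if none does.

    simulate_cell(offset_v) must return the resolved logic state (0 or 1)
    when offset_v is added to the pre-charge supply of the logic-0 side.
    """
    n_steps = int(round(max_offset_v / step_v))
    for i in range(1, n_steps + 1):
        offset = i * step_v               # avoid accumulating floating point error
        if simulate_cell(offset) != base_state:
            return offset
    return None

if __name__ == "__main__":
    # Toy cell that flips once the offset overcomes a hypothetical 42 mV imbalance.
    toy_cell = lambda offset_v: 0 if offset_v < 0.042 else 1
    print(find_flipping_voltage(toy_cell, base_state=0))
```

Cells that do not flip within the 300 mV limit return None here and would need to be binned separately when building the histogram.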
Figure 5.18: Flipping voltage comparison between Reference and D5
5.7.3 Reducing ECC Circuitry Overhead
As the inherent cell error rate is reduced considerably by using the alternative
SRAM cell designs, we can make appreciable gains by reducing the amount of
post-processing required to make a highly reliable Weak PUF implementation (error rate
∼ $10^{-6}$). To obtain an idea of the savings to be expected, we examine
the Reference and D5 designs. We use the full process variation parameters for
the 45 nm technology with σ = 53 mV threshold voltage variation and 10 % channel
length variation [3]. We instantiate 1000 different cells of each design and obtain
the error rates as described in section 5.7.1 under thermal noise. The 3σ values of
the respective error distributions obtained are considered for further analysis. This
approximation gives us the error rate that the ECC circuitry will need to handle. The
error rate distribution gave 3σ ∼ 20% for Reference and 3σ ∼ 10% for D5.
Bösch et al. [9, 10] conducted extensive studies on the implementation of ECC in
hardware for the purpose of extracting stable keys from SRAM-based Weak PUFs.
The authors explore the use of various error correction codes to correct errors in
a binary string with a certain error probability (pb). We use these works as a basis
for estimating the ECC area savings for the designs considered and assume that the
initial error probability is equal to the 3σ values obtained previously. To get the
final area values we make use of the Nangate 45 nm cell library [52]. Also, we assume
the unit SRAM cell area for the Reference design to be 1 µm^2 and for D5, 1.5 µm^2 (50 %
overhead). Table 5.4 tabulates the initial error probabilities for each design and the
correction codes used to reduce the final error probability (∼ $10^{-6}$). We have not
included the area for the final hashing function used by Bösch et al. [9, 10] as this is
common to all implemented designs. We see a 46.7 % reduction in ECC area, most of
which is due to the smaller initial number of SRAM cells required to produce
the final stable key. This showcases the fact that reducing the inherent error rate of
the SRAM cell is advantageous.
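Two small calculations help make the comparison concrete. The sketch below computes the relative area saving from the two area figures of Table 5.4, and the residual bit error after an odd-length repetition code. The majority-vote decoding model is an assumption made for illustration and is not necessarily the decoder used in [10].

```python
from math import comb

def area_reduction_percent(area_ref: float, area_new: float) -> float:
    """Relative area saving of the new design with respect to the reference."""
    return 100.0 * (area_ref - area_new) / area_ref

def repetition_error(n: int, pb: float) -> float:
    """Residual bit error of an odd-length repetition code of length n.

    Assumption: majority-vote decoding with independent bit errors of probability pb.
    """
    return sum(comb(n, i) * pb**i * (1.0 - pb)**(n - i)
               for i in range(n // 2 + 1, n + 1))

if __name__ == "__main__":
    print(area_reduction_percent(8929, 4753))   # Reference vs. D5 areas from Table 5.4
    print(repetition_error(9, 0.2))             # repetition[9,1,9] at pb = 0.2
    print(repetition_error(3, 0.1))             # repetition[3,1,3] at pb = 0.1
```

The area figures give a saving of about 46.8 %, which agrees with the quoted value up to rounding. Under the majority-vote assumption, a length-3 repetition code at pb = 0.1 leaves a residual error of the same order as a length-9 code at pb = 0.2, which is consistent with the shorter inner code listed for D5.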
Table 5.4: Area estimates of proposed circuit alternatives using Nangate Cell Library [52]
SRAM Design    Initial Error Probability (pb)    Implementation                                  Area (µm^2)
Reference      0.2                               repetition[9,1,9]; Reed Muller[32,6,16] [10]    8929
D5             0.1                               repetition[3,1,3]; Reed Muller[32,6,16] [10]    4753
5.8 Conclusion
SRAM based PUFs are popular for key/ID generation. Such keys may be unreliable
due to noise. Since the reliability of the key is of paramount importance, various
ECC and fuzzy extraction based techniques have been proposed previously. Unfortunately,
they are expensive. We present an alternative based on a new voter
design and new circuit designs, which reduce the error rate significantly. Our integrative
solution encompassing analysis, design and test techniques ensures low error rate,
high yield and simplicity of design. The proposed technique can be viewed both as
an alternative to ECC schemes in low cost systems and as a complement to ECC in
designs with high reliability requirements.
CHAPTER 6
CONCLUSION
Physically Unclonable Functions (PUFs) are emerging as promising hardware security
primitives. With extreme hardening against invasive attacks, PUFs are finding
various applications from key generation and authentication to IP protection. The
three main properties of PUFs (Uniqueness, Reliability and Unpredictability/Security)
are at the core of using PUFs as security primitives. Hence, improving these core
properties of PUFs is essential for practical applications. This dissertation focused
on solutions to problems associated with improving these properties of PUFs.
Uniqueness of PUFs is closely tied to the fabrication setup in a manufacturing
plant. Hence, post-manufacturing testing solutions are needed to ensure sufficient
uniqueness for the PUFs that are sent to market. Despite a decade of research on
PUFs, no attention has been given to the problem of post-manufacturing testing of
PUFs for Uniqueness. In this dissertation, we focused on this problem and have proposed
novel testing techniques. Our testing technique based on multi-index hashing
is fast and low cost, and is suitable for a high volume manufacturing setup.
Strong PUFs are mainly targeted for authentication applications. For success-
ful deployment, Strong PUFs must be tolerant to modeling attacks. Unfortunately,
many of the current Strong PUFs are vulnerable to machine learning based attacks.
To address this problem, we proposed a new PUF design to increase the modeling
complexity. The proposed PUF design shows promise with orders of magnitude higher
machine learning resistance against known attacks in comparison to standard Strong
PUFs.
Many of the Strong PUF designs are ad hoc, with circuit designers getting no insights
into techniques to increase the modeling complexity of Strong PUFs. To address
this, we proposed a systematic analysis methodology based on an abstract model.
We identified characteristics of non-linear building blocks of Strong PUFs that are
required to increase machine learning resistance. In this process, we discovered that
meta-ensemble machine learning algorithms are potent if Strong PUFs are not complex
enough. We also proposed fast simulation methodologies to enable simulation
of a large number of challenge-response pairs for a population of Strong PUFs. This
greatly aids the machine learning analyses. We believe that our contributions will
help both in building Strong PUFs and in identifying security issues. Future research
directions include designing efficient circuits to implement non-linear building blocks
to build robust machine learning attack-resistant Strong PUFs. Combining analog
non-linear circuits with digital non-linear functions such as S-boxes can lead to secure
Strong PUFs.
Weak PUFs have been primarily suggested for cryptographic key generation. One
primary requirement for such keys is that they are expected to be 100% reliable.
Hence, output bits generated from Weak PUFs are typically post-processed to generate
error-free keys. Unfortunately, the error-correction techniques add significant area
and thereby cost. Hence, there is great value in improving the reliability of Weak
PUFs. In this dissertation, we proposed techniques to improve the reliability of Weak
PUFs through both error-correction and circuit solutions. Our solutions show great
promise in area savings. Future research directions include fabrication and testing of
various circuit solutions to improve the reliability of Weak PUFs. In addition, integrating
them with voting, masking, and ECC to create a hybrid system can lead to a further
increase in cost efficiency.
With the Internet of Things (IoT) being touted as a major future technology revolution,
the security of the systems deploying it is of paramount importance. Low
cost security solutions will be of great necessity to enable the IoT. PUFs show promise
to provide this low cost solution. With the PUF related problems researched in this
dissertation, we believe that PUFs have great potential to emerge as a low-cost
security solution for the IoT and other similar applications.
BIBLIOGRAPHY
[1] Alkabani, Yousra, and Koushanfar, Farinaz. Active hardware metering for intel-
lectual property protection and security. In USENIX Security (2007), pp. 291–
306.
[2] Anderson, Ross. Security engineering: A guide to building dependable dis-
tributed systems. 2001.
[3] Association, Semiconductor Industry, et al. International technology roadmap
for semiconductors (itrs), 2003 edition. Hsinchu, Taiwan, Dec (2003).
[4] Bauder, DW. An anti-counterfeiting concept for currency systems. Sandia Na-
tional Labs, Albuquerque, NM, Tech. Rep. PTK-11990 (1983).
[5] Bauer, Eric, and Kohavi, Ron. An Empirical Comparison of Voting Classification
Algorithms: Bagging, Boosting, and Variants. Machine Learning 36, 1-2 (1999),
105–139.
[6] Bhargava, Mudit, Cakir, Cagla, and Mai, Khanh. Attack resistant sense amplifier
based pufs (sa-puf) with deterministic and controllable reliability of puf
responses. In Hardware-Oriented Security and Trust (HOST), 2010 IEEE International
Symposium on (2010), IEEE, pp. 106–111.
[7] Bhargava, Mudit, and Mai, Ken. A high reliability puf using hot carrier injec-
tion based response reinforcement. In Cryptographic Hardware and Embedded
Systems-CHES 2013. Springer, 2013, pp. 90–106.
[8] Bhargava, Mudit, and Mai, Ken. An efficient reliable puf-based cryptographic key
generator in 65nm cmos. In Proceedings of the conference on Design, Automation
& Test in Europe (2014), European Design and Automation Association, p. 70.
[9] Bosch, Christoph. Efficient Fuzzy Extractors for Reconfigurable Hardware. Mas-
ter’s thesis, Dept. EECS, Massachusetts Institute of Technology, 2004.
[10] Bosch, Christoph, Guajardo, Jorge, Sadeghi, Ahmad-Reza, Shokrollahi,
Jamshid, and Tuyls, Pim. Efficient Helper Data Key Extractor on FPGAs.
In Proceedings of the 10th International Workshop on Cryptographic Hardware
and Embedded Systems (Berlin, Heidelberg, 2008), CHES ’08, Springer-Verlag,
pp. 181–197.
[11] International Roadmap Committee, et al. International technology roadmap for
semiconductors. Available from: public.itrs.net (2008).
[12] Cortez, M., Hamdioui, S., van der Leest, V., Maes, R., and Schrijen, G.-J.
Adapting voltage ramp-up time for temperature noise reduction on memory-
based pufs. In Hardware-Oriented Security and Trust (HOST), 2013 IEEE In-
ternational Symposium on (June 2013), pp. 35–40.
[13] Cortez, Mafalda, Roelofs, Gijs, Hamdioui, Said, and di Natale, Giorgio. Testing
PUF-based Secure Key Storage Circuits. In Proceedings of the Conference on
Design, Automation & Test in Europe (Leuven, Belgium, 2014),
DATE ’14, European Design and Automation Association, pp. 194:1–194:6.
[14] DeJean, Gerald, and Kirovski, Darko. RF-DNA: Radio-frequency certificates of
authenticity. Springer, 2007.
[15] Delvaux, Jeroen, Gu, Dawu, Schellekens, Dries, and Verbauwhede, Ingrid. Helper
data algorithms for puf-based key generation: Overview and analysis. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems 34,
6 (2015), 889.
[16] Dodis, Yevgeniy, Ostrovsky, Rafail, Reyzin, Leonid, and Smith, Adam. Fuzzy
extractors: How to generate strong keys from biometrics and other noisy data.
SIAM J. Comput. 38, 1 (Mar. 2008), 97–139.
[17] Friedman, Jerome H. Greedy Function Approximation: A Gradient Boosting
Machine. Annals of Statistics 29 (2000), 1189–1232.
[18] Ganta, D., and Nazhandali, L. Circuit-level approach to improve the temperature
reliability of bi-stable pufs. In Quality Electronic Design (ISQED), 2014 15th
International Symposium on (March 2014), pp. 467–472.
[19] Garg, A., and Kim, T.T. Design of sram puf with improved uniformity and
reliability utilizing device aging effect. In Circuits and Systems (ISCAS), 2014
IEEE International Symposium on (June 2014), pp. 1941–1944.
[20] Gassend, Blaise, Clarke, Dwaine, Van Dijk, Marten, and Devadas, Srinivas. Con-
trolled physical random functions. In Computer Security Applications Confer-
ence, 2002. Proceedings. 18th Annual (2002), IEEE, pp. 149–160.
[21] Gassend, Blaise, Clarke, Dwaine, Van Dijk, Marten, and Devadas, Srinivas. Sil-
icon physical random functions. In Proceedings of the 9th ACM conference on
Computer and communications security (2002), ACM, pp. 148–160.
[22] Goo, Jung-Suk, Choi, Chang-Hoon, Abramo, A., Ahn, Jae-Gyung, Yu, Zhiping,
Lee, T. H., and Dutton, R. W. Physical origin of the excess thermal noise in short
channel MOSFETs. IEEE Electron Device Letters 22, 2 (Feb 2001), 101–103.
[23] Guajardo, Jorge, Škorić, Boris, Tuyls, Pim, Kumar, Sandeep S, Bel, Thijs, Blom,
Antoon HM, and Schrijen, Geert-Jan. Anti-counterfeiting, key distribution, and
key storage in an ambient world via physical unclonable functions. Information
Systems Frontiers 11, 1 (2009), 19–41.
[24] Herschel, William James. The origin of finger-printing. H. Milford, Oxford
University Press, 1916.
[25] Hofer, Maximilian, and Boehm, Christoph. An alternative to error correction for
sram-like pufs. In Proceedings of the 12th International Conference on Crypto-
graphic Hardware and Embedded Systems (Berlin, Heidelberg, 2010), CHES’10,
Springer-Verlag, pp. 335–350.
[26] Holcomb, Daniel E, Burleson, Wayne P, and Fu, Kevin. Initial sram state as a
fingerprint and source of true random numbers for rfid tags. In Proceedings of
the Conference on RFID Security (2007), vol. 7.
[27] Holcomb, Daniel E, and Fu, Kevin. Bitline PUF: building native challenge-
response PUF capability into any SRAM. In International Workshop on Cryp-
tographic Hardware and Embedded Systems (2014), Springer, pp. 510–526.
[28] Hussain, Siam U., Yellapantula, Sudha, Majzoobi, Mehrdad, and Koushanfar,
Farinaz. BIST-PUF: Online, Hardware-based Evaluation of Physically Unclon-
able Circuit Identifiers. In Proceedings of the 2014 IEEE/ACM International
Conference on Computer-Aided Design (Piscataway, NJ, USA, 2014), ICCAD
’14, IEEE Press, pp. 162–169.
[29] Ibe, Oliver C. Elements of Random Walk and Diffusion Processes. John Wiley
& Sons, 2013.
[30] Jiang, Dan, and Chong, Cheun Ngen. Anti-counterfeiting using phosphor puf.
In Anti-counterfeiting, Security and Identification, 2008. ASID 2008. 2nd Inter-
national Conference on (2008), IEEE, pp. 59–62.
[31] Joachims, Thorsten. Making large scale svm learning practical. Tech. rep.,
Universität Dortmund, 1999.
[32] Johnson, J. B. Thermal Agitation of Electricity in Conductors. Phys. Rev. 32
(Jul 1928), 97–109.
[33] Kalyanaraman, M., and Orshansky, M. Novel strong puf based on nonlinear-
ity of mosfet subthreshold operation. In Hardware-Oriented Security and Trust
(HOST), 2013 IEEE International Symposium on (June 2013), pp. 13–18.
[34] Katzenbeisser, Stefan, Kocabaş, Ünal, Rožić, Vladimir, Sadeghi, Ahmad-Reza,
Verbauwhede, Ingrid, and Wachsmann, Christian. PUFs: Myth, Fact or Busted?
A Security Evaluation of Physically Unclonable Functions (PUFs) Cast in Silicon.
In Proceedings of the 14th International Conference on Cryptographic Hardware
and Embedded Systems (Berlin, Heidelberg, 2012), CHES’12, Springer-Verlag,
pp. 283–301.
[35] Kumar, R., and Burleson, W. On design of a highly secure PUF based on non-
linear current mirrors. In Hardware-Oriented Security and Trust (HOST), 2014
IEEE International Symposium on (May 2014), pp. 38–43.
[36] Kumar, Sandeep S, Guajardo, Jorge, Maes, Roel, Schrijen, G-J, and Tuyls, Pim.
The butterfly puf protecting ip on every fpga. In Hardware-Oriented Security
and Trust, 2008. HOST 2008. IEEE International Workshop on (2008), IEEE,
pp. 67–70.
[37] Kundu, Sandip, and Sreedhar, Aswin. Nanoscale CMOS VLSI Circuits: Design
for Manufacturability, 1 ed. McGraw-Hill, Inc., New York, NY, USA, 2010.
[38] Lam, Suk Wah Louisa. Theory and application of majority vote: From Condorcet
jury theorem to pattern recognition. 2nd Int. Conf. mathematics education into
the 21st century: Mathematics for Living (2000).
[39] Layman, Paul Arthur, Chaudhry, Samir, Norman, James Gary, and Thomson,
J Ross. Electronic fingerprinting of semiconductor integrated circuits, May 18
2004. US Patent 6,738,294.
[40] Lee, J.W., Lim, D., Gassend, B., Suh, G.E., van Dijk, M., and Devadas, S.
A technique to build a secret key in integrated circuits for identification and
authentication applications. In VLSI Circuits, 2004. Digest of Technical Papers.
2004 Symposium on (June 2004), pp. 176–179.
[41] Lim, Daihyun, Lee, Jae W, Gassend, Blaise, Suh, G Edward, Van Dijk, Marten,
and Devadas, Srinivas. Extracting secret keys from integrated circuits. Very
Large Scale Integration (VLSI) Systems, IEEE Transactions on 13, 10 (2005),
1200–1205.
[42] Maes, Roel, Tuyls, Pim, and Verbauwhede, Ingrid. Low-overhead implementa-
tion of a soft decision helper data algorithm for sram pufs. In Cryptographic
Hardware and Embedded Systems-CHES 2009. Springer, 2009, pp. 332–347.
[43] Maes, Roel, Tuyls, Pim, and Verbauwhede, Ingrid. A soft decision helper data
algorithm for sram pufs. In Information Theory, 2009. ISIT 2009. IEEE Inter-
national Symposium on (2009), IEEE, pp. 2101–2105.
[44] Maes, Roel, and van der Leest, Vincent. Countering the effects of silicon aging
on sram pufs. In Hardware-Oriented Security and Trust (HOST), 2014 IEEE
International Symposium on (2014), IEEE, pp. 148–153.
[45] Maes, Roel, Van Herrewege, Anthony, and Verbauwhede, Ingrid. Pufky: A fully
functional puf-based cryptographic key generator. In Cryptographic Hardware
and Embedded Systems–CHES 2012. Springer, 2012, pp. 302–319.
[46] Maiti, Abhranil, Gunreddy, Vikash, and Schaumont, Patrick. A Systematic
Method to Evaluate and Compare the Performance of Physical Unclonable Func-
tions. In Embedded Systems Design with FPGAs, Peter Athanas, Dionisios Pnev-
matikatos, and Nicolas Sklavos, Eds. Springer New York, 2013, pp. 245–267.
[47] Majzoobi, Mehrdad, Koushanfar, Farinaz, and Potkonjak, Miodrag. Lightweight
secure pufs. In Proceedings of the 2008 IEEE/ACM International Conference on
Computer-Aided Design (2008), IEEE Press, pp. 670–673.
[48] Majzoobi, Mehrdad, Koushanfar, Farinaz, and Potkonjak, Miodrag. Testing
techniques for hardware security. In Test Conference, 2008. ITC 2008. IEEE
International (2008), IEEE, pp. 1–10.
[49] Majzoobi, Mehrdad, Koushanfar, Farinaz, and Potkonjak, Miodrag. Techniques
for design and implementation of secure reconfigurable pufs. ACM Transactions
on Reconfigurable Technology and Systems (TRETS) 2, 1 (2009), 5.
[50] Mathew, S.K., Satpathy, S.K., Anders, M.A., Kaul, H., Hsu, S.K., Agarwal, A.,
Chen, G.K., Parker, R.J., Krishnamurthy, R.K., and De, V. A 0.19pJ/b PVT-
variation-tolerant hybrid physically unclonable function circuit for 100% stable
secure key generation in 22nm CMOS. In Solid-State Circuits Conference Digest
of Technical Papers (ISSCC), 2014 IEEE International (Feb 2014), pp. 278–279.
[51] Messmer, Ellen. Black hat: Researcher claims hack of processor used to secure
Xbox 360, other products. Network World.
http://www.networkworld.com/article/2243700/security/black-hat--researcher-claims-hack-of-processor-used-to-secure-xbox-360--other-products.html.
(Visited on 12/15/2014).
[52] Nangate, Sunnyvale. Nangate Open Cell Library.
[53] NCSU, EDA. NCSU FreePDK 45nm.
[54] Norouzi, M., Punjani, A., and Fleet, D.J. Fast search in Hamming space with
multi-index hashing. In Computer Vision and Pattern Recognition (CVPR),
2012 IEEE Conference on (June 2012), pp. 3108–3115.
[55] Nyquist, H. Thermal Agitation of Electric Charge in Conductors. Phys. Rev. 32
(Jul 1928), 110–113.
[56] Pappu, Ravikanth, Recht, Ben, Taylor, Jason, and Gershenfeld, Neil. Physical
one-way functions. Science 297, 5589 (2002), 2026–2030.
[57] Pedregosa, Fabian, Varoquaux, Gaël, Gramfort, Alexandre, Michel, Vincent,
Thirion, Bertrand, Grisel, Olivier, Blondel, Mathieu, Prettenhofer, Peter, Weiss,
Ron, Dubourg, Vincent, et al. Scikit-learn: Machine learning in python. Journal
of Machine Learning Research 12, Oct (2011), 2825–2830.
[58] Perone, Christian S. Pyevolve: a python open-source framework for genetic
algorithms. ACM SIGEVOlution 4, 1 (2009), 12–20.
[59] Preston, Ronald P. Design of High-Performance Microprocessor Circuits. In
Design of High-Performance Microprocessor Circuits, Anantha P Chandrakasan,
William J Bowhill, and Frank Fox, Eds. Wiley-IEEE press, 2000, ch. 14, p. 290.
[60] Rivera, Janessa, and Goasduff, Laurence. Gartner says worldwide IT spending
on pace to reach $3.8 trillion in 2014.
http://www.gartner.com/newsroom/id/2643919. (Visited on 12/15/2014).
[61] Rivest, R. Illegitimi non carborundum, 2011. Invited keynote talk.
[62] Rührmair, Ulrich, Devadas, Srinivas, and Koushanfar, Farinaz. Security based
on physical unclonability and disorder. In Introduction to Hardware Security and
Trust. Springer, 2012, pp. 65–102.
[63] Rührmair, Ulrich, Sehnke, Frank, Sölter, Jan, Dror, Gideon, Devadas, Srinivas,
and Schmidhuber, Jürgen. Modeling Attacks on Physical Unclonable Functions.
In Proceedings of the 17th ACM Conference on Computer and Communications
Security (New York, NY, USA, 2010), CCS ’10, ACM, pp. 237–249.
[64] Schapire, Robert E. The Boosting Approach to Machine Learning: An Overview.
In Nonlinear Estimation and Classification, David D. Denison, Mark H. Hansen,
Christopher C. Holmes, Bani Mallick, and Bin Yu, Eds., vol. 171 of Lecture Notes
in Statistics. Springer New York, 2003, pp. 149–171.
[65] Schrijen, Geert-Jan, and van der Leest, Vincent. Comparative analysis of sram
memories used as puf primitives. In Proceedings of the Conference on Design,
Automation and Test in Europe (2012), EDA Consortium, pp. 1319–1324.
[66] Sincerbox, Glenn T. Counterfeit deterrent features for the next-generation cur-
rency design, vol. 472. National Academies Press, 1993.
[67] Suh, G Edward, O’Donnell, Charles W, and Devadas, Srinivas. Aegis: A single-
chip secure processor. Information Security Technical Report 10, 2 (2005), 63–73.
[68] Suh, G.E., and Devadas, S. Physical unclonable functions for device authentica-
tion and secret key generation. In Design Automation Conference, 2007. DAC
’07. 44th ACM/IEEE (June 2007), pp. 9–14.
[69] Suresh, V.B., and Burleson, W.P. Robust metastability-based trng design in
nanometer cmos with sub-vdd pre-charge and hybrid self-calibration. In Qual-
ity Electronic Design (ISQED), 2012 13th International Symposium on (March
2012), pp. 298–305.
[70] The HDF Group. Hierarchical Data Format, version 5, 1997-2016.
http://www.hdfgroup.org/HDF5/.
[71] Triantis, Dimitris P, Birbas, Alexios N, and Kondis, D. Thermal noise modeling
for short-channel MOSFETs. IEEE Transactions on Electron Devices 43, 11
(1996), 1950–1955.
[72] Tuyls, Pim, Schrijen, Geert-Jan, Škorić, Boris, Van Geloven, Jan, Verhaegh,
Nynke, and Wolters, Rob. Read-proof hardware from protective coatings. In
Cryptographic Hardware and Embedded Systems-CHES 2006. Springer, 2006,
pp. 369–383.
[73] Tuyls, Pim, and Škorić, Boris. Strong authentication with physical unclonable
functions. In Security, Privacy, and Trust in Modern Data Management.
Springer, 2007, pp. 133–148.
[74] Van der Leest, Vincent, Preneel, Bart, and Van der Sluis, Erik. Soft decision error
correction for compact memory-based pufs using a single enrollment. In Crypto-
graphic Hardware and Embedded Systems–CHES 2012. Springer, 2012, pp. 268–
282.
[75] Vijayakumar, A., and Kundu, S. A novel modeling attack resistant PUF design
based on non-linear voltage transfer characteristics. In Design, Automation Test
in Europe Conference Exhibition (DATE), 2015 (March 2015), pp. 653–658.
[76] Xiao, Kan, Rahman, M.T., Forte, D., Huang, Yu, Su, Mei, and Tehranipoor,
M. Bit selection algorithm suitable for high-volume production of sram-puf. In
Hardware-Oriented Security and Trust (HOST), 2014 IEEE International Sym-
posium on (May 2014), pp. 101–106.
[77] Zhao, W., and Cao, Y. New generation of predictive technology model for sub-
45 nm early design exploration. IEEE Transactions on Electron Devices 53, 11
(Nov 2006), 2816–2823.
