Cryptographic application of physical unclonable functions (PUFs) by Guo, Yunxi
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 
2018 
Cryptographic application of physical unclonable functions 
(PUFs) 
Yunxi Guo 
Iowa State University 
Recommended Citation 
Guo, Yunxi, "Cryptographic application of physical unclonable functions (PUFs)" (2018). Graduate Theses 
and Dissertations. 17456. 
https://lib.dr.iastate.edu/etd/17456 
Cryptographic application of physical unclonable functions (PUFs)
by
Yunxi Guo
A dissertation submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Major: Computer Engineering
Program of Study Committee:
Akhilesh Tyagi, Major Professor
Chris Chong-Nuen Chu
Tom Daniels
Yong Guan
Shashi K. Gadia
The student author, whose presentation of the scholarship herein was approved by the program of
study committee, is solely responsible for the content of this dissertation. The Graduate College
will ensure this dissertation is globally accessible and will not permit alterations after a degree is
conferred.
Iowa State University
Ames, Iowa
2019
Copyright © Yunxi Guo, 2019. All rights reserved.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER 1. OVERVIEW
CHAPTER 2. REVIEW OF LITERATURE
  2.1 Public-Key Cryptography
  2.2 Physical Unclonable Function
  2.3 Arbiter PUF
  2.4 Ring Oscillator PUF
  2.5 Pelgrom’s Mismatch Model
CHAPTER 3. METHODS AND PROCEDURES
  3.1 BS-PUF based Encryption
    3.1.1 Communication Protocol
    3.1.2 Encryption Protocol
    3.1.3 Single Block Encryption
    3.1.4 Barrel shifter PUF design
    3.1.5 Circuit Implementation
  3.2 General PUF-Based Encryption
    3.2.1 PUF Selection
    3.2.2 Encryption Algorithm
    3.2.3 Hybrid Key encryption algorithm
    3.2.4 Entropy Estimation
    3.2.5 Speed Evaluation
  3.3 Variation Improvement
    3.3.1 Mismatch Enhancement
    3.3.2 Layout overhead
  3.4 Uniqueness Improvement
    3.4.1 Entropy of multi-block pattern
    3.4.2 Multi-Block APUF
CHAPTER 4. RESULTS
  4.1 BS-PUF performance
    4.1.1 Inter-chip Variability
    4.1.2 Inter-chip Uniqueness
    4.1.3 Intra-chip Reproducibility
    4.1.4 Randomness
    4.1.5 Commutativity
  4.2 Encryption Performance Evaluation
    4.2.1 Modeling Attack
  4.3 General PUF-Based Public key Encryption
    4.3.1 Brute Force attack
    4.3.2 Modeling Attack
    4.3.3 Side-channel Attack
    4.3.4 Speed verification
    4.3.5 Overall Time Cost
  4.4 Asymmetric Layout
  4.5 Multi-Block APUF
    4.5.1 Block Bias
    4.5.2 Reliability Test
    4.5.3 Uniqueness test
    4.5.4 Reproducibility test
    4.5.5 Area & Power
CHAPTER 5. SUMMARY AND DISCUSSION
BIBLIOGRAPHY
LIST OF TABLES
Table 4.1 INTRA-CHIP HD OF BS-PUFs LSB (HD: Hamming distance; %: percentage of bit-stream pairs with certain HD)
Table 4.2 INTRA-CHIP HD OF BS-PUFs 2nd LSB (HD: Hamming distance; %: percentage of bit-stream pairs with certain HD)
Table 4.3 NIST TEST RESULTS OF LSB RESPONSE
Table 4.4 NIST TEST RESULTS OF 2nd LSB RESPONSE
Table 4.5 LR ON LSB WITH 6 AND 8 STAGES BS-PUFs
Table 4.6 LR ON 2ND LSB WITH 6 AND 8 STAGES BS-PUFs
Table 4.7 P(1) of Different kinds of PUFs
Table 4.8 Entropy of Different kinds of PUFs
Table 4.9 Communication Overhead of TCP/IP
Table 4.10 Block bias under different challenge input
Table 4.11 Inter-die HD of different APUF structures
Table 4.12 Area and power cost of different APUFs
LIST OF FIGURES
Figure 2.1 Example of RSA algorithm, where public key (e) is 5; private key (d) is 11; the modulus is 14, and the message sent is 2.
Figure 2.2 (a) An Arbiter PUF composed of several multiplexers and an arbiter. (b) Multiplexers are implemented using NAND gates for performance evaluations.
Figure 2.3 Conventional RO-PUF.
Figure 3.1 Encryption protocol with message encryption based on commutative PUFs $f_{Bob}$ and $f_{Alice}$.
Figure 3.2 Cipher block chaining methods are used to encrypt (a) and decrypt (b) messages. This prevents the adversary from identifying plaintext patterns; it ensures identical blocks of plaintext encrypt to different ciphertext.
Figure 3.3 (1) Bob applies $f_{Bob}$ and (2) sends the result to Alice. (3) Alice applies $f_{Alice}$ and (4) sends the result to Bob. (5) Bob applies $f^{-1}_{Bob}$ and (6) returns the result to Alice. (7) Alice applies $f^{-1}_{Alice}$ hoping to recover the message. Unfortunately, $f^{-1}$ does not subtract delay from the correct bit in (5), (7); the correct message is not received by Alice. This scheme fails to be commutative.
Figure 3.4 Invertible and Commutative PUF protocol: $PUF_1$ ($f_{Bob}$) and $PUF_2$ ($f_{Alice}$) illustrate the PUF composition and how barrel shifter PUF is used for encryption and decryption processes. Assume both $PUF_1$ and $PUF_2$ are two stages BS-PUFs, $key_1$ ($PUF_1$) is (1, 0), $key_2$ ($PUF_2$) is (0, 1). For $PUF_1$, bit $x_0$ ($x_1$) goes to output bit position $y_1$ ($y_2$). The encrypted bit output at $y_1$ ($y_2$) is $x_0 \oplus D(0, 1)_m$ ($x_1 \oplus D(1, 2)_m$). $D(i, i')_m$ is the mth least significant bit of the delay from input bit $i$ to the output bit $i'$. Permutator is added after each PUF to shift each bit back to its original position after encryption.
Figure 3.5 Sharing a key allows both parties to perform the same permutation. This ensures the delay is subtracted from the correct bit when performing the inverse $f^{-1}_{PUF_l}$ for $l = 1, 2$. Entropy is added into public message by bit shifting.
Figure 3.6 Block diagram of the delay test circuit with two propagation examples. When $key_0 = 0$ and $key_1 = 1$, $Input_0$ passes through the light grey path. There is one bit shift at the first level and no shift at second level, $Input_0 \to Output_1$. When $key_0 = 1$ and $key_1 = 0$, $Input_0$ passes through the dark grey path. There is no shift at the first level and there is a two bit shift at the second level, $Input_0 \to Output_2$.
Figure 3.7 Schematic of 1-bit input logic. Each input bit is controlled by an input logic unit.
Figure 3.8 Shift Unit of barrel shifter. If key = 1 (key = 0), N1/P1 is on (N2/P2 is off), then output equals Input A; otherwise, output equals Input B.
Figure 3.9 (a) D Flip-Flop: triggered by rising edge. The output, Q, is high when there is a rising edge at input, in. (b) Edge Detector: the output reflects a transition at the input. (c) Positive edge trigger generator: produces a pulse in response to a positive edge at the input, in. (d) Output Logic: captures the path delay; this is provided to entanglement logic.
Figure 3.10 The path delay capture unit tests for and stores the path delay. The edge detector detects an output transition; S equal to output will not be detected. Consequently, the transmission path receives S and its complement successively; a transition at output is guaranteed.
Figure 3.11 Proposed PUF-based public-key encryption algorithm. (mod n) is omitted for all in-flight messages.
Figure 3.12 Challenge (public key) and response (private key) of delay-based PUF. No correlation exists between private keys or between public and private keys.
Figure 3.13 PUF-based public-key encryption algorithm supporting digital signing. ($P_i$ represents $PUF_i$, $K_i^+$ is the public key and $K_i^-$ is the complement key)
Figure 3.14 Sample set for HD calculation (M). It contains m n-bit PUF responses. M[i][j] represents the jth bit of the ith response. M(:, 2) is the 2nd bit of all responses.
Figure 3.15 (a) Symmetric layout of 8-stage BS-PUF with 2 rows. (b) Asymmetric layout of 8-stage BS-PUF with 2 rows. Block $i$ denotes a shift unit along the first row. $i'$ is the corresponding second row shift unit.
Figure 3.16 Balanced inner-die systematic bias is achieved through alternate bit assignment. Odd challenge bits are arranged as top path in $A_l$ and bottom path in $A_r$. Even bits are arranged as bottom path in $A_l$ and the top path in $A_r$.
Figure 3.17 Gate level implementation of a MUX (delay unit)
Figure 3.18 Stick diagram of a MUX cell. The layout is minimized according to CMOS layout rules.
Figure 3.19 An example of 2-1 DAPUF
Figure 3.20 Entropy versus block bias in multi-block pattern
Figure 3.21 (a) 256-stage 2-block Multi-Block APUF (b) 256-stage 4-block Multi-Block APUF
Figure 4.1 Histogram for simulated forward path ($x_0 \mapsto y_{16}$) delay distribution. 25% inter-chip variability is shown.
Figure 4.2 Percentage of bit flips under temperature variation. Flip rates demonstrate signal-to-noise ratio (SNR) under different temperatures. Flip rates of LSB are shown in blue. Flip rates of 2nd LSB are shown in green.
Figure 4.3 Percentage of bit flips under voltage variation. Flip rates of LSB are shown in blue. Flip rates of 2nd LSB are shown in green.
Figure 4.4 Error rate of P(1) estimation. The estimation error rate decreases dramatically with m and grows larger when P(1) is close to 50%.
Figure 4.5 Blinding technique
Figure 4.6 Time cost of RSA encryption/decryption with different modulus length
Figure 4.7 Time cost of RSA encryption/decryption with different modulus length
Figure 4.8 Overall time cost of RSA and proposed PUF based asymmetric cryptosystem. (a) PUF based asymmetric encryption (b) PUF based digital signature.
Figure 4.9 (a) Probability density of delay difference between two racing paths in 64-stage APUF. (b) Distribution of delay difference between two racing paths in 128-stage APUF.
Figure 4.10 Percentage of valid bits with symmetric and asymmetric layout.
Figure 4.11 Valid rate in traditional APUF with arbiter operating under different voltage supplies. n is the length of selector chain.
Figure 4.12 Hamming distance distribution of 256-stage APUFs and 128-stage 2-block MBAPUFs
Figure 4.13 Bit flip rate of different stages Arbiter PUFs under temperature variation
ACKNOWLEDGEMENTS
I would like to express my gratitude to my supervisor and mentor, Dr. Akhilesh Tyagi, who
welcomed me into his research group in 2015. He has taught me to have confidence in my research
and work, and has supported my participation in activities such as conferences and courses to
expand the breadth and depth of my knowledge. He has helped me to always see the big picture
and keep my goals in mind while focusing on trying to solve specific problems.
Besides my advisor, I would like to thank the rest of my thesis committee: Prof. Yong Guan,
Prof. Chris Chu, Dr. Tom Daniels, and Prof. Shashi Gadia, not only for their insightful comments
and encouragement, but also for the hard questions which led me to widen my research from various
perspectives.
I thank my fellow labmates in the hardware security research group: Timothy Dee, Hala Hamadeh,
Ravikumar Selvam and Ananda Biswas, for the stimulating discussions, for the sleepless nights we
spent working together before deadlines, and for all the fun we have had in the last three years. I
also want to thank the other students and researchers I have worked alongside in the lab over the
past years, who have encouraged me and created an enthusiastic work environment. I also thank
my friends at Iowa State University: Xinyao Li, Songtao Lu, Yifei Li and Xu Zhang.
My sincere thanks also go to Ergin Seyfe and Brian Cho, for offering me summer internship
opportunities in their groups and leading me to work on diverse and exciting projects.
Last but not least, I would like to thank my family: my parents Chun Guo and Yan Zhang,
for giving birth to me in the first place and supporting me spiritually throughout my life.
ABSTRACT
Physical Unclonable Functions (PUFs) are circuits designed to extract physical randomness
from the underlying circuit. This randomness depends on the manufacturing process. It differs
for each device, enabling chip-level authentication and key generation applications. This thesis
investigates PUF-based encryption and low-power PUFs.
First, we present a protocol utilizing a PUF for secure data transmission. Each party has a PUF
used for encryption and decryption; this is facilitated by constraining the PUF to be commutative.
This framework is evaluated with a primitive permutation network - a barrel shifter. Physical
randomness is derived from the delay of different shift paths. Barrel shifter (BS) PUF captures
the delay of different shift paths. This delay is entangled with message bits before they are sent
across an insecure channel. BS-PUF is implemented using transmission gates; their characteristics
ensure same-chip physical commutativity, a necessary property of PUFs designed for encryption.
Post-layout simulations of a common centroid layout 8-level barrel shifter in 0.13 µm technology
assess uniqueness, stability and randomness properties. BS-PUFs pass all selected NIST statistical
randomness tests. Stability under environmental variation is shown to be similar to that of Ring
Oscillator (RO) PUFs. Logistic regression on 100,000 plaintext-ciphertext pairs (PCPs) failed to
model BS-PUF behavior. We then generalize this encryption protocol to work with PUFs other than
BS-PUFs.
We further explore low-power techniques for building PUFs. An asymmetric layout improves
unit path delay variation by as much as 73.2%, and the uniqueness problem introduced by the
asymmetric layout is shown to be solvable through a Multi-Block entanglement pattern. By adopting
these two techniques, the power and area consumption of a PUF can be reduced by as much as
44.29% and 39.7%, respectively.
CHAPTER 1. OVERVIEW
Encryption/decryption algorithms form the backbone of modern public key infrastructure, which
supports a broad set of activities such as e-commerce and digital currency. Mathematical
cryptosystems such as RSA [Takagi (1998)] can take millions of clock cycles. Even symmetric
encryption/decryption through AES takes 10-20 clock cycles. Moreover, even though their security
is predicated on a hard mathematical problem such as prime number factoring, a mathematical
model exists for an adversary [Boneh et al. (1999)]. Physical unclonable functions (PUFs) draw on
the physical randomness of silicon manufacturing, with the potential appeal of physical functions
that cannot be modeled. They have been used to generate unique physical identities and to seed
key generation. Such PUFs offer both inter-chip variability and same-chip reproducibility. The
variability ensures that distinct devices produce different outputs given the same input.
Reproducibility, on the other hand, is valuable for predictability and determinism in device
authentication. PUFs are designed to extract physical randomness from the underlying circuit and
have the advantage of generating chip-specific signatures. They are widely used in device-level
authentication and key generation applications [Suh and Devadas (2007)]. PUFs are hard for
malicious attackers to model and replicate; even the chip manufacturer cannot perfectly control the
process variation of a single chip. As a result, PUFs based on complex physical systems provide
significantly higher physical security than traditional systems, which rely on storing secrets in
nonvolatile memory. In addition, special manufacturing processes are not required to produce PUF
devices. This advantage makes PUF devices a cost-effective and reliable alternative to mathematical
randomness sources.
So far, the use of PUFs in cryptography has been somewhat limited, the most common uses being
key generation and random number generation. Chen et al. used analog circuits to support
cryptography with some elements of PUF-like randomness [Chen et al. (2009)]. Choi et al. deployed
a variant of the arbiter PUF to replace symmetric encryption in the RFID domain as an
authentication mechanism [Choi et al. (2010)]. This was based on the earlier work of Suh et al. that
deployed PUFs for anti-counterfeiting in RFIDs [Devadas et al. (2008)]. Che et al. described another
authentication protocol based on PUFs [Che et al. (2015)]. [Urbi Chatterjee and Mukhopadhyay (2016)]
developed an IoT communication protocol based on PUFs. [Kleber et al. (2015)] developed a code
encryption engine based on PUFs for supporting a secure execution environment similar to AEGIS
[Suh et al. (2005)]. The key difference between a processor secure execution environment and general
encryption is that in the former the processor platform is both the source and the destination
of the communication: both the sender and the receiver have access to the same physical PUF on
the same platform. For general encryption, this assumption is violated, because the sender and the
receiver possess distinct PUFs. We show a general communication protocol based on commutative
PUFs.
The key contributions of this work are: (1) We explore several PUF-based information exchange
protocols for encrypting and decrypting information and identify the best protocol through
analysis. (2) Because this protocol requires PUFs to be physically commutative, we develop a
framework for physically commutative PUFs based on permutation networks. (3) We evaluate this
framework with a primitive permutation network built from barrel shifters (Barrel Shifter PUFs).
Barrel shifters have symmetric input-to-output path delays. Hence, if two different paths within
the same barrel shifter generate uncorrelated random delays, this gives a strong lower bound for
randomness in general permutation networks with more skewed path delays. (4) The results show
good reproducibility for the same path on the same chip; good differentiation between the same
path on different chips and between different paths on the same chip; delays within 1-bit accuracy
for logic-high and logic-low propagation through the same path, which demonstrates physical
commutativity; and good pseudo-random number generator properties for the delay. (5) We develop
a general PUF-based public-key encryption protocol and evaluate its performance under different
attacks.
The performance of the proposed commutative PUFs (BS-PUFs) is limited by the amount of process
variation. Although various attempts have been made to improve the random variation of PUFs
through circuit design, little work has been done to enhance manufacturing mismatch through
transistor-level layout. In this work, we propose an asymmetric layout strategy for improving
inter-die transistor-level mismatch of a PUF circuit in an area-neutral manner. The mechanism
separates delay paths in BS-PUFs by an appropriate distance. The corresponding variation
enhancement is quantified according to Pelgrom’s mismatch model. The asymmetric layout increases
the amount of variation, but it also introduces systematic bias and therefore harms the uniqueness
of the PUFs. This means that a specific batch of PUFs will give similar responses to the same
challenge. Attackers could utilize this to predict one PUF’s response with a model built from the
challenge-response pairs (CRPs) of other PUFs in the same batch.
By now, substantial effort has been devoted to improving the uniqueness of delay-based
PUFs, especially APUFs. Machida et al. proposed a Double APUF (DAPUF) structure which
improves uniqueness by XORing responses of duplicate selector chains [Machida et al. (2015)].
[Machida et al. (2014)] provided further analysis of DAPUF and its correct mode of operation.
Fruhashi et al. developed a variant of the arbiter circuit with delay-time measurement for entropy
enhancement [Fruhashi et al. (2011)]. Kumar et al. developed another unique and reliable APUF
based on a current-starved inverter chain [Kumar et al. (2011)]. Gu et al. proposed a high-entropy
arbiter-based FPGA identification generator utilizing an array of N 1-bit APUFs [Gu et al. (2014)].
Both [Machida et al. (2015)] and [Gu et al. (2014)] showed that the impact of asymmetric bias
can be eliminated by combining the responses of multiple duplicate PUF blocks. This kind of
design efficiently improves uniqueness and could also be applied to BS-PUFs. Its only
disadvantage is that it consumes considerable area and power. To solve the uniqueness problem
created by the asymmetric layout, we introduce a Multi-Block design pattern which achieves entropy
enhancement without doubling area. In this design, a delay path is divided into multiple blocks to
avoid accumulation of systematic variation. Different supply voltages are chosen for the delay paths
and the arbiter/counter circuit to overcome reliability problems produced by short paths. Its effect
is assessed on multi-stage APUFs. Cadence Monte Carlo sampling shows that the proposed Multi-Block
(MB-) APUFs provide inter-chip uniqueness and reproducibility similar to the Double APUF (DAPUF);
compared to a DAPUF with similar uniqueness performance, MBAPUFs decrease area and
power consumption by a factor of 2. The Multi-Block pattern can also be applied to BS-PUFs.
CHAPTER 2. REVIEW OF LITERATURE
2.1 Public-Key Cryptography
Public-key cryptography [Bellare and Rogaway (1994)], also called asymmetric cryptography,
uses public and private keys to achieve encryption and decryption. The private and public keys
are paired together but not identical. Sending a message requires the sender to encrypt it using
the receiver’s public key. The receiver uses their private key to decrypt the message. Certificates
provide a secure method for disclosing public keys. Certificates signed using the private key of a
Trusted Third Party (TTP) are frequently used as a root of trust. The TTP is the only
entity capable of generating valid certificates. The TTP's public key can be used to verify a
certificate, ensuring public key integrity.
RSA is a widely used asymmetric cryptosystem enabling secure data transmission. Each party
has a public key, e, and a private key, d, where d is the modular multiplicative inverse of e modulo
λ(n), the Carmichael function of the modulus n. Therefore, a message m encrypted with e or d can
be decrypted with the other. Encryption with d is used to sign certificates in the case of digital
signatures. Encryption with a receiver's public key provides a method to encrypt messages sent to
the receiver. The sender encrypts the message as $c = m^e \pmod{n}$. After receiving c, the receiver
can recover m by computing $c^d = (m^e)^d = m \pmod{n}$.
An example of the RSA algorithm is shown in Fig. 2.1. In this example, we select two prime numbers
p = 2 and q = 7, giving the modulus n = p × q = 14 and λ(n) = lcm(p − 1, q − 1) = lcm(1, 6) = 6.
The value 5 < λ(n) = 6 is selected as the receiver's public key e. The corresponding private key d
must satisfy d · 5 (mod 6) = 1. The smallest positive integer meeting this requirement is 5 itself,
which would coincide with e, so the next solution, d = 11, is used. Thus, the sender holds
(e, n) = (5, 14) and the receiver holds (d, n) = (11, 14).
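The arithmetic of this toy example can be checked with a few lines of Python. This is only a minimal sketch of textbook RSA using the numbers from the example; the function names `rsa_encrypt` and `rsa_decrypt` are illustrative and not part of the dissertation, and no padding or other practical safeguards are modeled.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple, used for the Carmichael value of n = p * q."""
    return a * b // gcd(a, b)


# Parameters taken from the worked example in the text.
p, q = 2, 7
n = p * q                   # modulus, 14
lam = lcm(p - 1, q - 1)     # lambda(n) = lcm(1, 6) = 6
e, d = 5, 11                # public and private exponents

# d must be a modular inverse of e modulo lambda(n).
assert (d * e) % lam == 1


def rsa_encrypt(m: int) -> int:
    """Textbook RSA encryption: c = m^e mod n."""
    return pow(m, e, n)


def rsa_decrypt(c: int) -> int:
    """Textbook RSA decryption: m = c^d mod n."""
    return pow(c, d, n)


ciphertext = rsa_encrypt(2)          # 2^5 mod 14 = 4
recovered = rsa_decrypt(ciphertext)  # 4^11 mod 14 = 2
print(ciphertext, recovered)
```

The three-argument form of `pow` performs modular exponentiation directly, so the intermediate powers never grow beyond the modulus.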
Figure 2.1 Example of RSA algorithm, where public key (e) is 5; private key (d) is 11;
the modulus is 14, and the message sent is 2. The sender encrypts with the public key (5, 14),
$2^5 \bmod 14 = 4$; the receiver decrypts with the private key (11, 14), $4^{11} \bmod 14 = 2$.
Four conclusions can be drawn from this key generation algorithm: (1) d · e = 1 (mod λ(n));
(2) λ(n) can be calculated from p and q; (3) p × q = n; (4) the modulus n is public. Since e is the
public key, the attacker can obtain the private key d according to (1) as long as λ(n) is known. The
most straightforward brute-force attack on a private key d is to try all prime factor pairs (p, q) such
that p × q = n. This allows one to calculate d on the basis of (2) and (1). The super-polynomial
complexity of this problem is the basis for RSA’s security guarantees.
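The brute-force attack outlined above can be sketched as follows. This is an illustrative sketch only: it uses naive trial division, which is feasible solely for toy moduli, and the helper names `factor_semiprime` and `recover_private_exponent` are hypothetical. The modular inverse via `pow(e, -1, lam)` assumes Python 3.8 or later.

```python
from math import gcd


def factor_semiprime(n: int) -> tuple[int, int]:
    """Find (p, q) with p * q == n by trial division (toy-sized n only)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError("n has no nontrivial factorization")


def recover_private_exponent(e: int, n: int) -> int:
    """Recover a valid private exponent from the public key (e, n).

    Step (3): factor n into p and q.
    Step (2): compute lambda(n) = lcm(p - 1, q - 1).
    Step (1): solve d * e = 1 (mod lambda(n)).
    """
    p, q = factor_semiprime(n)
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
    return pow(e, -1, lam)  # smallest positive modular inverse


# Attack the toy key from the example: public key (5, 14).
d_found = recover_private_exponent(5, 14)
ciphertext = pow(2, 5, 14)
print(d_found, pow(ciphertext, d_found, 14))
```

For the toy key the recovered exponent is the smallest inverse of e modulo λ(n), which need not equal the published d = 11; any exponent congruent to it modulo λ(n) decrypts correctly.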
The relationship between e and d in RSA provides attack vectors. Several different sieve
algorithms (e.g. quadratic sieve [Davis and Holdridge (1984)], number field sieve [Cavallar et al. (1999)])
have been applied to modulus factorization. According to reported experimental results [Weisstein
(2003)], even a 576-bit modulus can be successfully factored using the general number field sieve
algorithm. Securing message content therefore requires increasing modulus and key lengths, which
in turn increases encryption computation time and power consumption. Our proposed PUF-based
asymmetric encryption protocol enforces no mathematical relationship between e and d. Pseudo-random
PUF responses are part of the private keys, making brute-force attacks more difficult than in RSA.
2.2 Physical Unclonable Function
A physical unclonable function, or PUF, is a digital fingerprint that serves as a unique identity
for a semiconductor device such as a microprocessor. PUFs are based on physical variations which
occur naturally during semiconductor manufacturing, and which make it possible to differentiate
between otherwise identical semiconductors. Today, PUFs are usually implemented in integrated
circuits and are typically used in applications with high security requirements.
PUFs’ performance metrics consist of three properties [Maes (2016)]: uniqueness, unpredictability
and reproducibility.
• Uniqueness is the independence of responses to the same challenge on different PUFs. Uniqueness
is usually assessed by the Hamming distance between the responses $R_i$, $R_j$ of two chips to a
challenge C, averaged over k chips (inter-chip HD). The ideal value of inter-chip HD is 50%.
• Unpredictability is the uncertainty of responses to new challenges; observing the same and/or
similar PUFs' responses to other challenges yields no increase in certainty.
• Reproducibility corresponds to the capability of generating similar responses to the same
challenge on the same PUF under different environments. Reproducibility can be evaluated
by the Hamming distance between the responses $R_i$, $R_i'$ of chip i at different operating
conditions, averaged over m samples (intra-chip HD).
All three performance metrics rely on the process variation present in the PUF circuit.
Uniqueness is driven by an unbiased distribution of process variation. Unpredictability and
reproducibility benefit from large process variation. Enhancing process variation per circuit block
attains good unpredictability and reproducibility with fewer transistors.
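The inter-chip and intra-chip Hamming-distance metrics described above can be computed as in the following sketch. It assumes responses are equal-length lists of bits and that inter-chip HD is averaged over all chip pairs; the function names are illustrative and not taken from the dissertation.

```python
from itertools import combinations


def hamming_fraction(a: list[int], b: list[int]) -> float:
    """Fraction of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def inter_chip_hd(responses: list[list[int]]) -> float:
    """Uniqueness: mean fractional HD over all pairs of chips.

    `responses` holds one response per chip, all to the same challenge.
    The ideal value is 0.5.
    """
    pairs = list(combinations(responses, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def intra_chip_hd(reference: list[int], samples: list[list[int]]) -> float:
    """Reproducibility: mean fractional HD between a reference response and
    re-measurements of the same chip under other operating conditions.
    The ideal value is 0.0.
    """
    return sum(hamming_fraction(reference, s) for s in samples) / len(samples)


# Two chips answering the same challenge with 4-bit responses.
uniqueness = inter_chip_hd([[0, 0, 1, 1], [0, 1, 0, 1]])
# One chip measured twice more; one bit flips in the second re-measurement.
reproducibility = intra_chip_hd([0, 0, 1, 1], [[0, 0, 1, 1], [0, 0, 1, 0]])
print(uniqueness, reproducibility)
```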
2.3 Arbiter PUF
One of the most reliable and frequently used delay-based PUFs is the Arbiter PUF (APUF) [Suh
and Devadas (2007)]. The APUF is composed of two identical delay paths and an arbiter circuit;
these are usually implemented using a set of multiplexers and edge-triggered flip-flops respectively.
The arbiter circuit detects the signal arrival sequence (top or bottom signal in Fig. 2.2(a)). The
propagation paths are determined by a set of external inputs called a challenge (c[1], c[2], ...c[n]).
At each stage, the two signals follow straight paths when the challenge bit c[i] = 1. c[i] = 0 causes
Figure 2.2 (a) An Arbiter PUF composed of several multiplexers and an arbiter. (b)
Multiplexers are implemented using NAND gates for performance evaluations.
the top and bottom path signals to switch. The same signal propagates through both of these
paths. The racing signals’ arrival times (TA and TB) determine one bit of the PUF response (rj).
\[
r_j = \begin{cases} 1 & \text{if } T_A > T_B \\ 0 & \text{otherwise} \end{cases} \tag{2.1}
\]
An example of an n-stage APUF is shown in Fig. 2.2. Each multiplexer is called a delay unit.
Each pair of delay units (i and i′) is considered as one stage. For layout evaluation, each multiplexer
is composed of three NAND gates with identical layout.
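The race described above can be sketched with a simple additive delay model. The model is an assumption made for illustration: each stage is given four hypothetical delays (two straight, two crossing) drawn from a Gaussian, and the names `make_apuf` and `apuf_response` are not part of the evaluated design.

```python
import random

def make_apuf(n_stages, seed):
    """Draw hypothetical per-stage delays: (top straight, bottom straight,
    top-to-bottom crossing, bottom-to-top crossing)."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(1.0, 0.05) for _ in range(4)) for _ in range(n_stages)]

def apuf_response(stages, challenge):
    """Return r_j as in Eq. (2.1): 1 if T_A > T_B, else 0."""
    t_a, t_b = 0.0, 0.0  # accumulated arrival times of the top and bottom signals
    for (s_top, s_bot, x_tb, x_bt), c in zip(stages, challenge):
        if c == 1:
            # c[i] = 1: both signals follow straight paths.
            t_a, t_b = t_a + s_top, t_b + s_bot
        else:
            # c[i] = 0: the top and bottom signals switch paths.
            t_a, t_b = t_b + x_bt, t_a + x_tb
    return 1 if t_a > t_b else 0

chip = make_apuf(n_stages=8, seed=1)
challenge = [1, 0, 1, 1, 0, 0, 1, 0]
print(apuf_response(chip, challenge))
```

Two chips created with different seeds stand in for two dies with different process variation; the same challenge may then yield different response bits.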
Most delay-based PUFs, such as arbiter PUFs [Suh and Devadas (2007)] and
butterfly PUFs [Kumar et al. (2008)], are currently implemented on FPGA platforms. However, an FPGA
platform is not ideal for APUFs because asymmetric routing, rather than process variation, dominates
the delay skew in FPGA devices [Morozov et al. (2010)]. Hence, an ASIC implementation is a better
alternative for APUFs.
2.4 Ring Oscillator PUF
Another widely used delay-based PUF is the Ring Oscillator PUF (RO-PUF) [Maiti et al. (2010);
Cao et al. (2015); Yin et al. (2013)]. Fig. 2.3 presents a conventional RO-PUF, which consists of
N identically laid out ROs, two counters, a comparator, and two N-bit multiplexers. Each of
the nominally identical ROs oscillates at a unique frequency because of manufacturing process
variations. The input to the PUF (challenge) is applied to both MUXs so that one pair of ROs
is selected. The counters count the number of oscillations for a fixed time interval known as the
comparison time. After the comparison time, the outputs of the counters are compared to generate a
response. The output of the comparator is set to 0 or 1 based on which oscillator from the selected
RO pair is faster.
Two counter blocks count the number of oscillations of each of the two ROs in a fixed time
interval (comparison time). At the end of the interval, the outputs of the two counters are compared
together. Depending on which of the two counters has the highest value, the output of the PUF is
set to ′0′ or ′1′. The output of the PUF is set to ′0′ if the first ring oscillator in the pair is faster
than the second (the value of the first counter is higher than that of the second), and to ′1′ if it
is slower (the value of the first counter is lower than that of the second). If the two frequencies
are very close to each other, the output of the PUF may vary unpredictably from run to run.
It is, however, possible to improve the accuracy of the PUF by using larger counters and longer
comparison time intervals [Mansouri and Dubrova (2012)].
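The counter comparison can be sketched as below. The frequencies and comparison times are hypothetical values chosen to show why a longer comparison time helps; integer arithmetic (Hz and nanoseconds) is used to avoid floating-point rounding, and returning `None` on a tie is an illustrative choice rather than circuit behavior.

```python
def oscillation_count(freq_hz, comparison_time_ns):
    """Whole oscillations completed within the comparison time."""
    return freq_hz * comparison_time_ns // 1_000_000_000

def ro_puf_bit(freq_first_hz, freq_second_hz, comparison_time_ns):
    """0 if the first RO of the selected pair is faster, 1 if it is slower."""
    first = oscillation_count(freq_first_hz, comparison_time_ns)
    second = oscillation_count(freq_second_hz, comparison_time_ns)
    if first == second:
        return None  # the counters tie: the pair is not resolved at this resolution
    return 0 if first > second else 1

# Two hypothetical ROs whose frequencies differ by 0.4%.
f_first, f_second = 100_000_000, 100_400_000

print(ro_puf_bit(f_first, f_second, 1_000))   # short window: counters tie
print(ro_puf_bit(f_first, f_second, 10_000))  # longer window resolves the pair
```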
2.5 Pelgrom’s Mismatch Model
The variation enhancement technique based on asymmetric layout mentioned in Section 1 is
inspired by Pelgrom’s mismatch model [Pelgrom et al. (1989)]. Pelgrom’s mismatch model [Conti
et al. (1999)] defines the process variation of transistors as:
\[
\sigma^2(\Delta P) = \frac{A_{\Delta P}^2}{WL} + S_{\Delta P}^2 D^2, \tag{2.2}
\]
Figure 2.3 Conventional RO-PUF.
where $\sigma^2(\Delta P)$ is the variance of the mismatch in MOSFET modeling parameter P;
$A_{\Delta P}$ and $S_{\Delta P}$ are process mismatch coefficients; W and L are the width and
length of the transistor gate; and D is the spacing between two transistors.
In this model, process variation consists of two terms. The first term captures random variation, or
"white noise"; it is layout agnostic and inversely proportional to transistor area. The second
term, systematic variation, increases with the distance between transistors. Layout modifications
affect this distance.
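The two terms of Eq. (2.2) can be evaluated numerically as in the sketch below. The coefficient and geometry values are hypothetical and unitless; they are chosen only to show how each term responds to area and spacing.

```python
def pelgrom_variance(a_dp, s_dp, width, length, distance):
    """Evaluate sigma^2(dP) = A^2 / (W * L) + S^2 * D^2 as in Eq. (2.2)."""
    random_term = a_dp ** 2 / (width * length)   # shrinks as gate area grows
    systematic_term = (s_dp * distance) ** 2     # grows with transistor spacing
    return random_term + systematic_term

# Hypothetical coefficients: A = 2.0, S = 0.5.
print(pelgrom_variance(2.0, 0.5, width=1.0, length=4.0, distance=2.0))  # both terms
print(pelgrom_variance(2.0, 0.5, width=1.0, length=1.0, distance=2.0))  # smaller gate
print(pelgrom_variance(2.0, 0.5, width=1.0, length=4.0, distance=0.0))  # zero spacing
```

Shrinking the gate area raises only the random term, while reducing the spacing lowers only the systematic term, which matches the design goal stated in the following paragraph.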
Conventional PUF circuits are designed to maximize random variation and minimize systematic
variation. Large systematic variation, such as mismatch caused by the linear gradients effect, adds
doping-pattern-related bias into PUF responses. Thus, uniqueness suffers. Random variation is
maximized by minimizing transistor size. Systematic variation is reduced using a common centroid
layout [Long et al. (2005); Bastos et al. (1996)].
CHAPTER 3. METHODS AND PROCEDURES
In this chapter, we describe the detailed structure of BS-PUF, discuss the related communication
and encryption protocols, and present variation enhancement and uniqueness improvement techniques
that can be applied to PUFs with cryptographic usage.
3.1 BS-PUF based Encryption
PUF based encryption requires specific communication and encryption protocols. Several dif-
ferent PUF structures could be employed in those protocols [Guo et al. (2018a)].
3.1.1 Communication Protocol
Fig. 3.1 depicts the PUF-based communication protocol, where Bob is the sender and Alice is the
receiver. Both Bob and Alice have their own PUF. If Bob encrypts his message m with his PUF
as fBob(m), Alice has no way to decrypt it except to ask Bob to decrypt it for her. The following
protocol overcomes this asymmetry.
1. Bob encrypts the message m with fBob.
2. Bob sends fBob(m) to Alice.
3. Alice encrypts fBob(m) with fAlice. (At this point, Alice does not know the message m.)
4. Alice sends fAlice(fBob(m)) to Bob.
5. Bob decrypts fAlice(fBob(m)) with $f^{-1}_{Bob}$ and obtains fAlice(m).
6. Bob sends fAlice(m) to Alice.
7. Alice decrypts fAlice(m) with $f^{-1}_{Alice}$ and obtains the message m.
Figure 3.1 Encryption protocol with message encryption based on commutative PUFs fBob
and fAlice.
Message confidentiality is maintained by entangling message bits with physical randomness.
The entangling process must be commutative so that the order of fAlice and fBob can be changed.
Decryption of entangled messages requires reversibility. The entangled message m′ must exhibit a
non-linear relationship with m; this makes it hard for an eavesdropper to learn m by examining
intermediate messages.
The circuit design and encryption protocol enable the commutative, invertible, and non-linear
relationship properties of messages. Section 3.1.2 describes a mechanism for BS-PUF-based en-
cryption. The BS circuit design is detailed in Sections 3.1.4, 3.1.5.
3.1.2 Encryption Protocol
Encryption must entangle the physical randomness of BS-PUF with the message. Physical
randomness is extracted by measuring the delay of message bits along a shift path. An XOR of the
message bits and delay accomplishes entanglement; this allows for commutativity and reversibility.
3.1.2.1 Encrypting Large Messages
A BS-PUF uses an n-bit key as the shift amount. This allows for a $2^n$-bit BS-PUF challenge
(message) resulting in a $2^n$-bit BS-PUF response. Alternately, one could view (n-bit key,
$2^n$-bit message) as a challenge. We take the former $2^n$-bit challenge view in this thesis.
For a barrel shifter, practical values for n are limited to the range of 7 to 10 bits, leading to a
message block size of 128 to 1024 bits. This means that a method of entanglement/encryption for
plaintexts greater than $2^n$ bits is needed.
Figure 3.2 Cipher block chaining methods are used to encrypt (a) and decrypt (b) messages.
This prevents the adversary from identifying plaintext patterns; it ensures
identical blocks of plaintext encrypt to different ciphertext.
Entanglement could occur by serializing the blocks of plaintext at BS-PUF input and concate-
nating the generated ciphertexts. However, this approach reveals patterns in the plaintext; the
same plaintext will always encrypt to the same ciphertext. This leaks information by allowing an
adversary to identify plaintext patterns.
The technique of cipher block chaining (CBC) [Bace (2007); Bellare et al. (1994, 2000)] is
typically applied in block ciphers such as AES [Daemen and Rijmen (2013)]. Like AES, BS-PUF
encrypts a fixed number of plaintext bits. Thus, it can be viewed as a block cipher. A practical
barrel-shifter or permutation network implementation could consist of 128-1024 bit blocks.
Fig. 3.2 applies CBC to two blocks of plaintext. Before applying BS-PUF, the plaintext $p_i$ is
XORed with the previous ciphertext $c_{i-1}$. The output of BS-PUF using key K is the
ciphertext $c_i$. Thus, encryption of the ith block is
$c_i = \text{BS-PUF}(p_i \oplus c_{i-1}, K)$. The result is a ciphertext
$c_1 \| c_2 \| \dots \| c_m$ for m blocks, where $\|$ denotes concatenation.
c0 is an initialization vector (IV). This IV must be updated with each message; otherwise
the same plaintext will encrypt to the same ciphertext. This would again allow an eavesdropper
to identify patterns. Unlike traditional CBC algorithms, IV for BS-PUFs based encryption does
not need to be public because ciphertext will be sent back to sender for decryption. It could be
generated with any PUF, e.g. SRAM PUFs [Holcomb et al. (2009)].
Decryption utilizes BS-PUF’s inverse. $p_i$ is recovered by the reverse process. Ciphertext $c_i$
is given to the inverse BS-PUF operation. The ⊕ of the output and $c_{i-1}$ is then taken. Thus,
decryption of the ith block is $p_i = \text{BS-PUF}^{-1}(c_i, K) \oplus c_{i-1}$.
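The chaining equations above can be sketched with a toy invertible block function standing in for BS-PUF. Everything in this block is an assumption made for illustration: the 8-bit block size, the rotate-and-mask `toy_block` function, and the `MASK` constant (a stand-in for device-specific delay bits) are hypothetical and are not the BS-PUF circuit.

```python
BLOCK_BITS = 8
FULL = (1 << BLOCK_BITS) - 1
MASK = 0b10100110  # hypothetical stand-in for device-specific delay bits

def rotl(x, k):
    """Rotate an 8-bit value left by k positions."""
    k %= BLOCK_BITS
    return ((x << k) | (x >> (BLOCK_BITS - k))) & FULL

def toy_block(x, key):
    """Toy invertible stand-in for BS-PUF(x, K): rotate by the key, then mask."""
    return rotl(x, key) ^ MASK

def toy_block_inv(y, key):
    """Inverse of toy_block: unmask, then rotate back."""
    return rotl(y ^ MASK, BLOCK_BITS - key % BLOCK_BITS)

def cbc_encrypt(blocks, key, iv):
    """c_i = f(p_i XOR c_{i-1}, K), with c_0 = IV."""
    out, prev = [], iv
    for p in blocks:
        prev = toy_block(p ^ prev, key)
        out.append(prev)
    return out

def cbc_decrypt(blocks, key, iv):
    """p_i = f^{-1}(c_i, K) XOR c_{i-1}, with c_0 = IV."""
    out, prev = [], iv
    for c in blocks:
        out.append(toy_block_inv(c, key) ^ prev)
        prev = c
    return out

plain = [0x41, 0x41, 0x41]  # three identical plaintext blocks
cipher = cbc_encrypt(plain, key=3, iv=0x5A)
print([hex(c) for c in cipher])
print(cbc_decrypt(cipher, key=3, iv=0x5A) == plain)
```

The three identical plaintext blocks encrypt to different ciphertext blocks, which is the pattern-hiding property that motivates chaining.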
Message encryption requires a secret key. The key determines the bit shift path; it is used as
shift amount. The BS-PUF response depends both on the challenge (plaintext) and the key. The
key does not change as frequently as the plaintext does.
Some of the desirable characteristics of BS-PUF are as follows. BS-PUF is fast. Encryption
takes multiple rounds with a traditional block cipher. BS-PUF makes only one pass through the
shifter or permutation hardware.
3.1.3 Single Block Encryption
In this subsection, several permutation schemes are discussed for single block encryption.
3.1.3.1 Asymmetric Key Encryption
Encrypting without a shared key is ideal.
Section 3.1.1 dictates invertibility and commutativity as communication protocol requirements.
PUF f must be a one-to-one function to achieve encryption and invertibility for decryption.
Many classical PUFs, such as RO-PUFs [Mansouri and Dubrova (2012); Yin and Qu (2010); Maiti
and Schaumont (2009, 2011)] and arbiter PUFs [Hori et al. (2010); Tajik et al. (2014)], cluster the
challenges into equivalence classes on a set of attributes resulting in the same response per challenge
Figure 3.3 (1) Bob applies $f_{Bob}$ and (2) sends the result to Alice. (3) Alice applies $f_{Alice}$
and (4) sends the result to Bob. (5) Bob applies $f^{-1}_{Bob}$ and (6) returns the result
to Alice. (7) Alice applies $f^{-1}_{Alice}$, hoping to recover the message. Unfortunately,
$f^{-1}$ does not subtract delay from the correct bit in (5), (7); the correct message
is not received by Alice. This scheme fails to be commutative.
equivalence class. Arbiter PUF uses relative bit arrival time as the clustering attribute. RO PUF
uses relative oscillator frequencies. The end result is that this makes these PUFs not invertible,
since the mapping is many-to-one.
Further note that physical invertibility is distinct from logical invertibility. A mathematical one-
to-one function has logical invertibility, but may not be physically invertible. Physical invertibility is
applicable to the PUF physical attribute measurement process. In the forward computation, inputs
traverse the computation paths to the output; physical measurements may take place at various
points along these paths. In the inverse computation, output bits travel to the inputs through the
identical computation paths in reverse. The physical measurements of the same physical attribute
occur in the inverse computation. These forward and inverse physical measurements need to be
reproducible at all measurement points from input to output.
Permutation functions provide the necessary one-to-one relationship. Permutations create a
non-linear relationship from input bits to output bits. Due to this property, an adversary cannot
create a useful mathematical model describing the input-output relationship. For n-bit data,
there exist N = n! permutations denoted by $\pi_0, \pi_1, \ldots, \pi_{N-1}$. Each $\pi_i$ captures
some permutation $(i_0, i_1, \ldots, i_{n-1})$, where bit $k \mapsto i_k$. In other words, the bit at
position 0 is routed to bit position $i_0$ in the output. A key K is used to select this mapping. We
call this a keyed PUF: $R_{i,K} = f(K, P_i)$. The PUF response is derived from the shift path delay.
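Key-driven selection among the n! permutations can be sketched as follows. Indexing the lexicographically ordered permutations by the key is an assumption chosen for illustration; the thesis selects permutations through barrel-shifter key bits, and the function names here are hypothetical.

```python
from itertools import permutations
from math import factorial

def select_permutation(n, key):
    """Return the permutation of range(n) indexed by the key (lexicographic order)."""
    perms = list(permutations(range(n)))
    return perms[key % factorial(n)]

def apply_permutation(bits, pi):
    """Route the bit at position k to output position pi[k]."""
    out = [None] * len(bits)
    for k, bit in enumerate(bits):
        out[pi[k]] = bit
    return out

n = 4
print(factorial(n))               # number of permutations of 4 bits
print(select_permutation(n, 0))   # key 0 selects the identity mapping
# The rotation (1, 2, 3, 0) used in the later 4-bit examples:
print(apply_permutation(["x0", "x1", "x2", "x3"], (1, 2, 3, 0)))
```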
The protocol requires the entanglement procedure to be commutative. Entanglement adds a
bit from the delay of each path to the plaintext. Thus, entanglement is expressed as f(KBob, Pi) =
Pi ⊕DBob. This is commutative because ’⊕’ is commutative. Note that the entanglement between
the physical delay attribute and logical bits can occur at multiple points during the flight of message
bits from input to output; each measurement point is also an entanglement point.
Our first version of encryption protocol is based on invertible and commutative PUFs.
Invertibility requires using a raw physical property like delay. The reversible computation
principle states that any information loss makes a process irreversible [Bennett and Landauer
(1985)]. Many PUFs derive their response through the comparison of physical properties. Arbiter
Figure 3.4 Invertible and Commutative PUF protocol: PUF1 ($f_{Bob}$) and PUF2 ($f_{Alice}$)
illustrate the PUF composition and how the barrel shifter PUF is used for the encryption
and decryption processes. Assume both PUF1 and PUF2 are two-stage
BS-PUFs, key1 (PUF1) is (1, 0), and key2 (PUF2) is (0, 1). For PUF1, bit $x_0$ ($x_1$)
goes to output bit position $y_1$ ($y_2$). The encrypted bit output at $y_1$ ($y_2$) is
$x_0 \oplus D(0, 1)_m$ ($x_1 \oplus D(1, 2)_m$). $D(i, i')_m$ is the mth least significant bit of
the delay from input bit $i$ to output bit $i'$. A permutator is added after each PUF
to shift each bit back to its original position after encryption.
PUF uses a race between two paths. RO-PUF uses a frequency comparison. These comparisons
provide reproducibility by including a wide margin of noise before comparison output changes, but
information is lost.
The proposed PUF is based on a barrel shifter. Constructing it with precisely sized transmission
gates makes its delay independent of bit state 0 or 1. Bit propagation delay for forward path and
inverse path is remarkably stable and consistent regardless of bit state. This is due to symmetric
physical structure of the MOSFET’s source and drain. As we discuss in the following, physical
commutativity and invertibility in our protocol are only achieved if the physical delay on the paths
is bit state independent. In Step 5 of Fig. 3.1, when Bob computes $f^{-1}_{Bob}$, he is dealing with
a different bit pattern at the output of his PUF than the one computed in Step 1.
This is because the Step 5 bit pattern has an additional permutation applied to it by Alice, which
is not known to Bob. An alternative implementation could have used pass transistors. However,
it is hard to equalize the delay for 0 and 1 through a pass transistor. Thus, transmission gates are
used to make the delay plaintext independent.
The asymmetric key encryption protocol in Section 3.1.1 is based on invertible and commutative
BS-PUFs, which are defined as follows:
Invertible PUF: A keyed PUF f on input x and key K is invertible if
$f(x, K) = y \implies f^{-1}(y, K) = x$, where $f^{-1}$ is computed on the same PUF in the reverse
direction. Note that the
PUF function f entangles a logical component and a physical component, and both need to be
invertible.
PUFs designed to be used directly for encryption need two input sequences: (1) key for response
function selection as in a permutation selector, (2) plaintext to be encrypted.
Commutative PUF: Assume there is a composition of two commutative PUFs, PUF1 and PUF2.
This means PUF2(PUF1(x)) = PUF1(PUF2(x)). Note that both logical and physical commu-
tativity are needed for such a commutative PUF. For BS-PUF, the entanglement function must
be commutative for physical commutativity in addition to the physical measurements being the
same in PUF2(PUF1(x)) and PUF1(PUF2(x)); this requires the physical measurements to be
invariant of the bit state. The physical measurements are completely defined by the key K for a
given PUF.
3.1.3.2 Protocol Without Permutation
In the first version of the design, each PUF $f_{PUF_1}$ and $f_{PUF_2}$ is a permutation network
keyed by $key_1$ and $key_2$ respectively. Key $key_1$ selects a permutation $\pi_{key_1}$ from a
large set of possible permutations; the Keccak permutation [Bertoni et al. (2016)], [Bertoni et al.
(2011)] could be used, for instance. The implementation, however, needs to be physically and
logically reversible, consisting of transmission gates. We assume that for a permutation
$\pi_{key_1}$ which maps the $i$th input bit to the $i'$th output bit and the $j$th input bit to the
$j'$th output bit, we capture the exact delays for each input-output path. Let $D(i, i')$ denote the
delay of the path from input $i$ to output $i'$ for $\pi_{key_1}$ in $f_{PUF_1}$. Let $D(j, j')$ be
defined likewise. We will describe how we can capture these delays by using timer
capture and edge detector functions in Section 3.1.5.
For each PUF, the output bit $y_j$ can be expressed as an entanglement function
\[
e\left(x_{\pi^{-1}_{key}(j)},\; D(\pi^{-1}_{key}(j), j)\right) \tag{3.1}
\]
Here $e$ is an entanglement function between the bit routed to output $j$
($x_{\pi^{-1}_{key}(j)}$) and the delay of this path from $\pi^{-1}_{key}(j)$ to $j$. The delay
$D(\pi^{-1}_{key}(j), j)$ can be quantized to any resolution of $k$ bits.
If we use all of the $k$ bits of $D(\pi^{-1}_{key}(j), j)$ to do encryption at the $j$th output bit,
we expand the n-bit input to an nk-bit output. Assuming we want to retain the same output
resolution of n bits, one option would be to perform an XOR ($\oplus$) of the $m$th bit of
$D(\pi^{-1}_{key}(j), j)$ with the input bit $x_{\pi^{-1}_{key}(j)}$ to generate $y_j$, leading to the
entanglement function $y_j = e(x_{\pi^{-1}_{key}(j)}, D(\pi^{-1}_{key}(j), j)_m)$. XOR
is a good choice because it is commutative and associative. Since the least significant bit (LSB)
and 2nd LSB of $D(\pi^{-1}_{key}(j), j)$ are likely least correlated with the delay of other paths,
we have used them in entanglement. The corresponding simulation results are shown in Section 4.1.
Let us assume that the delays of the permutation function $\pi_{key_1}$ in $f_{PUF_1}$ are denoted by
$D(\pi^{-1}_{key_1}(j), j)$ for a path from input $\pi^{-1}_{key_1}(j)$ to output $j$, and the delays
of the permutation function $\pi_{key_2}$ in $f_{PUF_2}$ are denoted by $d(j, \pi_{key_2}(j))$ for a
path from input $j$ to output $\pi_{key_2}(j)$. Assume that $\pi^{-1}_{key_1}(j) = i$ and
$\pi_{key_2}(j) = k$; then the output $z_k = (x_i \oplus D(i, j)_m) \oplus d(j, k)_m$ is generated. The
$m$th least significant bit of PUF2’s delay captured by the $d$ function is XORed with
$f_{PUF_1}$’s output.
Clearly, the RHS of expression zk = (xi ⊕D(i, j)m) ⊕ d(j, k)m is commutative due to commu-
tativity of operator ⊕ - it does not matter whether fPUF1 is applied first or fPUF2 is applied first.
However, this commutativity statement is only correct for a specific bit routing, but incorrect for
encrypted data.
Consider PUF1 with $\pi_{key_1} = (0 \mapsto 1, 1 \mapsto 2, 2 \mapsto 3, 3 \mapsto 0)$ for a 4-bit
input $x_0, x_1, x_2, x_3$ and PUF2 with
$\pi_{key_2} = (0 \mapsto 2, 1 \mapsto 3, 2 \mapsto 0, 3 \mapsto 1)$. The composition is
$f_{PUF_1} \circ f_{PUF_2} = (0 \mapsto 1, 1 \mapsto 2, 2 \mapsto 3, 3 \mapsto 0) \circ (0 \mapsto 2, 1 \mapsto 3, 2 \mapsto 0, 3 \mapsto 1) = (0 \mapsto 3, 1 \mapsto 0, 2 \mapsto 1, 3 \mapsto 2)$.
By going over the communication protocol in Fig. 3.1 step by step, a defect becomes apparent. The
complete verification process is shown in Fig. 3.3.
In the following analysis, permutations are abbreviated according to output positions for
simplicity; e.g., $(0 \mapsto 1, 1 \mapsto 2, 2 \mapsto 3, 3 \mapsto 0)$ is abbreviated to (1, 2, 3, 0).
Assume πPUF1 = (1, 2, 3, 0) and πPUF2 = (2, 3, 0, 1).
• Step 1: Apply fPUF1 to (x0, x1, x2, x3) resulting in (1, 2, 3, 0)(x0, x1, x2, x3), which equals
(x3 ⊕D(3, 0)m, x0 ⊕D(0, 1)m, x1 ⊕D(1, 2)m, x2 ⊕D(2, 3)m).
• Step 3: Apply fPUF2 to fPUF1 ’s output as in (2, 3, 0, 1)(1, 2, 3, 0)(x0, x1, x2, x3). This equals
(x1⊕D(1, 2)m⊕ d(2, 0)m, x2⊕D(2, 3)m⊕ d(3, 1)m, x3⊕D(3, 0)m⊕ d(0, 2)m, x0⊕D(0, 1)m⊕
d(1, 3)m).
• Step 5: Now invert the output. Applying $f^{-1}_{PUF_1}$ to
$(2, 3, 0, 1)(1, 2, 3, 0)(x_0, x_1, x_2, x_3)$ results in
$(1, 2, 3, 0)^{-1}(2, 3, 0, 1)(1, 2, 3, 0)(x_0, x_1, x_2, x_3)$, which equals
$(x_2 \oplus D(2, 3)_m \oplus d(3, 1)_m \oplus D'(0, 1)_m,\ x_3 \oplus D(3, 0)_m \oplus d(0, 2)_m \oplus D'(1, 2)_m,\ x_0 \oplus D(0, 1)_m \oplus d(1, 3)_m \oplus D'(2, 3)_m,\ x_1 \oplus D(1, 2)_m \oplus d(2, 0)_m \oplus D'(3, 0)_m)$.
$D'(i, i')$ denotes the backward path delay from output $i'$ to input $i$. According to
post-layout simulations, $D'(i, i')$ is always equal to $D(i, i')$ in BS-PUFs.
• Step 7: Further applying $f^{-1}_{PUF_2}$ as in
$(2, 3, 0, 1)^{-1}(1, 2, 3, 0)^{-1}(2, 3, 0, 1)(1, 2, 3, 0)(x_0, x_1, x_2, x_3)$ results in
$(x_0 \oplus D(0, 1)_m \oplus d(1, 3)_m \oplus D'(2, 3)_m \oplus d'(0, 2)_m,\ x_1 \oplus D(1, 2)_m \oplus d(2, 0)_m \oplus D'(3, 0)_m \oplus d'(1, 3)_m,\ x_2 \oplus D(2, 3)_m \oplus d(3, 1)_m \oplus D'(0, 1)_m \oplus d'(2, 0)_m,\ x_3 \oplus D(3, 0)_m \oplus d(0, 2)_m \oplus D'(1, 2)_m \oplus d'(3, 1)_m)$.
This logical result is correct in routing $x_i$ back to the $i$th bit position,
but the physical delay terms are completely mixed up and do not cancel each other.
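The four steps above can be replayed symbolically to confirm that the delay terms fail to cancel. The sketch below is an illustration under stated assumptions: each bit is represented as a set of XOR terms (XORing a term twice removes it, which is what the symmetric difference does), forward and backward delays of a path are treated as the same term since $D'(i, i') = D(i, i')$, and the helper names `forward` and `backward` are hypothetical.

```python
def forward(state, pi, name):
    """Route position i to pi[i] and XOR in the delay term (name, i, pi[i])."""
    out = [None] * len(state)
    for i, terms in enumerate(state):
        out[pi[i]] = terms ^ {(name, i, pi[i])}
    return out

def backward(state, pi, name):
    """Route position pi[i] back to i over the same path, XORing the same term."""
    return [state[pi[i]] ^ {(name, i, pi[i])} for i in range(len(state))]

pi_1 = (1, 2, 3, 0)  # PUF1 permutation, delays named "D"
pi_2 = (2, 3, 0, 1)  # PUF2 permutation, delays named "d"

message = [frozenset({("x", i)}) for i in range(4)]
step1 = forward(message, pi_1, "D")
step3 = forward(step1, pi_2, "d")
step5 = backward(step3, pi_1, "D")
step7 = backward(step5, pi_2, "d")

for i, terms in enumerate(step7):
    print(i, sorted(terms))
```

Each output position holds the correct message bit together with four leftover delay terms, in agreement with the Step 7 expression.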
3.1.3.3 Protocol With Permutation
In order to ensure correct routing and commutativity, we modify the original permutation
protocol by adding a permutation after each PUF. The primary function of this permutation is
routing $x_i$ back to the $i$th position from position $\pi_{key_1}(i)$ before sending the message
at the end of Step 1. The complementary key, $\overline{key_1}$, that results in the permutation
$\pi^{-1}_{key_1}$ is used; it routes bits back to their original position. Mathematically,
$\pi_{key_1} \circ \pi_{\overline{key_1}} = 1$ with $\pi_{\overline{key_1}} = \pi^{-1}_{key_1}$, where
1 is the identity permutation. Bit shifting to restore the original message bit order is the only
function of this permutation. No delay is added.
An example of this protocol is shown in Fig. 3.4 with the following detailed description.
• Step 1: fBob permutes x0, x1, x2, x3 as in (1, 2, 3, 0)(x0, x1, x2, x3). It computes the physical
delay encrypted bit vector, (x3⊕D(3, 0)m, x0⊕D(0, 1)m, x1⊕D(1, 2)m, x2⊕D(2, 3)m). Before
sending it to Alice, Bob’s complementary permutation, called permutator in Fig. 3.4 is applied
to generate (x0 ⊕D(0, 1)m, x1 ⊕D(1, 2)m, x2 ⊕D(2, 3)m, x3 ⊕D(3, 0)m).
In this new permutation protocol, the logical permutation does not add to the confusion at
all, unlike in AES or Keccak protocols. Confusion is achieved from the permuted physical
delay properties of the PUF. Which path delay bits are combined with each input bit is still
hidden (through confusion) from the adversary through the key-driven π.
• Step 3: fAlice is applied as (2, 3, 0, 1)(x0 ⊕ D(0, 1)m, x1 ⊕ D(1, 2)m, x2 ⊕ D(2, 3)m, x3 ⊕
D(3, 0)m), resulting in (x2 ⊕ D(2, 3)m ⊕ d(2, 0)m, x3 ⊕ D(3, 0)m ⊕ d(3, 1)m, x0 ⊕ D(0, 1)m ⊕
d(0, 2)m, x1 ⊕D(1, 2)m ⊕ d(1, 3)m). Applying Alice’s complementary permutation results in
(x0⊕D(0, 1)m⊕ d(0, 2)m, x1⊕D(1, 2)m⊕ d(1, 3)m, x2⊕D(2, 3)m⊕ d(2, 0)m, x3⊕D(3, 0)m⊕
d(3, 1)m).
• Step 5: Apply f−1Bob to (x0 ⊕D(0, 1)m ⊕ d(0, 2)m, x1 ⊕D(1, 2)m ⊕ d(1, 3)m, x2 ⊕D(2, 3)m ⊕
d(2, 0)m, x3 ⊕D(3, 0)m ⊕ d(3, 1)m).
Decryption follows a similar process. However, the direction of message transmission is re-
versed and the inverse permutations are used. This is where physical invertibility helps recover
the original forward delay vector in the reverse direction.
Thus, $(1, 2, 3, 0)(2, 3, 0, 1)(x_0, x_1, x_2, x_3)$ is rearranged by Bob’s permutator first. This is
$(x_3 \oplus D(3, 0)_m \oplus d(3, 1)_m,\ x_0 \oplus D(0, 1)_m \oplus d(0, 2)_m,\ x_1 \oplus D(1, 2)_m \oplus d(1, 3)_m,\ x_2 \oplus D(2, 3)_m \oplus d(2, 0)_m)$.
This rearranged result is given to PUF1, resulting in
$(x_0 \oplus D(0, 1)_m \oplus d(0, 2)_m \oplus D'(0, 1)_m,\ x_1 \oplus D(1, 2)_m \oplus d(1, 3)_m \oplus D'(1, 2)_m,\ x_2 \oplus D(2, 3)_m \oplus d(2, 0)_m \oplus D'(2, 3)_m,\ x_3 \oplus D(3, 0)_m \oplus d(3, 1)_m \oplus D'(3, 0)_m)$.
Transmission gates show symmetric delays for forward and backward paths; $D(i, j)$ always
equals $D'(i, j)$.
Thus, the delay terms cancel. The result after applying f−1Bob is equal to (x0 ⊕ d(0, 2)m, x1 ⊕
d(1, 3)m, x2 ⊕ d(2, 0)m, x3 ⊕ d(3, 1)m).
• Step 7: $f^{-1}_{Alice}$ is applied. First, Alice’s permutator rotates the bits, giving
$(x_2 \oplus d(2, 0)_m,\ x_3 \oplus d(3, 1)_m,\ x_0 \oplus d(0, 2)_m,\ x_1 \oplus d(1, 3)_m)$.
Rotated bits are then given to PUF2 in the reverse direction, resulting in
$(x_0 \oplus d(0, 2)_m \oplus d'(0, 2)_m,\ x_1 \oplus d(1, 3)_m \oplus d'(1, 3)_m,\ x_2 \oplus d(2, 0)_m \oplus d'(2, 0)_m,\ x_3 \oplus d(3, 1)_m \oplus d'(3, 1)_m)$.
The delay terms cancel. Alice receives the original message $(x_0, x_1, x_2, x_3)$ sent by Bob.
3.1.3.4 Symmetric Key Encryption
The original protocol in Section 3.1.3.2 subtracted the delay from the incorrect bit in the
inverse permutation. The protocol shown in Section 3.1.3.3 solves the original problem. However,
it contains a fatal flaw: using ⊕ for entanglement creates a linear relationship between messages
in flight between Bob and Alice. An eavesdropper can retrieve the original message from the
in-flight messages.
Figure 3.5 Sharing a key allows both parties to perform the same permutation. This
ensures the delay is subtracted from the correct bit when performing the inverse
$f^{-1}_{PUF_l}$ for l = 1, 2. Entropy is added into the public message by bit shifting.
Consider Fig. 3.4 as an example. The first bit in original message is x0. The encrypted first
bit sent from Bob to Alice in Step 2 is B′ = x0 ⊕D(0, 1).
Then from Alice to Bob in Step 4, B′′ = x0 ⊕ D(0, 1) ⊕ d(0, 2). The decrypted first bit sent
from Bob to Alice in Step 6 is B′′′ = x0 ⊕ d(0, 2). B′, B′′ and B′′′ are all public messages. An
eavesdropper can extract the original message by:
1. Inferring Bob’s PUF’s delay information by taking XOR of B′′ and B′′′. B′′ ⊕ B′′′ = x0 ⊕
D(0, 1)⊕ d(0, 2)⊕ x0 ⊕ d(0, 2) = D(0, 1).
2. Then the original message can be extracted by an XOR of B′ and Bob’s PUF’s delay, B′ ⊕
D(0, 1) = x0 ⊕D(0, 1)⊕D(0, 1) = x0.
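The two eavesdropping steps above can be checked exhaustively for a single bit position. The sketch below is illustrative: the variable names mirror B′, B′′ and B′′′, and the delay bits are enumerated over all values rather than taken from a real PUF.

```python
from itertools import product

def eavesdrop(b1, b2, b3):
    """Recover x0 from the three public bits B', B'' and B'''."""
    bob_delay = b2 ^ b3      # step 1: B'' xor B''' exposes D(0, 1)
    return b1 ^ bob_delay    # step 2: B' xor D(0, 1) exposes x0

results = []
for x0, D01, d02 in product((0, 1), repeat=3):
    b1 = x0 ^ D01            # Bob -> Alice (Step 2)
    b2 = x0 ^ D01 ^ d02      # Alice -> Bob (Step 4)
    b3 = x0 ^ d02            # Bob -> Alice (Step 6)
    results.append(eavesdrop(b1, b2, b3) == x0)

print(all(results))
```

The recovery succeeds for every combination of message bit and delay bits, so the secrecy of the delays alone does not protect the message in this protocol.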
In order to eliminate this problem, BS-PUF must permute bits in public messages, which we
could not do and yet preserve commutativity and invertibility. One possible solution that allows
permuted public messages while preserving commutativity and invertibility is to let Bob and Alice
share the same key. The corresponding protocol is shown in Fig. 3.5.
In the shared key protocol, Bob permutes the input message with πK entangling it with his delay.
Alice reverses the permutation using π−1K entangling it with her delay. Note that the shared key is
K. The bits are in their original positions in the message sent to Bob for decryption. Note that the
entanglement with both PUFs’ delays protects this message. The delay will be un-entangled from
the correct bits in the subsequent decryption steps. The bit order is different in the message from
Bob to Alice versus in the message from Alice to Bob. This avoids linear leakage of information in
XOR based equations on these two messages.
Details of the shared key scheme presented in Fig. 3.5 are as follows.
• Step 1: Bob permutes x0, x1, x2, x3 with π = (1, 2, 3, 0) and gets (x3 ⊕ D(3, 0)m, x0 ⊕
D(0, 1)m, x1 ⊕ D(1, 2)m, x2 ⊕ D(2, 3)m). It is sent to Alice without any further bit level
routing; this achieves bit-level confusion of the public message.
• Step 3: fAlice performs the reverse permutation π−1 of fBob and simultaneously applies
Alice’s delay (π−1 = (3, 0, 1, 2)). After fAlice is applied, all bits are rotated back to their
original position but each bit is encrypted with two physical delay values. In this example,
after applying fAlice we get (x0⊕D(0, 1)m⊕d(1, 0)m, x1⊕D(1, 2)m⊕d(2, 1)m, x2⊕D(2, 3)m⊕
d(3, 2)m, x3 ⊕D(3, 0)m ⊕ d(0, 3)m).
• Step 5: $f^{-1}_{Bob}$ is applied. Permutation π is applied again and the delay added in Step 1
is cancelled by XOR. The message sent to Alice is then (x3⊕D(3, 0)m⊕d(0, 3)m⊕D(3, 0)m, x0⊕
D(0, 1)m⊕d(1, 0)m⊕D(0, 1)m, x1⊕D(1, 2)m⊕d(2, 1)m⊕D(1, 2)m, x2⊕D(2, 3)m⊕d(3, 2)m⊕
D(2, 3)m), which is (x3 ⊕ d(0, 3)m, x0 ⊕ d(1, 0)m, x1 ⊕ d(2, 1)m, x2 ⊕ d(3, 2)m).
• Step 7: f−1Alice is applied, bit positions are rotated back again, and delay added in Step 3
is cancelled by XOR. The message from the previous step is converted to (x0 ⊕ d(1, 0)m ⊕
d(1, 0)m, x1 ⊕ d(2, 1)m ⊕ d(2, 1)m, x2 ⊕ d(3, 2)m ⊕ d(3, 2)m, x3 ⊕ d(0, 3)m ⊕ d(0, 3)m), which
equals the original message x0, x1, x2, x3.
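The four shared-key steps can be replayed on concrete bits as in the sketch below. The delay-bit tables are randomly generated stand-ins for the m-th delay bits of each path (indexed as `[from][to]`, matching D(i, j) and d(i, j) in the steps above), and `puf_pass` is a hypothetical helper, not the circuit.

```python
import random

N = 4
rng = random.Random(7)
# Hypothetical m-th delay bits of each path, indexed [from][to].
D_BOB = [[rng.randint(0, 1) for _ in range(N)] for _ in range(N)]
D_ALICE = [[rng.randint(0, 1) for _ in range(N)] for _ in range(N)]

PI = (1, 2, 3, 0)      # shared-key permutation pi
PI_INV = (3, 0, 1, 2)  # its inverse

def puf_pass(bits, perm, delay):
    """Route the bit at position i to perm[i], XORing delay[i][perm[i]]."""
    out = [0] * len(bits)
    for i, b in enumerate(bits):
        out[perm[i]] = b ^ delay[i][perm[i]]
    return out

x = [1, 0, 1, 1]
m1 = puf_pass(x, PI, D_BOB)                  # Step 1: Bob permutes and entangles
m2 = puf_pass(m1, PI_INV, D_ALICE)           # Step 3: Alice un-permutes and entangles
m3 = puf_pass(m2, PI, D_BOB)                 # Step 5: Bob's delay cancels
recovered = puf_pass(m3, PI_INV, D_ALICE)    # Step 7: Alice's delay cancels

print(recovered == x)
```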
Evaluating all messages crossing the insecure channel,
$M' = (x_3 \oplus D(3, 0)_m,\ x_0 \oplus D(0, 1)_m,\ x_1 \oplus D(1, 2)_m,\ x_2 \oplus D(2, 3)_m)$,
$M'' = (x_0 \oplus D(0, 1)_m \oplus d(1, 0)_m,\ x_1 \oplus D(1, 2)_m \oplus d(2, 1)_m,\ x_2 \oplus D(2, 3)_m \oplus d(3, 2)_m,\ x_3 \oplus D(3, 0)_m \oplus d(0, 3)_m)$,
$M''' = (x_3 \oplus d(0, 3)_m,\ x_0 \oplus d(1, 0)_m,\ x_1 \oplus d(2, 1)_m,\ x_2 \oplus d(3, 2)_m)$,
no linear relationships exist among any pairs of messages that yield information to a
man-in-the-middle. No duplicate delays appear at any bit position. There is no way to retrieve the
original message from the in-flight messages without the shared key and access to Bob and Alice’s
PUFs.
Figure 3.6 Block diagram of the delay test circuit with two propagation examples. When
key0 = 0 and key1 = 1, Input0 passes through the light grey path. There is
one bit shift at the first level and no shift at second level, Input0 → Output1.
When key0 = 1 and key1 = 0, Input0 passes through the dark grey path.
There is no shift at the first level and there is a two bit shift at the second
level, Input0 → Output2.
All messages are protected while traversing the insecure channel. The permutation applied by
Bob protects the first message as it travels to Alice. Entanglement with both Alice and Bob’s delay
protects Alice’s response. The permutation then protects the final message from Bob to Alice.
3.1.4 Barrel shifter PUF design
We evaluate a barrel shifter as a potential invertible and commutative PUF. The block diagram
of a barrel shifter [Hashmi and Babu (2010); Pereira et al. (1995)] is shown in Fig. 3.6. For
simplicity, only two shift levels are shown.
Output Logic is added to capture path delay D(i, i′). An Event Counter is initialized to 0. The
RST signal simultaneously starts the Event Counter and releases the input message. The delay is
captured by reading the Event Counter when the Output Logic detects a transition. Finally, the
entanglement block in Output Logic entangles delay information (LSB or 2nd LSB of delay) with
the output bit.
Each shift stage is logically similar to an arbiter PUF [Fruhashi et al. (2011)] stage.
Key bits determine the shift amount $s = \sum_{i=0}^{k} key_i \cdot 2^i$. Thus, $key_i$ is applied
from LSB to MSB, from left to right. For example, in the diagram in Fig. 3.6, key = {0, 1} encodes a
right shift by 2 in the second stage. Consequently, Input0 traverses a different path and exhibits
a different delay for different keys.
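The shift-amount formula can be sketched in a few lines. Treating the shift as a rotation (bits leaving the right end re-enter on the left) is an assumption consistent with the 4-bit permutation (1, 2, 3, 0) used earlier; the function names are hypothetical.

```python
def shift_amount(key_bits):
    """s = sum(key_i * 2^i), with key_bits listed from LSB to MSB."""
    return sum(bit << i for i, bit in enumerate(key_bits))

def barrel_shift(bits, key_bits):
    """Move the bit at input position i to output position (i + s) mod n."""
    n = len(bits)
    s = shift_amount(key_bits) % n
    out = [None] * n
    for i, b in enumerate(bits):
        out[(i + s) % n] = b
    return out

print(shift_amount([0, 1]))                # key = {0, 1}: shift by 2
print(barrel_shift([1, 0, 0, 0], [0, 1]))  # Input0 arrives at Output2
print(barrel_shift([1, 0, 0, 0], [1, 0]))  # key = {1, 0}: shift by 1
```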
The delay variation is generated by transistor-level mismatch [Lofstrom et al. (2000)] and doping
variability [Seoane et al. (2009)]. Variation accumulates over several stages. It is then
sufficiently large to be detected by the Output Logic.
BS-PUF must be invertible; this property facilitates decryption. Consequently, the physical
delay measurements must not depend on the bit state; they should be a function only of the path.
3.1.5 Circuit Implementation
A commutative PUF based on a barrel shifter is implemented in hardware. Transmission gates
implement the shift paths. The circuit is subdivided into three components: input logic, shift
unit, and output logic.
3.1.5.1 Input logic
Input logic is used to trigger the delay test system. It is a 3-input, 1-output circuit that
connects the input signal $S$ or its inverse $\overline{S}$ to the output terminal (Fig. 3.7). Input
logic consists of
three transmission gates. RST (reset) is used to control ON/OFF status of the first transmission
gate. When RST is high, Input travels through the first gate and arrives at an intermediate
Figure 3.7 Schematic of 1-bit input logic. Each input bit is controlled by an input logic
unit.
Figure 3.8 Shift Unit of barrel shifter. If key = 1 (key = 0), N1/P1 is on (N2/P2 is off),
then output equals Input A; otherwise, output equals Input B.
node. Otherwise, it is blocked. REV (reverse) determines whether Input is inverted. Input will be
inverted when REV = 1. The function definition for input logic is:
$output = RST \cdot (REV \oplus input)$.
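The input logic function can be tabulated directly from its definition. This sketch models only the Boolean formula; when RST is low the real transmission gate blocks the signal, which the formula represents as 0.

```python
from itertools import product

def input_logic(rst, rev, signal):
    """output = RST AND (REV XOR input), all arguments being 0 or 1."""
    return rst & (rev ^ signal)

print("RST REV IN | OUT")
for rst, rev, signal in product((0, 1), repeat=3):
    print(f" {rst}   {rev}   {signal} |  {input_logic(rst, rev, signal)}")
```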
3.1.5.2 Shift unit
Shift units implement the path selection and form shift stages. Shift unit size determines the
magnitude of delay. We construct a barrel shifter with 8 shift stages for testing. Each layer contains
256 shift units. Each stage shifts by either $2^{7-n}$ or 0, where n is the stage index.
Each shift unit is a 4-input, 1-output circuit, shown in Fig. 3.8. Either inputA or inputB is
mapped to output. The mapping is determined by the key. A key value of 1 causes the upper
transmission gate to open; output then becomes inputA. Otherwise, output becomes inputB.
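The path an input bit takes through the shift stages can be traced in software. This is a sketch under stated assumptions: the shifter is treated as cyclic (wrap-around at 256), a shift increases the bit index, and `key_bits[n]` controls stage $n$, which shifts by $2^{7-n}$. The function name `shift_path` is hypothetical.

```python
WIDTH = 256   # shift units per stage
STAGES = 8    # number of shift stages


def shift_path(input_index: int, key_bits: list[int]) -> list[int]:
    """Return the bit position after each stage, starting with the input position.

    Assumption: stage n shifts by 2**(7 - n) when key_bits[n] == 1, else by 0,
    and positions wrap around modulo WIDTH.
    """
    if len(key_bits) != STAGES:
        raise ValueError("one key bit per stage is required")
    position = input_index
    path = [position]
    for n, bit in enumerate(key_bits):
        if bit:
            position = (position + 2 ** (STAGES - 1 - n)) % WIDTH
        path.append(position)
    return path


# Only stage 3 shifts (by 2**4 = 16), so input 0 ends at output 16.
print(shift_path(0, [0, 0, 0, 1, 0, 0, 0, 0]))
```

Two keys that produce the same total shift still route the signal through different shift units, which is why the measured delay depends on the key and not only on the final output position.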
The path delay value should vary depending on the shift path. Path delay primarily depends
on the shift units’ transmission gates. Adding load capacitance after each transmission gate,
or accumulating variation over several transmission-gate stages, enlarges the delay until it becomes
detectable by the path delay counter.
In BS-PUFs, uniqueness depends on how much delay variation the same path exhibits on
different chips. Modifying transistor area is the main method for increasing inter-chip
variation. Transistor delay variation is inversely proportional to transistor area [Grünebaum et al.
(2001)], so smaller transistors yield larger delay variation. However, BS-PUF requires
plaintext-independent path delay, and at minimum transistor sizes the path delay for a 1-valued bit
differs from that for a 0-valued bit. Hence larger transistors are used in shift units.
3.1.5.3 Output logic
Output logic measures/captures path delay. Output logic for each bit contains 3 parts: counter,
edge detector trigger generator and entanglement logic (Fig. 3.9(d)).
Counter takes CLK and RST as input producing a 10-bit output; it counts the number of
rising edges of CLK. Setting RST high resets the counter to 0. The path delay is expressed as
(input clock period)× (counter value).
Edge detector trigger generator generates a pulse in response to a transition at its input. It
includes an edge detector (Fig. 3.9(b)) and a positive edge trigger generator (Fig. 3.9(c)). The edge
detector converts a rising or falling edge into a rising edge at its output. The positive edge trigger
generator converts the rising edge from the edge detector into a pulse.
The output logic works as follows. First, a rising/falling edge at input produces a pulse at edge
detector trigger generator output. This pulse enables the transmission gate in Fig. 3.9(d) for a
Figure 3.9 (a) D Flip-Flop – Triggered by rising edge. The output, Q, is high when there
is a rising edge at input, in. (b) Edge Detector – The output reflects a
transition at the input. (c) Positive edge trigger generator – Produces a pulse
in response to a positive edge at the input, in. (d) Output Logic – Captures
the path delay; this is provided to entanglement logic.
Figure 3.10 The path delay capture unit tests for and stores the path delay. The edge
detector detects an output transition; an input $S$ equal to the current output will not be detected.
Consequently, the transmission path receives $\bar{S}$ and $S$ successively, so a transition
at output is guaranteed.
short time period (2ns). During this time, counter output is captured; it must not change while
being captured. Thus, enable time period must be shorter than clock period (4ns).
Entanglement logic extracts the $m$th LSB of delay $D(i, i')$.
Computing the XOR of this bit with the input signal $x_i$ yields the entangled output bit.
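The counter-to-delay conversion and the entanglement step can be illustrated together. This is a minimal sketch, assuming the 4 ns clock period stated in the text, a 10-bit counter, and 1-based bit numbering (so $m = 1$ is the LSB); the function names are hypothetical.

```python
CLOCK_PERIOD_NS = 4      # clock period given in the text
COUNTER_BITS = 10        # counter width given in the text


def path_delay_ns(counter_value: int) -> int:
    """Path delay expressed as (input clock period) x (counter value)."""
    return CLOCK_PERIOD_NS * counter_value


def entangle(x_i: int, counter_value: int, m: int) -> int:
    """XOR the input bit with the m-th LSB of the delay counter (m = 1 is the LSB)."""
    if not 1 <= m <= COUNTER_BITS:
        raise ValueError("m must select one of the counter bits")
    delay_bit = (counter_value >> (m - 1)) & 1
    return x_i ^ delay_bit


counter = 30                          # 0b0000011110, i.e. a 120 ns path delay
print(path_delay_ns(counter))         # 120
print(entangle(1, counter, m=1))      # LSB of 30 is 0, so the bit is unchanged
print(entangle(1, counter, m=2))      # 2nd LSB of 30 is 1, so the bit is flipped
```

Because XOR is its own inverse, applying `entangle` again with the same counter value and the same `m` restores the original bit, which is the invertibility property BS-PUF relies on for decryption.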
The output logic works by detecting a transition. Whether a transition occurs depends on the
previous output value. Thus, the output logic is incapable of detecting unchanging output values.
An output transition is forced by providing $\bar{x}_i$ before $x_i$ at the input.
3.1.5.4 Path Delay Testing
The input logic, shift unit and output logic work together to capture the path delay. The
following five steps are necessary.
1. Set $\bar{x}_i$ as input and reset input logic.
2. Wait for $\bar{x}_i$ to arrive at output logic.
3. Reset input logic and clock counter, set xi as input.
4. Wait as xi travels the path determined by key triggering a transition at the output logic.
5. Encrypt using the captured counter value.
3.2 General PUF-Based Encryption
In this section, the details of the proposed general PUF-based public-key encryption protocol are
presented. In this protocol, there are no restrictions on PUF properties. The PUF response serves as
part of the private key.
3.2.1 PUF Selection
PUF [Guajardo et al. (2007b)] functions as a hardware key extractor when working in challenge-
response mode. When a physical stimulus, challenge, is applied to a PUF, the output, response,
is repeatable for the same chip. However, these responses vary for different chips due to silicon
manufacturing variation. Both challenge and response are binary strings; the response string is
used as the encryption key.
RSA is vulnerable due to the mathematical relationship between public and private keys: an
attacker who can factor the modulus can derive the private key from the public key. In contrast,
there is no correlation between the responses of two PUFs, even if they share the same challenge.
Sender and receiver can use their own PUF responses as private keys; mathematically modeling
these is difficult for an adversary.
Compared to explicitly introducing randomness, PUF’s intrinsic randomness is preferred. It is
derived naturally from the manufacturing process relying on delay variations due to silicon doping
differences.
Usually, there are two types of PUFs with intrinsic randomness, delay-based PUFs and RAM-
based PUFs. The challenge of delay-based PUFs (e.g. Arbiter PUFs [Suh and Devadas (2007)]) is
a 1-0 sequence. Each challenge bit propagates through two identical paths of circuit components.
Figure 3.11 Proposed PUF-based public-key encryption algorithm. (mod n) is omitted for
all in-flight messages
Each path contains a number of delay units; their precise delay depends on the manufacturing
process. Different challenges create unique paths by changing the route through delay units. For
example, each challenge bit may serve as the control bit of a multiplexer. The corresponding response bit is
determined by which path is faster. Delay-based PUF responses are affected by challenge bits and
silicon process variation. In contrast, RAM-based PUF responses (e.g. SRAM PUF [Selimis et al.
(2011)]) are only influenced by doping variation. They depend on the initial state when each cell
is powered on.
RAM-based PUFs produce a single response. The ability to generate multiple different responses
provides multiple private keys; delay-based PUFs (Fig. 3.12) provide this capability. This is ideal
because it protects against known-plaintext attacks. Discovering a particular private key is of no
benefit if a different private key is used for each receiver. We use a portion of a receiver’s public
key as a challenge; the unique response becomes the private key. All further discussion focuses
on delay-based PUFs. The response of delay-based PUF PUFi on challenge Kj is represented as
PUFi(Kj).
3.2.2 Encryption Algorithm
Fig. 3.11 shows our proposed PUF-based public-key encryption protocol depicting Bob as the
sender and Alice as the receiver. Both Bob and Alice have their own PUFs. If Bob encrypts a
message m with his PUF, Alice has no way to decrypt it except to ask Bob to decrypt it for her.
The following protocol utilizes this property to securely transmit a message from Bob to Alice.
Here, the PUFs of Bob and Alice are denoted $\mathrm{PUF}_B$ and $\mathrm{PUF}_A$, respectively. Bob and Alice must
share a modulus $n$.
1. Bob encrypts the message $m$ by computing $m^{\mathrm{PUF}_B(K)} \pmod{n}$.
2. Bob sends $m^{\mathrm{PUF}_B(K)} \pmod{n}$ to Alice.
3. Alice encrypts $m^{\mathrm{PUF}_B(K)} \pmod{n}$ with $\mathrm{PUF}_A(K)$. (At this point, Alice does not know the
message $m$.)
4. Alice sends $(m^{\mathrm{PUF}_B(K)} \pmod{n})^{\mathrm{PUF}_A(K)} \pmod{n} = m^{\mathrm{PUF}_A(K)\mathrm{PUF}_B(K)} \pmod{n}$ to Bob.
5. Bob decrypts $m^{\mathrm{PUF}_B(K)\mathrm{PUF}_A(K)} \pmod{n}$ with $\mathrm{PUF}_B(K)$ and obtains $m^{\mathrm{PUF}_A(K)} \pmod{n}$.
6. Bob sends $m^{\mathrm{PUF}_A(K)} \pmod{n}$ to Alice.
7. Alice decrypts $m^{\mathrm{PUF}_A(K)} \pmod{n}$ with $\mathrm{PUF}_A(K)$, obtaining the message $m$.
In this algorithm, there are three in-flight messages ((mod n) is omitted): $m^{\mathrm{PUF}_B(K)}$, $m^{\mathrm{PUF}_B(K)\mathrm{PUF}_A(K)}$,
and $m^{\mathrm{PUF}_A(K)}$, where $K$ is the asymmetric encryption public key. Retrieving the message content
$m$ is impossible even if the attacker captures all in-flight messages, since these messages are
not correlated. Without the PUF responses, it is impossible to break $m^{\mathrm{PUF}_i(K_j)}$ into $m$ and the exponent
$\mathrm{PUF}_i(K_j)$.
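The seven steps can be traced numerically. The text does not spell out how "decrypting with" a PUF response is carried out, so the sketch below makes its own assumptions: the shared modulus $n$ is prime, each exponent is coprime to $n - 1$, and decryption raises the ciphertext to the inverse of the exponent modulo $n - 1$. The integers standing in for PUF responses, the helper `make_exponent`, and the choice of modulus are all hypothetical; this is a toy illustration, not a security analysis.

```python
from math import gcd


def make_exponent(response: int, modulus: int) -> int:
    """Hypothetical helper: map a PUF response to an exponent invertible mod (modulus - 1)."""
    e = response % (modulus - 1)
    while e < 2 or gcd(e, modulus - 1) != 1:
        e += 1
    return e


def encrypt(x: int, exponent: int, modulus: int) -> int:
    return pow(x, exponent, modulus)


def decrypt(x: int, exponent: int, modulus: int) -> int:
    # Assumption: modulus is prime, so exponents are inverted modulo (modulus - 1).
    inverse = pow(exponent, -1, modulus - 1)   # requires Python 3.8+
    return pow(x, inverse, modulus)


n = 2**61 - 1                                  # a Mersenne prime, illustrative shared modulus
puf_b = make_exponent(0xB0B5EC2E7, n)          # stand-in for PUF_B(K)
puf_a = make_exponent(0xA11CE5EC2E7, n)        # stand-in for PUF_A(K)
m = 123456789                                  # message, 0 < m < n

msg1 = encrypt(m, puf_b, n)                    # steps 1-2: Bob -> Alice
msg2 = encrypt(msg1, puf_a, n)                 # steps 3-4: Alice -> Bob
msg3 = decrypt(msg2, puf_b, n)                 # steps 5-6: Bob -> Alice
recovered = decrypt(msg3, puf_a, n)            # step 7: Alice recovers m

print(recovered == m)
```

The round trip works because exponentiation commutes: removing Bob's exponent from `msg2` leaves exactly the message encrypted under Alice's exponent alone.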
This algorithm is similar to the Diffie-Hellman key exchange [Bresson et al. (2001)]. It replaces
the base with a message $m$ and the secret integer with a PUF response. There are several
advantages. First, sender and receiver do not have to share the same base. Second, without the
same base, it is harder for a man-in-the-middle to decipher in-flight messages. Third, utilizing PUFs
ensures that the secret is unique to each device.
A password-authenticated key agreement (PK) [Juang (2004)] may be used to prevent man-
in-the-middle attacks. One simple solution is to compare the hash of s concatenated with the
Figure 3.12 Challenge (public key) and response (private key) of delay-based PUF. No
correlation exists between private keys or between public and private keys.
password calculated independently on both ends of channel. The advantage of this scheme is that
an attacker can only test one specific password on each iteration with the other party, and so the
system provides good security with relatively weak passwords.
3.2.3 Hybrid Key encryption algorithm
For the algorithm depicted in Section 3.2.2, receivers have no way of knowing the identity of the
sender. Hence, it does not support digital signing. A digital signature is a mathematical scheme
for verifying the authenticity of digital messages or documents. A valid digital signature gives a
recipient a mathematical reason to believe that the message was created by a known sender and
that the message was not altered in transit. In the proposed algorithm, the private key is a random number
that depends on the hardware randomness of the sender. The receiver receives and decrypts a message without
knowing the sender.
An advanced PUF-based encryption algorithm that supports digital signing is shown in Figure 3.13.
To establish the sender’s and receiver’s identities, we need to combine RSA with PUF output. Each
party has a public key ($K_i^+$) and a private key ($K_i^-$). These keys $K_i^+$ and $K_i^-$ are generated
by the RSA algorithm, and are distinct from the physically derived PUF keys. Senders use the
Figure 3.13 PUF-based public-key encryption algorithm supporting digital signing. ($P_i$ represents
$\mathrm{PUF}_i$, $K_i^+$ is the public key and $K_i^-$ is the complement key.)
receiver’s public key to encrypt the message, which the receiver decrypts with the complement key.
This process is similar to RSA but the whole protocol is much more secure because additional
physical randomness is added. Even if public/complement key pairs are known, plaintext is still
protected by PUF output. This hybrid encryption algorithm can be implemented in 7 steps.
1. Bob encrypts the message $m$ by computing $m^{K_B^+ \mathrm{PUF}_B(K_B^+)} \pmod{n}$.
2. Bob sends $m^{K_B^+ \mathrm{PUF}_B(K_B^+)} \pmod{n}$ to Alice.
3. Alice has the complement key of Bob ($K_B^-$) and decrypts the message $m^{K_B^+ \mathrm{PUF}_B(K_B^+)}$ with $K_B^-$ to get
$m^{K_B^+ K_B^- \mathrm{PUF}_B(K_B^+)} = m^{\mathrm{PUF}_B(K_B^+)}$. Alice verifies Bob’s identity in this step. She further
encrypts $m^{\mathrm{PUF}_B(K_B^+)}$ with $K_A^+ \mathrm{PUF}_A(K_A^+)$ to get $m^{\mathrm{PUF}_B(K_B^+) K_A^+ \mathrm{PUF}_A(K_A^+)}$. (At this point, Alice
does not know the message $m$.)
4. Alice sends $m^{\mathrm{PUF}_B(K_B^+) K_A^+ \mathrm{PUF}_A(K_A^+)} \pmod{n}$ to Bob.
5. Bob decrypts $m^{\mathrm{PUF}_B(K_B^+) K_A^+ \mathrm{PUF}_A(K_A^+)} \pmod{n}$ with $K_A^- \mathrm{PUF}_B(K_B^+)$ and obtains $m^{\mathrm{PUF}_A(K_A^+)} \pmod{n}$.
By then, Bob has verified Alice’s identity.
6. Bob sends $m^{\mathrm{PUF}_A(K_A^+)} \pmod{n}$ to Alice.
7. Alice decrypts $m^{\mathrm{PUF}_A(K_A^+)} \pmod{n}$ with $\mathrm{PUF}_A(K_A^+)$, obtaining the message $m$.
In this protocol, the receiver cannot complete decryption with the sender’s public key if the encrypted
message has been changed by a third party.
3.2.4 Entropy Estimation
In the proposed encryption protocol, brute-force attack difficulty depends on PUF response
entropy. Although PUF responses benefit from silicon manufacturing randomness, their entropy
is limited. The challenges are not necessarily mapped onto a domain consisting of all
$2^n$ response strings, uniformly distributed to maximize entropy. Some response sequences
rarely occur. Here we define the number of possible output sequences corresponding to a single
challenge as the number of valid codes ($n_{valid}$).
Unlike the mathematical model behind RSA, there is no way to acquire PUF responses by
factoring a number. The only way to perform a brute-force attack is to try each valid code ($C_{valid}$).
Hence, the time cost of a brute-force attack is linearly proportional to $n_{valid}$. For certain PUF types
(e.g. arbiter PUF, butterfly PUF or ring oscillator PUF), $n_{valid}$ depends strongly on uniqueness
performance. If different PUF chips produce similar responses to the same challenge, then $n_{valid}$
is very small. An adversary may predict PUF responses using a modeling attack [Rührmair et al.
(2010)]: challenge-response pairs (CRPs) are modeled for a single PUF chip, and this model
predicts the CRPs of other PUF chips produced in the same batch. Our goal is to quantify the
relationship between $n_{valid}$ and PUF uniqueness metrics.
3.2.4.1 Hamming Distance of PUF
PUF uniqueness performance is evaluated using inter-chip Hamming Distance (HD).
The HD between two equal-length binary sequences generated on the same challenge is the number of
differing bit positions. For example, two vectors $v_1 = \langle 1, 0, 0, 1 \rangle$ and $v_2 = \langle 1, 1, 0, 0 \rangle$ have an
HD of 2 because their elements diverge at positions 1 and 3 (counting from 0).
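The example can be checked directly. This is a small sketch; the function name `hamming_distance` is chosen here for illustration, and positions are counted from 0.

```python
def hamming_distance(a: list[int], b: list[int]) -> int:
    """Number of positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


v1 = [1, 0, 0, 1]
v2 = [1, 1, 0, 0]

differing_positions = [i for i, (x, y) in enumerate(zip(v1, v2)) if x != y]
print(hamming_distance(v1, v2))   # 2
print(differing_positions)        # [1, 3]
```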
For a specific PUF type, inter-chip HD is the average HD (HDavg) over all responses produced
by different chips. Assume there are m different PUF chips with the same circuit design (same
Figure 3.14 Sample set for HD calculation (M). It contains m, n-bit PUF responses.
M [i][j] represents the jth bit of the ith response. M(:, 2) is the 2nd bit of all
responses.
architecture, transistor size, and layout); challenge C is applied to each chip. The m chips will
give m different responses (R0, R1, ...Rm−1); this is a sample set. An example sample set matrix is
shown in Fig. 3.14. In this matrix M , Ri = M(i, :) is the response of the ith chip. M [i][j] is the
jth bit of the ith chip response and M(:, j) is the jth bit of all chip responses.
$$HD_{avg} = \frac{2}{m(m-1)} \sum_{p=0}^{m-1} \sum_{q=p+1}^{m-1} HD\big(M(p,:),\, M(q,:)\big) \tag{3.2}$$
Average PUF HD is defined by Eq. 3.2. For $m$ different PUF responses, the HD between all $m(m-1)/2$
pairs of rows is computed; the resulting HDs are averaged.
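Eq. 3.2 translates into a short pairwise computation. The sketch below assumes the sample set is given as a list of equal-length bit lists (one row per chip); the function name and the three-chip example matrix are illustrative, not data from this work.

```python
from itertools import combinations


def average_hd(sample_set: list[list[int]]) -> float:
    """Average Hamming distance over all unordered pairs of rows (Eq. 3.2)."""
    m = len(sample_set)
    if m < 2:
        raise ValueError("at least two responses are needed")
    total = sum(
        sum(a != b for a, b in zip(row_p, row_q))
        for row_p, row_q in combinations(sample_set, 2)
    )
    # There are m(m-1)/2 unordered pairs, hence the factor 2 / (m(m-1)).
    return 2 * total / (m * (m - 1))


M = [
    [1, 0, 0, 1],   # response of chip 0
    [1, 1, 0, 0],   # response of chip 1
    [0, 1, 0, 1],   # response of chip 2
]
print(average_hd(M))   # each of the three pairs differs in 2 positions
```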
From another point of view, HD can be calculated for every bit position. Eq. 3.3 shows the
average HD of the jth bit position, where M(:, j) is jth bit of all responses and xj is the number
of 1’s in the jth column.
For the $j$th column of a sample set matrix $M$, $HD(M(:, j))$ can be derived from the number
of 1’s ($x_j$) in $M(:, j) = \langle b_0, b_1, \ldots, b_{m-1} \rangle$. As there are $m$ bits in the column, if there are $x_j$ 1’s,
then there must be $m - x_j$ 0’s. HD increases by 1 for each pair of bits $\langle b_p, b_q \rangle$ that differ,
i.e. each pair consisting of one 1 and one 0. Such a pair is formed by selecting any one of the $x_j$ 1-bits and any one of the
$m - x_j$ 0-bits, so the total number of such pairs is $x_j(m - x_j)$. After summing the HD of all
bit positions, $HD_{avg}$ is computed by averaging over all response pairs.
$$HD_{avg} = \frac{2}{m(m-1)} \sum_{j=0}^{n-1} \binom{x_j}{1}\binom{m-x_j}{1} = \frac{2}{m(m-1)} \sum_{j=0}^{n-1} x_j(m - x_j) \tag{3.3}$$
For $n$-bit PUFs, HD should be as close as possible to $n/2$ for maximum entropy that results
from uniform distribution of 0’s and 1’s in each bit position over $2^n$ entries. This indicates ideal
uniqueness performance.
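The column-wise form of Eq. 3.3 can be cross-checked against the pairwise definition of Eq. 3.2. The following sketch uses a pseudo-random sample matrix purely for the comparison; the function names, matrix size, and seed are arbitrary choices made here.

```python
import random
from itertools import combinations


def average_hd_pairwise(M: list[list[int]]) -> float:
    """Eq. 3.2: average HD over all unordered pairs of rows."""
    m = len(M)
    total = sum(sum(a != b for a, b in zip(r1, r2)) for r1, r2 in combinations(M, 2))
    return 2 * total / (m * (m - 1))


def average_hd_by_columns(M: list[list[int]]) -> float:
    """Eq. 3.3: count (1, 0) pairs per column as x_j * (m - x_j)."""
    m, n = len(M), len(M[0])
    total = 0
    for j in range(n):
        x_j = sum(row[j] for row in M)    # number of 1's in column j
        total += x_j * (m - x_j)
    return 2 * total / (m * (m - 1))


rng = random.Random(0)                    # fixed seed for repeatability
sample = [[rng.randint(0, 1) for _ in range(64)] for _ in range(20)]

print(average_hd_pairwise(sample) == average_hd_by_columns(sample))
```

The column form needs one pass over the matrix, whereas the pairwise form compares every pair of rows, so the former is the cheaper one for large sample sets.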
3.2.4.2 Entropy and Number of Valid Code
The published literature on different types of PUFs often reports the average HD of the response
set. We need to translate this HD into the number of unique responses $n_{valid}$ and their distribution
entropy. $n_{valid}$ can be derived from the entropy $H(X)$ of a sample set. Entropy of a sample set matrix is
computed according to the binary entropy function [MacKay (2003)]. Eq. 3.4 shows the mathematical
expression of the binary entropy function, where $P(1)$ is the percentage of 1-bits in the response matrix $M$. $H(X)$
is a floating point number between 0.0 and 1.0. When $P(1)$ equals 0.5, the maximum entropy $H(X) = 1$
is achieved.
$$H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) = -P(1)\log_2 P(1) - (1 - P(1))\log_2(1 - P(1)) \tag{3.4}$$
After obtaining $H(X)$, the number of valid bits $b_{valid}$ is computed. $b_{valid}$ of an $n$-bit PUF can
be estimated by multiplying entropy by the response length ($n$). This is the expected entropy of
the response set. For each bit position, there are 2 possible options (bit-0 or bit-1). Therefore
$n_{valid} = 2^{b_{valid}}$. This says that all possible PUF responses are equivalent to a response set of length
$b_{valid}$ bits with a uniform distribution. For a brute-force search, $n_{valid}$ entries need to be tried. For
example, in a 256-bit PUF with entropy equal to 0.8, there are 204 valid bits and $2^{204}$ valid codes.
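The 256-bit example can be reproduced as follows. This is a sketch: the function names are hypothetical, and rounding the product $256 \times 0.8 = 204.8$ down to a whole number of bits is an assumption made here to match the figure of 204 quoted in the text.

```python
from math import floor, log2


def binary_entropy(p1: float) -> float:
    """Binary entropy function of Eq. 3.4, in bits per response bit."""
    if p1 in (0.0, 1.0):
        return 0.0                      # limit value; avoids log2(0)
    return -p1 * log2(p1) - (1 - p1) * log2(1 - p1)


def valid_bits(n: int, entropy: float) -> int:
    """b_valid: entropy times response length, rounded down (assumption)."""
    return floor(n * entropy)


b_valid = valid_bits(256, 0.8)
n_valid = 2 ** b_valid                  # number of codes a brute-force search must try

print(binary_entropy(0.5))              # 1.0, the maximum
print(b_valid)                          # 204
print(n_valid == 2 ** 204)
```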
As P (1) is the only variable in Eq. 3.4, nvalid can be calculated only if P (1) is known. However,
in most previous works, only HD is provided as an indicator of uniqueness. $P(1)$ is not listed because
it is not a critical performance metric. As a result, we must derive nvalid from HD.
3.2.4.3 HD to nvalid
As shown in Eq. 3.3, HD is related to the number of 1-bits in each column ($x_j$). By assuming
the percentage of 1-bits in each position equals the percentage of 1-bits in the whole sample matrix,
the number of 1-bits in each column becomes equal to $P(1)$ multiplied by the number of samples (Eq. 3.5).
$$x_j = m \times P(1) \tag{3.5}$$
Substitute Eq. 3.5 into Eq. 3.3; HDavg is reduced to a simple formula.
$$HD_{avg} = \frac{2nm}{m-1}\, P(1)\big(1 - P(1)\big) \tag{3.6}$$
There are three parameters in Eq. 3.6; n is length of PUF response; m is the number of samples;
P (1) is the percentage of 1 bits. With n,m, and HDavg provided, P (1) can be estimated by solving
a quadratic equation. Estimated result P (1)′ can be expressed as in Eq. 3.7. The sign of the second
term doesn’t affect entropy value.
The P (1)′ calculated from HD is an approximate value. xj might vary in each column. The
estimation accuracy of this algorithm is tested using a pseudo-random sample matrix. Related
simulation details and results are shown in Section 4.5. This estimation is accurate for sufficiently
large sample sets.
After estimating P (1), approximate entropy H(X) is assessed according to Eq. 3.4.
$$P(1)' = \frac{1}{2} \pm \frac{\sqrt{1 - \dfrac{2\,HD_{avg}(m-1)}{nm}}}{2} \tag{3.7}$$
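Eq. 3.7 and Eq. 3.4 chain together into a small estimator. The sketch below is illustrative: the function names are hypothetical, the test value of $HD_{avg}$ is generated from Eq. 3.6 with a chosen $P(1) = 0.4$ rather than taken from measurements, and clamping a slightly negative discriminant to zero is an assumption added here to tolerate sampling noise.

```python
from math import log2, sqrt


def estimate_p1(hd_avg: float, n: int, m: int) -> tuple[float, float]:
    """Solve Eq. 3.6 for P(1); returns both roots of Eq. 3.7."""
    discriminant = 1 - 2 * hd_avg * (m - 1) / (n * m)
    discriminant = max(discriminant, 0.0)   # assumption: clamp sampling noise
    half_root = sqrt(discriminant) / 2
    return 0.5 - half_root, 0.5 + half_root


def binary_entropy(p1: float) -> float:
    if p1 in (0.0, 1.0):
        return 0.0
    return -p1 * log2(p1) - (1 - p1) * log2(1 - p1)


n, m = 256, 200
true_p1 = 0.4
hd_avg = 2 * n * m / (m - 1) * true_p1 * (1 - true_p1)   # forward direction, Eq. 3.6

low, high = estimate_p1(hd_avg, n, m)
print(round(low, 6), round(high, 6))                      # 0.4 0.6
print(abs(binary_entropy(low) - binary_entropy(high)) < 1e-9)
```

Both roots give the same entropy because the binary entropy function is symmetric about $P(1) = 0.5$, which is why the sign of the second term in Eq. 3.7 does not matter.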
3.2.5 Speed Evaluation
Although we are able to use shorter keys in our PUF-based public-key cryptosystem, the proposed
protocol requires 3 more transmission rounds over the insecure channel. This introduces
communication overhead and therefore reduces encryption/decryption speed.
We evaluate encryption and communication overhead for both the RSA and the PUF-based cryptosystems
in Section 4.5. The PUF-based asymmetric cryptosystem shows its advantage when encryption
overhead dominates the total time for message exchange.
3.3 Variation Improvement
Based on the preceding discussion, non-minimum-sized transistors are used in the delay units’
transmission gates to preserve physical commutativity. This limits the delay variation provided
by each stage and therefore makes the path delay variation hard for the output logic to capture.
According to Pelgrom’s mismatch model, described in Section 2.5, both random and systematic silicon
variations are sources of delay variation. Based on this, we propose an asymmetric layout structure
to improve the stage variation. Stage variation is important for all delay-based PUFs. Here we use
the APUF as an example because it has a simpler structure. Examples of asymmetric path layouts are
given in Fig. 3.15.
3.3.1 Mismatch Enhancement
Fig. 3.15(a) illustrates a symmetric layout for an 8-stage APUF. Only two rows of delay units
are plotted for simplicity. White and gray squares represent shift units in the first and
second rows of the shifter, respectively. The $i$th shift units of the first and second rows are considered
a pair of racing units. $d$ is the distance between adjacent shift units. The symmetric layout places
each pair of racing units as close as possible, e.g. units $1$ and $1'$ are neighbors.
Fig. 3.15(b) gives the n-fold asymmetric layout for this 8-stage BS-PUF. The design is divided
into a left area (Al) and right area (Ar). For each pair of racing units, one unit (white square) is
Figure 3.15 (a) Symmetric layout of 8-stage BS-PUF with 2 rows. (b) Asymmetric layout
of 8-stage BS-PUF with 2 rows. Block $i$ denotes a shift unit along the first
row. $i'$ is the corresponding second row shift unit.
placed in Al and the other unit (gray square) is assigned to Ar. Distance between centers of Al
and Ar is defined to be path distance. The BS-PUF with n rows is folded into n tracks.
This asymmetric design increases separation between pairs of racing shift units compared to
the symmetric layout and it doesn’t require additional area cost. According to Pelgrom’s mismatch
model (Eq. 2.2), mismatch between a pair of racing units increases with Dp (the separation distance
between them). Hence, process variation in an asymmetric layout is expected to be larger than in
a symmetric layout.
Path distances of symmetric ($D_{path}$) and asymmetric ($D'_{path}$) layout APUFs are shown in Eq. 3.8
and Eq. 3.9, where $n$ is the number of stages. $D'_{path}$ is proportional to $n$. The advantages of
asymmetric layout are more significant in large APUFs.
$$D_{path} = D_y + L \tag{3.8}$$
$$D'_{path} = (D_x + W) \times n/2 \tag{3.9}$$
Although a larger path distance leads to larger variation, an asymmetric layout scheme
must be refined to keep unpredictability and uniqueness performance under control.
Figure 3.16 Balanced inner-die systematic bias is achieved through alternate bit assign-
ment. Odd challenge bits are arranged as top path in Al and bottom path in
Ar. Even bits are arranged as bottom path in Al and the top path in Ar.
Inner-chip unpredictability Assigning the top path for each bit to Al and the bottom path
to Ar (or vice versa) adds similar systematic bias to all APUF response bits in an asymmetric
design. This will suppress inner-chip unpredictability of APUFs. For example, an APUF built on
a wafer with transistor mobility that increases linearly along x-axis is likely to have faster bottom
paths compared to corresponding top paths.
Assigning challenge bits alternately to top and bottom paths in Ar and Al results in balanced
systematic bias for the entire APUF circuit. Fig. 3.16 depicts odd bits assigned to the top path
in Al and bottom path in Ar. Likewise, even bits are assigned the bottom path in Al and the top
path in Ar.
Inter-chip uniqueness When systematic variation is excessive, random variation in a PUF
circuit is negligible and PUF responses would depend entirely on the doping pattern. This degrades
inter-chip uniqueness because wafers fabricated under the same technology tend to have a similar
doping pattern. If process variation is dominated by the doping bias, PUFs built on dies at the same
wafer position will have similar responses. The $D_{path}$ in an asymmetric layout should be controlled to
generate acceptable uniqueness performance.
3.3.2 Layout overhead
Before discussing the asymmetric layout overhead, we first introduce the schematic and layout
of each delay unit. The gate level implementation of one delay unit is shown in Fig. 3.17. Each
Figure 3.17 Gate level implementation of a MUX (delay unit)
Figure 3.18 Stick diagram of a MUX cell. The layout is minimized according to CMOS
layout rules.
multiplexer is composed of three NAND gates. Fig. 3.18 is the stick diagram of this circuit
structure. According to this stick diagram, the dimensions of the corresponding delay unit transistor-level
layout are $W = 63\lambda$, $L = 42\lambda$. The horizontal distance ($D_x$) and vertical distance ($D_y$) between
delay units are limited by the minimum spacing between "Active" regions and "Metal1" wires, respectively.
Hence, $D_x = D_y = 3\lambda$. In this work, the IBM 0.13 µm PDK is used for testing, so $\lambda = 0.06$ µm.
Although asymmetric layout does not change the total number of delay units, it introduces
routing overhead.
As given in Fig. 3.15(b), our asymmetric layout requires 3 more horizontal wiring tracks (Wh)
for each APUF bit to maintain path connection. In our design, horizontal wires are ”Metal1”.
According to the rules of CMOS layout design, the minimum Wh is 6λ for ”Metal1” (including
minimum metal width and spacing between metals).
Width and length of a 1-bit $n$-stage APUF with symmetric layout are $W_{1\text{-bit}} = nW + (n - 1)D_x$ and
$L_{1\text{-bit}} = 2L + D_y$; the length of the corresponding asymmetric layout is $L'_{1\text{-bit}} = L_{1\text{-bit}} + 3W_h$. Asymmetric
Figure 3.19 An example of 2-1 DAPUF
layout does not increase the width of APUFs. Asymmetric layout overhead is defined as
$$P_{overhead} = \frac{W_{1\text{-bit}}\,(L'_{1\text{-bit}} - L_{1\text{-bit}})}{W_{1\text{-bit}}\,L_{1\text{-bit}}} \tag{3.10}$$
For instance, $P_{overhead} = 20.7\%$ for 128-stage APUFs. An $m$-bit APUF is a collection of $m$ 1-bit
APUFs. Therefore, layout overhead remains constant (does not increase) with the number of bits.
The variation enhancement performance of the $n$-fold asymmetric layout is shown in Section 4.4.
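The 20.7% figure follows from Eq. 3.10 and the layout dimensions quoted above ($W = 63\lambda$, $L = 42\lambda$, $D_x = D_y = 3\lambda$, $W_h = 6\lambda$). The sketch below works in units of $\lambda$; the function name is chosen here for illustration.

```python
# Layout dimensions from the text, expressed in units of lambda.
W, L = 63, 42        # delay unit width and length
D_X = D_Y = 3        # horizontal and vertical spacing between delay units
W_H = 6              # minimum horizontal Metal1 wiring track pitch


def layout_overhead(n_stages: int) -> float:
    """Asymmetric layout overhead of a 1-bit n-stage APUF (Eq. 3.10)."""
    width = n_stages * W + (n_stages - 1) * D_X
    length_symmetric = 2 * L + D_Y
    length_asymmetric = length_symmetric + 3 * W_H   # three extra wiring tracks
    return width * (length_asymmetric - length_symmetric) / (width * length_symmetric)


print(f"{layout_overhead(128):.1%}")   # 20.7%
```

Since the width appears in both numerator and denominator, the ratio reduces to $3W_h / (2L + D_y) = 18/87$, so under these formulas the overhead is also independent of the number of stages.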
3.4 Uniqueness Improvement
3.4.1 Entropy of multi-block pattern
To achieve high uniqueness in APUF, signals propagating in the same delay paths should have
equal opportunity to produce 1 and 0 among all chips, regardless of what challenge is given.
However, this is not true in practice. Over all chips fabricated with same process and layout,
unbalanced routing and transistor-level systematic mismatch introduce speed bias in each stage;
the top or bottom multiplexer is likely to be faster. Attackers are able to predict one chip’s response
according to responses of other chips built from the same technology. As discussed in Section 3.3,
this systematic bias is aggravated in APUFs by the asymmetric layout.
For example, suppose all shift rows in a batch of $n$-bit APUF chips are more likely to
produce output 1. Then for some specific challenge sequence such as $C_s = (1, 1, \ldots, 1)$, most chips in this
batch will produce responses equal to 1. For certain challenges, some responses
are generated with high probability by all chips in the same batch of APUFs. An attacker can build a
machine learning model of APUFs by exploiting this correlation between challenges and responses
[Rührmair et al. (2013, 2010); Hospodar et al. (2012)].
To reduce the response-generation probability bias in conventional APUFs, the authors of
[Machida et al. (2015)] propose a modeling-attack-resistant APUF FPGA implementation, the DAPUF.
DAPUFs are composed of several duplicate APUFs on neighboring SLICEs. All duplicates have
the same number of stages and share the same challenge $C$. The response of a 1-bit DAPUF is the
XOR of all duplicates’ responses. DAPUFs with $n$ duplicates are defined as $n$-1 DAPUFs. An
example of a 1-bit 2-1 DAPUF is shown in Fig. 3.19. Each duplicate in a DAPUF is defined as a
block. The pair of delay paths in each block is a selector chain.
Each delay path in APUFs can be considered a block. This means that each shift unit
needs $n$ duplicates. A single bit of PUF output is the XOR (⊕) of all duplicate paths. As all blocks
are adjacent and share the same key and input bits, similar block-level bias is expected. All PUFs
that generate responses by XORing the responses of several nearby blocks can be considered as built in a
multi-block pattern. We define APUFs with $n$ duplicates as $n$-1 Double Arbiter PUFs (DAPUFs).
Define the response of a single 1-bit block as Rblock and the response of a 1-bit n-1 DAPUF as
Rn. The probability of Rblock = 1 is Pblock(1); the probability of Rn = 1 is Pn(1). Pn(1) can be
computed with Pn−1(1) and Pblock(1) iteratively (Eq. 3.11).
$$P_n(1) = \big(1 - P_{n-1}(1)\big) \times P_{block}(1) + \big(1 - P_{block}(1)\big) \times P_{n-1}(1) \tag{3.11}$$
After obtaining $P_n(1)$, the entropy of this multi-block pattern is calculated according to the binary
entropy function [MacKay (2003)]. Fig. 3.20 shows the entropy of the responses of traditional APUFs, 2-1 DAPUFs
and 4-1 DAPUFs. Response entropy varies with the percentage of block bias. As 50% is the
ideal value of $P_{block}(1)$, block bias is defined as $b_{block} = P_{block}(1) - 50\%$. Block bias less than 15% is
handled perfectly by 2-1 DAPUFs; more blocks are necessary only when each block is more heavily
biased, as shown in Fig. 3.20.
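The recursion of Eq. 3.11 and the resulting entropy can be evaluated for a given block bias. This is a sketch under the assumption that all blocks share the same $P_{block}(1)$ and are statistically independent; the function names are hypothetical, and the 15% bias is only an example input.

```python
from math import log2


def p_xor(p_block: float, n_blocks: int) -> float:
    """Probability that the XOR of n_blocks block responses equals 1 (Eq. 3.11)."""
    p = p_block                                   # P_1(1) = P_block(1)
    for _ in range(n_blocks - 1):
        p = (1 - p) * p_block + (1 - p_block) * p
    return p


def binary_entropy(p1: float) -> float:
    if p1 in (0.0, 1.0):
        return 0.0
    return -p1 * log2(p1) - (1 - p1) * log2(1 - p1)


bias = 0.15                                       # block bias b_block
p_block = 0.5 + bias
for n_blocks, label in ((1, "traditional APUF"), (2, "2-1 DAPUF"), (4, "4-1 DAPUF")):
    p_n = p_xor(p_block, n_blocks)
    print(f"{label}: P_n(1) = {p_n:.5f}, entropy = {binary_entropy(p_n):.4f} bits")
```

Each XOR stage multiplies the deviation from 0.5 by $2\,b_{block}$ in magnitude, so the response probability moves toward 0.5 and the entropy toward 1 bit as blocks are added.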
Although uniqueness is drastically improved in 2-1 and 4-1 DAPUFs, their area and power consumption
are also significantly increased compared to conventional APUFs. With a high-resolution
counter, we could design a highly unique Multi-Block APUF (MBAPUF) structure with much less
Figure 3.20 Entropy (bits) versus block bias (%) in multi-block pattern, for traditional APUF, 2-1 DAPUF, and 4-1 DAPUF.
area and power consumption. Uniqueness and reproducibility of its response are verified by Spectre
simulation.
3.4.2 Multi-Block APUF
To reduce area and power costs, we divide the delay path into n shorter paths and XOR their
responses; this saves area over the alternative duplication of blocks. Shorter chains inherently scale
down cumulative systematic bias. In MBAPUF, each block contains a sub-path of APUF delay
path. Structures of 256-stage 2-block and 256-stage 4-block MBAPUF are shown in Fig. 3.21(a)
and (b) [Guo et al. (2018b)].
Although area and power consumption are reduced by the shorter path design, fewer delay
stages are accumulated, which lowers path delay variation. When path delay variation is not
significant, low-resolution output logic cannot capture the differences between chips. Shorter
paths reduce average delay and delay variation; this requires higher output logic resolution
to preserve the same uniqueness performance.
In MBAPUF, we choose different supply voltages for shift units and output logic: Low supply
voltage for shift units to guarantee relatively large delay and delay variation, high supply voltage
Figure 3.21 (a) 256-stage 2-block Multi-Block APUF (b) 256-stage 4-block Multi-Block
APUF
for output logic and counter to reduce the delay of counter critical path and ensure higher output
logic resolution. An inverter needs to be placed between each pair of shift unit and output logic
for voltage conversion. This doesn’t harm uniqueness performance and adds more delay variability.
Uniqueness of such design is verified by Cadence simulation; the simulation results are shown in
Section 4.5.
The 1-bit n-block MBAPUF with 2-level voltage supply only requires n − 1 additional output
logics compared to conventional APUFs. Even though power consumption of each output logic and
counter is higher, the total power consumed is still smaller than 2-1 MBAPUF because the number
of transistors in selector chain is halved.
CHAPTER 4. RESULTS
4.1 BS-PUF performance
According to Section 3.1.4, the entanglement logic utilizes a 1-bit result from the path delay.
The path delay capture logic provides a multiple-bit delay counter. One bit must be chosen; it
must be shown to have the requisite properties for BS-PUF: (1) inter-chip variability, (2) intra-chip
reproducibility, (3) randomness, (4) commutativity.
Cadence Spectre simulations are used to generate raw delay data. Delay variability assessment
is conducted by 3σ Monte Carlo sampling over process parameters. This test uses IBM 130 nm
PDK. A common centroid layout is employed to reduce linear gradient errors [Long et al. (2005)]
in this subsection.
We construct an 8-level barrel shifter accepting a 256-bit input with a 256-bit output. Output
logic similar to input capture logic in [A (2006)] detects output voltage changes. Voltage transitions
send a control signal to a counter. Path delay is captured at the resolution of the counter’s clock
period; a period of 4ns is used. Delays must be a reasonable multiple of the clock period to express
variation.
In the following experiments, we primarily focus on raw data: (1) Monte Carlo sampling 200
times on the path from input 0 to output 16 (2) Monte Carlo sampling 200 times on all 256 paths
with no shifting.
4.1.1 Inter-chip Variability
Shift path delay is a function of the silicon fabrication process; it potentially exhibits PUF
properties. Each shift path terminates with entanglement logic requiring one bit. A bit from the
delay counter must be selected. The chosen bit must exhibit sufficient variation.
Figure 4.1 Histogram of the simulated forward path ($x_0 \mapsto y_{16}$) delay distribution (x-axis: delay in ns;
y-axis: count). 25% inter-chip variability is shown.
Monte Carlo simulation captures single path delay variability as a proxy for inter-chip delay
variability. As shown in Fig. 4.1, in 200 Monte Carlo samples of the process parameters performed
on path x0 ↦ y16, the delay ranges from 85 ns to 145 ns with an average around 120 ns. This is a
±25% (±30 ns) variation. At the 4 ns counter resolution, the counter output varies by about ±8.
This indicates that roughly the least significant 3 bits of delay have significant entropy in
inter-PUF measurements. Thus, the LSB, 2nd LSB, and 3rd LSB are candidates for entanglement.
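The step from the ±30 ns spread to three candidate bits can be written out explicitly. This is a short worked calculation, assuming the 4 ns counter period stated above and treating the spread as symmetric around the mean:

```latex
\[
\frac{\pm 30\ \text{ns}}{4\ \text{ns per count}} = \pm 7.5 \approx \pm 8\ \text{counts},
\qquad
\log_2 8 = 3\ \text{bits}.
\]
```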
4.1.2 Inter-chip Uniqueness
The chosen path delay bit must exhibit inter-chip uniqueness. This requires significant variance
between responses on different chips. Pair-wise Hamming distance (HD) is a criterion that measures
variability.
The HD of 200 path delay samples of 256-bit responses is computed. Table 4.1 shows distribution
of inter-chip HD for LSB. Similar figures are given for 2nd LSB in Table 4.2.
For LSB, the mean HD is 127.99 bits with a standard deviation of 8.04 bits. For 2nd LSB, these
values are 128.01 bits and 7.99 bits, respectively. An HD of 128 means roughly 50% of the 256
response bits differ. It is therefore highly unlikely that two BS-PUFs will generate the same output.
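The pair-wise HD statistics above can be reproduced from any set of equal-width responses. The following is a minimal Python sketch, not the tooling used for the reported figures; the function names are hypothetical and the random 256-bit integers merely stand in for simulated chip responses. For reference, independent unbiased bits would give a Binomial(256, 0.5) distance with mean 128 and standard deviation 8.

```python
import itertools
import random
import statistics

def pairwise_hd(responses):
    """Hamming distance of every unordered pair of equal-width integer responses."""
    return [bin(a ^ b).count("1") for a, b in itertools.combinations(responses, 2)]

def hd_summary(responses):
    """Mean and population standard deviation of the pair-wise Hamming distances."""
    distances = pairwise_hd(responses)
    return statistics.mean(distances), statistics.pstdev(distances)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Assumption: random 256-bit values stand in for the 200 simulated chips.
    chips = [rng.getrandbits(256) for _ in range(200)]
    mean_hd, std_hd = hd_summary(chips)
    print(f"mean HD = {mean_hd:.2f} bits, std = {std_hd:.2f} bits")
```

Responses are stored as integers so that one XOR and a bit count give each distance.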
Table 4.1 INTER-CHIP HD OF BS-PUFs LSB
(HD: Hamming distance; %: percentage
of bit-stream pairs with certain HD)
HD   [90, 100)   [100, 110)   [110, 120)   [120, 130)   [130, 140)   [140, 150)   [150, 160)   [160, 170)
%    0.01%       1.11%        13.46%       42.83%       35.03%       7.17%        0.38%        0.01%
Table 4.2 INTER-CHIP HD OF BS-PUFs 2nd LSB
(HD: Hamming distance; %: percentage
of bit-stream pairs with certain HD)
HD   [90, 100)   [100, 110)   [110, 120)   [120, 130)   [130, 140)   [140, 150)   [150, 160)
%    0.12%       2.57%        15.68%       37.12%       37.29%       6.25%        0.97%
4.1.3 Intra-chip Reproducibility
The usefulness of a single PUF relies on it producing a consistent response to a challenge;
the response should be independent of the environment. Tests are performed subjecting BS-PUF to: (1)
temperature variation, (2) voltage supply variation. The frequency of response bit flips is quantified.
Bit flip rate is the frequency with which a bit changes from 0 ↦ 1 or 1 ↦ 0. It is computed relative to a
baseline response. Gathering responses at room temperature (25°C) and nominal supply voltage
(5 V) establishes this baseline. The percentage of path delays where a bit flips is the bit flip rate.
For example, the LSB flipping in 64/256 paths represents a 25% bit flip rate.
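The bit flip rate definition can be sketched in a few lines of Python. This is an illustrative sketch only; the function names and the list-of-counter-values representation are assumptions, not part of the original test bench.

```python
def extract_bit(counter_value, m):
    """Return bit m of a delay counter value (m = 0 is the LSB, m = 1 the 2nd LSB)."""
    return (counter_value >> m) & 1

def bit_flip_rate(baseline_counts, test_counts, m):
    """Fraction of paths whose m-th counter bit differs from the baseline response."""
    if len(baseline_counts) != len(test_counts):
        raise ValueError("both measurements must cover the same paths")
    flips = sum(
        extract_bit(b, m) != extract_bit(t, m)
        for b, t in zip(baseline_counts, test_counts)
    )
    return flips / len(baseline_counts)

if __name__ == "__main__":
    baseline = [30] * 256                 # counter values at the baseline condition
    stressed = [31] * 64 + [30] * 192     # 64 paths gain one count
    print(bit_flip_rate(baseline, stressed, m=0))  # LSB flips in 64/256 paths
```

With these inputs the LSB flips in 64 of 256 paths, matching the 25% example in the text.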
BS-PUF retains a bit flip rate smaller than 18% under environment variation. This is similar
to the flip rate of traditional RO PUFs [Gao et al. (2014)].
4.1.3.1 Temperature Variation
Temperature is varied from 0 to 50°C. Path delays of all 256 bit paths are gathered with Monte
Carlo sampling at 0°C, 10°C, 20°C, 25°C, 30°C, 40°C and 50°C. The maximum path delay
variation is −4 ns to 5 ns. The counter logic increments every 4 ns; a ±1 count change in path
delay is expected.
[Figure 4.2 plot: bar chart; x-axis: Temperature (°C), 0 to 50; y-axis: Flip Rate (%)]
Figure 4.2 Percentage of bit flips under temperature variation. Flip rates demonstrate
signal-to-noise ratio (SNR) under different temperatures. Flip rates of LSB are
shown in blue. Flip rates of 2nd LSB are shown in green.
The flip rate of LSB is much higher than 2nd LSB.
Knowing how temperature variation affects the chosen entanglement bit is ideal; bit flip rate
quantifies this. It is computed in response to temperature variation, shown in Fig. 4.2. Vertical
bars represent bit flips for LSB (blue) and 2nd LSB (green). 2nd LSB flip rate is under 12% while
LSB’s flip rate is significantly higher. Thus, the 2nd LSB provides better reproducibility.
4.1.3.2 Voltage Supply Variation
Supply voltage varies under realistic conditions. Path delays of all 256 bit paths are gathered
with Monte Carlo sampling at supply voltages of 4.64 V, 4.70 V, 4.76 V, 4.82 V, 4.88 V and 4.94 V.
Bit flip rate is computed in response to voltage variation, shown in Fig. 4.3. The flip rate for the
2nd LSB is under 18% while LSB rates are significantly higher. The 2nd LSB again provides better
reproducibility; it is the best candidate for the entanglement bit.
A higher order bit could be selected. It would have comparatively better flip rates, but reduced
variability. Many mature techniques exist to compensate for temperature and voltage variation
[Kumar et al. (2012); Vivekraja and Nazhandali (2011)]. These techniques operate at the flip rates
expressed by LSB and 2nd LSB. Thus, the advantage of choosing a higher order bit is minimal.
[Figure 4.3 plot: bar chart; x-axis: Supply Voltage (V), 4.64 to 4.94; y-axis: Flip Rate (%)]
Figure 4.3 Percentage of bit flips under voltage variation. Flip rates of LSB are shown in
blue. Flip rates of 2nd LSB are shown in green.
4.1.4 Randomness
Output of a good PUF should look like a pseudo-random number generator so that an attacker
cannot model it easily. Assessing randomness performance of BS-PUF uses data from Monte Carlo
sampling of path delays. Delay values are converted to binary responses by extracting the mth
LSB bit from the delay. Each 256-bit response (one bit from each path) is examined using NIST
statistical test suite.
Table 4.3 NIST TEST RESULTS OF LSB RESPONSE
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 P-VALUE PROPORTION STATISTICAL TEST
16 21 13 19 19 23 16 18 21 34 0.099513 198/200 Frequency
12 24 20 24 12 28 22 15 27 16 0.068999 199/200 BlockFrequency
18 20 20 12 16 19 26 11 31 27 0.028817 199/200 CumulativeSums
17 19 15 14 15 17 34 15 27 27 0.011791 200/200 CumulativeSums
19 14 24 32 16 15 21 18 19 22 0.191687 198/200 Runs
19 16 15 16 24 19 25 22 23 21 0.769527 194/200 Serial
18 20 20 24 16 17 20 24 24 17 0.890582 197/200 Serial
Table 4.4 NIST TEST RESULTS OF 2nd LSB RESPONSE
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 P-VALUE PROPORTION STATISTICAL TEST
15 24 22 19 15 17 10 21 20 37 0.005166 200/200 Frequency
12 18 24 27 15 26 20 13 29 16 0.048716 200/200 BlockFrequency
11 21 20 26 16 22 19 9 24 32 0.012650 200/200 CumulativeSums
15 21 15 21 18 18 28 11 28 25 0.099513 200/200 CumulativeSums
22 25 26 20 18 20 16 18 19 16 0.807412 199/200 Runs
17 20 22 21 24 22 18 14 20 22 0.917870 197/200 Serial
24 19 20 19 21 17 18 25 14 23 0.825505 197/200 Serial
Table 4.3 and Table 4.4 give the detailed test results for LSB and 2nd LSB of the BS-PUFs
output. The minimum pass rate for each statistical test is approximately 193 for a sample size of
200 binary sequences according to NIST documentation. Thus, both LSB and 2nd LSB pass the
randomness test; at least 194 of the 200 sequences pass every test.
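The threshold of 193 can be checked against the proportion criterion described in the NIST SP 800-22 documentation. The sketch below assumes the default significance level α = 0.01 and a three-sigma lower bound on the passing proportion; rounding the bound down to an integer count is an assumption made here to match the quoted figure.

```python
import math

def min_pass_count(num_sequences, alpha=0.01):
    """Lower bound on passing sequences: m * (p - 3 * sqrt(p * (1 - p) / m)), p = 1 - alpha."""
    p_hat = 1.0 - alpha
    lower = p_hat - 3.0 * math.sqrt(p_hat * (1.0 - p_hat) / num_sequences)
    return math.floor(lower * num_sequences)  # assumption: round down to a whole count

if __name__ == "__main__":
    print(min_pass_count(200))  # threshold for the 200 sequences used here
```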
4.1.5 Commutativity
Encryption and decryption rely on function composition. Each party must be able to decrypt a
message encrypted by both itself and another party. The other party may have changed the bit
values (0 or 1). Thus, delay variation must be independent of the bit value. An input of 1 must
have the same path delay as an input of 0.
BS-PUF path delays depend only on the permutation key. Shift units are sized to achieve
balanced pullup and pulldown resistance. Transmission gate NMOS sizing is Wn/Ln = 2/3 and PMOS
sizing is Wp/Lp = 1/1, where Ln = Lp.
Two tests are performed to verify pullup and pulldown variability.
1. Testing rising/falling edge delay in four different (FF, FS, SF, SS) process corners. The
transmission time difference for 0 and 1 must be smaller than the counter period (4 ns).
2. Performing Monte Carlo sampling of path delay for inputs 0 and 1. Delays are recorded for
all paths without bit shifting. No bit flips should occur in the path delay.
Maximum transmission time difference for 0 and 1 is 2.34ns; this is much smaller than the 4ns
clock period. Consequently, no path delay bits flip in Monte Carlo sampling.
4.2 Encryption Performance Evaluation
4.2.1 Modeling Attack
According to [Rührmair et al. (2010)], all examined Strong PUFs under a given size can be
modeled with machine learning with success rates above their stability in silicon. Consider the barrel
shifter in our communication protocol to be a black box. Attackers know nothing about the key
and physical delay of barrel shifter. An attacker should not be able to model the relationship from
input bits to the output bits without the key. Such a model provides an eavesdropper information
about the plaintext given a ciphertext.
Table 4.5 LR ON LSB WITH 6 AND 8 STAGES BS-PUFs
ML Method   Bit Length   Prediction Rate   PCPs      Training Time
LR          64           17.5%             800       0.0203 sec
LR          64           28.6%             8,000     0.3580 sec
LR          64           58.3%             80,000    1.3157 sec
LR          256          9.1%              1,000     0.0186 sec
LR          256          18.3%             10,000    0.3670 sec
LR          256          25.5%             100,000   2.3212 sec
Table 4.6 LR ON 2ND LSB WITH 6 AND 8 STAGES BS-PUFs
ML Method   Bit Length   Prediction Rate   PCPs      Training Time
LR          64           43.2%             800       0.0315 sec
LR          64           52.6%             8,000     0.1658 sec
LR          64           79.5%             80,000    1.0104 sec
LR          256          32.4%             1,000     0.0157 sec
LR          256          41.0%             10,000    0.4620 sec
LR          256          62.8%             100,000   1.6245 sec
To investigate the resilience of BS-PUFs against modeling attacks, various ciphertexts are
generated with different keys and plaintexts for training and cross-validation.
Logistic Regression (LR) [Bishop (2006)] and Evolution Strategies (ES) [Back (1996); Schwefel
(1993)] are commonly used to model PUF output. ES is specialized to modeling PUFs under noisy
conditions [Rührmair et al. (2010)]; it does not apply when voltage supply and temperature are
certain.
Thus, only LR modeling is performed. Since the error rate of machine learning prediction
decreases with the size of training set, LR modeling is tested for LSB response and 2nd LSB
response with a variety of training sets with different sizes.
Monte Carlo Sampling [Robert (2004)] utilizes randomness to generate n challenge response
pairs (CRP). n random keys, K = {K_0, K_1, ..., K_{n−1}}, are generated. Responses, R, are generated
by entangling plaintext, P, using these keys, R_i = BS-PUF(P, K_i). Note that the response is
the shift path delay; this is dependent on the key only. Hence, the plaintext need not be modified.
This random CRP sample is assumed to be representative of the distribution of all CRPs.
Simulating BS-PUF(P, K_i) requires computationally expensive Cadence Spectre simulations.
An efficient method for computing R_i given K_i is needed. Thus, we apply Monte Carlo Sampling
to create a delay matrix, D, modeling the delay of all shift paths. The delay of each shift unit is
recorded. Path delay is then computed by: (1) summing the delay of all shift units along a path,
(2) dividing it by the 4 ns capture logic resolution, (3) extracting the LSB or 2nd LSB. Thus, D enables
computation of path delays given K_i.
For example, Eq. (4.1) is a sample delay matrix for a 4-input, 2-stage BS-PUF. d_{i,j,t} and d_{i,j,b}
represent the exact delay values of the top and bottom transmission gates in the shift unit at row i,
column j.
\[
D = \begin{bmatrix}
(d_{0,0,t},\, d_{0,0,b}) & (d_{0,1,t},\, d_{0,1,b}) \\
(d_{1,0,t},\, d_{1,0,b}) & (d_{1,1,t},\, d_{1,1,b}) \\
(d_{2,0,t},\, d_{2,0,b}) & (d_{2,1,t},\, d_{2,1,b}) \\
(d_{3,0,t},\, d_{3,0,b}) & (d_{3,1,t},\, d_{3,1,b})
\end{bmatrix} \tag{4.1}
\]
Plaintext-ciphertext pairs (PCP) are computed using D. For the delay matrix in Eq. (4.1),
using a key = {1, 0} encoding a right shift in the first stage, the plaintext (i_0, i_1, i_2, i_3) generates
the response in Eq. (4.2), where (·)_m denotes extraction of the m-th LSB.
\[
R = \begin{bmatrix}
i_3 \oplus \left((d_{0,0,b} + d_{0,1,t})/4\right)_m \\
i_0 \oplus \left((d_{1,0,b} + d_{1,1,t})/4\right)_m \\
i_1 \oplus \left((d_{2,0,b} + d_{2,1,t})/4\right)_m \\
i_2 \oplus \left((d_{3,0,b} + d_{3,1,t})/4\right)_m
\end{bmatrix} \tag{4.2}
\]
This process makes extraction of all possible PCP feasible.
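The delay-matrix computation in Eqs. (4.1) and (4.2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Matlab or Spectre flow used in this work: the function name is hypothetical, stage j is assumed to rotate by 2^j positions when its key bit is 1, a key bit of 1 is assumed to select the bottom gate, and each shift unit is assumed to be indexed by the row it drives. These choices reproduce Eq. (4.2) for the 4-input example.

```python
def bs_puf_response(plaintext, key, delay_matrix, m=0, period_ns=4.0):
    """Entangle plaintext bits with the m-th counter bit of each shift-path delay.

    delay_matrix[row][stage] is a (top_delay, bottom_delay) pair in ns.
    key[stage] == 1 selects the bottom gate and rotates by 2**stage (assumed).
    """
    n = len(plaintext)
    response = []
    for out_row in range(n):
        row, total = out_row, 0.0
        # Walk the path backwards from the output row to find its source input.
        for stage in reversed(range(len(key))):
            total += delay_matrix[row][stage][key[stage]]
            if key[stage]:
                row = (row - (1 << stage)) % n
        counter = int(total // period_ns)   # delay captured at counter resolution
        delay_bit = (counter >> m) & 1      # m = 0: LSB, m = 1: 2nd LSB
        response.append(plaintext[row] ^ delay_bit)
    return response

if __name__ == "__main__":
    # Hypothetical delays in ns for a 4-input, 2-stage instance.
    D = [[(10, 13), (20, 7)], [(11, 14), (22, 5)],
         [(9, 12), (31, 6)], [(8, 17), (30, 4)]]
    print(bs_puf_response([1, 0, 1, 1], [1, 0], D, m=0))
```

Tracing each path backwards from its output row avoids building the permutation explicitly and keeps the cost linear in the number of stages per output bit.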
For a BS-PUF with an input message length of 256 bits, there are 2^256 possible input messages.
There are 8 stages with 2^8 possible keys. It is infeasible to generate all 2^264 PCPs. Logistic
Regression (LR) is performed with training sets of n = {10, 100, 1000} PCPs per key. To obtain a
representative sample of PCPs, responses are computed with 100 keys and 10,000 plaintexts. PCPs
not part of the training set are used for cross-validation.
Scalability experiments are conducted on a 6-stage, 64-bit input BS-PUF; the delay matrix of this
BS-PUF is the top left 64 × 6 sub-matrix of the 8-stage delay matrix acquired from Monte Carlo
Sampling. The number of CRPs, N_CRP, required to learn a k-stage arbiter PUF with error
rate ε is approximately 0.5 × (k + 1)/ε [Rührmair et al. (2010)]. Thus, for a 6-stage BS-PUF, we
also scale down n to 8, 80, and 800 PCPs per key.
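The scaled training-set sizes follow from the ratio of the two CRP estimates at a fixed error rate. A short worked calculation, assuming the arbiter-PUF bound is applied unchanged to the BS-PUF:

```latex
\[
\frac{N_{CRP}(k=6)}{N_{CRP}(k=8)}
  = \frac{0.5\,(6+1)/\varepsilon}{0.5\,(8+1)/\varepsilon}
  = \frac{7}{9} \approx 0.78,
\qquad
10 \times \frac{7}{9} \approx 7.8 \approx 8 .
\]
```

The same factor takes 100 and 1000 PCPs per key to roughly 80 and 800.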
Table 4.5 and Table 4.6 show the prediction accuracy of LR on LSB and 2nd LSB. LR is
implemented by an iterative program written in Matlab. The regression coefficients’ initial values
are set to (0, 0) in all LR applications. Silicon stability of BS-PUFs is 75%. Thus, all modeling
reaching a higher prediction rate should be considered a success.
LSB resists modeling better than 2nd LSB. LR achieves a 79.5% prediction rate for the 6-stage
BS-PUF 2nd LSB output. If the 2nd LSB is used as the delay bit, then LR can successfully model the
6-stage BS-PUF with a sufficient number of PCPs. On the other hand, with the same modeling process,
LSB cannot be modeled even with a large number of training samples. This is expected as the
LSB is inherently more variable. Consequently, the choice to use LSB or 2nd LSB for the delay bit
presents a tradeoff between security and reproducibility; LSB provides the former while 2nd LSB
provides the latter.
4.3 General PUF-Based Public key Encryption
4.3.1 Brute Force attack
The most straightforward attack that can be applied to a PUF based cryptosystem is the brute
force attack. Assume the sender's public/complement key is already known to the attacker and PUF
responses are random numbers which cannot be modeled by machine learning techniques. An attacker
then needs to try all possible PUF responses (valid codes) to perform a brute force attack. As a
consequence, the corresponding time cost increases with the number of valid codes (n_valid).
To estimate the time cost of a brute force attack, we measure the delay of each attempt, t_attempt,
and assess how many attempts, n_attempt, need to be made. t_attempt is measured on a 2.2 GHz Intel
Core i7 and n_attempt equals n_valid.
As discussed in Section 3.2.4.2, HD is the performance metric usually provided by PUF designers.
Hence, H(X) and n_valid can be estimated according to HD. However, this estimation algorithm
has not been verified yet. To evaluate the corresponding estimation accuracy, we perform an accuracy
test in 4 steps:
• Generate m × n binary matrices M with a random number generator. m, n and the percentage
of 1s (P(1)) are controllable. This helps us simulate PUF response sample sets with different
sizes and P(1).
For example, if we need a 3 × 2 pseudo sample matrix with P(1) = 0.2, first we build a 3 × 2
matrix. Then for each element M[i][j], generate a random number r_{i,j} between 0 and 1. If
r_{i,j} < 0.2, then M[i][j] = 1, otherwise M[i][j] = 0.
• For pseudo sample matrix M, calculate P(1) by iterating through all elements and
counting the number of 1s. At the same time, calculate the Hamming distance HD by comparing
each pair of rows.
• Estimate the percentage of 1s for M by solving Eq. 3.7, to get P(1)′.
• Calculate diff_{m,n,P(1)} = P(1)′ − P(1) for M. diff_{m,n,P(1)} reflects the estimation accuracy for
sample sets with m n-bit PUF responses and a percentage of 1s equal to P(1).
[Figure 4.4 plot: x-axis: P(1), 10% to 50%; y-axis: Error Rate (%), 0 to 16; series: m = 10, m = 50, m = 100, m = 500]
Figure 4.4 Error rate of P(1) estimation. The estimation error rate decreases dramatically
with m and grows when P(1) is close to 50%.
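The first two steps of the accuracy test can be sketched in Python as below. This is an illustrative sketch with hypothetical function names; the estimation step itself depends on Eq. 3.7 and is deliberately not reproduced here.

```python
import itertools
import random

def pseudo_sample_matrix(m, n, p_one, rng):
    """m rows of n bits; a bit is 1 when a uniform draw in [0, 1) falls below p_one."""
    return [[1 if rng.random() < p_one else 0 for _ in range(n)] for _ in range(m)]

def fraction_of_ones(matrix):
    """Measured P(1): number of 1s divided by the total number of bits."""
    total_bits = sum(len(row) for row in matrix)
    return sum(map(sum, matrix)) / total_bits

def mean_fractional_hd(matrix):
    """Average Hamming distance over all row pairs, as a fraction of the row length."""
    n = len(matrix[0])
    distances = [
        sum(a != b for a, b in zip(r1, r2))
        for r1, r2 in itertools.combinations(matrix, 2)
    ]
    return sum(distances) / (len(distances) * n)

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    M = pseudo_sample_matrix(100, 256, 0.2, rng)
    print(f"P(1) = {fraction_of_ones(M):.4f}, HD = {mean_fractional_hd(M):.4f}")
```

The measured HD from this sketch would be the input to the estimator, and the measured P(1) the reference it is compared against.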
Fig. 4.4 shows the P(1) estimation error rate (P(1)′ − P(1)) for different sample matrices. In
this simulation, n is held constant at 256. The number of samples m is varied from 10 to 500 and
P(1) is varied from 10% to 50%. As Fig. 4.4 shows, the estimation error rate gets smaller when
there are more samples. In addition, the error rate increases when the actual P(1) is close to 50%,
which means our estimation algorithm is more effective in extreme cases (heavy bias).
Although the P(1) estimation error rate varies with m and the actual P(1), the error
rate is always below 5% when the number of samples (m) is larger than 100. By selecting sample
sets with more than 100 samples, we keep the estimation error of our algorithm below this bound.
P(1) estimation is the first step. With the guidelines above, we utilize HD information provided in
previous works to calculate P(1), entropy and n_valid. Previously reported HD values and the
corresponding P(1) estimates for different PUFs are shown in Table 4.7. All of these tests are based
on 256-bit PUFs.
Table 4.7 P(1) of Different kinds of PUFs
PUF Type    HD       References                 P(1)
APUF        3.6%     Machida et al. (2015)      1.81%
RO PUF      47%      Sahoo et al. (2013)        36.83%
SRAM PUF    49.97%   Guajardo et al. (2007a)    44.84%
Table 4.8 Entropy of Different kinds of PUFs
PUF Type    H(X)     b_valid   n_valid
APUF        0.1306   33        8.59 × 10^9
RO PUF      0.9494   243       1.41 × 10^73
SRAM PUF    0.9923   254       2.89 × 10^76
According to Table 4.7, both RO PUF and SRAM PUF show almost ideal uniqueness,
while the APUF built on an FPGA platform generates responses with low uniqueness among devices.
As concluded in Section 3.2.1, delay-based PUFs are more suitable for a PUF based public-key
cryptosystem. Hence, RO PUF is the best candidate for use in asymmetric encryption.
After getting P(1), the entropy H(X) can be calculated from the binary entropy function,
Eq. 3.4. With H(X), the number of valid codes (n_valid) of an n-bit PUF is derived from Eq. 4.3.
\[
n_{valid} = 2^{nH(X)} \tag{4.3}
\]
The n_valid values calculated from the data in Table 4.7 are shown in Table 4.8, where
b_valid = nH(X) is the exponent in Eq. 4.3. Note how close b_valid is to n = 256 bits
for RO PUF and SRAM PUF.
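The entries of Table 4.8 can be recomputed from the P(1) column of Table 4.7 with the binary entropy function and Eq. 4.3. The sketch below is illustrative; the function names are hypothetical, and rounding nH(X) to the nearest integer to obtain b_valid is an assumption that happens to reproduce the tabulated values.

```python
import math

def binary_entropy(p_one):
    """Shannon entropy in bits of a single bit that equals 1 with probability p_one."""
    if p_one in (0.0, 1.0):
        return 0.0
    return -p_one * math.log2(p_one) - (1.0 - p_one) * math.log2(1.0 - p_one)

def valid_codes(p_one, n_bits=256):
    """Return (H(X), b_valid, n_valid) with n_valid = 2**b_valid as in Eq. 4.3."""
    h = binary_entropy(p_one)
    b_valid = round(n_bits * h)  # assumption: nearest-integer bit count
    return h, b_valid, 2 ** b_valid

if __name__ == "__main__":
    for name, p in [("APUF", 0.0181), ("RO PUF", 0.3683), ("SRAM PUF", 0.4484)]:
        h, b, n_valid = valid_codes(p)
        print(f"{name}: H(X) = {h:.4f}, b_valid = {b}, n_valid = {n_valid:.2e}")
```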
With n_valid, we get the number of attempts a brute force attacker needs to make. To obtain
the overall time consumption of a brute force attack, we still need to measure the time cost of a
single attempt.
Each brute force attempt is a logarithm operation on the encrypted message. We perform 200
decryption tests in Java with 256-bit keys. The average decryption time is
187.6 ns. This test is performed on a 2.2 GHz processor and the average time cost is denoted t_2.2.
Assume that the attackers use the fastest CPU (4.3 GHz) in the world and try different keys
continuously. The time cost t_4.3 on the fastest processor is approximately t_2.2/1.95, which is 96.2 ns.
Here we assume the computation time is inversely proportional to the CPU clock frequency.
Multiplying the time cost of a single attempt by the number of valid codes gives the total time
cost of a brute force attack, t_total.
\[
t_{total} = t_{4.3} \times n_{valid} \tag{4.4}
\]
According to Eq. 4.4, with n_valid = 1.41 × 10^73 for the 256-bit RO PUF (Table 4.8), t_total is
about 1.36 × 10^66 s, or roughly 4.3 × 10^58 years. In other words, even with the fastest
processor, it will take on the order of 10^58 years to finish a brute force attack on 256-bit RO PUF
based asymmetric encryption. In contrast, 576-bit RSA is still breakable with sieve algorithms.
Thus, PUF based encryption outperforms RSA under brute force attack.
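The clock scaling and Eq. 4.4 can be evaluated with a few lines of Python. This is a sketch under the same assumption stated in the text (time inversely proportional to clock frequency); the function names and the 365.25-day year are choices made here for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

def scaled_attempt_time(t_measured_s, f_measured_ghz, f_target_ghz):
    """Scale a measured per-attempt time to another clock frequency (inverse proportionality)."""
    return t_measured_s * f_measured_ghz / f_target_ghz

def brute_force_years(t_attempt_s, n_valid):
    """Eq. 4.4 expressed in years: time per attempt times the number of valid codes."""
    return t_attempt_s * n_valid / SECONDS_PER_YEAR

if __name__ == "__main__":
    t_fast = scaled_attempt_time(187.6e-9, 2.2, 4.3)
    print(f"t_4.3 = {t_fast * 1e9:.1f} ns")
    print(f"t_total = {brute_force_years(96.2e-9, 1.41e73):.2e} years")
```

The second call uses the 96.2 ns figure and the RO PUF n_valid from Table 4.8 as quoted in the text.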
4.3.2 Modeling Attack
While it is hard to break the proposed PUF based public key encryption with a brute force attack,
PUF outputs can be predicted by numerical modeling attacks [Rührmair et al. (2010)].
According to a recent study, as long as adversaries obtain a set of challenge-response pairs (CRPs)
of a PUF, they are able to build a computer algorithm which performs indistinguishably from the
original PUF on most CRPs. Modeling attacks are based on machine learning techniques (e.g.
Logistic Regression and Evolution Strategies). With Logistic Regression (LR) [Harrell (2015)] we
are capable of breaking small instances of PUFs (with a low number of stages). Evolution Strategies
(ES) [Michalewicz (1996)] performs better when PUF outputs are noisy. Both require the
attacker to collect several CRPs of the same PUF. The approximate number of CRPs, N_CRP, that
is required to learn a k-stage arbiter PUF with error rate ε should obey Eq. 4.5.
\[
N_{CRP} \approx 0.5 \cdot \frac{k + 1}{\varepsilon} \tag{4.5}
\]
If adversaries have access to PUF hardware, the time used to retrieve a private key generated
by a certain PUF is expressed as Eq. 4.6,
\[
T = N_{CRP} \times t_{test} + t_{train} \tag{4.6}
\]
where t_test is the time cost of getting one CRP and t_train is the time of training on all N_CRP
pairs. For a delay-based PUF, the lower bound of t_test is the time of a single message round trip.
Hence, t_test increases with PUF size (number of stages k). According to Eq. 4.5, N_CRP also grows
linearly with k. As a result, a larger PUF is more robust under modeling attack.
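Eqs. 4.5 and 4.6 combine into a simple cost estimate. The Python sketch below is illustrative only; the function names and the sample timing values are hypothetical and are not measurements from this work.

```python
def crps_needed(num_stages, error_rate):
    """Eq. 4.5: approximate CRP count to model a k-stage arbiter PUF at a given error rate."""
    return 0.5 * (num_stages + 1) / error_rate

def key_recovery_time(num_stages, error_rate, t_test_s, t_train_s):
    """Eq. 4.6: CRP collection time plus training time, in seconds."""
    return crps_needed(num_stages, error_rate) * t_test_s + t_train_s

if __name__ == "__main__":
    # Hypothetical inputs: 1 ms per CRP query and 2 s of training.
    for k in (64, 128, 256):
        print(k, crps_needed(k, 0.01), key_recovery_time(k, 0.01, 1e-3, 2.0))
```

Doubling the number of stages roughly doubles the CRP count, which is the linear growth noted above.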
In our PUF based encryption algorithm, we assume PUFs are physically protected and attacker
does not have access to PUF CRPs. However, we should choose relatively large PUFs for encryption
in order to increase difficulty of modeling attacks.
4.3.3 Side-channel Attack
A side-channel attack exploits information gained from the implementation of a computer system.
Several side-channel attacks can be applied to RSA encryption/decryption. For
example, power analysis can be used to decode RSA private key bits. The decryption algorithm of
RSA has steps with multiplication. A peak in the CPU power profile represents a step of the
algorithm without multiplication. A broader pulse in the power profile indicates a multiplication in
the algorithm. The data dependent power profile then allows attackers to differentiate between key
bits 0 and 1.
Since a side-channel attack relies on the relationship between the information leaked through the
side channel and the secret data, there are two kinds of countermeasures: either reduce the
information leak or weaken the relationship between the leaked information and the secret data. The
second kind of countermeasure is also called decorrelation.
Blinding is a simple decorrelation technique that can be applied to RSA. For an RSA decryption
with secret exponent d, encryption exponent e and modulus n, the encryption operation is the same
as in normal RSA and the decryption operation requires additional steps:
• Assume the ciphertext is y. The receiver selects a random r < n and generates y′ = y r^e (mod n).
This is called blinding y.
• Then calculate (y′)^d (mod n) = (y r^e)^d (mod n) = y^d r^{ed} (mod n) = y^d r (mod n).
• Since r is chosen by the receiver, the receiver can compute its modular inverse r^{−1} to cancel out
the factor r in the result and obtain y^d, the actual result of decryption.
[Figure 4.5 diagram: the sender computes y = m^e (mod n); the receiver, holding the private key (d, n) for the public key (e, n), chooses a blinding term r and computes (y r^e (mod n))^d = y^d r^{ed} (mod n) = r y^d (mod n)]
Figure 4.5 Blinding technique
Figure 4.6 Time cost of RSA encryption/decryption with different modulus length
For attacks that require collecting side-channel information from operations, blinding is an
effective countermeasure, since the actual operation is executed on a randomized version of the
data, over which the attacker has no control or even knowledge.
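The blinding steps can be illustrated with a small Python sketch. It uses textbook-sized toy parameters (p = 61, q = 53) purely for demonstration; the function name is hypothetical, and a real implementation would use full-size keys and a vetted cryptographic library. The three-argument `pow` with exponent −1 requires Python 3.8 or later.

```python
import math
import secrets

def blinded_decrypt(y, d, e, n, r=None):
    """RSA decryption of ciphertext y with multiplicative blinding by r."""
    if r is None:
        # Draw r until it is invertible modulo n.
        while True:
            r = secrets.randbelow(n - 2) + 2
            if math.gcd(r, n) == 1:
                break
    y_blind = (y * pow(r, e, n)) % n        # y' = y * r^e (mod n)
    s_blind = pow(y_blind, d, n)            # (y')^d = y^d * r (mod n)
    return (s_blind * pow(r, -1, n)) % n    # multiply by r^-1 to recover y^d

if __name__ == "__main__":
    n, e, d = 3233, 17, 2753                # toy key: n = 61 * 53
    message = 65
    y = pow(message, e, n)
    print(blinded_decrypt(y, d, e, n) == message)
```

The private exponent is only ever applied to the randomized value y′, which is the decorrelation property described above.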
Blinding introduces several additional operations:
• Generate a random number r.
• Calculate y′ = y r^e (mod n) instead of res = y^e (mod n). So there is an additional
multiplication.
• Perform (y′)^d (mod n). So there is an additional multiplication and modular reduction (its cost
equals another RSA encryption).
• Cancel out the factor r. So there is one additional modular inverse operation.
Blinded RSA needs 4 more operations compared to traditional RSA, but it is necessary to
defend against power side-channel attacks.
In our proposed PUF based cryptosystem, power side-channel leakage is also possible and risky. In
traditional RSA, the only secret we need to hold is the private key d. In the proposed PUF based
asymmetric encryption, both PUF_A(K_A) and PUF_B(K_B) may be leaked through the power
side-channel. All 4 steps depicted in Section 3.2 contain multiplication steps. So every intermediate
result of our PUF based asymmetric encryption needs to be blinded.
4.3.4 Speed verification
In public-key cryptosystem, key sharing speed is limited by both computation and communica-
tion overhead.
4.3.4.1 Computation Overhead
According to [McIvor et al. (2004)], the cost of RSA exponentiation (encryption and decryption)
can be calculated according to Eq. 4.7, where k is the modulus bit length, and k_e and k_d are the
public and private exponent bit lengths respectively.
\[
\begin{aligned}
n_{encrypt} &= (k + 2)(k_e + 3) \\
n_{decrypt} &= (k/2 + 2)(k_d/2 + 3)
\end{aligned} \tag{4.7}
\]
Assuming the CPU clock speed is 3GHz, time cost of encryption/decryption operations with
different key lengths is shown in Fig. 4.7. With different key lengths, the operation time varied
from 0.1µs to 32µs.
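Eq. 4.7 can be turned into a time estimate as sketched below. The sketch interprets n_encrypt and n_decrypt as clock-cycle counts, which is an assumption consistent with the 3 GHz conversion in the text; the function names and the sample key sizes are hypothetical.

```python
def rsa_cycles(modulus_bits, public_exp_bits, private_exp_bits):
    """Eq. 4.7: operation counts for RSA encryption and decryption."""
    n_encrypt = (modulus_bits + 2) * (public_exp_bits + 3)
    n_decrypt = (modulus_bits / 2 + 2) * (private_exp_bits / 2 + 3)
    return n_encrypt, n_decrypt

def cycles_to_seconds(cycles, clock_hz=3e9):
    """Convert a cycle count to seconds at the assumed clock rate."""
    return cycles / clock_hz

if __name__ == "__main__":
    # Hypothetical example: 1024-bit modulus, 17-bit public exponent.
    enc, dec = rsa_cycles(1024, 17, 1024)
    print(f"encrypt: {cycles_to_seconds(enc) * 1e6:.2f} us")
    print(f"decrypt: {cycles_to_seconds(dec) * 1e6:.2f} us")
```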
Figure 4.7 Time cost of RSA encryption/decryption with different modulus length
Calculations in PUF based asymmetric cryptosystem are similar to RSA encryption/decryption
but with much shorter key. Hence, time cost of those operations can be estimated by using the
same equation. Short key greatly reduces time cost of each operation.
4.3.4.2 Communication Overhead
Although the PUF based public-key cryptosystem reduces computation overhead, it needs 3 more
rounds of message transmission (Section 4.3.3). Therefore, a PUF based cryptosystem is more
suitable for computation dominated applications. In this thesis, we assume TCP/IP is used to
transmit encrypted data. The speed of TCP/IP decreases with the distance between sender and
receiver. As a result, PUF based key sharing is good for communication between different nodes in
high speed computer clusters. A computer cluster is a single logical unit consisting of multiple
computers that are linked through a local-area network (LAN). Nowadays, cluster computing is
widely used for scientific computing and machine learning workloads. In a cluster, all nodes
(computers) are close to each other and the corresponding data transmission is very fast.
\[
t_{delivery} = \frac{\text{Message size}}{\text{Network throughput}} \tag{4.8}
\]
Table 4.9 Communication Overhead of TCP/IP
Environment        Throughput (Mbit/sec)   Reference                      Single Round Tc (µs)
ATM Network        26                      Modeklev et al. (1994)         177.2
Satellite Link     155                     Partridge and Shepard (1997)   29.7
Ethernet Cluster   800                     Bierbaum (2002)                5.8
Optical Networks   40,000                  Callegati et al. (1999)        1.2
Table 4.9 shows the throughput of TCP/IP under different circumstances. Network throughput is
the rate of successful message delivery over a communication channel. It directly reflects the speed
of a network. According to Bierbaum (2002), with a standard gigabit network interface card (NIC),
TCP throughput in a high-speed cluster can be as much as 800 Mbit/sec.
In TCP/IP, a message needs to be wrapped and then transmitted in several packets. Some
TCP/IP conventions, such as the message wrapping procedure and the minimum datagram size,
influence transmission overhead.
• Message Wrapping: In TCP/IP, all message packets are composed of data and headers.
First, data is wrapped in a TCP segment by adding a 20-byte header. The TCP segment
is then wrapped again in an IP packet, where another 20 bytes of header are inserted. For
example, 1024 bits (128 bytes) of encrypted data yield a packet of at least 168 bytes.
• Minimum Datagram Size: The actual message size must be a multiple of the minimum
datagram size. The minimum datagram size that all hosts must be prepared to accept is 576 bytes
for IPv4 and 1280 bytes for IPv6. Data arrives in one packet only if it is smaller than
536 bytes (4288 bits) for IPv4. Here, 4288 bits is called the single package data limit (S_p) of IPv4.
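For an n-byte message, the time cost of a single round of transmission (t_avg) is expressed in
Eq. 4.9, where MDS is the minimum datagram size (taken in bits when divided by the throughput)
and the 40 added to n is the combined TCP and IP header size in bytes.
\[
t_{avg} = \left\lceil \frac{n + 40}{MDS} \right\rceil \frac{MDS}{\text{network throughput}} \tag{4.9}
\]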
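Eq. 4.9 can be evaluated with the short Python sketch below. It assumes the message length and MDS are compared in bytes and that the padded size is converted to bits before dividing by the throughput; the function and constant names are hypothetical.

```python
import math

TCP_IP_HEADER_BYTES = 40   # 20-byte TCP header plus 20-byte IP header
IPV4_MDS_BYTES = 576       # minimum datagram size quoted in the text for IPv4

def single_round_time(message_bytes, throughput_mbit_s, mds_bytes=IPV4_MDS_BYTES):
    """Eq. 4.9: delivery time in seconds for one message padded to whole datagrams."""
    datagrams = math.ceil((message_bytes + TCP_IP_HEADER_BYTES) / mds_bytes)
    bits_on_wire = datagrams * mds_bytes * 8
    return bits_on_wire / (throughput_mbit_s * 1e6)

if __name__ == "__main__":
    # 128-byte ciphertext over the first three networks of Table 4.9.
    for name, mbit in [("ATM", 26), ("Satellite", 155), ("Ethernet cluster", 800)]:
        print(f"{name}: {single_round_time(128, mbit) * 1e6:.1f} us")
```

Any message that fits in one minimum-size datagram costs the same transmission time, which is why ciphertext length does not matter below the single-packet limit.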
Nowadays, the largest RSA number in the RSA factoring challenge is 2048 bits, which is still much
smaller than S_p. Therefore, the ciphertext C of RSA always arrives in 1 packet and introduces the
same amount of transmission overhead regardless of its size. The ciphertext of our
proposed PUF based encryption protocol is even smaller and also arrives in one packet.
The message delivery time of a single packet (T_c) is listed in Table 4.9. The communication
overhead introduced by a single round of message transmission varies from one to hundreds of
microseconds in different TCP/IP networks.
4.3.5 Overall Time Cost
As discussed in Section 4.3.1, a 256-bit key is sufficient for the proposed PUF based asymmetric
encryption. On the other hand, even a 768-bit RSA number is breakable through parallel computing.
Here we evaluate the overall time cost of 256-bit key PUF based encryption and 768-bit RSA on a
3.3 GHz single processor. Since the advanced PUF based encryption protocol requires an additional
step in each encryption/decryption step, its time consumption is relatively longer.
The performance comparison between RSA and PUF based encryption is shown in Fig. 4.8
(a), and a similar comparison of RSA and advanced PUF based digital signing is shown in Fig. 4.8
(b). Here the x-axis is the network type and the y-axis is the time cost of one round of information
sharing. According to Table 4.9, the optical network is fastest and the ATM network is slowest. As
we can see, RSA works better on a slow network, such as the ATM network and the satellite link. In
a high speed network (Ethernet cluster and optical network), PUF based asymmetric encryption is
better. As the advanced PUF based encryption protocol includes communication overhead, it shows
an advantage only on a very high speed network such as the optical network.
Figure 4.8 Overall time cost of RSA and proposed PUF based asymmetric cryptosystem.
(a) PUF based asymmetric encryption (b) PUF based digital signature.
4.4 Asymmetric Layout
Asymmetric layout pattern performance is evaluated in Cadence Spectre through Monte Carlo
Sampling of process parameters. Systematic mismatch is specified by editing the "statistics" block.
200 samples are taken. Each sample represents the performance of a single BS-PUF chip. BS-PUFs
with the same transistor size but symmetric layout design are taken as the reference. All
transistors are minimum sized and Vdd = 2.5 V, Vss = −2.5 V. 1.8um technology is employed here,
since the corresponding process variation coefficients are easy to find.
We evaluate the impact of the asymmetric layout strategy on path delay variation.
The probability density of the delay difference (T_A − T_B) in 64-stage and 128-stage APUFs is
shown in Fig. 4.9(a) and (b). From the plot, we can observe that the asymmetric layout improved
delay difference variation by 1.2% and 8.1% on 64-stage and 128-stage APUFs respectively. The
smaller gain for 64 stages arises because the path distance in 64-stage asymmetric APUFs is small,
and the effect of systematic variation is still negligible.
Figure 4.9 (a) Probability density of delay difference between two racing paths in 64-stage
APUF. (b) Distribution of delay difference between two racing paths in
128-stage APUF.
We also estimate the noise ratio caused by insufficient delay differences. We construct a cross-
coupled CMOS NAND arbiter for reliability tests. Arbiter resolution is measured with 2 rising
edges (E1 and E2). We continue decreasing the delay between E1 and E2 until the arbiter circuit
generates an error output. The corresponding delay threshold is labeled as δt. In our case, δt equals
65 ps under 1.2V voltage supply.
As shown in Fig. 4.10, asymmetric layout APUFs only need 128 stages to achieve a 100% valid
rate while symmetric layout APUFs need 256 stages. Compared to symmetric layout APUFs, the
number of transistors in asymmetric layout APUFs can be halved. Considering the layout overhead,
the asymmetric layout saves 39.7% of space.
4.5 Multi-Block APUF
The performance of the Multi-Block pattern is evaluated on Arbiter PUFs. The performance of
MBBSPUF will be evaluated later. For Multi-Block Arbiter PUFs (MBAPUF), an appropriate
voltage supply must be chosen for the arbiter circuits; it must be shown to provide significant
resolution enhancement to maintain MBAPUF reliability. The impact of the multi-block pattern on
the PUF's other requisite properties should also be evaluated.
[Figure 4.10 plot: x-axis: n, 16 to 256; y-axis: Reliability (%), 60 to 105; series: Symmetric Layout, Asymmetric Layout]
Figure 4.10 Percentage of valid bits with symmetric and asymmetric layout.
Cadence Spectre simulations are used to generate raw delay data and to assess arbiter resolution
separately. Delay variability is assessed by 3σ Monte Carlo sampling over process
parameters. Arbiter resolution is tested with ideal rising edges. The device models used are from
the IBM 130nm PDK library. A random mismatch model is included in this PDK.
We construct two 128-stage and two 256-stage APUFs accepting a 256-bit input with a 256-bit
output. DAPUF and MBAPUF outputs can be computed by combining the results of multiple
APUFs.
Since the Monte Carlo simulation here does not consider long correlation distance mismatch by
default, systematic variation is simulated by manually modifying the transistor model file. The standard
deviation of the threshold gradient ($\sigma_{V_{th}}$) is derived with mismatch coefficients acquired from [Kinget
(2005)]. Layout distance between racing multiplexers is assumed to be 1.3 µm. This
distance should be larger in FPGA or ASIC without precise layout modifications.
Table 4.10 Block bias under different challenge input
P(Ci = 1)    256-stage APUF    128-stage APUF
100%         9.03%             5.88%
75%          4.70%             2.72%
50%          -0.28%            -0.52%
The primary simulation procedure is: (1) Run Monte Carlo sampling 200 times on the top path with
$C_i = (1, 1, \ldots, 1)$. Export the transient data and extract the delays of the top multiplexers as
$TD = (d_{t,0}, d_{t,1}, \ldots, d_{t,n-1})$. (2) Replace vth0 in the model file with vth0 + $\Delta V_{th}$ and run Monte
Carlo sampling 200 times on the bottom paths with the same $C_i$. Extract the delays of all bottom
multiplexers as $BD = (d_{b,0}, d_{b,1}, \ldots, d_{b,n-1})$. TD and BD are defined as multiplexer delay matrices.
In each Monte Carlo sample, $\Delta V_{th}$ is randomly chosen from $[0, 3\sigma_{V_{th}}]$ by assuming $V_{th}$ decreases
along the y-axis. The result of each Monte Carlo sample is treated as the data of a different chip.
With TD and BD, the delays of the two signal propagation paths ($D_1$ and $D_2$) in each chip are
calculated according to the input challenge. Taking a regular APUF as an example, if n is 4 and the
challenge is (1, 0, 1, 1), then $D_1$ is $d_{t,0} + d_{b,1} + d_{b,2} + d_{b,3}$ and $D_2$ is $d_{b,0} + d_{t,1} + d_{t,2} + d_{t,3}$.
The response created by this selector chain depends on the difference between $D_1$ and $D_2$.
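The path-delay calculation in the n = 4 example can be sketched in Python. This is only an illustration under a stated assumption, not the simulation flow itself: the text gives one worked example rather than a general rule, so the sketch assumes that signal 1 starts on the top rail and that each challenge bit of 0 swaps the two signals between rails from that stage onward. The function name `path_delays` and the sample delay values are hypothetical.

```python
def path_delays(td, bd, challenge):
    """Return (D1, D2) for one chip.

    td, bd    -- per-stage delays of the top and bottom multiplexers
    challenge -- sequence of 0/1 challenge bits, one per stage

    Assumption (inferred from the n = 4 worked example, not stated as a
    general rule in the text): signal 1 starts on the top rail, and each
    0 bit swaps the rails of the two signals from that stage onward.
    """
    if not (len(td) == len(bd) == len(challenge)):
        raise ValueError("td, bd and challenge must have the same length")
    d1 = d2 = 0.0
    signal1_on_top = True
    for t, b, c in zip(td, bd, challenge):
        if c == 0:
            signal1_on_top = not signal1_on_top
        if signal1_on_top:
            d1 += t
            d2 += b
        else:
            d1 += b
            d2 += t
    return d1, d2


# Hypothetical delays in arbitrary units, chosen so each sum is easy to trace.
td = [1, 2, 4, 8]        # d_t,0 .. d_t,3
bd = [16, 32, 64, 128]   # d_b,0 .. d_b,3
d1, d2 = path_delays(td, bd, (1, 0, 1, 1))
print(d1, d2)  # D1 = d_t,0 + d_b,1 + d_b,2 + d_b,3, D2 = d_b,0 + d_t,1 + d_t,2 + d_t,3
```

With these sample values the two sums match the terms listed in the example. A different crossing convention would change only the rail-swap rule inside the loop.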
4.5.1 Block Bias
As concluded in Section 3.4.1, more than 2 blocks are not necessary unless block bias is
higher than 15%. Assuming Vth decreases along the y-axis, top multiplexers tend to be faster than
bottom multiplexers. Block bias is maximized when 100% of challenge bits are equal to 1.
To determine an appropriate n for a 256-stage n-block MBAPUF, we evaluate block bias in 128-stage
and 256-stage selector chains under different challenges.
Table 4.10 shows block bias in 256 paths of all 200 chips (Monte Carlo samples). The maximum
block bias caused by transistor-level systematic mismatch is about 9% and 6% for 256-stage and
128-stage selector chains, respectively. Both are smaller than 15%, implying that the 2-block structure
is sufficient for bias correction in ASIC design. All following discussions focus on 256-stage 2-block
MBAPUFs.
Figure 4.11 Valid rate in traditional APUF with arbiter operating under different voltage
supplies. n is the length of selector chain. (x-axis: n, 16 to 256; y-axis: reliability (%);
series: Vdd = 5V, Vdd = 3.3V, Vdd = 1.2V.)
4.5.2 Reliability Test
MBPUFs produce an invalid response bit as long as one of the block responses is invalid. The
percentage of valid bits in an n-bit response is defined as the valid rate. If the valid rate of a single block
is $VR_{Block}$, then the valid rate of an n-block MBAPUF is estimated by Eq. 4.10.
$$VR = VR_{Block}^{\,n} \tag{4.10}$$
We construct a cross-coupled CMOS NAND arbiter. Its resolution is tested with two ideal rising
edges (E1 and E2). For each supply voltage, the delay between E1 and E2 is decreased until the
arbiter circuit produces an incorrect output. This delay threshold is defined as δt. The simulation
results show that raising the supply voltage from 1.2V to 5V reduces δt from 65 ps to 30 ps.
Threshold voltage in the top multiplexers is more likely to be lower than in the bottom
multiplexers. Thus, the difference between the two paths’ arrival times reaches a minimum when the
challenge contains half 1s and half 0s. Hence we compute the delay between the two propagating
signals of all 200 256-bit APUFs with $(c_0, c_1, \ldots, c_{127}) \mapsto (1, 1, \ldots, 1)$,
$(c_{128}, c_{129}, \ldots, c_{255}) \mapsto (0, 0, \ldots, 0)$ and the multiplexer delay matrices obtained from
Monte Carlo sampling. Then we check what percentage of racing paths create delay differences
smaller than the arbiter resolution (δt). These paths yield invalid bits; the percentage of the
remaining paths is the valid rate.
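The valid-rate check can be sketched as follows. This is a minimal illustration, assuming that a response bit counts as valid when the absolute delay difference between the two racing paths is at least the arbiter resolution δt. The function name `valid_rate` and the sample delays are hypothetical, and the delays are taken to be in picoseconds.

```python
def valid_rate(d1, d2, delta_t):
    """Fraction of racing-path pairs that the arbiter can resolve.

    d1, d2  -- delays of the two racing paths, one entry per path pair (ps)
    delta_t -- arbiter resolution (ps)

    Assumption: a pair is valid when |D1 - D2| >= delta_t.
    """
    if len(d1) != len(d2) or len(d1) == 0:
        raise ValueError("d1 and d2 must be non-empty and of equal length")
    valid = sum(1 for a, b in zip(d1, d2) if abs(a - b) >= delta_t)
    return valid / len(d1)


# Hypothetical delays: the differences are 10, 60, 40 and 100 ps.
d1 = [100.0, 200.0, 300.0, 400.0]
d2 = [90.0, 260.0, 340.0, 500.0]
print(valid_rate(d1, d2, 65.0))  # resolution at a 1.2V arbiter supply
print(valid_rate(d1, d2, 30.0))  # resolution at a 5V arbiter supply
```

A smaller δt lets more path pairs pass the check, which is the effect of raising the arbiter supply voltage.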
In our design, the supply voltage of the selector chain is set to 1.2V to produce large delay and delay
variation. Fig. 4.11 shows the valid rate of an n-stage APUF with the arbiter operating under different
supply voltages. According to the plot, the valid rate of a 128-stage APUF with a 5V arbiter
supply is 89.94%. According to Eq. 4.10, the valid rate of the corresponding 256-stage 2-block MBAPUF
is 89.94% × 89.94% ≈ 81%, which is very close to the valid rate of a 256-stage conventional APUF
with a 1.2V arbiter supply (83.18%).
Hence, in 2-block 256-stage MBAPUFs with a 1.2V selector chain, we choose 5V as the arbiter supply
voltage to achieve reliability similar to that of conventional APUFs.
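The estimate from Eq. 4.10 is easy to reproduce. The sketch below evaluates the equation for the 128-stage valid rate read from Fig. 4.11; the function name `multiblock_valid_rate` is hypothetical, and valid rates are expressed as fractions rather than percentages.

```python
def multiblock_valid_rate(vr_block, n_blocks):
    """Valid rate of an n-block structure per Eq. 4.10: VR = VR_Block ** n."""
    if not 0.0 <= vr_block <= 1.0:
        raise ValueError("vr_block must be a fraction between 0 and 1")
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    return vr_block ** n_blocks


# 128-stage block with a 5V arbiter supply, two blocks.
vr = multiblock_valid_rate(0.8994, 2)
print(f"{vr:.2%}")
```

The result is about 80.9%, which the text rounds to 81%.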
4.5.3 Uniqueness test
We estimate the uniqueness of APUF, DAPUF and MBAPUF by the average inter-die Hamming
distance (HD) over a group of chips (Monte Carlo samples). With a pair of chips, i and j (i ≠ j),
having n-bit responses $R_i$ and $R_j$ respectively, the average inter-die HD among a group of k
chips is defined as:
$$HD = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{HD(R_i, R_j)}{n} \times 100\% \tag{4.11}$$
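Eq. 4.11 can be computed directly from a list of responses. The sketch below is an illustration only; the function name `average_inter_die_hd` and the toy responses are hypothetical, and each response is assumed to be a sequence of 0/1 bits of equal length.

```python
from itertools import combinations


def average_inter_die_hd(responses):
    """Average inter-die Hamming distance in percent, following Eq. 4.11.

    responses -- list of k responses, each a sequence of n bits (0 or 1)
    """
    k = len(responses)
    if k < 2:
        raise ValueError("at least two chips are required")
    n = len(responses[0])
    if n == 0 or any(len(r) != n for r in responses):
        raise ValueError("all responses must have the same non-zero length")
    total = 0.0
    # combinations() visits every unordered pair (i, j) with i < j exactly once.
    for ri, rj in combinations(responses, 2):
        total += sum(a != b for a, b in zip(ri, rj)) / n
    return 2.0 / (k * (k - 1)) * total * 100.0


# Toy example with k = 3 chips and n = 4 bits; pairwise HDs are 2, 4 and 2 bits.
chips = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
print(average_inter_die_hd(chips))
```

For this toy input the normalized pairwise distances are 0.5, 1.0 and 0.5, so the average is two thirds, or about 66.7%. Real PUF responses should land near 50%.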
To simulate a heavily biased situation, responses used for uniqueness validation are generated
with challenge $(c_0, c_1, \ldots, c_{255}) \mapsto (1, 1, \ldots, 1)$. For n-bit PUFs, HD should be as close as
possible to n/2. As shown in Fig. 4.12, the mean HD is shifted left by around 6 bits in 256-stage
APUFs due to systematic variation. The 256-stage 2-block MBAPUF corrects this shift.
Table 4.11 shows the average HD for 256-stage and 128-stage traditional APUFs, the 256-stage 2-block
MBAPUF and the 256-stage 2-1 DAPUF. For 256-stage 2-1 DAPUFs, the mean HD is 127.61 bits with
a standard deviation of 8.01 bits. For the 256-stage 2-block MBAPUF, these values are 127.89 bits and
7.99 bits, respectively. From this we can say that MBAPUFs provide uniqueness similar to that of
DAPUFs with the same number of blocks.
Figure 4.12 Hamming distance distribution of 256-stage APUFs and 128-stage 2-block
MBAPUFs. (x-axis: inter-chip HD, 100 to 160; y-axis: probability density; series: 1x256-stage,
2x128-stage.)
Table 4.11 Inter-die HD of different APUF structures
Structure                    µ(HD) (bits)    δ(HD) (bits)    Avg. HD
256-stage APUF               122.01          8.05            47.66%
128-stage APUF               125.27          8.26            48.93%
256-stage 2-block MBAPUF     127.89          7.99            49.96%
256-stage 2-1 DAPUF          127.61          8.01            49.85%
4.5.4 Reproducibility test
The usefulness of a single PUF relies on it producing a consistent response to a challenge;
it should be independent from the environment. Tests are performed subjecting MBAPUF to
temperature variation. The frequency of response bit flips is quantified.
Bit flip rate is the frequency with which a bit changes from 0 to 1 or from 1 to 0. It is computed
relative to a baseline response. Responses gathered at room temperature (27°C) establish this
baseline. The percentage of response bits that differ from the baseline is the bit flip rate. For
example, if 64 of the 256 response bits at 70°C differ from the response at 27°C, the bit flip
rate is 25%.
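The 64/256 example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `bit_flip_rate` and the synthetic responses are hypothetical.

```python
def bit_flip_rate(baseline, response):
    """Percentage of bits in `response` that differ from `baseline`."""
    if len(baseline) != len(response) or len(baseline) == 0:
        raise ValueError("responses must be non-empty and of equal length")
    flips = sum(a != b for a, b in zip(baseline, response))
    return 100.0 * flips / len(baseline)


# Synthetic 256-bit responses: 64 bits differ between 27°C and 70°C.
at_27c = [0] * 256
at_70c = [1] * 64 + [0] * 192
print(bit_flip_rate(at_27c, at_70c))  # 25.0
```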
To simulate performance under different temperatures, multiplexer delay matrices of n-stage
selector chains are gathered by Monte Carlo sampling at 0°C, 10°C, 20°C, 27°C, 30°C, 40°C,
50°C, 60°C and 70°C. The same seed is used in all of these sampling instances to make sure that no
parameter other than temperature is changed.
Figure 4.13 Bit flip rate of Arbiter PUFs with different numbers of stages under temperature
variation. (x-axis: temperature (°C), 0 to 70; y-axis: flip rate (%); series: 64-stage, 128-stage,
256-stage.)
Table 4.12 Area and power cost of different APUFs
Structure                    Area (gate)    Power (watt)
256-stage APUF               1667           0.2310
256-stage 2-block MBAPUF     1670           0.2574
256-stage 2-1 DAPUF          3334           0.4620
Flip rates of 64-stage, 128-stage and 256-stage selector chains are computed with challenge
$(c_0, c_1, \ldots, c_{255}) \mapsto (1, 1, \ldots, 1)$ and shown in Fig. 4.13. While the flip rate of 64-stage selector
chains is slightly higher, the flip rates of 128-stage and 256-stage chains are quite similar. With the
flip rate of one selector chain ($FR_b$), the flip rate of a 2-block DAPUF or MBAPUF can be calculated
with Eq. 4.12.
$$FR_{MBPUF} = 2 \times FR_b \times (1 - FR_b) \tag{4.12}$$
The maximum difference between the flip rates of 128-stage and 256-stage selector chains is 0.52%, with
$FR_{b,128} = 4.2\%$ and $FR_{b,256} = 3.7\%$. The corresponding flip rates of 2-block MBAPUFs and 2-1 DAPUFs
are 8.1% and 7.1%. This indicates that the reproducibility of MBAPUFs is similar to that of DAPUFs.
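Eq. 4.12 has the form of the probability that exactly one of two independent events, each with probability FRb, occurs. The sketch below evaluates it for the rounded single-chain flip rates quoted above. The function name `two_block_flip_rate` is hypothetical, and because the inputs are rounded, the printed values may differ from the reported figures in the last digit.

```python
def two_block_flip_rate(fr_b):
    """Flip rate of a 2-block structure per Eq. 4.12: 2 * FR_b * (1 - FR_b).

    fr_b -- flip rate of one selector chain, as a fraction between 0 and 1
    """
    if not 0.0 <= fr_b <= 1.0:
        raise ValueError("fr_b must be a fraction between 0 and 1")
    return 2.0 * fr_b * (1.0 - fr_b)


# Rounded single-chain flip rates taken from the text.
for label, fr_b in (("128-stage chains", 0.042), ("256-stage chains", 0.037)):
    print(f"{label}: {two_block_flip_rate(fr_b):.2%}")
```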
4.5.5 Area & Power
Compared to DAPUF, a major advantage of MBAPUF is low area and power consumption. We
evaluate these metrics for both DAPUFs and MBAPUFs. Relevant results are shown in Table 4.12.
We estimate the area of 1-bit MBAPUFs and DAPUFs in gate equivalents. In our
implementation, each pair of multiplexers contains 26 transistors and each arbiter is composed of 12
transistors. Compared to 2-1 DAPUFs, the area of 256-stage 2-block MBAPUFs is reduced by about
50%.
Since static power consumption is very small, only transient power dissipation is evaluated.
Transient power estimation is conducted on the multiplexers and arbiters separately. It is performed
in 3 steps: (1) Plot the transient current in Cadence ADE L to find its peak. (2) Multiply the peak
transient current by the supply voltage. (3) Sum up the power consumption of all circuit blocks.
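The three steps can be sketched as a short calculation. This is an illustration only: the current samples and block list below are invented placeholders rather than simulation data, the function names are hypothetical, and the peak is assumed to be the largest current magnitude in the exported waveform.

```python
def peak_current(samples):
    """Step 1: peak of a transient current waveform (largest magnitude, in A)."""
    if not samples:
        raise ValueError("waveform must contain at least one sample")
    return max(abs(i) for i in samples)


def transient_power(blocks):
    """Steps 2 and 3: sum of peak current times supply voltage over all blocks.

    blocks -- iterable of (peak_current_amps, supply_volts) pairs
    """
    return sum(i_peak * vdd for i_peak, vdd in blocks)


# Placeholder waveforms (A); real values would come from the transient plots.
mux_waveform = [0.0, 0.0004, 0.001, 0.0003]
arbiter_waveform = [0.0, -0.002, 0.0015]

blocks = [
    (peak_current(mux_waveform), 1.2),      # selector chain at 1.2V
    (peak_current(arbiter_waveform), 5.0),  # arbiter at 5V
]
print(transient_power(blocks))
```

Keeping the supply voltage per block makes the 2-level supply explicit: the selector chain and the arbiter contribute with different voltages.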
According to Cadence transient simulation results, the power dissipation of MBAPUFs with a
2-level voltage supply is only 55.71% of that of DAPUFs with similar uniqueness performance.
CHAPTER 5. SUMMARY AND DISCUSSION
In this work, we explore a variety of encryption protocols based on commutative PUFs and
propose a circuit implementation of the required commutative PUFs (BS-PUF). Commutativity
relies on symmetric delays in forward and backward paths regardless of the message bit state.
Spectre Monte Carlo simulations indicate that plaintext bit state variation introduces a delay
offset of less than 1 bit. This ensures the commutativity of the system. Simulation shows that inter-
chip variability (up to ±25% chip-to-chip variation) is acceptable. These encryption PUFs have the
potential to root the encryption in hardware, hence increasing robustness beyond current
software-only solutions.
Asymmetric encryption methods are valued for their ability to establish a secure communication
channel in the absence of a shared secret. Such methods require complex computations resulting
in low throughput compared to symmetric encryption. Basing encryption in hardware limits the
attack surface. An adversary cannot retrieve the message even when either the encryption key or
the ciphertext is known; information about the PUF behavior is not available to them. The behavior
of the encryption function becomes a secret. Thus, more entropy is added to the system.
We also developed a general PUF based public key encryption protocol, which is suitable for
all kinds of high entropy PUFs. Both its reliability and speed are proven by simulation.
In addition, a new layout scheme that improves inter-die process variation for delay-based PUF
circuits is presented. Simulation results demonstrate that appropriate layout design can exploit the
linear gradient effect in the doping process to improve delay variation. This layout strategy
amplifies the process variation experienced by each delay unit. Hence, fewer transistors achieve
equivalent process variation. However, asymmetric layout reduces the inter-chip uniqueness of PUFs;
the Multi-Block pattern can be applied to eliminate this problem. From the experimental results, we
confirmed that in ASIC APUF design, the Multi-Block pattern effectively eliminates the systematic
bias introduced by layout.
BIBLIOGRAPHY
A, C. (2006). 8-bit avr microcontroller with 128kb in-system programmable flash, atmega 128.
Technical Report.
Bace, M. M. (2007). Cipher block chaining decryption. US Patent 7,200,226.
Back, T. (1996). Evolutionary algorithms in theory and practice: evolution strategies, evolutionary
programming, genetic algorithms. Oxford university press.
Bastos, J., Steyaert, M., Graindourze, B., and Sansen, W. (1996). Matching of mos transistors
with different layout styles. In Microelectronic Test Structures, 1996. ICMTS 1996. Proceedings.
1996 IEEE International Conference on, pages 17–18. IEEE.
Bellare, M., Kilian, J., and Rogaway, P. (1994). The security of cipher block chaining. In Annual
International Cryptology Conference, pages 341–358. Springer.
Bellare, M., Kilian, J., and Rogaway, P. (2000). The security of the cipher block chaining message
authentication code. Journal of Computer and System Sciences, 61(3):362–399.
Bellare, M. and Rogaway, P. (1994). Optimal asymmetric encryption. In Workshop on the Theory
and Application of Cryptographic Techniques, pages 92–111. Springer.
Bennett, C. H. and Landauer, R. (1985). The fundamental physical limits of computation. Scientific
American, 253(1):48–56.
Bertoni, G., Daemen, J., Peeters, M., and Van Assche, G. (2011). The keccak reference. Technical
Report.
Bertoni, G., Daemen, J., Peeters, M., and Van Assche, G. (2016). The keccak sponge function
family. Technical Report.
Bierbaum, N. (2002). Mpi and embedded tcp/ip gigabit ethernet cluster computing. In Local
Computer Networks, 2002. Proceedings. LCN 2002. 27th Annual IEEE Conference on, pages
733–734. IEEE.
Bishop, C. M. (2006). Pattern recognition and machine learning. springer.
Boneh, D. et al. (1999). Twenty years of attacks on the rsa cryptosystem. Notices of the AMS,
46(2):203–213.
Bresson, E., Chevassut, O., Pointcheval, D., and Quisquater, J.-J. (2001). Provably authenticated
group diffie-hellman key exchange. In Proceedings of the 8th ACM conference on Computer and
Communications Security, pages 255–264. ACM.
Callegati, F., Casoni, M., and Raffaelli, C. (1999). Packet optical networks for high-speed tcp-ip
backbones. IEEE Communications Magazine, 37(1):124–129.
Cao, Y., Zhang, L., Chang, C.-H., and Chen, S. (2015). A low-power hybrid ro puf with improved
thermal stability for lightweight applications. IEEE Transactions on computer-aided design of
integrated circuits and systems, 34(7):1143–1147.
Cavallar, S., Dodson, B., Lenstra, A., Leyland, P., Lioen, W., Montgomery, P. L., Murphy, B.,
Te Riele, H., and Zimmermann, P. (1999). Factorization of rsa-140 using the number field
sieve. In International Conference on the Theory and Application of Cryptology and Information
Security, pages 195–207. Springer.
Che, W., Saqib, F., and Plusquellic, J. (2015). Puf-based authentication. In Computer-Aided
Design (ICCAD), 2015 IEEE/ACM International Conference on, pages 337–344. IEEE.
Chen, Q., Csaba, G., Ju, X., Natarajan, S., Lugli, P., Stutzmann, M., Schlichtmann, U., and
Rührmair, U. (2009). Analog circuits for physical cryptography. In Integrated Circuits, ISIC’09.
Proceedings of the 2009 12th International Symposium on, pages 121–124. IEEE.
Choi, W., Kim, S., Kim, Y., Park, Y., and Ahn, K. (2010). Puf-based encryption processor for the
rfid systems. In Computer and Information Technology (CIT), 2010 IEEE 10th International
Conference on, pages 2323–2328. IEEE.
Conti, M., Crippa, P., Orcioni, S., and Turchetti, C. (1999). Statistical modeling of mos transistor
mismatch based on the parameters’ autocorrelation function. In Circuits and Systems, 1999.
ISCAS’99. Proceedings of the 1999 IEEE International Symposium on, volume 6, pages 222–225.
IEEE.
Daemen, J. and Rijmen, V. (2013). The design of Rijndael: AES-the advanced encryption standard.
Springer Science & Business Media.
Davis, J. A. and Holdridge, D. B. (1984). Factorization using the quadratic sieve algorithm. In
Advances in cryptology, pages 103–113. Springer.
Devadas, S., Suh, E., Paral, S., Sowell, R., Ziola, T., and Khandelwal, V. (2008). Design and
implementation of puf-based "unclonable" rfid ics for anti-counterfeiting and security applications.
In RFID, 2008 IEEE International conference on, pages 58–64. IEEE.
Fruhashi, K., Shiozaki, M., Fukushima, A., Murayama, T., and Fujino, T. (2011). The arbiter-puf
with high uniqueness utilizing novel arbiter circuit with delay-time measurement. In Circuits
and Systems (ISCAS), 2011 IEEE International Symposium on, pages 2325–2328. IEEE.
Gao, M., Lai, K., and Qu, G. (2014). A highly flexible ring oscillator puf. In Proceedings of the
51st Annual Design Automation Conference, pages 1–6. ACM.
Grünebaum, U., Oehm, J., and Schumacher, K. (2001). Mismatch modeling and simulation - a
comprehensive approach. Analog integrated circuits and signal processing, 29(3):165–171.
Gu, C., Murphy, J., and O’Neill, M. (2014). A unique and robust single slice fpga identification
generator. In Circuits and Systems (ISCAS), 2014 IEEE International Symposium on, pages
1223–1226. IEEE.
Guajardo, J., Kumar, S. S., Schrijen, G.-J., and Tuyls, P. (2007a). Fpga intrinsic pufs and their use
for ip protection. In International workshop on Cryptographic Hardware and Embedded Systems,
pages 63–80. Springer.
Guajardo, J., Kumar, S. S., Schrijen, G.-J., and Tuyls, P. (2007b). Physical unclonable functions
and public-key crypto for fpga ip protection. In Field Programmable Logic and Applications,
2007. FPL 2007. International Conference on, pages 189–195. IEEE.
Guo, Y., Dee, T., and Tyagi, A. (2018a). Barrel shifter physical unclonable function based
encryption. Cryptography, 2(3):22.
Guo, Y., Dee, T., and Tyagi, A. (2018b). Multi-block apuf with 2-level voltage supply. In 2018
IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pages 327–332. IEEE.
Harrell, F. E. (2015). Ordinal logistic regression. In Regression modeling strategies, pages 311–325.
Springer.
Hashmi, I. and Babu, H. M. H. (2010). An efficient design of a reversible barrel shifter. In VLSI
Design, 2010. VLSID’10. 23rd International Conference on, pages 93–98. IEEE.
Holcomb, D. E., Burleson, W. P., and Fu, K. (2009). Power-up sram state as an identifying
fingerprint and source of true random numbers. IEEE Transactions on Computers,
58(9):1198–1210.
Hori, Y., Yoshida, T., Katashita, T., and Satoh, A. (2010). Quantitative and statistical performance
evaluation of arbiter physical unclonable functions on fpgas. In Reconfigurable Computing and
FPGAs (ReConFig), 2010 International Conference on, pages 298–303. IEEE.
Hospodar, G., Maes, R., and Verbauwhede, I. (2012). Machine learning attacks on 65nm arbiter
pufs: Accurate modeling poses strict bounds on usability. In Information Forensics and Security
(WIFS), 2012 IEEE International Workshop on, pages 37–42. IEEE.
Juang, W.-S. (2004). Efficient multi-server password authenticated key agreement using smart
cards. IEEE Transactions on Consumer Electronics, 50(1):251–255.
Kinget, P. R. (2005). Device mismatch and tradeoffs in the design of analog circuits. IEEE Journal
of Solid-State Circuits, 40(6):1212–1224.
Kleber, S., Unterstein, F., Matousek, M., Kargl, F., Slomka, F., and Hiller, M. (2015). Secure
execution architecture based on puf-driven instruction level code encryption. IACR Cryptology
ePrint Archive, 2015:651.
Kumar, R., Patil, V. C., and Kundu, S. (2011). Design of unique and reliable physically unclonable
functions based on current starved inverter chain. In VLSI (ISVLSI), 2011 IEEE Computer
Society Annual Symposium on, pages 224–229. IEEE.
Kumar, R., Patil, V. C., and Kundu, S. (2012). On design of temperature invariant physically
unclonable functions based on ring oscillators. In VLSI (ISVLSI), 2012 IEEE Computer Society
Annual Symposium on, pages 165–170. IEEE.
Kumar, S. S., Guajardo, J., Maes, R., Schrijen, G.-J., and Tuyls, P. (2008). The butterfly puf
protecting ip on every fpga. In Hardware-Oriented Security and Trust, 2008. HOST 2008. IEEE
International Workshop on, pages 67–70. IEEE.
Lofstrom, K., Daasch, W. R., and Taylor, D. (2000). Ic identification circuit using device
mismatch. In Solid-State Circuits Conference, 2000. Digest of Technical Papers. ISSCC. 2000 IEEE
International, pages 372–373. IEEE.
Long, D., Hong, X., and Dong, S. (2005). Optimal two-dimension common centroid layout
generation for mos transistors unit-circuit. In Circuits and Systems, 2005. ISCAS 2005. IEEE
International Symposium on, pages 2999–3002. IEEE.
Machida, T., Yamamoto, D., Iwamoto, M., and Sakiyama, K. (2014). A new mode of operation
for arbiter puf to improve uniqueness on fpga. In Computer Science and Information Systems
(FedCSIS), 2014 Federated Conference on, pages 871–878. IEEE.
Machida, T., Yamamoto, D., Iwamoto, M., and Sakiyama, K. (2015). Implementation of double
arbiter puf and its performance evaluation on fpga. In Design Automation Conference (ASP-
DAC), 2015 20th Asia and South Pacific, pages 6–7. IEEE.
MacKay, D. J. (2003). Information theory, inference and learning algorithms. Cambridge university
press.
Maes, R. (2016). Physically Unclonable Functions. Springer.
Maiti, A., Casarona, J., McHale, L., and Schaumont, P. (2010). A large scale characterization of
ro-puf. In Hardware-Oriented Security and Trust (HOST), 2010 IEEE International Symposium
on, pages 94–99. IEEE.
Maiti, A. and Schaumont, P. (2009). Improving the quality of a physical unclonable function using
configurable ring oscillators. In Field Programmable Logic and Applications, 2009. FPL 2009.
International Conference on, pages 703–707. IEEE.
Maiti, A. and Schaumont, P. (2011). Improved ring oscillator puf: an fpga-friendly secure primitive.
Journal of cryptology, 24(2):375–397.
Mansouri, S. S. and Dubrova, E. (2012). Ring oscillator physical unclonable function with multi
level supply voltages. In Computer Design (ICCD), 2012 IEEE 30th International Conference
on, pages 520–521. IEEE.
McIvor, C., McLoone, M., and McCanny, J. V. (2004). Modified montgomery modular
multiplication and rsa exponentiation techniques. IEE Proceedings-Computers and Digital Techniques,
151(6):402–408.
Michalewicz, Z. (1996). Evolution strategies and other methods. In Genetic Algorithms+ Data
Structures= Evolution Programs, pages 159–177. Springer.
Modeklev, K., Klovning, E., and Kure, O. (1994). Tcp/ip behavior in a high-speed local atm
network environment. In Local Computer Networks, 1994. Proceedings., 19th Conference on,
pages 176–185. IEEE.
Morozov, S., Maiti, A., and Schaumont, P. (2010). An analysis of delay based puf implementations
on fpga. In ARC, pages 382–387. Springer.
Partridge, C. and Shepard, T. J. (1997). Tcp/ip performance over satellite links. IEEE network,
11(5):44–49.
Pelgrom, M. J., Duinmaijer, A. C., and Welbers, A. P. (1989). Matching properties of mos
transistors. IEEE Journal of solid-state circuits, 24(5):1433–1439.
Pereira, R., Michell, J., and Solana, J. (1995). Fully pipelined tspc barrel shifter for high-speed
applications. IEEE Journal of Solid-State Circuits, 30(6):686–690.
Robert, C. P. (2004). Monte carlo methods. Wiley Online Library.
Rührmair, U., Sehnke, F., Sölter, J., Dror, G., Devadas, S., and Schmidhuber, J. (2010). Modeling
attacks on physical unclonable functions. In Proceedings of the 17th ACM conference on Computer
and communications security, pages 237–249. ACM.
Rührmair, U., Sölter, J., Sehnke, F., Xu, X., Mahmoud, A., Stoyanova, V., Dror, G., Schmidhuber,
J., Burleson, W., and Devadas, S. (2013). Puf modeling attacks on simulated and silicon data.
IEEE Transactions on Information Forensics and Security, 8(11):1876–1891.
Sahoo, D. P., Mukhopadhyay, D., and Chakraborty, R. S. (2013). Design of low area-overhead ring
oscillator puf with large challenge space. In Reconfigurable Computing and FPGAs (ReConFig),
2013 International Conference on, pages 1–6. IEEE.
Schwefel, H.-P. P. (1993). Evolution and optimum seeking: the sixth generation. John Wiley &
Sons, Inc.
Selimis, G., Konijnenburg, M., Ashouei, M., Huisken, J., de Groot, H., van der Leest, V., Schrijen,
G.-J., van Hulst, M., and Tuyls, P. (2011). Evaluation of 90nm 6t-sram as physical unclonable
function for secure key generation in wireless sensor nodes. In Circuits and Systems (ISCAS),
2011 IEEE International Symposium on, pages 567–570. IEEE.
Seoane, N., Martinez, A., Brown, A. R., Barker, J. R., and Asenov, A. (2009). Current variability
in si nanowire mosfets due to random dopants in the source/drain regions: A fully 3-d negf
simulation study. IEEE Transactions on electron devices, 56(7):1388–1395.
Suh, G. E. and Devadas, S. (2007). Physical unclonable functions for device authentication and
secret key generation. In Proceedings of the 44th annual Design Automation Conference, pages
9–14. ACM.
Suh, G. E., O’Donnell, C. W., Sachdev, I., and Devadas, S. (2005). Design and implementation
of the aegis single-chip secure processor using physical random functions. In ACM SIGARCH
Computer Architecture News, volume 33, pages 25–36. IEEE Computer Society.
Tajik, S., Dietz, E., Frohmann, S., Seifert, J.-P., Nedospasov, D., Helfmeier, C., Boit, C., and
Dittrich, H. (2014). Physical characterization of arbiter pufs. In International Workshop on
Cryptographic Hardware and Embedded Systems, pages 493–509. Springer.
Takagi, T. (1998). Fast rsa-type cryptosystem modulo p^k q. In Advances in
Cryptology-CRYPTO’98, pages 318–326. Springer.
Urbi Chatterjee, R. S. C. and Mukhopadhyay, D. (2016). A puf-based secure communication
protocol for iot. Cryptology ePrint Archive, Report 2016/674.
http://eprint.iacr.org/2016/674.
Vivekraja, V. and Nazhandali, L. (2011). Feedback based supply voltage control for temperature
variation tolerant pufs. In VLSI Design (VLSI Design), 2011 24th International Conference on,
pages 214–219. IEEE.
Weisstein, E. W. (2003). Rsa-576 factored.
Yin, C.-E., Qu, G., and Zhou, Q. (2013). Design and implementation of a group-based ro puf. In
Proceedings of the Conference on Design, Automation and Test in Europe, pages 416–421. EDA
Consortium.
Yin, C.-E. D. and Qu, G. (2010). Lisa: Maximizing ro puf’s secret extraction. In Hardware-Oriented
Security and Trust (HOST), 2010 IEEE International Symposium on, pages 100–105. IEEE.
