Techniques for design and implementation of physically unclonable functions by Majzoobi, Mehrdad
RICE UNIVERSITY 
Techniques for Design and Implementation of Physically 
Unclonable Functions 
by 
Mehrdad Majzoobi 
A THESIS SUBMITTED 
IN PARTIAL FULFILLMENT OF THE 
REQUIREMENTS FOR THE DEGREE 
Master of Science 
APPROVED, THESIS (COMMITTEE: 
^ 
Dr. Farinaz Koushgnfar, Chair 
Assistant Professor, Electrical and Com-
puter Engineering 
Dr. Richard Baraniuk 
Victor E. Cameron Professor, Electrical 
and Computer Engineering 
Dr. KartilrMMiafu-ani 
Assistant Professor, Electrical and Com-
puter Engineering 
s 
ffluhtfa* 
Dr. Miodrag Potkonjak 
Professor, Computer Science, UCLA 
HOUSTON, TEXAS 
APRIL 2009 
UMI Number: 1466803 
INFORMATION TO USERS 
The quality of this reproduction is dependent upon the quality of the copy 
submitted. Broken or indistinct print, colored or poor quality illustrations 
and photographs, print bleed-through, substandard margins, and improper 
alignment can adversely affect reproduction. 
In the unlikely event that the author did not send a complete manuscript 
and there are missing pages, these will be noted. Also, if unauthorized 
copyright material had to be removed, a note will indicate the deletion. 
® UMI 
UMI Microform 1466803 
Copyright 2009 by ProQuest LLC 
All rights reserved. This microform edition is protected against 
unauthorized copying under Title 17, United States Code. 
ProQuest LLC 
789 East Eisenhower Parkway 
P.O. Box 1346 
Ann Arbor, Ml 48106-1346 
ABSTRACT 
Techniques for Design and Implementation of Physically Unclonable Functions 
by 
Mehrdad Majzoobi 
Physically unclonable functions (PUFs) provide a basis for many security, 
and digital rights management protocols. PUFs exploit the unclonable 
and unique manufacturing variability of silicon devices to establish a se-
cret. However, as we will demonstrate in this work, the classic delay-based 
PUF structures have a number of drawbacks including susceptibility to 
prediction, reverse engineering, man-in-the-middle and emulation attacks, 
as well as sensitivity to operational and environmental variations. 
To address these limitations, we have developed a new set of techniques 
for design and implementation of PUF. We design a secure PUF archi-
tecture and show how to predict response errors as well as to compress 
the challenge/responses in database. We further demonstrate applications 
where PUFs on reconfigurable FPGA platforms can be exploited for pri-
vacy protection. The effectiveness of the proposed techniques is validated 
using extensive implementations, simulations, and statistical analysis. 
£ 
Contents 
Abstract ii 
1 Introduction 1 
2 Preliminaries 4 
3 Related Work 6 
4 Vulnerabilities 9 
4.1 Reverse engineering 9 
4.2 Emulation 15 
4.3 Man-in-the-middle 17 
4.4 Reconfiguration 17 
4.5 Collision of Responses 17 
5 Countermeasures and Safeguards 19 
5.1 State-of-the-art 19 
5.2 New safeguarding methods 21 
6 Applications 39 
6.1 Configure-and-erase Method 39 
7 Implementation Results 42 
8 Conclusion 53 
References 54 
List of Figures 
2.1 PUF fundamental building blocks. 5 
2.2 Parallel PUF structure. The feed forward arbiter (shown in the dashed 
line) is used to introduce nonlinearity. . 5 
4.1 Evaluation of accuracy distributions versus the number of measure-
ments obtained for a collection of 50 chips . 1 1 
4.2 Modeling accuracy distributions versus the measurement errors. Dis-
tributions are obtained for a collection of 50 chips 12 
4.3 The feed-forward PUF architecture. The arbiter (A*) introduces non-
linearity. . 12 
4.4 Modeling accuracy versus the number of measurements obtained for a 
collection of 20 FF PUFs. 16 
4.5 An example of emulation attack. 16 
4.6 The increase in the collision probability compared to the parallel PUF 
vs. feed-forward arbiter's input (x-axis) and output (y-axis) locations. 18 
5.1 Delay characterization circuitry 22 
5.2 Arbiter characteristics 23 
5.3 The general architecture of the proposed Secure PUF 27 
5.4 The input network realization using XOR logic. 30 
5.5 Example: the difficulty of inverting the output network. 32 
5.6 An example of output network for Q = 5, Q' =4, x = 4 and s = 1. . . 32 
5.7 An m-bit circular shift interconnecting scheme that connects Q rows 
of parallel PUFs with transformed challenges 34 
V 
5.8 Probability of the response flip given a flip in the k-th challenge bit 
for a single row parallel PUF. The dot (circular) markers show the 
probability before (after) transforming the challenges 35 
5.9 Deviation of transitional probabilities of individual PUF instances from 
the SAC 36 
5.10 The amount of CRP compressibility for parallel, FF and the proposed 
secure PUFs. Each box represents the results of 10 experiments on 
different PUF realizations 38 
5.11 Sensitivity of PUF transitional behavior to outlier switch delays. . . . 38 
6.1 An example of a database entry. 40 
6.2 Configure-and-erase scheme steps 40 
7.1 The detailed structure of the delay characterization test circuit. The 
clock frequency is consciously swept using an external clock generator 
and the frequency range is shifted up by the internal FPGA PLL. . . 43 
7.2 The PUF circuit on the FPGA floor plan after manual placement. A 
screen shot from FPGA Editor tool in the Xilinx ISE software 43 
7.3 Measured arbiter characteristic 44 
7.4 Measured arbiter characteristic in region 4 and the corresponding Gaus-
sian fit 45 
7.5 The top and bottom path delays of PUFs on XC5VLX50 chips for 
(a) rising edge transition, and (b) falling edge transition; The top and 
bottom path delays of PUFs on XC5VLX110 chips for (c) rising edge 
transition, and (d) falling edge transition 46 
7.6 The sample flip flop speed a of PUF circuits on XC5VLX50 chips for 
(a) rising edge transition (ariae), and (b) falling edge transition (O/OH); 
The sample flip flop speed o of PUF circuits on XC5VLX110 chips for 
(c) rising edge transition (crrise)> and (d) falling edge transition (crfaii)- 48 
7.7 Faulty flip flop behavior 49 
7.8 The distribution of delay difference between the top and bottom paths. 
The distribution is calculated for 8 input challenges, 5 XC5VLX50 
chips, and 3 XC5VLX110 chips. Figures (a) and (b) show the dis-
tribution for rising and falling transition delays on XC5VLX50 chips. 
Similarly, Figures (c) and (d) show the distribution for rising and falling 
transition delays on XC5VLX110 chips. The vertical lines show the 3a 
detection edges of slow and fast arbiters (FF) 52 
List of Tables 
5.1 Latency and area of common hash functions 20 
7.1 Delay variability in current and future FPGA technologies 47 
CHAPTER 1 
Introduction 
Physically unclonable functions (PUFs) axe physical systems with well-defined and 
stable mapping from a set of inputs (challenges) to a set of outputs (responses). 
Mapping is such that the owner of the system can rapidly obtain the output for any 
specified input but there is small probability of obtaining the output in any reasonable 
time by other parties [1]. PUFs should be also prohibitively hard to copy (clone), 
emulate, simulate, or predict. 
There is a wide consensus that intrinsic manufacturing variability of modern 
and pending deep submicron silicon is an excellent PUF implementation platform 
[2, 3, 4, 5, 6]. Silicon technologies form the basis for almost all computing plat-
forms today, while it is not technologically possible to reproduce the inherent silicon 
variability. Security techniques that employ silicon PUFs have numerous important 
advantages over traditional cryptography-based security techniques including much 
better resiliency against physical attacks (e.g. radiation, reverse engineering) [7, 8], 
the absence of covert channels (e.g. power, delay, electromagnetic measurements), 
and much lower time, speed, and power overheads [2, 3, 9]. PUFs have been used 
for a variety of security applications ranging from ID creation and authentication 
[2, 3, 4, 5] to hardware metering and remote enabling and disabling of integrated 
2 
circuits [2, 3, 4, 5, 10]. 
Our research has two conceptual sources: (i) natural PUF evolution through vul-
nerability analysis; and (ii) quest to identify and create the best ways to leverage 
reconfigurability to improve PUF's security and operational properties. Unfortu-
nately, recent analysis have demonstrated that many of the current state-of-the-art 
PUF structures are susceptible to a variety of security attacks. Our objective is to 
design and analyze reconfigurable robust PUFs that are resilient against different 
types of attacks. 
Our analysis considers four types of PUF security attacks: (i) reverse engineer-
ing; (ii) emulation and statistical modeling; (iii) replay (man-in-the-middle); and (iv) 
reconfigurability-specific vulnerabilities. Reverse engineering aims at extracting the 
delay parameters of each delay element. The goal of emulation attack is to efficiently 
compress and store the PUF challenge/responses. Statistical attacks predict the value 
of the PUF outputs by exploiting the correlation among them and/or between the 
outputs and inputs. Replay attack looks for repeated challenges. This attack is in 
particular dangerous for PUF-based digital rights management protocols. A related 
attack, to a certain level, is the one where PUFs are fabricated in such a way that 
their replication is easy for a specific level of manufacturing variability. Finally, recon-
figurability attacks aim to leverage the properties of reconfigurable implementation 
platform to compromise the security of the PUF. Our goal is to create reconfigurable 
PUF structures and the accompanying test procedures that ensure resiliency against 
all the stated attacks. 
The starting point for our research is a new, generic, modular and easy to param-
eterize PUF structure. The structure includes modules for combination of individual 
challenge bits, different configuration schemes of delay elements, and combinations of 
a subsets of the outputs using combinational circuitry to defend against the stated 
attacks. We show how reconfigurability can be employed to strengthen each of these 
3 
defense mechanisms, to enable delay characterization, and to create notions of one-
time PUFs. 
The remainder of the thesis is organized as follows. A brief background on PUFs 
is given in Chapter 2. Chapter 3 presents a survey of related literature. We analyze 
i • -
;• vulnerabilities of PUFs and the potential attacks in Chapter 4. Next in Chapter 5, we 
introduce a set of countermeasures and safeguards to address the exisitng limitations. 
A testing and characterization mechanism is presented for defining the response error 
probability and acheiving robust operation, as well as PUF diagnosis, and compression 
of the challenge-responses. We further in this chapter show the design of special logic 
input/output networks and interconnecting method for higher security and reliability 
of PUFs and demonstrate the effectiveness of the proposed concepts and methods by 
extensive simulations. In Chapter 6, a secure FPGA-based authentication system for 
privacy protection is presented. Chapter 7 presents the measurements and character-
ization results from implemeting the test circuit and delay-based PUFs on Virtex 5 
FPGAs. Finally, we conclude the paper in 8. 
CHAPTER 2 
Preliminaries 
Silicon PUFs exploit manufacturing variability to generate a unique input/output 
mapping for each IC. Delay-based silicon PUFs use the delay variations of CMOS logic 
components to produce unique responses. The responses are generated by comparing 
the analog timing difference between two delay paths that must be equivalent by 
logic-level construction, but are different because of manufacturing variability. The 
delay-based structures use a digital component, arbiter, that translates the analog 
timing difference into a digital value. An arbiter is a sequential component with two 
inputs and one output. The arbiter output is one if a rising edge signal arrives at 
its first input earlier by at least a threshold value compared to the signal arriving at 
the second input. The arbiter's output is zero otherwise. Figure 2.1 (a) shows an 
arbiter implemented using an edge-triggered latch. If the time difference between the 
arriving signals are smaller than the setup and hold times of the latch, the arbiter 
may become metastable and not be able to produce an accurate and deterministic 
output. 
[11] proposed a parallel delay-based PUF circuit shown in Figure 2.2. Generating 
one bit of output requires a signal to travel through two parallel paths with multiple 
segments that are connected by a series of 2-input/2-output switches. As depicted in 
5 
Figure 2.1 (b), each switch is configured to be either a cross or a straight connector, 
based on its selector bit. The arbiter compares the signal arrival times at the end 
of parallel paths (i.e., at its inputs) to produce the corresponding response. The 
'hold 'setup 
-4 -> •«-+• 
—I 
D 
Q 
>G 
1/0 
AD>TeMup 
AD < TtoM 
b) Switch a) Arbiter 
Figure 2.1: PUF fundamental building blocks. 
path segments are designed to have the same nominal delays, but their actual delays 
differ slightly due to manufacturing variability. The difference between the top and 
bottom path delays on the segment n is denoted by 6n on Figure 2.2. To ensure 
larger variations, one could insert additional delay elements on the path segments. 
The PUF challenges (inputs) are the selector bits of the switches. The output bit of 
the arbiter depends on the challenge bits and is permanent for each IC (for a range of 
operational conditions). Parallel PUF's liability to reverse engineering was previously 
Jr 
61 
Rising T 
82 
edge ^ d T 
C2 
83 
Feed Forward 
Arbiter — • 
SN 
C3 C N > 
t 
Response 
bit 
1—{""•-• Challengejbits 
Figure 2.2: Parallel PUF structure. The feed forward arbiter (shown in the dashed 
line) is used to introduce nonlinearity. 
addressed by introducing nonlinearities, such as feed forward (FF) arbiters, in the 
PUF structure [2]. Figure 2.2 also includes a FF arbiter (dashed line) that controls a 
switch selector. Unfortunately, our preliminary study shows even this structure can 
be reverse engineered using a combination of combinatorial and linear programming 
technique [12]. 
CHAPTER 3 
Related Work 
There is a wide and diverse body literature related to the research presented in this 
manuscript including reconfigurable computing, secure and trustable computing sys-
tems, physically unclonable functions (PUFs), techniques for hardware intellectual 
property protection, manufacturing variability (MV) and computer-aided techniques 
for addressing MV. We restrict our attention only on the most directly related re-
search and development results. There are four major conceptual starting points 
for our research: (i) MV-based unique identifiers (IDs); (ii) security and reconfig-
urability (FPGAs); (iii) hardware security attacks; and (iv) integrated circuits (ICs) 
characterization. 
Inevitable manufacturing variability, mainly due to dopants fluctuations, has been 
recognized as one of fundamental physical and technological CMOS scaling barriers 
in early and mid-seventies [13, 14, 15]. In late nineties, it again received a great deal 
of attention, since the first experimental studies demonstrate the validity of early 
predictions [16]. Inspired by these studies and developments, Lofstrom at SiidTech 
in Portland and his collaborators where first to propose intrinsic silicon MV for ID 
extractions [17]. Soon, several works from other groups followed [18, 19]. Also, MV 
has been used as a basis for creation of high quality random number generators [20]. 
7 
Extraction of unique gate-level features from the legacy designs for using as IDs was 
proposed in [21]. 
IC IDs are completely static features that provide excellent accounting mecha-
nisms, but essentially have no security features. A great conceptual step forward 
was achieved by [1] who introduced the notion of PUF. Their initial targeted PUF 
platform was an optical coherence system. A significant practical step to enable in-
stantaneous and widespread application of PUF concept was proposal of Devadas et 
al. who leverage silicon MV for this task [2, 22, 23, 11, 9]. In addition, they devel-
oped a set of PUF architectures and a suite of PUF-based security protocols. These 
works motivated several silicon PUFs that use various mechanisms to extract a secret 
[24, 25, 26]. Recently, by exploring the relationships between PUF-based IDs and 
functionality of the pertinent IC, researchers were able to create a comprehensive and 
powerful system of digital rights management protocols, including remote IC enabling 
and disabling and passive hardware metering [10, 27]. Interestingly, the application 
domain of PUFs is much larger; They can be powerful candidates for creation of a 
new generation of security and cryptographical protocols that are intrinsically more 
resilient against physical and side channel attacks [5]. This wide range of PUF ap-
plications has one ramification: significantly more stringent operation and security 
requirements. There are also conceptually sharply different mechanisms, one that use 
small scale reconfigurability, to associate unique IDs to each IC of a specific design 
[28, 29]. 
Unfortunately, the current generation of MV-based PUFs often is subject to sig-
nificant security vulnerabilities. Recently, we have demonstrated surprisingly simple 
ways to reverse engineer and even emulate several PUF classes as well as their suscep-
tibility to other types of attacks including (statistical) guessing and induced instability 
[12]. Our primary research objective in this paper is to demonstrate that reconfigura-
bility may serve as a principal component of techniques for PUF fortification against 
8 
vulnerabilities. 
Field programmable gate arrays are by far the most popular and practical re-
configurable computing platform [30]. The impact and techniques to address MV in 
FPGA recently attracted a great deal of attention [31, 32, 33, 34, 35, 36, 37]. Sev-
eral class of PUFs for static ID creation including SRAM and Butterfly PUFs were 
introduced and implemented on FPGAs in Philips Research Lab in Europe [37, 36]. 
Important conceptual and positional FPGA security references include [38, 4, 39]. 
An excellent collection of security and intellectual property protection papers can 
be accessed at http://www.cl.cam.ac.uk/ sd410/fpgasec/. Some more recent papers 
include [31, 32, 33, 34, 35]. 
Silicon manufacturing is a widely studied topic in many areas of computer-aided 
design. A recent excellent survey on CMOS MV is [40]. There are two set of tech-
niques for gate level characterization. The first one employs direct wafer microscopic 
measurements [41]. The other set of techniques use nondestructive indirect power 
and delay measurements and sophisticated techniques for solving systems of overcon-
strainted system of linear equation in presence of noisy data [42, 43]. 
Our primary research objective in this paper and our earlier conference manuscript 
[44] is to demonstrate that reconfigurability may serve as a principal component of 
techniques for PUF fortification against vulnerabilities. 
CHAPTER 4 
Vulnerabilities 
The PUF vulnerabilities are discussed by presenting attacks. The possible attacks 
are as follows: 
4.1 Reverse engineering 
The reverse engineering attacks aim at estimating component-wise characteristics of 
the system (e.g., gate delays), so that the adversary could either clone the system or 
develop a software counterfeit for the PUF. Since cloning the PUFs is technologically 
infeasible, the attacker's objective is focused on soft-modeling the structure's behav-
ior. In an effective reverse engineering attempt, the adversary models the system of 
N components in polynomial time with respect to N. This is because by linearly 
increasing N, one can easily provide countermeasures against the reverse-engineering 
attacks that have an exponential complexity. 
4.1.1 Linear P U F 
Let us briefly show the reverse engineering attack on the delay based PUF shown in 
Figure 2.2 (ignoring the added FF arbiter). Figure 2.1 (b) shows that each switch 
can be represented by four delays; dy, i,j = 0,1, where i/j denote the switch in-
10 
put/ output port indices respectively. However, it can be easily deduced that this 
model contains significant redundancy and the only important parameter in defining 
a switch's effect is the delay difference between its following top and bottom path seg-
ments. One can eliminate the redundancy and combine the series switches by lumping 
their delays to abstract thr representation of each switch using only one parameter 
shown as 5s in Figure 2.1 (b). We refer to 5 as the (differential) path segment delay. 
Thus, a linear PUF with N switches can be fully abstracted using N +1 parameters. 
The parallel PUF can be easily reverse engineered using a linear number of CRPs 
and forming a system of linear inequalities. The system can be solved by linear 
programming in order to find the (differential) path segment delays (<5's). For each 
challenge input vector (ci[Z],... ,CN[1]) used in l-th. measurement and the correspond-
ing response bit r[l], one can form an inequality: 
N
 r[l]=0 
VJ, . y V - i y ^ ^ + w ^ o, (4.i) 
U rW=l 
where pj(.) is the result of the transformation T on challenges as defined in Equation 
4.2. 
pj(ci,...,cjV) = p?(c) = c i e c j + i e . . . 0 C j ) for i<j (4.2) 
The direction of inequality in 4.1 is determined by the PUF response to the l-th 
challenge vector. In presence of measurement errors, an error term e[l] is added to 
the left side of each term in Equation 4.1. We formulate a linear program (LP) where 
the set of inequalities in Equation 4.1 are the constraints and the objective function 
is to minimize a norm of error over L measurements, e.g., minj]) /=1 \e[l]\. 
In order to minimize the absolute values, the objective function optimization can 
be written as a linear system, min^= 1e[Z] with added linear sign constraints (e.g., 
e[l}>0). 
11 
100 
95 
? 90 
S 
8 85 
4 
o> 
c 
1 80 
o 
75 
70 
100 500 1000 1500 2O00 2500 3000 
Number of Measured CRPs 
Figure 4.1: Evaluation of accuracy distributions versus the number of measurements 
obtained for a collection of 50 chips. 
We evaluated our reverse engineering approach using a set of CRPs. The box 
plot in Figure 4.1 shows the modeling accuracy in percentage versus the number of 
measurements (CRPs) for a population of 50 chips. We see that by using only 3000 
CRPs, the adversary can model the PUF with 99% accuracy. The test set in our 
experiment contains 10000 CRPs. 
We also test the susceptibility to reverse engineering in presence of measurement 
or arbiter errors. In the experiment we use 3000 CRPs, and randomly inject errors 
in the response measurements. Figure 4.2 shows the model accuracy versus different 
degrees of measurement error for a population of 50 PUFs. To improve resiliency 
against measurement errors, we used the maximum likelihood approach. For example, 
for a Gaussian delay distribution, the likelihood function would have a quadratic form 
which changes the LP to a convex programming problem. The improved results are 
also shown in Figure 4.2. 
. 1 1 1 1 1—1—1—1—1— 
• J 1 
• A 1 . 
i 
i 
i 
1 
i — i — i — i — i — i — i — i — i — i — i r J ~ I — i — i — i — i — i — i — r 
.
 + + 4 
-
-
12 
i i i i i i i 
A 
i i i—i—i—i i i i i i i i i i i i i i 
0.9 
0.8 
g 
§ 0.7 
§0.6 
0.5 
0.4 
! ! ' ! T • . . • . • • i • i L . 1 1 ¥ l 
I 
I 
X 
I 
A ! i * x ] ; T T T ' i « i i 
| , l , - - | -
v i uvu 
x i U i I ' l l 
T T T T 
I T .' I ' ' 
I I 
i ! 1 1 1 x 1 1 I T , i y ^ v i i 
..
 l l
 i 1 i i 
i n ; i ; i i T l 
1 
T 1 1 
1 i 
Linear OF -
Quadratic O F -
0.3 ' i i i i i i i i i - j i i i i i i i i i i i i i i i i i i i i 
0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95 1.05 1.15 1.25 1.35 1.45 1.55 1.65 1.75 1.85 1.95 2 
Measurement Error (%) 
Figure 4.2: Modeling accuracy distributions versus the measurement errors. Distri-
butions are obtained for a collection of 50 chips. 
4.1.2 Feed-forward P U F 
To prevent reverse engineering, Lee et al. [11] suggest adding nonlinearities to the 
circuit. They insert feed-forward arbiters (FFA) in the path such that the added 
arbiters provide the challenge to a selector as shown in an example in Figure 4.3. The 
A* arbiter's inputs are coming from stage K. The output of A* is fed-forward to the 
K + K'-th. selector challenge bit. 
•K • ^ IC— • ~*-(N-K-K')-» 
JI—I ' . " "" .1 
Rising 
Cl 
T 
5K+ 
CK 
5* K+1 8K+I 
CK+1 • • • CK+K' 
§K+I SN+1 
Response 
bit 
CK+IOH • • • CN 
Figure 4.3: The feed-forward PUF architecture. The arbiter (A*) introduces nonlin-
earity. 
We reverse engineer FFAs in the following way. If we denote the total path delay 
difference incurred by the signal till the K + K' switch with A, then 
13 
K-\-K K±IC< 
SK+K'+I- (4.3) 
i=K+2 
The delay in the segment between the switch K and switch K +1 is broken down 
into two parts, 5M+i and <^+ 1 and therefore the PUF has one more parameter than 
linear PUF. For the sake of simplicity, the measurement index I (previously defined for 
Equation 4.1) is removed. The feed-forward arbiter's result, CK+K', provides another 
inequality 
K CK+K'=° 
X ) ( - l ) ^ i + fe+i $ 0- (4-4) 
i = l CK+K'=1 
We also use the following identity that can be directly derived from the definition 
ofpj 
K+K' _
 nK+K'-l m - nK ffi nK+K' 
Pi — Pi ® CK+K' — Pi tb PK+1 
pfSpZlf-'QcK+K, (4.5) 
Observing that (-l)a®6 = (—l)a(—1)^, Equation 4.3 is further simplified to 
A = ( - l ) ^ ' - 1 | A / i 7 . s t | (4.6) 
+ (-iyK+K> ( ( - 1 ) ^ _ 1 ^ + 1 + Amime) + VK+K'+I-
Where p^\^ l is the parity (XOR results) of the challenges to middle stage. 
Afirst, Amiddie , and Aiast are the first, middle, and last stage differential delays 
14 
computed respectively as, 
K 
A/irst = J ] ( - l ) ^ + < +^1, (4.7) 
i= l 
K+K'-l
 f 
^middle = Y^ i-iy^^Si + SK+K', (4.8) 
i=K+2 
N 
&last= Y, (-^Vi + VN+l. (4.9) 
i=K+K'+l 
The total delay can now be expressed as 
Atotai = A x (-lfK+K'+i + slast. (4.10) 
We complete reverse engineering of FF PUF by using the following observations, 
(i) By fixing the selector bits of the switches in first stage (K first switches), we 
estimate the delays of switch elements in the middle and last stage by solving an 
LP problem similar to the one in Section 4.1.1. However, we need to make two 
assumptions on the FF arbiter output and the LP would have two solutions. The 
solutions obtained by using these two assumptions only differ in sign which can be 
easily resolved later. 
(ii) Knowing the delays of switches in the middle and last stages (with a sign ambiguity 
for the delays of the middle stage) and considering the PUF formulation (Equation 
4.10), we set the challenges to the middle and last segments in a way so that any 
transition of the final arbiter is closely linked to the transitions of the FF arbiter 
output. This can be realized by choosing a challenge configuration that yield a large 
delay difference for the middle stage ( A m j ^ e > > 0), while causing a negligible delay 
difference at the last stage (Ajast « 0). 
(iii) While the challenge bits to the middle and last stages are fixed to the appropriate 
15 
configuration found in (ii), complementary challenges are applied to the first stage 
switches and transitions of PUF responses (final arbiter response transitions) are 
recorded. Any time the final arbiter response flips, we obtain a constraint for the LP. 
Since we are concerned with transitions rather than absolute output values, we need to 
/address two LP problems by trying two different bit assignments. However, the delay 
values obtained from the incorrect solutions can be easily rejected by cross-validating 
the results on a few new CRPs. 
(iv) Using the estimated delays of the first stage, we can eliminate the ambiguity in 
the sign of the middle-stage delay difference. Therefore the delays of all switches can 
be estimated successfully. 
In our experiment, the PUF has the structure presented in Figure 4.3, where K = 
24, K' = 20 and N = 64. After reverse engineering the PUF, we validate our model 
using 10,000 CRPs and measure the accuracy of the modeled PUF. Figure 4.4 shows 
the model accuracy versus different number of measurements for 20 PUFs. The first 
15,000 CRPs (measurements) are used to estimate the middle and last stage switch 
delays and the rest are used to estimate the first stage switch delays. Also note that 
for step (ii), finding the challenge configuration to the middle and last stages that 
yields the largest and smallest possible delay differences is, in general, an NP-complete 
problem. But we do not need an exact solution and a rough approximation that gives 
a very small (large) delay is sufficient. For example, we can try 1000 challenge bit 
combinations and find the one that gives the minimum delay difference. 
4.2 Emulation 
The goal of this attack is to emulate the PUF by effectively storing the CRPs in a 
memory. If the number of CRPs grows exponentially with respect to the number 
of inputs, the required memory would be very large, making the full CRP storage 
16 
100 
98 
96 
£• 94 > o 
E 
g 92 
o 
< 
g) 90 
| 88 
86 
84 
82 
1.5 2 2.5 3 
Number of Measured CRPs
 x1Q-> 
Figure 4.4: Modeling accuracy versus the number of measurements obtained for a 
collection of 20 FF PUFs. 
infeasible. Instead, the attackers attempt at exploiting the predictability of the CRPs 
(lack of randomness) to achieve high degrees of compressibility and reduce the storage 
demands drastically. For example, if a group of challenges that differ only in their 
first two bits produce the same response, there is no need that the first two bits are 
stored (Figure 4.5). A closely relevant attack would be to guess the responses to a 
given challenge with high probability by performing statistical analysis on the PUF 
responses. The data from the statistical analysis could also be used to efferently 
emulate the PUF. 
Challenge 
10100111 
11100111 
00100111 
01100111 
XX100111 
R 
Figure 4.5: An example of emulation attack. 
»ffifr«ft—iJkmllffi »iii«lMn 
» Results from Multiple PUFs 
Fit 
- a - Single FF PUF 
17 
4.3 Man-iii-the-middle 
During the authentication process, CRPs stored in the database on a server are 
compared with those obtained from the PUF. In case there are a limited number of 
CRPs stored for each PUF in the database, the adversary can impersonate the PUF, 
if he can build a copy of the data base content. The man-in-the-middle attack involves 
eavesdropping the communication between the PUF and authentication server and 
recording the responses to the attempted challenges to later impersonate [45]. 
4.4 Reconfiguration 
While FPGA provides a versatile platform for implementing the PUF, the possibility 
of reconfiguring the FPGA by an unauthorized party poses a threat. For instance, if 
an adversary knows how to read the configuration bit stream and configure the FPGA, 
then he can gain full knowledge of the circuit structure. The attacker may reconfigure 
the FPGA to remove the nonlinearities or other added transformation circuitry at the 
input or output to facilitate reverse engineering by modeling the delays of the linear 
parts. 
4.5 Collision of Responses 
Collision of responses happenes when a pair of PUFs generate same responses to 
given challenges. Ideally, if the PUF responses come from uniform distributions, the 
probability of collision will be only a function of the number of response bits. But 
in reality the PUF structure can distort the uniformity of responses and introduce a 
bias. 
We test the PUF to obtain the collision probability for different PUF structures. 
For each given challenge, the PUF responses on various chips must form a uniform dis-
18 
tribution to yield the minimum collision probability. The nonlinearity introduced by 
the FF arbiter distorts the uniformity of output responses and causes higher collision 
probability, even in presence of completely independent delays and perfect arbiters. 
Depending on the PUF circuit structure and the location of nonlinearity, there is a 
lower bound on collision probability. 
For a parallel PUF that consists of M response bits (M rows), the minimum 
collision probability is ^ . For example, if the PUF has 8 output bits, then the 
collision probability is ^g. Figure 4.6 shows the collision probability for the FF PUF 
normalized to that of the parallel PUF (z-axis) vs. feed-forward arbiter input /output 
locations. The collision probability will be at least 65 times greater than the parallel 
PUF when the FF arbiter is placed at (input = 59, output — 64) location (i.e. close to 
the final arbiter). One way to compensate for this increase in collision probability is to 
increase the number of responses (PUF rows) to lower the overall collision probability. 
Figure 4.6: The increase in the collision probability compared to the parallel PUF vs. 
feed-forward arbiter's input (rc-axis) and output (?/-axis) locations. 
CHAPTER 5 
Countermeasures and Safeguards 
The vulnerabilites and potential attacks presented in Chapter 4 can be alleviated by 
taking a number of safeguards. In Section 5.1 we review the countermeasures proposed 
earlier. In Section 5.2 we introduce new safeguards that are more comprehensive than 
the presently available approaches. 
5.1 State-of-the-art 
To protect PUFs against reverse engineering and emulation two lines of methods were 
mainly used: (i) introduction of non-linearities, and (ii) challenge-response hashing 
[11,3]. The proposed non-linearity based methods are typically of two types: (a) feed 
forwarding and (b) MAX (MIN) operations. 
As we discussed in previous chapters, a feed forward non-linearity is introduced by 
inserting an internal arbiter that compare the signal delays at a certain point in the 
circuit [2]. The internal arbiter then forwards the arbitration results to a switch that 
is located ahead on the delay path. The feed forward arbiter introduces non-linear 
behavior that complicates the reverse engineering process. - However, our studies in 
Chapter 4 reveal that non-linear PUF structures with a small number of feed forward 
arbiters are still prone to reverse engineering attacks [12]. To safeguard against this 
20 
attack, multiple interleaved feed forward arbiters must be used. A major drawback of 
this protection method is that adding the non-linearities skews the bit probabilities. 
Thus, the resulting non-linear PUF is more vulnerable to statistical modelling and 
emulation attacks. The added internal arbiters also increase the circuit response's 
instability. 
The use of MAX (MIN) functions was first proposed by [11]. MAX (MIN) oper-
ations are carried out on delay values by using AND (OR) logic gates in the PUF 
structure. If two rising edge signals with delays d\ and di arrive at a two-input AND 
(OR) logic, then the delay of the output signal is d0 = MAX(di,d2)- AND (OR) 
logics are inserted in the parallel PUF circuit and are connected to the bottom and 
top paths in between or at the end of the structure. Our studies show addition of 
this type of non-linearity also renders the circuit more prone to emulation attacks. 
As another countermeasure, the use of cryptographic hash function was proposed 
by [3]. Hashing is performed on both challenges and responses of the PUF. To estimate 
the PUF model parameters, the adversary needs direct responses of the PUF arbiters 
for known challenges. However the use of a one-way output hash function inserted 
immediately after the arbiters will make the responses obfuscated and obscure. To 
discover the response, one needs to invert a one-way function which is known to 
be a hard problem. This process should also be repeated until sufficient number of 
responses are collected. An input hash function is attached to the PUF challenges 
to prevent bit-level control of the challenges. Due to the confusion and diffusion 
properties of hash functions, the final system is safe against emulation attacks. 
Table 5.1: Latency and area of common hash functions 
Algorithm 
SHA-256 
SHA1 
MD5 
MD4 
Chip area 
10,868 
8,120 
8,400 
7,350 
Clock cycles 
1,128 
1,274 
612 
456 
A drawback of the hash functions is that they incur significant hardware area 
21 
and power overheads. Besides, the PUF needs to evaluate multiple clock cycles to 
prepare a standard size input message block to deliver to the hash function (MD5 
can accept variable input message size). The input and output hash evaluations by 
themselves take many clock cycles, imposing a large overall latency on the system. 
I 
fTable 5.1 shows latency (in cycles) and area (in gate equivalents) of commonly used 
hash functions [46]. 
5.2 New safeguarding methods 
In this section, we first present a mechanism to characterize the PUF components for 
a linear structure. We discuss how the characterization can help in achieving a higher 
robustness in presence of variations in operational conditions. In addition, using 
characterization one can exponentially compress the challenge-response database and 
provide a diagnostic tool for calibration and structural modifications. Next, we pro-
pose a secure PUF architecture which adds a set of logic input and output networks 
to the parallel PUF to secure the PUF against various attacks. 
5.2.1 Testing and Characterization 
There are at least three objectives for PUF characterization: (i) The measured delays 
and parameters can be used to achieve a higher robustness against variations in 
operational conditions and environment. This is accomplished by estimating the 
detection error probability for a given challenge, (ii) Switch delay values fully describe 
the PUF behavior and could be stored instead of challenges and responses, (iii) The 
delay values can be used to perform diagnosis, calibration and structural modifications 
for better performance. A similar test circuit to the one used in this paper was 
suggested by [47] as a BIST structure to estimate the delays of any combinatorial 
logic on FGPAs. 
22 
Launch 
Flip Flop 
Sample 
Flip Flops 
PUF-under-test 
Stimuli 
Generator 
Test Clock 
Generator 
Challenge 
Generator 
hffll 
mm 
S 
Capture 
Flip Flops 
•Ml 
Figure 5.1: Delay characterization circuitry 
The suggested delay characterization circuit consists of three flip flops: launch, 
sample and capture. A transition is invoked by the launch flip flop at the combinatorial 
circuit under test (CUT) input. The output of the CUT is sampled t seconds later. 
The sampled value is compared to the real value by an XOR logic and the result is 
recorded by the capture flip flop. If t is smaller (larger) than the CUT delay, then 
the sampling occurs before (after) the transition appears at the output, and thus 
the sampled value would be different from (same as) the input test signal. Note 
that the sample FF has certain setup and hold times which make the FF unable to 
sense smaller delay differences. Violation of setup and hold times places the FF into 
a metastable state and causes non-deterministic outputs. If t is swept by varying 
the clock frequency from fi to fu (// < 1/t < fu) with steps of A / , at some point 
the speed of CUT would be almost equal to the clocking speed. By counting the 
number of times the capture flip flop records a sampling error and then by forming a 
histogram, it is possible to accurately find the CUT delay. 
Figure 5.2 depicts the probability that the sample FF outputs " 1" versus the clock 
frequency. The symbol fc marks the frequency at which the sample FF produces 
totally random outputs; l / / c is in fact equal to the CUT delay. In other words the 
clock edge and the signal edge coming from the CUT arrive at FF at the same time. 
The transition slope in Figure 5.2 implies the speed of the sample flip flop. For 
example, flip flop 1 has smaller setup/hold times than FFs 2 and 3. 
The combinational test circuit used here is a PUF. The procedure is repeated N 
23 
Figure 5.2: Arbiter characteristics. 
times for different challenge configurations. Then, a system of linear equations is 
solved to find each switch delay. To save time and efforts, the range of the scanned 
frequency can be adaptively adjusted in each iteration to scan a smaller window 
around the target frequency. Also instead of linearly sweeping the frequency to spot 
the transition point, a binary search algorithm can be used. If the frequency sweep 
range is partitioned into CF steps, then the binary search would find the transition 
point in log{CF) steps [48]. Most advanced FPGAs, such as Xilinx Virtex family, 
provide Digital Clock Management (DCM) and Phase Locked Loops (PLL) blocks 
which enable building clock synthesizer with on-chip resources. This is useful if a 
stand-alone built-in system needs to be designed. In Chapter 7, we will implement 
the characterization circuit on Xilinx Virtex 5 FPGAs and present the measured 
results. 
After solving Equation 5.1 to find the switch delays ( A = {5\, 52,..., £JV+I} ) and 
measuring the probabilistic characteristic of the sample flip flops g(.) (see Figure 5.2), 
the FPGA is reconfigured so that the test circuitry is removed. One of the sample flip 
flops will be used as the PUF arbiter. Since the arbiter response only depends on the 
input delay differences rather than the absolute values, the flip flop characteristic, 
g(.) is transformed to represent the arbiter characteristic: h(x) = 9{xJt+i)- The 
estimated values for A and h(.) completely characterize the PUF and are stored in a 
24 
database to be later used for identification and authentication purposes. 
5.2.2 Response Error Prediction 
Small delay differences at the arbiter inputs can cause metastability and inaccuracy 
of the response. Metastable arbiters are extremely sensitive to changes in operational 
conditions such as temperature variations and electromagnetic noise. Error correct-
ing codes with syndrome decoding have been proposed to correct for such errors [3]. 
However, since the syndrome is public information, it can reveal some information 
about the responses and undermine the security of the PUF. In addition, correct-
ing multiple bits requires a complicated decoding circuitry with large latency and 
hardware overheads. 
To determine to what extent the responses are affected by metastability of arbiters, 
we propose a method that assigns a level of confidence to each response using the 
parameters obtained during the characterization step. 
Let us represent the challenge vector by C = {ci,C2, ...,CJV}. We define d as the 
delay difference (the top path delay minus the bottom path delay) in response to C. 
Now, d can be written as 
d= Ei i ( - l ) P i ^ + W = [P 1J.A-1, (5.1) 
where P = { ( - l ) P l , ( - l ) p 2 , - , (-1)PJV} and pi = ci® ci+1 0 ... 0 cN. The goal is to 
estimate the probability of false negative detection error, i.e., Prob(Hi \ H0) for a 
given C where the hypotheses Hi and Ho are defined in Equation 5.2: 
H0:PUF = PUF' (5.2) 
Hx : PUF ^ PUF'. 
25 
In fact, for a given delay difference d caused by the challenge C, h(d) (or 1 — 
h(d)) is the probability that the arbiter produces a zero (or one) output while the 
delay difference at its inputs is greater (smaller) than zero, and h(.) is the arbiter 
characteristic obtained by the test circuit explained in Section 5.2.1. We define the 
probability of false negative error {Perrar) as the probability that at least one of the 
PUP responses to K challenges has an error, therefore: 
K 
Perrar = Prob^ | H0) = J j N C 1 ~ M * ) ) + 0- ~ <*)*&)] (5-3) 
i = l 
<*i = < 
1, di > 0 
(5.4) 
0, di<0. 
In writing Equation 5.3, we assume that the delays values caused by the K chal-
lenges are independent. Using this method, the delays resulting from a number of 
randomly selected challenges can be calculated by Equation 5.1 (assuming the switch 
delays are available from the characterization step). Then, the probability of false 
negative error is estimated using Equation 5.3. To ensure robustness against ar-
biter metastabilities, the responses with high estimated probability of error must be 
ignored. 
5.2.3 Challenge-response compression 
The characterization scheme allows an effective way to compress the challenge re-
sponse pairs. A PUF with N switches in fact performs a transformation from N 
real numbers to 2^ binary numbers of length N + 1. Therefore, by measuring the 
N parameters (that are in fact the path segment delay differences), one can fully de-
scribe the challenge-response space. In this way a huge reduction in database storage 
requirements can be achieved. Also it enables one-time pad encryption for large N 
26 
values (e.g., N > 128). The idea of compressing the CRPs by collecting responses 
and performing reverse engineering is suggested in [49]. However in this method, 
arbiter errors can cause large errors in estimating the switch delays [12]. We attempt 
to directly measuring the delays of the PUF before the arbiter. Our method also 
allows for arbiter characterization and stores the characteristic as part of the PUF 
parameters. 
5.2.4 Diagnosis 
The data obtained in the test phase can be used to diagnose and analyze the PUF. 
Small variations in delays, long setup/hold times for arbiters, large bias caused by 
systematic effects, or non-symmetric routing may diminish the PUF's performance. 
Adding extra delay elements or switch to the PUF increases the total delay variation. 
Rerouting the connections and/or relocating the PUF can be utilized to overcome 
the delay bias. Also noisy flip flops or those with large setup/hold times should be 
avoided being uses as arbiters. 
5.2.5 Secure P U F Architecture 
In this section, we introduce a secure and robust PUF structure. The proposed 
PUF as shown in Figure 5.3 consists of the four fundamental building blocks: (i) 
input (logic) network, (ii) output logic network, (iii) wire interconnect network, and 
(iv) parallel PUFs. After testing and characterizing each linear PUF in the parallel 
structure, the FPGA is reconfigured to integrate and attach these blocks to the core 
parallel PUFs. 
5.2.5.1 Input network 
We design the input network attached to the parallel PUF (see the dashed box in 
Figure 5.3) to satisfy Strict Avalanche Criterion (SAC) for a parallel PUF circuit. 
27 
Output 
— Interconnect Network 
Figure 5.3: The general architecture of the proposed Secure PUF. 
A function is said to satisfy SAC if, whenever a single input bit is complemented, 
each of the output bits changes with a probability of one half. In Section 5.2.5.3 
we will show how to bind multiple rows of the resulting structure to construct an 
N input, Q output PUF structure that satisfies SAC. When introducing the output 
network, we demonstrate that the SAC property is required to achieve the maximum 
security. Before discussing the design steps of the input network, we first discuss the 
input/output characteristics of the parallel PUF. 
As stated earlier, the PUF behavior can be represented by Equation 4.1. Let us 
assume the differential delay values (6) in Equation 4.1 are independent and identically 
distributed. For simplification and without loss of generality, we assume the random 
variables have Gaussian distributions with zero mean, i.e., 8j ~ N (0, a2). Our goal 
is to find the probability that the PUF output flips given that a challenge bit in the 
PUF input is flipped, i.e., Prob{~ 0\ ~ c^}. Any change in the sign of the summation 
relates to a change in the output (response) bit value. Whenever a challenge bit value 
flips, some of the terms in Equation 4.1 change their sign (as a result of a change in 
the corresponding p values). Let us denote the set containing the indices of p's that 
flip (do not flip) as result of a flip in the k-th. challenge bit by T^ (A&). Note that 
T and A partition the index set, Q, = {1,2, ...,iV}, where N is the total number of 
Input 
• • • • : : : ; • • 
....
 9 
... • 
•••>$• | 
* 
input 
Net. (G) 
,. r.T?».^.^.,. 
Input 
Net. (G) 
«-«• 
- * ? - • » 
« * * 
PUF 
V^-rri's* 
PUF 
«:: 
* 
#:• • : 
4S 
• f f * ¥ 
• • 
• ' • ' • 
<*-% % 
• 
• • 
• • 
• •• • 
• 
• 
• 
Input 
Net. (G) 4r+ PUF 
• 
• 
• 
• 
• 
• 
• 
• 
• 
Output
 Netw
ork
 (Z) 
28 
switches. 
*erfc 
wk = ^ ( - 1 ) P ^ + < W (5.5) 
If the absolute value of the sum of the terms whose indices are in Tk (i.e., | Vk | in 
Equation 5.5) is greater than the absolute value of the sum of terms whose indices 
are in A*, (i.e., | Wk | in Equation 5.5), then the response bit flips whenever ck flips. 
We define a new random variable Xk which has a value one if the output flips and 
zero otherwise, i.e., 
Prob{Xk = 1} = Prob{~ 0\ ~ ck} (5.6) 
then, 
( l , \Vk\>\Wk\ 
Xk=l (5.7) 
10, Otherwise. 
It is desired that E{Xk} = 0.5 for k = 1,2, ...,JV. The expectation is over all 
PUF realizations. Recalling that the sum of Gaussian random variables forms a new 
Gaussian, if Ui ~ N(^j, erf ) and U — Y^Ut, then U ~ DsT(^ A*i, ^2ai )• Therefore, 
V and W can be viewed as realizations of Gaussian variables given by 
Wk ~ N ( 0 , ( | A f c | + l ) x a 2 ) 
Vk ~ . tt (0, | i y | xa2) . (5.8) 
where |.| denotes the set cardinality. If | Tk \ = | A^ | + 1, then V and W will be 
identical and independent Gaussian random variables. Also | Tk \ + | A^ | = N. 
Therefore, if 
N + 1 
| r * | = ^ , (5.9) 
29 
then E{Xk} = 0.5. Thus, if ^ ^ (almost half) of p's in Equation 4.1 flip as a result 
of a flip in fc-th challenge bit (c^), then the output of the PUF would flip with a 
probability 0.5. The result is in accordance with our initial intuitive observation. 
We now verify if this property holds in the parallel PUF structure. The p's in 
Equation 4.1 are related to the challenges by the transformation T denned in Equation 
4.2, i.e., P = T(C). It can be seen that a flip in c& causes a flip in pj, where j < k. 
Thus, | Tfc | = k. For example, if a flip in CJV happens, all of the p's flip as a result. 
Hence, Equation 5.9 is not satisfied for the parallel PUF structure. We define a 
transformation G(.) on challenges that combined with T meets the criterion set by 
Equation 5.9. 
Objective: Find G(.) so that P = T(G(C)) satisfies |Tfe | = ^±± for all k. 
Method: (1) Before finding the proper transformation, we make an observation. 
Consider two challenge bits, c^  and ck+N±i in the parallel PUF that flip in succession. 
The first flip (c^) causes pj for all j < k to flip, and the second challenge flip (cfc, N+I ) 
causes pj for all j < k + ^~ to flip. The p's that flip twice return to the first value 
and do not flip in effect. Therefore, flipping Ck and cfe+jv±i at the same time causes pj 
for all k < j < k + ^y^ to flip, hence satisfies the criterion set by Equation 5.9. The 
observation implies a constraint on the challenges. Whenever a challenge bit flips, 
another challenge bit at ^~ selectors apart must flip as well to guarantee SAC. Note 
that N must be an odd integer to yield integer challenge indices. However, it is easy 
to prove that it is infeasible to impose the derived constraint on the PUF challenges, 
although high quality approximations can be made. We design an input network that 
transforms the input challenges of the PUF and imposes constraints on the toPUF 
challenges. 
(2) Find a transformation C = G(D), G : {0,1}M -> {0,1}^, so that any bit flip 
in the input causes two output bits at approximately ^ ^ locations apart to flip. We 
use two approximations to realize the proposed concept. One approximation can be 
30 
implemented using a wire-only network, while the other one can be done using XOR 
logic. 
(i) Wire-only network: for N an even integer and M = y , G performs the following 
transformation: 
N G: Ci = ci+K=di, for i = 1,2,...,—. (5.10) 
The wire network connects the challenge bits of the PUF that are located y apart. 
However, the input dimension of the wire network is half of the PUF. Hence, one needs 
to use twice the number of switches to achieve the same number of PUF inputs. 
(ii) XOR network: for an even integer N and M = N, we define the transformation 
G as follows: 
CN+i+i = GL for i = l 
2 
d+i = di®di+i, for i = 1,3,5, ...,N— 1 
CN+J+2 = di®di+i, for i = 2,4,6,..., N-2 (5.11) 
Unlike the wire-only network, the XOR network achieves a one-to-one mapping. How-
ever, an adversary with full knowledge of the circuit structure can apply the inverse 
transformation to make the input network ineffective. We alleviate this issue later by 
introducing a wire interconnecting scheme that physically binds the inputs of multiple 
PUF rows. 
DN DM-! DN .2 D Z D I 
o 
C|M/2 C N C(N-2)/2 C I C(N+4)/2 C(N+2)/2 
Figure 5.4: The input network realization using XOR logic. 
In addition to the expectation of X^ being equal to 0.5, it is desired that the Xk 
31 
has as small variance as possible. Smaller variation guarantees that a larger number 
of PUFs satisfy the SAC property. The variance of Xk is related to the variances of 
W and V in Equation 5.7, that are themselves related to the number of switches and 
the variance of 5. The variance of S is determined by the technology and the amount 
of process variations. Therefore, one can achieve a smaller variance for Xk by adding 
to the number of switches or incorporating multiple rows of the same structure as 
explained in Section 5.2.5.2. 
5.2.5.2 Output network 
We introduce an XOR-based output network structure (see Figure 5.3) which achieves 
(i) fortification against reverse engineering attacks, and (ii) smaller deviation from the 
SAC on each PUF by combining multiple rows of parallel PUFs with the transformed 
challenges. 
The output network performs a mapping denoted by Z(.) from the PUF arbiter 
responses, R, to the output, O. The mapping is defined as O = Z(R), Z : {0, l}*3 —• 
{0,1}«', Q' < Q, and 
°i = © r(j+s+i) mod Q for j = 1,2,..., Q' (5.12) 
i=l,...,x 
where 0 denotes the parity generator function and s indicates shifting step. The 
transformation calculates the parity value for sets of x adjacent PUF arbiter responses 
where each set starting point is circularly shifted by s bits with respect to each other. 
The transformation can be parameterized by s (the shifting step) and x (the parity 
input size). We will discuss later how these parameters govern a trade-off among 
security, overhead, and randomness properties. 
(i) The proposed transformation can hinder the efforts to reverse engineer the 
PUF in the following way. As stated in Section 4.1.1, to reverse engineer a linear 
. _ _ 32 
PUF structure and estimate the delays of switches, the adversary needs to collect a 
set of challenge-responses from the PUF and solve a system of inequalities. 
Suppose that the responses of Q parallel PUFs are mapped to a Q'-bit output by 
the transformation Z(.). There are 2Q~Q' possible inputs that map to a given output. 
Therefore, the adversary is faced with solving an ambiguity to discover the real PUF 
response. The number of assumptions grows exponentially, if he/she is not able to 
reject some of them at each step. It can be shown that the problem has exponential 
complexity of order 0(2^Q"Q'^ c) with respect to the number of ambiguities 2^_<3'^ 
and the number of CRPs needed to estimates the PUF switch delays Nc- [12] show 
that to reverse engineer a linear parallel PUF having 64 switches with accuracy of 
over 98%, a minimum number of 2000 CRPs (Nc = 2000) are required. Then for 
Q — Q' — 1, the complexity of reverse engineering the secure PUF would be O(2200°). 
Nevertheless, if the attacker can control (the transition of) the PUF arbiter responses, 
then it would be possible to reduce the number of assumptions by performing a 
differential attack. Let us illustrate the problem using an example. Consider a trivial 
case of Z where Q = 5, Q' = 4, and every four adjacent response bits are XOR-ed to 
produce the output, i.e., s = 1, x = 4 (see Figure 5.6). 
1:10100- ^00100 
X S / O : 0101—•HIO 
l':01101'- *11101 
Figure 5.5: Example: the difficulty of inverting the output network. 
rs r4 r3 r2 n 
WW 
04 03 02 01 
Figure 5.6: An example of output network for Q = 5, Q' =4, x = 4 and s = 1. 
Now imagine a transition in the output occurs from 0101 to 1110. As shown in 
33 
Figure 5.5, there are four possible transition hypotheses about the Z inputs depicted 
by arrows. If we know only the first bit (or only one bit) of the input has caused 
such transition in the output then we can reject two hypotheses shown in dashed 
arrows. Also by associating probabilities with transitions and ranking assumptions 
I accordingly, one can guess the PUF responses. 
Thus, Z(.) by itself does not guarantee significant resiliency against reverse engi-
neering. To achieve a high level of resiliency, it is required that the PUF response bits 
(or Z inputs) could not be deterministically controllable. The Maximum resiliency 
is obtained if PUF response bits flip with a probability of 0.5 which is equivalent to 
SAC. We will use an interconnect network that connects rows of parallel PUFs, to 
design a Q-output PUF with SAC property. 
(ii) The mixing property of XOR logic or in general the parity generator func-
tion also fortifies the PUF against emulation and statistical guessing attacks even in 
presence of outrageously large PUF switch (element) delays. For larger values of x 
in Equation 5.12, higher number of PUF rows and responses are mixed and smaller 
deviation from the transition probability of 0.5 (i.e., vax(Xk)) is achieved. 
5.2.5.3 Interconnect network 
In Section 5.2.5.1 we designed an input network that satisfied SAC for a single row 
PUF with one bit output. We design a PUF structure that consists of multiple rows 
of parallel PUFs and maps N challenge bits to Q response bits. The PUF is designed 
to satisfy SAC. The PUF is built upon an interconnect network that connects the 
challenge bits of rows of parallel PUFs (See the leftmost solid box in Figure 5.3). 
In order to satisfy the SAC, it is required and sufficient that one challenge bit on 
each row is connected to another challenge bit on a different row. A challenge bit is 
broadcasted to all PUFs, and since each PUF output flips with a probability of 0.5, 
34 
the SAC is met. The interconnection can be expressed formally as follows: 
m _ m+1 foriJeQ,, m = 1,2,...,Q - 1, (5.13) 
where c™ is the i — th challenge bit in the m-th row, Q = {1,2,..., N}, j = gm(i) and 
gm: Cl —> fi is a one-to-one permutation function. In Section 5.2.5.1, we mentioned 
that the XOR input network can be bypassed by applying the inverse transformation. 
If the inputs of PUF rows are connected in parallel (with no permutation), i.e., i = j , 
by applying the inverse transformation (G"1) all of the input networks are bypassed 
and thus, ineffective. By imposing a constraint on gm to be non-identity for all ra's 
the attacker can fully bypass only one input network and the other input networks can 
only be partially bypassed. Figure 5.7 depicts an ra-bit circular shift interconnecting 
scheme, i.e., j = gm(i) = (i + m — 1) mod Q. 
_ A — rQ rCZ ••  
n-1 Lr l *-r L r " T nr 
& & & " • • & 
1—I—f 
C2 C3 C4 1 r CN-I CN C I 
K P A H 2 
a .& & a 
1—1—1 1—1—r 
C N C I C 2 ••• C N - 3 C N - 2 C N - 1 
^ & a 
HHHHHa A - n 
1 1 1 1 1 T" 
C i C2 C3 ••• C N - 2 C N - I C N 
Figure 5.7: An m-bit circular shift interconnecting scheme that connects Q rows of 
parallel PUFs with transformed challenges. 
We thoroughly examined the secure PUF architecture that uses the proposed 
input/output network and the interconnecting method explained in Section 5.2.5. In 
the following experiments, we model each switch with four delays - two for straight 
35 
connection and two for cross connection links. We assume that the delay components 
are samples from independent identical Gaussian distributions with /u, = 0.5ns and o 
= Aps. The mean and variance conform with the 65nm technology [50]. 
For a single row parallel PUF circuit with 64 switches, we simulated the probability 
of output transition conditioned on each challenge bit transitions. In this experiment, 
we apply to the PUF 100 random 64-bit challenge vector pairs that differ only in 
the i — th bit, where i = 1,...,64, and record the percentage of times the output 
transitions. We repeat this experiment for 50 PUF circuit realizations and find the 
expectation. Figure 5.8 shows the value of -E[-Xfc] before and after applying the input 
XOR transformation (defined in Equation 5.11) on the PUF challenges. The figure 
shows that the probability of output flip conditioned on the fc-th challenge bit before 
input transformation increases monotonically from less than 0.1 to over 0.9, where k 
= 1,2,...,64. This can be intuitively viewed as the cumulative effect of switch delays in 
the parallel PUF circuit structure. Note that after applying the XOR transformation 
on the PUF challenges, the output flips with a probability close to 0.5 for a flip in 
input bits, which per se satisfies the SAC. A smaller deviation from the transition 
1 
0.9 
0.8 
0.7 
0.6 
3* 0.E 
0.4 
0.3 
0.2 
0.1 
* Before Applying Input XOR network 
— o After Applying Input XOR network 
9 , t 
»'.-
10 20 30 40 
Challenge Bit Nubmber (k) 
50 
Figure 5.8: Probability of the response flip given a flip in the k-th challenge bit for 
a single row parallel PUF. The dot (circular) markers show the probability before 
(after) transforming the challenges. 
36 
probability of 0.5 is desired for each individual PUF circuit realization. There are two 
ways to reduce such deviation: (i) by using more switches in the parallel PUF circuit 
(increasing N); (ii) by mixing the outputs of larger number of parallel PUF circuits 
(increasing x in Equation 5.12). The black solid line in Figure 5.9 indicates how 
the variance (var(Xk)) decreases as the number of switches in a single row parallel 
PUF increases from 8 to 128. For a fixed number of switches in a row, the variance 
rapidly drops as 2, 4, and 8 adjacent outputs of rows of parallel PUFs (x = 2,4,8) 
are mixed (by an x-input parity generator function). Note that the challenges of 
the parallel PUFs arranged in rows are connected by the interconnection network 
presented earlier. 
0.04 
0.035 
0.03 
0.025 
* 0.02 
S 
0.015 
0.01 
0.005 
0 
8 16 32 64 128 
PUF Length 
Figure 5.9: Deviation of transitional probabilities of individual PUF instances from 
the SAC. 
We now investigate the security of the proposed PUF against emulation attacks. 
We devised an algorithm that randomly selects a challenge vector within which it 
searches for the largest number of bits that can be represented with don't-cares. We 
set an upper-bound on the number of search efforts for each challenge and use the 
knowledge of the statistical PUF characteristics to expedite the search. For example, 
we know that the left most challenges have a lower impact in determining the parallel 
PUF output (same scenario for FF PUF), thus they can be represented with don't-
37 
cares with higher probability. 
Figure 5.10 shows the amount compression achieved for a single row parallel PUF, 
FF PUF and Secure PUF; (Q,Q',x,s) = (9,8,8,1) and 64 switches in each row (N.= 
64). The feed forward arbiter in the FF PUF compares the delays at switch 20-th and 
feeds the result to the selector of the 40-th switch. The emulation attack is performed 
on 10 PUFs of each type. The ten values are shown in box plots in Figure 5.10. Note 
that compressibility of the challenge response pairs of the secure PUF structure is 
four and five orders of magnitude smaller than a single row parallel and FF PUF 
respectively. The smaller level of compressibility corresponds to lower predictability 
of the responses. 
We also examined PUF sensitivity to very large switch delays. In general, delay 
outliers cause high predictability, high compressibility of CRPs, and facilitate building 
of statistical models. We studied the sensitivity of secure PUF and the single row 
parallel PUF structures to outliers. A fault is injected as an outlying delay of 5ns (10 
times larger than mean delay) into the 20-th switch - of the first row for the secure 
PUF. Figure 5.11 shows the expected probability of output transition for both single 
row parallel PUF and the Secure PUF with the parameters (Q,Q',x,s) = (9,8,8,1). 
The expectation is taken over 50 PUF realizations. For the parallel PUF with one 
row, the transition probability is highly distorted; flipping inputs 1 (42) to 39 (64) 
(does not) flips the output with a probability of 0.8. Such divergence from SAC 
leads to high predictability of PUF responses and facilitates emulation and statistical 
modeling attacks. However, as it can be seen in Figure 5.11, the transition probability 
of output (any of the eight PUF output bits) does not change because of the mixing 
introduced by the output network. In addition, if the PUF responses in some of the 
rows do not show significant changes and variations due to arbiter failure, arbiter 
insensitivity, or large delay biases, the effect would not be transparent at the output. 
38 
i' 
§ 
V 
I* 
1 
0 
x10 
I 
+ -
1
-
• 
• 
• 
• 
PUF with 
eight rows 
Single row 
parallel PUF 
Single row 
FT PUF 
Figure 5.10: The amount of CRP compressibility for parallel, FF and the proposed 
secure PUFs. Each box represents the results of 10 experiments on different PUF 
realizations. 
-a— PUF with a single row 
• e - PUF with eight rows 
20 30 40 50 
Challenge Bit Number (k) 
60 
Figure 5.11: Sensitivity of PUF transitional behavior to outlier switch delays. 
CHAPTER 6 
Applications 
There are many possible applications for the introduced secure PUF as pointed out 
in previous research [2, 3, 4, 5, 10]. In this chapter, we present applications that 
leverage reconfigurable platforms to provide system security. 
6.1 Configure-and-erase Method 
We introduce an FPGA-based authentication method for smart cards which limits 
the user's knowledge about the PUF circuit structure and its location on FPGA. In 
this method, the FPGA owner is identified by the unique manufacturer variability 
(MV) on his/her FPGA. The permanent placement of the PUF circuit, as in ASIC 
technology, would give the adversary unlimited access to the PUF inputs/output 
and thus make the PUF vulnerable to reverse engineering and emulation attacks. 
In the proposed method, the user is furnished with a blank FPGA and a Personal 
Identification Number (PIN). Before the FPGA is given to the user, the PUF is 
characterized using the methods described in Section 5.2.1. Thus, the switch delays 
and arbiter parameters are derived and stored. PUFs with different lengths and on 
various locations can be implemented and characterized. Therefore, each database 
entry could consist of multiple fields such as location attributes, circuit structure 
40 
parameters (length), switch delay values, and arbiter parameters (see Figure 6.1). 
Because of the linear number of components, the space required to store all this data 
is still smaller than saving many challnge/response pairs. 
Path Segment 
Delays (A) 
Arbiter Characteristic 
h(.), ST, HT,... 
Location Parameters 
LUT(X,Y) 
Configuration 
Bitstream 
• t • 
Figure 6.1: An example of a database entry. 
When the user presents the FPGA and his PIN to the authentication system the 
following steps are performed: (i) The system retrieves the database entry associated 
with the provided PIN. (ii) Then, the FPGA is configured according to the database 
entry field values. The PUF will be placed on the location specified in the database 
and the input/output networks will be added to it. The configuration bitstream could 
be stored in each database entry along with other parameters or instead it could be 
generated online according to the circuit parameters and location attributes stored 
in the database. The latter method can drastically reduce the storage requirements, 
although it introduces a latency in generating the bitstream. Note that the number of 
feasible locations for placing the PUF is merely dependent on the size of the FPGA. 
(iii) At the third step, the binary challenges are sent and tried on the configured 
PUF; the responses are retrieved. Meanwhile, the database derives responses from 
the stored PUF parameters for the given challenge. Probability of error for each 
response is also calculated. The derived responses with lower error probabilities are 
compared with the received responses for authentication, (iv) After authentication is 
performed, the FPGA content is erased and the FPGA is returned to the user. 
Blank FPGA R N #: xxx 
Config. FPGA 
LExch. CRPSj 
Erase \nwf 
Authentication 
Server 
PIN #: xxx 
Figure 6.2: Configure-and-erase scheme steps 
41 
The proposed system can be subject to the a number of attacks. First, by tap-
ping the communication link between the FPGA and the authentication server and 
reading the configuration bitstream, the adversary might be able to discover the PUF 
structure and its location on the FPGA and later attempt to configure the FPGA 
and model the PUF. There are two ways to eliminate such threats. One way is to 
physically secure communication link which is not feasible in all applications. The 
other way is to encrypt the bitstream and make the FPGA non-reconfigurable by an 
unauthorized party. A timed authentication method, which is explained next, can 
be also used to hinder the efforts to perform PUF modeling attacks. The use of 
one-time PUFs (where each CRP is tried only once) would protect the PUF against 
man-in-the-middle attacks. 
6.1.1 Timed Authentication 
Swift evaluation of the arbiter-based PUF is a unique feature that can be used to 
safeguard the authentication process. Unlike ring oscillator PUFs, the arbiter-based 
PUFs can produce a response to a given challenge in a single clock cycle. The clock 
frequency is however limited by the delay of PUF structure itself. Attempts to build 
a software counterfeit of the PUF by either emulating the responses or reverse engi-
neering can be encumbered by imposing a tight timing constraint on PUF evaluation. 
The method could be realized by time stamping the responses using either the em-
bedded system clock or the authenticating server clock. The former requires a tamper 
resistant clock since otherwise an adversary with high speed clocking resources at his 
disposal might be able to reduce the system clock frequency and calculate the PUF 
response more quickly. Using the authentication server clock removes concern about 
clock tampering, however it would be limited to applications where the PUF and the 
server can communicate through a high speed channel whose latency is not much 
higher than the PUF evaluation time. 
CHAPTER 7 
Implementation Results 
We present the measurement and characterization results obtained by implementing 
the test circuit described in Section 5.2.1 on Xilinx Virtex 5 FPGAs. Figure 7.1 
shows the details of the implemented circuit structure. The circuit benefits an ex-
ternal clocking source which sweeps the clock frequency continuously from 13 to 15 
MHz. The frequency is swept every 65 milliseconds. The clock generator's output 
is then connected to a PLL inside the FPGA which multiplies the input frequency 
by 7, shifting the frequencies up to the 91-105 MHz interval. The PUF under test 
is triggered by a toggle flip flop which alternately produces a falling and rising edge 
signal. A 9-bit counter driven by the system clock resets the error counter values and 
issues a read signal every 512 clock cycles. The errors are counted by 8 bit counters. 
The linear PUF structure was used for testing and characterization. The PUF 
consists of 8 switches with 6 delay elements in between. The delay elements are 
implemented by a series of 6 NOT gates. Each NOT gate is realized by a separate 
LUT. To balance the path delays, the PUF was manually placed and routed using the 
FPGA Editor in Xilinx ISE design tool. Figure 7.2 shows the PUF after placement 
on the FPGA floor plan. Eight different challenge values with corresponding decimal 
values of 0, 1, 3, 7, 15, 31, 63, 127 are tried at each sweep. With the delays measured 
43 
by applying the challenges, one can find each path segment delay (S) by solving a 
system of linear equations. The error counter values are read using a Tektronix LA714 
logic analyzer when the READ signal goes high. The circuit was implemented on five 
XC5VLX50 and three XC5VLX110 Xilinx Vertex 5 chips. 
Launch 
Flip Flop 
Sample 
Flip Flops] 
PUF-under-test 
PLL 
«m 
njinruuinnn 
Clock Gen. 
13-15 MHz 
SYNCH Challenge 
Generator 
Capture 
Flip Flops 
Hft>n 
* I |vl:X". I 
Error 
Counter 1 
Counter 
Error 
Counter 2 
LsSbiteL/o 
C1 
C2 
READ 
SYNCH 
Figure 7.1: The detailed structure of the delay characterization test circuit. The clock 
frequency is consciously swept using an external clock generator and the frequency 
range is shifted up by the internal FPGA PLL. 
Toggle-FF > .* , . ^ s . ,. .. ,. .,.;'. 
. - , .. . 
., x * . 
FF, 
:_,'•'':,...-'' ....' :,.:, ' ' _, ' ' — ' ' ,.;.'' Switches - Pejay :~ —• ^F 2 
, — , — . , . , » . , , , . - , . , . . . . . : — , . . , . : , , -,— , Elements 
Figure 7.2: The PUF circuit on the FPGA floor plan after manual placement. A 
screen shot from FPGA Editor tool in the Xilinx ISE software. 
Figure 7.3 shows the relationship between the clock period and the PUF output 
sampling failure rate. At clock periods above 11.45ns (i.e., Region 1), the PUF 
output is sampled and captured successfully, making Region 1 a fault-free region. As 
the clock period decreases, sampling errors begin to appear (Region 2). The failure 
rates reach a plateau of 0.5 (Region 3). In this region, the sample flip flop always 
fails to properly sample the rising edge transitions but it can capture the falling 
edge transitions successfully. This is due to the fact that the delays for positive 
and negative transitions through the PUF are different. Since half of the transitions 
44 
are positive and the other half are negative, the failure rate would be 50%. If the 
clock period is further decreased, errors would appear for the falling edge signals too 
(Region 4). Finally in Region 5, all of the sample values would be erroneous. The 
curves in Regions 2 and 4 are in fact the flip flop characteristics. The flip flop setup 
and hold times are indicated by markers in Figure 7.3. The 10% and 90% values for 
the ST and HT times are chosen respectively. The points where the failure rates are 
equal to 0.25 and 0.75 virtually correspond to the cases where the clock period is 
equal to the circuit delay for rising and falling transitions respectively. Also note the 
fineness of delay resolutions at which the measurements are carried out. 
t 
2 0.8 
11.45 11.4 
I 
Region 
1 
Region 
2 
i ST HT 
• 
•T:. 
• £ ' 1 
I ST HT ! 
( <*-+•<—> 
Region 
3 
\ 
••••'{• 
p 
! : j r ' 
, :, 
* s 
Region 
4 
, 
i 
i 
s 
! 
• 
Region 
G 
, 
11.35 11.3 
Delay (ns) 
Figure 7.3: Measured arbiter characteristic. 
We begin with characterizing the arbiter. To efficiently represent the arbiter, 
a parametric model could be fit to the arbiter characteristic and the pertinent pa-
rameters are estimated and stored in the database. The arbiter's non-deterministic 
behavior in presence of small input delay differences is influenced by circuit noise 
and many other surrounding effects. Due to central limit theorem we argue that the 
arbiter output can be represented by Gaussian cumulative function (CDF). We fit 
the Gaussian CDF to the measured arbiter characteristic in least square sense and 
estimate its mean and standard deviation. Figure 7.4 shows a section of the measured 
arbiter characteristic along with the Gaussian fit. The standard deviation (a) of the 
45 
fit determines the speed of the arbiter and the mean value corresponds to the PUF 
delay. Note that the a (arbiter speed) can be different for rising edge and falling edge 
signals. Therefore, the arbiter can be effectively represented by two parameters, i.e., 
arise and 0/o/j. Setup time and hold times are functions of a. 
a 
1 
0.95 
0.9 
0.85 
2 
ui 0.8 
"S 
£ 0.75 
~s 
£ 0.7 
S 
°" 0.65 
0.6 
0.55 
0.5 
9.39 9.4 9.41 9.42 9.43 9.44 9.45 9.46 
Time (ns) 
Figure 7.4: Measured arbiter characteristic in region 4 and the corresponding Gaus-
sian fit. 
We used the characterization circuit to measure the top and bottom path delays 
of the PUF. The delays are measured for eight different challenge values. Figures 7.5 
(a) and (b) respectively show the path delays for falling edge and rising edge signals 
propagating through the PUFs implemented on XC5VLX50 chips. Figures 7.5 (c) 
and (d) show the same data but for XC5VLX110 chips. Each plot in these figures 
correspond to one chip. There are two sets of data on each plot distinguished by 
circle and dot markers. The circle marker represents the top path delay while the dot 
marker refers to the bottom path delay. The top and bottom path delays are shown 
for the given challenge values on the x-axis. As the measurements suggest, the path 
delays are in some case correlated among the chips (e.g., see how the delays on the 
second and third plots in Figure 7.5 (d) follow the same trend). 
We next estimated the arise and crfau of the flip flops from the measurement data. 
With two sample flip flops per circuit and a total of 5 circuits on XC5VLX50 chips 
46 
10.66L 
0 
10.3 
,10.26 
! I 
; 10.27, 
.. - - « 
! 10.26 
110.25 
& ' 
10.38 
10.37' 
10.36 
'«-
10.35 
10.34.t 
- . - • - & • • 
, * . r . ^ - * - ^ 
127 
t....— 
0~ J 
- - © • -
127 
. - ^ • : 
. . .^ ^ - J . * ' ' - , - " - , * ' i - c . . 
127 
"• -o--' - -&:;• . :* ---•; 
. , - . - « • • 
.--.• 
3 7 15 31 63 
Challenge Value (in Decimal) 
io.( 
10.75 
10.7 
AT 
10.35f 
10.3 
T^Tr 
10.3 
1 
1- &__ 
3 
— e — 
7 
— 4 - -
15 
._ ..*-_. 
31 
- - A -
63 I! 
a. 
' ' 
^rrr: J*-" 
7 15 31 63 
Challenge Value (hi Decimal) 
^ 
^:v-' 
„ * . - • -.-$.-r. --.9.-.. ~.-.-$r.~~. ,k.,:,,-rX 
(a) (b) 
9.38 
9.375* 
! . 
, &.:-,. 
' -^ y- — -a-.. .^  
63 127 
-e> * - - a i> 
s
'@ 
3 7 15 31 63 
Challenge Value ( In Decimal) 
(c) 
sasr 
f 9.4 
••-?-..-.-..•• 
*.-.-.-^'' 
. - A 
• - • O - - - - ' : O T -
15 31 63 127 
15 31 63 127 
. « - • - • - A 
..<?• 
0 1 3 7 15 31 63 127 
Challenge Value ( In Decimal) 
(d) 
Figure 7.5: The top and bottom path delays of PUFs on XC5VLX50 chips for (a) 
rising edge transition, and (b) falling edge transition; The top and bottom path 
delays of PUFs on XC5VLX110 chips for (c) rising edge transition, and (d) falling 
edge transition. 
47 
and 3 circuits on XC5VLX110 chips, there were 16 flip flops to characterize. The 
measurements were repeated 8 times for each flip flop. The box plots in Figures 
7.6 (a) and (b) show the estimated ajau and arise respectively for the flip flops on 
XC5VLX50. Figures 7.6 (c) and (d) shows the same result for the flip flops on three 
XC5VLX110 chips. The adjacent flip flops in the figures (i.e., FF1 and FF2) come 
from the same circuit and are physically close to each other in the circuit. FF1 
and FF2 reflect similar speed characteristics. The flip flops on the third XC5VLX50 
chip is the fastest among the other XC5VLX50 chips. Also the flip flop on the first 
XC5VLX110 chip is the fastest. The narrowness of the boxes confirms the accuracy 
of the measurements and parameter estimations. 
The measurements can also help locate faulty or unstable flip flops. For example, 
the test results on one of the the flip flops as shown in Figure 7.7 demonstrates a 
noisy glitch marked by the black circle. The glitch keeps repeating at every round 
of measurement. Thus, when implementing PUFs, the designer should avoid noisy 
arbiters and those with large setup and hold times. Even if the noisy flip flop (like 
the one shown in Figure 7.7) has to used as an arbiter, then those challenges that 
cause a delay difference coinciding with the glitch must be avoided. 
The amount of delay variability in future FPGA technology follows an upward 
trend. Sedcole et al. in [50] predict that with a slightly rapid increase in stochastic 
variability (2% every three years), stochastic variation will amount to 11.5% in 22nm 
technology node while systematic variations keeps decreasing. Table 7.1 shows the 
amount of variability in the current and pending FPGA technologies [50]. 
Table 7.1: Delay variability in current and future FPGA technologies. 
Production year 
Node 
Stochastic var. (3a) 
2004 
90nm 
3.3% 
2007 
65nm 
5.5% 
2010 
45nm 
7.5% 
2016 
22nm 
11.5% 
We estimated the amount of observed variability on 65nm Virtex 5 FPGA family 
48 
20 
19 
18 
17 
| 16 
" l 1 5 
e 
14 
13 
12 
11 
10 
a ft 
X ^ 
Chip 2 
^ 3 E ^ 
Chipl 
: 
Chip 3 
* 
Chip 5 
Chip 4 
• 
F 1 FF2 FF1 FF2 FF1 FF2 FF1 FF2 FF1 FF2 
Flip Hop 
22 
20 
18 
i 
| 16 
14 
12 
10 
§ g 
+ 
Chip 2 
• Chipl 
• 
• 
Chip 3 
$ 4. 
Chip 4 
Chips 
"• 
FF1 FF2 FF1 FF2 FF1 FF2 FF1 FF2 FF1 FF2 
Flip Flop 
(a) (b) 
12 
11.5 
11 
- , 10.5 I 
| 10 
9.5 
9 
8.5 
8 
. 
Chipl 
i e 
Chip 2 
+ + 
. 
a i 
Chips 
FF1 FF1 FF2 
Flip Flop 
FF1 FF2 
(c) (d) 
Figure 7.6: The sample flip flop speed a of PUF circuits on XC5VLX50 chips for 
(a) rising edge transition (crrjSe), and (b) falling edge transition (afaii); The sample 
flip flop speed a of PUF circuits on XC5VLX110 chips for (c) rising edge transition 
(arise), and (d) falling edge transition (afaii)-
49 
• 
f 
' 
f1 
Figure 7.7: Faulty flip flop behavior. 
from the measurement data. Assuming the additive delay model for the PUF, the 
total path delay (Z) can be written as sum of independent and identically distributed 
random variables (Xi), i.e., Z = X\ + X2 + ... + Xn where n is the total number of 
delay elements in the PUF structure, Xi ~ N([i, a) and Z ~ N(fiz, az). Then 
ez/»z = (l/y/K)x(a/»). (7.1) 
The mean and variance of Z were estimated as fiz = 10.41ns and az = 0.068ns 
using the delay measurements obtained from XC5VLX50 chips. Since the PUF uses 
8 switches and 6 delay elements in between then the total length of the PUF would 
be n = 56. Therefore from Equation 7.1, the variability is derived as (3 x cr)/// ~ 5%. 
The amount of variability conforms with the estimated values shown on Table 7.1. 
The measurement results show that even though variability exists in the structure, 
the amount of variability in presence of the arbiter imperfections and measurement 
biases is not enough to make a robust and yet random response on all of our test 
chips. In what follows, we present a number of interesting observations from our 
implementation experiments. 
Figure 7.8 shows the distribution of the delay differences between the top and 
bottom paths for XC5VLX50 test chips. Figures 7.8 (a) and (b) show the distribution 
for rising and falling edge transition delays. Each distribution contains 40 points; 5 
chips and 8 challenges per chip. The flip flop 3a setup/hold-times are depicted on 
the same figure by vertical lines for the fastest and slowest flip flops on the chips. 
50 
It can be seen that in Figure 7.8 (a), on our test chips, the delay differences are 
smaller than 3cr setup/hold times of the arbiter, resulting in metastable responses. In 
the falling edge transition case shown in Figure 7.8 (b), 87.5% and 70% of the points 
fall inside the metastable region of the slow and fast flip flops respectively. Figures 
7.8 (c) and (d) show the same results for XC5VLX110 chips. For the falling and 
rising edge transition cases (Figure 7.8 (d) and (c)) 87.5% and 100% of the points 
fall inside the metastable region of both slow and fast flip flops. As it can be seen, 
because of the lack of calibration, the distributions are skewed toward positive and 
negative values. 
Also, it is important to note that a delay difference between the top and bottom 
paths after the last switch and before the arbiter may cause a bias in the responses 
toward zero or one since this is the only path that does not switch. As shown on Figure 
7.2, the sample flip flops (FF1 and FF2) are symmetrically placed on the top and 
bottom paths to minimize the delay difference in the measurements. To implement 
the PUF, the two sample FFs must be replaced by a single FF whose inputs could 
potentially follow asymmetric routes introducing bias in the PUF responses. As the 
measurement results suggest, any delay difference on the path segment between the 
last switch and the arbiter in the order of 100 pico-seconds can be deadly and force 
the responses completely into zero or one by moving the mean of the distribution. 
Thus, careful calibration and compensation of this bias is crucial for obtaining robust 
results. This could be achieved by insertion of extra delay elements or trying different 
routes on either of the top or bottom path 
We have also tried changing the placement strategies and the PUF lengths (64 
switches, 128 switches) on the FPGAs. The small delay variability can partially be 
compensated by including more delay elements. As Equation 7.1 implies oz increases 
as the square root of the number of delay elements (n) assuming that Hz increases 
linearly n. Assuming the amount of variability on 22nm FPGA with same nominal 
51 
delays, the total delay variation of the PUF (n=56) would be 3az = 160 ps which is 
more than twice the measured 3cr of the slowest arbiter. 
Successful and systematic implementation of PUFs requires a larger variability 
(that is technologically inevitable with the current trends), details of the variability 
^information about the FPGA fabric and switches, as well as development of tools for 
automatic timing-aware PUF placement and routing. 
52 
Max 3o (Slowest FF) 
. Min 3o {Fastest FF) ^ 
•^ —— pi 
- 4 0 - 3 0 - 2 0 - 1 0 0 10 20 30 40 
Top and Bottom Path Delay Difference (ps) 
(a) 
MaxaafSfcmnstFF) 
*!% 
Min 3a (Fastest FF)i 
• ^ — — & 
^Ksmavmnil 
-140 -120 -100 - 8 0 - 6 0 - 4 0 - 2 0 0 
Top and Bottom Path Delay Difference (ps) 
(b) 
Max 3d (Slowest FB 
Min 3a (Fastest FF) ^ 
I, i 
i 
-15 - 1 0 - 5 0 5 10 15 
Top and Bottom Path Delay Difference (ps) 
- 6 0 - 5 0 - 4 0 - 3 0 - 2 0 - 1 0 0 10 20 
Top and Bottom Path Delay Difference (ps) 
(c) (d) 
Figure 7.8: The distribution of delay difference between the top and bottom paths. 
The distribution is calculated for 8 input challenges, 5 XC5VLX50 chips, and 3 
XC5VLX110 chips. Figures (a) and (b) show the distribution for rising and falling 
transition delays on XC5VLX50 chips. Similarly, Figures (c) and (d) show the dis-
tribution for rising and falling transition delays on XC5VLX110 chips. The vertical 
lines show the 3<r detection edges of slow and fast arbiters (FF). 
CHAPTER 8 
Conclusion 
In this work, we developed new techniques for the design and implementation of PUFs. 
The PUF vulnerabilities to various types of potential attacks were discussed. We 
demonstrated how reconfigurability can be exploited to ensure that PUFs are resilient 
against the potential attacks and are robust to unpredictable operational conditions. 
A PUF testing and characterization mechanism enabled by reconfigurability feature 
provides tools for diagnosis, CRP compression, and determining the level of confidence 
in responses. A unique input/output logic network along with an interconnecting 
approach is introduced to encumber attempts at reverse engineering or modeling the 
PUF. The proposed building blocks are added to the PUF after the characterization 
step. We have shown applications where FPGA-based PUFs can be used for security 
and privacy protection. The effectiveness of all the proposed claims were validated 
using extensive implementations, simulations, and statistical analysis. 
References 
[1] R. Pappu, B. Recht, J. Taylor, and N. Gershenfeld, "Physical one-way functions," 
Science, vol. 297, pp. 2026-2030, 2002. 1, .7 
[2] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, "Silicon physical random 
functions," in Conference on Computer and communications security (CCS), 
2002, pp. 148-160. 1, 2, 5, 7, 19, 39 
[3] , "Controlled physical random functions," in Computer Security Applica-
tions Conference (ACSAC), 2002, pp. 149-160. 1, 2, 19, 20, 24, 39 
[4] S. Trimberger, "Trusted design in FPGAs," in Design Automation Conference 
(DAC),I 2007, pp. 5-8. 1, 2, 8, 39 
[5] F. Koushanfar and M. Potkonjak, "CAD-based security, cryptography, and dig-
ital rights management," in Design Automation Conference (DAC), 2007. 1, 2, 
7,39 
[6] L. Bolotnyy and G. Robins, "Physically unclonable function-based security and 
privacy in RFID systems," in International Conference on Pervasive Computing 
and Communications, 2007, pp. 211-220. 1 
[7] R. Anderson, Security Engineering: A guide to building dependable distributed 
systems. John Wiley and Sons, 2001. 1 
[8] B. Hoeneisen and C. A. Mead, "Fundamental limitations in microelectronics I-
MOS technology," Solid-State Electronics, vol. 15, no. 7, pp. 819-829, 1972. 1 
[9] G. Suh and S. Devadas, "Physical unclonable functions for device authentication 
and secret key generation," in Design Automation Conference (DAC), 2007, pp. 
9-14. 1, 7 
[10] Y. Alkabani, F. Koushanfar, and M. Potkonjak, "Remote activation of ICs for 
piracy prevention and digital right management," in International Conference 
on Computer Aided Design (TCCADj, 2007, pp. 674-677. 2, 7, 39 
55 
[11] J. Lee, L. Daihyun, B. Gassend, G. Suh, M. van Dijk, and S. Devadas, "A 
technique to build a secret key in integrated circuits for identification and au-
thentication applications," in Symposium of VLSI Circuits, 2004, pp. 176-179. 
4, 7, 12, 19, 20 
[12] M. Majzoobi, F. Koushanfar, and M. Potkonjak, "Testing techniques for hard-
ware security," in International Test Conference (ITC), 2008, pp. 1-10. 5, 7, 19, 
26, 32 
[13] B. Hoeneisen and C. A. Mead, "Fundamental limitations in microelectronics I-
MOS technology," Solid-State Electronics, vol. 15, no. 7, pp. 819-829, 1972. 6 
[14] R. Keyes, "Physical limits in digital electronics," Proceedings of the IEEE, vol. 63, 
no. 5, pp. 740-767, 1975. 6 
[15] C. A. Mead, "Scaling of MOS technology to submicrometer feature sizes," Ana-
log Integrated Circuits and Signal Processing, vol. 6, no. 1, pp. 9-25, 1994. 6 
[16] A. Asenov, "Random dopant induced threshold voltage lowering and fluctuations 
in sub-0.1 m MOSFET's: A 3-d atomistic simulation study," IEEE Transactions 
on Electron Devices, vol. 45, no. 12, pp. 2505-2513, 1998. 6 
[17] K. Lofstrom, W. Daasch, and D. Taylor, "IC identification circuits using device 
mismatch," in International Solid-State Circuits Conference (ISSCC), 2000, pp. 
372-373. 6 
[18] S. Maeda, H. Kuriyama, T. Ipposhi, S. Maegawa, Y. Inoue, M. Inuishi, N. Kotani, 
and T. Nishimura, "An artificial fingerprint device (AFD): a study of identifi-
cation number applications utilizing characteristics variation of polycrystalline 
silicon TFTs," IEEE Transactions Electron Devices, vol. 50, no. 6, pp. 1451-
1458, 2003. 6 
[19] Y. Su, J. Holleman, and B. Otis., "A 1.6J/bit stable chip ID generating cir-
cuit using process variations," in International Solid State Circuits Conference 
(ISSCC), 2007, pp. 606-611. 6 
[20] B. Sunar, W. J. Martin, and D. R. Stinson, "A provably secure true random 
number generator with built-in tolerance to active attacks," IEEE Transaction 
on Computers, vol. 58, no. 1, pp. 109-119, 2007. 6 
[21] Y. M. Alkabani and M. P. F. Koushanfar N. Kiyavash, "Trusted integrated cir-
cuits: A nondestructive hidden characteristics extraction approach," in Informa-
tion Hiding, 2008. 7 
[22] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, "Delay-based circuit au-
thentication and applications," in Symposium on Applied computing, 2003, pp. 
294-301. 7 
56 
[23] B. Gassend, D. Lim, D. Clarke, M. van Dijk, and S. Devadas, "Identification and 
authentication of integrated circuits," Concurrency and Computation: Practice 
and Experience. John Wiley & Sons, vol. 16, no. 11, pp. 1077-1098, 2004. 7 
[24] E. Ozturk, G. Hammouri, and B. Sunar, "Physical unclonable function with 
tristate buffers," in IEEE International Symposium on Circuits and Systems, 
2008, pp. 3194-3197. 7 
[25] L. Jie and J. Lach, "At-speed delay characterization for IC authentication and 
trojan horse detection," in International Workshop on Hardware-Oriented Secu-
rity and Trust(HOST), 2008, pp. 8-14. 7 
[26] P. Tuyls, G. Schrijen, B. koric, J. V. Geloven, N. Verhaegh, and R. Wolters, 
"Read-proof hardware from protective coatings," in Cryptographic Hardware and 
Embedded Systems Workshop, 2006, pp. 369-383. 7 
[27] Y. M. Alkabani and F. Koushanfar, "Active hardware metering for intellectual 
property protection and security," in USENIX Security Symposium, 2007, pp. 
1-16. 7 
[28] F. Koushanfar, G. Qu, and M. Potkonjak, "Intellectual property metering," in 
Information Hiding Workshop, 2001, pp. 81-95. 7 
[29] F. Koushanfar and G. Qu, "Hardware metering," in Design Automation Confer-
ence (DAC), 2001, pp. 490-493. 7 
[30] I. Kuon, R. Tessier, and J. Rose, FPGA Architecture. Now Publishers, 2008. 8 
[31] S. Drimer, "Authentication of FPGA bitstreams: Why and how," Reconfigurable 
Computing: Architectures, Tools and Applications Lecture Notes in Computer 
Science, vol. 4419, pp. 73-84, 2007. 8 
[32] J. Fry and M. Langhammer, "RSA and public key cryptography in FPGAs," 
in Tech. rep., Altera Corporation, 2005. 8 
[33] J. Brizek, M. Khan, J. P. Seifert, and D. Wheeler, "A platform-level trust-
architecture for hand-held devices," in Workshop on Scalable trusted computing, 
2005, pp. 19-20. 8 
[34] T. Eisenbarth, T. Gneysu, C. Paar, A. Sadeghi, D. Schellekens, and M. Wolf, 
"Reconfigurable trusted computing in hardware," in Workshop on Scalable 
trusted computing, 2007, pp. 15-20. 8 
[35] B. Glas, A. Klimm, O. Sander, K. Muller-Glaser, and J. Becker, "A system 
architecture for reconfigurable trusted platforms," in Conference on Design, Au-
tomation and Test in Europe (DATE), 2008, pp. 541-544. 8 
[36] J. Guajardo, S. Kumar, G. Schrijen, and P. Tuyls, "FPGA intrinsic PUFs 
and their use for IP protection," in Workshop on Cryptographic Hardware and 
Embedded Systems (CHES), 2007, pp. 63-80. 8 
57 
[37] S. Kumar, J. Guajardo, R. Maes, G.-J. Schrijen, and P. Tuyls, "The butterfly 
PUF protecting IP on every FPGA," in International Workshop on Hardware-
Oriented Security and Trust (HOST), 2008, pp. 67-70. 8 
[38] T. Wollinger, J. Guajardo, and C. Paar, "Security on FPGAs: State-of-the-
art implementations and attacks," ACM Transactions on Embedded Computing 
i Systems (TECS) - special issue on Embedded Systems and Security, vol. 3, no. 3, 
2004. 8 
[39] T. Guneysu, B. Moller, and C. Paar, "Dynamic intellectual property protection 
for reconfigurable devices," International Conference on Field-Programmable 
Technology (ICFPT), pp. 169-176, 2007. 8 
[40] K. Bernstein, D. Frank, A. Gattiker, W. Haensch, B. Ji, S. Nassif, E. Nowak, 
D. Pearson, and N. Rohrer, "High-performance CMOS variability in the 65-nm 
regime and beyond," IBM Journal of Research and Development, vol. 50, no. 
4/5, pp. 433-450, 2006. 8 
[41] P. Friedberg, Y. Cao, J. Gain, R. Wang, J. Rabaey, and C. Spanos, "Modeling 
within-die spatial correlation effects for process-design co-optimization," Interna-
tional Symposium on Quality of Electronic Design (ISQED), pp. 516-521, 2005. 
8 
[42] Y. Alkabani, T. Massey, F. Koushanfar, and M. Potkonjak, "Input vector control 
for post-silicon leakage current minimization in the presence of manufacturing 
variability," in Design Automation Conference fDAC,), 2008, pp. 606-609. 8 
[43] D. Shamsi, P. Boufounos, and F. Koushanfar, "Noninvasive leakage power to-
mography of integrated by compressive sensing," in International Symposium on 
Low Power Electronics and Design (ISLPED), 2008, pp. 341-346. 8 
[44] M. Majzoobi, F. Koushanfar, and M. Potkonjak, "Secure lightweight PUFs," in 
ICCAD, 2008, pp. 670-673. 8 
[45] C. P. Pfleeger and S. L. Pfleeger, Security in Computing. Prentice Hall, 1997. 
17 
[46] M. Feldhofer and C. Rechberger, "A case against currently used hash functions 
in RFID protocols," in Workshop on RFID Security, 2006, pp. 372-381. 21 
[47] J. S. J. Wong, P. Sedcole, and P. Y. K. Cheung, "Self-characterization of com-
binatorial circuit delays in FPGAs," in International Conference on Field-
Programmable Technology, 2007, pp. 17-23. 21 
[48] T. Corman, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. 
MIT Press, 2001. 23 
58 
[49] E. Ozturk, G. Hammouri, and B. Sunar, "Towards robust low cost authentication 
for pervasive devices," in International Conference on Pervasive Computing and 
Communications, 2008, pp. 170-178. 26 
[50] P. Sedcole and P. Y. K. Cheung, "Within-die delay variability in 90nm FPGAs 
and beyond," in International Conference on Field-Programmable Technology 
(FPT), 2006, pp. 97-104. 35, 47 
