



FPGA Based Arbiter Physical 
Unclonable Function Implementation 
with Reduced Hardware Overhead 
 
 
Alexander A. Ivaniuk and Siarhei S. Zalivaka
 
Belarusian State University of Informatics and Radioelectronics





Abstract. Existing implementations of the arbiter physical unclonable function (PUF) are based on the synthesis of configurable symmetric paths, each link of which is a pair of two-input multiplexers providing two configurations of test signal propagation: straight and exchange. Building a single link on an FPGA requires two built-in LUT blocks to implement the two multiplexers, and the hardware resources of these LUT blocks are not fully utilized. The paper presents a new architecture of symmetric paths of the arbiter PUF that makes efficient use of the hardware resources of LUT blocks in FPGAs of the Xilinx Artix-7 family.
 
Keywords: Physical Unclonable Function · Arbiter · FPGA · LUT · Symmetrical path
 
 
1 Introduction

Protection of digital devices against unauthorized use, copying and modification can be achieved by various methods, algorithms and technical means (e.g. encryption, watermarking and fingerprinting, active and passive metering, formal verification, etc.). Among these methods, a relatively new scientific area called physical cryptography stands out. This direction is mainly based on so-called Physical Unclonable Functions (PUFs) [1]. The main idea of a PUF is to extract unique physical characteristics from a fabricated digital device. Manufacturing process variations of integrated circuits introduce many random, unpredictable changes into their structure. Therefore, each manufactured instance of an integrated circuit becomes unique and irreproducible, with unpredictable physical characteristics. These unique changes can be extracted by dedicated digital circuits that produce unique digital responses when specific digital challenges are applied to the device. In general, the circuit implementation of a PUF can be represented as a block with $n$ digital inputs that receives an $n$-bit challenge $C$ from all $2^n$ possible combinations and produces a single-bit output value $R$ (Response). The behaviour of such a circuit can be represented as a Boolean function $\{0,1\}^n \to \{0,1\}$. The randomness of this function follows from the fact that the exact mapping of the set of challenges to the set of responses is unknown until the device is fabricated; this mapping also depends on uncontrollable variations at all manufacturing stages. Thus, the set of all possible challenge-response pairs of a PUF, $CR_\alpha = \{c_0 r_0, c_1 r_1, \ldots, c_{2^n-1} r_{2^n-1}\}$, determines the uniqueness of an instance $\alpha$ of the digital device, and $r_i = PUF_\alpha(c_i)$, $i \in \{0, 1, \ldots, 2^n-1\}$, defines the unique dependency between a response $r_i$ and a challenge $c_i$.

© Springer Nature Switzerland AG 2019
S. V. Ablameyko et al. (Eds.): PRIP 2019, CCIS 1055, pp. 216–227, 2019.
https://doi.org/10.1007/978-3-030-35430-5_18
A PUF should meet the following criteria to be used in physical cryptography [1, 2]:
 
1. The hardware overhead of the PUF implementation should not exceed the hardware cost of the protected device.
2. Collection, storage and analysis of the set $CR_\alpha$ should be physically infeasible with modern equipment within a reasonable time. A PUF that meets this criterion for $n \geq 64$ is called a strong PUF. For $n = 64$, storing the whole set $R_\alpha$ requires $2^{64}$ bits (16 exbibits, i.e. 2 EiB) of memory. At the same time, collecting all possible values of $R_\alpha$ takes approximately 580 years if the response time of the device is 1 ns.
3. Knowing the challenge-response pair $c_i r_i$ of a particular device instance $\alpha$, it is impossible to calculate, simulate or build a mathematical model that predicts another pair $c_j r_j$, $i \neq j$, or any other subset of such pairs. A PUF that satisfies this condition can be considered random and unpredictable.
4. For a particular device $\alpha$, the set of responses $R^*_\alpha$, $|R^*_\alpha| < |R_\alpha|$, can be extracted many times with a high degree of reliability by applying the corresponding set of challenges $C^*_\alpha$, $|C^*_\alpha| < |C_\alpha|$, to the inputs of the PUF. A PUF with this property can be considered stable (reliable).
5. The set $D = \{\alpha_0, \alpha_1, \ldots, \alpha_{m-1}\}$, $|D| = m$, of different instances of a digital device with an embedded PUF circuit, fabricated using the same technology, should satisfy the pairwise condition $CR_{\alpha_0} \neq CR_{\alpha_1} \neq \ldots \neq CR_{\alpha_{m-1}}$. This condition can be strengthened by the additional condition $R^*_{\alpha_0} \neq R^*_{\alpha_1} \neq \ldots \neq R^*_{\alpha_{m-1}}$ for a corresponding set of challenges $C^*_{\alpha_0} = C^*_{\alpha_1} = \ldots = C^*_{\alpha_{m-1}}$, $|C^*_{\alpha_0}| = |C^*_{\alpha_1}| = \ldots = |C^*_{\alpha_{m-1}}| = \lceil \log_2 m \rceil$. A PUF with this property can be considered unique.
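The figures quoted in criterion 2 follow from simple arithmetic. The sketch below is illustrative only and assumes that each response occupies one bit, that one response is obtained every 1 ns, binary prefixes (1 Ei = $2^{60}$) and a 365.25-day year.

```python
# Back-of-the-envelope check of criterion 2 for a strong PUF with n = 64.
# Assumptions: one bit per stored response, one response per 1 ns,
# binary prefixes (1 Ei = 2**60), 365.25-day years.

N = 64                                   # challenge width in bits
pairs = 2 ** N                           # number of challenge-response pairs

storage_bits = pairs                     # one bit per response
storage_eibit = storage_bits / 2 ** 60   # exbibits
storage_eib = storage_bits / 8 / 2 ** 60 # exbibytes

seconds = pairs * 1e-9                   # 1 ns per response
years = seconds / (365.25 * 24 * 3600)

print(f"responses: {pairs}")
print(f"storage:   {storage_eibit:.0f} Eibit = {storage_eib:.0f} EiB")
print(f"time:      {years:.0f} years")
```

The collection time comes out at roughly 585 years, consistent with the "approximately 580 years" estimate.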
 
A PUF design that meets the criteria described above can be efficiently used as a cryptographic primitive for the following tasks:

–  unclonable identification of digital devices;
–  reliable authentication of digital devices;
–  generation of irreproducible random number sequences;
–  hardware implementation of hash functions;
–  hardware implementation of watermarks and fingerprints;
–  protection of digital devices against illegal cloning and modification.
 
There are many circuit implementations of PUFs for digital devices [1, 2]: arbiter PUF, ring oscillator PUF, butterfly PUF, memory-based PUFs, etc. Several of these designs are based on measuring delay differences between signals propagated through symmetrical paths formed by a set of sequentially connected digital elements.
One of the most widely used delay-based PUF implementations for digital devices is the Arbiter PUF (A-PUF) [3–6]. In contrast to the other PUFs mentioned, this type can be classified as a strong PUF and involves acceptable hardware overhead. However, practical implementations of this PUF have some disadvantages, which are being investigated by many developers and researchers in the physical cryptography area. The main issues of the A-PUF design are the low reliability of its set of challenge-response pairs and its vulnerability to accurate modeling using a known subset of pairs $CR^*_\alpha$.




2 Classical Arbiter PUF Design

The classical Arbiter PUF (A-PUF) implementation usually contains three blocks, namely the Test Pulse Generator (TPG), the Symmetrical Paths Block (SPB) and the Arbiter Block (AB).

Fig. 1. Classical Arbiter PUF block level design.
 
 
The SPB usually embeds $n$ path switching blocks ($SW_j$) controlled by external signals $c^i_j \in \{0, 1\}$, $j \in \{0, 1, 2, \ldots, n-1\}$, which correspond to the bits of the input challenge $c_i$. Each switch $SW_j$ has two inputs $a_j$, $b_j$ and two outputs $x_j$, $y_j$. If $c^i_j = 0$, the paths are configured in straight mode, i.e. the signal goes from input $a_j$ to output $x_j$ and from $b_j$ to $y_j$. Otherwise ($c^i_j = 1$), switch $SW_j$ is in exchange configuration, so input $a_j$ is connected to $y_j$ and $b_j$ to $x_j$. Thus, $n$ switching blocks provide $2^n$ possible path configurations for the propagation of the two test pulses generated by the TPG block. The arbiter block determines which of the two signals arrives earlier: if the signal from output $x_{n-1}$ arrives earlier than the one from output $y_{n-1}$, AB produces the response value $r_i = 1$; otherwise, it outputs $r_i = 0$.
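The switching and arbitration rules above can be captured in a short behavioural model. The Python sketch below is an illustration, not the authors' Verilog testbench. It assumes a simple additive delay model in which each switch contributes one fixed delay per input-to-output connection, ignores routing between switches and noise, and defines $\Delta = t_x - t_y$ so that $r_i = 1$ when $x_{n-1}$ arrives first. The helper `random_device` and its Gaussian parameters are hypothetical stand-ins for process variation.

```python
import random

def simulate(challenge, stages):
    """Behavioural model of a classical A-PUF.

    challenge: bits c_0 .. c_{n-1} (0 = straight, 1 = exchange)
    stages: one dict per switch with delays (ns) for the connections
            a->x ("xa"), b->x ("xb"), a->y ("ya"), b->y ("yb")
    Returns (delta, r) with delta = t_x - t_y at the arbiter inputs.
    """
    t_a = t_b = 0.0                      # both test pulses leave the TPG together
    for c, d in zip(challenge, stages):
        if c == 0:                       # straight: a_j -> x_j, b_j -> y_j
            t_x, t_y = t_a + d["xa"], t_b + d["yb"]
        else:                            # exchange: b_j -> x_j, a_j -> y_j
            t_x, t_y = t_b + d["xb"], t_a + d["ya"]
        t_a, t_b = t_x, t_y              # x_j, y_j drive a_{j+1}, b_{j+1}
    delta = t_a - t_b
    return delta, 1 if delta < 0 else 0  # r = 1 when x_{n-1} arrives first

def random_device(n, seed, mean=0.5, sigma=0.05):
    """Hypothetical device: Gaussian per-connection delays mimic process variation."""
    rng = random.Random(seed)
    return [{k: rng.gauss(mean, sigma) for k in ("xa", "xb", "ya", "yb")}
            for _ in range(n)]

def bits(value, n):
    """Challenge bits c_0 .. c_{n-1}, least significant bit first."""
    return [(value >> j) & 1 for j in range(n)]

n = 8
device = random_device(n, seed=1)
responses = [simulate(bits(c, n), device)[1] for c in range(2 ** n)]
print(sum(responses), "ones out of", len(responses))
```

Sweeping all $2^n$ challenges, as in the last lines, mirrors how a testbench builds the full response table of one instance.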
The AB of a classical A-PUF design is usually triggered by the rising edge of a test pulse produced by the TPG. Therefore, a synchronous D flip-flop (DFF) is used for response generation: the signal from $x_{n-1}$ is propagated to the clock input of the DFF and the signal from $y_{n-1}$ to its data input. The main problem of this circuit is the metastable state of the DFF, which degrades the overall stability of the A-PUF design. This issue can be resolved by implementing AB as latches (e.g. an RS latch) or as multiple flip-flops [4].
Most of the hardware overhead of an A-PUF design with $n \geq 8$ switches is located in the SPB. Consider a detailed implementation of the SW components. The basic approach to implementing one switching block uses the two-multiplexer circuit shown in Fig. 2a.

Fig. 2. Circuit implementation of SW block: (a) RTL level diagram; (b) LUT-block level diagram.
 
 
The circuit in Fig. 2a can be synthesized as two LUT6 blocks, as shown in Fig. 2b; each LUT6 corresponds to a combinational 64-to-1 multiplexer with six select inputs and a configurable memory. Input signals $a_j$ and $b_j$ are fed into input ports I0 and I1 of the upper LUT6 (A) block and into ports I1 and I0 of the bottom LUT6 (B) block, respectively. The challenge signal $c^i_j$ is passed to the I2 input ports of both LUT6 blocks. The other input ports (I3, I4, I5) are not utilized if the standard synthesis configuration is used. The output signals $x_j$ and $y_j$ correspond to the output port O of the respective LUT6 block.
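The configuration memory content for this mapping can be derived by tabulating the multiplexer function over the LUT address. The Python sketch below assumes the usual Xilinx convention that INIT bit $k$ holds the output for address $k = 32\,I5 + 16\,I4 + 8\,I3 + 4\,I2 + 2\,I1 + I0$, and treats the unused inputs I3 to I5 as don't-care by replicating the pattern. It is an illustration, not the INIT value produced by the authors' synthesis run.

```python
def lut6_init(func):
    """Tabulate func(I0, ..., I5) into a 64-bit INIT value.
    Bit k holds the output for address k = 32*I5 + 16*I4 + 8*I3 + 4*I2 + 2*I1 + I0."""
    init = 0
    for k in range(64):
        inputs = [(k >> i) & 1 for i in range(6)]   # I0 .. I5
        if func(*inputs):
            init |= 1 << k
    return init

def mux(i0, i1, i2, i3, i4, i5):
    """O = I2 ? I1 : I0; I3..I5 are unused (don't-care)."""
    return i1 if i2 else i0

# Both LUTs of SW_j can use the same function; only the wiring differs:
#   LUT A: I0 = a_j, I1 = b_j, I2 = c_j  ->  x_j = b_j if c_j else a_j
#   LUT B: I0 = b_j, I1 = a_j, I2 = c_j  ->  y_j = a_j if c_j else b_j
init = lut6_init(mux)
print(f"INIT = 64'h{init:016X}")
```

The result is 64'hCACACACACACACACA. With I3 to I5 tied to '0', only the lowest eight bits (0xCA) are ever read, i.e. 8 of the 64 memory cells.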
FPGA chips fabricated by Xilinx usually have 4- to 6-input LUT blocks; the number of inputs depends on the FPGA architecture [7]. A LUT block contains a configuration memory and a set of multiplexers that pass the values stored in this memory to the output. The memory address is formed by the signals applied to the multiplexer select inputs. Redundant inputs are usually fed with a constant '0' value.
There are four types of LUT6 blocks in Xilinx Artix-7 FPGAs: LUT6 (general output), LUT6_D (general and local outputs), LUT6_L (local output) and LUT6_2 (two outputs). Figure 3 shows the structure of a LUT6_L block configured as the multiplexer $x_j$ of block $SW_j$ (see Fig. 2). As shown in Fig. 3, only 12.5% of the LUT6 block resources are utilized: a three-input function addresses only $2^3 = 8$ of the $2^6 = 64$ configuration memory cells. Therefore, when the number of switches is large, most of the LUT resources allocated to the SPB remain unused.

Fig. 3. Circuit implementation of a multiplexer using LUT6_L.
 
 
For example, for $n = 128$ the implementation of the SPB requires 256 LUT blocks of a Xilinx Artix-7 FPGA, i.e. about 0.4% of the 63,400 LUTs available in the XC7A100T chip [7].
 
 
3 Proposed Symmetrical Path Design

In the classical FPGA implementation of an A-PUF, the SPB has unique parameters, e.g. the delays of the internal multiplexers that route the selected signal to an output port [8].
Let the time required for the rising edge of a test signal to reach output $x_j$ from input $a_j$ ($c^i_j = 0$) be denoted as $\delta(x_j, a_j)$, and the time to reach output $x_j$ from input $b_j$ ($c^i_j = 1$) as $\delta(x_j, b_j)$. The corresponding characteristics of the second multiplexer are denoted as $\delta(y_j, a_j)$ ($c^i_j = 1$) and $\delta(y_j, b_j)$ ($c^i_j = 0$). To estimate these parameters, post place-and-route modeling of $SW_j$ has been performed for the XC7A100T FPGA platform. The following values have been obtained for block $SW_0$: $\delta(x_0, a_0) = 0.809$ ns, $\delta(x_0, b_0) = 0.309$ ns, $\delta(y_0, a_0) = 1.022$ ns, $\delta(y_0, b_0) = 1.022$ ns. For the neighbouring block $SW_1$ these values differ significantly: $\delta(x_1, a_1) = 0.719$ ns, $\delta(x_1, b_1) = 0.506$ ns, $\delta(y_1, a_1) = 0.296$ ns, $\delta(y_1, b_1) = 0.083$ ns. These results can be explained by the uniqueness of the utilized LUT blocks and the asymmetrical configuration of their connections. The distinct delay values indicate that a second copy of the switching block can also be implemented using the unutilized resources of a LUT block. Therefore, all unused inputs (I3, I4 and I5) can be reconfigured for an additional switching block. Figure 4 shows both the circuit implementation of the block containing $SW_j$ (see Fig. 4a) and the synthesized scheme on LUT6 blocks (see Fig. 4b).

Fig. 4. (a) RTL level diagram; (b) LUT-block level diagram.
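The effect of these per-stage delays on the arbiter can be seen in a two-stage path built from $SW_0$ and $SW_1$ alone. The worked example below is a simplification: it assumes that both test pulses leave the TPG at the same time, neglects routing delays between the blocks and takes $\Delta = t_x - t_y$, so that $r_i = 1$ when $x_1$ arrives first.

$$
\begin{aligned}
(c^i_0, c^i_1) = (1, 0):\quad t_x &= \delta(x_0, b_0) + \delta(x_1, a_1) = 0.309 + 0.719 = 1.028\ \text{ns},\\
t_y &= \delta(y_0, a_0) + \delta(y_1, b_1) = 1.022 + 0.083 = 1.105\ \text{ns},\\
\Delta &= t_x - t_y = -0.077\ \text{ns} \;\Rightarrow\; r_i = 1;\\[4pt]
(c^i_0, c^i_1) = (0, 0):\quad t_x &= \delta(x_0, a_0) + \delta(x_1, a_1) = 0.809 + 0.719 = 1.528\ \text{ns},\\
t_y &= \delta(y_0, b_0) + \delta(y_1, b_1) = 1.022 + 0.083 = 1.105\ \text{ns},\\
\Delta &= t_x - t_y = 0.423\ \text{ns} \;\Rightarrow\; r_i = 0.
\end{aligned}
$$

Flipping the single bit $c^i_0$ shifts $\Delta$ by 0.5 ns and changes the response, which is the kind of challenge dependence the arbiter turns into response bits.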
This circuit provides four configurations of inputs $a_j$, $b_j$, $a_{j+1}$, $b_{j+1}$ to outputs $x_j$, $y_j$: two straight connections, when $c^i_j = 0, c^i_{j+1} = 0$ and $c^i_j = 1, c^i_{j+1} = 1$, and two exchanging connections, when $c^i_j = 1, c^i_{j+1} = 0$ and $c^i_j = 0, c^i_{j+1} = 1$.
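Logically, the four configurations listed above mean that the combined block is straight when $c^i_j = c^i_{j+1}$ and exchanging otherwise, i.e. it behaves like a single switch driven by $c^i_j \oplus c^i_{j+1}$, while its delays still differ per configuration. The Python sketch below checks this and, for a hypothetical port assignment (I0 = $a_j$, I1 = $b_j$, I2 = $c^i_j$, I3 = $c^i_{j+1}$, chosen for illustration and not taken from Fig. 4b), tabulates the resulting LUT6 INIT values. It covers logic only, not timing, and should not be read as the authors' actual mapping.

```python
from itertools import product

def switch(c, a, b):
    """Logical behaviour of one switch: returns (x, y).
    c = 0 is straight (a -> x, b -> y); c = 1 is exchange (b -> x, a -> y)."""
    return (a, b) if c == 0 else (b, a)

# Two cascaded switches are straight when c_j == c_{j+1} and exchanging
# otherwise, i.e. logically they act like one switch driven by c_j XOR c_{j+1}.
for cj, cj1 in product((0, 1), repeat=2):
    x, y = switch(cj1, *switch(cj, "a", "b"))
    assert (x, y) == switch(cj ^ cj1, "a", "b")

def lut6_init(func):
    """64-bit INIT: bit k = func(I0, ..., I5) with k = sum(I_i << i)."""
    init = 0
    for k in range(64):
        if func(*[(k >> i) & 1 for i in range(6)]):
            init |= 1 << k
    return init

# Hypothetical port assignment: I0 = a_j, I1 = b_j, I2 = c_j, I3 = c_{j+1};
# I4 and I5 are don't-care.
init_x = lut6_init(lambda i0, i1, i2, i3, i4, i5: i1 if i2 ^ i3 else i0)
init_y = lut6_init(lambda i0, i1, i2, i3, i4, i5: i0 if i2 ^ i3 else i1)
print(f"x LUT: 64'h{init_x:016X}")
print(f"y LUT: 64'h{init_y:016X}")
```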
 
Table 1 presents the rising-edge delays of the test signal for different combinations of $c^i_j$ and $c^i_{j+1}$. These values have been obtained using post place-and-route modeling of an $n = 16$ A-PUF implemented in a Xilinx Artix-7 XC7A100T FPGA.
 
Table 1. Delay values for two different switching blocks.

$c^i_j c^i_{j+1}$   Delay type                  Delay value for SW1 block, ns
00                  $\delta(x_j, a_j)$          0.443
                    $\delta(y_j, b_j)$          0.481
01                  $\delta(x_j, a_{j+1})$      0.392
                    $\delta(y_j, b_{j+1})$      0.497
10                  $\delta(x_j, a_j)$          0.609
                    $\delta(y_j, b_j)$          0.654
11                  $\delta(x_j, a_{j+1})$      0.660
                    $\delta(y_j, b_{j+1})$      0.565
 
 
These results indicate the high potential of the proposed design for A-PUFs implemented in FPGAs.
 
4 Hardware Overhead Analysis

As shown above, the proposed SPB structure fully fits the LUT6 architecture. This approach provides a significant reduction of the hardware resources required in the FPGA chip. The TPG and AB blocks bring negligible hardware overhead to the overall design compared to the SPB: the TPG block usually requires three LUT blocks and three flip-flops, and AB uses only one flip-flop if implemented as in [4]. The major hardware resources are required for the SPB, and the proposed architecture requires half as many LUT blocks as the classical A-PUF design.

Table 2 presents a comparison of two A-PUF implementations in the XC7A100T FPGA, namely the classical A-PUF ($G_0$) and the proposed one ($G_1$). As shown in the table, the proposed switching block architecture requires half the hardware of the classical one. This design can be implemented in FPGAs based on LUT blocks with six or more inputs [7].
 
 
Table 2. Hardware overhead for A-PUF implementation in a Xilinx Artix-7 FPGA (XC7A100T, n = 128).

FPGA resource   Utilized, G0   Utilized, G1   Total available   Overhead G0, %   Overhead G1, %
Slices          131            69             15,850            0.83             0.44
Flip-Flops      4              4              126,800           0.003            0.003
LUT6            259            131            63,400            0.41             0.21
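The overhead percentages can be recomputed from the device totals. The Python sketch below assumes the XC7A100T figures from the 7 Series overview [7] (15,850 slices, each containing four LUT6 and eight flip-flops, hence 63,400 LUTs and 126,800 flip-flops) and the per-block counts stated in this section; it is plain arithmetic, not a synthesis report.

```python
# XC7A100T totals (7 Series overview [7]): 15,850 slices,
# each with four LUT6 and eight flip-flops.
SLICES = 15_850
LUTS = 4 * SLICES         # 63,400 LUT6 blocks
FLIP_FLOPS = 8 * SLICES   # 126,800 flip-flops

def overhead(used, total):
    """Utilization in percent."""
    return 100.0 * used / total

n = 128
# Classical G0: two LUTs per switch; proposed G1: one LUT per switch stage.
# Both designs add three LUTs for the TPG.
luts_g0 = 2 * n + 3
luts_g1 = n + 3

for name, used, total in [("LUT6 G0", luts_g0, LUTS), ("LUT6 G1", luts_g1, LUTS),
                          ("Slices G0", 131, SLICES), ("Slices G1", 69, SLICES),
                          ("FF G0/G1", 4, FLIP_FLOPS)]:
    print(f"{name:10s} {used:4d} / {total:7d} = {overhead(used, total):.4f} %")
```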
 
5 Arbiter PUF Figures of Merit Analysis

The two approaches to A-PUF implementation for the XC7A100T FPGA have been modeled using the Xilinx ISE 14.7 CAD tool [9]. Both have been simulated after place-and-route as 16-bit A-PUFs. AB has been imple-









The A-PUF design and testbenches have been described in Verilog. The testbenches apply all $2^n$ possible challenges to the inputs, generate the test signal and analyse the response values. Furthermore, the testbenches measure the delay difference $\Delta(x_{n-1}, y_{n-1})$ between the rising edges of the two copies of the test signal propagated from outputs $x_{n-1}$ and $y_{n-1}$ to the inputs of AB.
As shown in Fig. 5, the curve of $\Delta(x_{n-1}, y_{n-1})$ values for A-PUF-16-N (the proposed design) is much more symmetric about both axes. Furthermore, the nonlinear dependency between the coordinates makes the challenge-response relation much more complicated than in the classical A-PUF design. The asymmetry of the A-PUF-16 (classical design) graph implies that its set of challenge-response pairs is unbalanced, i.e. the probability of a zero response bit is much higher than the probability of a one.
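The balance of a response set is often summarized as the fraction of '1' responses over all applied challenges (uniformity, or bias), with 0.5 as the ideal. A minimal Python sketch, assuming the sign convention $\Delta = t_x - t_y$ (so $r_i = 1$ when $\Delta < 0$) and using invented $\Delta$ values rather than the data behind Fig. 5:

```python
def response_bias(deltas):
    """Fraction of '1' responses, with r = 1 when x_{n-1} arrives first,
    i.e. delta = t_x - t_y < 0 (assumed sign convention). 0.5 is ideal."""
    ones = sum(1 for d in deltas if d < 0)
    return ones / len(deltas)

# Hypothetical delay differences (ps), for illustration only.
skewed = [120, 340, -15, 800, 95, 410, 230, -60]
balanced = [120, -340, -15, 800, -95, 410, -230, 60]
print(response_bias(skewed), response_bias(balanced))
```

A curve that lies mostly on one side of zero, as described for A-PUF-16, yields a bias far from 0.5.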
Furthermore, the range of delay differences $\Delta(x_{n-1}, y_{n-1})$ has shrunk: for the classical A-PUF design it is [−3772; 3396] ps, whereas for the proposed architecture with the same $n$ it is much narrower, [−832; 943] ps. This can be explained by the shorter symmetric paths, since the number of LUT blocks is also reduced. Inter-chip uniqueness has been estimated for both the A-PUF-16 and A-PUF-16-N circuits.
To calculate this metric, eight identical A-PUF instances on each chip have been simulated and compared on the same set of challenges. The metric is estimated as 0.489 for A-PUF-16 and 0.473 for A-PUF-16-N.








Fig. 7. Graph of $\Delta(x_{n-1}, y_{n-1})$ depending on challenge values for A-PUF-16-N design.
 
 
The ideal value of the uniqueness figure of merit is 0.5. Real values of reliability and uniqueness strongly depend on parameter $n$ and on the AB implementation, and can only be measured on real hardware [10]; reliability in particular is measured by feeding the same challenges multiple times.
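The paper does not state the exact formula used for inter-chip uniqueness. A common definition, assumed here, is the average fractional Hamming distance between the responses of every pair of instances to the same challenges, which equals 0.5 for ideal instances. The Python sketch below implements that definition; the response vectors are invented for illustration.

```python
from itertools import combinations

def hamming_fraction(r1, r2):
    """Fractional Hamming distance between two equal-length response vectors."""
    return sum(b1 != b2 for b1, b2 in zip(r1, r2)) / len(r1)

def uniqueness(responses):
    """Average pairwise fractional Hamming distance over m instances,
    all evaluated on the same challenges; 0.5 is ideal."""
    pairs = list(combinations(responses, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)

# Toy responses of three hypothetical instances to the same four challenges.
R = [[0, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 1, 0, 0]]
print(round(uniqueness(R), 4))
```

For eight instances, as in the experiment above, the average runs over 28 pairs.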
Figures 6 and 7 show the functional dependency of $\Delta(x_{n-1}, y_{n-1})$ on the challenge values, sorted in ascending order of $C$, for the A-PUF-16 and A-PUF-16-N circuits, respectively. As shown in the figures, the A-PUF-16-N circuit exhibits better randomness, and its values of $\Delta(x_{n-1}, y_{n-1})$ are less correlated with the challenge values $C$. This can increase the complexity of machine learning based modeling for an attacker [3].
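Why a strong correlation between $\Delta$ and the challenge helps an attacker can be illustrated for the classical design with a standard result that is not part of this paper. Under a simple additive delay model (a fixed delay per switch connection, no routing delays or noise, $\Delta = t_x - t_y$), $\Delta$ is exactly a linear function of the parity features $\Phi_j = \prod_{k \ge j}(1 - 2c_k)$, so a linear classifier trained on a subset of challenge-response pairs can predict responses. The Python sketch below checks this identity numerically for random, hypothetical stage delays; it does not model the proposed A-PUF-16-N.

```python
import random
from itertools import product

def delta_stepwise(challenge, stages):
    """delta = t_x - t_y after the last switch, propagated switch by switch.
    Each stage is (d_xa, d_xb, d_ya, d_yb); c = 0 straight, c = 1 exchange."""
    t_a = t_b = 0.0
    for c, (xa, xb, ya, yb) in zip(challenge, stages):
        t_a, t_b = (t_a + xa, t_b + yb) if c == 0 else (t_b + xb, t_a + ya)
    return t_a - t_b

def parity_features(challenge):
    """Phi_j = prod_{k >= j} (1 - 2 c_k) for j = 0..n-1, plus Phi_n = 1."""
    phi = [1] * (len(challenge) + 1)
    for j in range(len(challenge) - 1, -1, -1):
        phi[j] = phi[j + 1] * (1 - 2 * challenge[j])
    return phi

def linear_weights(stages):
    """Weights w with delta = sum_j w_j * Phi_j under the additive delay model."""
    w = [0.0] * (len(stages) + 1)
    for j, (xa, xb, ya, yb) in enumerate(stages):
        p, q = xa - yb, xb - ya       # delta offset added by a straight / exchange stage
        w[j] += (p - q) / 2           # coefficient of Phi_j
        w[j + 1] += (p + q) / 2       # coefficient of Phi_{j+1}
    return w

rng = random.Random(7)                # hypothetical delays, in ns
n = 6
stages = [tuple(rng.gauss(0.5, 0.05) for _ in range(4)) for _ in range(n)]
w = linear_weights(stages)
worst = max(
    abs(delta_stepwise(c, stages) - sum(wi * fi for wi, fi in zip(w, parity_features(c))))
    for c in product((0, 1), repeat=n)
)
print(f"max |delta - w.Phi| over all {2 ** n} challenges: {worst:.1e}")
```

The identity follows from the per-stage recurrence: a straight stage adds $p_j$ to $\Delta$, while an exchange stage negates $\Delta$ and adds $q_j$.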
 
 
6 Future Work

The proposed design still uses two LUT6 blocks to implement the $SW_j$ block. Alternatively, this block can be synthesized from two-output LUTs (LUT6_2 or CFGLUT5 in Xilinx 7 Series FPGAs [11]). Both of these LUT blocks use one of their inputs to configure the outputs: if this input is '0', both outputs produce the same value; otherwise, the two outputs can implement different functions. Since one input has to be reserved to provide two distinct outputs, the six input signals of $SW_j$ and $SW_{j+1}$ cannot fit into the remaining inputs (five at most). Therefore, it is more promising to use a CFGLUT5 block for a single-LUT implementation of the $SW_j$ block, as it also provides a dynamically reconfigurable memory. This feature of the CFGLUT5 makes the PUF reconfigurable and increases its uniqueness and security [12].

Fig. 8. Circuit implementation of SW block using CFGLUT5.
 
 
The input values $a_j$, $b_j$ and $c^i_j$ are mapped to inputs I0, I1 and I2, respectively. Input I3 is fed with '0', as it is not used in the switching block implementation. Input I4 is set to '1', since the LUT block has to produce two different outputs, O5 and O6, which are mapped to the output values $x_j$ and $y_j$. The CFGLUT5 can also be dynamically reconfigured using the serial data input CDI and the reconfiguration clock CLK. Thus, the switching block can be reconfigured to different paths, increasing uniqueness and decreasing the vulnerability to machine learning attacks.
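The CFGLUT5 configuration for this mapping can be sketched in a few lines. The Python sketch below assumes the CFGLUT5 addressing described in the 7 Series libraries guide [11] (O6 is a five-input function of I4 to I0 over the 32-bit INIT; O5 is a four-input function of I3 to I0 over INIT[15:0]) and, hypothetically, that O5 drives $x_j$ and O6 drives $y_j$. It does not model serial reconfiguration through CDI.

```python
def cfglut5_outputs(init, i0, i1, i2, i3, i4):
    """Assumed CFGLUT5 addressing: O6 = INIT[I4..I0], O5 = INIT[I3..I0]."""
    addr = i0 | i1 << 1 | i2 << 2 | i3 << 3
    o5 = (init >> addr) & 1
    o6 = (init >> (addr | i4 << 4)) & 1
    return o5, o6

def sw_init():
    """32-bit INIT for SW_j with I0 = a_j, I1 = b_j, I2 = c_j (I3 unused).
    Lower half (read by O5, and by O6 when I4 = 0): x_j = c_j ? b_j : a_j.
    Upper half (read by O6 when I4 = 1):           y_j = c_j ? a_j : b_j."""
    init = 0
    for addr in range(32):
        a, b, c = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1
        bit = (b if c else a) if addr < 16 else (a if c else b)
        init |= bit << addr
    return init

INIT = sw_init()
print(f"INIT = 32'h{INIT:08X}")
```

With I4 = 0 both outputs read the same lower half of INIT, which is why that input has to be held at '1' to obtain distinct $x_j$ and $y_j$.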
 
7 Conclusion

This paper presents a new architecture of the symmetrical paths block of an arbiter-based physical unclonable function for efficient implementation in FPGAs. The proposed approach exploits the internal configuration of embedded LUT blocks and saves valuable hardware resources; in simulation, it also improves the randomness of the delay differences of a classical A-PUF design while keeping the inter-chip uniqueness close to that of the classical design (0.473 versus 0.489).
All experimental data described in this paper have been obtained using the Xilinx ISE 14.7 CAD tool [9] and the Verilog hardware description language. The results are to be verified on a real hardware implementation to obtain the true values of inter-chip uniqueness and randomness. The proposed design also has to be evaluated for vulnerabilities to machine learning based modeling attacks. The circuit






References

1. Zalivaka, S.S., Zhang, L., Klybik, V.P., Ivaniuk, A.A., Chang, C.-H.: Design and implementation of high-quality physical unclonable functions for hardware-oriented cryptography. In: Chang, C.-H., Potkonjak, M. (eds.) Secure System Design and Trustable Computing, pp. 39–81. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-14971-4_2
2. Ivaniuk, A.A.: The design of embedded digital devices and systems, 337 p. Bestprint, Minsk (2012). (in Russian)
3. Zalivaka, S.S., Ivaniuk, A.A., Chang, C.-H.: Reliable and modeling attack resistant authentication of arbiter PUF in FPGA implementation with trinary quadruple response. IEEE Trans. Inf. Forensics Secur. 14(4), 1109–1123 (2018). https://doi.org/10.1109/TIFS.2018.2870835
4. Zalivaka, S.S., Puchkov, A.V., Klybik, V.P., Ivaniuk, A.A., Chang, C.-H.: Multi-valued arbiters for quality enhancement of PUF responses on FPGA implementation. In: Proceedings Asia and South Pacific Design Automation Conference (ASP-DAC 2016), Macau, China, January 2016, pp. 533–538 (2016). (Invited paper). https://doi.org/10.1109/ASPDAC.2016.7428066
5. Hori, Y., Yoshida, T., Katashita, T., Satoh, A.: Quantitative and statistical performance evaluation of arbiter physical unclonable functions on FPGAs. In: Proceedings International Conference "Reconfigurable Computing and FPGAs", Mexico, pp. 298–303 (2010). https://doi.org/10.1109/ReConFig.2010.24
6. Becker, G.T.: On the pitfalls of using Arbiter-PUFs as building blocks. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 34(8), 1295–1307 (2015). https://doi.org/10.1109/TCAD.2015.2427259
7. 7 Series FPGAs Data Sheet: Overview. https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf. Accessed 13 Feb 2019
8. Morozov, S., Maiti, A., Schaumont, P.: An analysis of delay based PUF implementations on FPGA. In: Sirisuk, P., Morgan, F., El-Ghazawi, T., Amano, H. (eds.) ARC 2010. LNCS, vol. 5992, pp. 382–387. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12133-3_37
9. ISE Design Suite. https://www.xilinx.com/products/design-tools/ise-design-suite.html. Accessed 20 Feb 2019
10. Nexys 4 Artix-7 FPGA Trainer Board. https://store.digilentinc.com/nexys-4-artix-7-fpga-trainer-board-limited-time-see-nexys4-ddr. Accessed 20 Feb 2019
11. Xilinx 7 Series FPGA Libraries Guide for Schematic Designs. https://www.xilinx.com/support/documentation/sw_manuals/xilinx13_2/7series_scm.pdf. Accessed 05 Sept 2019
 
12. Katzenbeisser, S., et al.: Recyclable PUFs: logically reconfigurable PUFs. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 374–389. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23951-9_25
