Physical Unclonable Functions (PUFs) have emerged as a promising solution to identify and authenticate Integrated Circuits (ICs). In this paper, we propose a novel NAND-based Set-Reset (SR) Flip-flop (FF) PUF design for security enclosures of area- and power-constrained Internet-of-Things (IoT) edge nodes. The SR-FF based PUF exploits a race condition that is normally avoided because it leads to an inconsistent state. We show that when both inputs (S and R) are driven to logic high ('1') and then to logic low ('0'), the outputs Q and Q̄ settle to either 0 and 1 or 1 and 0, depending on statistical delay variations in the cross-coupled paths. We incorporate process variations in SPICE-level simulations to leverage the capability of the SR-FF to generate a unique identifier for an IC. Experimental results for 90nm, 45nm, and 32nm process nodes show the robustness of SR-FF PUF responses in terms of uniqueness, randomness, uniformity, and bit bias. Furthermore, we perform physical synthesis to evaluate the applicability of the SR-FF PUF on five designs from OpenCores in three design corners. The estimated power, timing, and area overhead in all three design corners is negligible.
I. INTRODUCTION
Due to the horizontal business model and the vertical disintegration of IC design, most IC manufacturing and testing for fabless design houses is performed in foreign foundries. In this design ecosystem, original IP owners face several security challenges, including overproduction, counterfeiting, authentication, and trust in manufactured products. Among the available security solutions, a Physical Unclonable Function (PUF) acts as a one-way function that maps stable inputs (challenges) to outputs (responses) that are unique to each device. Although cryptographic algorithms have been put into practice to perform authentication, they are difficult to uphold in the face of recent attacks [1], [2]. Moreover, the cost of deploying computationally intensive cryptographic algorithms on resource-constrained IoT devices limits their wide adoption. In contrast, a PUF exploits inherent silicon variations: if the same design is manufactured on two different dies, process variations act differently within and across the dies, and this difference forms the basis of a PUF. Ideally, a PUF implementation should be low-cost, tamper-evident, unclonable, and reproducible. The PUF response also needs to be invariant to environmental variations. In recent years, a wide variety of PUF architectures have been investigated that transform device properties (e.g., threshold voltage, temperature, gate length, oxide thickness, edge roughness) into a unique identifier of a certain length. Metastability in cross-coupled paths has been exploited to design PUFs with SR latches [3]-[5] and Ring Oscillators (ROs) [6]. Although latch-based PUF designs offer unique signatures to ICs, they suffer from signal skew and delay imbalance in signal routing paths, and Error Correction Code (ECC) circuitry is commonly employed to post-process the unstable PUF responses [7]. In contrast, the RO-PUF in [6] incurs significant area overhead, including a counter, an accumulator, and a shift register.
These limitations motivate us to harvest deep metastability in a bistable memory element, the SR-FF, to design a low-cost PUF with high-quality challenge-response pairs (CRPs).
In this paper, we design and analyze a novel SR-FF based PUF. For a NAND-gate-based SR-FF, the input condition S (Set) = '1' and R (Reset) = '1' must be avoided because it produces an inconsistent state. When S=R='1' is applied followed by S=R='0', the outputs Q and Q̄ undergo a race condition. Due to manufacturing variations, the race resolves to either '0' or '1'. Further, due to intra-chip process variations, some flip-flops on a chip settle to '0' while others settle to '1'. In addition, due to inter-chip variations, this signature differs across chips. We investigate the variations in the NAND gates of the feedback path that most affect the gate delay. We validate the proposed idea with SPICE-level simulations for 90nm, 45nm, and 32nm process nodes to establish the robustness of the proposed PUF for 16-, 32-, 64-, and 128-bit responses. We also perform layout-level simulation with foundry data on five designs that incorporate SR-FFs and present their figures of merit (power, timing, and area). In summary, we present the following novel contributions:
• A PUF that utilizes SR-FFs already present in the registers of a design, without any ECC or helper data. Because the responses need no multiple rounds of key establishment, the design can thwart reliability-based attacks.
• An input-dependent, random yet stable binary sequence derived from unpredictable manufacturing variability. Depending on the input challenges, only a fraction of the SR-FFs is used to create the unique device signature, which increases the reverse-engineering effort an attacker needs to determine the exact location of the SR-FFs that participate in PUF response generation.
• A centroid architecture in which only the variations of the surrounding transistors affect the PUF responses, together with an evaluation of the associated overhead through layout-based synthesis.
The rest of the paper is organized as follows: Section II provides background on PUFs that use metastability. Section III describes the construction of the SR flip-flop based PUF design. Section IV reports the experimental results in detail. Finally, Section V concludes the paper and outlines future work.
II. BACKGROUND AND RELATED WORK
A PUF is a digital fingerprint that serves as a unique identity for silicon ICs and is characterized by inter-chip and intra-chip variations. Inter-chip variation determines the uniqueness of a PUF, i.e., whether the key produced by one die differs from the keys of other dies. Intra-chip variation determines the reliability of the key, which should not change across repeated evaluations on the same die. Metastability occurs when the setup or hold time requirement of a storage element is violated, so that an unpredictable value appears at its output. Although metastability is an unstable condition, process variations cause it to resolve to a stable but random state (either '0' or '1') that is not known a priori.
The Transient Effect Ring Oscillator (TERO) PUF [6] exploits metastability to generate responses using a binary counter, an accumulator, and a shift register. Although the architecture is scalable, it requires large hardware resources. Su et al. [5], [8] presented cross-coupled logic gates that create a digital ID based on threshold voltage. The architecture consists of a latch followed by a quantizer and a readout circuit that produces the PUF ID. However, the readout circuit is generally expensive, which limits its application in low-power devices. FPGA-based SR-latch PUFs have been presented in [3], [4]. Because of time-varying operating conditions, both approaches employ ECC to reliably map each challenge to its response. As an alternative to the power-up values of memory-based PUFs, registers based on edge-triggered D-FFs are proposed in [5], [9]. The authors suggested including an expensive synchronizer on Clock Domain Crossing (CDC) signals to obtain a stable PUF response. A framework that adjusts SR-latch timing through body-bias voltage in FD-SOI technology is presented in [10]. To obtain correct PUF responses from latches that suffer from response bias, the authors employed buffers along the top and bottom tracks of the latches.
Most works that exploit metastability to design PUFs employ additional hardware to count the oscillation frequency. Our work differs from these previous studies in that it (a) employs SR-FFs to construct a low-cost PUF and (b) reuses the SR-FFs already in the original IP, varying channel length and temperature to account for intra- and inter-chip variations.
III. PROPOSED SR FLIP FLOP PUF
Our approach presents a PUF design that relies on the cross-coupled path in an SR-FF configuration. Each bit of a PUF response is extracted from the metastability-induced random value at the output Q caused by a particular input sequence applied to the SR-FF. Because of process variability, this random value eventually resolves to a stable logic value. Fig. 1 shows a clock-enabled, cross-coupled NAND-based SR-FF. It does not require an additional synchronizer to control the input conditions. The Set-Reset (SR) latch has a forbidden input combination, namely S=R=1, which drives both Q and Q̄ to 1. If both inputs are then lowered (S=R=0), a race condition arises between the two cross-coupled NAND gates (ND1 and ND2), causing Q and Q̄ to linger around VDD/2. Although such a race condition is prohibited during normal circuit operation, its outcome is a state determined by the mismatch in the underlying device parameters (transistor length, threshold voltage). The race behavior appears to depend on the precise phase relation between the clock and the input data. We exploit this input-referred event sequence to generate the PUF response. Fig. 2 shows the architecture of an n-bit SR-FF array with an input multiplexer (MUX) that selects either PUF mode or regular mode. Since each SR-FF generates a single key bit, the maximum PUF signature length equals the number of SR-FF instances. However, the MUX output must be routed far enough to reach all SR-FF instances, which also increases the delay to produce the random output at Q according to the longest distance from the MUX output to an SR-FF instance. As a result, both the long MUX output wire and the long transition time due to metastability degrade the timing performance of an SR-FF based PUF during regular operation. Furthermore, such an architecture may be susceptible to a key-guessing attack under a single clock pulse. Because of the long wires, the architecture in Fig. 2 would also be biased towards variations in the connecting wire length and width, which in turn reduces the impact of local transistor variation. In short, the deeper the PUF timing paths, the less the response depends on transistor behavior.
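The race between ND1 and ND2 can be illustrated with a purely digital, event-driven model. This is a sketch under simplifying assumptions that are ours, not the paper's: each NAND gate is reduced to a single inertial delay, the gated set/reset inputs seen by the cross-coupled pair are both inactive ('1') after S=R=0, and whichever gate switches first decides the final state. The paper's results come from analog SPICE simulation, where the voltages resolve continuously rather than in discrete events; the function name `simulate_race` is hypothetical.

```python
from typing import Optional


def simulate_race(d_nd1: float, d_nd2: float, max_events: int = 50) -> Optional[int]:
    """Event-driven sketch of the cross-coupled NAND pair after S=R=1 -> S=R=0.

    ND1 drives Q and ND2 drives Q_bar. Both gated inputs are inactive ('1'),
    so each gate computes NAND(1, x) = NOT x of the other output. Both outputs
    start at '1' (the forbidden state). Inertial delay (an assumption): a
    scheduled output change is dropped if the gate's target value reverts
    before the change fires.
    Returns the settled Q, or None if the pair is still toggling after
    `max_events` steps (the digital stand-in for unresolved metastability).
    """
    out = {"q": 1, "qb": 1}
    delay = {"q": d_nd1, "qb": d_nd2}
    other = {"q": "qb", "qb": "q"}
    pending = {}  # gate -> (fire_time, new_value)
    t = 0.0
    for _ in range(max_events):
        for g in ("q", "qb"):
            want = 1 - out[other[g]]           # NAND(1, other output)
            if want == out[g]:
                pending.pop(g, None)           # target reverted: cancel change
            elif g not in pending or pending[g][1] != want:
                pending[g] = (t + delay[g], want)
        if not pending:
            return out["q"]                    # stable state: race resolved
        t = min(ft for ft, _ in pending.values())
        for g in [g for g, (ft, _) in pending.items() if ft == t]:
            out[g] = pending.pop(g)[1]         # fire simultaneous events together
    return None


print(simulate_race(9.0, 10.0), simulate_race(11.0, 10.0), simulate_race(10.0, 10.0))
# -> 0 1 None
```

When ND1 is faster, Q falls first and forces Q̄ to stay at '1', so Q settles at '0'; with perfectly matched delays, the model toggles indefinitely, which is why only mismatch (process variation) produces a stable bit.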
[Figure: SR-FF array partitioned into groups SR [3-0], SR [7-4], SR [11-8], and SR [15-12], each driven by a MUX whose select (Sel.) chooses PUF mode or regular mode.]

The centroid architecture extends the architecture of Fig. 2 with additional MUXs to reduce the delay and thwart the key-guessing attack. It also improves the bit distribution by preventing edge effects [11]. Each multiplexer has three select bits: two select an SR-FF within a grid, and the remaining bit selects the mode (PUF or regular). A simple controller is embedded in the original architecture to aid in the signal extraction process. Depending on the number of controllable MUXs, the size of the partitions can increase or decrease.
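The 3-bit select scheme can be sketched as follows. The bit assignment (bits [1:0] for the SR-FF index within a group, bit 2 for the mode), the group size of four inferred from the figure labels, and the controller function `puf_selects` are hypothetical illustrations, not the paper's RTL.

```python
from typing import List, Tuple

GROUP_SIZE = 4  # SR-FFs per MUX group, e.g. SR [3-0], SR [7-4], ... (assumed)


def decode_select(sel: int) -> Tuple[int, bool]:
    """Split a 3-bit MUX select into (SR-FF index in the group, PUF mode).

    Hypothetical encoding: bits [1:0] choose one of the four SR-FFs in the
    group; bit 2 = 1 selects PUF mode and bit 2 = 0 selects regular mode.
    """
    if not 0 <= sel <= 0b111:
        raise ValueError("select must be a 3-bit value")
    return sel & 0b11, bool(sel >> 2)


def puf_selects(challenge: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical controller: one 2-bit challenge field per group.

    Returns the 3-bit select word driven to each group's MUX (PUF mode forced
    on) and the global indices of the SR-FFs that contribute response bits.
    """
    selects = [(1 << 2) | (field & 0b11) for field in challenge]
    ids = [GROUP_SIZE * g + decode_select(s)[0] for g, s in enumerate(selects)]
    return selects, ids


print(puf_selects([0, 3, 1, 2]))  # -> ([4, 7, 5, 6], [0, 7, 9, 14])
```

Only one SR-FF per group participates for a given challenge, which mirrors the idea that different challenges exercise different subsets of flip-flops.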
IV. EXPERIMENTAL RESULTS
We perform Monte Carlo (MC) simulations of the SR-FF PUF at SPICE level using Synopsys HSPICE for three CMOS processes (90nm, 45nm, and 32nm) [12]. MC simulation can perform device variability analysis within the six-sigma limit; hence, the challenge-response pairs (CRPs) collected using MC are comparable to CRPs from manufactured ICs. We simulate the PUF structure for 1000 iterations, analogous to 1000 different dies on a 300mm wafer, at the nominal voltage (1 V). Several works [2], [13], [14] in the literature have validated PUF designs through SPICE-level simulations. We then evaluate the PUF responses according to the parameters proposed by Maiti et al. [15]. To capture the impact of process variations on channel length, we keep the length variability within 15% (intra-die) and 33% (inter-die) of the nominal value to generate CRPs [16]. We also report the performance overhead from physical synthesis of five RTL designs [17] with the centroid architecture.
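The MC setup can be mimicked at a much coarser level in Python to produce a population of simulated dies. The sketch below assumes that channel-length deviations are Gaussian with 3σ equal to the stated intra-die (15%) and inter-die (33%) bounds and clipped to them, that gate delay grows with channel length, and a hypothetical 90 nm nominal length; none of these modeling choices comes from the paper, which relies on HSPICE device models. The function names are hypothetical.

```python
import random


def sample_channel_lengths(n_dies, n_bits, l_nom=90.0, intra=0.15,
                           inter=0.33, seed=0):
    """Return dies[d][b] = (L_ND1, L_ND2) in nm for SR-FF b on die d.

    Assumptions: each relative deviation is Gaussian with 3*sigma equal to
    its bound and clipped to +/- bound; the inter-die factor is shared by
    every gate on a die.
    """
    rng = random.Random(seed)

    def frac(bound):
        # Relative deviation, clipped to +/- bound.
        return max(-bound, min(bound, rng.gauss(0.0, bound / 3.0)))

    dies = []
    for _ in range(n_dies):
        die_scale = 1.0 + frac(inter)            # inter-die (global) shift
        dies.append([(l_nom * die_scale * (1.0 + frac(intra)),
                      l_nom * die_scale * (1.0 + frac(intra)))
                     for _ in range(n_bits)])
    return dies


def race_outcomes(dies):
    """First-order assumption: the shorter (faster) ND1 wins, so Q settles at 0."""
    return [[0 if l1 < l2 else 1 for l1, l2 in die] for die in dies]


dies = sample_channel_lengths(n_dies=1000, n_bits=16)
crps = race_outcomes(dies)  # 1000 simulated dies x 16 response bits
```

Because the inter-die factor scales both NAND gates of a die equally, it cancels in the comparison, so in this model each bit is set only by local (intra-die) mismatch.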
Uniqueness: Uniqueness measures inter-chip variation. We measure it by calculating the Hamming Distance (HD) between the responses of each pair of dies. Ideally, two dies (chips) produce distinguishable responses (HD ≈ 50%) to a common challenge. Fig. 4(a-c) shows the inter-chip HD for four different key lengths. For all keys, we made two thousand comparisons to verify uniqueness. The average HD for all key lengths is close to 49%.
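Uniqueness as defined by Maiti et al. [15] is the average pairwise inter-chip HD, normalized by the response length n over k chips:

U = 2 / (k(k-1)) · Σ_{i<j} HD(R_i, R_j) / n × 100%.

A minimal Python sketch of that formula follows; the function names and the toy responses are illustrative only.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing bits between two equal-length responses."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses):
    """Average pairwise inter-chip HD in percent; ideally 50%."""
    k = len(responses)
    if k < 2:
        raise ValueError("need responses from at least two chips")
    n = len(responses[0])
    pairs = list(combinations(responses, 2))   # k (k - 1) / 2 pairs
    total = sum(hamming(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n)


chips = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
print(uniqueness(chips))  # pairwise HDs 2, 4, 2 -> 8 / (3 * 4), about 66.67%
```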
Reliability: We measure reliability through the Bit Error Rate (BER) of PUF responses under intra-chip variation. Ideally, a PUF maintains the same response (100% reliable, or 0% variation) to the same challenge under different environmental conditions (supply voltage, temperature). Fig. 4(d-f) shows the intra-die HD for four key lengths in the three process nodes at different temperatures (0 °C to 80 °C). The reliability (HD = 0) for 16-, 32-, 64-, and 128-bit registers is 92.3%, 92.2%, 90.7%, and 92.7%, respectively.
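The reliability metric can be sketched in two ways. Maiti et al. define reliability as 100% minus the average intra-chip HD between a reference response and repeated evaluations of the same die; the paper's "reliability (HD = 0)" figure may instead be the share of evaluations that reproduce the reference exactly, so both are shown. Treating `error_free_rate` as the paper's metric is our interpretation, and all names here are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length responses."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


def intra_hd(reference, samples):
    """Average intra-chip HD (%) between one die's reference response and m
    re-evaluations of that die (e.g., at temperatures from 0 degC to 80 degC)."""
    n = len(reference)
    return 100.0 * sum(hamming(reference, s) for s in samples) / (len(samples) * n)


def reliability(reference, samples):
    """Maiti et al.: reliability = 100% - average intra-chip HD."""
    return 100.0 - intra_hd(reference, samples)


def error_free_rate(reference, samples):
    """Share (%) of re-evaluations with HD = 0, i.e., bit-exact repeats."""
    return 100.0 * sum(hamming(reference, s) == 0 for s in samples) / len(samples)


ref = [0, 1, 1, 0]
reads = [[0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1]]
print(intra_hd(ref, reads), reliability(ref, reads), error_free_rate(ref, reads))
# HDs are 0, 1, 0, 2 -> 18.75 81.25 50.0
```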
Uniformity/Randomness: Uniformity measures the ability of a PUF to generate uncorrelated '0's and '1's in the response. Ideally, a PUF generates '0's and '1's with equal probability, which makes the response to a known challenge hard to guess. As Fig. 5 shows, the probability of a '0' lies between 0.5 and 0.7 for the four key lengths. Although the ideal probability is 0.5, process-induced variability in gate delay skews the distribution of '0's and '1's.
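Uniformity in Maiti et al. is the percentage of '1' bits in a single response, while Fig. 5 reports the probability of a '0'. The sketch below computes both; pooling all responses to obtain the probability of zero is an assumption about how the figure aggregates the data, and the function names are hypothetical.

```python
def uniformity(response):
    """Maiti et al.: percentage of '1' bits in one response; ideal is 50%."""
    return 100.0 * sum(response) / len(response)


def probability_of_zero(responses):
    """Fraction of '0' bits pooled over all responses (pooling is assumed)."""
    total = sum(len(r) for r in responses)
    zeros = sum(r.count(0) for r in responses)
    return zeros / total


resps = [[0, 0, 1, 1], [0, 1, 1, 1]]
print([uniformity(r) for r in resps], probability_of_zero(resps))
# -> [50.0, 75.0] 0.375
```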
Bit aliasing/Response collision: To evaluate bit aliasing, we use the same set of responses as for uniqueness. As Fig. 6 shows, the average probability of collision is less than 30%. Since the reference response is chosen randomly and compared with the rest of the responses, an adversary can guess, on average, less than 30% of the correct responses. Hence, the generated responses are resistant to key-guessing attacks.
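For reference, Maiti et al. define bit aliasing per bit position as the percentage of chips whose bit is '1' at that position; positions far from 50% are nearly constant across chips and thus easy to guess. The sketch below implements that definition, which is not necessarily the exact collision procedure behind Fig. 6; the 25-point tolerance in `aliased_positions` is a hypothetical threshold for illustration.

```python
def bit_aliasing(responses):
    """Maiti et al.: for each bit position, the percentage of chips whose bit
    is '1'. Ideal is 50%; values near 0% or 100% signal an easily guessed bit."""
    k = len(responses)
    n = len(responses[0])
    if any(len(r) != n for r in responses):
        raise ValueError("responses must have the same length")
    return [100.0 * sum(r[l] for r in responses) / k for l in range(n)]


def aliased_positions(responses, tolerance=25.0):
    """Bit positions whose aliasing deviates from 50% by more than `tolerance`."""
    return [l for l, p in enumerate(bit_aliasing(responses)) if abs(p - 50.0) > tolerance]


chips = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
print(bit_aliasing(chips), aliased_positions(chips))
# -> [100.0, 50.0, 50.0] [0]
```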
Physical Synthesis Analysis: We report the physical synthesis results of designs from OpenCores [17]. We perform logic synthesis with Synopsys Design Compiler and the layout (floorplanning, placement, and routing) of the mapped netlist with Synopsys IC Compiler. We evaluate the area, power, and timing overhead for the SAED 90nm technology in three design corners, namely best, typical, and worst. Table I lists the resistance and capacitance (routing and parasitic) values required during cell characterization to achieve the metastable state. The inter-transistor routing across all wireload models is presented in Table II. The capacitance values include both routing and parasitic capacitance. We vary the input voltage from 0.7 V to 1.32 V with on-chip variation (on_chip_variation) enabled during synthesis. This confirms that the responses are not biased towards a particular input voltage and that an adversary cannot further tamper with the device responses through aggressive supply-voltage changes. We use a 4×4 grid in all designs to implement the centroid architecture and distribute the grids randomly. Depending on the grid dimension, the total number of grids grows or shrinks. Following that, we report the overhead after physical synthesis in Table III, where the number of bits represents the possible key length of each design. Within a given design corner, we observe larger delay and power variations across wireload models because of their different resistance and capacitance values. For the 8-bit uP, the centroid architecture is adjacent to high-activity logic; hence, its power, performance, and area (PPA) overhead is higher. For the remaining designs, the best-case corner minimizes the area and delay overhead, while the worst-case corner reduces the power overhead.
V. CONCLUSION
In this work, we have proposed exploiting the race condition of the SR flip-flops already present in a design for PUF implementation. We embed a centroid architecture with SR-FFs so that the PUF responses depend only on local transistor variations. The generated responses exhibit better uniqueness, randomness, and reliability, and reduced bit aliasing, compared with other metastability-based PUFs. In future work, we will evaluate the uniqueness of SR-FF PUF responses with transient-noise-based simulation and their resilience against adversarial machine learning attacks.
