Abstract-Strong physical unclonable functions (PUFs) are a promising solution for device authentication in resource-constrained applications, but they are vulnerable to machine learning attacks. To resist such attacks, many defenses have been proposed in recent years. However, these defenses incur high hardware overhead, degrade reliability and are ineffective against advanced machine learning attacks. To address these issues, we propose a dynamic multi-key-selection obfuscation for strong PUFs (DMOS-PUF) to resist machine learning attacks. The basic idea is that several stable responses are derived from the PUF itself and pre-stored as the obfuscation keys in the testing phase, and then a true random number generator is used to select any two keys to obfuscate challenges and responses with simple XOR operations. When the number of challenge-response pairs (CRPs) collected by the attacker exceeds a given threshold, the obfuscation keys are updated immediately. In this way, machine learning attacks can be prevented with extremely low hardware overhead. Experimental results show that for a 64×64 Arbiter PUF, when 32 obfuscation keys are used and even if 1 million CRPs are collected by attackers, the prediction accuracies of logistic regression, support vector machines, artificial neural network, convolutional neural network and covariance matrix adaptive evolutionary strategy are about 50%, which is equivalent to random guessing.
I. INTRODUCTION

A. Background and Motivation
Internet of things (IoT) is the network of physical devices, vehicles, home appliances and other items embedded with electronics, software, sensors, actuators, and connectivity that enables these objects to connect and exchange data. According to IHS forecast [?] , the global installed IoT devices will increase from 15. the sustainable development of the IoT. Secret key storage and device authentication are the two key technologies to address IoT security issues. Traditional security mechanisms store secret keys in electrically erasable programmable read-only memory (EEPROM) or battery-backed non-volatile static random access memory (SRAM), and implement information encryption and authentication with cryptographic algorithms. However, in many IoT applications, resources such as CPU, memory and battery power are so limited that the devices cannot afford classic cryptographic security solutions. Therefore, lightweight solutions for IoT security are urgently needed.
Physical unclonable function (PUF) is an alternative solution for low-cost key generation and device authentication. It is a physical entity that is embodied in a physical structure and is easy to manufacture and evaluate but practically impossible to duplicate, even with the exact same manufacturing process [2] . In the past decade, intensive study has focused on PUFs, and many PUF structures have been proposed, such as Arbiter PUF [3] - [5] , SRAM PUF [6] and ring oscillator (RO) PUF [7] . These PUFs can be classified into strong PUFs [4] , [10] , [13] , [29] and weak PUFs [6] , [7] , [27] , [28] . Weak PUFs exhibit only a small number of challenge-response pairs (CRPs), which can be used as a device-unique key or seed for conventional encryption systems. On the other hand, strong PUFs such as Arbiter PUF can provide a huge number of unique CRPs, which makes strong PUFs suitable for lightweight device authentication. However, current strong PUFs are vulnerable to machine learning (ML) attacks, in which attackers collect a certain number of CRPs from the communication channel to model (clone) the PUF structure [8] . For example, for a 64×64 Arbiter PUF, the prediction accuracy of the trained soft model can reach up to 99.9% when 18050 CRPs are used. The cloned soft PUF exhibits almost the same challenge-response behavior as the hardware one.
B. Limitations of Prior Art
In order to resist ML attacks, many defenses have been proposed; they can be roughly classified into structural non-linearization and CRP obfuscation. Structural non-linearization methods [3] , [9] , [10] implement a nonlinear mapping of CRPs by designing specific non-linear PUF structures. However, the vast majority of existing ML-resistant PUFs largely reduce the reliability of responses and can still be modeled with high accuracy [8] , [12] . CRP obfuscation methods [7] , [13] - [17] prevent ML attackers from collecting enough effective CRPs to model the PUF by obfuscating the mapping of CRPs. However, current CRP obfuscation methods share several weaknesses: 1) vulnerability to advanced ML attacks such as CMA-ES [18] , [19] ; 2) prohibitively expensive obfuscation structures such as hash functions; 3) reduced reliability of strong PUFs.
arXiv:1806.02011v3 [cs.CR] 7 Dec 2018
C. Our Contributions
To overcome the limitations of prior art, this paper proposes a universal, low-overhead dynamic multi-key-selection obfuscation for strong PUFs (DMOS-PUF) that does not reduce reliability. In addition, a DMOS-based authentication protocol is proposed. The main contributions of this paper are as follows.
1) Universality. The DMOS proposed in this paper can be used for all strong PUFs to resist ML attacks.
2) Low overhead. The DMOS just uses XOR logic, a true random number generator (TRNG) and several stable responses derived from the PUF itself to obfuscate the mapping of CRPs, which incurs negligible hardware overhead.
3) No effects on reliability. Challenges and responses are obfuscated with the keys by bitwise XOR operation, and the obfuscation keys are the stable CRPs derived from the PUF. Therefore, the reliability of PUFs will not be reduced.
4) High efficiency. We have evaluated five ML attacks including logistic regression (LR), support vector machines (SVM), artificial neural network (ANN), convolutional neural network (CNN) and covariance matrix adaptive evolutionary strategy (CMA-ES). The experimental results show that the prediction accuracies are about 50% even if 1 million CRPs are collected by attackers when 32 keys are used.
5) Resist all ML attacks. The DMOS selects any two keys randomly from the key set to obfuscate the mapping of CRPs. When the number of CRPs collected by attackers exceeds the preset threshold, the keys are updated immediately. In this way, DMOS is able to resist existing ML attacks effectively.
The rest of this paper is organized as follows. Related work is elaborated in Section II. Section III introduces some related definitions, concepts and terminologies. Section IV gives a detailed introduction to our proposed DMOS. Experimental results and analysis are reported in Section V. In Section VI, we compare the DMOS with several recently proposed defenses. Finally, a conclusion is given in Section VII.
II. RELATED WORK
In 2004, Lee et al. [4] demonstrated for the first time that ML attacks are a great threat to strong PUFs. Later, several ML attacks were proposed to model various strong PUFs [8] , [12] . In order to resist such attacks, many defenses have been proposed in recent years. These defenses can be roughly classified into structural non-linearization and CRP obfuscation.
A. Structural Non-linearization
Structural non-linearization implements nonlinear PUF structures to obstruct ML-based modeling. Feed forward Arbiter PUF (FFA PUF) [3] , current mirrors PUF [9] and voltage transfer PUF [10] are typical non-linear PUF structures. In the FFA PUF structure, the racing results of the previous multiplexer stages are fed forward to one or more of the following select lines (challenge bits), thus making the Arbiter PUF structure non-linear. The current mirror PUF transmits the current through identical non-linear current mirrors, while the voltage transfer PUF implements a nonlinear voltage transfer function. These non-linear PUF structures can resist traditional ML attacks such as LR effectively. However, the reliability is reduced due to the non-linearization of the PUF structure. In addition, these non-linear PUF structures are vulnerable to ES-based modeling attacks.
B. CRP Obfuscation
CRP obfuscation can hide the mapping of CRPs to prevent attackers from collecting valid CRPs to model strong PUFs. Some typical obfuscation methods have been proposed to obfuscate challenges and/or responses with XOR gates, hash functions and random bits.
1) XOR gates: XOR is a simple and efficient obfuscation method against machine learning attacks. Suh et al. [7] proposed an XOR Arbiter PUF in which the outputs of multiple parallel Arbiter PUF structures are XORed together to generate a more secure 1-bit response. XOR Arbiter PUF improves the resistance to machine learning because the mapping of CRPs is obfuscated with the XOR gates, at the expense of reduced reliability and high hardware overhead. Wang et al. [11] proposed a feedback structure that XORs the PUF response with the challenge, but it ignores the reliability of PUF responses. Majzoobi et al. proposed the lightweight secure PUF [13] , where a complex challenge mapping that derives the individual challenges from a global challenge is applied to multiple parallel Arbiter PUF structures, and then multiple individual responses are XORed to produce a multi-bit response. Unfortunately, these XOR-based obfuscation methods have been broken by SVMs and ES when the number of parallel PUFs is no more than eight [8] , [12] , [22] .
2) Hash functions: Controlled PUF [14] obfuscates the mapping of CRPs with a hash function. Since both challenges and responses are hashed, the real challenges and responses cannot be accessed. However, the hash and error-correction code (ECC) blocks introduce significant area and power overhead, and the use of helper data makes the PUF vulnerable to side-channel attacks [18] , [22] . In order to reduce overhead, Gao et al. [15] proposed a PUF-FSM obfuscation method that removes the hash logic on challenges and replaces the ECC unit with an FSM. However, the hash logic for responses still incurs large overhead, and the PUF-FSM obfuscation has been broken by a variant of CMA-ES [19] .
3) Random bits: Yu et al. proposed a lockdown technique [24] which uses a query mechanism to restrict the number of CRPs available to attackers for cloning the PUF. However, the number of CRPs for authentication is limited. Majzoobi et al. [16] proposed a Slender PUF that randomly selects a substring of the response and pads it with a random binary string so that its length is the same as the full response; the server then exploits a recovery method to match the randomly selected substring and authenticate the device. Ye et al. [17] proposed an RPUF that randomizes challenges with a random number generator (RNG) before inputting them to the strong PUF to prevent ML attacks. However, Slender PUF and RPUF are vulnerable to advanced ML attacks such as CMA-ES [18] , [19] .
As discussed above, existing structural non-linearization methods and the vast majority of CRP obfuscation methods degrade the reliability of PUFs. CRP obfuscation incurs prohibitive overhead. Moreover, structural non-linearization and CRP obfuscation are not completely immune to advanced ML attacks. This paper proposes a low-overhead dynamic multi-key obfuscation technique to resist machine learning attacks without reducing the reliability.
III. PRELIMINARIES
This section will introduce some terminologies and concepts used in this paper and some more detailed definitions will be given when necessary. Throughout this paper, we employ the symbols and terminology shown in Table I .
A. Notation
1) HD, FHD, mean HD
Hamming distance (HD). For the L-bit binary strings X and Y, the HD between X and Y is defined as:
$$HD(X, Y) = \sum_{i=1}^{L} X_i \oplus Y_i \tag{1}$$
Fractional Hamming distance (FHD). The fractional Hamming distance between X and Y is defined as:
$$FHD(X, Y) = \frac{HD(X, Y)}{L} \tag{2}$$
Mean of pairwise HD. Given a set C containing multiple binary strings, the mean of pairwise HD of C is defined as:
$$\mu_{HD}(C) = \frac{1}{|C|\,(|C| - 1)} \sum_{i \neq j} HD(C_i, C_j) \tag{3}$$
where the binary strings C_i ∈ C, C_j ∈ C and i ≠ j.
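The three distance measures can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function names and the 4-bit example strings are hypothetical, and bit strings are represented as Python `str` values of `0`/`1` characters.

```python
from itertools import combinations


def hd(x: str, y: str) -> int:
    """Hamming distance: number of positions where two L-bit strings differ."""
    if len(x) != len(y):
        raise ValueError("bit strings must have the same length")
    return sum(a != b for a, b in zip(x, y))


def fhd(x: str, y: str) -> float:
    """Fractional Hamming distance: HD normalised by the string length L."""
    return hd(x, y) / len(x)


def mean_pairwise_hd(strings) -> float:
    """Mean HD over all pairs of distinct strings in the set."""
    pairs = list(combinations(strings, 2))
    return sum(hd(a, b) for a, b in pairs) / len(pairs)


responses = ["1011", "1001", "0110"]  # hypothetical 4-bit responses
print(hd("1011", "1001"))    # 1
print(fhd("1011", "0110"))   # 0.75
print(mean_pairwise_hd(responses))
```

Averaging over unordered pairs, as done here, gives the same value as averaging over all ordered pairs with i ≠ j, because HD is symmetric.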
2) Inter-HD, Intra-HD, P̂_inter, P̂_intra
Inter-HD and intra-HD are used to describe the statistical characteristics of PUF responses. The definitions of inter-HD and intra-HD are as follows.
Inter-HD. Inter-HD indicates the HD between the responses generated by two different PUF instances when the same challenge is input. Inter-HD is used to measure the uniqueness of PUF.
$$\text{Inter-HD} = HD(R_1, R_2) \tag{4}$$
where R_1 and R_2 are generated by any two different PUF instances when inputting the same challenge.
Intra-HD. Intra-HD indicates the HD between the responses generated by the same PUF instance when the same challenge is input. Intra-HD is used to measure the reliability of PUF.
$$\text{Intra-HD} = HD(R_X, R_Y) \tag{5}$$
where R_X and R_Y are generated by the same PUF instance when inputting the same challenge in different environments. Since both inter-HD and intra-HD distributions follow a binomial distribution, B(n, p), the binomial probability estimators of the inter-HD and intra-HD distributions are denoted P̂_inter and P̂_intra, where P̂_inter is the probability of R_1 = R_2 and P̂_intra is the probability of R_X = R_Y.
B. Strong PUFs
Strong PUFs can generate a large number of CRPs, scaling exponentially with the required IC area [25] . Arbiter PUF [4] , lightweight secure PUF [13] and current mirror PUF [10] are typical strong PUFs. Among them, Arbiter PUF [4] is the most popular one.
The structure of the original Arbiter PUF is shown in Fig. 1. Two parallel n-stage multiplexer chains share the same input, and their outputs are connected to a flip-flop's D input and clock input, respectively. A step input signal T is applied at the input side, and the select lines of the multiplexer chain form the challenge input bits C_1 ∼ C_n. The selection signal C_i determines whether the step signals at stage i of the multiplexer chain continue along their original chains or are interchanged between the two parallel chains. The delay difference between the upper and lower multiplexer chains determines whether the step signal first reaches the D input or the clock input of the flip-flop, resulting in logic 1 or logic 0 being latched, respectively. The latched value is the 1-bit response. This Arbiter PUF structure only generates a 1-bit response for each challenge. However, multi-bit responses are required in practical applications. There are two ways to generate multi-bit responses. One is to use a linear feedback shift register (LFSR) or a module with similar function to extend the challenge, which is set as the seed of the random number generator to generate multiple sub-challenges. These sub-challenges are then input to the PUF circuit one by one to generate a multi-bit response. The other is to input multiple challenges to the Arbiter PUF one by one to get a multi-bit response, which incurs high hardware overhead. This paper uses multiple challenges to generate multi-bit responses.
The functionality of Arbiter PUF can be represented by an additive linear delay model [8] , [26] , [31] . When modeling an Arbiter PUF, the total delay of the signals is the cumulative sum of the delay in each stage. In this model, we can define the final delay difference ∆ between the upper and the lower path as:
$$\Delta = \omega^{T} \phi \tag{7}$$
where ω = {ω_1, ω_2, ..., ω_n, ω_{n+1}}, the dimensions of ω and φ are both n + 1, and the feature vector φ is a function of the n-bit challenge [8] [26] [31] . The parameter vector ω represents the delay of each stage in an Arbiter PUF:
$$\omega_1 = \frac{\sigma_1^{0} - \sigma_1^{1}}{2}, \qquad \omega_i = \frac{\sigma_{i-1}^{0} + \sigma_{i-1}^{1} + \sigma_i^{0} - \sigma_i^{1}}{2} \ (i = 2, \ldots, n), \qquad \omega_{n+1} = \frac{\sigma_n^{0} + \sigma_n^{1}}{2} \tag{8}$$
where σ_i^{0/1} denotes the delay of the multiplexer M_i; σ_i^1 means that the signal is crossed in M_i, while σ_i^0 means uncrossed. In addition,
$$\phi(C) = \left(\phi_1(C), \ldots, \phi_n(C), 1\right)^{T} \tag{9}$$
where
$$\phi_l(C) = \prod_{i=l}^{n} (1 - 2C_i), \qquad l = 1, \ldots, n.$$
The output t of the Arbiter PUF is determined by the sign function acting on the total delay difference. For convenience, we replace t = 0 with t = −1:
$$t = \mathrm{sgn}(\Delta) = \mathrm{sgn}(\omega^{T} \phi) \tag{10}$$
Eqn. (10) indicates that the vector ω determines a separating hyperplane ω^T φ = 0 in the space of feature vectors. When t = −1, the feature vector lies on one side of the hyperplane. Conversely, when t = 1, it lies on the other side. Hence, the response of Arbiter PUF can be predicted once the hyperplane is obtained.
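The additive delay model can be simulated directly. The sketch below assumes the common parity-style feature map from [8], in which each feature is the product of (1 − 2·C_i) over the remaining stages plus a constant term, and it draws the stage delays from a standard normal distribution. The names `features` and `response` and the random parameters are illustrative assumptions, not part of the paper.

```python
import random


def features(challenge):
    """Map an n-bit challenge (list of 0/1) to the (n+1)-dimensional vector phi."""
    n = len(challenge)
    phi = [1] * (n + 1)              # phi[n] is the constant term
    prod = 1
    for l in range(n - 1, -1, -1):   # suffix products of (1 - 2*C_i)
        prod *= 1 - 2 * challenge[l]
        phi[l] = prod
    return phi


def response(omega, challenge):
    """Sign of the total delay difference, with t = 0 replaced by t = -1."""
    delta = sum(w * p for w, p in zip(omega, features(challenge)))
    return 1 if delta > 0 else -1


rng = random.Random(0)
n = 64
omega = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]   # hypothetical stage delays
challenge = [rng.randint(0, 1) for _ in range(n)]
print(response(omega, challenge))
```

Because the response depends on the challenge only through a linear function of φ, any learner that can fit a linear classifier in this feature space can clone the PUF, which is what the attacks in the next subsection exploit.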
C. ML Attacks
1) Logistic Regression (LR)
LR [32] is a frequently used supervised ML method. When LR is used to model the Arbiter PUF, each challenge C = {C_1, ..., C_n} is given a probability P(C, t|ω) that it generates a response t (t ∈ {−1, 1}). As a technical convention, t ∈ {0, 1} is replaced by t ∈ {−1, 1}. Since the vector ω denotes the delays of the subcomponents (stages) in the Arbiter PUF, the probability P(C, t|ω) is obtained by the logistic sigmoid function acting on f(ω) = ω^T φ:
$$P(C, t \mid \omega) = \left(1 + e^{-t f(\omega)}\right)^{-1}$$
For the training set M, the parameter vector ω is adjusted to determine the decision boundary that minimizes the negative log-likelihood:
$$\hat{\omega} = \arg\min_{\omega} l(M, \omega) = \arg\min_{\omega} \sum_{(C, t) \in M} -\ln P(C, t \mid \omega)$$
As there is no suitable way to find the optimal ω directly, an iterative method such as the gradient descent algorithm is used to solve this problem, based on the gradient:
$$\nabla l(M, \omega) = \sum_{(C, t) \in M} t \left(P(C, t \mid \omega) - 1\right) \phi$$
We have tested several optimization methods including standard gradient descent, iterative reweighted least squares and RProp [32] [33] , where RProp gradient descent works best in LR. The classification object of LR is not required to be linearly separable, but the loss function must be differentiable.
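As an illustration of the LR attack, the following sketch trains a logistic regression model on simulated CRPs. It deliberately uses plain full-batch gradient descent rather than RProp, a small 16-stage simulated PUF instead of the 64-stage device, and hypothetical values for the learning rate, epoch count and CRP counts. It should be read as a toy demonstration of the principle under those assumptions, not as the experimental setup of this paper.

```python
import math
import random


def features(challenge):
    """Parity feature vector phi of the additive delay model (assumed form)."""
    n = len(challenge)
    phi = [1] * (n + 1)
    prod = 1
    for l in range(n - 1, -1, -1):
        prod *= 1 - 2 * challenge[l]
        phi[l] = prod
    return phi


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_lr(crps, n, epochs=60, rate=2.0):
    """Minimise the negative log-likelihood with full-batch gradient descent."""
    omega = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for phi, t in crps:
            g = t * (sigmoid(t * dot(omega, phi)) - 1.0)
            for k, p in enumerate(phi):
                grad[k] += g * p
        omega = [w - rate * g / len(crps) for w, g in zip(omega, grad)]
    return omega


def accuracy(omega, crps):
    hits = sum((1 if dot(omega, phi) > 0 else -1) == t for phi, t in crps)
    return hits / len(crps)


rng = random.Random(1)
N_STAGES = 16                                  # toy size, not the 64-stage PUF
true_omega = [rng.gauss(0.0, 1.0) for _ in range(N_STAGES + 1)]


def make_crps(count):
    """Generate labelled CRPs from the simulated PUF."""
    crps = []
    for _ in range(count):
        phi = features([rng.randint(0, 1) for _ in range(N_STAGES)])
        crps.append((phi, 1 if dot(true_omega, phi) > 0 else -1))
    return crps


train_set, test_set = make_crps(1000), make_crps(500)
model = train_lr(train_set, N_STAGES)
print(f"held-out accuracy: {accuracy(model, test_set):.3f}")
```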
2) Support Vector Machines (SVM)
SVM [32] can perform binary classification and solve the classification tasks by mapping known training instances into a higher-dimensional space. The goal of SVM training is to find the most suitable separation hyperplane and solve the nonlinear classification tasks that cannot be linearly separated in the original space. The separation hyperplane should keep the maximum possible distance from all vectors of the different classes. The vector with the smallest distance to the separation hyperplane is called the support vector. The separation hyperplane is constructed from the two parallel hyperplanes through the support vectors of the different classes. The distance between these hyperplanes is called the margin. The key to constructing a good SVM is to maximize the margin while minimizing classification errors, and the whole process is regulated by the regularization coefficient λ.
In well-trained SVMs, kernel functions are often used to solve the problem of support vector selection and classification. There are three frequently-used kernel functions: 1) linear: K(w, z) = z^T w (only solves linearly separable problems); 2) radial basis function (RBF): K(w, z) = exp(−‖w − z‖^2 / σ^2); 3) multi-layer perceptron (MLP): K(w, z) = tanh(α z^T w + β).
Training a good SVM classifier always requires tuning the regularization coefficient λ and σ^2 (RBF) or (α, β) (MLP). In our experiments, we use the SVM with the RBF kernel function to model the PUF.
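For concreteness, the three kernels can be written as small Python functions. This sketch assumes the usual textbook forms, exp(−‖w − z‖²/σ²) for the RBF kernel and tanh(α·zᵀw + β) for the MLP kernel. The function names and default parameter values are hypothetical and are not the settings used in the experiments.

```python
import math


def linear_kernel(w, z):
    """K(w, z) = z^T w."""
    return sum(a * b for a, b in zip(w, z))


def rbf_kernel(w, z, sigma2=1.0):
    """K(w, z) = exp(-||w - z||^2 / sigma^2); sigma2 is the tunable width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(w, z))
    return math.exp(-sq_dist / sigma2)


def mlp_kernel(w, z, alpha=1.0, beta=0.0):
    """K(w, z) = tanh(alpha * z^T w + beta)."""
    return math.tanh(alpha * linear_kernel(w, z) + beta)


u, v = [1, -1], [1, 1]             # two toy feature vectors with entries in {-1, 1}
print(linear_kernel(u, v))         # 0
print(rbf_kernel(u, u))            # 1.0
print(rbf_kernel(u, v, sigma2=4.0))
```

An SVM trained with one of these kernels only ever sees the data through `K`, so switching from the linear to the RBF kernel changes the decision boundary from a hyperplane in feature space to a nonlinear surface without changing the training procedure.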
3) Convolutional Neural Network (CNN)
CNN [34] has been used widely in image classification and has achieved great success in graphic recognition such as handwritten digits [35] and traffic signs [34] . CNN finds the association model between images and classifications by learning the training set. A CNN consists of an input and an output layer, as well as multiple convolutional layers, pooling layers and fully connected layers. When classifying a target, multiple convolutional layers and pooling layers are required and they are arranged alternately. Each neuron in the convolutional layer is connected to its input locally, and each connection is assigned a weight value. The output value of the neuron is calculated as the weighted sum of the corresponding local inputs plus the bias value. The advantage of CNN lies in the automatic extraction of features from the original pixels to the final classification, which helps to model a PUF without understanding the characteristics (delays) of the PUF.

Fig. 2. CRPs data transformation and extension (panels: raw data, transformed data, extended data)

In order to model a PUF with CNN, we need to transform and extend the original training data (CRPs). Taking a 4-stage Arbiter PUF for example (see Fig. 2(a)), when the original challenge C_1 C_2 C_3 C_4 = 1011, the response is 1. In the process of data transformation, the challenge bits C_1 ∼ C_4 are read from right to left; '1' indicates that the signal is on the path to a_1, and '-1' indicates that the signal is on the path to a_−1. Therefore, in the training data, we use the path indicator (1 or -1) instead of the challenge (0 or 1). It should be noted that X_4 is always on the path to a_1 and Y_4 is always on the path to a_−1. As shown in Fig. 2(b), the dimension of the transformed data is changed from 1×4 to 2×4. Transformed data make it easier for the CNN to find the mapping relationship between the challenge and the path and thus to establish the PUF model. However, since the transformed data do not have a spatial association like image pixels, extracting them directly using a convolutional layer may lose characteristics. Therefore, we need to further expand the transformed data. As shown in Fig. 2(b), the 2×4 transformed data is further expanded to the 4×4 extended data. Additionally, unlike CNNs for image recognition, compressing the convolved data in pooling layers is not applicable when modeling a PUF. Besides, since the pixel value of the PUF is -1 or 1, we also need to adjust the sigmoid layer to facilitate the processing of convolved data. In this way, CNN can model the PUF instance more easily.
4) Artificial Neural Network (ANN)
ANN is a self-adaptive learning system composed of interconnected computing nodes called neurons. A strong motivation to use ANN is given by the universal approximation theorem: a two-layer neural network containing a limited number of hidden neurons can fit any function with high accuracy [37] .
The simplest neural network consists of a layer with several neurons, called the single layer perceptron (SLP) [37] . For each neuron, all inputs are weighted, summed, biased and passed through an activation function to generate an output. In the SLP training process, the neuron updates its weights and bias based on a linear feedback function of the prediction error on the training set. When the training reaches the preset termination condition, e.g., the preset number of iterations, the training process is stopped. SLP can only solve problems that are linearly separable, and non-linear problems require multi-layer ANNs. In our experiments, we use a 3-layer ANN with the sigmoid activation function to model the PUF, where the first layer has 35 nodes, the second layer has 25 nodes, and the third layer has 25 nodes. In addition, the loss function of the ANN is minimized with the RProp iterative method due to its fast convergence speed. The core parameters adjusted to build an accurate ANN are the number of layers, the number of neurons in each layer, the activation function and the optimization algorithm.
5) Evolutionary Strategy (ES)
ES [20] is inspired by genetics and evolutionary theory. ES is to generate different children randomly through the parent, and retain the best performing child as the parent of the next generation and then keep the cycle going. As the next generation inherits the best genes of the previous generations, species continue to evolve.
Since a PUF instance can be represented by the delay vector ω, the goal of modeling PUF with ES is to find the parameter vector ω as accurate as possible to simulate the real PUF instance. The key idea is to generate different PUF instances randomly and keep the most suitable PUF instance as the parent of the next generation. Such process will be repeated until the child PUF instance closest to the real PUF is generated. In the next generation, child vectors usually use most of the parent's delay vectors and adopt a few randomly mutated vectors. In the ES algorithm, the most typical mutation method is to add a random Gaussian variable N (0, σ) to each PUF instance.
The modeling accuracy is used to select the most suitable child PUF instance in this paper, and the child with the highest modeling accuracy is considered the most appropriate. Specifically, assume that R and R' are the l responses generated by the physical PUF and the child PUF instance, respectively, when inputting the same challenges. The modeling accuracy A can be obtained by calculating the average HD between the two binary strings R and R':
$$A = 1 - \frac{HD(R, R')}{l}$$
There are many variants of the ES algorithm. The main differences between them are: 1) the number of parents kept in each generation; 2) the way that children derive from the parents; 3) the way to control the random mutation rate σ. There are two general approaches to control σ. One is to reduce the mutation rate σ deterministically, and the other is to adjust σ adaptively according to the current execution performance of the evolutionary algorithm. In this paper, we use the covariance matrix adaptive ES (CMA-ES) to evaluate our proposed DMOS PUF and use the default parameters in [20] . CMA-ES employs a recombination approach where a child instance relies on multiple parent instances. CMA-ES also uses self-adaptation, i.e., the mutation rate σ is not controlled deterministically but adapts itself automatically depending on how the ES algorithm performs. CMA-ES has better performance than the original ES in modeling PUFs.

Fig. 3. The structure of dynamic multi-key obfuscation
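The basic ES loop described above can be sketched as follows. Note that this is a plain (1+λ) evolutionary strategy with a fixed mutation rate σ, not CMA-ES: it has no covariance adaptation, no recombination and no self-adaptation of σ. The simulated 16-stage target PUF, the population size and the number of generations are hypothetical choices made only to keep the example small. Fitness is the modeling accuracy, i.e. the fraction of responses on which the child model agrees with the target.

```python
import random


def features(challenge):
    """Parity feature vector phi of the additive delay model (assumed form)."""
    n = len(challenge)
    phi = [1] * (n + 1)
    prod = 1
    for l in range(n - 1, -1, -1):
        prod *= 1 - 2 * challenge[l]
        phi[l] = prod
    return phi


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def accuracy(omega, crps):
    """Fraction of the l responses that the candidate model reproduces."""
    hits = sum((1 if dot(omega, phi) > 0 else -1) == t for phi, t in crps)
    return hits / len(crps)


def evolve(crps, n, generations=40, children=8, sigma=0.3, seed=0):
    """(1+lambda)-ES: mutate the parent, keep the best child if it is no worse."""
    rng = random.Random(seed)
    parent = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    best = accuracy(parent, crps)
    history = [best]
    for _ in range(generations):
        offspring = [[w + rng.gauss(0.0, sigma) for w in parent]
                     for _ in range(children)]
        score, child = max(((accuracy(c, crps), c) for c in offspring),
                           key=lambda pair: pair[0])
        if score >= best:
            parent, best = child, score
        history.append(best)
    return parent, history


rng = random.Random(2)
N_STAGES = 16
target = [rng.gauss(0.0, 1.0) for _ in range(N_STAGES + 1)]   # simulated PUF
crps = []
for _ in range(300):
    phi = features([rng.randint(0, 1) for _ in range(N_STAGES)])
    crps.append((phi, 1 if dot(target, phi) > 0 else -1))

model, history = evolve(crps, N_STAGES)
print(f"accuracy: {history[0]:.2f} -> {history[-1]:.2f}")
```

Because the parent is only replaced by a child that scores at least as well, the accuracy in `history` never decreases from one generation to the next.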
IV. THE PROPOSED DYNAMIC MULTI-KEY-SELECTION OBFUSCATION
The modeling accuracy is related to the number of CRPs collected by attackers and the complexity of the mapping between challenge and response. Generally, the more complex the mapping is and the fewer CRPs are collected, the lower the modeling accuracy will be. Our goal is to complicate the mapping between challenge and response and to limit the number of CRPs collected by attackers in order to resist modeling attacks. This paper proposes a universal dynamic multi-key obfuscation structure that uses multiple stable responses generated by the PUF itself as the obfuscation keys; two keys are then selected randomly from them by a TRNG and XORed with the strong PUF's challenge and response. Furthermore, the dynamic key-updating mechanism updates the obfuscation keys when the number of CRPs collected by the attacker reaches the preset threshold. In this way, ML attacks can be prevented efficiently.
As shown in Fig. 3 , the structure of dynamic multi-key obfuscation consists of the XOR logic, TRNG, non-volatile memory (NVM) and registers. The concrete working principle of DMOS is as follows.
First, some stable CRPs for the PUF are collected by testing, then m × n CRPs are selected from these stable CRPs, and their challenges are stored in the non-volatile memory on the chip. Second, the stored challenges are input to the PUF circuit one by one, and the generated responses are used as the obfuscation key set, which is temporarily stored in registers. Third, the server generates a challenge set (n challenges) randomly and sends it to the PUF chip, and then the TRNG selects two keys randomly from the registers to XOR with all these challenges and with the response of the PUF, respectively. Finally, the obfuscated response is sent to the server for authentication.
In the DMOS, both the challenges and the response are obfuscated to achieve the best resistance to machine learning. In addition, since challenges and responses are obfuscated with the key by bitwise XOR operation, the stability is not affected. Therefore, the DMOS can not only resist ML attacks effectively, but also address the issue that the reliability of PUFs is reduced in structural non-linearization methods and CRP obfuscation methods. The process of DMOS-based device authentication includes the device-side obfuscation and the server-side authentication.
A. Device-side Obfuscation
Device-side obfuscation includes the preparation phase and the obfuscation phase.
1) Preparation phase: After the PUF is manufactured,
where C ob contains n challenges and all challenges and R ob are n-bit. C ob is stored in the NVM on the chip and CR ob is stored on the server.
2) Obfuscation phase: The server generates a random challenge set [C] (C_1, C_2, ..., C_n). At the same time, the server calculates all possible responses using the parametric PUF model and the stored CR_ob, and each generated response is split into two parts R_1 and R_2. Then, the server sends the challenges and all R_1 to the device for authentication. The DMOS selects a key Key_i from the registers according to a random number generated by the TRNG. Then the set C', generated by XORing [C] with Key_i, is input to the PUF circuit one by one as the real challenges to generate the response R'. Finally, another key Key_j is selected randomly from the registers and XORed with R' to produce the obfuscated response R̃, which is split into two parts R̃_1 and R̃_2. If there is an R_1 that matches R̃_1, R̃_2 is sent to the server for authentication. In this process, the real challenge C' can be expressed as:
$$C' = [C] \oplus Key_i$$
The response R' and R̃ can be expressed as:
$$R' = PUF(C'), \qquad \tilde{R} = R' \oplus Key_j$$
When R̃_2 is sent back to the server, the server verifies whether there is an R_2 that matches it. If one exists, the device is legal; otherwise, the device is illegal.
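The device-side data path can be sketched in Python as follows. This is an illustrative model under several assumptions: the PUF is a simulated Arbiter PUF with toy width n = 8, the key set holds m = 4 keys, Python's `secrets` module stands in for the hardware TRNG, and the class and function names are hypothetical. The optional arguments `i` and `j` exist only so that the key choice can be fixed for inspection.

```python
import random
import secrets

N = 8   # toy challenge/response width (the paper's example uses n = 64)
M = 4   # number of obfuscation keys in the set K


def puf_bit(omega, challenge):
    """Simulated Arbiter PUF: 1-bit response from the additive delay model."""
    delta, prod = omega[-1], 1
    for w, c in zip(reversed(omega[:-1]), reversed(challenge)):
        prod *= 1 - 2 * c
        delta += w * prod
    return 1 if delta > 0 else 0


def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]


class DmosDevice:
    def __init__(self, omega, key_challenges):
        self.omega = omega
        # Each key is the n-bit response to n stored (stable) challenges.
        self.keys = [[puf_bit(omega, c) for c in group] for group in key_challenges]

    def respond(self, challenge_set, i=None, j=None):
        """Return the obfuscated response for a set of n challenges."""
        i = secrets.randbelow(len(self.keys)) if i is None else i
        j = secrets.randbelow(len(self.keys)) if j is None else j
        real = [xor_bits(c, self.keys[i]) for c in challenge_set]   # C' = C xor Key_i
        r_prime = [puf_bit(self.omega, c) for c in real]            # R' = PUF(C')
        return xor_bits(r_prime, self.keys[j])                      # R~ = R' xor Key_j


rng = random.Random(7)
omega = [rng.gauss(0.0, 1.0) for _ in range(N + 1)]


def rand_challenge():
    return [rng.randint(0, 1) for _ in range(N)]


key_challenges = [[rand_challenge() for _ in range(N)] for _ in range(M)]
device = DmosDevice(omega, key_challenges)
challenge_set = [rand_challenge() for _ in range(N)]
print(device.respond(challenge_set))
```

An eavesdropper sees only `challenge_set` and the returned bits, while the challenge actually applied to the PUF and the raw response both depend on keys that never leave the device.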
For example, for a 64×64 DMOS PUF (n = 64), assume that the number of obfuscation keys in the set K is 8 and the server sends the challenge [...] is generated by XORing R' with Key_j, and R̃ will be split into R̃_1 and R̃_2 for authentication. Finally, on the device side, if R̃_1 passes authentication, R̃_2 will be sent to the server for authentication.
B. Server-side Authentication
On the server side, we use the PUF parametric model SPUF_i to generate R_1 and R_2 for matching authentication. Compared with the traditional authentication method that stores all CRPs in a database [7] , [25] , the use of a parametric model can greatly reduce storage overhead and improve the efficiency of server authentication. The whole authentication process on the server side is shown in Fig. 4 .
1) Enrollment phase: For a Device_i, the device identifier id_i is stored in the one-time programmable storage (OPT-S) through the e-fuse technology. At the same time, we build an accurate PUF parametric model SPUF_i with the original CRPs and store the model parameters on the server securely.
2) Authentication phase: First, the device identifier id_i is sent from the device to the server. Second, the server compares Counter_i with N_{min,ε}, the minimum number of CRPs needed to build a model with an error rate ε. If the value of Counter_i reaches the threshold N_{min,ε}, the server sends a key update command to the device (ID = id_i) to update the key set K on the PUF chip. When the server issues a deterministic challenge set [C] and obtains m^2 responses, each of which is split into two parts R_1 and R_2, the server sends [C] and the m^2 R_1 to the device according to the authentication record and updates the CRP counter Counter_i. In this way, it can not only prevent attackers from using already used CRPs to conduct replay attacks [25] , but also update the keys according to the recorded Counter_i to enhance the resistance to ML attacks.
In each authentication event, first, when the server receives the device identifier id_i, it generates an unused challenge set [C] and computes m obfuscated challenge sets C' by XORing [C] with each of the m keys. Second, the m sets C' are input sequentially to the parametric model SPUF_i to generate m responses R_i. Then m^2 responses R are generated by XORing each R_i with each of the m keys, and all m^2 responses R are split into two parts R_1 and R_2. After that, the server sends the challenge set [C] and the m^2 R_1 to the device. On the device side, [C] is input to the PUF chip to generate an n-bit obfuscated response R̃, which is also split into two parts R̃_1 and R̃_2. If there is an R_1 with FHD(R̃_1, R_1) ≤ τ, R̃_2 is sent to the server for authentication. Finally, the server compares each R_2 with the received R̃_2. Among the m^2 possible R_2, if there is one with FHD(R̃_2, R_2) ≤ τ, the authentication is successful; otherwise it fails.
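The message flow of one authentication event can be sketched as follows. The sketch makes several simplifying assumptions that are not from the paper: the PUF is a noise-free simulated Arbiter PUF, the server's parametric model is identical to the device, responses are split into equal halves, the threshold is τ = 0.1, and `random.Random` stands in for the on-chip TRNG. All function names are hypothetical.

```python
import random

N, M, TAU = 16, 4, 0.1   # toy width, number of keys, FHD threshold (assumed)


def puf_bit(omega, challenge):
    """Simulated Arbiter PUF bit from the additive delay model."""
    delta, prod = omega[-1], 1
    for w, c in zip(reversed(omega[:-1]), reversed(challenge)):
        prod *= 1 - 2 * c
        delta += w * prod
    return 1 if delta > 0 else 0


def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]


def fhd(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)


def split(r):
    half = len(r) // 2
    return r[:half], r[half:]


def obfuscated_response(omega, keys, challenge_set, i, j):
    real = [xor_bits(c, keys[i]) for c in challenge_set]
    return xor_bits([puf_bit(omega, c) for c in real], keys[j])


def server_candidates(model, keys, challenge_set):
    """All m^2 possible (R_1, R_2) pairs, one per key combination."""
    m = len(keys)
    return [split(obfuscated_response(model, keys, challenge_set, i, j))
            for i in range(m) for j in range(m)]


def device_reply(omega, keys, challenge_set, r1_list, rng):
    """Device checks the server's R_1 list before releasing its second half."""
    i, j = rng.randrange(len(keys)), rng.randrange(len(keys))
    r1, r2 = split(obfuscated_response(omega, keys, challenge_set, i, j))
    return r2 if any(fhd(r1, c) <= TAU for c in r1_list) else None


def server_verify(r2_list, reply):
    return reply is not None and any(fhd(reply, c) <= TAU for c in r2_list)


rng = random.Random(3)
omega = [rng.gauss(0.0, 1.0) for _ in range(N + 1)]


def rand_bits():
    return [rng.randint(0, 1) for _ in range(N)]


keys = [[puf_bit(omega, rand_bits()) for _ in range(N)] for _ in range(M)]
challenge_set = [rand_bits() for _ in range(N)]
candidates = server_candidates(omega, keys, challenge_set)
r1_list = [c[0] for c in candidates]
r2_list = [c[1] for c in candidates]
reply = device_reply(omega, keys, challenge_set, r1_list, rng)
print(server_verify(r2_list, reply))   # True for the genuine, noise-free device
```

With a real PUF the device response is noisy, which is why the comparison uses an FHD threshold τ instead of exact equality.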
C. Dynamic Key-updating Mechanism
We propose a dynamic key-updating mechanism for DMOS to defend against all potential ML attacks. In this mechanism, when the number of CRPs collected by attackers reaches the minimum number of CRPs required to build a PUF model with an error rate $\epsilon$, the server updates the key set $K$. Based on theoretical considerations (dimension of the feature space, Vapnik-Chervonenkis dimension), it is suggested in [8] that the minimal number of CRPs necessary to model an $N$-stage delay-based Arbiter PUF with a misclassification rate $\epsilon$ can be expressed as:
According to Eqn. (18), to model a 64-stage Arbiter PUF with a prediction accuracy of 95%, the minimum number of CRPs required for attacks is $N^{Arbiter}_{CRP,0.05} \approx 650$. However, for the DMOS Arbiter PUF, when $m$ keys are set for obfuscation, $m^2$ CRPs can be generated for each authentication. Therefore, if attackers want to build a model for the DMOS Arbiter PUF with an error rate $\epsilon$, the minimum number of CRPs that must be collected can be expressed as:
In this case, the probability that the attacker extracts $N^{Arbiter}_{min,\epsilon}$ valid CRPs from $N^{DMOS}_{min,\epsilon}$ CRPs is:
According to Eqn. (20), for a 64-stage DMOS Arbiter PUF, when $m = 2$ and $\epsilon = 5\%$, $N^{DMOS}_{min,\epsilon}$ is 2600, which is much larger than 650 (the minimum number of CRPs required for attackers to build the model for the original 64-stage Arbiter PUF). The probability that the attacker extracts 650 valid CRPs from 2600 CRPs is about $10^{-630}$. In addition, when the number of CRPs collected by the attacker reaches $N^{DMOS}_{min,\epsilon}$, the server sends the key update command to the device so that $K$ is updated on the PUF chip and on the server simultaneously. In this way, the DMOS PUF can resist all potential ML attacks.
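The numbers quoted above can be reproduced with a short calculation. The sketch below rests on three assumptions that are not stated explicitly in the text: the bound from [8] is taken as $0.5 \cdot (N+1)/\epsilon$ (the constant 0.5 is chosen because it yields 650), the DMOS threshold is $m^2$ times the Arbiter threshold, and the extraction probability is modelled as one over the number of ways to choose the valid CRPs. The function names are hypothetical.

```python
import math


def n_arbiter_min(stages, eps):
    """Assumed form of the bound from [8]: 0.5 * (N + 1) / eps."""
    return round(0.5 * (stages + 1) / eps)


def n_dmos_min(stages, eps, m):
    """Assumed DMOS threshold: each authentication exposes m^2 candidate CRPs."""
    return m * m * n_arbiter_min(stages, eps)


def log10_extraction_probability(stages, eps, m):
    """log10 of 1 / C(N_dmos, N_arbiter), an assumed model of the guessing odds."""
    n_arb = n_arbiter_min(stages, eps)
    n_dmos = n_dmos_min(stages, eps, m)
    return -math.log10(math.comb(n_dmos, n_arb))


print(n_arbiter_min(64, 0.05))                    # Arbiter PUF threshold
print(n_dmos_min(64, 0.05, 2))                    # DMOS threshold for m = 2
print(log10_extraction_probability(64, 0.05, 2))  # order of magnitude of the probability
```

Under these assumptions the exponent comes out within a few units of the $10^{-630}$ quoted in the text.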
D. Authentication Threshold
The server accepts a response when its fractional Hamming distance to the expected response does not exceed the authentication threshold $\tau$. Let $n_{tolerance}$ denote the maximum number of bit flips allowed in an $n$-bit PUF response when the server matches; a response with no more than $n_{tolerance}$ bit flips is authenticated successfully. If $\tau$ is set greater than $n_{tolerance}/n$, the PUF responses are authenticated faster; if $\tau$ is set less than $n_{tolerance}/n$, authentication is slower. The server can therefore set $\tau$ flexibly to meet the requirements of the application scenario, such as the authentication time and the security level. If the scenario demands high authentication efficiency, $\tau$ can be set greater than $n_{tolerance}/n$; if it focuses on the security level, $\tau$ can be set less than $n_{tolerance}/n$. The probability of successfully authenticating a legal PUF-embedded device can be estimated as:
where $\hat{p}_{intra}$ is the binomial probability estimator of the intra-HD distribution of the strong PUF. In order to clone a PUF successfully, the prediction accuracy of the cloned PUF model should be higher than $1 - \hat{p}_{intra}$. For a 64×64 Arbiter PUF, $P_{suc}$ is about 99.9% when $n_{tolerance} = 10$ and $\hat{p}_{intra} = 5\%$, which is measured in the worst case for the Arbiter PUF [26]. Therefore, legitimate devices have an extremely high probability of passing authentication when a reasonable threshold $\tau$ is set.
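The quoted success probability can be checked numerically. The sketch assumes that $P_{suc}$ is the binomial probability of observing at most $n_{tolerance}$ bit flips in an $n$-bit response when each bit flips independently with probability $\hat{p}_{intra}$; the function name `p_success` is hypothetical.

```python
from math import comb


def p_success(n, n_tolerance, p_intra):
    """Probability of at most n_tolerance flips among n independent bits."""
    return sum(
        comb(n, i) * p_intra**i * (1 - p_intra) ** (n - i)
        for i in range(n_tolerance + 1)
    )


# 64-bit response, up to 10 tolerated flips, 5% worst-case bit-flip rate
print(p_success(64, 10, 0.05))
```

With these inputs the result is slightly above 99.9%, in line with the value reported for the 64×64 Arbiter PUF.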
E. Security Analysis
1) Brute force attacks: To build a model with accuracy $1 - \epsilon$, attackers need to create a new model based on the previous models for all possible responses generated by the challenge. Therefore, the number of models that attackers need to build to pass the authentication can be estimated by:
For example, when two keys are used for obfuscation, to model a 64-stage Arbiter PUF with an error rate of 5%, the number of models that attackers need to build is as high as $4^{650}$. Therefore, it is practically impossible for attackers to clone the DMOS PUF by brute-force attacks.
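One form of the estimate that is consistent with this example, assuming that each of the $N^{Arbiter}_{min,\epsilon}$ required CRPs can independently be any of the $m^2$ candidates, is the following (the symbol $N_{models}$ is introduced here only for illustration):

```latex
N_{models} = \left(m^{2}\right)^{N^{Arbiter}_{min,\epsilon}},
\qquad
m = 2,\; N^{Arbiter}_{min,0.05} = 650:
\quad
N_{models} = 4^{650} = 10^{650 \log_{10} 4} \approx 2.2 \times 10^{391}.
```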
2) Replacement attacks: In DMOS, the TRNG is used to select two keys randomly from the key set $K$ to XOR with $[C]$ and $R$, respectively. Therefore, for a challenge set $[C]$, the DMOS PUF may generate $m^2$ different responses $R$. However, if attackers replace all challenges in NVM with an identical challenge, the TRNG will effectively always select the same key for obfuscation, which creates a fixed mapping between $[C]$ and $R$ and weakens the obfuscation ability of DMOS. To prevent the replacement attack, we need to ensure that both the average FHD of the keys in the set $K$ and the ratio of '1' in each key are close to 50%. We therefore add an FHD detection module, which consists of XOR gates and an adder, to detect whether $C_{ob}$ has been replaced by checking an average FHD threshold $\theta$. If the average FHD of $C_{ob}$ in NVM is less than $\theta$ ($\theta$ is slightly less than 50%), the system judges that $C_{ob}$ has been replaced and raises an alert.
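The check performed by the FHD detection module can be illustrated in software. The sketch below is a minimal model under assumptions: the average is taken over all pairs of stored words, the words are held as Python integers, and the default $\theta = 0.45$ as well as the names `average_fhd` and `looks_replaced` are hypothetical; the hardware module uses XOR gates and an adder instead.

```python
from itertools import combinations


def fhd(a, b, width):
    """Fractional Hamming distance between two width-bit integers."""
    return bin(a ^ b).count("1") / width


def average_fhd(words, width):
    """Mean pairwise FHD over all stored words (assumed interpretation)."""
    pairs = list(combinations(words, 2))
    return sum(fhd(a, b, width) for a, b in pairs) / len(pairs)


def looks_replaced(words, width, theta=0.45):
    """Flag the stored set when its average FHD falls below theta."""
    return average_fhd(words, width) < theta


stored = [0x00, 0xFF, 0x0F, 0xF0]      # well-spread 8-bit toy values
tampered = [0x3C, 0x3C, 0x3C, 0x3C]    # every entry replaced by the same word
print(looks_replaced(stored, 8))       # False
print(looks_replaced(tampered, 8))     # True
```

Identical entries give an average FHD of zero, so any $\theta$ slightly below 50% flags the replacement.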
3) Probing attacks: The metal wires of the DMOS that generate the delay determining the response can be attached to the upper and lower paths of the strong PUF. Therefore, once the attacker physically probes the internal structure of the DMOS PUF, the response generated by the PUF changes [14] and the entire DMOS PUF structure is destroyed.
4) Reliability-based modeling attacks: Unlike previous ML attacks, recent reliability-based modeling attacks [19] do not require concrete knowledge of the response to a given challenge, only the binary reliability of the response generated from that challenge. With the reliability information of CRPs, attackers can design the fitness function for CMA-ES to model the PUF successfully. If attackers replace all $C_{ob}$ in NVM with an identical challenge, then input $[C]$ (see Fig. 3) to the DMOS PUF repeatedly and observe the reliability of the response, fine-grained reliability information about $[C]$ would leak and facilitate the reliability-based modeling attack. However, if attackers replace $C_{ob}$ with the same challenge in NVM, the authentication is cancelled by the FHD detection module in DMOS. In this case, even if attackers know the reliability information of the challenge, reliability-based modeling attacks cannot succeed because the attackers do not know which response corresponds to the input challenge. If the attacker matches challenges and responses by guessing, then, as analyzed in Section IV.C, for a 64-stage DMOS Arbiter PUF with 2 keys and a prediction error rate of 5%, the probability that the attacker extracts 650 valid CRPs from 2600 CRPs is extremely low (about $10^{-630}$).
V. EXPERIMENTAL RESULTS AND ANALYSIS
A. Experimental Setup
The proposed DMOS is a universal obfuscation architecture that can be used with any strong PUF. We choose the Arbiter PUF to evaluate its resistance to ML attacks. We have collected 1 million CRPs for the Arbiter PUF and the DMOS Arbiter PUF on a Xilinx Artix-7 FPGA. All experiments are conducted on an Intel i5-7400 CPU @ 3.0 GHz with a GTX 1050 GPU and 8 GB of memory.
B. Resistance to ML attacks
We adopt the adversary model used for the Arbiter PUF. We model the original Arbiter PUF and DMOS Arbiter PUF with five ML methods, LR, SVM, ANN, CNN, and CMA-ES.
In the experiments, we conduct ML attacks with LR using the Rprop iterative function [33], an SVM with the RBF kernel [38], a 3-layer ANN, a CNN containing two convolution layers and two connection layers, and CMA-ES with the average Hamming distance as its fitness function. For model training, we randomly divide the CRP data set into a training set (70%), a validation set (20%) and a test set (10%).
All trained models are tested on 10,000 unused CRPs. The experimental results are shown in Figs. 5, 6 and 7.
As shown in Fig. 5, without any protection strategy for the PUF, the modeling accuracy of the five ML attacks can reach 95% when 50,000 CRPs are collected. Since the average bit-flip rate of a 64-stage Arbiter PUF is about 4.8%, a model with this accuracy is hard to distinguish from the real PUF, so attackers who collect a relatively small number of valid CRPs can clone the Arbiter PUF successfully. However, when the DMOS is deployed for the Arbiter PUF, it becomes difficult for attackers to clone it. For example, as shown in Fig. 6, when the number of keys in the set $K$ is 8, the modeling accuracies of the five ML methods are significantly lower than for the original Arbiter PUF, even if $K$ is not updated dynamically. We have tested the effectiveness of the five ML algorithms on a 64-stage DMOS Arbiter PUF. Experimental results show that even if 1 million CRPs are collected, the accuracies are lower than 55%.
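Why an unprotected Arbiter PUF is easy to model can be seen in a small simulation. The sketch below is not the experimental setup of this paper: it assumes the standard linear additive delay model with Gaussian stage weights, simulates a noise-free 64-stage PUF in software, and trains logistic regression with plain stochastic gradient descent instead of Rprop. All function names and parameters are hypothetical.

```python
import math
import random


def parity_features(challenge):
    """phi_i = prod_{j >= i} (1 - 2 * c_j), plus a constant bias feature."""
    phi, prod = [], 1
    for bit in reversed(challenge):
        prod *= 1 - 2 * bit
        phi.append(prod)
    phi.reverse()
    return phi + [1]


def simulate_arbiter_puf(rng, stages):
    """Return a response function for a simulated, noise-free Arbiter PUF."""
    weights = [rng.gauss(0, 1) for _ in range(stages + 1)]

    def respond(challenge):
        delay = sum(w * x for w, x in zip(weights, parity_features(challenge)))
        return 1 if delay > 0 else 0

    return respond


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def train_lr(samples, dims, rng, epochs=20, lr=0.05):
    """Logistic regression trained with plain SGD (not Rprop)."""
    w = [0.0] * dims
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(dims):
                w[i] += lr * err * x[i]
    return w


def accuracy(w, samples):
    """Fraction of samples whose predicted bit equals the true response."""
    hits = 0
    for x, y in samples:
        predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        hits += predicted == y
    return hits / len(samples)


rng = random.Random(0)
STAGES = 64
puf = simulate_arbiter_puf(rng, STAGES)
crps = []
for _ in range(3000):
    c = [rng.randint(0, 1) for _ in range(STAGES)]
    crps.append((parity_features(c), puf(c)))
train, test = crps[:2000], crps[2000:]
model = train_lr(train, STAGES + 1, rng)
print(accuracy(model, test))
```

Even with only 2,000 simulated training CRPs the learned model predicts most unseen responses, which is the weakness that the obfuscation is designed to remove.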
We have evaluated the modeling accuracies of the five ML methods on the 64 × 64 DMOS Arbiter PUF with different numbers of keys using 1 million CRPs. Fig. 7 shows that the modeling accuracies of LR, SVM, ANN, CNN and CMA-ES decrease significantly as the number of keys increases. For example, when the number of keys is 2, the modeling accuracies are lower than 65%; when the number of keys is 8, they are lower than 60%; when the number of keys is 16, they are lower than 55%. When the number of keys is 32, the modeling accuracies are close to 50%, which is equivalent to random guessing. Therefore, the DMOS PUF is able to resist ML attacks effectively.
In addition, since LR performs best among the five modeling attack methods in terms of modeling time and modeling accuracy, we use LR to model DMOS Arbiter PUFs with 32, 64 and 128 stages to evaluate their resistance to ML attacks. The experimental results are shown in Table II. For the original Arbiter PUFs, the modeling accuracy is about 95% when 10,000 CRPs are collected. However, for the DMOS Arbiter PUF with 100,000 collected CRPs, the modeling accuracy is lower than 65% when the number of keys is 2, lower than 60% when the number of keys is 4, and lower than 55% when the number of keys is 8. Therefore, the DMOS Arbiter PUF can resist ML attacks efficiently.
Finally, the DMOS PUF can update the obfuscation keys dynamically, which greatly improves its ability to resist modeling attacks. Once the number of CRPs collected by attackers reaches the threshold $N^{DMOS}_{min,\epsilon}$, the server sends a key update command so that the keys on the PUF chip and on the server are updated synchronously. With dynamic key updating, the DMOS PUF can resist all potential ML attacks.
C. Authentication Capability
Fig. 8 shows an example of the estimated inter-HD and intra-HD distributions of a 64 × 64 Arbiter PUF's responses. The process by which we computed these estimators guarantees that the assumed binomial distributions provide an accurate estimation, in particular for the right tail of the intra-HD distribution and for the left tail of the inter-HD distribution, because these two tails describe two undesirable errors in an authentication application: the false acceptance rate (FAR) and the false rejection rate (FRR). The authentication capability of a PUF can be evaluated with the FAR and the FRR. Given an $n$-bit response, the FAR denotes the probability of incorrectly accepting an unauthorized device. A high FAR obviously introduces security vulnerabilities in authentication. The FRR denotes the probability of rejecting an authorized device. A high FRR results in a low successful authentication rate for authorized devices. When an $n$-bit response is used for authentication, we can conduct a quantitative analysis of the FAR and FRR, which are determined by the uniqueness, the reliability and $n_{tolerance}$ [25], [26].
1) Uniqueness and FAR
Uniqueness is used to evaluate the difference between the responses generated by different PUFs for the same challenge. This paper evaluates the PUF uniqueness with the average Hamming distance:
where $s$ represents the number of PUF instances, and $R_a$ and $R_b$ are two $n$-bit responses generated by two PUF instances $u$ and $v$ for the same challenge. The ideal value for uniqueness is 50%. For an $n \times n$ DMOS Arbiter PUF, the FAR can be expressed as [26]:
where $n_{tolerance}$ is the number of flipped bits allowed in a response when the server matches, and $\hat{p}_{inter}$ denotes the probability that a bit of $R_a$ differs from the corresponding bit of $R_b$. Since the DMOS Arbiter PUF uses multiple challenges to generate an $n$-bit response from one Arbiter PUF, this probability is the rate of differing bits in the two responses. Therefore, $\hat{p}_{inter}$ is equal to the uniqueness of the DMOS Arbiter PUF.
We selected $10^4$ challenges randomly to evaluate the DMOS Arbiter PUFs with 32, 64 and 128 stages. As shown in Table III, the uniquenesses ($\hat{p}_{inter}$) are close to the ideal value of 50%.
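For reference, the two quantities defined in this subsection are commonly written as follows. This is a sketch that assumes the usual definitions (average pairwise fractional Hamming distance for uniqueness, and a binomial left tail for the FAR as in [26]); it uses the symbols introduced above.

```latex
\text{Uniqueness} = \frac{2}{s(s-1)} \sum_{u=1}^{s-1} \sum_{v=u+1}^{s}
  \frac{HD(R_a, R_b)}{n} \times 100\%,
\qquad
FAR = \sum_{i=0}^{n_{tolerance}} \binom{n}{i}\,
  \hat{p}_{inter}^{\,i}\,\left(1 - \hat{p}_{inter}\right)^{n-i}.
```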
2) Reliability and FRR
Reliability is used to evaluate the stability of the PUF responses generated by the same challenge in different environments. Ideally, PUF responses should remain the same for the same challenge over multiple observations. In practice, a variety of environmental conditions, such as temperature, voltage and aging, may change the delay differences in the PUF circuit and cause responses to vary. Since the DMOS Arbiter PUF obfuscates the challenge and the response with the key by a bitwise XOR operation, the DMOS has no effect on the reliability.
Assume that the responses $R_x$ and $R_y$ are generated by the same PUF instance in different environments, and that the bit-flip rate due to environmental variations is $\hat{p}_{intra}$. For an $n$-stage DMOS Arbiter PUF, the FRR is the probability of $FHD(R_x, R_y) > \tau$. The probability of $FHD(R_x, R_y) \le \tau$ is:
Therefore, the FRR can be defined as [26] :
3) Analysis of Authentication Ability
FRR decreases as $n_{tolerance}$ increases, while FAR increases as $n_{tolerance}$ increases. A high FAR or FRR is undesirable for device authentication. Therefore, we want the FAR and FRR to be balanced. Assume there is an $n_{tolerance}$ that makes the FAR and FRR equal; we call this error rate the equal error rate (EER). In this case, $n_{tolerance}$ can be denoted by $n_{EER}$. However, for discrete distributions, there may not be a value that makes FAR and FRR exactly equal. Therefore, $n_{EER}$ and EER [25] can be defined as follows.
In the experiment, $n_{EER}$ and EER are computed for 32, 64 and 128 stages, respectively. $\hat{p}_{inter}$ and $\hat{p}_{intra}$ are measured from the DMOS Arbiter PUF data.
Fig. 9. (a) Controlled PUF structure [14]. (b) PUF-FSM structure [15]. (c) Slender PUF structure [16]. (d) DMOS PUF structure.
As shown in Table III, for the 32×32 DMOS Arbiter PUF, FAR is closest to FRR when $n_{EER} = 6$. In this case, EER is $8.7 \times 10^{-4}$, which is higher than the standard of $10^{-6}$ (the required identification performance of an identification system is determined by its application, but for most practical applications a FAR and FRR both $\le 10^{-6}$, and hence an EER $\le 10^{-6}$, is minimally desired [25]). For the 64×64 DMOS Arbiter PUF, FAR is closest to FRR when $n_{EER} = 13$. In this case, EER is $1.7 \times 10^{-6}$, which meets the standard in practical applications. For the 128×128 DMOS Arbiter PUF, FAR is closest to FRR when $n_{EER} = 27$. In this case, the EER is $8.9 \times 10^{-11}$, which is well suited to practical use.
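The trade-off between FAR and FRR can be explored numerically. The sketch below assumes binomial models for both rates and one common convention for discrete distributions, namely that $n_{EER}$ minimizes the larger of FAR and FRR and that EER is that larger value. The inputs $\hat{p}_{inter} = 0.5$ and $\hat{p}_{intra} = 0.05$ are illustrative assumptions, not the measured values of Table III, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(n, i, p):
    """Probability of exactly i differing bits out of n."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def far(n, n_tolerance, p_inter):
    """False acceptance: another device lands within the tolerance."""
    return sum(binom_pmf(n, i, p_inter) for i in range(n_tolerance + 1))


def frr(n, n_tolerance, p_intra):
    """False rejection: the genuine device exceeds the tolerance."""
    return sum(binom_pmf(n, i, p_intra) for i in range(n_tolerance + 1, n + 1))


def equal_error(n, p_inter, p_intra):
    """Return (n_EER, EER) under the assumed min-max convention."""
    def worst(t):
        return max(far(n, t, p_inter), frr(n, t, p_intra))

    n_eer = min(range(n + 1), key=worst)
    return n_eer, worst(n_eer)


n_eer, eer = equal_error(64, 0.5, 0.05)
print(n_eer, eer)
```

For the 64-bit case these illustrative inputs give the same $n_{EER}$ as reported above and an EER of the same order of magnitude; the exact value depends on the measured $\hat{p}_{inter}$ and $\hat{p}_{intra}$.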
VI. COMPARISONS
In this section, we compare the DMOS PUF with the Controlled-PUF [14] , PUF-FSM [15] and Slender PUF [16] to evaluate hardware overhead and security.
A. Hardware Overhead Comparison
As shown in Fig. 9(a), the Controlled PUF [14] adds two hash circuits to obfuscate both the challenge and the response, which requires an error correction code (ECC) to correct unstable PUF responses. However, the hash circuits incur high hardware overhead, and the ECC unit is also expensive, with a hardware overhead that grows exponentially with the number of error correction bits. These costs make the Controlled PUF difficult to apply to resource-constrained devices.
As shown in Fig. 9(b) , based on the Controlled PUF, the PUF-FSM [15] removes the hash circuit on the challenge side and replaces the ECC unit with the FSM state conversion structure at the response side to reduce the hardware overhead. However, the hash circuit on the response side still consumes considerable hardware resources. Additionally, the PUF-FSM protocol requires transferring more than 160×64 = 10240 challenge bits [19] , which is more expensive than storing or transferring the helper data of a fuzzy extractor [39] .
As shown in Fig. 9(c), the hardware of the Slender PUF [16] consists of 4 parallel 128-stage Arbiter PUFs, a TRNG, a FIFO, an LFSR and control logic. The TRNG in the PUF chip first generates a nonce $nonce_p$. It is then concatenated with the nonce $nonce_v$ received from the server to form a random seed. The seed is used by a pseudo-random number generator (PRNG) to output the challenge $C$, which is input to the PUF. Finally, the Slender PUF randomly selects a sub-sequence $W$ of the response and pads it with a random binary string to create a bitstream $PW$ of the response length for authentication. However, the 4 parallel Arbiter PUFs and the related circuits still incur considerable hardware overhead.
As shown in Fig. 9(d), the hardware overhead of DMOS includes the XOR logic, the TRNG and the NVM. The XOR logic for obfuscation and HD detection consumes 75 LUTs and 46 DFFs. The TRNG is used to select a key from the set $K$ randomly to obfuscate the challenge and the response. Many TRNGs have been implemented on FPGAs [40], [41]. For example, a flip-flop metastability-based TRNG [41] consumes 128 LUTs and 1 DFF on Artix-7 FPGA chips. Nevertheless, the additional hardware overhead incurred by the TRNG can be avoided in practical applications: 1) a TRNG is already present in many systems and hence can be reused; 2) the metastable PUF responses in Arbiter PUFs can be used as random numbers [42].
To further demonstrate the low overhead of the proposed DMOS, we compare it with recent obfuscation methods [14], [15], [16]. Based on a 128-stage Arbiter PUF implemented on Xilinx Artix-7 FPGA chips, the resources (LUT, DFF and RAM) consumed by these structures are summarized in Table IV. A 128-stage DMOS Arbiter PUF consumes only 395 LUTs and 176 flip-flops, and the required NVM is about $2^3 \times 64 \times 64$ bits = 4 KB when $m = 8$. Therefore, the hardware overhead of DMOS is much smaller than that of the other obfuscation structures.
B. Security Comparison
For the Controlled PUF [14], the original challenge and response of the Arbiter PUF are input to the hash circuits to be encrypted. Therefore, attackers cannot obtain the original CRPs to model it. However, the unreliability of the response can be exploited to break the Controlled PUF [12], [30]. Besides, the use of helper data also makes the Controlled PUF vulnerable to the ES attack [18], [22], [23]. Compared with the Controlled PUF, the hash circuit on the challenge side is removed in the PUF-FSM [15]. However, the latest research [19] shows that if attackers design the fitness function according to the output reliability information of the challenge, CMA-ES, with minimal changes to its open-source implementation in Matlab [20], can break the PUF-FSM successfully.
For the Slender PUF [16], it is impossible for attackers to guess a substring $W$ by guessing the indices $ind_1$ and $ind_2$. However, instead of guessing the indices to get effective CRPs, attackers can use the strings $PW$ or $W$ directly as the inputs to a CMA-ES-based ML attack. Therefore, CMA-ES can break the Slender PUF successfully [18].
For our proposed DMOS PUF, challenges and responses are obfuscated by randomly selected keys with the bitwise XOR operation, so the reliability of the strong PUF is not reduced. In addition, as analyzed in Section IV.E.4), attackers do not know which response corresponds to the input challenge, and hence it is difficult to conduct reliability-based modeling attacks on the DMOS PUF. Moreover, with the dynamic update of obfuscation keys, potential ML attacks, including advanced ones such as CMA-ES, can be prevented. As shown in Table V, DMOS has an obvious advantage in resisting machine learning attacks.
VII. CONCLUSION
In this paper, we propose a new obfuscation technique for strong PUFs, named dynamic multi-key-selection obfuscation structure (DMOS). In the DMOS, a true random number generator selects any two keys from the key set, which is derived from the strong PUF's own stable responses, to obfuscate challenges and responses respectively with the XOR operation, which prevents attackers from collecting effective CRPs to perform ML attacks. With dynamic key updating, the DMOS PUF is immune to all ML attacks. In addition, the use of self-stable CRPs incurs low hardware overhead, and the obfuscation structure does not reduce the reliability of strong PUFs. Experimental results demonstrate its strong resistance to ML attacks and its low hardware overhead.

His primary research interests are in the area of embedded systems and VLSI (Very Large Scale Integration) CAD (Computer Aided Design) with focus on low power system design and hardware related security and trust. He studies optimization and combinatorial problems and applies his theoretical discovery to applications in VLSI CAD, wireless sensor network, bioinformatics, and cybersecurity. Dr. Qu has received many awards for his academic achievements, teaching, and service to the research community. He is a senior member of IEEE and serving as associate editor for the IEEE Transactions on Computers, IEEE Embedded Systems Letters and Integration, the VLSI Journal.
