PUFs, which self-generate random numbers, are used in identification and authentication applications for two reasons: cost and security. Since the randomness of PUFs may differ from chip to chip, the PUFs in some chips may generate values that are less than fully random. Manufacturing defects may also degrade the randomness of PUFs. In either case, confidential information based on PUFs could be vulnerable to security threats. Thus, it is necessary to identify, during manufacturing, both failing chips and PUFs that are not sufficiently random. To test the randomness of the PUFs in a chip, we designed a dedicated random test module optimized for hardware implementation. Finally, we verified its validity by implementing the module with real PUFs.
Introduction
A Physical Unclonable Function (PUF) [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] is a function that is impossible to duplicate because it is implemented in a physical structure based on the random mismatching of electric or optical elements. Since the random mismatching is uncontrollable, it is impossible to reproduce or to replicate PUF cells having the same values, even using the same layout design. Each PUF cell has an unpredictable random value, so PUF values may be effectively used as IDs, Personal Identity Numbers (PINs), or secret keys for identification and authentication.
Using PUFs for identification and authentication has two strong points: cost and security. In general identification and authentication systems, certificate authorities generate IDs or keys, which are inserted into each device and stored in internal memory. In these systems, the certificate authorities or servers are required to manage the unique and private information of devices, and each device needs some memory to store its own secret data. On the other hand, PUF-based information is generated by each device and does not use internal memory. This reduces the time and cost of managing the unique or private information. Since the secret information is kept only in the device that generated the information, the risk that this secret information is exposed is reduced. Hence, PUFs can be embedded in some chips to enhance security, e.g. NXP will release smart card ICs with SRAM-based PUFs [13] . PUFs can also substitute security applications such as one time programmable memories used for integrity.
To enhance the safety and reliability of mass producing PUF chips, it is critical to detect defective chips and verify the randomness of PUF values. The primary mission of chip quality control is to ensure that defective products are not released to the end-user. Like most other products, PUFs require inspection for manufacturing defects. Manufacturing defects result from the following causes: the chip departing from its intended design even with good quality control, or defects in the process. This causes a chip to malfunction or to severely degrade the randomness of PUFs. Another manufacturing test for PUF is detecting chips with weak randomness in order to prevent security threats. Chips whose randomness has been previously verified are productized, but in some rare cases the PUF values can be highly skewed towards 1's or 0's. These chips are prone to security attacks, so in the case PUFs are used for security purposes their randomness must be verified in advance.
In general, additional test circuits such as scan chains or Built-in self-test (BIST) techniques are inserted into the target chip for testing purposes. These methods are designed to verify the chip's functionality by checking the corresponding output values from the inputs. However, these methods are not suitable for testing PUFs due to a few reasons. These circuits may expose the value of PUFs, possibly compromising the PUF-based secret data. Also, the randomness of PUFs is not something that can be verified by comparing expected values. Thus, we propose a built-in hardware random test module that is capable of detecting defects in chips and testing randomness of PUFs without exposing the PUF values. This new test module, embedded within a PUF chip, self-operates and exposes only the test results. We can easily identify chips with weak randomness from the results, and these may include defective chips since chips with defects tend to have weak randomness.
We propose to use a dedicated random test module based on FIPS 140-2, a widely used statistical random test suite. FIPS 140-2 can resolve some limitations of other random test methods. Among these methods, the Hamming distance is widely used to evaluate the uniqueness of PUFs [5, 9, 10]. The inter-chip distance is calculated by comparing the PUF values of different chips at the same location. The closer the distance is to 50 %, the better the uniqueness is considered to be. Using this method, the values of certain PUFs can be considered sufficiently distinguishable, i.e. unique. However, this method is only good for inter-chip comparison. Since we plan to embed a test module in each chip, it is not suitable for our purpose. Min-entropy [8] and compression rate [8] have also been used as random tests on PUFs in previous studies. These examine only one property of the target sequence, so they may be inadequate for our goal. In contrast, FIPS 140-2 consists of four different random tests, so various properties are tested rather than a single one. Hence, we chose the FIPS 140-2 standard [14].
FIPS 140-2 is widely used for testing hardware random number generators, but we cannot apply it to PUFs in the same way as to hardware random number generators. A random number generator can dynamically generate a desired number of bits, and the bit sequence can be tested on separate CPUs or PCs. In contrast, the bit sequence of a PUF chip is derived from a fixed number of PUF cells and should not be exposed. Since the size of the input sequence is limited and the test needs to be performed within the chip, we used a modified version of FIPS 140-2 according to reference [15]. The first modification is to recalculate the acceptance regions of the FIPS 140-2 tests. Each test has its own acceptance region, which is adjustable depending on the length of the bit sequence. We need to recalculate the regions so that they are suitable for random tests on PUFs. Second, since the test suite is implemented as a hardware module instead of being computed on a CPU, the computation for the random tests should be simple. The embedded test module is an additional circuit that increases chip area and cost, so the hardware module should be optimized as much as possible. We optimized the method for hardware implementation, implemented the design with PUF cells using a 0.11 μm process library, and analyzed the chip test results.
The remainder of this paper is organized as follows. Section 2 reviews previous PUFs and their properties as background, and Section 3 gives a theoretical explanation of the random test suite used in our design. Section 4 describes how the test module is optimized for hardware implementation. Section 5 explains the implementation method and analyzes the results. Finally, Section 6 concludes the paper.
Physical Unclonable Functions
In this section, we describe the previous PUFs and their properties. There has been extensive research on PUFs since the coating PUF appeared in 1998 [2] . There are several requirements for a PUF that is used for identification or authentication, and it has proven difficult for one PUF to satisfy all of them. Table I lists some previous PUFs going back to 2000. From their identified weaknesses, we can prepare a list of the essential properties of PUFs that are used for identification or authentication. The first PUF [3] in Table I uses the fact that the drain voltage of every CMOS is slightly different. An arbiter PUF [4] uses the random difference of two delay paths. An SRAM PUF [5] uses the unstable initial values of SRAM as a PUF. A coating PUF [6] has an additional coating layer. Its random capacitance leads to random values. A Via PUF [7] is created by a via non-deterministically formed due to the process variation. As shown in Table I , the previous PUFs have weaknesses. Some PUFs have permanent and random values but consist of complicated circuits or require additional processes [6] . Other PUFs have simple structures, but their values are predictable or vulnerable to measurement noise or environmental changes such as variations in temperature or time [3] [4] [5] . Since most of these PUFs are based on variations of electrical characteristics, it is difficult to maintain stable values that are not affected by external factors. In contrast, the Via PUF is based on a process variation, so the Via PUF cells maintain their values and reproduce the same values without error correction codes (ECCs).
Some of the properties as explained above are essential as requirements for commercial use. Since additional materials or circuits, or complicated structures may cause cost increase, simple structure is preferred. The data generated based on PUFs such as IDs or keys should always be the same, i.e. robust for constancy of identification and authentication. PUFs should generate sufficiently random values for the confidential information based on PUFs not to be exposed or predicted. These properties, simplicity, robustness, and randomness are mandatory for PUFs that are to be used for identification or authentication. They all should be satisfied for commercial use.
Since it is quite difficult to meet all the requirements, some PUFs use methods that compensate for their weaknesses. Unstable PUFs use additional circuitry, such as ECCs, or repeatedly measure the value of the same PUF cell to reduce the error rate. PUFs with lower randomness use hashing functions or majority voting, which counts the number of 1s among several bits and determines the result by comparing the count with a specific threshold. However, these methods need additional circuitry, and the actual entropy does not increase. In our research, we wanted to test stable PUFs without using additional circuits, so we use Via PUFs in the rest of this paper.
FIPS 140-2 random test suite
To test the randomness of the implemented PUF cells, we chose FIPS 140-2, which is one of the statistical random test suites. This section explains FIPS 140-2 and the requirements recalculated for hardware implementation.
FIPS 140-2
Of the widely used statistical test suites, such as NIST 800-22 [16] and FIPS 140-2 [14], we chose FIPS 140-2 in view of the fact that it consists of several tests that can be easily implemented in a hardware module. The statistical test suite was removed from the FIPS 140-2 documentation in December 2002, but is still commonly used for testing hardware random number generators [17] [18] [19] [20] .
The FIPS 140-2 statistical test consists of four tests: the mono-bit test, the poker test, the runs test, and the long run test. Each test has its own acceptance region to pass the test, and the regions are adjustable depending on the length of the bit sequence and the level of significance. The level of significance, α, "is the probability that the test will indicate the sequence is not random when it really is random" [16]. α varies with the requirements of the system that needs the random numbers. The acceptance region for each test also has different bounds. For example, [21] defines the acceptance region 9,725 < (the number of 1s in the 20,000-bit sequence) < 10,275 for the mono-bit test when the length of the sequence is 20,000 and α is 0.0001.
Recalculated requirements of the FIPS 140-2
Random number generators can dynamically generate bit sequences, so it is easy to obtain bit sequences that are long enough for random tests. In contrast, our bit sequences are taken from the implemented PUF cells. Since the length of the bit sequence is directly affected by the number of PUF cells, a longer bit sequence requires more PUF cells, and this increases cost. Thus, we needed a test module that could perform random tests with the restricted number of PUF cells. We designed a random test module that needs only a 2,500-bit sequence with reference to [15] . Since common values of α in cryptography are about 0.01 [16] , α is also adjusted to 0.01. The acceptance regions recalculated using the modified length of the bit sequence and α are as follows.
The mono-bit test
The mono-bit test examines whether the distribution of 1s and 0s in the bit sequence is too biased toward either value. The test simply counts the number of 1s and compares the count with the bounds of the acceptance region. If the bit sequence is random, each bit is independently 1 with a probability of 0.5. Each bit is therefore a Bernoulli trial, and the probability $f(x)$ that exactly $x$ of the $n$ bits are 1s is
$$f(x) = {}_{n}C_{x} \left(\frac{1}{2}\right)^{x} \left(\frac{1}{2}\right)^{n-x} = {}_{n}C_{x} \left(\frac{1}{2}\right)^{n}, \tag{1}$$
where ${}_{n}C_{x}$ is the number of distinct patterns when there are $x$ 1s among $n$ bits. The sum
$$\sum_{x=x_0}^{x_1} f(x)$$
is the probability that the number of 1s is between $x_0$ and $x_1$. If the number of 1s does not belong to this region, the test fails. Since α is equal to the probability that the test fails, α is expressed as
$$\alpha = 1 - \sum_{x=x_0}^{x_1} f(x). \tag{2}$$
Since we set α to 0.01, $x_0$ and $x_1$ are calculated by substituting 0.01 for α as follows:
$$\alpha = 1 - \sum_{x=1186}^{1314} f(x) = 0.009866 \cong 0.01. \tag{3}$$
In Eq. (3), when $x_0$ and $x_1$ are 1,186 and 1,314, respectively, α is approximately 0.01. The acceptance region for the mono-bit test is
$$1{,}185 < (\text{the number of 1s in the 2,500-bit sequence}) < 1{,}315. \tag{4}$$
In the designed hardware module, the number of 1s is counted and tested against Eq. (4).
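The recalculation of the mono-bit acceptance region can be checked numerically. The following is a minimal sketch, assuming a symmetric region around n/2 and exact binomial arithmetic; the function names `mono_bit_alpha` and `tightest_region` are illustrative and do not come from the original design.

```python
from fractions import Fraction
from math import comb


def mono_bit_alpha(n, d):
    """Exact probability that the number of 1s in n fair bits lies outside
    [n/2 - d, n/2 + d] (n is assumed to be even)."""
    half = n // 2
    inside = sum(comb(n, x) for x in range(half - d, half + d + 1))
    return float(1 - Fraction(inside, 2 ** n))


def tightest_region(n, alpha):
    """Narrowest symmetric region [x0, x1] whose rejection probability
    does not exceed alpha."""
    d = 0
    while mono_bit_alpha(n, d) > alpha:
        d += 1
    return n // 2 - d, n // 2 + d


x0, x1 = tightest_region(2500, 0.01)
print(x0, x1, mono_bit_alpha(2500, 64))
```

Under these assumptions, the search stops at the half-width of 64 used in the text, i.e. at the region from 1,186 to 1,314.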
The poker test
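In this test, the 4-bit segments of the sequence are examined to verify that their patterns occur uniformly enough. The 2,500-bit sequence is divided into 625 consecutive 4-bit pieces, and the frequency of each of the $2^4$ (= 16) patterns among the pieces is counted. For evaluation, the counted numbers of patterns are expressed as
$$x = \frac{16}{625} \sum_{i=0}^{15} g_i^2 - 625. \tag{5}$$
In Eq. (5), the value $x$ is distributed approximately according to the chi-square distribution with 15 degrees of freedom, where $g_i$ is the frequency of each pattern for $i = 0, 1, 2, \ldots, 15$. The probability density function of the variable $x$ is defined as
$$f(x) = \frac{x^{15/2 - 1} e^{-x/2}}{2^{15/2}\, \Gamma(15/2)}, \quad x > 0. \tag{6}$$
Using Eq. (6), α is expressed as
$$\alpha = 1 - \int_{x_0}^{x_1} f(x)\, dx. \tag{7}$$
By substituting 0.01 for α, Eq. (7) is calculated as follows:
$$\alpha = 1 - \int_{4.601}^{32.801} f(x)\, dx \cong 0.01, \tag{8}$$
so the acceptance region for the poker test is approximately
$$4.601 < x < 32.801. \tag{9}$$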
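As a software reference for the poker statistic, the sketch below computes the chi-square value for k non-overlapping 4-bit segments. The bounds 4.601 and 32.801 are an assumption: they are the tabulated 0.5 % and 99.5 % chi-square quantiles for 15 degrees of freedom, which agree with the integer limits 24,594 and 25,695 used later for the hardware module. The names `poker_statistic` and `poker_passes` are illustrative.

```python
from collections import Counter

# Assumed bounds: tabulated chi-square quantiles (15 degrees of freedom)
# for a two-sided level of significance of 0.01.
CHI2_LOW, CHI2_HIGH = 4.601, 32.801


def poker_statistic(bits):
    """Return x = (16 / k) * sum(g_i^2) - k, where k is the number of
    non-overlapping 4-bit segments and g_i counts each 4-bit pattern."""
    k = len(bits) // 4
    counts = Counter(tuple(bits[4 * j:4 * j + 4]) for j in range(k))
    sum_sq = sum(g * g for g in counts.values())
    return 16 * sum_sq / k - k


def poker_passes(bits):
    """True if the statistic lies strictly inside the assumed region."""
    return CHI2_LOW < poker_statistic(bits) < CHI2_HIGH
```

Note that a perfectly uniform sequence also fails, because its statistic falls below the lower bound.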
The runs test
The runs test is also known as the Wald-Wolfowitz test. It examines the number of runs, where a run is a maximal sequence of consecutive identical bits (all 1s or all 0s). For example, the runs of the bit sequence 010010011 are 0, 1, 00, 1, 00, and 11, and the number of runs is six. The distribution of the number of runs is approximately a normal distribution with the following mean μ and variance σ²:
$$\mu = \frac{2 N_{+} N_{-}}{N} + 1, \tag{10}$$
$$\sigma^2 = \frac{2 N_{+} N_{-} \left(2 N_{+} N_{-} - N\right)}{N^2 (N - 1)} = \frac{(\mu - 1)(\mu - 2)}{N - 1}, \tag{11}$$
where $N_{+}$ and $N_{-}$ are the numbers of 1s and 0s, respectively, and $N$ is the sum of $N_{+}$ and $N_{-}$, i.e. 2,500. To make the calculation easier, we can express α using the probability density function $f(x)$ of a standard normal distribution as follows:
$$\alpha = 1 - \int_{x_0}^{x_1} f(x)\, dx, \qquad f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \tag{12}$$
If we calculate $x_0$ and $x_1$,
$$\alpha = 1 - \int_{-2.575829}^{2.575829} f(x)\, dx = 0.009997 \cong 0.01. \tag{13}$$
According to Eq. (13), the variable $x$ should be between -2.575829 and 2.575829. Since $x$ is the variable of the standard normal distribution, we need to transform the acceptance region into the normal distribution of the number of runs, using $x = (X - \mu)/\sigma$, where the variable $X$ is the number of runs. Thus, the acceptance region of $X$ with the mean μ and the variance σ² is
$$\mu - 2.575829\,\sigma < (\text{the number of runs}) < \mu + 2.575829\,\sigma. \tag{14}$$
In [15], the acceptance region is obtained more easily: Eqs. (10) and (11) are first simplified (Eq. (15)), and the acceptance region as a function of the number of 1s is then obtained (Eq. (16)).
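As a worked illustration, Eqs. (10) and (11) can be reduced to a function of the number of 1s alone. This is one possible simplification, written in terms of an assumed deviation $d = N_{+} - 1{,}250$; it is not necessarily the form used in [15].

```latex
% Assumption: N = 2500, N_+ = 1250 + d, N_- = 1250 - d
\begin{align*}
N_{+} N_{-} &= (1250 + d)(1250 - d) = 1250^2 - d^2, \\
\mu &= \frac{2 N_{+} N_{-}}{N} + 1
     = \frac{1250^2 - d^2}{1250} + 1
     = 1251 - \frac{d^2}{1250}, \\
\sigma^2 &= \frac{(\mu - 1)(\mu - 2)}{N - 1}
          = \frac{\left(1250 - \frac{d^2}{1250}\right)
                  \left(1249 - \frac{d^2}{1250}\right)}{2499}.
\end{align*}
```

For a perfectly balanced sequence (d = 0), this gives μ = 1,251 and σ ≈ 24.995, so the region μ ± 2.575829σ is roughly 1,186.6 < (the number of runs) < 1,315.4. Both μ and σ depend only on d², so the bounds are the same for a surplus of 1s and an equal surplus of 0s.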
The long run test
A long run is a run that exceeds a specific length, and the long run test examines whether the sequence contains a long run. To get the acceptance region for the long run test, we must recalculate the length that defines a long run. The probability $P(l \le L_C)$ that the lengths of all the runs in the sequence are $L_C$ or under is given by the formula of [22] (Eq. (17)). α is expressed as
$$\alpha = 1 - P(l \le L_C). \tag{18}$$
When $L_C = 16$, $P(l \le 16) = 0.9905646$, and $\alpha = 0.0094354 \cong 0.01$. Thus, a run longer than 16 is defined as a long run. The acceptance condition of the long run test is as follows:
$$(\text{the lengths of all runs}) \le 16. \tag{19}$$
According to Eq. (19), the long run test fails if any run is longer than 16.
Proposed random test module
Since the equations recalculated in Section 3 are not suitable for direct hardware implementation, we modified and optimized them and designed them as a hardware module. This section describes the structure of our hardware design and explains the optimization method in detail.
The structure of the random test module
The designed test module performs four random tests on the 2,500-bit input data: the mono-bit test, the poker test, the runs test, and the long run test. The module is shown in Fig. 1.
Optimization for hardware implementation
To implement the test module in the hardware and to reduce the area and time consumption, we modified the equations for the random tests as follows.
The mono-bit test
The mono-bit test checks the number of 1s among the 2,500 bits. To pass this test, the number of 1s should be larger than 1,185 but smaller than 1,315, as shown in Eq. (4). In the designed module, we changed Eq. (4) into:
$$\left| (\text{the number of 1s}) - 1{,}250 \right| \le 64. \tag{20}$$
Since the acceptance requirement of the runs test depends on the number of 1s, Eq. (20) is better suited than Eq. (4) for calculating the acceptance region of the runs test. According to Eq. (20), the module begins counting the number of 1s from -1,250 with the o_mono signal high. When the last 4-bit data is input, the module immediately tests whether the count satisfies Eq. (20). If the count is negative, the module takes its absolute value before testing. The o_mono signal goes low if the resulting value is not bigger than 64, i.e. if the test passes.
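The offset counter described above can be modeled in software. This is a behavioral sketch only, assuming the input arrives as 625 4-bit values; the function name `mono_bit_hw` and the boolean return value (instead of a signal level) are illustrative choices.

```python
def mono_bit_hw(nibbles):
    """Behavioral model of the mono-bit counter.

    The counter starts at -1250 and adds the number of 1s in each 4-bit
    input, so its final value is (number of 1s) - 1250.
    Returns (absolute deviation, passed).
    """
    count = -1250
    for nib in nibbles:
        count += bin(nib & 0xF).count("1")
    deviation = abs(count)          # absolute value taken before the test
    return deviation, deviation <= 64


# 625 inputs of 0101 contain exactly 1,250 ones.
print(mono_bit_hw([0b0101] * 625))
```

Keeping the deviation rather than the raw count is convenient because the same value can index the precomputed bounds of the runs test.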
The poker test
The poker test counts the occurrences of each 4-bit segment 0000, 0001, 0010, ..., 1111 whenever the 4-bit input data i_data is input. After the counting is finished, the test module should check whether the result satisfies Eq. (9), but Eq. (9) is not suitable for hardware implementation because of the squaring operations. Squaring is simple on a general-purpose CPU but needs extra squaring modules in hardware. Eq. (9) is modified as follows:
$$24{,}594 \le \sum_{i=0}^{15} g_i^2 \le 25{,}695. \tag{21}$$
In Eq. (21), some numbers are computed in advance, and all numbers are integers. The equation is simpler than Eq. (9), but it is still not sufficient for hardware implementation because of the squaring operations needed to calculate $g_i^2$. A total of 16 squaring modules are required without resource sharing, or more clock cycles are required with resource sharing. It is also necessary to wait for the complete 2,500-bit sequence to be input before Eq. (21) can be calculated, which consumes additional time. To remove the squaring operations, we transformed Eq. (21) into:
$$24{,}594 \le \sum_{i=0}^{15} \sum_{k=1}^{g_i} (2k - 1) \le 25{,}695, \tag{22}$$
using the fact that the square of any number $k$ can be expressed as the sum of the $k$ smallest odd numbers, e.g. $3^2 = 1 + 3 + 5$ and $4^2 = 1 + 3 + 5 + 7$. Using Eq. (22), we can reduce the area and time necessary for the squaring operations, and instead of waiting for the complete 2,500-bit sequence, the designed module can update the intermediate result of Eq. (22) whenever 4-bit data is input. In addition, when the intermediate sum $\sum_{i=0}^{15} \sum_{k=1}^{g_i} (2k - 1)$ becomes bigger than 25,695, the poker test is stopped, so registers and adders large enough for the worst case are not needed. After the 2,500-bit sequence is input, if the final sum is not smaller than 24,594 and not bigger than 25,695, the result of the poker test, o_poker, becomes low.
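The odd-number accumulation can be sketched as follows. Each time a pattern is seen for the g-th time, the model adds the g-th odd number, 2g - 1, so the running total always equals the sum of the squared counts without any multiplication. The function name `poker_hw` and the list-of-nibbles interface are assumptions made for illustration.

```python
LOWER, UPPER = 24_594, 25_695   # integer bounds quoted in the text


def poker_hw(nibbles):
    """Behavioral model of the squaring-free poker test.

    Returns (accumulated sum, passed). The loop stops early once the sum
    exceeds UPPER, mirroring the early termination described above.
    """
    counts = [0] * 16
    total = 0
    for nib in nibbles:
        counts[nib] += 1
        total += 2 * counts[nib] - 1    # g^2 - (g - 1)^2 = 2g - 1
        if total > UPPER:
            return total, False         # early stop: already out of range
    return total, LOWER <= total <= UPPER


# An all-zero input is rejected after 161 nibbles (161^2 = 25,921).
print(poker_hw([0] * 625))
```

Because of the early stop, the accumulator in this model only ever has to hold a value slightly above 25,695.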
The runs test
The runs test depends on the result of the mono-bit test. It counts the number of runs and compares the count with the lower and upper bounds, which would otherwise have to be calculated dynamically using Eq. (16) depending on the number of 1s. Since Eq. (16) is not simple to evaluate in hardware, we instead set up the lower and upper bounds in advance as parameters for each number of 1s that passes the mono-bit test. Table II shows the lower and upper bounds of the number of runs to pass the runs test. After the 2,500-bit sequence is input, if the mono-bit test has not failed and the number of runs is not smaller than the lower bound and not larger than the upper bound, the test result, o_run, becomes low.
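A table of precomputed bounds of this kind could be generated offline as sketched below. The bounds are derived here from the normal approximation with the standard Wald-Wolfowitz mean and variance and the quantile 2.575829; they are not copied from Table II, and keying the table by the deviation |number of 1s - 1,250| is an assumption made for illustration.

```python
from math import ceil, floor, sqrt

N = 2500
Z = 2.575829   # two-sided standard normal quantile used in the text


def runs_bounds(ones):
    """Smallest and largest integer number of runs strictly inside
    (mu - Z * sigma, mu + Z * sigma) for the given number of 1s."""
    zeros = N - ones
    mu = 2 * ones * zeros / N + 1
    var = 2 * ones * zeros * (2 * ones * zeros - N) / (N * N * (N - 1))
    half = Z * sqrt(var)
    return floor(mu - half) + 1, ceil(mu + half) - 1


# One entry per deviation that passes the mono-bit test (0 to 64).
RUNS_TABLE = {d: runs_bounds(1250 + d) for d in range(65)}


def runs_passes(deviation, runs):
    """Look up the bounds; a sequence that failed the mono-bit test fails."""
    if deviation not in RUNS_TABLE:
        return False
    low, high = RUNS_TABLE[deviation]
    return low <= runs <= high
```

A single table indexed by the absolute deviation is sufficient in this sketch because the mean and variance depend only on the product of the numbers of 1s and 0s.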
The long run test
The long run test checks whether the length of the longest run is greater than 16. This test is simple: it counts the length of each run, and if the length of the run being counted reaches 17, the test is stopped. If the test has not been stopped after the entire input data is processed, the test result, o_lr, becomes low.
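The run-length counter can be modeled bit by bit as follows. This is a behavioral sketch with an illustrative function name, `long_run_hw`; it returns a boolean rather than driving a signal.

```python
def long_run_hw(bits, limit=16):
    """Return True if no run of identical bits is longer than `limit`.

    The run counter restarts at 1 whenever the bit value changes, and the
    test stops as soon as the counter exceeds the limit.
    """
    run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        prev = b
        if run > limit:
            return False    # stopped early: a long run was found
    return True


print(long_run_hw([0, 1] * 1250))   # alternating bits, every run has length 1
```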
Implementation result
We implemented the Via PUF cells with the random test module we designed, using the 0.11 μm process library. In this section, we explain the implemented design, the experimental environment used to verify it, and the experimental results.
The chip implementation
To prove the functionality of the random test module, we implemented it with 2,500 Via PUF cells. The implemented design is shown in Fig. 2. The implemented design includes a PUF circuit, a random test module, and a universal asynchronous receiver/transmitter (UART) module. The PUF circuit generates a 2,500-bit random sequence and transmits it to the random test module. The random test module outputs the test results for the 2,500-bit random sequence. The UART module was inserted to verify the operation of the random test module. Using the UART module, we extracted the raw data from the PUF circuit, performed the random tests ourselves, and verified that the random test module operated properly by comparing our test results with the results of the chips. The UART module was inserted for testing only. It would be excluded from any practical PUF chip because the PUF values could be extracted through it and their confidentiality could be compromised. The areas of the implemented UART module and the random test module are 156 and 1,005 gates, respectively, in the 0.11 μm process library, which means that the overhead is minimal.
The experimental environment
The interface of the implemented chip is very simple, so the experimental environment is easily configured. Figure 4 shows the experimental environment. The environment consists of an FPQ-208-05-10 socket, LEDs, a UART interface, a power supply, and some wires. We used an 80 MHz oscillator and checked the four random test results using four LEDs. If we want to know all the PUF values in a chip, we can extract them through the UART interface.
The experimental results
We performed random tests on the raw data extracted using the UART module and compared the results with the chips' results. Table III shows some of the test results obtained from the raw data; "p" and "f" stand for "pass" and "fail," respectively. Table III lists four cases, each with four test results. Since the lower and upper bounds of the runs test depend on the number of 1s, the result of the runs test is considered "fail" if the chip fails the mono-bit test. This is why the runs test results are "fail" in cases 2-4. In Table III, one chip passed all four tests (case 1). In contrast, some chips passed only some of the tests (cases 2-3) or none of them (case 4). Although the results in Table III were calculated from the extracted values, each chip self-tests the randomness of its own PUFs in the same way, provided that the random test module works.
Table IV compares our test results on the raw data obtained using the UART module (Table III) with the chip results read from the four LEDs. For these four cases and for all other compared chips, the results were the same. This shows that the random test module operates correctly and that each chip can successfully perform the random test using the embedded random test module.
In Table V, the recalculated acceptance regions are compared with the acceptance regions for the 20,000-bit sequence [21]. Because the requirements are recalculated, the test results differ. Since the acceptance region of the runs test depends on the result of the mono-bit test, we excluded the runs test from this comparison. The widths of the acceptance regions of the mono-bit test for the 2,500-bit and 20,000-bit sequences are 130 and 550 bits, respectively, which are 5.20 % and 2.75 % of the length of each bit sequence.
If the proportion of 1s in the bit sequence is 52 %, the 2,500-bit sequence passes the mono-bit test, but the 20,000-bit sequence fails it. If X = 40 in the poker test, the 2,500-bit sequence fails the poker test, while the 20,000-bit sequence passes it. In case 4, the length of the longest run is 22, and the result is "fail", but the sequence would pass the long run test if its length were 20,000 bits.
Conclusions
In order to meet the appropriate security level, PUFs used to generate unique IDs or secret keys should produce sufficiently random results. Some PUFs are implemented with additional circuitry such as a fuzzy extractor [10] [11] [12] to generate random numbers. Using a hashing function, they regenerate random numbers even though the raw values of the PUFs are not sufficiently random. Strictly speaking, a hashing function does not increase the entropy; it only mixes the bits. We also wanted to generate random numbers without additional circuitry such as a hashing function, because such circuitry consumes more area and more time to generate random numbers. In addition, we wanted to identify bad chips whose PUFs are not sufficiently random, including defective chips, in order to guarantee the security strength of the systems or devices that use the PUFs. To distinguish chips with low randomness, each chip with PUFs should include a dedicated module that can perform the random tests. Due to chip area requirements and cost, the test module can use only a limited number of PUF cells. We designed a dedicated random test module that requires only a 2,500-bit sequence, and we optimized it for hardware implementation. We implemented the test module with Via PUFs using a 0.11 μm process library and tested the implemented chips. The chip test results confirmed that the implemented random test module worked. Although adding a test module requires some overhead, it is minimal: the area of the test module is only about 1,000 gates, while the hashing functions used for fuzzy extractors occupy tens of thousands of gates. Moreover, each chip was able to perform the random test by itself without any additional effort. Therefore, the cost and time for testing can be reduced. The confidentiality of the PUF values is also preserved because the test is performed inside the chip.
