A true random number generator (TRNG) is an important component in cryptographic systems. Designing a fast and secure TRNG in an FPGA is a challenging task. In this paper we analyze the TRNG designed by Sunar et al.
Introduction
Traditionally, high-assurance implementations of cryptographic algorithms have been done in application-specific integrated circuits (ASICs). In recent years, more and more of these implementations have been done in field-programmable gate arrays (FPGAs). There are several reasons for this development. An FPGA can be reprogrammed, which gives more flexibility for modifying algorithms, changing algorithms and fixing bugs. Developing an algorithm in an FPGA is easier and faster than an ASIC design, resulting in a shorter time-to-market. In addition, the latest FPGA devices are manufactured with state-of-the-art technology.
It is well known that a true random number generator (TRNG) is an important component in today's cryptographic systems. Typically, a TRNG is used to generate keys, initialization vectors, random sequences for cryptographic challenges, etc. In a cryptographic system, a private or secret parameter is normally generated by a TRNG and is therefore of great interest to an attacker. Consequently, the generated random bit sequence must be unpredictable to an attacker. One common method for generating a truly random sequence is to amplify the thermal noise in a diode [7]. The disadvantage of this method is that it requires external components, which may enable an attacker to manipulate and read the random bit sequence from the device and consequently violate the security of the entire cryptographic system. If the TRNG is implemented entirely inside the FPGA, an attacker will have difficulty retrieving and manipulating the random bit sequence. The challenge is to design a TRNG in an FPGA that passes all statistical tests while using as few resources as possible and achieving a high throughput of random bits.
In this paper we examine more closely the TRNG based on oscillator rings proposed by Sunar et al. [12]. We found that, without post-processing, their TRNG does not produce a random output. We propose an enhancement of the design in [12] and show experimentally that it improves FPGA resource usage and throughput. We also show that our TRNG has no bias and therefore needs no complicated post-processing. In addition, we show experimentally that the frequencies of the oscillator rings differ because of the placement and routing of the inverters inside the FPGA. We have implemented our proposal in an Altera Cyclone II FPGA [1]. Our implementation of the TRNG based on ring oscillators passes the NIST and DIEHARD statistical tests with a throughput of 100Mbps while using fewer than 100 logic elements in the FPGA.
The rest of this paper is organized as follows. In Section 2, we briefly review previous work on TRNGs in FPGAs. In Section 3, we analyse the TRNG of [12]. In Section 4, we propose an enhancement of the TRNG that achieves better randomness of the output sequence. A detailed analysis of the randomness of our proposed TRNG and an investigation of the frequency distribution of the oscillator rings are given in Sections 5 and 6, respectively. In Section 7, we describe an implementation of our proposed TRNG in detail, and Section 8 concludes the paper.
Related Work
Several implementations of TRNGs in FPGAs have been proposed in recent years. The most common entropy source is jitter on clock signals. Jitter can be viewed as the deviation of a signal transition from its ideal position in time, caused by electronic or thermal noise [13]. Random jitter typically follows a Gaussian distribution characterized by a standard deviation σ. Jitter is usually an unwanted property in a system, but it is useful for generating random signals in a TRNG.
In 2002, Fischer et al. [5] used the jitter of the analogue phase-locked loops (PLLs) in Altera FPGAs as the entropy source of a TRNG. The strategy is to generate clock signals with jitter from the PLL and to sample one clock signal with another. This method is restricted to FPGAs that contain such analogue components. Later, Kohlbrenner et al. [8] used a similar technique, but with clocks generated by oscillator rings, each containing two transparent latches, a buffer and an inverter. Since the frequencies of the two oscillator rings must be almost equal, the rings have to be carefully matched. Tkacik [14] proposed a TRNG using a linear feedback shift register (LFSR) and a cellular automaton shift register (CASR) clocked by two independent oscillator rings. Selected outputs from the LFSR and CASR are combined by an XOR to produce the final random signal. The disadvantage of this scheme is that the TRNG has memory and is therefore not stateless, as pointed out in [2]. In 2006, Golić [6] proposed a TRNG using a Galois ring oscillator (GARO) and a Fibonacci ring oscillator (FIRO). These LFSR-like structures use inverters as delay elements instead of registers. The outputs of one GARO and one FIRO are combined with an XOR, and the random sequence is generated by sampling with a D flip-flop. This design was further investigated by Dichtl et al. [4], who found some minor problems regarding cross-talk from other signals inside the FPGA. In 2007, Sunar et al. [12] gave a theoretical proposal for a random number generator based on several equal-length oscillator rings, each made up of an odd number of inverters (refer to figure 1). The outputs of the oscillator rings are XORed together and sampled with a D flip-flop. To correct for imbalances between zeros and ones in the random signal, post-processing is applied to the output of the D flip-flop. Schellekens et al. [11] implemented this scheme in a Xilinx FPGA, but needed a large number of rings for the output sequence to pass the statistical tests.
Figure 1 shows the TRNG proposed by Sunar et al. [12]. The TRNG is constructed from many equal-length oscillator rings connected to an XOR tree. The output of the XOR tree is sampled by a D flip-flop. To pass statistical tests such as NIST [10] and DIEHARD [9] satisfactorily, the random signal from the D flip-flop must be post-processed. The proposed post-processing is a resilient function implemented with a BCH code. The suggested design consists of 114 oscillator rings, each with 13 inverters. The suggested sampling frequency is 40MHz, and the post-processing uses a [256, 16, 113] extended BCH code, which maps every 256 sampled bits to 16 output bits. The resulting throughput of the TRNG in [12] is therefore 2.5Mbps.
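To make the post-processing step concrete, the sketch below shows the general form of a linear resilient function built from a code: each output bit is the parity (XOR) of the input bits selected by one row of the code's generator matrix. A [256, 16, 113] generator matrix is too large to list here, so the sketch assumes a toy stand-in, the generator matrix of the [8, 4, 4] extended Hamming code (the first-order Reed-Muller code RM(1,3)). The names `G_TOY`, `linear_resilient` and `throughput_after_postprocessing` are hypothetical, and this is an illustration of the technique, not the exact construction used in [12].

```python
# Toy stand-in (assumption): generator matrix of the [8, 4, 4] extended Hamming
# code, i.e. RM(1,3). The design in [12] uses a [256, 16, 113] BCH code instead.
G_TOY = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]


def linear_resilient(bits, g=G_TOY):
    """Compress len(g[0]) raw bits into len(g) bits over GF(2).

    Output bit i is the XOR of the raw bits selected by row i of g.
    """
    if len(bits) != len(g[0]):
        raise ValueError("block length must equal the code length n")
    return [sum(gij & xj for gij, xj in zip(row, bits)) % 2 for row in g]


def throughput_after_postprocessing(f_sample_hz, n, k):
    """Each block of n raw samples yields k output bits."""
    return f_sample_hz * k / n
```

For the parameters in [12], `throughput_after_postprocessing(40e6, 256, 16)` returns `2500000.0`, i.e. the 2.5Mbps quoted above. In the standard code-based construction, an [n, k, d] code gives a function whose output stays unbiased as long as at most d − 1 input bits are fixed and the remaining bits are uniform and independent; the price is the k/n reduction in throughput.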
The entropy source of the TRNG is the jitter created by each oscillator ring. The jitter has a Gaussian distribution around each transition between the logic low and logic high levels. This jitter creates an accumulated phase drift in each ring, so that the transitions occur at different times within the sampling period. Because of the jitter, the unpredictable transition regions are assumed to be uniformly distributed over the sampling period. The number of rings needed can then be calculated from the coupon collector's problem, that is, the number of uniformly random selections among N urns needed until every urn has been selected at least once. The number of urns is determined by the size of the jitter relative to the period of the oscillator ring. Because the number of rings required grows steeply when the last urns are being filled, a fill rate lower than 100% is chosen. To compensate for this, a BCH code is used for post-processing, which reduces the random number throughput by a factor of 16.
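The coupon collector argument can be checked numerically. The sketch below computes the expected number of uniform draws (rings) needed to hit all N urns, N·H_N, and the expected fraction of urns hit after r draws, 1 − (1 − 1/N)^r. The urn count N = 50 used in the tests is a hypothetical value chosen for illustration; in [12], N follows from the jitter size relative to the ring period, and the sizing criterion used there may differ in detail.

```python
def expected_draws_to_fill(n_urns: int) -> float:
    """Coupon collector: expected uniform draws until every urn is hit, N * H_N."""
    return n_urns * sum(1.0 / k for k in range(1, n_urns + 1))


def expected_fill_fraction(n_urns: int, draws: int) -> float:
    """Expected fraction of urns hit at least once after `draws` uniform draws."""
    return 1.0 - (1.0 - 1.0 / n_urns) ** draws


def draws_for_fill(n_urns: int, target: float) -> int:
    """Smallest number of draws whose expected fill fraction reaches `target`.

    target must be below 1; a 100% fill is only reached in the limit.
    """
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    draws = 0
    while expected_fill_fraction(n_urns, draws) < target:
        draws += 1
    return draws
```

With N = 50, an expected 90% fill needs 114 draws, whereas filling every urn needs about 225 draws on average. This illustrates why a fill rate below 100% is chosen and the remaining imbalance is left to the post-processing.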
In [4], some weaknesses of this implementation were mentioned. The main concern of the authors is that the XOR tree and the sampling D flip-flop cannot handle the high number of transitions from the oscillator rings. The frequency of an oscillator ring is about the same as or higher than the sampling frequency. With many oscillator rings in parallel, the number of transitions during a sampling period becomes so high that the intervals between transitions are shorter than the setup and hold times specified for the look-up tables (LUTs) and internal registers of the FPGA. A detailed analysis of the TRNG of [12] is given in Sections 5 and 6, where it can be compared directly with the proposed enhanced TRNG.
Figure 2. Our proposal
Our Proposed Enhancement
To cope with the high number of transitions in the sampling period, we suggest an enhancement to the oscillator-ring TRNG of [12]: an extra D flip-flop is added after each ring (refer to figure 2). As we will show, this configuration improves the randomness of the TRNG. The randomness still relies on the jitter of the oscillator rings; the extra flip-flops do not alter the randomness collected from each ring, but they improve the randomness of the overall output.
The frequency $f_i$ of an oscillator ring depends on the (odd) number of inverters in the ring: the fewer the inverters, the higher the frequency. For a fast and small TRNG, the number of inverters should be as low as possible, which makes the ring frequencies high compared to the sampling frequency $f_s$. The advantage of our enhancement is that the inputs to the XOR tree are now synchronous with the sampling clock and are updated only once per sampling period. Because of this reduction in transitions at the inputs of the XOR tree, the setup and hold time requirements of the internal FPGA logic can now be met.
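The data flow of figure 2 can be illustrated with a small behavioural model. In the sketch below, each free-running ring is modelled as a 50% duty-cycle square wave whose phase advances by $f_i/f_s$ cycles per clock plus a Gaussian jitter term; the extra flip-flop after each ring captures that value on every clock edge, and the output flip-flop captures the XOR of the registered values. The function name `simulate_proposed_trng`, the ring frequencies and the jitter value are all assumptions. An idealised model like this cannot reproduce the setup and hold violations that motivate the extra flip-flops; it only shows that the XOR tree now sees synchronous inputs that change at most once per sampling period.

```python
import random


def simulate_proposed_trng(ring_freqs_hz, f_sample_hz, n_samples,
                           jitter_cycles, seed=0):
    """Idealised behavioural model of figure 2 (no setup/hold effects).

    ring_freqs_hz : hypothetical frequency of each free-running ring
    jitter_cycles : assumed std. dev. of the Gaussian phase noise added per
                    sampling period, in ring cycles
    Returns the bit sequence at the output flip-flop.
    """
    rng = random.Random(seed)
    # Unknown start phase of every ring, measured in ring cycles.
    phases = [rng.random() for _ in ring_freqs_hz]
    ring_ffs = [0] * len(ring_freqs_hz)  # the extra flip-flop after each ring
    out = []
    for _ in range(n_samples):
        # The XOR tree only sees the registered (synchronous) ring values.
        xor_tree = 0
        for b in ring_ffs:
            xor_tree ^= b
        # Clock edge: all flip-flops capture their inputs simultaneously.
        out_ff = xor_tree
        for i, f in enumerate(ring_freqs_hz):
            # Phase advances by f / f_s cycles per clock, plus accumulated jitter.
            phases[i] += f / f_sample_hz + rng.gauss(0.0, jitter_cycles)
            # 50% duty-cycle square wave: high in the first half of each cycle.
            ring_ffs[i] = 1 if phases[i] % 1.0 < 0.5 else 0
        out.append(out_ff)
    return out
```

The first output bit is always 0 because the model starts with cleared flip-flops, and the XOR of the registered ring values reaches the output one clock later, as in the pipelined structure of figure 2.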
The frequency $f_b$ of the beat signal after the extra flip-flop never exceeds half of the sampling frequency, i.e. it lies in the interval $[0, f_s/2]$.
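One way to see why $f_b$ is confined to $[0, f_s/2]$ is ordinary aliasing: sampling a ring of frequency $f_i$ at $f_s$ folds its fundamental down to $|f_i - k f_s|$, where $k$ is the integer nearest to $f_i/f_s$. The sketch below assumes exactly this standard sampling relation (it is not a formula quoted from the text), ignores jitter and considers only the fundamental of the square wave. The function name `beat_frequency` is hypothetical.

```python
def beat_frequency(f_ring_hz: float, f_sample_hz: float) -> float:
    """Aliased (beat) frequency of a ring output sampled at f_sample_hz.

    Subtracts the nearest integer multiple of the sampling frequency, which
    folds any ring frequency into the interval [0, f_sample_hz / 2].
    """
    k = round(f_ring_hz / f_sample_hz)
    return abs(f_ring_hz - k * f_sample_hz)
```

For example, a hypothetical 310MHz ring sampled at 100MHz gives a 10MHz beat signal, while a 260MHz ring gives 40MHz. Small differences between ring frequencies therefore translate directly into different beat frequencies after the extra flip-flops.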
Bias in TRNG
One of the basic statistical tests of random number generators is the frequency test of ones and zeros. For a good random bit sequence, the probability of a zero or a one should be close to $1/2$. In other words, there should be no bias in the random bit sequence.
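The frequency test mentioned above is, in the NIST suite [10], the frequency (monobit) test: each bit is mapped to ±1, the sum $S_n$ is normalised to $s_{obs} = |S_n|/\sqrt{n}$, and the p-value is $\mathrm{erfc}(s_{obs}/\sqrt{2})$. A sequence is usually considered to pass if the p-value is at least 0.01. The sketch below is a minimal re-implementation for illustration; the function name `monobit_p_value` is hypothetical, and the NIST recommendation of at least 100 bits per sequence is not enforced.

```python
import math


def monobit_p_value(bits) -> float:
    """NIST SP 800-22 frequency (monobit) test p-value for a 0/1 sequence."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty sequence")
    s_n = sum(1 if b else -1 for b in bits)  # ones count +1, zeros count -1
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))
```

A perfectly balanced sequence gives a p-value of 1.0, while a strong majority of zeros or ones drives the p-value towards 0, which is how the bias discussed below shows up in this test.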
Let $X$ and $Y$ be two random bit sources with expected values $E(X) = E(Y) = \mu$, and let $\rho$ be their correlation coefficient. Then the expected value, or bias, of the XOR of the two sequences ($X \oplus Y$) is given by [3]:
$$E(X \oplus Y) = 2\mu(1-\mu)(1-\rho) \qquad (1)$$
If $\mu$ is close to $1/2$, (1) can be written as:
$$E(X \oplus Y) \approx \frac{1}{2}(1-\rho) \qquad (2)$$
It can be seen that a correlation between the two sequences generates a bias in the XOR of two random bit sequences. If $X$ and $Y$ are uncorrelated (for example, independent), then $\rho = 0$ and $E(X \oplus Y) \approx 1/2$. If there are $n$ independent bits $X_1, \dots, X_n$, each with expected value $\mu$, then the expected value of the XOR of all these bits is given by [3]:
$$E(X_1 \oplus X_2 \oplus \cdots \oplus X_n) = \frac{1}{2} - \frac{1}{2}(-2\varepsilon)^n \qquad (3)$$
where $\varepsilon = \mu - 1/2$. Since $\mu \in (0, 1)$ implies $|2\varepsilon| < 1$, the expected value in equation (3) converges to $1/2$ as the number of sequences $n$ increases. In other words, adding more oscillator rings to the TRNG design should reduce the bias, provided that the rings are independent.
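The two results above, $E(X \oplus Y) = 2\mu(1-\mu)(1-\rho)$ for a correlated pair and $E(X_1 \oplus \cdots \oplus X_n) = \frac{1}{2} - \frac{1}{2}(-2\varepsilon)^n$ for $n$ independent bits, can be checked numerically. The sketch below evaluates both closed forms, cross-checks the second by XORing one bit at a time, and estimates the first by Monte Carlo. The correlated pair is generated by letting $Y$ copy $X$ with probability $c$ and otherwise draw a fresh bit; this construction (an assumption made for illustration) gives correlation $\rho = c$. All function names are hypothetical.

```python
import random


def xor_mean_pair(mu: float, rho: float) -> float:
    """E(X xor Y) for two bit sources with mean mu and correlation rho."""
    return 2.0 * mu * (1.0 - mu) * (1.0 - rho)


def xor_mean_n(mu: float, n: int) -> float:
    """E(X1 xor ... xor Xn) for n independent bits, each with mean mu."""
    eps = mu - 0.5
    return 0.5 - 0.5 * (-2.0 * eps) ** n


def xor_mean_n_recursive(mu: float, n: int) -> float:
    """Same quantity, built up one XOR at a time as a cross-check."""
    p = 0.0  # P(running XOR = 1) before any bit is combined
    for _ in range(n):
        # The running XOR is 1 if exactly one of (running value, new bit) is 1.
        p = p * (1.0 - mu) + (1.0 - p) * mu
    return p


def simulate_pair(mu: float, c: float, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of E(X xor Y) when Y copies X with probability c.

    This construction gives E(Y) = mu and correlation rho = c between X and Y.
    """
    rng = random.Random(seed)
    ones = 0
    for _ in range(trials):
        x = rng.random() < mu
        y = x if rng.random() < c else (rng.random() < mu)
        ones += x != y
    return ones / trials
```

With $\mu = 0.6$, two independent bits already give an expected value of 0.48, and ten bits give a value within about $10^{-7}$ of $1/2$, whereas a correlation of 0.2 between two balanced sources pulls the XOR down to 0.4 no matter how balanced each source is.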
We have carried out experiments on the randomness of the TRNG in [12] (without any post-processing) and of our proposal in figure 2. The experiments are carried out on an Altera Starter Development Board containing a Cyclone II FPGA. This device has a core voltage of 1.2V and is fabricated in 90nm technology. Quartus II Web Edition 6.1 is used for synthesis and P&R (place and route). The random bit sequences generated inside the FPGA are stored in an external SRAM and transmitted to a PC for analysis over an asynchronous serial connection. The result is blocks of consecutive random bits from the TRNG, each block with a maximum size of 4Mbit. The sampling frequency used in this experiment is 50MHz. No constraints were placed on the P&R tool regarding the placement of the inverters in the FPGA.
We have implemented the two TRNG configurations (refer to figures 1 and 2), recorded 10 blocks of 1Mbit of random data from each configuration and calculated the frequency of ones in all blocks. The experiment was done with oscillator rings of length 3 and 13 and a varying number of rings. The results are shown in figure 4; they indicate that the design in [12] has a bias after the XOR of the oscillator rings. The bias tends to increase with the number of rings, with a majority of zeros in the output. This shows that there is some dependency or correlation between the random sequences. The bias appears to be caused by the high number of transitions at the inputs of the XOR tree and the sampling flip-flop. For our configuration (figure 2), the frequency of ones is close to $1/2$ and converges rapidly to $1/2$. This shows that our proposal behaves according to the theory of XOR of independent random sequences.
Distribution of Ring Frequencies
According to [12], the randomness relies on the assumption that equal-length oscillator rings have the same frequency, while the phase drift caused by the jitter makes the transition regions drift. We believe, however, that the frequencies of the oscillator rings differ from each other. To test this, we implemented 64 oscillator rings in the Altera Cyclone II FPGA and routed the signal from each ring to an I/O pin of the FPGA. The frequencies were measured with an oscilloscope. Figure 5 shows the histograms of the frequencies of ring oscillators with 5 and 31 inverters, respectively. The experiment shows that, for short rings, the frequency distribution does not follow a Gaussian distribution and the frequencies are clustered in groups. For longer rings, the clustering is less pronounced and the distribution approaches a Gaussian, with only a few values far from the mean.
Similar histograms for other ring lengths show that the dispersion decreases as the number of inverters increases. In figure 6, the dispersion is measured by the coefficient of variation, defined as $\sigma/\mu$ expressed as a percentage, where $\sigma$ is the standard deviation and $\mu$ is the mean. The dispersion is high for short rings and decreases for longer oscillator rings. Based on this observation, oscillator rings with only 3 inverters should give the highest dispersion of frequencies.
To explain the behavior of the frequency distribution, the architecture of the Altera Cyclone II FPGA [1] is examined.
Figure 6. Dispersion of frequencies
This FPGA consists of a matrix of logic elements (LEs), each containing a programmable register and a LUT that can implement any logic function of four inputs. Sixteen of these LEs are grouped into a logic array block (LAB). All the LEs and LABs are connected via different routing resources depending on the distance between them inside the FPGA. When P&R is run for the design, the inverters of the oscillator rings are mapped to physical LEs. Depending on the placement, the routing delay between the LEs differs. If all the inverters of a ring are placed in LEs inside one LAB, the routing delay is short. If the inverters are placed in LEs in different LABs, the routing delay increases, resulting in a lower frequency of the oscillator ring. In addition, there is some variation in the delay of each LUT. All these variations in the delays cause the spread of the oscillator ring frequencies and the clustering for short rings. For long oscillator rings, the differences in routing delay between rings are smaller, because the inverters of every ring must be placed in more than one LAB. Other FPGAs with a similar architecture are expected to show similar distributions of ring oscillator frequencies.
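The placement argument can be explored with a toy delay model. In the sketch below, each inverter stage contributes a LUT delay with a small Gaussian variation plus a routing delay that is either short (inside one LAB) or long (between LABs), and the ring frequency is $1/(2\sum t_{stage})$, since an edge must travel around the loop twice per output period. The dispersion is then measured with the coefficient of variation defined in the text. All delay values, the probability of an inter-LAB hop, the independence of hops and the function names are assumptions for illustration, not Cyclone II datasheet figures, and the population standard deviation is an assumed choice.

```python
import random
import statistics


def ring_frequency(stage_delays_s):
    """Ring oscillator frequency: the period is twice the total loop delay."""
    return 1.0 / (2.0 * sum(stage_delays_s))


def simulate_ring_frequencies(n_inverters, n_rings, seed=0,
                              t_lut=0.3e-9, lut_sigma=0.01e-9,
                              t_local=0.3e-9, t_remote=1.0e-9, p_remote=0.3):
    """Hypothetical delay model: stage delay = LUT delay + routing delay.

    Each hop is routed between LABs (slow) with probability p_remote,
    otherwise inside one LAB (fast). All values are illustrative only.
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_rings):
        stages = []
        for _ in range(n_inverters):
            lut = rng.gauss(t_lut, lut_sigma)  # small LUT-to-LUT variation
            route = t_remote if rng.random() < p_remote else t_local
            stages.append(lut + route)
        freqs.append(ring_frequency(stages))
    return freqs


def coefficient_of_variation_pct(values):
    """Dispersion as in the text: sigma / mu, expressed as a percentage."""
    return 100.0 * statistics.pstdev(values) / statistics.mean(values)


if __name__ == "__main__":
    for n in (3, 5, 13, 31):
        f = simulate_ring_frequencies(n, 64, seed=n)
        print(n, round(coefficient_of_variation_pct(f), 1))
```

In this toy model, 3-inverter rings tend to fall into a few frequency clusters, one per number of inter-LAB hops, and the coefficient of variation shrinks roughly with the square root of the ring length because the per-stage variations average out. This qualitatively mirrors the behaviour described above, but it does not reproduce the measured values.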
Because of this spread in the frequencies of equal-length oscillator rings, the transition regions quickly spread out over the sampling period, and the probability of interaction between the oscillator ring outputs is reduced.
TRNG Implementation
We have implemented our proposed TRNG of figure 2. To obtain a fast and small TRNG, each oscillator ring contains 3 inverters. A sampling frequency of 100MHz is selected, resulting in a throughput of 100Mbps, since our TRNG does not use any post-processing.
The required number of rings is calculated from the probability of sampling within a transition region. Sunar et al. [12] computed this using a combinatorial approach (the coupon collector's problem). An alternative is to build a statistical model of the TRNG and run simulations to determine how many oscillator rings are needed to achieve a high probability that at least one ring is sampled in its transition region. When the jitter is small compared to the sampling period, the simulations show that the number of rings in the transition region follows a Poisson distribution with parameter $\lambda = k \cdot r$, where $r$ is the number of rings and $k$ is a constant that depends on the size of the jitter relative to the sampling period. The probability of sampling in at least one of the transition regions, $1 - e^{-kr}$, is plotted against the number of rings in figure 7. The probability increases rapidly for a small number of rings, but many oscillator rings are needed to approach 100% certainty.
Figure 7. Simulated probability of hitting a transition region
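Under the Poisson model, the probability that at least one ring is sampled in its transition region is $1 - e^{-kr}$. The sketch below evaluates this, compares it with the binomial form $1 - (1-k)^r$ that arises if each ring independently lands in its transition region with probability $k$ (an interpretation assumed here), and inverts the Poisson expression to find how many rings reach a target probability. The constant $k$ is not given in the text, so any value passed in is hypothetical, as are the function names.

```python
import math


def p_hit_poisson(k: float, r: int) -> float:
    """P(at least one of r rings sampled in a transition region), lambda = k*r."""
    return 1.0 - math.exp(-k * r)


def p_hit_binomial(k: float, r: int) -> float:
    """Same probability if each ring independently hits with probability k."""
    return 1.0 - (1.0 - k) ** r


def rings_needed(k: float, target: float) -> int:
    """Smallest r with p_hit_poisson(k, r) >= target (target must be < 1)."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    return math.ceil(-math.log(1.0 - target) / k)
```

For small $k$, the Poisson and binomial forms agree closely. Because $-\ln(1 - \text{target})$ grows without bound as the target approaches 1, each additional "nine" of certainty costs the same extra number of rings, which is the diminishing return visible in figure 7.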
We have carried out an experiment using 50 oscillator rings with 3 inverters, a sampling frequency of 100MHz and no post-processing. In total, 1000 blocks of 1Mbit (1Gbit) of random data were captured from the TRNG. The data were tested using the NIST (SP 800-22) [10] and DIEHARD [9] statistical test suites, and the random data passed both. We repeated the experiment for this configuration with only 25 oscillator rings, and the random data again passed both the NIST and the DIEHARD tests. These experiments indicate that it is probably not necessary to have almost 100% certainty of hitting at least one transition region in order to pass the NIST and DIEHARD statistical tests with this kind of TRNG. Table 1 shows the resources used by our TRNG in the Altera Cyclone II FPGA. For 25 oscillator rings, fewer than 100 LEs are used (< 1% of the total number of LEs in our medium-size FPGA). For comparison, the original design in [12] occupies more than 1800 LEs.
To examine the randomness of our TRNG after start-up, an oscilloscope is used to capture the random output while the TRNG is restarted several times. While the FPGA is held in reset, the oscillator ring outputs are kept at zero (logic low). When the reset is deactivated, the oscillator rings start to oscillate. In figure 8, 10 restart sequences from the output of the TRNG are captured, with the oscilloscope triggered on a clocked version of the reset signal at the origin of the graph. The sampling frequency is 50MHz. Because of the limited bandwidth of the oscilloscope, the measured outputs are not square signals. All the outputs start at zero, but they deviate from each other after the first clock period of 20ns. This experiment shows that our TRNG produces random output quickly after a restart.
Conclusion
We have analyzed the TRNG in [12] and proposed an enhancement of a TRNG based on oscillator rings. By adding an extra flip-flop after each inverter ring, before the XOR tree, we have shown that the randomness of the output is much better than that of [12]. We have also shown that the frequencies of equal-length rings are not equal but are spread over a distribution. Shorter rings have a higher dispersion of frequencies and therefore a better potential for fast generation of randomness after a restart.
We have implemented the TRNG of figure 2 and carried out statistical tests on the resulting random bit sequences. Our TRNG passes both the NIST and DIEHARD tests without post-processing. The throughput of the TRNG is 100Mbps, and it uses fewer than 100 logic elements in an Altera Cyclone II FPGA.
