Abstract: A field programmable gate array (FPGA)-based implementation of a physical random number generator (PRNG) is presented. The PRNG uses an alternating step generator construction to decorrelate a physical random source based on oscillator phase noise. The resulting design can be implemented entirely in digital technology, requires no external components, is very small in area, achieves very high throughput and has good statistical properties. The PRNG was implemented on an FPGA device and tested using the NIST, Diehard and TestU01 random number test suites.
Introduction
Random number generators (RNGs) are an important primitive widely used in simulation and cryptography. A physical random number generator (PRNG) derives its output from a physical noise source, so its output is non-deterministic. Given the importance of random number generation, surprisingly few hardware implementations of PRNGs have been reported. Three techniques are commonly used in the literature: oscillator sampling, direct amplification and discrete-time chaos. In the oscillator sampling approach, period variation (i.e. oscillator jitter) in a low-frequency clock of low quality factor (Q) is exploited by using that clock to sample a high-frequency clock. The direct amplification technique digitises thermal or shot noise using an amplifier and a comparator. Finally, chaotic systems can be used to produce PRNGs. In this paper, a high-performance PRNG is proposed which uses a physical random source to control two linear feedback shift registers (LFSRs) in a manner similar to that of an alternating step generator (ASG) stream cipher. This approach combines some of the benefits of physical random sources and stream ciphers, and achieves high throughput, small area and good randomness properties. The same approach could be applied to combine other weak physical random number generators with a stream or block cipher.
In 1984, Fairfield et al. [1] developed the first integrated RNG based on oscillator phase noise. In their design, a high-frequency oscillator was sampled using a low-frequency oscillator. After duty cycle biases were removed by a parity filter, the flip-flop output was fed into an LFSR-based scrambler. The design generated 27 bps using a 1000 Hz low-frequency clock. The Intel RNG is part of the Intel 8xx chipset family, starting with the Intel 810, and is implemented in the Intel 82802 firmware hub device. It uses amplified thermal noise to drive a voltage-controlled oscillator (VCO), and oscillator sampling is used to detect the phase noise of the VCO to produce a digital random source [2].
We have previously reported an FPGA design which employs oscillator sampling [3]. In this design, a low-frequency resistor-capacitor (RC) oscillator was used to sample an internal high-frequency clock. The design requires only three external passive components to set the time constant of the RC oscillator. Phase noise in the RC oscillator produced a randomised output, which was filtered through a parity filter. A disadvantage of this approach is that the output rate is limited by the speed of the RC oscillator: the maximum rate at which the output passed the NIST and Diehard tests was 4.7 kbps. The only other FPGA-based implementation was that of Fischer and Drutarovsky [4], which used a variation of oscillator sampling. Their design was based on the randomness of jitter in an analogue phase locked loop (PLL), and a decimator was used to ensure that every output bit included at least one sample affected by jitter. The design was implemented on an Altera APEX EP20K200-2X FPGA with a 33.3 MHz external clock. With an 88.245 MHz internal clock, it can generate 69 kbps. For FPGAs such as the Altera APEX E and APEX II devices, which have internal PLLs, this approach requires no external components. The disadvantage of this approach is that few FPGAs have this feature.
PRNGs based on chaotic systems can lead to very compact complementary metal oxide semiconductor (CMOS) implementations. In 2001, Stojanovski et al. [5] implemented an analog chaos-based RNG in a 0.8 µm CMOS process using switched-current techniques. The estimated output bit rate of this design was 1 Mbps. Gerosa et al. [6] also implemented an RNG based on a chaotic system. Their design, which uses a pipelined analog-to-digital converter (ADC), occupied 2.2 mm² of silicon area and can generate 8-bit data using a 20 MHz clock. Petrie and Connelly combined oscillator sampling, direct amplification and discrete-time chaos to produce an analog very large scale integration (VLSI) chip which was robust to power supply noise and substrate signal coupling [7]. Implemented in 2 µm CMOS, the chip could produce random numbers at 1.4 Mbps. The design occupied an area of 1.5 mm² and dissipated 3.9 mW of power. In comparison with the approaches described above, the design presented in this paper achieves an output rate of 400 Mbps on a Xilinx XCV300-8 device and occupies approximately 130 Xilinx Virtex slices. Furthermore, it can be implemented entirely in digital technology with no external components.
The rest of the paper is organised as follows. Background on oscillator sampling and the alternating step generator is given in Section 2. The architecture of the PRNG and its FPGA implementation are presented in Section 3. The performance of the design and the quality of its output are reported and evaluated in Section 4. Conclusions are drawn in Section 5.
Background

Oscillator sampling based physical noise source
Oscillator sampling based noise sources typically use a low-frequency clock ($F_l$) with large phase noise to sample an accurate high-frequency clock ($F_h$), producing an output ($F_r$) as shown in Fig. 1. If the phase noise of $F_l$ is of the same order as the period of the high-frequency clock, a random output is obtained [1]. However, since one output bit is produced per cycle of the low-frequency clock, the output rate of this PRNG is limited by the frequency of $F_l$. If the frequency of $F_l$ is increased to improve the output rate, the phase noise usually decreases, leading to correlations in the output. Several factors affect the randomness of the output [1]. The first is that the duty cycle of $F_h$ may not be 50%. In this situation, $F_r$ will have unequal probabilities of being zero or one. An N-bit parity filter [1, 8], which outputs the parity of N raw bits, can be used to deskew such a non-uniform distribution. If the raw bits are independent and the ratio of ones to zeroes in the raw random bitstream is p:q, with p + q = 1, then the probability that the parity will be one (or zero) is the sum of the odd (or even) terms of the binomial expansion of $(p + q)^N$. Evaluating this sum gives the probability of a one at the output of the parity filter:

$$P(1) = \frac{1}{2}\left[1 - (q - p)^N\right]$$

As N increases, this expression tends to 0.5, since $|q - p| < 1$ whenever $0 < p < 1$.
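The deskewing effect of the parity filter can be checked numerically. The Python sketch below assumes independent, identically distributed raw bits (an idealisation, since real oscillator-sampling output is correlated), sums the odd binomial terms of $(p+q)^N$ directly and compares the result with the closed form $\frac{1}{2}[1 - (q-p)^N]$. The function names and the example bias $p = 0.6$ are illustrative only.

```python
from math import comb


def parity_one_prob_sum(p: float, n: int) -> float:
    """P(parity of n raw bits is 1), summing the odd terms of (p + q)^n.

    Assumes the raw bits are independent, each equal to 1 with probability p.
    """
    q = 1.0 - p
    return sum(comb(n, k) * p**k * q ** (n - k) for k in range(1, n + 1, 2))


def parity_one_prob_closed(p: float, n: int) -> float:
    """Closed form of the same probability: 1/2 * [1 - (q - p)^n]."""
    q = 1.0 - p
    return 0.5 * (1.0 - (q - p) ** n)


if __name__ == "__main__":
    p = 0.6  # illustrative raw-source bias: 60% ones
    for n in (1, 2, 4, 8, 16):
        print(f"N={n:2d}  by sum={parity_one_prob_sum(p, n):.8f}  "
              f"closed form={parity_one_prob_closed(p, n):.8f}")
```

With $p = 0.6$ the bias falls from 0.1 at $N = 1$ to 0.02 at $N = 2$ and is negligible by $N = 16$, at the cost of an N-fold reduction in output rate if one output bit is produced per N raw bits.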
The second factor is the selection of clock frequencies. The period of $F_l$ varies because of oscillator phase noise. If the variation in $F_l$'s period is not large enough, there will be correlation between bits, so the value of the output can be predicted to some extent from previous values. Previous research has suggested that the standard deviation of the period variation of $F_l$ should be at least 0.75 times the period of $F_h$ to reduce bit-to-bit correlation [1]. Increasing $F_h$ and reducing $F_l$ therefore leads to more randomness.
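The effect of jitter on bit-to-bit correlation can be explored with a simple behavioural model. The sketch below assumes that each period of $F_l$ is perturbed by independent Gaussian jitter (so the sampling phase performs a random walk modulo the period of $F_h$), that $F_h$ is an ideal square wave, and that time is measured in units of the $F_h$ period. `sample_oscillator` and `lag1_agreement` are hypothetical helpers, not part of the original design. A lag-1 agreement near 0.5 indicates little correlation between adjacent bits; values near 1 indicate strongly correlated output.

```python
import random


def sample_oscillator(n_bits, t_high, t_low, jitter_sigma, duty=0.5, seed=1):
    """Behavioural model of oscillator sampling (hypothetical helper).

    A square wave of period t_high (the fast clock F_h) is sampled once per
    period of the slow clock F_l. Each slow period is t_low plus independent
    Gaussian jitter of standard deviation jitter_sigma, so the sampling phase
    performs a random walk modulo t_high.
    """
    rng = random.Random(seed)
    phase = 0.25 * t_high  # arbitrary initial phase within one F_h period
    bits = []
    for _ in range(n_bits):
        bits.append(1 if phase < duty * t_high else 0)
        phase = (phase + t_low + rng.gauss(0.0, jitter_sigma)) % t_high
    return bits


def lag1_agreement(bits):
    """Fraction of adjacent bit pairs that are equal (about 0.5 if uncorrelated)."""
    same = sum(1 for a, b in zip(bits, bits[1:]) if a == b)
    return same / (len(bits) - 1)


if __name__ == "__main__":
    # Time is in units of the F_h period; F_l's nominal period is 7 F_h periods.
    for sigma in (0.0, 0.05, 0.25, 0.75, 2.0):
        bits = sample_oscillator(20000, t_high=1.0, t_low=7.0, jitter_sigma=sigma)
        print(f"sigma/T_h={sigma:4.2f}  ones={sum(bits) / len(bits):.3f}  "
              f"lag-1 agreement={lag1_agreement(bits):.3f}")
```

With no jitter and the period of $F_l$ an exact multiple of that of $F_h$, every sample lands at the same phase and the output is constant. As `jitter_sigma` approaches and exceeds the period of $F_h$, the lag-1 agreement in this model tends towards 0.5, which is consistent with the 0.75 guideline.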
A third factor affecting the quality of the RNG is the random source itself. As both periodic and aperiodic electromagnetic noise exist inside a computer system, there may be correlation in the output sequence as a result of coupling of periodic noise from the power supply, clocks, crosstalk, thermal effects and so on. This issue is not addressed in this work.
Alternating step generator
The ASG is constructed from three LFSRs as shown in Fig. 2 [9, 10]. The binary output of the selection LFSR ($\mathrm{LFSR}_S$ in the figure) determines whether LFSR1 or LFSR2 is clocked; the register that is not clocked repeats its previous output. The output of the ASG is the exclusive-or (XOR) of the outputs of LFSR1 and LFSR2. The characteristic polynomials of LFSR1 and LFSR2 are irreducible and different. In addition, the greatest common divisor of the periods of LFSR1 and LFSR2 should be equal to 1.
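The following Python sketch illustrates the classic ASG clocking rule with toy parameters. It assumes Fibonacci LFSRs with the small primitive trinomials $x^3+x+1$ (selection), $x^5+x^2+1$ (LFSR1) and $x^7+x+1$ (LFSR2), chosen only because they are short and well known; they are not the polynomials used in this design, and sparse polynomials like these would be a poor choice in practice. The helper `lfsr_period` confirms the periods $2^5-1 = 31$ and $2^7-1 = 127$, whose greatest common divisor is 1 as required. Class and function names are hypothetical.

```python
from math import gcd


class LFSR:
    """Fibonacci LFSR over GF(2): s[t+L] = XOR of s[t+i] for i in taps.

    taps lists the exponents i (0 <= i < L) whose coefficient c_i is 1 in the
    characteristic polynomial x^L + c_{L-1} x^{L-1} + ... + c_0.
    """

    def __init__(self, length, taps, seed):
        self.taps = tuple(taps)
        self.state = [(seed >> i) & 1 for i in range(length)]  # state[0] is the output stage
        if not any(self.state):
            raise ValueError("seed must give a non-zero state")

    def output(self):
        return self.state[0]

    def clock(self):
        feedback = 0
        for i in self.taps:
            feedback ^= self.state[i]
        self.state = self.state[1:] + [feedback]


def lfsr_period(length, taps, seed=1):
    """Number of clocks until the register returns to its initial state."""
    reg = LFSR(length, taps, seed)
    start = list(reg.state)
    steps = 0
    while True:
        reg.clock()
        steps += 1
        if reg.state == start:
            return steps


def alternating_step_generator(sel, r1, r2, n_bits):
    """Classic ASG: sel decides which of r1/r2 is clocked; output is their XOR."""
    out = []
    for _ in range(n_bits):
        sel.clock()
        if sel.output():
            r1.clock()  # r2 is not clocked and repeats its previous output
        else:
            r2.clock()  # r1 is not clocked and repeats its previous output
        out.append(r1.output() ^ r2.output())
    return out


if __name__ == "__main__":
    # Toy primitive trinomials: x^3+x+1 (selection), x^5+x^2+1 (R1), x^7+x+1 (R2).
    p1, p2 = lfsr_period(5, (0, 2)), lfsr_period(7, (0, 1))
    print("periods:", p1, p2, "gcd:", gcd(p1, p2))
    bits = alternating_step_generator(
        LFSR(3, (0, 1), seed=0b101),
        LFSR(5, (0, 2), seed=0b10011),
        LFSR(7, (0, 1), seed=0b1011001),
        32,
    )
    print("".join(map(str, bits)))
```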
Several attacks on the ASG have been proposed. If the connection polynomials of LFSR1 and LFSR2 are primitive trinomials, the generator can be attacked using the linear syndrome method [11]. In our design, polynomials with high Hamming weight were chosen to prevent this attack. Golić proposed an attack based on the edit distance [12]. This attack requires computing the edit distance for every possible pair of initial states of LFSR1 and LFSR2, and is hence impractical for long shift registers (lengths of 127 and 129 in our case).
Architecture and implementation
In the proposed approach, a physical noise source, hereafter called the oscillator noise source (ONS), is produced by oscillator sampling as shown in Fig. 3. The high-frequency clock, $F_h$, is generated using a three-inverter ring oscillator implemented in a single Xilinx Virtex slice, whereas the low-frequency sampling clock is the system clock (133 MHz in our tested configuration). These two signals are combined using an edge-triggered D-type flip-flop to produce a non-deterministic but correlated random output. This output is used in place of the selection LFSR of an ASG. To achieve a high output rate, the ONS should produce outputs at the same rate as the system clock. The system clock is normally derived from a crystal-controlled oscillator and has low phase noise. Hence the system clock should be connected to the clock input of the D-type flip-flop (as shown in Fig. 3), and a high-frequency oscillator connected to the D input. For the FPGA implementation, a high-frequency ring oscillator was used. Ring oscillators are commonly used in PLLs, clock recovery circuits and frequency synthesisers, but have high phase noise compared with circuits employing passive resonant components [13]. They therefore combine two properties that are advantageous for this application: high frequency and high phase noise. It is desirable to make the frequency of the ring oscillator as high as possible in order to reduce the correlation resulting from sampling the ring oscillator with the system clock. A naive implementation would require three lookup tables (LUTs) and hence 1.5 Xilinx Virtex slices [14]. The FPGA implementation instead used an additional two-input XOR gate present in the Xilinx Virtex slice to reduce the implementation to a single Virtex slice, as shown in Fig. 4. This has the advantage of higher speed, because wiring is reduced and the XOR gate is faster than a LUT.
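Replacing the selection LFSR with the ONS output can be sketched in the same style. Here the ONS is modelled, purely as an assumption, by a "sticky" random bit that repeats its previous value with probability `p_stay`, mimicking a non-deterministic but correlated source. The two registers use the toy primitive trinomials $x^{15}+x+1$ and $x^{17}+x^3+1$ rather than the degree-127 and degree-129 polynomials of the actual design, and are seeded with arbitrary mixed-weight values. All names are hypothetical.

```python
import random
from itertools import islice


def lfsr_bits(length, taps, seed):
    """Yield the output of a Fibonacci LFSR with s[t+L] = XOR of s[t+i] for i in taps."""
    state = [(seed >> i) & 1 for i in range(length)]
    if not any(state):
        raise ValueError("seed must give a non-zero state")
    while True:
        yield state[0]
        feedback = 0
        for i in taps:
            feedback ^= state[i]
        state = state[1:] + [feedback]


def sticky_source(p_stay, seed):
    """Hypothetical ONS model: a random bit that repeats its previous value
    with probability p_stay (non-deterministic but correlated)."""
    rng = random.Random(seed)
    bit = rng.getrandbits(1)
    while True:
        yield bit
        if rng.random() >= p_stay:
            bit ^= 1


def ons_driven_asg(select_bits, r1, r2, n_bits):
    """ASG variant in which an external bit stream replaces the selection LFSR.

    A selection bit of 1 advances r1, otherwise r2 advances; the register
    that is not advanced keeps its previous output bit.
    """
    a, b = next(r1), next(r2)
    out = []
    for s in islice(select_bits, n_bits):
        if s:
            a = next(r1)
        else:
            b = next(r2)
        out.append(a ^ b)
    return out


if __name__ == "__main__":
    ons = list(islice(sticky_source(p_stay=0.9, seed=7), 20000))
    agree = sum(x == y for x, y in zip(ons, ons[1:])) / (len(ons) - 1)
    # Toy primitive trinomials x^15 + x + 1 and x^17 + x^3 + 1 (not the paper's polynomials).
    out = ons_driven_asg(iter(ons), lfsr_bits(15, (0, 1), 0x4D2B),
                         lfsr_bits(17, (0, 3), 0x1B4E5), len(ons))
    print(f"ONS lag-1 agreement: {agree:.3f}")
    print(f"output fraction of ones: {sum(out) / len(out):.3f}")
```

In this toy model, adjacent selection bits agree about 90% of the time, yet the fraction of ones in the output typically stays near 0.5. Balance alone says nothing about cryptographic strength, which is why the statistical test suites and the attacks on the ASG remain relevant.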
The LFSRs were implemented using the SRL16 [14] feature of the Xilinx Virtex device, which allows a shift register of 1 to 16 stages to be implemented in a single LUT.
Clock doubler
As discussed in Section 2, increasing the high-frequency clock, $F_h$, improves the randomness of the ONS output. It is possible to apply a clock doubler to the output of the ring oscillator, as shown in Fig. 5. The Poker test in the NIST test suite [15] was used to observe the effect of different delay values for the clock doubler, and the results are shown in Fig. 6. This test yields a numerical statistic, and within the pass range a lower figure implies better randomness. The Poker test is passed if the result is between 1.03 and 57.4 [10]. As can be seen, small and large delay values do not result in clock doubling, and the Poker test results are large. The test results show a significant improvement for delay values of approximately 2.5 ns, as reported by the Xilinx timing analyser. Table 1 compares the best Poker test results with and without a clock doubler. Note that although the clock doubler offers an improvement, the ONS output on its own does not pass the Poker test.
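For reference, the Poker statistic can be computed as in the sketch below, which assumes the FIPS 140-1 form of the test: a 20,000-bit sample is split into $k = 5000$ non-overlapping 4-bit segments, $n_i$ counts the occurrences of each of the 16 patterns, and $X = (16/k)\sum_i n_i^2 - k$. The pass interval 1.03 < X < 57.4 applies to that sample size only. Function names are illustrative.

```python
from collections import Counter


def poker_statistic(bits, m=4):
    """Poker statistic X = (2^m / k) * sum(n_i^2) - k over k non-overlapping m-bit segments."""
    k = len(bits) // m
    if k == 0:
        raise ValueError("need at least one complete m-bit segment")
    counts = Counter(
        int("".join(str(b) for b in bits[j * m:(j + 1) * m]), 2) for j in range(k)
    )
    # Multiply before dividing so exact cases (e.g. an all-zero sample) stay exact.
    return (2**m) * sum(c * c for c in counts.values()) / k - k


def poker_passes(bits):
    """FIPS 140-1 style criterion for a 20,000-bit sample: 1.03 < X < 57.4."""
    if len(bits) != 20000:
        raise ValueError("the 1.03..57.4 bounds assume a 20,000-bit sample")
    return 1.03 < poker_statistic(bits) < 57.4


if __name__ == "__main__":
    zeros = [0] * 20000
    cycling = [int(c) for j in range(5000) for c in format(j % 16, "04b")]
    print(poker_statistic(zeros), poker_passes(zeros))
    print(poker_statistic(cycling), poker_passes(cycling))
```

An all-zero sample gives X = 75000 and fails. A sample that cycles through all 16 patterns in order gives X of about 0.0128, which also fails, because it is too uniform to be plausible for a random source.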
Results
An implementation of the PRNG was synthesised and implemented using the Xilinx ISE 8.2i software. The LFSRs were implemented as chains of 16-bit shift register primitives (SRL16 blocks) in the FPGA device to achieve high performance and density. The FPGA platform used was a Pilchard FPGA card [16], which uses the synchronous dynamic random access memory (SDRAM) bus instead of the peripheral component interconnect (PCI) bus used in conventional FPGA boards. The FPGA device used was a Xilinx Virtex XCV300E-8. The LFSRs were given randomly chosen irreducible connection polynomials of degrees 127 and 129 with approximately equal numbers of 0 and 1 coefficients [9, 10]. Several different polynomials were tested. The initial states of the LFSRs were random numbers with approximately equal numbers of 1s and 0s. Table 2 summarises the resource utilisation and performance of the PRNG, including a host interface to read back the data. The PRNG can be clocked at over 400 MHz, but the experiments described in this paper used a 133 MHz clock so that the output sequence could be collected via the SDRAM interface of the host computer. As reported by the Xilinx timing analysis tool, the minimum ring oscillator frequency was 800 MHz.
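The ASG requirement that the periods of LFSR1 and LFSR2 be coprime is easy to confirm for degrees 127 and 129. An LFSR with an irreducible connection polynomial of degree L has a period that divides $2^L - 1$, and $\gcd(2^a - 1, 2^b - 1) = 2^{\gcd(a,b)} - 1$. Since $\gcd(127, 129) = 1$, any such pair of periods is coprime. (In addition, $2^{127} - 1$ is a Mersenne prime, so every irreducible polynomial of degree 127 is primitive.) The Python check below uses arbitrary-precision integers; it is a sanity check added for illustration, not part of the original tool flow.

```python
from math import gcd

# Periods of LFSRs with irreducible connection polynomials of degree L divide 2^L - 1.
max_period_127 = 2**127 - 1
max_period_129 = 2**129 - 1

# Identity: gcd(2^a - 1, 2^b - 1) = 2^gcd(a, b) - 1.
assert gcd(max_period_127, max_period_129) == 2 ** gcd(127, 129) - 1

print("gcd(127, 129) =", gcd(127, 129))
print("gcd of the maximal periods =", gcd(max_period_127, max_period_129))
```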
Since the clock doubler improves the randomness of the ONS output, the results reported below were obtained without it (i.e. with the delay set to 0), which is the more conservative case. It was also verified that the implementation passes the tests below when an appropriate delay for the clock doubler was added. This increases confidence that the design will operate correctly even if the delay of the clock doubler is set to an inappropriate value.
NIST test suite
For the NIST test suite (version 1.5), all parameters were set according to the recommendations in [17], and the test sequences were 1 Mbit in size. The sample size, that is, the number of test sequences, was 100. Table 3 summarises the NIST test results for the PRNG. The significance level α was set to the default of 0.01 (99% confidence), so a test is passed if its p-value is larger than this value. The pass rate is the proportion of the 100 binary sequences that passed the test. It can be seen that the PRNG passes all NIST tests.
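NIST SP 800-22 also suggests checking the pass rate itself: for $m$ sequences and significance level $\alpha$, the proportion of passing sequences should fall within $\hat{p} \pm 3\sqrt{\hat{p}(1-\hat{p})/m}$, where $\hat{p} = 1 - \alpha$. The sketch below evaluates this range for the settings used here ($m = 100$, $\alpha = 0.01$). The function name is hypothetical, and clipping the upper bound at 1 is a presentational choice.

```python
from math import sqrt


def proportion_interval(alpha=0.01, m=100):
    """Acceptable range for the proportion of m sequences passing a test:
    p_hat +/- 3 * sqrt(p_hat * (1 - p_hat) / m), with p_hat = 1 - alpha."""
    p_hat = 1.0 - alpha
    half_width = 3.0 * sqrt(p_hat * (1.0 - p_hat) / m)
    # A proportion cannot exceed 1, so the upper bound is clipped.
    return p_hat - half_width, min(1.0, p_hat + half_width)


if __name__ == "__main__":
    low, high = proportion_interval()
    print(f"pass-rate range for 100 sequences at alpha = 0.01: [{low:.4f}, {high:.4f}]")
```

For these settings the lower bound is about 0.960, i.e. roughly 96 of the 100 sequences should pass each test.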
Diehard test suite
Although the Diehard test suite is one of the most comprehensive publicly available sets of randomness tests, unfortunately there are no well-defined pass criteria. Intel [18] regarded the entire suite as passed, with 95% confidence, when the p-values lie between 0.0001 and 0.9999, and this criterion was used for our testing. The Diehard test results are summarised in Table 4. Where a test produces multiple p-values, the worst-case value is presented. The PRNG output passes the Diehard tests.
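The Diehard criterion adopted here can be written as a small predicate. The sketch below assumes that the bounds are inclusive and interprets the "worst case" p-value as the one closest to 0 or 1. Both are assumptions, since the text does not define them precisely, and the function names and example p-values are illustrative.

```python
def diehard_passes(p_values, low=0.0001, high=0.9999):
    """Criterion used in the text: every p-value lies between low and high.

    Inclusive bounds are an assumption; the source does not say.
    """
    return all(low <= p <= high for p in p_values)


def worst_case(p_values):
    """Assumed meaning of 'worst case': the p-value closest to 0 or to 1."""
    return min(p_values, key=lambda p: min(p, 1.0 - p))


if __name__ == "__main__":
    results = [0.42, 0.0873, 0.9921, 0.31]  # made-up p-values for illustration
    print(diehard_passes(results), worst_case(results))
```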
TestU01 test suite
TestU01 [19] is a set of C libraries for evaluating RNGs. We wrote test programs using this library: the random data were stored in a file and then read in as an external RNG source. The RNG passes the Rabbit, Alphabit, SmallCrush and Crush test batteries.
Conclusion
A new PRNG was introduced. This circuit combines a physical random number source with a high-speed stream cipher to produce a physical noise source based RNG with small area, high output rate and good statistical properties. This RNG would be suitable for simulation and cryptographic applications, although for the latter caution is advised, since the design is new and it may be possible to attack the ASG construction given that the ONS output is weakly correlated.
References
