This paper describes a solution for the generation of true random numbers in a purely digital fashion, making it suitable for any FPGA type, because no FPGA vendor specific features (e.g., phase-locked loops) or external analog components are required. Our solution is based on a framework for a provably secure true random number generator recently proposed by Sunar, Martin and Stinson. It uses a large number of ring oscillators with identical ring lengths as a fast noise source (but with some deterministic bits) and eliminates the non-random samples by appropriate post-processing based on resilient functions. This results in a slower bit stream with high entropy. Our FPGA implementation achieves a random bit throughput of more than 2 Mbps, remains fairly compact (needing minimally 110 ring oscillators of 3 inverters) and is highly portable.

Key words: true random number generators, ring oscillators, jitter, resilient functions

1. INTRODUCTION

The security of many cryptographic systems depends upon the generation of nonrecurring and/or unpredictable quantities; these random numbers are, for instance, used for the generation of session keys, challenges in cryptographic protocols or padding of plain text messages [1]. While the non-recurrence property can easily be fulfilled (e.g., by using a linear congruential generator or linear feedback shift register with large enough period), unpredictability is more difficult to assure.

Pseudo-random number generators (PRNGs) typically use some cryptographic mechanism to deterministically produce a sequence of random numbers from a randomly chosen seed; the random seed could, for example, be used as the key of a stream cipher. The generator however still has to be seeded by a random number produced by a hardware or software true random number generator (TRNG). Hardware-based generators exploit randomness which occurs in physical phenomena; nuclear decay and electronic noise are two examples of usable processes. A software-based TRNG on the other hand tries to derive a seed by mixing (usually with a compression function) different entropy sources (e.g., system clock, mouse movement, network statistics); this approach is sometimes called a hybrid generator to distinguish it from the physical (hardware) TRNG.

Historically, physical TRNGs were only available in specialized cryptographic devices (smart cards, hardware security modules, cryptographic accelerators, ...), but they are appearing in more mainstream products like trusted platform modules, Intel motherboard chipsets [2] and VIA processors [3]. The implementation details of these solutions are mostly not published because of their commercial value, but three different techniques seem widely used to generate a random bitstream: sampling jittered oscillators, chaotic circuits, and direct amplification of resistor or PN junction noise.

If the availability of high quality random numbers is required in an FPGA application, the latter two techniques are impossible, because they depend on analog components. For instance, good RNGs are becoming a necessity to protect hardware implementations of cryptographic operations against side channel analysis [4]; think of masked AES implementations [5] or blinded modular exponentiation [4].

Conventional solutions for digital TRNGs will typically extract randomness from the unpredictable jitter of oscillators. The design proposed in [6] samples the jitter of the clock signal synthesized by the analog phase-locked loop embedded in some FPGAs. Implementations of this concept have been demonstrated on Altera and Actel FPGAs, but this approach cannot be ported to other FPGAs. Xilinx FPGAs for instance use digital delay lines, instead of PLLs, to provide on-chip clock synthesis. The solution proposed in [7] relies on an external analog oscillator circuit, making it clearly unusable for tamper resistant and reliable FPGA applications, especially because disconnecting the external circuit stops the operation of the RNG. The preferable way to create a jittered oscillation is to rely on ring oscillators, an odd number of inverters in a ring configuration. The design of Kohlbrenner and Gaj [8] uses two ring oscillators to sample each other, but the placement of these oscillators into CLBs is tricky because the oscillation frequency needs to be closely matched; to overcome this placement sensitivity the authors suggest using more than two ring oscillators. In [9] Golic proposes two promising variations on classical ring oscillators using concepts from linear feedback shift register (LFSR) design: Fibonacci and Galois ring oscillators. These new asynchronous circuits with feedback provide a mixture of true randomness (jitter in ring oscillators) and pseudo-randomness (LFSR). The design was experimentally implemented on FPGA, but details are lacking; the implementation is said to satisfy standard statistical tests, if very powerful post-processing (an irregularly clocked 64-bit LFSR) is applied.

The design parameters of most proposals (such as sampling frequency, amount of post-processing, and thus the resulting bitstream throughput) are mainly determined by trial and error until the random bitstream passes the statistical tests provided by the NIST [10] or DIEHARD [11] test suite. Mathematical modeling of the entropy source and justification of the post-processing is mostly absent. In this respect, the approach taken by Sunar, Martin and Stinson is completely different [12]. They start by modeling the entropy collection process and use this model to specify the design parameters, i.e. the number and length of ring oscillators and the requirements for the post-processor.

2. GENERIC ARCHITECTURE

Every physical random number generator will follow the generic architecture depicted in Fig. 1, where the definitions of [13, 14] are adopted.

Normally, the random noise source generates an analog signal n(t) which is the result of some non-deterministic physical phenomenon. The analog noise signal is digitized (e.g., by a comparator), yielding the so-called digitized analog signals s[i], briefly denoted as das-random numbers.

[...] internal random words r[i] is much closer to a uniform distribution than that of the das-random bitstream s[i].

In the second place, the post-processing step is also used to increase the entropy per bit of the internal random words r[i] by applying a compression function on the input stream s[i], resulting in a lower speed output stream with increased randomness. This will be especially important if a noise source is used that has a low entropy per bit. Compression also provides tolerance against environmental changes and tampering.

Two popular post-processors to reduce bias are the von Neumann corrector [2] and XOR corrector [6]. The XOR corrector just takes the exclusive-or of pairs of input bits; consequently, the input stream is compressed with a factor 2. The von Neumann corrector also looks at pairs of input bits, but uses the first one if the bits are different and otherwise throws them away. The resultant stream will have a variable bit rate, but on average the compression factor will be 4. Then again, in other proposals more complicated post-processing algorithms than these simple correctors may be needed, for example a cryptographic hash function or an extractor function [15].

When the random number generator is integrated into a cryptographic module with security certification, an extra unit performing statistical tests is part of the design. Traditionally, NIST has specified some tests for the internal [...]ified, and not the unpredictability; a PRNG will also generate a uniform probability distribution and pass the statistical tests, but, unlike a TRNG, it does not produce any entropy. The guidelines provided by the German IT security certification authority BSI list a number of tests that need to be performed on the das-random numbers s[i], before the post-processing [13]. Evaluation of designs that use a low entropy noise source and that rely on high compression in the post-processor is more difficult, because the standard BSI criteria will not be met.² To address this problem, the concept of a stateless random bit generator is introduced in [14]. For this class of generators, the verification of a [...]
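The two simple correctors described in Section 2 can be sketched in a few lines of Python. This is only an illustration of the pairing rules stated in the text: the function names are hypothetical, and dropping a trailing unpaired bit is an assumption, since the text does not say how an odd-length input is handled.

```python
def xor_corrector(bits):
    """XOR each non-overlapping pair of input bits (compression factor 2).

    Assumption: a trailing unpaired bit is dropped.
    """
    bits = list(bits)
    return [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)]


def von_neumann_corrector(bits):
    """Keep the first bit of each non-overlapping pair whose bits differ.

    Pairs with equal bits are thrown away, so the output rate is variable.
    """
    bits = list(bits)
    return [bits[i] for i in range(0, len(bits) - 1, 2) if bits[i] != bits[i + 1]]


# Example das-random input, split into the pairs (0,1) (1,0) (1,1) (0,0) (1,0).
s = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
r_xor = xor_corrector(s)
r_vn = von_neumann_corrector(s)
```

For independent, unbiased input bits a pair differs with probability 1/2, so the von Neumann corrector emits on average one bit per four input bits, which matches the average compression factor of 4 quoted above; biased input lowers the output rate further.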
Fig. 2. Noise source based on ring oscillators

[...]quently uses digital ring oscillators as its noise source (see Fig. 2). If an odd number of inverters is connected in a ring configuration, the output of any of the inverters will oscillate from a logical zero to a logical one and back, owing to the unstable nature of the circuit. If one inverter is replaced with a NAND gate, the ring oscillator can be disabled, for instance to reduce power consumption. At any point in the ring a periodic square wave can be observed. Ideally, the period of this wave depends linearly on the number of inverters (i.e., the ring length) and the delay of a single inverter. In practice, there is some random variation in the moment the signal switches; this phenomenon is commonly called jitter.

The goal of a digital TRNG is to harvest this entropy source by sampling the uncertain transition zones and not the deterministic part of the waveform. Generally, two approaches exist to extract randomness from jitter: sampling the output of a ring oscillator using the output signal of another oscillator (coupled oscillators) or combining the signals of a number of ring oscillators. The framework we build upon uses the latter mechanism. The exclusive-or of k ring oscillators with lengths l_1, l_2, ..., l_k is used as n(t) and this signal is sampled at a regular clock frequency f, using a D-type flip-flop, creating a das-random bitstream s[i].

By modeling the combined signal with a combinatorial [...] better results than using identical ring lengths; thus from now on we assume that l_i = l for all rings. The statistical model also allows determining the required number of ring oscillators to fill the entire spectrum of the signal n(t) with jittered transition zones. Justification of the expected entropy of the noise source is also provided.

Nonetheless, they conclude that populating the whole spectrum with jitter events is undesirable, because too many ring oscillators would be necessary. Hence they suggest allowing a fraction of the signal n(t) to be deterministic, but compensating for this in the post-processor. The fill rate f denotes the portion of the spectrum that will be random. Table 1 gives the minimum number k of rings necessary (with a confidence of 99%) for certain fill rates.³ This number also depends on the amount of jitter per period present in the ring oscillators. If the jitter width (i.e. the standard deviation of the jitter random variable) is larger, fewer ring oscillators are needed to obtain the same fill rate. Also note that an exponential effort must be invested to obtain a constant factor improvement in the fill rate. This fact confirms the observation that f = 1 is too expensive and that lower fill rates will yield a more effective design, provided that post-processing can efficiently get rid of the non-random bits.

The proportion of the jitter width to the entire period of oscillation is an important property for an implementation of this framework. The ratio depends on the technology used and hence needs to be determined experimentally or by simulations. The jitter of FPGA ring oscillators has been measured in [16, 17]. For a Xilinx Virtex FPGA (XCV800) the jitter width ranges from 30 to 45 ps.⁴ The measurements were only done for long ring lengths (25 up to 101 inverters), because the waveform of shorter rings was not square, presumably because of the output capacitance when sending the oscillation to an output pin of the FPGA. For this Xilinx Virtex FPGA the mean period of the ring oscillation was also quantified and as expected it depends linearly on the ring length l and on the (technology specific) delay of [...] FPGA (XC2VP30). We have measured the period of ring oscillators⁵ with different lengths; a simple linear regression of the measurements in Table 3 gives the following relation: T ≈ 0.88 · l − 0.23. This difference with the other measurements is due to the fact that this FPGA is fabricated in more modern CMOS technology (0.13 µm/1.5 V as opposed to 0.22 µm/2.5 V); the logic gates have become twice as fast. As it is not so easy to quantify the jitter width, especially for short rings, we assume that the jitter width is 2% of the period [...] overcompensate by adding extra ring oscillators.

Fig. 3. Efficient implementation of post-processing algorithm based on cyclic codes

4. POST PROCESSING

By allowing the noise signal n(t) to contain deterministic bits, the design becomes practical. For example, a fill rate of 0.95 necessitates at least 393 ring oscillators, while only 110 rings are required if a fill rate of 0.60 is allowed (see [...]

[...] cyclic codes, a special class of linear codes, the generator matrix will have the form

\[
\begin{pmatrix}
g_0 & 0 & \cdots & 0 \\
g_1 & g_0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
g_{n-m-1} & g_{n-m-2} & \cdots &
\end{pmatrix}^{T}
\]

[...] This resilient function can be efficiently implemented using the circuit depicted in Fig. 3. The first n − m cycles the input bits s[i] are shifted into a register of n − m [...]

5. RESULTS

Sunar et al. end their paper with a brief description of a sample design, by picking some realistic values for the parameters of the framework. They propose to use rings of 13 inverters and assume, based on the experimental results of [17], that these ring oscillators have a period of 25 ns (i.e., 40 MHz) and that the jitter has a standard deviation of 0.5 ns, so σ = 0.02T. The framework suggests that these parameters yield at least 0.97 bits of entropy per sampled bit. In order to achieve a fill rate f of at least 0.60 their calculation shows that 114 rings⁶ are needed.

[...] Xilinx FPGA and we have verified that the theoretical background of the framework indeed holds in practice. In order to minimize the required hardware resources, we made a slight change to the original proposal. By using ring oscillators with a shorter length, namely l = 3, we significantly reduced the number of inverters needed in the noise source. However, it is unclear whether this has a big influence on the available jitter; the measurements from [16, 17] seem to suggest that shorter rings result in more jitter per period and thus potentially more randomness. The downside of our adjustment is the fact that the ring oscillators will not create a good square waveform, so the theoretical model may no longer hold. From our own measurements (see Table 3) we know that ring length 3 gives a period of 3 ns (i.e., 333 MHz) and hence potentially a higher sampling frequency can be used. The fact that fewer inverters (around 13/3 ≈ 4 times) could lead to a much higher random bitstream rate (333/40 ≈ 8 times) appears rather suspicious. Because of this, we decided to keep the sampling frequency at 40 MHz. We also [...]

The original proposal (with l = 13) uses 1664 slices of the FPGA, while the minimal version only occupies 565 slices (see Table 4). In case we overestimated the jitter on the more modern Xilinx FPGA, it would be safer to use more rings; for example, 210 rings increases the slice count to 973 but allows the standard deviation of the jitter to be σ = 0.01T (instead of 2% of the period).

We have checked our design with standard tests (NIST and DIEHARD) and confirmed that the statistical properties of the produced random numbers are fine. This however does not validate that the design indeed produces 0.97 bits of entropy per output bit.⁷

It is also worth pointing out that our minimized version satisfies the requirements of a stateless random bit generator [...] detailed statistical model of the noise source is provided, we believe it should be possible to certify the design, given that appropriate (online) statistical tests on the post-processed random numbers are applied.

6. CONCLUSIONS AND FUTURE WORK

We have verified that the framework for the provably secure true random number generator proposed by Sunar et al. is efficiently implementable on any FPGA type. In order to reduce the hardware requirements we propose using shorter ring oscillators, but also suggest using more rings than theoretically necessary. Furthermore we have verified that a real implementation of the design passes all common statistical tests.

Interesting future work would be to better measure the jitter of short ring oscillators and test the robustness of the FPGA implementation. What [...]

³ The results slightly differ from those in the paper of Sunar et al. We believe this is because they used another mathematical program producing the inaccurate calculations. All our calculations have been [...]

⁴ These measurements seem to be consistent with a post on the comp.arch.fpga newsgroup by a Xilinx engineer stating that jitter of an FPGA ring oscillator will be about 40 to 60 ps peak to peak.

⁶ More precise calculations suggest k = 110 is sufficient (see Table 1).

⁷ [...] bits. To verify this experimentally, we tried compressing the das-random bitstreams with bzip2 but this was not possible. Other statistical tests, like Maurer's universal statistical test, however fail, showing that the das-random numbers contain statistical defects that get corrected by the post-processor.
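As an illustration of the post-processing idea in Section 4, the sketch below builds a generator matrix for a small cyclic code and applies it as a linear map over GF(2). Several things here are assumptions made only for the example: the [7, 4] Hamming code with g(x) = 1 + x + x^3 is chosen because it is small enough to check exhaustively and is not the code used in the design; the matrix is laid out with one shifted copy of g per row; and all function names are hypothetical. The check relies on the standard result that the generator matrix of a linear code with minimum distance d yields a function that tolerates d − 1 deterministic input bits.

```python
from itertools import combinations, product


def cyclic_generator_matrix(g, n):
    """Build the m x n generator matrix whose rows are shifted copies of g.

    g lists the generator polynomial coefficients g_0, g_1, ... (lowest
    degree first), and m = n - deg(g).
    """
    m = n - (len(g) - 1)
    return [[0] * i + list(g) + [0] * (m - 1 - i) for i in range(m)]


def resilient(G, x):
    """Compress the n input bits x into m output bits, y = G x over GF(2)."""
    return tuple(sum(a & b for a, b in zip(row, x)) % 2 for row in G)


def is_resilient(G, t):
    """Brute force: is the output uniform whenever any t input bits are fixed?"""
    m, n = len(G), len(G[0])
    for fixed in combinations(range(n), t):
        free = [i for i in range(n) if i not in fixed]
        for fixed_vals in product((0, 1), repeat=t):
            counts = {}
            for free_vals in product((0, 1), repeat=len(free)):
                x = [0] * n
                for i, v in zip(fixed, fixed_vals):
                    x[i] = v
                for i, v in zip(free, free_vals):
                    x[i] = v
                y = resilient(G, x)
                counts[y] = counts.get(y, 0) + 1
            # Uniform output: all 2^m values occur, each equally often.
            if len(counts) != 2 ** m or len(set(counts.values())) != 1:
                return False
    return True


# Toy example only: [7, 4] cyclic Hamming code, g(x) = 1 + x + x^3.
G = cyclic_generator_matrix([1, 1, 0, 1], 7)
tolerates_two = is_resilient(G, 2)    # minimum distance 3, so 2 fixed bits are fine
tolerates_three = is_resilient(G, 3)  # 3 fixed bits can cover a whole codeword
```

With this row layout each output bit is the exclusive-or of a sliding window of input bits selected by the coefficients of g, which is the kind of computation a shift register with fixed taps can perform serially.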
