Quality randomness is fundamental to cryptographic operations, but on embedded systems good sources are (seemingly) hard to find. Rather than use expensive custom hardware, this study investigates entropy sources that are already common in a range of low-cost embedded platforms. In particular, we empirically evaluate the entropy provided by three sources (SRAM startup state, oscillator jitter, and device temperature) and integrate those sources into a full Pseudo-Random Number Generator implementation based on Fortuna [2]. Our system addresses a number of fundamental challenges affecting random number generation on embedded systems. For instance, we propose SRAM startup state as a means to efficiently generate the initial seed, even for systems that do not have writeable storage. Further, the system's use of oscillator jitter allows for the continuous collection of entropy-generating events, even for systems that do not have the user-generated events that are commonly used in general-purpose systems for entropy, e.g., key presses or network events.
Introduction
Modern security technologies depend on strong random numbers for creating encryption keys, signatures, and nonces for sensitive data. The strength of these numbers is dependent on a reliable source of entropy and a secure Pseudo-Random Number Generator (PRNG). On general purpose systems such as Linux, the entropy is usually gathered from the timing of unpredictable events, e.g., user key strokes, mouse movements, and disk events. Further, initial seeding of the generator is usually done by reading in a seed file or by delaying random number generation until enough entropy can be collected. However, entropy sources used by general purpose systems are poorly suited to the embedded environment. For example, key stroke information cannot be collected by a system that does not have a keyboard attached. Similarly, seed files cannot be used when the system does not have a disk.
This work examines the feasibility of implementing a PRNG fed by high-quality entropy sources in an embedded environment with limited hardware and software resources. Previous studies proposed using specialized hardware in either an FPGA or an ASIC to gather entropy [1, 9]; however, such hardware can add significant production cost to embedded systems. In contrast, the system we present utilizes existing internal hardware that is commonly found in a wide variety of inexpensive microcontrollers. In particular, our implementation targets the Texas Instruments MSP430 (MSP430F5529) low-power microcontroller using only an external 4 MHz crystal oscillator and power supply. This platform was chosen due to its numerous programmable peripherals, its commonality in embedded systems development, and its affordability. Many microcontrollers and Systems on a Chip (SoC) include similar hardware that allows our PRNG to be implemented, including chips from Atmel, Xilinx, Cypress Semiconductor, and other chips from Texas Instruments.

arXiv:1903.09365v1 [cs.CR] 22 Mar 2019
Our system is based on two key insights. First, we can address the challenge of collecting entropy at runtime by sampling the jitter of the low-power/low-precision oscillator. Coupled with measurements of the internal temperature sensor, this lets us re-seed our PRNG effectively with minimal overhead. Second, we can solve the problem of initial seeding by leveraging the randomness inherent in the startup state of RAM [4]. We also propose the use of a Cyclic Redundancy Check (CRC) as a mixing and collapsing function for transforming the RAM startup state into the initial seed, overcoming the processing limitations created by a small memory space.
Our system, based on the Fortuna PRNG [2], was designed and implemented on the MSP430. We empirically evaluate the amount of entropy generated by each of our sources and the overall quality of the random numbers generated by the PRNG. Additionally, we analyze the effect of different operating environments on the entropy sources; as systems operate at different supply voltages and environment temperatures, it is important to understand the extent to which the performance of the PRNG is dependent on particular physical parameters, e.g., external temperature.
Pseudo-Random Number Generator Design
PRNGs rely on a source of true randomness, entropy, that often provides an initial seed from which a random stream of data is produced. Depending on the generator design, entropy sources may also be used to periodically re-seed the PRNG, extending the number of outputs that can be produced before the generator is liable to repeat sequences or become predictable. One such generator that frequently uses entropy sources to re-seed is Fortuna, a block cipher-based PRNG designed by Niels Ferguson and Bruce Schneier [2] .
Fortuna
Fortuna is divided into two components: the generator and the accumulator. The generator consists of a block cipher that encrypts a counter value to produce a block of random bytes. After every data request, the encryption key is replaced with a new output from the block cipher in order to limit the damage in the event that an attacker learns the internal state [2].
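The rekey-after-request behavior of the generator can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in this work: a SHA-256-based function stands in for the AES-256 block encryption (the Python standard library has no AES), and the names `Generator`, `random_bytes`, and `_stand_in_block` are hypothetical.

```python
import hashlib


def _stand_in_block(key: bytes, counter: int) -> bytes:
    """Stand-in for encrypting a 128-bit counter with a 256-bit key.

    ASSUMPTION: this is NOT AES-256. A truncated SHA-256 is used only so
    the sketch runs with the standard library.
    """
    return hashlib.sha256(key + counter.to_bytes(16, "little")).digest()[:16]


class Generator:
    """Hypothetical sketch of a Fortuna-style generator."""

    def __init__(self, key: bytes):
        if len(key) != 32:
            raise ValueError("key must be 256 bits")
        self.key = key
        self.counter = 1  # 128-bit counter, incremented once per block

    def _blocks(self, count: int) -> bytes:
        out = b""
        for _ in range(count):
            out += _stand_in_block(self.key, self.counter)
            self.counter += 1
        return out

    def random_bytes(self, n: int) -> bytes:
        # Produce the requested output, one 16-byte block at a time.
        data = self._blocks((n + 15) // 16)[:n]
        # Replace the key with two fresh blocks (256 bits) after every
        # request, so a later state compromise does not reveal past output.
        self.key = self._blocks(2)
        return data
```

The counter is never reset between requests, so each block is produced from a distinct (key, counter) pair.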
The accumulator consists of a set of 32 entropy pools which are fed random data from any number of entropy sources. The data contributed to each pool consists of the random data itself and a source identifier which indicates the origins of the sample. When a pool is filled with enough entropy to replenish the generator, the pool contents are collapsed into a 256-bit value that becomes the new generator key. The collapsing is done using a cryptographic hashing algorithm [2] .
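A minimal Python sketch of the accumulator, assuming SHA-256 as the pool hash. The round-robin distribution of samples and the rule that pool i is used when 2^i divides the reseed count come from the general Fortuna description rather than from this text, and the names `Accumulator`, `add_sample`, and `min_events` are hypothetical.

```python
import hashlib


class Accumulator:
    """Hypothetical sketch of a Fortuna-style accumulator with 32 pools."""

    NUM_POOLS = 32

    def __init__(self, min_events: int):
        self.pools = [hashlib.sha256() for _ in range(self.NUM_POOLS)]
        self.min_events = min_events  # events pool 0 needs before a reseed
        self.pool0_events = 0
        self.next_pool = 0
        self.reseed_count = 0

    def add_sample(self, sample: bytes) -> None:
        # ASSUMPTION: samples are spread over the pools round-robin.
        index = self.next_pool
        self.pools[index].update(sample)
        if index == 0:
            self.pool0_events += 1
        self.next_pool = (index + 1) % self.NUM_POOLS

    def ready(self) -> bool:
        return self.pool0_events >= self.min_events

    def reseed(self, old_key: bytes) -> bytes:
        """Collapse the selected pools into a new 256-bit generator key."""
        self.reseed_count += 1
        material = b""
        for i in range(self.NUM_POOLS):
            # Pool i contributes only when 2^i divides the reseed count.
            if self.reseed_count % (1 << i) != 0:
                break
            material += self.pools[i].digest()
            self.pools[i] = hashlib.sha256()  # empty the pool after use
        self.pool0_events = 0
        return hashlib.sha256(old_key + material).digest()
```

Higher-numbered pools are drained less often, so they accumulate more entropy between uses.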
Algorithm Selection
Fortuna does not mandate a specific block cipher or hashing algorithm; however, the key, block, and digest sizes limit which algorithms can be used. The block cipher is required to have a 128-bit block to fit a 128-bit counter value, and a 256-bit key. The hashing algorithm digest must also be 256 bits in order to produce new keys during re-seeding. Further limitations placed on this implementation were code size and execution time, due to limited memory and a slow clock speed. These parameters were used to select algorithms that have been implemented in a "lightweight" format (i.e., having small code size and efficient execution). The algorithms shown in Tables 1 and 2 were considered to have good performance in low-power and low-cost devices. However, most of the algorithms do not support the key, digest, and block sizes needed in Fortuna. From this list of algorithms, AES-256 and SHA-2 256 were selected as the block cipher and hashing algorithm, respectively, due to their input and output sizes, and the availability of software libraries for embedded systems.
Entropy Sources
The security of Fortuna partially rests on the act of re-seeding the generator with enough truly random data to replace the internal state. In the event that an attacker is able to gain knowledge of the AES key, the entropy pools must contain at least 128 bits of entropy for the generator to be re-seeded and become safe from attack [2] .
Three peripherals and devices in the MSP430 were investigated as potential entropy sources for seeding the generator and feeding the accumulator. Two of the sources, phase jitter in an internal low-power oscillator and a temperature sensor, were used as feeders to the entropy pool. The third source, the startup state of SRAM, was used as an initial seed for the generator. The following sections discuss each of these sources and analyze their behavior and entropic qualities. To estimate the entropy contributed by each source, the entropy estimation suite from NIST SP 800-90B was used to compute the min-entropy, the most conservative entropy estimate among popular estimators including Shannon and Renyi estimates.
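For intuition, min-entropy is determined by the probability of the most likely sample value alone. The Python sketch below computes a plain plug-in estimate from observed frequencies. It is a simplification and not the NIST SP 800-90B suite, which applies confidence bounds and several additional estimators; the function name `plug_in_min_entropy` is hypothetical.

```python
import math
from collections import Counter


def plug_in_min_entropy(samples) -> float:
    """Return -log2(p_max), where p_max is the observed frequency of the
    most common sample value.

    ASSUMPTION: a naive plug-in estimate for illustration only; it omits
    the confidence interval used by the NIST most-common-value estimator.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    p_max = max(Counter(samples).values()) / len(samples)
    return -math.log2(p_max)
```

A source that emits four values equally often yields 2 bits per sample, while a source that repeats one value three times out of four yields only about 0.415 bits, however many distinct values it can produce.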
Low-Powered Oscillator
The MSP430 has an internal Very Low-Power Oscillator (VLO) that is intended to be used in low-power applications where an external oscillator is not able to be powered or is not present. As it is powered internally by the microcontroller, and is not sourced from a high-precision crystal, the VLO is subject to phase jitter. Phase jitter is the small time difference between when a controlled oscillator has a rising edge, and when it is expected to have a rising edge in an ideal model [10] . This signal artifact has been used in other RNGs before, and is often a promising entropy source [1, 9, 11] . The jitter characteristics of the VLO were investigated in order to know how much entropy it is able to contribute to the PRNG.
Jitter Characteristics
In an oscillator with a stable typical period, $\tau_{typ}$, every measurable period $\tau_i$ is composed of the typical period and a small time error (phase jitter) $j_i$ between the expected and actual edge times (Equation 1). For a system that begins measuring at every rising edge, these are the only components of each measured period.

$$\tau_i = \tau_{typ} + j_i \qquad (1)$$
One other characteristic of interest is oscillator wander, the tendency for the frequency to stray far away from the typical frequency $f_{typ}$. Unlike phase jitter, this phenomenon is not desirable in an entropy source, as oscillator wander can be influenced by changes in the operating environment [10]. Beyond investigating the steady-environment qualities of the VLO, samples were tested with different supply voltages and operating temperatures. This provided insight into the VLO's behavior in applications with different environment temperatures and system voltages.
Since oscillator jitter is defined to not include low-frequency shifts in oscillator timing, we verified that the VLO instability manifests as jitter and not oscillator wander [10]. An oscilloscope capture in infinite-persist mode shows that the oscillator sits at a consistent $f_{typ}$ with slight fluctuations in the form of phase jitter. This is shown by the late and early edges captured in Figure 1.
The scope capture in Figure 2 of an extended run of the oscillator in infinite-persist mode reveals the density of phase jitter lengths over time. Equally-sized finite regions are covered on both sides of $\tau_{typ}$, which suggests a possible normal distribution of jitter events, and confirms that the VLO operates at a typical frequency. Thus, at a constant supply voltage, the VLO does not experience oscillator wander.
The effect of different supply voltages on the VLO $\tau_{typ}$ was also measured over a range from 2.3 V to 5.3 V as shown in Table 3. As the supply level increases, the typical period decreases, likely due to higher power availability. Thus, the VLO does exhibit tendencies to wander across supply levels, but does not wander when kept at a consistent supply.
VLO Sampling
[Figure caption: Jitter events around VLO edges cover equal finite ranges on both sides of the center period. This is partial evidence of a normal distribution among VLO phase jitter events.]

VLO jitter was sampled by measuring the period of each VLO cycle, which includes only $\tau_{typ}$ and $j_i$, as described in Equation 1. The system shown in Figure 3 used the following process and components to measure VLO periods:
1. A high precision timer, the Digitally Controlled Oscillator (DCO), is configured to run at 24 MHz.
2. A 16-bit hardware timer increments a count register on every rising edge of the DCO.
3. The VLO is routed into the timer as a capture interrupt signal, such that the timer count is stored in a secondary register on the rising edge of the VLO.
4. An interrupt is simultaneously generated by the timer. The MSP430 runs an interrupt service routine (ISR) that subtracts the previous timer count from the current timer count to obtain the VLO period in DCO ticks.
The MSP430 itself runs on the DCO and is fast enough to complete the interrupt service routine before the next VLO edge occurs, leaving enough time to perform additional post-processing or transmission functions between samples. Direct Memory Access (DMA) can also be configured to copy the timer register to memory, avoiding the time spent in context switches between the main application and the ISR.
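The subtraction performed in step 4 can be modeled on a host machine with the Python sketch below. It assumes a free-running 16-bit counter, so the difference is taken modulo 2^16 to survive counter wrap-around, and it assumes the 8-bit samples are the low byte of each period; the text does not state how samples were reduced to 8 bits. The function names are hypothetical.

```python
def periods_from_captures(captures, timer_bits: int = 16):
    """Convert successive timer capture values into periods in DCO ticks.

    The subtraction is done modulo 2**timer_bits so that a capture taken
    after the counter wraps still yields the correct elapsed tick count.
    """
    mask = (1 << timer_bits) - 1
    return [(cur - prev) & mask for prev, cur in zip(captures, captures[1:])]


def low_byte_samples(periods):
    """ASSUMPTION: keep only the low 8 bits of each period as the sample."""
    return [p & 0xFF for p in periods]


# With a 24 MHz DCO and a VLO near 9.5 kHz, one VLO period spans roughly
# 24e6 / 9.5e3, about 2526 DCO ticks, well inside the 16-bit counter range.
NOMINAL_TICKS = 24_000_000 / 9_500
```

Because the nominal period is far smaller than 65536 ticks, at most one wrap-around can occur between consecutive captures, which is the condition under which the modular subtraction is exact.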
Statistical analysis of one million 8-bit VLO samples produced a normal distribution in accordance with the previously discussed frequency characteristics seen in oscilloscope captures. Figure 4 depicts the distribution of VLO samples at varying supply voltages, each of which suggests the VLO period is a randomly-distributed variable and therefore a good source of entropy [9, 11] .
VLO Entropy
The NIST entropy estimation suite was used to predict the min-entropy of the VLO in different operating conditions in order to ensure that the VLO has reliable entropy regardless of the environment temperature or system voltage. Entropy estimates were acquired by testing one million samples from the VLO across a voltage range of 2.4 V to 3.6 V, the maximum supply range for the MSP430 running at 24 MHz [12]. Table 5 shows the min-entropy estimates for the number of bits of entropy in each 8-bit sample for each configuration. Table 4 provides a detailed listing of the various entropy estimator results for a supply of 3.3 V. In addition to collecting entropy estimates, restart tests were performed on the VLO as recommended by NIST in order to check that a similar sequence is not generated after restarting the entropy source. The VLO passed the restart tests, confirming the validity of the min-entropy estimates. In addition to testing different supply voltages, measurements were taken at different environment temperatures. At a fixed temperature, the VLO was estimated to have a consistent amount of entropy, regardless of the temperature itself. However, when samples were collected while the environment temperature was actively changing, the entropy estimates decreased as shown in Figure 5. Based on these tests, the VLO's entropy decreases while the temperature is changing but is reliable at a stable temperature.
Temperature Sensor
The second runtime entropy source used in this PRNG is an internal temperature sensor that measures an aggregate of the environment and internal temperatures of the microcontroller. The sensor was routed as an input to a 12-bit Analog to Digital Converter (ADC) with a reference voltage of 1.5 V. This reference was selected in order to increase the sensitivity of the sensor and have a resolution of 0.366 mV/bit. As specified in Fortuna, it is desirable to have each entropy source contribute to the pools evenly. In order to achieve such an equal contribution, the ADC was configured to make conversions of the temperature sensor at 9.5 kHz to match the typical period of the VLO at a supply of 3.3 V. Each deployment of this system should select a sampling rate equal to the typical VLO frequency for the supply voltage used in the application.
While there is not a substantial amount of information on this sensor, and it is not proven to be a true entropy source, the samples were tested with the NIST entropy estimation suite to determine the apparent amount of entropy in each sample. The resulting min-entropy estimate for the temperature sensor at 9.5 kHz can be seen in Table 6.

[Figure caption fragment: ... °F to 0 °F. This is likely because of the VLO periods increasing or decreasing in a constant direction, introducing a more predictable pattern to the period samples.]
SRAM Startup State
Beyond runtime entropy sources, the Fortuna generator also requires a 64-byte seed file that initializes the generator key at start up; this is a challenge in embedded devices due to limited non-volatile memory. However, previous work has shown that the startup state of SRAM cells exhibits a random pattern and can be used as a large pool of boot-time entropy in embedded systems [4] . This implementation utilized the start up state of SRAM on the MSP430 to create a 64-byte seed file.
First, a min-entropy estimate for SRAM on the MSP430 was determined by reading out the initial memory state before any function calls or data were placed on the software stack, and tested using the NIST entropy estimation suite. The MSP430 has 10 KB of RAM available, which was collected 100 times across start ups with 30 second power-off periods to create a set of one million samples. The estimation results for SRAM are shown in Table 7. Although SRAM was estimated to have 0.457844 bits of entropy per byte, it was determined that only certain bytes are likely to change frequently, as shown in Figure 6.
Thus, the randomness is likely spread out over the entire memory, and all 10 KB needs to be collapsed down to a 64 byte memory segment without disturbing the RAM state using a high-level software routine. To solve this, a built-in CRC module was used to mix each block of 160 bytes down to a 16-bit value with high entropy. The viability of this CRC-CCITT-16 hardware mixing function is discussed in the following section.
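The block-wise collapse can be modeled on a host with the Python sketch below. It is an illustration under assumptions: the CRC uses the CCITT polynomial 0x1021 processed most-significant bit first with an initial value of 0xFFFF, which may not match the bit order or seed of the MSP430 CRC module, and the function names are hypothetical. The block size is left as a parameter because the sketch does not guess how the block count relates to the 64-byte seed.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with generator x^16 + x^12 + x^5 + 1 (0x1021).

    ASSUMPTION: MSB-first processing and init=0xFFFF; the hardware module
    may use a different bit order or seed.
    """
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def collapse(image: bytes, block_size: int) -> bytes:
    """Mix each full block of the memory image down to one 16-bit CRC and
    concatenate the results (2 output bytes per input block)."""
    out = bytearray()
    for start in range(0, len(image) - block_size + 1, block_size):
        block = image[start:start + block_size]
        out += crc16_ccitt(block).to_bytes(2, "big")
    return bytes(out)
```

Each block of 160 bytes presents 1280 bits to the CRC, and each block contributes exactly two bytes to the output regardless of how many of its input bytes actually vary between power cycles.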
CRC as a Mixing Function
CRCs are a type of algebraic cyclic code that operates on the underlying principles of polynomial division over the field GF(2), modulo a generator polynomial G(x). The individual bits of the CRC inputs are treated as the binary coefficients of polynomials, and the remainder, often called the syndrome, of division by G(x) is used as the CRC output [8]. This structure of a cyclic code is important in the analysis of CRCs for entropy mixing. This operation is analogous to calculation of a syndrome within a cyclic code [11], and makes the CRC-CCITT-16 an attractive candidate for a resilient function.
Additionally, the output characteristics of CRC-CCITT-16 are of interest. As CRCs are error-detecting codes, they are designed to produce different syndromes for different input sequences. As a 16-bit CRC, this variation of the code is able to detect any bit errors (i.e., changing bits) within a 16-bit input [8]. Thus, there is a unique output for any inputs up to 16 bits, so the individual period sample inputs will have unique effects on the final CRC result. Furthermore, as CRC-CCITT-16 is implemented as a Linear Feedback Shift Register (LFSR), the nature of the output is randomized, as is the purpose of an LFSR-based RNG [7].
CRC-CCITT-16 is defined with the generator polynomial $G(x) = x^{16} + x^{12} + x^{5} + 1$, which is the product of $(x + 1)$ and a primitive polynomial $p(x) = x^{15} + x^{14} + x^{13} + x^{12} + x^{4} + x^{3} + x^{2} + x + 1$. Primitive polynomials are important in CRC theory, as they define further error-detection capabilities and output variance with respect to input sequences. A generator with a primitive polynomial factor of degree $n$ for a CRC guarantees detection of 2-bit differences in input streams that are at most $2^n - 1$ bits apart [8]. In this case, it can be guaranteed that different CRCs will result for input streams that differ in 2 bits that are up to 32767 places away from each other, far larger than the maximum of 1280 bits that pass through the CRC for extraction. Additionally, since the generator is the product $G(x) = (x + 1)p(x)$, the CRC can detect all parity errors, which are errors or bit differences consisting of an odd number of bits [8].
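The factorization and the 32767-bit figure can be checked numerically. The Python sketch below encodes each polynomial as an integer whose bit k is the coefficient of x^k, multiplies over GF(2) without carries, and measures the multiplicative order of x modulo p(x); an order of 2^15 - 1 is what makes p(x) primitive. The helper names `gf2_mul` and `order_of_x` are hypothetical.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def order_of_x(poly: int) -> int:
    """Smallest k > 0 with x^k = 1 modulo poly.

    Requires a polynomial with constant term 1, so that x is invertible
    and the loop is guaranteed to return to 1.
    """
    degree = poly.bit_length() - 1
    state, steps = 1, 0
    while True:
        state <<= 1                   # multiply by x
        if (state >> degree) & 1:     # reduce modulo poly
            state ^= poly
        steps += 1
        if state == 1:
            return steps


G = 0x11021  # x^16 + x^12 + x^5 + 1
P = 0xF01F   # x^15 + x^14 + x^13 + x^12 + x^4 + x^3 + x^2 + x + 1
X_PLUS_1 = 0b11
```

Here `gf2_mul(X_PLUS_1, P)` reproduces `G`, and `order_of_x(P)` evaluates to 32767, matching the $2^n - 1$ bound for $n = 15$.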
The Hamming Distance (HD) of a CRC defines the widest burst of bit errors that can be detected. In determining the HD it is imperative to know the generator polynomial and whether it is primitive, the CRC size, and the length of the input blocks. Based on previous work in CRC analysis, the CCITT-16 generator with an input of 256 bits has a Hamming Distance of 4 bits; precisely 6587 inputs with 4-bit-wide differences will go unnoticed [5]. Taking into account that a large number of the input bits from SRAM are deterministic and that the majority of bytes only differ by a few values, only a very small fraction of all the possible input sequences may go undetected. Additionally, there are no 5-bit errors that are undetectable [5]. Any 6-bit errors are highly unlikely to occur due to the limited number of values each SRAM byte may contain.
The relevant properties of CRC-CCITT-16 as a mixing function are summarized below:
• The (x + 1) factor of the generator allows detection of all odd-numbered bit differences.
• The primitive factor of degree 15 guarantees detection of all 2-bit differences within the 256-bit input.
• A maximal-length LFSR design ensures pseudo-randomness in outputs.
• A small amount of 4-bit-wide input differences are undetectable, although wide bit differences are not critical to detect in this application.
• All 5-bit-wide input differences are detectable.
• We don't expect any 6-bit-wide input differences.
Thus, the CRC provides randomness and guaranteed output mixing for any input streams that contain 2-bit, odd-numbered-bit, and 6-bit-wide differences. This is a strong argument for using the CRC module to collapse the 10 KB of SRAM down to a 64-byte seed file.
Comparison of Entropy Sources
As a performance and security benchmark, the entropy sources in this PRNG were compared to the sources of an RNG on a general purpose system. The Linux random number generator is a widely-known generator designed for use on PCs, servers, etc. The generator uses three sources of entropy: user input events (keyboard and mouse), hardware interrupts, and disk access events [3]. Each event is paired with a timestamp, the precision of which is dependent on the operating system configuration; the computer used for development of this PRNG has timestamps with 4 ms precision. In addition to the entropy sources, the generator's pool of entropy is saved and restored across system power cycles to provide a high amount of entropy to start with, similar to Fortuna [2]. In order to measure the entropy collected from Linux sources, the data being contributed to the pool was intercepted in the Linux kernel. One million samples were collected from each source to test with the NIST estimator, the results of which are shown in Table 8; the maximum amount of entropy from a single source was 0.742 bits per sample from user input events. Collecting the high-entropy input events took three days, whereas interrupts required 1 hour, and disk access events, which must be initiated by the user, required over a day.
The sources used in this PRNG have significantly higher entropy estimates than the Linux entropy sources, particularly the VLO and temperature sensor. Additionally, both the user input and disk event sources are dependent on user input, which could starve the system if the computer is not used frequently enough. The sources used in this work are not easily controlled by a user, and would require extensive access to the device in order to fully control or eavesdrop on the sources. Furthermore, the sources in this PRNG run much faster than the Linux sources, and can generate a high volume of samples in a very short period of time (on the order of seconds).
PRNG Implementation
A full implementation of Fortuna using the previously discussed entropy sources was built on an MSP430 with the architecture shown in Figure 7. Taking into account the entropy estimates of both the VLO and the temperature sensor, approximately 4.4 bits of entropy are collected between the two runtime sources at an average rate of 9.5 kHz. In the event that the internal state of the generator is compromised, the accumulator would take 58 entropy source events, or 3 ms, to gather 128 bits of entropy and reseed the generator. A sequence of random bytes was collected from the PRNG and passed the NIST Statistical Test Suite; the results can be seen in Table 9.
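The reseed figures can be reproduced approximately with the arithmetic below, written as a short Python sketch. It rests on one interpretation that the text does not state explicitly: the 4.4 bits are gathered per pair of samples (one VLO sample plus one temperature sample), and the 58 events count the samples from both sources.

```python
BITS_NEEDED = 128        # entropy required to reseed (from the text)
BITS_PER_PAIR = 4.4      # ASSUMPTION: per VLO + temperature sample pair
SAMPLE_RATE_HZ = 9_500   # each source produces one sample per VLO period

pairs = BITS_NEEDED / BITS_PER_PAIR        # sample pairs needed, about 29.1
events = 2 * pairs                         # one event per source per pair
time_ms = pairs / SAMPLE_RATE_HZ * 1000.0  # elapsed time in milliseconds

print(f"{pairs:.1f} pairs, {events:.1f} events, {time_ms:.2f} ms")
```

Under that reading the result is about 29 pairs, or roughly 58 events in just over 3 ms, which agrees with the figures quoted above.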
Beyond basic implementation details, a number of design changes in the following areas were made from the original specification of Fortuna: seed file management, entropy pool inputs, and entropy pool size. The subsequent sections discuss each of these changes.
Seed File Management
The specified manner for managing the seed file is to rewrite the contents of the seed with an output from the generator after every startup [2] . On an embedded device this is difficult due to the lack of an initial seed on the first boot of a device and limited non-volatile memory. By using SRAM startup state, the need for managing a seed file was removed. Instead, the application reads the mixed SRAM state from the low 64 bytes of memory after the CRC startup routine completes and the main program begins. This also improves the boot speed of the application when compared to the alternative of writing and reading a seed file from FLASH. FLASH operations take a considerable amount of time and would slow down the device start up during seed file management. Using the CRC mixing function is fast, as each calculation happens in hardware within 2 clock cycles while the CPU prepares the next byte of SRAM for computation.
Entropy Pool Contribution
Based on the Fortuna specification, each entropy source adds both its source ID and random data to each entropy pool every time the source produces a new sample [2]. With an approach like this, a significant amount of deterministic data would be present in the pools and contribute no randomness to the system. As memory is not readily available in [...]

Table 9: Randomness tests of PRNG samples. The NIST Statistical Test Suite was used to test the randomness of the PRNG output. Columns C1 to C10 represent the frequency at which passing p-values are calculated for blocks (p-values gauge whether a block is probable to pass a specific randomness threshold). The p-value calculated from a chi-square test is in the following column, with the pass rate of the test following in the next column. All of the pass rates are above the required threshold for confirming good random data.

This implementation modifies the pool contribution method by only adding the source sample to an entropy pool when a source has a new sample available. Although this means that each pool fills more slowly than it would with the source ID included, the sources run fast enough that the pools still fill quickly, allowing the generator to be re-seeded every 3 ms.
Entropy Pool Size
The final design change made to the Fortuna PRNG is the minimum size an entropy pool must reach before it is mixed into the generator. The original specification leaves this parameter open to designers due to the fact that different sources accumulate entropy at varying rates. For this system, 58 events are required to reseed the generator with 128 bits of entropy; this number was used as the minimum entropy pool size for re-seeding in order to ensure consistent protection of the generator state.
Conclusion
The PRNG presented in this work was developed to explore the feasibility of building an effective PRNG in an embedded device without requiring custom external hardware. Three potential entropy sources were evaluated in a commonly-used embedded platform: oscillator jitter, an internal temperature sensor, and the startup state of SRAM.
These entropy sources were incorporated into a strong pre-existing PRNG, Fortuna, that was designed to keep the entropy accumulator and generator separate. This generator architecture was appropriate for an embedded device due to the independent operation of the entropy sources from the main CPU. The entropy sources fill the PRNG's entropy pools as they have events, while the generator is able to produce sequences of random bytes regardless of the pool states.
Having identified methods of harvesting entropy from existing hardware in embedded systems to implement an effective PRNG, this architecture and methodology can be extended to other off-the-shelf embedded devices with similar hardware peripherals. This overcomes the challenge of implementing a PRNG in a resource-limited system with highly deterministic behavior, whereas traditional PRNGs have access to a wide range of computational resources and entropy sources. Furthermore, this problem is solved without the use of custom hardware that often greatly increases the cost of development for embedded systems.
