In this paper, a high-accuracy Gaussian noise generator emulator is defined and optimized for hardware implementation on FPGA. The proposed emulator is based on the Box-Muller method, implemented by using ROM tabulation and random memory access. By means of accumulations, the central limit theorem is applied to the Gaussian distribution produced by the Box-Muller stage. After presenting the algorithmic method, this paper analyzes its efficiency for different noise signal formats. Then the architecture designed to fit into the FPGA is explained. Finally, results from the FPGA synthesis are given to show the value of this method for FPGA implementation.
Introduction
Fast prototyping of digital communication systems needs efficient tools for the evaluation of the performance of the transmission algorithms. For example, to obtain an estimation of the Bit Error Rate of $10^{-6} \pm 3.3\%$, $10^{9}$ iterations have to be done. Since the number of parameters in a modern system can be very high (sampling frequency, digital format, carrier resolution, rounding and quantization, ...), the search for an optimal compromise between performance and complexity is not trivial, and simulation is generally the last tool used to perform this task [4]. To avoid the software delays inherent in a long simulation, hardware emulation is investigated in the research project carried out between ENST-Paris (France), SUP'COM (Tunisia), LESTER (France) and University of Toronto (Canada). The idea is to use an FPGA board to emulate the system. A synthesized model of the channel and a synthesized version of the algorithms are used to perform performance measurements at very high speed. The main difficulty in emulating the channel is to have an accurate Additive White Gaussian Noise (AWGN) generator, i.e.: a normal distribution out to 4 times the standard deviation $\sigma$ with a relative error less than 0.1% compared to the ideal distribution; B bits (2 to 10) of resolution after the decimal point; a flat spectrum; a high sample rate (> 10 MHz).
A theoretical method to meet these requirements was proposed by the author in [5]. This paper focuses on the FPGA implementation (namely the FLEX10K or APEX20K of Altera [2]) of the AWGN generator in order to reproduce the architecture. The paper is organized as follows: Section 2 briefly recalls the method proposed in [5], Section 3 describes the overall architecture of the AWGN generator, Section 4 shows the LFSR (Linear Feedback Shift Register) optimization and Section 5 gives the design results.
Design of accurate AWGN reference model
The Gaussian noise sample is generated in two steps.
First, a quantized version of the Box-Muller method is performed to obtain a good approximation of the Gaussian distribution. Second, several samples thus obtained are accumulated to generate the final sample. The aim of this last step is to smooth the fluctuations of the distribution obtained with the quantized Box-Muller method (central limit theorem).
2.1. Box-Muller method
The Box-Muller method is widely used in software simulation. It generates a random sample n of a Gaussian variable N(0,1) (zero mean and standard deviation $\sigma = 1$) from two random variables $x_1$ and $x_2$, uniformly distributed over [0,1], using (see [3] for a proof):
$$f(x_1) = \sqrt{-\ln(x_1)} \qquad (1)$$
$$g(x_2) = \sqrt{2}\,\cos(2\pi x_2) \qquad (2)$$
$$n = f(x_1)\, g(x_2) \qquad (3)$$
A quantized version of (1) and (2) using pre-computed values is proposed in [5]. It is based on a non-uniform quantization of the segment [0, 1].
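As an illustration of the floating-point Box-Muller method recalled in this section, the following is a minimal Python sketch. It assumes the usual split into a radial factor $f(x_1) = \sqrt{-\ln(x_1)}$ and an angular factor $g(x_2) = \sqrt{2}\cos(2\pi x_2)$. The function names, the seed and the sample count are choices made for this sketch, and the quantized ROM-based version of [5] is not reproduced here.

```python
import math
import random


def f(x1: float) -> float:
    """Radial factor sqrt(-ln(x1)), defined for x1 in (0, 1]."""
    return math.sqrt(-math.log(x1))


def g(x2: float) -> float:
    """Angular factor sqrt(2) * cos(2*pi*x2), for x2 in [0, 1)."""
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * x2)


def box_muller_sample(rng: random.Random) -> float:
    """One sample of N(0, 1), computed as the product f(x1) * g(x2)."""
    x1 = 1.0 - rng.random()  # map [0, 1) onto (0, 1] so that log(x1) is finite
    x2 = rng.random()
    return f(x1) * g(x2)


def mean_and_std(samples):
    """Empirical mean and standard deviation of a list of samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, math.sqrt(var)


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, used only for repeatability
    samples = [box_muller_sample(rng) for _ in range(200_000)]
    print(mean_and_std(samples))  # expected to be close to (0, 1)
```

In hardware, evaluating the logarithm, square root and cosine directly would be costly, which is why pre-computed tables are used instead.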
The sign of the output sample n is obtained by using a random variable sign, which complements P(r, s) when equal to 1:
Mixed method
Curve a of Figure 1 compares the distribution BM 1 obtained with the parameters of the FPGA circuit.
Overall architecture
As shown in figure 1, good results are obtained with K=5 and m=7, which means that (2+m)*K=45 logic cells are needed for the FrROMs. A 256-byte on-chip RAM can be used for the G ROM (m'=8-1=7). All these parameters correspond to Table 1 above. They are a good trade-off between performance and FPGA complexity.
Once the Box-Muller variable is generated, it can be truncated according to the needed accuracy to keep only B bits after the decimal point. In our example we truncated to 6 bits after the decimal point. When the sign bit is one, the one's complement is used to get negative values. Hence, before accumulation, the mean value is now $-2^{-B-1}$ instead of 0. After accumulation of A samples, the mean and standard deviation are given by:
$$\mu = -A \cdot 2^{-B-1}, \qquad \sigma = \sqrt{A}$$
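The effect of truncation, one's-complement sign and accumulation on the mean and standard deviation can be checked numerically. The Python sketch below is only an illustration under stated assumptions: `random.gauss` stands in for the Box-Muller output, and the function names and the parameters `frac_bits` (B) and `accumulations` (A) are hypothetical. The last function removes the offset and rescales, which the paper assigns to the back-end stage.

```python
import math
import random


def quantized_sample(rng: random.Random, frac_bits: int) -> float:
    """Magnitude truncated to frac_bits fractional bits, then signed.

    random.gauss is a stand-in for the Box-Muller output (assumption of
    this sketch). A sign bit of 1 takes the one's complement of the code.
    """
    scale = 1 << frac_bits
    code = int(abs(rng.gauss(0.0, 1.0)) * scale)  # truncation toward zero
    if rng.getrandbits(1):
        code = ~code  # one's complement, i.e. -code - 1
    return code / scale


def accumulated_sample(rng: random.Random, accumulations: int, frac_bits: int) -> float:
    """Sum of `accumulations` quantized samples (central limit step)."""
    return sum(quantized_sample(rng, frac_bits) for _ in range(accumulations))


def compensated_sample(rng: random.Random, accumulations: int, frac_bits: int) -> float:
    """Remove the mean offset A * 2^(-B-1) and rescale by sqrt(A)."""
    offset = accumulations * 2.0 ** (-frac_bits - 1)
    raw = accumulated_sample(rng, accumulations, frac_bits)
    return (raw + offset) / math.sqrt(accumulations)


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed
    raw = [accumulated_sample(rng, 4, 6) for _ in range(50_000)]
    mean = sum(raw) / len(raw)
    std = math.sqrt(sum((v - mean) ** 2 for v in raw) / len(raw))
    print(mean, std)  # close to -4 * 2**-7 and 2, up to sampling noise
```

With A = 4 the rescaling is a division by 2, which in fixed point amounts to moving the binary point by one position.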
To be as close as possible to N(0, 1), a compensation has to be done at the back-end stage. The back-end stage consists in multiplying the noise according to the needed SNR and in adding the result to the signal. For instance, if A=4, a mere left shift of the decimal point is enough to compensate $\sigma$, and an addition compensates the mean of $-2^{-B+1}$.
The period and the number of combinations are $2^n - 1$ if the LFSR has n registers. After reset, the LFSR has to be initialized with a value different from "00000", otherwise it stays in this state. Instead of using 29 LFSRs for the 29 variables, the number of LFSRs can be reduced if the address bits are grouped in packets of 4, requiring only 8 LFSRs: 2 for the G ROM, one for each FrROM and one for the sign. At every clock cycle, 4 bits are used as outputs and "shifted". For instance, for the LFSR of figure 5, with the time t counted in clock periods and additions taken modulo 2, the register $x_5$ can be expressed as $x_5(t) = x_4(t-1) = x_2(t-3) + x_5(t-3) = x_1(t-4) + x_4(t-4)$. By considering operations every 4 clock periods, 4 virtual shift operations are done in one clock cycle.
This technique can be easily coded in VHDL and generates almost no extra FPGA logic cells. The code of the LFSR function generator is given in the Annex for any number of outputs (set by a parameter in the code). Figure 4 illustrates the structure of the LFSR with polynomial $x^5 + x^2 + 1$ and 4 outputs. The sequence of the first 12 combinations of the LFSRs of Figures 3 and 4 is indicated in Table 2. The initial value is set to "00001". This table shows that the combinations of the 4-output LFSR correspond to every fourth combination of the 1-output LFSR.
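The relation between the 1-output and the 4-output LFSR can be illustrated in software. The Python sketch below is not the VHDL code of the Annex. It assumes a register wiring consistent with the recurrence $x_5(t) = x_1(t-4) + x_4(t-4)$ for the polynomial $x^5 + x^2 + 1$, and it assumes that the initial value "00001" maps to the tuple $(x_1, \dots, x_5) = (0, 0, 0, 0, 1)$. The function names are hypothetical.

```python
def step(state):
    """One shift of the 5-register LFSR; state is the tuple (x1, ..., x5)."""
    x1, x2, x3, x4, x5 = state
    # x5 is fed back into x1 and XORed into the input of x3.
    return (x5, x1, x2 ^ x5, x3, x4)


def leap4(state):
    """Four shifts computed at once with XORs, as done in one clock cycle."""
    x1, x2, x3, x4, x5 = state
    return (x2 ^ x5, x3, x2 ^ x4 ^ x5, x3 ^ x5, x1 ^ x4)


def period(update, start):
    """Number of updates needed to come back to the start state."""
    state, count = update(start), 1
    while state != start:
        state, count = update(state), count + 1
    return count


if __name__ == "__main__":
    single = (0, 0, 0, 0, 1)
    multi = (0, 0, 0, 0, 1)
    for _ in range(3):
        for _ in range(4):
            single = step(single)
        multi = leap4(multi)
        # Each 4-output state equals every fourth 1-output state.
        print(single, multi, single == multi)
    print(period(step, (0, 0, 0, 0, 1)), period(leap4, (0, 0, 0, 0, 1)))
```

Because 4 and 31 have no common factor, the 4-output version still visits all 31 non-zero states before repeating.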
In order to meet the periodicity constraint, which requires a period greater than $10^{18}$ (or $2^{60}$), a total of at least 60 LFSR registers is needed. In order to keep the highest period, the periods of the LFSRs need to be pairwise coprime. To meet this condition, we propose to select the LFSR lengths from the "Mersenne" numbers (numbers d such that $2^d - 1$ is a prime number).
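The periodicity argument can be checked with a few lines of Python. In the sketch below, the combined period of several maximal-length LFSRs is taken as the least common multiple of their individual periods $2^n - 1$. The set of lengths (5, 7, 13, 17, 19) is a hypothetical example chosen only because these are Mersenne exponents summing to 61 registers. It is not the set of lengths used in the synthesized design.

```python
from functools import reduce
from math import gcd


def lfsr_period(length: int) -> int:
    """Period of a maximal-length LFSR with `length` registers."""
    return (1 << length) - 1


def combined_period(lengths) -> int:
    """Least common multiple of the individual LFSR periods."""
    def lcm(a: int, b: int) -> int:
        return a // gcd(a, b) * b
    return reduce(lcm, (lfsr_period(n) for n in lengths), 1)


def periods_pairwise_coprime(lengths) -> bool:
    """True when no two LFSR periods share a common factor."""
    periods = [lfsr_period(n) for n in lengths]
    return all(
        gcd(p, q) == 1
        for i, p in enumerate(periods)
        for q in periods[i + 1:]
    )


if __name__ == "__main__":
    example = (5, 7, 13, 17, 19)  # hypothetical lengths, 61 registers in total
    print(periods_pairwise_coprime(example))
    print(combined_period(example) > 10 ** 18)
    # Counter-example: 2^5 - 1 divides 2^20 - 1, so the period does not grow.
    print(combined_period((5, 20)) == lfsr_period(20))
```

The counter-example shows why the lengths matter: two LFSRs whose periods share a factor give a combined period shorter than the product of their periods.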
Results

5.1. Accuracy
By considering the parameters of Table 1, the relative error between the ideal Gaussian distribution N(0, 1) and the synthesized one is calculated by using the MATLAB model. The accuracy depends on B, which is the number of bits after the decimal point resulting from the truncation operation, and on the number of accumulations A.
5.2. Synthesis
Table 4 gives the results obtained with the parameters of Table 1, A=4, B=6 and LFSRs of length 22, 21, 20, 17, 13, 7, 5, 15 registers for G, Fr and sign, respectively.
The synthesis has been done using FPGA Express™ and MAX+PLUS II™. The number of cells of the LFSR part is 149. To avoid losing performance because of the 4 accumulations, 4 Box-Muller generators can be placed in parallel and their outputs added in one shot. Consequently, the hardware size is multiplied by 4. Figure 5 illustrates the relative error obtained with $10^9$ samples (VHDL model).
Conclusion
In this paper, a new technique for generating Gaussian noise in real time and emulating a transmission channel was developed by applying the central limit theorem to a Gaussian distribution generated by the Box-Muller method. The FPGA hardware has been optimized by taking advantage of the logic cell structure and the on-chip RAM blocks. The proposed implementation delivers a quasi-perfect Gaussian noise which has a maximum relative error of 0.1% at $4\sigma$ compared to the ideal distribution. The FPGA hardware uses only 8% of a 100K-gate FPGA and can deliver Gaussian noise at 20 MHz with a period which can last a few days at this frequency.
Reference
