In this paper, an emulation environment named the Hardware Discrete Channel Emulator (HDCE) is developed as a coherent framework to emulate on a hardware device (an FPGA is the implementation platform used for verification), and to simulate on a computer, the effect of Additive White Gaussian Noise (AWGN) in a baseband channel. Thanks to an efficient architecture, the HDCE generates more than 180 M samples per second at a very low hardware cost. Using the HDCE, evaluating the performance of a coding scheme at a BER of $10^{-9}$ requires only one minute of emulation time.
INTRODUCTION
Advanced digital wireless communication systems often require an appropriate trade-off between complexity and performance in the design of an efficient iterative decoder. In practice, when moving from the floating-point to the fixed-point hardware description, many parameters (reliability message length, digital word size, rounding and quantization operations, etc.) should be jointly optimized. However, these parameters interact in a non-linear way, and selecting the optimal algorithm is a very time-consuming task. Usually, the formal expression of the Bit Error Rate (BER) or the Frame Error Rate (FER) given in [1] has been used to predict the performance of the system. The conventional solution is Monte-Carlo simulation, which evaluates the BER and thus gives an estimate of the error-correcting capability of the decoder.
The Monte-Carlo simulation method is traditionally performed by software programs. With this approach, reaching a FER around $10^{-8}$ requires one or two weeks of simulation. To speed up these very long simulations, some software approaches have been proposed, such as a reduced Monte-Carlo simulation method [2] that re-runs only the erroneous codewords obtained from an initial "classical" Monte-Carlo simulation. We then propose a technique called the distance-based method, which is based on the direct evaluation of a distance between the soft output of the suboptimal decoder and the soft output of a reference decoder [3]. Although these methods reduce the simulation time, software-based execution (for instance, running applications on a conventional CPU cluster) remains impractical because of its high power consumption and physical space cost. Consequently, we have turned our attention to simulation based on hardware accelerators.
In our previous works [4, 5, 6], as well as in the works of Dong-U Lee et al. [7, 8], hardware emulation leads to a significant speed-up factor (from a few hundred to a few tens of thousands) in terms of simulation time. Those designs are based on the realization of a normal random variable using the Box-Muller method [9], the Wallace method [10], or the Box-Muller method combined with the Central Limit theorem. The normalized variable is then multiplied by the standard deviation and quantized. In this paper, the proposed approach is to generate the quantized Gaussian variable directly by applying the alias method [11]. We present the design of a hardware channel emulator called HDCE, for Hardware Discrete Channel Emulator. The HDCE aims at emulating the effect of a baseband discrete channel and has several properties: high accuracy, high speed (more than 180 M samples per second) and the capacity to switch seamlessly from hardware emulation to software simulation and vice versa. The last feature will allow us to combine a reduced Monte-Carlo simulation method [3] with hardware emulation, thus giving a set of very efficient and complementary methods to perform the optimization of decoders. Since the Gaussian channel is widely employed as the universal discrete channel, the proposed HDCE has been dedicated to an Additive White Gaussian Noise (AWGN) channel. Even so, the HDCE also supports other discrete channel models: the user only needs to adapt the distributions to the channel specification.
This paper is divided into seven sections. First, the discrete Gaussian channel model is described in Section 2. Then, the method of Gaussian random variable generation is presented in Section 3. Section 4 presents the architecture of the HDCE. The applications of the HDCE are shown in Section 5. In Section 6, we present the experimental results of the HDCE, and we conclude in the final section.
GAUSSIAN CHANNEL MODEL
According to the Noisy Channel Coding theorem developed by C. E. Shannon in 1948 [12] , a binary source generates bits that are encoded, modulated and affected by a noise in the channel. In our design, we directly consider the AWGN channel combined with the Analog-to-Digital Converter (ADC), as shown in Figure 1 . 
, if x Q = a, then:
Thus, P(xQ = a) for 1 2 1 2
can be expressed as: 
Similarly, for $a = a_{max} = 2^{q-1} - 1$, we obtain:
We propose a component to directly generate the quantized version of the Probability Density Function (PDF). The Gaussian random variable generator is presented in the following section.
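As an illustration of what such a quantized PDF looks like, the following minimal Python sketch computes $P(x_Q = a)$ for a Gaussian input seen through a q-bit ADC. The quantizer details are assumptions made for this sketch and are not taken from the paper: levels a range over $[-2^{q-1}, 2^{q-1} - 1]$, the ADC rounds to the nearest level with a hypothetical step size `step`, and the two extreme levels absorb all the probability mass beyond the ADC range.

```python
import math

def gaussian_cdf(x, mean=0.0, sigma=1.0):
    """CDF of a normal distribution N(mean, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def quantized_awgn_pmf(q, sigma, mean=0.0, step=1.0):
    """Return {a: P(x_Q = a)} for a rounding, saturating q-bit ADC.

    Assumptions of this sketch (not from the paper): levels a lie in
    [-2^(q-1), 2^(q-1)-1], x_Q = a when x is in [(a-0.5)*step, (a+0.5)*step),
    and the two extreme levels collect everything outside the ADC range.
    """
    a_min, a_max = -(1 << (q - 1)), (1 << (q - 1)) - 1
    pmf = {}
    for a in range(a_min, a_max + 1):
        lo = -math.inf if a == a_min else (a - 0.5) * step
        hi = math.inf if a == a_max else (a + 0.5) * step
        pmf[a] = gaussian_cdf(hi, mean, sigma) - gaussian_cdf(lo, mean, sigma)
    return pmf

# Example: 4-bit ADC, zero-mean noise with sigma = 2 quantization steps.
pmf = quantized_awgn_pmf(q=4, sigma=2.0)
```

For a BPSK symbol, `mean` would be set to the (scaled) transmitted amplitude, so that one such table exists per SNR value.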
GAUSSIAN RANDOM VARIABLE GENERATION
The main aspect of the Gaussian channel emulator is the generation of the Gaussian random variable. This is conventionally achieved by transforming a uniformly distributed random variable into a non-uniformly distributed random variable. In the literature, four well-known methods are used: the Box-Muller method [9], the Ziggurat method [13], the Wallace method [10], and the combination of the Box-Muller method and the Central Limit theorem. Moreover, D. B. Thomas and W. Luk proposed a non-uniform random number generator that performs automatic customization of the distribution using a hybrid of piecewise linear approximations and the alias method [14]. In our design, the alias method, which is detailed in the following, is also employed for the Gaussian random variable generation.
The Alias Method
The alias method was initially proposed by A. J. In a more general case, we obtain, 
where T(alias(p) = a) equals 1 if alias(p) = a and 0 otherwise. From the alias table, the PDF of the generated random variables can be computed.
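The computation of the PDF from an alias table can be sketched as follows. The sketch assumes the selection rule given later in the architecture description (output alias(p) when threshold(p) < rv, and p otherwise), with p uniform over the table entries and rv a uniform l-bit integer; the function name and the toy table are hypothetical.

```python
from fractions import Fraction

def alias_table_pmf(threshold, alias, l):
    """Exact PMF produced by an alias table.

    threshold[p] is an l-bit integer and alias[p] a table index.
    Assumed convention: a draw (p, rv) outputs alias[p] when
    threshold[p] < rv, and p otherwise, with rv uniform in 0 .. 2^l - 1.
    """
    n = len(threshold)
    pmf = [Fraction(0)] * n
    for p in range(n):
        keep = Fraction(threshold[p] + 1, 1 << l)  # P(rv <= threshold[p])
        pmf[p] += keep / n                          # entry p keeps its own value
        pmf[alias[p]] += (1 - keep) / n             # remainder goes to the alias
    return pmf

# Toy table with q = 2 (four entries) and l = 3 (thresholds in 0..7).
threshold = [7, 3, 7, 1]
alias = [0, 0, 2, 2]
pmf = alias_table_pmf(threshold, alias, l=3)
```

Using exact fractions makes it easy to check that a table reproduces a target quantized PDF without any rounding ambiguity.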
Generation of the Alias Table
The construction of the alias table to obtain a given PDF is not a trivial problem, and we have already tried several methods to find an optimal solution. The proposed method separates the problem of quantizing the initial PDF from the problem of constructing the alias table.
Let X be a random variable taking its values in the set $[0, 2^q - 1]$; X is characterized by its PDF $\{P(X = i),\ i = 0, \ldots, 2^q - 1\}$. The first step is to quantize the PDF of X with q+l bits of precision to obtain an approximated random variable.
The direct quantization gives:
In fact, due to the quantization effect, the sum of the quantized probabilities can lie p quanta $\delta = 2^{-(q+l)}$ below or above 1. In this case, a post-processing step increases or decreases p of the quantized probability values by $\delta$ so that the probability values again sum to 1. The p values are chosen to minimize the quantization error.
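The quantization and post-processing step can be sketched as below. The correction rule used here (adjust the entries whose rounding error is largest in the compensating direction) is one plausible way to keep the quantization error small; it is an assumption of this sketch and not necessarily the rule used in the HDCE. The function name is hypothetical, and `bits` plays the role of q+l.

```python
def quantize_pmf(pmf, bits):
    """Quantize probabilities to multiples of delta = 2**-bits that sum to 1.

    Returns integer counts n_i such that the quantized probability of
    value i is n_i * 2**-bits. The correction rule is an assumption.
    """
    total = 1 << bits
    scaled = [p * total for p in pmf]
    counts = [round(s) for s in scaled]
    excess = sum(counts) - total           # > 0: too much mass, < 0: too little
    # Rounding error per entry: positive when the entry was rounded up.
    err = [c - s for c, s in zip(counts, scaled)]
    # Fix first the entries whose rounding went furthest the wrong way.
    order = sorted(range(len(pmf)), key=lambda i: err[i], reverse=(excess > 0))
    step = -1 if excess > 0 else 1
    for i in order[:abs(excess)]:
        counts[i] += step
    return counts

# Example: three values quantized on 3 bits (delta = 1/8).
counts = quantize_pmf([0.3, 0.3, 0.4], bits=3)
```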
ARCHITECTURE
The architecture of the HDCE is composed of two blocks: one generates the pseudo-random uniform variable using a Linear Feedback Shift Register (LFSR) [15]; the other translates the uniform random variable into a non-uniform random variable using the alias method. To obtain a sample of a particular channel realization, as shown in Figure 2, several parameters should be indicated to the hardware:
- The number of quantization bits q of the received signal.
- The internal precision l of the alias table.
- The number N of different PDFs implemented.
- The index of the PDF required to generate the sample.
Figure 2. Generation of a sample
The architectures of these blocks are described in more detail in the following subsections.
LFSR
To simplify the design of the LFSR, we used a 63-bit LFSR, which has a Repetition Period (RP) of $RP = 2^{63} - 1$. Using the HDCE at a frequency of 100 MHz, the time before a repetition is then equal to $(2^{63} - 1) / 10^{8} \approx 9.2 \times 10^{10}$ seconds, i.e., more than 2900 years. Note that q+l 63-bit LFSRs work simultaneously to generate q+l binary random variables (RV); the initial states of the q+l LFSRs should be different and, ideally, uniformly spread over the RP of the LFSR. Let S(k)(62 DOWNTO 0) be the state of the LFSR at time k; then, at time k+1, the state S(k+1) will be:
For more information on LFSRs, please refer to [16]. A simplified architecture of the LFSR is given in Figure 3. Finally, the signal read_seed allows FIFO(62) to be fed back directly to the input of the FIFO. Thus, in 63 cycles, the 63 values of the LFSR are sent to the output via rv_out. Moreover, the final state after 63 cycles is equal to the initial state: the data have only performed a cyclic rotation.
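A software model of such an LFSR can be sketched in a few lines of Python. The feedback taps used below (bits 62 and 61, i.e., the commonly tabulated 63-bit tap pair 63, 62) are an assumption of this sketch; the update equation actually used in the HDCE may differ. The last lines reproduce the repetition-time estimate at 100 MHz.

```python
N = 63
MASK = (1 << N) - 1

def lfsr_step(state):
    """One shift of a 63-bit Fibonacci LFSR (assumed taps: bits 62 and 61)."""
    feedback = ((state >> 62) ^ (state >> 61)) & 1
    return ((state << 1) | feedback) & MASK

def lfsr_bits(seed, count):
    """Yield `count` output bits (bit 62 before each shift) starting from seed."""
    state = seed
    for _ in range(count):
        yield (state >> 62) & 1
        state = lfsr_step(state)

# Repetition period and time before repetition at 100 MHz.
period = (1 << N) - 1
seconds = period / 100e6
years = seconds / (365.25 * 24 * 3600)
```

In the HDCE, q+l such generators would run in parallel from different seeds, each contributing one bit per clock cycle.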
Direct Computation of the LFSR Seed
To replay a simulation, the former random variables must be regenerated. We therefore present an algorithm for the direct computation of the LFSR seed.
The LFSR is a linear structure, i.e., if S(k) = A(k) + B(k), then, for all p, S(k+p) = A(k+p) + B(k+p), where + denotes the bitwise XOR. The state of the LFSR can be written as:
$$S(k) = \bigoplus_{i=0}^{62} S(k)(i) \, U_i(0)$$
where S(k)(i) is the i-th component of the binary vector S(k) and $U_i(0)$ is a unit vector of size 63 that contains a single non-zero value at position i, considered at time k=0, i.e., $U_i(0)(j) = 0$ if $j \neq i$ and $U_i(0)(i) = 1$.
By using the linearity of the LFSR structure, for any p > 0, we can directly compute the state S(k+p) of the LFSR at time k+p as a function of the vectors $U_j(p)$: $S(k+p) = \bigoplus_{j=0}^{62} S(k)(j) \, U_j(p)$. The computation of S(k+p) then requires only the XOR of 63 vectors.
The question that now arises is how to compute $U_j(p)$ in a simple way. Let us first assume that p is a power of 2, i.e., $p = 2^k$. For k=0, $U_j(1)$ is computed using (13). Then, by recursion, assuming that the $U_j(2^k)$, j = 0...62, are known for a given k, the $U_j(2^{k+1})$, j = 0...62, are computed using:
$$U_j(2^{k+1}) = \bigoplus_{i=0}^{62} U_j(2^k)(i) \, U_i(2^k)$$
To sum up, it is possible, at a cost of $63^2 = 3969$ 63-bit XORs, to compute S(k+p) for any S(k) and any p. This algorithm makes it easy to compute the LFSR seed from an initial seed and a given number of cycles.
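The direct computation can be sketched in Python as follows. The LFSR update rule (taps on bits 62 and 61) is an assumption of this sketch, and the helper names are hypothetical; only the linearity argument and the doubling recursion come from the text. A general p is handled through its binary decomposition, which is one natural way to extend the power-of-two case.

```python
N = 63
MASK = (1 << N) - 1

def lfsr_step(state):
    """Assumed update rule: Fibonacci LFSR with taps on bits 62 and 61."""
    feedback = ((state >> 62) ^ (state >> 61)) & 1
    return ((state << 1) | feedback) & MASK

def advance(basis, state):
    """XOR together basis[i] for every bit i set in `state`.

    basis[i] holds U_i(p), so the result is the state p steps after `state`.
    """
    out = 0
    for i in range(N):
        if (state >> i) & 1:
            out ^= basis[i]
    return out

def jump(state, p):
    """Return S(k+p) from S(k) = state, using the binary decomposition of p."""
    basis = [lfsr_step(1 << i) for i in range(N)]    # U_i(1)
    while p:
        if p & 1:
            state = advance(basis, state)
        basis = [advance(basis, u) for u in basis]   # U_i(2^k) -> U_i(2^(k+1))
        p >>= 1
    return state

def step_n(state, p):
    """Reference implementation: p successive single steps."""
    for _ in range(p):
        state = lfsr_step(state)
    return state

seed = 0x123456789ABCDEF
```

Here `jump(seed, p)` and `step_n(seed, p)` return the same state, but the former needs a number of operations that grows with the number of bits of p rather than with p itself.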
Alias Method
With the pseudo-random uniform variable generated by the LFSR, the translation of the uniform variable into a non-uniform random variable is accomplished with the alias method. Once the alias table has been constructed as explained in Section 3, the alias method is simple and can be expressed in the following few lines of pseudo-code.
Input: two uniform random variables S0 (on q bits) and rv (on l bits).
Output: alias(S0) if threshold(S0) < rv, S0 otherwise.
To allow for a high clock frequency, even when l = 32 (i.e., when a 32-bit comparator is required), we have defined a pipelined structure, which is shown in Figure 4. The signal index selects the PDF required to generate the sample, and the signal rv_out is the uniform variable output by the LFSR. The signal en follows the pipeline so that it stays synchronized with the output of the HDCE. The pipeline consists of three stages: in the first, the threshold fetched from the alias table and rv_out are registered; in the second, the comparison is executed on 8-bit slices; in the third, the final decision is made. The distributions are reconfigured in hardware by rewriting the alias table. This allows the system to run several test scenarios without repeating the synthesis, place-and-route and configuration of the FPGA. In our design, the alias tables are stored in a RAM of size $(l+q) \times 2^q$.
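The selection rule and the idea of an 8-bit sliced comparison can be modeled in software as shown below. This is only a behavioral sketch: the function names are hypothetical, Python's `random` module stands in for the LFSR outputs, and the actual pipeline staging of the hardware is not modeled.

```python
import random

def alias_sample(threshold, alias, s0, rv):
    """Return alias[s0] if threshold[s0] < rv, else s0 (rule stated in the text)."""
    return alias[s0] if threshold[s0] < rv else s0

def less_than_bytewise(a, b, width=32):
    """Compute a < b from 8-bit slices, most significant slice first.

    Mirrors the idea of splitting a wide comparator into 8-bit pieces;
    `width` is assumed to be a multiple of 8.
    """
    for shift in range(width - 8, -1, -8):
        a_byte, b_byte = (a >> shift) & 0xFF, (b >> shift) & 0xFF
        if a_byte != b_byte:
            return a_byte < b_byte
    return False  # all slices equal, so a == b

# Draw a few samples from a toy table with q = 2 and l = 8.
threshold = [255, 127, 255, 63]
alias = [0, 0, 2, 2]
rng = random.Random(0)  # stand-in for the LFSR outputs
samples = [alias_sample(threshold, alias, rng.getrandbits(2), rng.getrandbits(8))
           for _ in range(10)]
```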
APPLICATIONS
In this section, we first present two practical applications where the HDCE has been employed as a tool for the design and the test of Low Density Parity Check (LDPC) decoders. We then present other possible application scenarios.
Test of an LDPC Code Architecture
Due to the linearity of the LDPC decoder, we first consider an 'all zero codeword' with a Binary Phase-Shift Keying (BPSK) transmission on an AWGN channel. First, we precompute an alias table for each required SNR value. These alias tables are stored in a ROM, and the index signal addresses the alias table corresponding to the required SNR value. Figure 5 shows the all-zero codeword model for the test of the FER and/or BER. In the compute errors block connected to the decoder output, each non-zero value is counted as a bit error. From the bit errors, the frame errors are easily deduced, and the index (SNR) is incremented when a maximum number of frame errors is reached. This block is low cost and easy to design. The test patch (the HDCE and the compute errors block) can be included as part of a decoder chip or IP for built-in SNR estimation and testing purposes at a low area cost. It would take less than 2 % of the area of a DVB-S2 LDPC decoder [17].
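The behavior of the compute errors block and of the SNR index sweep can be sketched in Python as follows. The callback `decode_frame` is hypothetical: it stands for the chain made of the HDCE and the decoder, and returns the decoded bits of one all-zero codeword at a given SNR index. The optional `max_frames` guard is an addition of this sketch to avoid looping forever when no error occurs.

```python
def run_all_zero_test(decode_frame, num_snr, max_frame_errors, max_frames=None):
    """Sweep the SNR index; return [(bits, bit_errors, frames, frame_errors)].

    decode_frame(index) is a hypothetical callback that emulates one noisy
    all-zero codeword at the given SNR index and returns the decoded bits.
    """
    results = []
    for index in range(num_snr):
        bits = bit_errors = frames = frame_errors = 0
        while frame_errors < max_frame_errors:
            if max_frames is not None and frames >= max_frames:
                break
            decoded = decode_frame(index)
            errors = sum(1 for b in decoded if b != 0)  # non-zero bit = bit error
            bits += len(decoded)
            bit_errors += errors
            frames += 1
            frame_errors += 1 if errors else 0          # any bit error = frame error
        results.append((bits, bit_errors, frames, frame_errors))
    return results

# Usage with a dummy "decoder" that always returns two wrong bits.
results = run_all_zero_test(lambda index: [0, 1, 1, 0], num_snr=2, max_frame_errors=3)
```

The BER and FER at each index follow as `bit_errors / bits` and `frame_errors / frames`.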
In the second application, the goal is to test the performance of the LDPC codes in a communication model. In Figure 6 , the codeword from the encoder is sent serially to emulate a BPSK modulation on the AWGN channel. This bit is concatenated with the SNR value to address the correct alias 
Other Applications
The previously described models have been used to test an LDPC encoder and decoder, but they also suit any other error-control code. Additionally, in terms of channel emulation, channels other than AWGN can also be emulated. The Rayleigh fading channel is a reasonable model for the effect of heavily built-up urban environments on terrestrial wireless communication. Usually, the Rayleigh channel can be built from two uncorrelated variables: Y = R·X + G, where R is a variable with a Rayleigh distribution, G is a variable with a Gaussian distribution, X is the transmitted signal and Y is the received signal.
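To adapt the HDCE to such a channel, the quantized PDF of Y is needed as input for the alias table construction. The sketch below estimates it empirically in Python; the quantizer (rounding with a hypothetical step size and saturation on q bits), the unit Rayleigh scale and all function names are assumptions made for illustration only.

```python
import math
import random

def rayleigh_sample(rng, scale=1.0):
    """Rayleigh variate obtained by inverse-transform sampling."""
    return scale * math.sqrt(-2.0 * math.log(1.0 - rng.random()))

def rayleigh_channel(x, sigma, rng):
    """One realization of Y = R*X + G with R Rayleigh and G ~ N(0, sigma^2)."""
    return rayleigh_sample(rng) * x + rng.gauss(0.0, sigma)

def empirical_quantized_pmf(q, sigma, n, seed=0, x=1.0, step=0.25):
    """Estimate P(y_Q = a) on q bits from n simulated channel outputs."""
    rng = random.Random(seed)
    a_min, a_max = -(1 << (q - 1)), (1 << (q - 1)) - 1
    counts = {a: 0 for a in range(a_min, a_max + 1)}
    for _ in range(n):
        y = rayleigh_channel(x, sigma, rng)
        a = min(a_max, max(a_min, round(y / step)))  # round, then saturate
        counts[a] += 1
    return {a: c / n for a, c in counts.items()}

pmf = empirical_quantized_pmf(q=4, sigma=0.5, n=2000)
```

An analytical or numerically integrated PDF would be preferable for the distribution tails, since an empirical estimate cannot resolve probabilities much smaller than 1/n.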
The HDCE can be used to emulate at high frequency a Poisson distribution, exotic distribution, or any required discrete distribution. It is also possible to manage on the fly distribution evolution with the use of the index as shown in the applications.
EXPERIMENTAL RESULTS
Complexity & Performance of the HDCE
The synthesis of the HDCE has been completed for different FPGA targets. Tables (LUT) by the synthesis tool. Accordingly, the complexity is evaluated as the number of LUTs. The maximum frequency of the HDCE is almost independent of its configuration, whereas its complexity depends on the parameters. If the parameters lead to a large number of LUTs used as RAM, it is possible to use the embedded RAM of the FPGA directly or, if necessary, an external RAM.
Accuracy of the HDCE
The reliability of a Gaussian channel emulator is usually evaluated through the difference between the empirical PDF and the standard PDF. However, the precision of the tails of the Gaussian distribution is a good indicator of accuracy, so we evaluate our emulator through the difference in the tails. In Table 2, we give the percentage difference with respect to a reference standard PDF of a Gaussian distribution (0, 1) at x = 2σ, 3σ, 4σ and 5σ for various values of l. Ref. PDF denotes the reference probability, in percent, at each x, and Diff. denotes the percentage difference with respect to Ref. PDF.
CONCLUSION
In this paper, we have presented a tool called HDCE, for Hardware Discrete Channel Emulator, which emulates the effect of a baseband Gaussian channel directly in hardware. The HDCE was presented in five steps. First, we derived the theoretical model of the HDCE, i.e., we mathematically defined the expression of the PDF of the output of the channel. Second, we explained how to transform a uniform distribution into a given non-uniform distribution thanks to the alias method. Third, the architecture of the HDCE was described in terms of its two blocks, the LFSR and the alias method. Fourth, the applications of the HDCE were demonstrated, and it was explained that the HDCE is also compatible with channel models other than the AWGN channel. Finally, we evaluated the HDCE in terms of its accuracy and of the trade-off between complexity and performance.
Overall, the proposed HDCE, built on an efficient architecture, is able to emulate the effect of a baseband Gaussian channel with several properties: high accuracy, high speed (more than 180 M samples per second) and the capacity to switch seamlessly from hardware emulation to software simulation and vice versa.
This last feature allows for the combining of a Reduced Monte-Carlo Simulation method and hardware emulation, thus, providing a set of very efficient and complementary methods to perform the optimization of decoders.
