A hardware simulator facilitates the test and validation cycles by replicating channel artifacts in a controllable and repeatable laboratory environment. This paper presents an overview of the digital block architectures of Multiple-Input Multiple-Output (MIMO) hardware simulators. First, the simple frequency architecture is presented and analyzed. Then, an improved frequency architecture, which works for streaming mode input signals, is considered. Next, the time domain architecture is described and analyzed. The architectures of the digital block are designed and implemented on a Xilinx Virtex-IV Field Programmable Gate Array (FPGA). Their accuracy, occupation on the FPGA and latencies are analyzed using Wireless Local Area Networks (WLAN) 802.11ac and Long Term Evolution (LTE) signals. The frequency and the time approaches are compared and discussed for indoor (using TGn channel models) and outdoor (using 3GPP-LTE channel models) environments. It is shown that the time domain architecture presents the best solution for the design of the hardware simulator digital block. Finally, a 2×2 MIMO time domain architecture is described and simulated with an input signal that respects the bandwidth of the considered standards.
Introduction
The need to improve the performance of wireless networks has led to an increased interest in MIMO communication techniques which offer high data bit rates for wireless systems. The current communication standards show a clear trend in industry to support MIMO functionality. Several studies published recently present systems that reach a MIMO order of 8×8 and higher [1] . This is made possible by advances at all levels of the communication platform as, for example, the monolithic integration of antennas [2] and the design of the simulator platforms [3] .
To evaluate the performance of recent communication systems, a channel hardware simulator is considered, using recent communication standards based on MIMO techniques. It provides the processing speed required for real-time performance evaluation and allows various systems to be compared under the same test conditions. These simulators are standalone units that provide the fading signals in the form of analog or digital samples [4, 5].
With the continuous increase of FPGA capacity, entire baseband systems can be efficiently mapped onto faster FPGAs for more efficient prototyping, testing and verification. As shown in [6] , the FPGAs provide the greatest flexibility in algorithm design and visibility of resource utilization. Also, they are ideal for rapid prototyping and research use such as testbed [7] .
The simulator is reconfigurable for standards whose bandwidth does not exceed 100 MHz, which is the maximum for the Virtex-IV FPGA. To exceed a 100 MHz bandwidth, a higher-performance FPGA such as the Virtex-VI can be used [3]. The simulator is configured with the LTE and WLAN 802.11ac standards. The channel models used by the simulator can be obtained from standard channel models, such as the TGn 802.11n channel models [8] and the 3GPP-LTE channel models [9], or from real measurements conducted with the MIMO channel sounder designed and realized at IETR [10] [11] [12].
At IETR, several architectures of the digital block of a hardware simulator have been studied, in both time and frequency domains [12, 13]. Moreover, [14] presents a new method based on determining the parameters of a channel simulator by fitting the space time-frequency cross-correlation matrix of the simulation model to the estimated matrix of a real-world channel. This solution shows that the resulting error can be significant.
Wireless channels are commonly simulated using Finite Impulse Response (FIR) filters, as in [13, 15, 16, 17]. Alternatively, the Fast Fourier Transform (FFT) module can be used so that the convolution becomes an algebraic product. Thus, frequency architectures are presented, as in [13, 15].
The contributions and the structure of this paper are organized as follows.
Section 2 presents the channel models and the Kronecker method used to obtain time-varying channels.
In Section 3, the simple frequency architecture is studied and implemented on a Xilinx Virtex-IV FPGA. The occupation of the architecture on the FPGA, the accuracy of the output signals and the latency are given. The second part of this section presents an improved frequency architecture that accepts long input signals [18]. In fact, the simple frequency architecture limits the input signal to the size of the FFT/IFFT blocks. Moreover, tests show that if a longer signal is split into parts equal to the size of the FFT/IFFT blocks, an error appears at the output. Therefore, the improved frequency architecture is analyzed, tested and verified in this section.
Section 4 presents a description of the time domain architecture. Then, it is implemented on an FPGA Virtex-IV. The occupation on the FPGA of the architecture, the accuracy of the output signals and its latency are given.
In Section 5, after comparing the improved frequency domain architecture and the time domain architecture, we have chosen the time domain architecture which has a better occupation on the FPGA, better latency and better precision.
Up to this point, the architectures are compared using a SISO channel and long input signals, to validate them in the worst conditions. However, after choosing the best architecture, more realistic conditions have to be considered. Therefore, in Section 6, tests are made with an input signal that respects the bandwidth, chosen between [∆, B + ∆], and by considering a 2×2 MIMO architecture. In fact, the channel impulse responses can be presented in baseband with complex values, or as real signals with a limited bandwidth B between f c - B/2 and f c + B/2, where f c is the carrier frequency. In this paper, to eliminate the complex multiplication and f c , the hardware simulation operates between ∆ and B + ∆, where ∆ depends on the band-pass filters (RF and IF). The value ∆ is introduced to prevent spectrum aliasing. In addition, the use of a real impulse response reduces the size of the FIR filters by 50 % and the number of multipliers by a factor of 4. Thus, within the same FPGA, larger MIMO channels can be simulated.
Lastly, Section 7 gives concluding remarks and prospects.
Channel Model
A MIMO propagation channel is composed of several time variant correlated SISO channels. For a 2×2 MIMO channel, the received signals y j (t,τ) can be calculated using a convolution:
The associated spectrum is calculated by the Fourier transform (using FFT modules):
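The equivalence between the convolution and the spectral product can be illustrated with a short Python sketch. It assumes real-valued signals and the index convention that `h[i][j]` links transmit antenna i to receive antenna j; the function names and the random test data are hypothetical and only serve the illustration.

```python
import numpy as np

def mimo_time_domain(x, h):
    """y_j = sum_i conv(x_i, h[i][j]); h[i][j] links Tx antenna i to Rx antenna j."""
    n_t, n_r = len(h), len(h[0])
    return [sum(np.convolve(x[i], h[i][j]) for i in range(n_t)) for j in range(n_r)]

def mimo_frequency_domain(x, h):
    """Same outputs through FFT, product and IFFT (real-valued signals assumed)."""
    n_t, n_r = len(h), len(h[0])
    n_out = len(x[0]) + len(h[0][0]) - 1      # length of a linear convolution
    X = [np.fft.fft(x_i, n_out) for x_i in x]  # zero-padded spectra of the inputs
    y = []
    for j in range(n_r):
        Y_j = sum(X[i] * np.fft.fft(h[i][j], n_out) for i in range(n_t))
        y.append(np.fft.ifft(Y_j).real)
    return y

# Illustrative 2x2 test data (not taken from any channel model)
rng = np.random.default_rng(0)
x = [rng.standard_normal(64) for _ in range(2)]
h = [[rng.standard_normal(8) for _ in range(2)] for _ in range(2)]
y_time = mimo_time_domain(x, h)
y_freq = mimo_frequency_domain(x, h)
print(np.allclose(y_time[0], y_freq[0]), np.allclose(y_time[1], y_freq[1]))
```

The two paths agree only because the FFT length covers the full linear convolution; a shorter FFT would give a circular convolution instead.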
The development of the digital block of a channel hardware simulator requires a good knowledge of the propagation channel. The different channel models presented in the literature are used to capture the behavior of the channel as faithfully as possible.
Two channel models are considered to cover indoor and outdoor environments: the TGn channel models (indoor) and the 3GPP-LTE channel models (outdoor). Moreover, using the channel sounder realized at IETR, measured impulse responses are obtained for specific environments: shipboard, outdoor-to-indoor.
TGn Channel Models
TGn channel models [8] have a set of 6 profiles, labeled A to F, which cover all the scenarios. Each model has a number of clusters. For example, model E has four clusters. Each cluster corresponds to specific tap delays, which overlap each other in certain cases. Reference [8] summarizes the relative power of the impulse responses for TGn channel model E by taking the Line-Of-Sight (LOS) impulse response as reference. According to the standard and the bandwidth, the sampling frequency is f s = 165 MHz and the sampling period is T s = 1/f s .
3GPP-LTE Channel Models
3GPP-LTE channel models are used for mobile wireless applications. A set of 3 channel models is used to simulate the multipath fading propagation conditions. A detailed description is presented in [9] . For LTE signals, f s = 50 MHz.
Time-Varying Channels
In this section, we present the method used to obtain a model of a time variant channel, using Rayleigh fading [19] and based on the Kronecker model [20].
The Doppler frequency f d is equal to:
$$ f_d = \frac{v \, f_c}{c} \qquad (3) $$
where f c is the carrier frequency, c is the speed of light and v is the environmental speed. We have chosen a refresh frequency f ref .
The MIMO channel matrix H can be characterized by two parameters:
1. The relative power P c of the constant channel components, which corresponds to the LOS.
2. The relative power P s of the channel scattering components, which corresponds to the Non-Line-Of-Sight (NLOS).
The ratio P c /P s is called the Ricean K-factor. Assuming that all the elements of the MIMO channel matrix H are Rice distributed, it can be expressed for each tap by:
where H F and H V are the constant and the scattered channel matrices respectively.
The total relative received power P = P c + P s . Therefore:
$$ P_c = P \, \frac{K}{K+1} \qquad (5) $$
$$ P_s = P \, \frac{1}{K+1} \qquad (6) $$
If we combine (5) and (6) in (4) we obtain:
To obtain a Rayleigh fading channel, K is equal to zero, so H can be written as:
P is derived from [8] or [9] for each tap of the considered impulse response. For 2 transmit and 2 receive antennas:
where X ij (i-th receiving and j-th transmitting antenna) are correlated zero-mean, unit variance, complex Gaussian random variables as coefficients of the variable NLOS (Rayleigh) matrix H V .
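As a minimal sketch of how one tap of H can be generated from P and K, the following Python fragment assumes the usual Rice decomposition H = sqrt(P c ) H F + sqrt(P s ) H V , with P c = P·K/(K+1) and P s = P/(K+1), which follows from K = P c /P s and P = P c + P s . The function name `rice_tap`, the placeholder LOS matrix and the numerical values are illustrative assumptions; the entries of H V are drawn uncorrelated here.

```python
import numpy as np

def rice_tap(P, K, H_F, H_V):
    """Channel matrix of one tap with total relative power P and Ricean factor K."""
    P_c = P * K / (K + 1.0)      # power of the constant (LOS) component
    P_s = P / (K + 1.0)          # power of the scattered (NLOS) component
    return np.sqrt(P_c) * H_F + np.sqrt(P_s) * H_V

rng = np.random.default_rng(1)
# Zero-mean, unit variance complex Gaussian entries (uncorrelated in this sketch)
H_V = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2.0)
H_F = np.ones((2, 2), dtype=complex)     # placeholder LOS matrix, illustrative only

H_rayleigh = rice_tap(0.5, 0.0, H_F, H_V)   # K = 0: pure Rayleigh tap
H_rice = rice_tap(0.5, 3.0, H_F, H_V)       # K = 3: mixed LOS/NLOS tap
```

With K = 0 the LOS term vanishes and the tap reduces to sqrt(P) times the scattered matrix.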
To obtain correlated X ij elements, a product-based model is used [20] . This model assumes that the correlation coefficients are independently derived at each end of the link:
H w is a matrix of independent zero-mean, unit variance, complex Gaussian random variables. R r and R t are the receive and transmit correlation matrices. They can be written as:
where β is the correlation between channels at two receive antennas, but originating from the same transmit antenna (SIMO). In other words, it is the correlation between the received power of channels that have the same Angle of Departure (AoD). α is the correlation coefficient between channels at two transmit antennas that have the same receive antenna (MISO).
The use of this model has two conditions:
1. The correlations between channels at two receive (resp. transmit) antennas are independent from the Rx (resp. Tx) antenna.
2. If s 1 , s 2 are the cross-correlations between antennas of the same side of the link, then:
$$ s_1 = \alpha \cdot \beta, \qquad s_2 = \alpha^{*} \cdot \beta $$
For the uniform linear array, the complex correlation coefficients α and β are expressed by ρ:
$$ \rho = R_{xx}(D) + j \, R_{xy}(D) $$
where D = 2πd/λ, d = 0.5λ is the distance between two successive antennas, λ is the wavelength and R xx and R xy are the real and imaginary parts of the cross-correlation function of the considered correlated angles. The Power Angular Spectrum (PAS) closely matches the Laplacian distribution [21, 22]:
where σ is the standard deviation of the PAS (which corresponds to the numerical value of AS). 
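The following Python sketch shows one possible way to obtain a correlation coefficient from a Laplacian PAS and to build correlated channel coefficients with a product-based (Kronecker) model. It is a sketch under stated assumptions: the PAS is taken as exp(-sqrt(2)|φ|/σ) truncated to [-π, π] and integrated numerically, the matrix square roots are replaced by Cholesky factors, and the angles, angular spreads and function names are hypothetical values rather than entries of the TGn tables.

```python
import numpy as np

def laplacian_pas_correlation(d_over_lambda, mean_angle_deg, as_deg, n=20001):
    """rho = R_xx(D) + j*R_xy(D) for a truncated Laplacian PAS (numerical sum)."""
    D = 2.0 * np.pi * d_over_lambda
    sigma = np.deg2rad(as_deg)
    phi0 = np.deg2rad(mean_angle_deg)
    dphi = np.linspace(-np.pi, np.pi, n)          # offset around the mean angle
    pas = np.exp(-np.sqrt(2.0) * np.abs(dphi) / sigma)
    pas /= pas.sum()                               # normalise to unit total power
    phase = D * np.sin(phi0 + dphi)
    return np.sum(np.cos(phase) * pas) + 1j * np.sum(np.sin(phase) * pas)

def kronecker_channel(rho_rx, rho_tx, rng):
    """Correlated 2x2 matrix X = L_r H_w L_t^T, with L L^H = R (Cholesky factors)."""
    R_r = np.array([[1.0, rho_rx], [np.conj(rho_rx), 1.0]])
    R_t = np.array([[1.0, rho_tx], [np.conj(rho_tx), 1.0]])
    H_w = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2.0)
    return np.linalg.cholesky(R_r) @ H_w @ np.linalg.cholesky(R_t).T

# Illustrative angles and angular spreads (assumed values)
rho_rx = laplacian_pas_correlation(0.5, mean_angle_deg=20.0, as_deg=30.0)
rho_tx = laplacian_pas_correlation(0.5, mean_angle_deg=45.0, as_deg=25.0)
X = kronecker_channel(rho_rx, rho_tx, np.random.default_rng(2))
```

Averaged over many draws, the correlation between X[0, 0] and X[1, 0] tends to `rho_rx` and the correlation between X[0, 0] and X[0, 1] tends to `rho_tx`, which is the separability property the product-based model relies on.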
Frequency Domain

Description
In the frequency domain, the architecture for the digital part of the hardware simulator for a SISO channel can be represented by Fig. 1, which describes the digital representation of signals. This architecture uses a Xilinx module that performs the FFT and can be configured to perform the IFFT. The complex multiplier, the memory block and the truncation module are also detailed. The memory block is used to store the frequency response profiles of the considered channel. The real and imaginary parts of the frequency response are quantized on 16 bits to obtain a satisfactory precision. There are two methods to load the frequency response into the FPGA. The first method saves the frequency responses in the RAM blocks of the Virtex-IV. With this method, the transfer is made only once, before the compilation of the VHDL program. However, the number of RAM blocks and their size are limited, especially for time variant channels, which require many profiles. In a Virtex-IV, there are 192 RAM blocks of 18 kbit each. For a 2×2 MIMO channel with N F = 512 for an outdoor environment (N F is the size of the FFT/IFFT block), there are 4 SISO channels to simulate. Thus, four frequency response profiles are needed. The data sent is equal to 512 × 4 samples of 32 bits, or 2048 samples of 32 bits, i.e. 65.536 kbit to transmit for a profile. The number of frequency profiles that can be saved in the RAM blocks of the Virtex-IV is 192×18/65.536 = 52 profiles, i.e. 13 profiles for each SISO channel. For N F = 32 (indoor), 210 profiles can be saved for each SISO channel. If these numbers of profiles are sufficient for the test, we can add a function in the VHDL program which loads the profiles by running through the addresses of the RAM blocks in a sinusoidal manner. To load a larger number of profiles, the second method consists in using a bus transfer between the computer and the RAM block in the FPGA.
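The storage budget above can be checked with a few lines of Python. The sketch assumes, as the figures in the text do, that 1 kbit counts as 1000 bits; the function name is hypothetical.

```python
def profiles_in_block_ram(n_fft, n_siso=4, bits_per_sample=32,
                          n_ram_blocks=192, kbit_per_block=18):
    """Return (profile sets for the whole MIMO channel, profiles per SISO channel)."""
    bits_per_set = n_fft * n_siso * bits_per_sample      # one profile for each SISO channel
    total_bits = n_ram_blocks * kbit_per_block * 1000    # 1 kbit taken as 1000 bits
    n_sets = total_bits // bits_per_set
    return n_sets, n_sets // n_siso

outdoor = profiles_in_block_ram(512)   # N_F = 512
indoor = profiles_in_block_ram(32)     # N_F = 32
print(outdoor, indoor)                 # (52, 13) (843, 210)
```

The per-channel figures, 13 and 210, match the values quoted in the text.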
The profiles containing these 32-bit samples are stored in a text file on the hard disk of a computer. Then, this file loads the memory block which supplies the hardware simulator. The transfer can be done either by the USB 1.1 interface or by the PCI interface, both available on the prototyping board used.
In the worst case, which is for 3GPP-LTE model ETU, 
The USB bus does not meet this rate. Thus, the PCI bus has been selected to load the frequency response profiles. It has a rate up to 30 MBps. In addition, the PCI bus is a 32 bit bus, so on every clock cycle, it transmits a complex sample of the frequency response. Moreover, as a SISO channel here corresponds to a profile (512 × 32) bits, or 2048 bytes, the rate of 30 MBps allows us to load 97 SISO channels during the refresh time T ref .
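The figure of 97 SISO channels can be reproduced with the short Python check below. It takes T ref = 6666 µs, the refresh time quoted for the time domain architecture; applying that value here is an assumption, and the function name is hypothetical.

```python
def siso_profiles_per_refresh(bytes_per_profile, rate_bytes_per_s=30_000_000,
                              t_ref_us=6666):
    """Number of complete profiles that fit in one refresh period at the given rate."""
    bytes_per_refresh = rate_bytes_per_s * t_ref_us // 1_000_000
    return bytes_per_refresh // bytes_per_profile

bytes_per_profile = 512 * 32 // 8      # 512 complex samples of 32 bits = 2048 bytes
n_profiles = siso_profiles_per_refresh(bytes_per_profile)
print(n_profiles)                      # 97
```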
The block diagram in Fig. 2 shows the connection between the PC that contains the file of the frequency response profiles and the Nallatech XtremeDSP board containing the Virtex-IV, whose digital block is shown. The Spartan-II programmable component is dedicated to handling the USB/PCI interfaces. It has been programmed by Nallatech to collect data on the bus and redirect them to the Virtex-IV where the architecture is implemented.
An IP called "Host Interface" reads the data from the PCI bus and stores them in a FIFO memory. Then the module called "Loading profiles" reads and distributes the sample values into two RAM blocks (dual-port memory blocks), called "RAM_A" and "RAM_B", as shown in Fig. 3. This figure details the connection between the IP "Host Interface" and the loading profile block. The two RAM blocks are used to read one profile while loading another. In fact, a signal S controls the demultiplexer on the one hand, and D K controls the multiplexer on the other hand. Thus, when the multiplexer selects a RAM block to read the 32 values of a complex frequency response profile, the demultiplexer selects the other RAM block to write the 32 values of the following profile. Thus, while a profile is used, the following profile is loaded and will be used after T ref . The signal S is periodic with a period equal to 2·T ref . This method is based on a double buffer operation. The output of the multiplexer is a 32-bit bus with the 16 MSB directed to the input of the real samples and the 16 LSB to the input of the imaginary samples of the complex multiplier.
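The double buffer operation can be modelled in software as follows. This is a behavioural Python sketch only, not the VHDL of the simulator; the class and method names are hypothetical.

```python
class PingPongProfileMemory:
    """Behavioural model of RAM_A / RAM_B: one is read while the other is written."""

    def __init__(self, profile_len):
        self.ram = [[0] * profile_len, [0] * profile_len]   # RAM_A, RAM_B
        self.read_sel = 0            # RAM currently feeding the complex multiplier

    def load_next(self, profile):
        """Write the next profile into the RAM that is not being read."""
        self.ram[1 - self.read_sel] = list(profile)

    def swap(self):
        """Called once per T_ref: the freshly written RAM becomes the active one."""
        self.read_sel = 1 - self.read_sel

    def active(self):
        return self.ram[self.read_sel]


def split_sample(word32):
    """Split a 32-bit word into (16 MSB = real part, 16 LSB = imaginary part)."""
    return (word32 >> 16) & 0xFFFF, word32 & 0xFFFF


mem = PingPongProfileMemory(4)
mem.load_next([1, 2, 3, 4])      # written in the background
before_swap = mem.active()       # still the old (all-zero) profile
mem.swap()
after_swap = mem.active()        # the new profile is now in use
```

Because reads and writes never target the same RAM, the multiplier always sees a complete profile.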
The complex multiplier uses the "XtremeDSP" block present on the FPGA, which contains a multiplier of 18×18 bits, an adder of 48 bits and a register. After the multiplication, the length of the samples can be up to 128 bits. These multipliers have an internal truncation to provide the user with the needed number of bits at the output.
The values computed at the output of the IFFT block are quantized on M y = 34 bits. The truncation block, located after the Xilinx IFFT block, is necessary to reduce the number of bits of the output samples of the IFFT block to n DAC = 14 bits, so that these samples can be accepted by the DAC while keeping the best possible accuracy. Unlike the blocks presented above, this block has been programmed. The simplest solution is to keep the 14 MSB. However, for low values, keeping only the MSB can produce null values at the input of the DAC while they were non-null at the output of the IFFT block.
Therefore, instead of a simple brutal truncation, which keeps the first 14 bits starting with the MSB, we considered a sliding window truncation of 14 bits. This truncation is illustrated in Fig. 4; the window slides to the most significant bits that are actually used. This truncation modifies each output sample. Therefore, a reconfigurable amplifier after the DAC must be used to restore the correct output value by multiplying it by a scale factor that is a power of 2.
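The difference between the two truncations can be illustrated with integer arithmetic in Python. The sketch assumes that the window position is chosen once for a block of samples from its peak magnitude; how the hardware selects the window is not specified here, so this choice and the function names are assumptions.

```python
def brutal_truncate(sample, m_in=34, m_out=14):
    """Keep the m_out most significant bits of an m_in-bit signed sample."""
    return sample >> (m_in - m_out)

def sliding_truncate(samples, m_in=34, m_out=14):
    """Keep m_out bits starting at the highest bit actually used by the block."""
    peak = max(abs(s) for s in samples)
    used_bits = peak.bit_length() + 1          # magnitude bits plus sign bit
    shift = max(used_bits - m_out, 0)          # number of LSBs dropped
    return [s >> shift for s in samples], shift

samples = [1000, -2500, 30000]                 # small values on a 34-bit bus
brutal = [brutal_truncate(s) for s in samples]
sliding, shift = sliding_truncate(samples)
restored = [s << shift for s in sliding]       # scaling by 2**shift restores the level
print(brutal, sliding, shift)                  # [0, -1, 0] [250, -625, 7500] 2
```

The brutal truncation flattens these low-level samples to 0 or -1, while the sliding window keeps their shape and only requires a known power-of-two gain afterwards.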
Implementation Results
In this section, the implementation results of the simple frequency architecture on the FPGA are presented. First, we describe the choice of the input signal used for the test. Then, we implement the architecture on the Virtex-IV, which constitutes the digital block of the hardware simulator.
In this Section, the simple frequency architecture is tested with WLAN 802.11ac and LTE signals for different environments. We have chosen to simulate the TGn model E for WLAN 802.11ac signals and 3GPP-LTE model EVA for LTE signals because they need the same size N F = 128 of FFT/IFFT modules. In that way, a comparison of the architecture in indoor and outdoor can be made. The H vector is implemented and saved on a RAM block in the FPGA Virtex-IV.
A Gaussian input signal x(t) is considered. In fact, the use of a Gaussian signal is preferred because it has a limited duration in both time and frequency domains. Thus, its Fourier Transform can be calculated by an FFT block of limited size. The size of x(t) is limited by the size of the FFT/IFFT module used in the simple frequency architecture. The x(t) used to test the simple frequency architecture is computed by:
The center of each Gaussian and each σ are chosen so as to show the effect of each path of the taps of the impulse responses on the output signal. The parameters depend on the channel and the standard used. The WLAN 802.11ac signals use a sampling frequency f s = 165 MHz and a sampling period T s = 1/f s . The last Excess Time Delay (ETD) for TGn model E is 730 ns. Therefore, the size of the FFT/IFFT module is equal to 730 ns/T s = 120, rounded up to N F = 128 (to be written in the form 2^n where n is an integer).
To compare the results later, it is better to use the same input signal, covering the same window W t . Thus, we use the same signal but with the LTE parameters. In that way, only the scale factor of the time axis changes. For LTE signals, f s = 50 MHz. The last ETD for 3GPP-LTE model EVA is 2510 ns. Therefore, the size of the FFT/IFFT module is equal to 2510 ns/T s = 125, rounded up to N F = 128.
The occupation on the FPGA is obtained after performing three main operations on the program written in VHDL: the synthesis, the mapping and the place and route. The synthesis is the compilation of a functional description of a circuit to generate a diagram with logic gates and flip-flops. Then the mapping operation describes the combination of these logic gates as LUTs, a kind of correspondence table implemented as static memory, which holds pre-computed values. Finally, after component placement, the routing provides the connections between logic resources and I/O hardware components. Table 1 shows the device utilization in one Virtex-IV SX35 for one SISO channel using the simple frequency architecture for the TGn model E. Table 2 shows the device utilization in one Virtex-IV SX35 for one SISO channel using the simple frequency architecture for the 3GPP-LTE model EVA. The two channels use an FFT/IFFT module of size 128. Therefore, their occupation on the FPGA is almost the same.
For TGn model E, the sampling frequency is higher than that of 3GPP-LTE model EVA. Thus, it uses more LUT blocks. The FFT block, which has a 16-bit input and a 16-bit output, needs 3 DSP blocks and 3 RAM blocks. The IFFT block, which has a 24-bit input and a 34-bit output, needs 14 DSP blocks and 5 RAM blocks. Moreover, 3 DSP blocks are used by the complex multiplier, and 1 RAM block is added to save the channel frequency response.
With a SISO channel, the slice occupation is between 12 and 13 %; thus a 2×2 MIMO channel can easily be implemented with the additional MIMO circuit.
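The choice of the FFT/IFFT size used in this section, i.e. the last excess delay expressed in samples and rounded up to a power of two, can be sketched in Python as follows. The function name is hypothetical.

```python
import math

def fft_size(last_excess_delay_ns, fs_mhz):
    """Smallest power of two covering the last excess delay, in samples."""
    n_samples = math.ceil(last_excess_delay_ns * fs_mhz / 1000.0)
    return 1 << (n_samples - 1).bit_length()

n_f_tgn_e = fft_size(730, 165)     # TGn model E at 165 MHz
n_f_eva = fft_size(2510, 50)       # 3GPP-LTE model EVA at 50 MHz
print(n_f_tgn_e, n_f_eva)          # 128 128
```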
Improved Frequency Domain Architecture
Description
The simple frequency architecture limits the input signal to the size of the FFT/IFFT blocks. Moreover, tests show that if a longer input signal is split into parts equal to the size of the FFT/IFFT blocks, an error appears at the output. Therefore, an improved frequency architecture is proposed.
With modeled impulse responses, the output of the architecture can't be predicted. Thus, we first present the parameters used to test the simple frequency architecture. Secondly, the reason for using a new improved architecture is presented. Finally, the new improved frequency architecture is introduced and analyzed.
To test the architecture with an input signal in streaming mode, we use test signals that are simple to process and whose output signal can be predicted. In fact, the results obtained with these test signals must also be obtainable by theoretical calculation. Thus, the ideal case is to use a test input signal with a finite window in both the time domain and the frequency domain. The Gaussian signal meets these criteria: it is a good trade-off for a finite number of points in both frequency and time domains. Thus, to test the architecture, we use on the one hand a Gaussian for the input signal x(t), and on the other hand a Gaussian for the impulse response h(t).
In the frequency domain which interests us here, we will use the Gaussian H(f), which is the FT of the Gaussian h(t), to represent the frequency response that will feed the simulator. The output y(t) will also be a Gaussian.
The output signal y(t) is given by the relation:
We express the signals x(t), h(t) and y(t) by: 
As the convolution in the time domain can be replaced by the multiplication in the frequency domain, we obtain:
The test is made with T s = 20 ns, which is used by LTE. Moreover, to test the simple frequency architecture with a streaming input signal, we have chosen FFT/IFFT blocks of size N F = 128 (the same as in the previous Section) and an input signal window equal to 3W t = 3N F . The other parameters of the input Gaussian are determined by: m x = W t /2 and σ f = m x /2. For H(f), the window is equal to W t , m h = W t /2 and σ g = m h /2.
The samples of the quantized Gaussian input x(t), generated by MATLAB and placed in the VHDL program, are used as input of the FFT block. The quantized Gaussian frequency response H(f) is stored in a RAM block.
The FFT 512 block will split the corresponding quantized input vector x in three sub input signals (x 1 , x 2 and x 3 ) of N F = 128 samples each.
Applying these parts to the input of the simple frequency architecture whose frequency response is H, we obtain three sub-output vectors y 1 , y 2 and y 3 . To validate the streaming mode, a comparison is made between the concatenation of these three vectors and the theoretical signal y(t) obtained by a convolution, as shown in Fig. 9. The concatenation of the three sub-outputs obtained by the simple frequency architecture gives a wrong result compared to the theoretical output signal. Indeed, the output signal of the simple frequency architecture is obtained on a window equal to 3N F T s = 3×512×20 ns = 30.72 µs. However, the correct result is obtained on 4N F T s = 4×512×20 ns = 40.96 µs.
In fact, each partial result y 1 , y 2 and y 3 must have 2N F samples, equal in time to 2×512×T s = 20.48 µs (if x 1 , x 2 , x 3 and h have N F samples). Using the simple frequency architecture, the IFFT block gives its result with only N F samples. Each partial result y i is therefore truncated, and the concatenation of these partial results gives a wrong result. Therefore, an improved frequency architecture is proposed as a solution. It is presented in Fig. 10 and it operates using two FFT/IFFT blocks of 256 points. This solution consists in completing each vector x i with N F zeros and in using FFT/IFFT blocks of twice the size (2N F ). Each FFT module operates with 16-bit input samples and has a 12-bit phase factor. The switch signal S alternates the use of the FFT modules. The start input of the FFT modules is active on the rising edge of the switch signal S. Fig. 11 presents the theoretical output signal versus the output signal obtained using the improved frequency domain architecture.
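The failure of the block-wise simple architecture and the effect of zero padding can be reproduced numerically. The Python sketch below treats the improved scheme as the classical overlap-add method, in which each zero-padded partial result of 2N F samples is added to its neighbours; whether the hardware matches this in every detail is an assumption, and the Gaussian parameters and function names are illustrative.

```python
import numpy as np

def blockwise_simple(x, h, n_f):
    """N_F-point FFT/IFFT per block, partial outputs concatenated."""
    H = np.fft.fft(h, n_f)
    blocks = [x[k:k + n_f] for k in range(0, len(x), n_f)]
    return np.concatenate([np.fft.ifft(np.fft.fft(b, n_f) * H).real for b in blocks])

def blockwise_improved(x, h, n_f):
    """Each block zero-padded to 2*N_F; overlapping partial outputs are added."""
    H = np.fft.fft(h, 2 * n_f)
    y = np.zeros(len(x) + n_f)
    for k in range(0, len(x), n_f):
        y_k = np.fft.ifft(np.fft.fft(x[k:k + n_f], 2 * n_f) * H).real
        y[k:k + 2 * n_f] += y_k          # tail of block k overlaps block k+1
    return y

n_f = 128
t = np.arange(3 * n_f)
x = np.exp(-0.5 * ((t - 1.5 * n_f) / (0.375 * n_f)) ** 2)    # input over 3*N_F samples
th = np.arange(n_f)
h = np.exp(-0.5 * ((th - 0.5 * n_f) / (0.125 * n_f)) ** 2)   # impulse response over N_F

y_ref = np.convolve(x, h)                  # theoretical output, 4*N_F - 1 samples
y_simple = blockwise_simple(x, h, n_f)     # 3*N_F samples, circular per block
y_improved = blockwise_improved(x, h, n_f) # 4*N_F samples
print(np.allclose(y_improved[:-1], y_ref), np.allclose(y_simple, y_ref[:3 * n_f]))
```

The simple scheme computes a circular convolution inside each block, so the tail of every partial result wraps around instead of spilling into the next block; padding to 2N F leaves room for that tail.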
Implementation on FPGA
In this section, we use a Gaussian input signal x(t) long enough to test the improved frequency architecture in streaming mode. The input signal x WLAN (t) is presented in Fig. 13. The input signal x LTE (t) is presented in Fig. 14. Table 3 shows the device utilization in one Virtex-IV SX35 for one SISO channel using the improved frequency architecture for the TGn channel model E. Table 4 shows the device utilization in one Virtex-IV SX35 for one SISO channel using the improved frequency architecture for the 3GPP-LTE model EVA. The improved frequency architecture using FFT/IFFT modules of size 256 occupies between 26 and 27 % of the slices on the FPGA for one SISO channel, which is a very high occupation. Therefore, it is impossible to implement a 2×2 MIMO system, which requires four SISO channels, using this architecture on a Virtex-IV FPGA. However, a 2×1 MIMO system or a 1×2 MIMO system can be implemented.
In the case of impulse responses that have a large excess delay but a small number of non-null taps, a large size for the FFT/IFFT modules is needed. Therefore, the occupation increases significantly. In this case, the time domain architecture, which is analyzed in detail in the next Section, can be used.
Time Domain Architecture Design
Description
The block diagram of the digital architecture of the hardware simulator in the time domain is shown in Fig. 15 for one SISO channel. 
The time domain approach is based on a convolution between the input signal x(t) and the channel impulse response h(t).
This convolution product can be presented, as in Fig. 16 , which shows a FIR N filter architecture, with 18 multipliers, for one SISO channel. The number of bits at the output before the truncation is equal to:
where M x = 14 bits is the number of bits of the input signal, M h = 16 is the number of bits of the impulse response and M tap can be expressed by: 
As a SISO channel corresponds to a profile of (18 taps × 16) bits, or 36 bytes, the rate of 30 MBps allows us to load 7999 SISO channels during the refresh time of 6666 µs.
The loading procedure is the same as described in the previous Section for the frequency approach. However, for a FIR filter, the x(i)×h(i) operations are performed for the whole impulse response profile at once. Therefore, for an impulse response that has 18 taps and 18 excess delays for one SISO channel, we need to load 36 RAM blocks (Fig. 18). To respect the refresh period, the second profile is saved in the same way on the 36 RAM blocks. Thus, each RAM block contains two profiles.
The signal "Selector", written on 5 bits, controls the demultiplexer which selects one of the 36 RAM blocks. The signal "Profile In" takes the values "1" and "0" to show which profile is active and used by the FIR filter. The address "Addr w" is the "Profile out". It takes the values "1" and "0" to select the other profile, on which the new coefficients will be written. The reading and writing of the RAM blocks are independent; thus, it is possible to write the new FIR filter coefficients while still reading the old ones. The 36 RAM blocks are loaded and each output "Data out" is directed to a continuous real multiplier where the coefficients are multiplied with the input signal samples contained in the shift register of the FIR filter. The "Addr r" is a periodic signal whose period is twice the sampling period. Thus, all the profiles are loaded within the refresh period.
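The FIR structure with a small number of non-null taps can be sketched in Python as a tapped delay line in which only the listed delays carry a multiplier. The function name and the test values are hypothetical; the sketch works on plain integers and does not model the truncation.

```python
def fir_sparse(x, delays, coeffs):
    """Tapped delay line: y[n] = sum_k coeffs[k] * x[n - delays[k]]."""
    y = [0] * (len(x) + max(delays))
    for d, c in zip(delays, coeffs):          # one multiplier per non-null tap
        for n, sample in enumerate(x):
            y[n + d] += c * sample
    return y

# Two taps: gain 10 at delay 0 and gain 1 at delay 2 (illustrative values)
y = fir_sparse([1, 2, 3], delays=[0, 2], coeffs=[10, 1])
print(y)                                      # [10, 20, 31, 2, 3]
```

The cost grows with the number of taps rather than with the excess delay, which is why a long but sparse impulse response suits this architecture.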
Implementation on FPGA
The occupancy of the time domain architecture is known after performing the synthesis, mapping, and place and route operations on the program written in VHDL. Table 5 shows the device utilization in one Virtex-IV SX35 for one SISO channel using the time domain architecture for the TGn channel model E. Table 6 shows the device utilization in one Virtex-IV SX35 for one SISO channel using the time domain architecture for the 3GPP-LTE model EVA.
Time Domain Versus Frequency Domain Architectures
Accuracy description
In order to determine the accuracy of the digital block, a comparison is made between the theoretical and the Xilinx output signals. The theoretical output vector of the SISO channel is calculated by:
where i /jk is the number of taps of the impulse response and l m is the vector of the tap positions.
As we will see in the next section, the Xilinx output and the theoretical output are very close and cannot be distinguished visually. Thus, we calculated the relative error, which is given for each output sample by:
The latencies of the architectures are measured from the time the input signal enters the ADC to the time it exits the DAC.
For the improved frequency architecture, the FFT 256 needs 256 cycles to generate its first output sample. Then, the IFFT 256 needs another 256 cycles. One more cycle is needed for the digital adder. Thus, 513 cycles at 165 MHz are needed using WLAN 802.11ac signals and 513 cycles at 50 MHz are needed using the 3GPP-LTE channel model EVA. These values are also obtained with ModelSim [23]. It is necessary to add 38 ns of ADC latency and 17 ns of DAC latency, according to their datasheets. In summary, the 
Table of Comparison
To present the results better, the global values of the relative error and SNR have to be calculated. Table 7 presents the results and characteristics of the time domain architecture versus the improved frequency architecture. Three points summarize the comparison: the precision of the output signals, the occupation on the FPGA and the latency.
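The per-sample and global accuracy metrics can be sketched in Python as below. The definitions are commonly used ones (error magnitude relative to the theoretical sample, and ratios of Euclidean norms for the global values) and are assumptions; the exact expressions behind the reported figures may differ. The function names and the sample vectors are hypothetical.

```python
import numpy as np

def relative_error_percent(y_fpga, y_theory):
    """Per-sample relative error in percent (y_theory must be non-zero)."""
    return 100.0 * np.abs(y_fpga - y_theory) / np.abs(y_theory)

def snr_db(y_fpga, y_theory):
    """Per-sample SNR in dB (the error must be non-zero)."""
    return 20.0 * np.log10(np.abs(y_theory) / np.abs(y_fpga - y_theory))

def global_error_percent(y_fpga, y_theory):
    """Global relative error: norm of the error over norm of the theory."""
    return 100.0 * np.linalg.norm(y_fpga - y_theory) / np.linalg.norm(y_theory)

def global_snr_db(y_fpga, y_theory):
    """Global SNR: norm of the theory over norm of the error, in dB."""
    return 20.0 * np.log10(np.linalg.norm(y_theory) / np.linalg.norm(y_fpga - y_theory))

y_theory = np.array([100.0, 200.0, -400.0])   # illustrative values
y_fpga = np.array([101.0, 198.0, -400.5])
err = relative_error_percent(y_fpga, y_theory)
err_global = global_error_percent(y_fpga, y_theory)
snr_global = global_snr_db(y_fpga, y_theory)
```

For these illustrative vectors the per-sample errors are 1 %, 1 % and 0.125 %, and the global relative error is 0.5 %.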
Precision
We start with the precision of the architectures. In the previous figures, we mark t BT-1 , t ST-1 , t BT-2 and t ST-2 as the margins of the small values of the relative error using the brutal truncation (BT) and the sliding truncation (ST) for the improved frequency and the time domain architectures respectively. The margin using the BT with the improved frequency architecture is t BT-1 = 0.53 µs, while with the time domain architecture it is t BT-2 = 0.75 µs. Moreover, using the ST, the margin with the improved frequency architecture is t ST-1 = 0.9 µs, while with the time domain architecture it is t ST-2 = 1.2 µs. We can also notice that, in these margins, the relative error is smaller using the time domain architecture, while it presents high variations using the frequency domain architecture. The same holds for the relative SNR, which is higher using the time domain architecture.
To discuss the results over the whole window of the output signal, the global values of the relative error and SNR are computed and presented in Table 7. If we compare the global relative errors, using the results of TGn channel model E for example, we notice that the global relative error decreases from 3.78 % using the improved frequency architecture with BT to 0.74 % using the time domain architecture. Also, using ST, it decreases from 0.3 % using the improved frequency architecture to 0.01 % using the time domain architecture. Therefore, we conclude that the time domain architecture is more accurate than the improved frequency architecture. Moreover, we also notice that using the ST decreases the error of the time domain architecture from 0.74 % to 0.01 %.
Moreover, from a theoretical point of view, the improved frequency architecture quantizes many signals: the input signal, the phase factors of the FFT modules, the output FFT signal, the output of the complex multiplier, the real and imaginary parts of the frequency response, the signals of the IFFT modules and the output signals. In contrast, the time domain architecture quantizes only 3 signals: the input signal, the impulse response and the output signal. This is why the time domain architecture has a higher precision.
Occupation on FPGA
According to Table 7 , the improved frequency domain architecture presents a slice occupation between 26 and 27 %. However, the time domain architecture occupies 3 to 4 % of slices.
Thus, the improved frequency architecture presents a high slice occupation on the FPGA compared to the time domain architecture. It therefore requires higher-capacity FPGAs to implement high-order MIMO channels.
However, in order to simulate an impulse response with more than 192 taps, the improved frequency architecture can be used. With a Virtex-IV FPGA, the size N F of the FFT/IFFT modules can be chosen up to 65536, whereas a FIR filter is limited by the 192 multipliers (DSP blocks) of the FPGA, which limits the impulse response to 192 taps.
Latency
The latency using the time domain architecture is between 91 and 155 ns. However, using the improved frequency architecture, it is between 3.16 and 10.31 µs.
The latencies of the time domain architecture are far better than those obtained with the improved frequency architecture. In fact, with a FIR filter, each output sample is computed in one stroke, whereas with the frequency architectures the output samples are obtained only after all the coefficients have been loaded into the FFT/IFFT modules.
Adopted Time Domain Architecture
After comparing the previous architectures, we have chosen the time domain architecture which has a better occupation on the FPGA, better latency and better precision.
The comparison of the previous architectures was made using a SISO channel and a long input signal, in order to validate them under worst-case conditions. However, after choosing the best architecture, we have to consider more realistic conditions. First, the input signal has to respect the chosen bandwidth [∆, ∆+B]. Secondly, for the new standards, we have to consider working with MIMO systems. For simplification, a 2×2 MIMO system is considered.
Real Input Signal
In order to determine the accuracy of the digital block, the theoretical and Xilinx output signals are compared. A Gaussian input signal x(t) is considered for the two inputs of the 2×2 MIMO simulator. To simplify the calculation, we consider x 1 (t) = x 2 (t):
Thus, σ, which corresponds to the considered band of the standard used, is obtained:
To obtain x(t) centered between [∆, ∆+B], it must be multiplied by:
In our work, we considered σ = 3/(πB). m x is chosen equal to 20T s > 3σ for both WLAN 802.11ac and LTE signals. Moreover, ∆ ≪ B is chosen equal to 2 MHz. These values are small enough to show the effect of each tap on the output signal. For WLAN 802.11ac, B = 80 MHz and T s = 1/f s = 6 ns. Thus, we obtain σ ≈ 2T s . This signal is named x WLAN (t) and is presented in Fig. 25 . For LTE, B = 20 MHz and T s = 1/f s = 20 ns. Thus, we obtain σ ≈ 2.5T s . This signal is named x LTE (t) and is presented in Fig. 26 .
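The choice of parameters above can be checked numerically with a short sketch. It assumes that the width parameter of the Gaussian (written sigma here) equals 3/(πB), that m x = 20 T s is the centre of the Gaussian, and that the signal is shifted into [∆, ∆+B] by multiplying it by a cosine at the band centre ∆ + B/2. The cosine carrier, the unit amplitude, the number of samples and the function names are assumptions, since the corresponding equations are not reproduced in this section.

```python
import math

def gaussian_sigma(bandwidth_hz):
    """Width of the Gaussian for a band B, assuming sigma = 3 / (pi * B)."""
    return 3.0 / (math.pi * bandwidth_hz)

def input_signal(bandwidth_hz, ts, delta_hz=2e6, n_samples=64):
    """Sampled Gaussian pulse shifted to the band [delta, delta + B].

    Assumptions: unit amplitude, centre m_x = 20 * ts, and a cosine
    carrier at the band centre delta + B / 2.
    """
    sigma = gaussian_sigma(bandwidth_hz)
    m_x = 20 * ts
    f_c = delta_hz + bandwidth_hz / 2.0
    samples = []
    for n in range(n_samples):
        t = n * ts
        envelope = math.exp(-((t - m_x) ** 2) / (2.0 * sigma ** 2))
        samples.append(envelope * math.cos(2.0 * math.pi * f_c * t))
    return samples

# Ratio sigma / Ts for the two standards considered in the text.
sigma_over_ts_wlan = gaussian_sigma(80e6) / 6e-9    # B = 80 MHz, Ts = 6 ns
sigma_over_ts_lte = gaussian_sigma(20e6) / 20e-9    # B = 20 MHz, Ts = 20 ns
```

With these assumptions the ratio is close to 2 for WLAN 802.11ac and close to 2.4 for LTE, which is in line with the rounded values of 2 T s and 2.5 T s quoted above, and in both cases 20 T s is larger than three times the width.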
2×2 MIMO Architecture
Four FIR filters are considered to simulate 2×2 MIMO channels. For each SISO channel, the FIR length and the number of multipliers used are determined by the non-null taps of the impulse response. To use a limited number of multipliers on the FPGA, the delay addresses are controlled by connecting each multiplier block of the FIR filter to the corresponding shift register block. Thus, the number of multipliers in the FIR filters is equal to the maximum number of non-null taps.
The theoretical output signals of a 2×2 MIMO channel are calculated by:
N t is the number of taps of the impulse response. h q (i k ) is the attenuation of the k th path with the delay i k T s . Figure 27 presents the 2×2 MIMO time domain architecture based on 4 FIR filters with N t = 18 multipliers. We have developed our own FIR filter instead of using the Xilinx MAC FIR filter, so that the FIR filter coefficients can be reloaded. The number of bits at the output before the truncation is computed by:
where M MIMO is computed by:
where, in our case, M MIMO = 1 bit for a 2×2 MIMO system (for the sums y 11 + y 21 and y 12 + y 22 ).
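The structure described above can be sketched in floating point as four sparse FIR filters followed by two adders. Each filter uses one multiplier per non-null tap, located at its delay index, and the outputs are summed as y 1 = y 11 + y 21 and y 2 = y 12 + y 22 . The function names, the (delay, gain) representation of a tap, and the rule used for the adder bit growth (ceiling of log2 of the number of summed streams, which gives the 1 bit quoted for 2×2) are assumptions made for illustration.

```python
def sparse_fir(x, taps, out_len):
    """FIR filter with one multiplier per non-null tap.

    taps is a list of (delay_in_samples, gain) pairs.
    """
    y = [0.0] * out_len
    for delay, gain in taps:
        for n, sample in enumerate(x):
            y[n + delay] += gain * sample
    return y

def mimo_2x2(x1, x2, h11, h12, h21, h22):
    """2x2 MIMO channel: h_ij links input i to output j (assumed convention)."""
    max_delay = max(d for h in (h11, h12, h21, h22) for d, _ in h)
    out_len = max(len(x1), len(x2)) + max_delay
    y11 = sparse_fir(x1, h11, out_len)
    y21 = sparse_fir(x2, h21, out_len)
    y12 = sparse_fir(x1, h12, out_len)
    y22 = sparse_fir(x2, h22, out_len)
    y1 = [a + b for a, b in zip(y11, y21)]
    y2 = [a + b for a, b in zip(y12, y22)]
    return y1, y2

def adder_bit_growth(n_streams):
    """Extra bits needed to sum n_streams words: ceil(log2(n_streams))."""
    return (n_streams - 1).bit_length()
```

For instance, with x 1 = [1, 0, 0], x 2 = [0, 1, 0] and single-tap channels h 11 = [(0, 1.0)], h 21 = [(1, 0.5)], h 12 = [(2, 0.25)], h 22 = [(0, 2.0)], the sketch gives y 1 = [1, 0, 0.5, 0, 0] and y 2 = [0, 2, 0.25, 0, 0].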
Occupation on FPGA
As the development board has 2 ADCs and 2 DACs, it can be connected to only 2 down-conversion and 2 up-conversion RF units. Four FIR filters are needed to simulate a one-way 2×2 MIMO radio channel. The occupation of the time domain architecture is known after performing the synthesis, mapping, and place and route operations on the program written in VHDL. Table 8 shows the device utilization in one Virtex-IV SX35 for a 2×2 MIMO channel using the time domain architecture for the TGn channel model E. Table 9 shows the device utilization in one Virtex-IV SX35 for a 2×2 MIMO channel using the time domain architecture for the 3GPP-LTE model EVA.
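As a rough check of how the number of multipliers limits the MIMO order, the sketch below counts one multiplier per non-null tap and per SISO channel, and compares the total with the 192 DSP blocks of the FPGA. The 18 taps for the TGn channel model E and the limit of 192 come from the text. The 9 taps used for the 3GPP-LTE model EVA and the function names are assumptions made for illustration.

```python
DSP_BLOCKS = 192  # multipliers available on the FPGA, as stated in the text

def multipliers_needed(n_tx, n_rx, taps_per_channel):
    """One FIR filter per SISO channel, one multiplier per non-null tap."""
    return n_tx * n_rx * taps_per_channel

def max_square_mimo_order(taps_per_channel, dsp_blocks=DSP_BLOCKS):
    """Largest N such that an N x N MIMO channel fits in the DSP blocks."""
    n = 0
    while multipliers_needed(n + 1, n + 1, taps_per_channel) <= dsp_blocks:
        n += 1
    return n

TGN_E_TAPS = 18  # from the text
EVA_TAPS = 9     # assumption: number of non-null taps of the EVA profile
```

With these values, a 4×4 channel needs 18 × 16 = 288 multipliers for the TGn channel model E, which exceeds 192, and 9 × 16 = 144 multipliers for the model EVA, which fits. This count considers the multipliers only; the slice occupation and the number of converters are separate constraints.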
We notice that the slice occupation on the FPGA of a 2×2 MIMO system is 16 % for the TGn channel model E and 16 % for the 3GPP-LTE model EVA. These occupations are equal to the occupation of a SISO channel multiplied by four, plus the additional slices required by the two digital adders that compute y 11 + y 21 and y 12 + y 22 . Moreover, the 2×2 MIMO system has a small occupation on the Virtex-IV FPGA. In fact, up to a 4×4 MIMO system can be implemented in the FPGA for the 3GPP-LTE model EVA, but not for the TGn channel model E, for which the number of multipliers would be 18×(4×4) = 288 > 192. However, we are limited by the 2 ADCs and the 2 DACs.
Results and Accuracy
Table 10 shows the global values of the relative error and SNR for the considered 2×2 MIMO time domain architecture with the TGn channel model E and the 3GPP-LTE channel model EVA. Fig. 28 , 29, 30 and 31 present the Xilinx output signal, the relative error and the relative SNR for y 1 (t) and y 2 (t), using the TGn model E with x WLAN (t) at the input and the 3GPP-LTE channel model EVA with x LTE (t), for the 2×2 MIMO time domain architecture. We can see the benefit of the ST when using a real signal that respects the band [∆, ∆+B]. Also, the relative error figures show that the ST provides a low variation around zero in the time margin where the output signal is high. However, the BT presents high variations of the relative error, over a smaller time margin.
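The difference between the two truncations can be illustrated on integer samples. The sketch below assumes that the BT keeps the most significant bits of the full-width word, whatever the signal level, and that the ST keeps the most significant effective bits, found from the largest magnitude in the block and applied with a single shift to the whole block. These definitions, the word lengths and the function names are assumptions made for illustration, and the hardware may apply the ST differently.

```python
def brutal_truncation(value, n_in, n_out):
    """Assumed BT: drop the (n_in - n_out) least significant bits."""
    return value >> (n_in - n_out)

def sliding_truncation(values, n_out):
    """Assumed ST: keep the n_out most significant effective bits.

    Returns the truncated samples and the shift that was applied, so that
    the scale factor 2**shift can be compensated afterwards.
    """
    peak = max(abs(v) for v in values)
    effective_bits = peak.bit_length() + 1  # magnitude bits plus sign bit
    shift = max(effective_bits - n_out, 0)
    return [v >> shift for v in values], shift

# Low-level samples held in a 32-bit word, truncated to 14 bits.
samples = [100000, -52345, 777]
bt = [brutal_truncation(v, 32, 14) for v in samples]
st, shift = sliding_truncation(samples, 14)
```

In this example the BT removes 18 bits and the positive samples become 0, whereas the ST removes only 4 bits and keeps the shape of the signal. This is consistent with the lower relative error observed with the ST when the output signal does not use the full word length.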
The latency of the 2×2 MIMO time domain architecture is calculated in the same way as previously; however, one additional cycle is needed to sum the outputs for the 2×2 MIMO system. Thus, in summary, the time domain architecture and the converters have a latency, using 2×2 
Conclusion
In this paper, the frequency approach has been presented and analyzed in detail. First, the simple frequency architecture has been studied. Each block that composes it, from the FFT/IFFT modules to the multiplier, the truncation, the memory and the converters, has been presented and analyzed in detail. The size of the FFT/IFFT modules depends on the last excess delay of the impulse response. After that, the entire simple SISO frequency architecture has been implemented on the FPGA. It has been tested with a Gaussian input signal that has a limited duration in the time and frequency domains. Its occupation on the FPGA (12 % of used slices), latency and accuracy have been analyzed. It has been shown that the ST significantly reduces the relative error of the output signal. After testing the simple frequency architecture with a long input signal, whose duration is larger than the size of the FFT/IFFT blocks, it has been shown that this architecture gives wrong output results. Therefore, we analyzed an improved frequency architecture that works for input signals in streaming mode. The improved frequency architecture for a SISO channel has been implemented on the FPGA and the results were provided. It has been shown that the improved frequency architecture has a very high occupation (27 % of used slices) on the FPGA. Therefore, it is impossible to implement a 2×2 MIMO system using this architecture on a Virtex-IV FPGA, since four SISO channels would require about 4 × 27 % = 108 % of the slices.
The time domain architecture of the digital part of the hardware simulator has also been analyzed. Each block that composes it, from the FIR filter (which contains the multipliers, the shift register and the memory) to the truncation, has been presented and analyzed in detail. The number of multipliers used depends on the number of non-null taps of the impulse response. After that, the entire simple SISO time domain architecture has been implemented on the FPGA and the results were provided. A comparison between the time domain architecture and the improved frequency architecture has been made with the same input signal and for the same channel. It has been shown that the time domain architecture has a better occupation on the FPGA (4 % of occupied slices instead of 27 % using the improved frequency architecture), a better latency (155 ns instead of 10.31 µs using the improved frequency architecture), and a better precision (up to 76 dB instead of 46 dB using the improved frequency architecture). The comparison of the previous architectures was made using a SISO channel and a long input signal, in order to validate them under worst-case conditions. However, after choosing the best architecture, which is the time domain architecture, we have considered more realistic conditions. First of all, the input signal has to respect the chosen bandwidth [∆, ∆+B]. Secondly, for the new standards, we have considered working with 2×2 MIMO systems. The channel has been simulated under these two conditions. For our future work, simulations made using a Virtex-VII [3] XC7V2000T platform will allow us to simulate up to 300 SISO channels. In parallel, measurement campaigns will be carried out with the MIMO channel sounder realized by IETR to obtain the impulse responses of the channel for specific and various types of environments. The final objective of these measurements is to obtain realistic MIMO channel models in order to supply the hardware simulator.
A graphical user interface will also be designed to allow the user to reconfigure the simulator parameters.
