Abstract
This paper presents a new frequency domain architecture for the digital block of a hardware simulator of MIMO propagation channels. The simulator can be used for LTE and WLAN IEEE 802.11ac applications, in indoor and outdoor environments, and it accepts signals in streaming mode. A hardware simulator must reproduce the behavior of the radio propagation channel, which makes it possible to test mobile radio equipment "on the bench". Its advantages are low cost, short test duration, and the ability to reproduce the same test conditions in order to compare the performance of different equipment. After presenting the general characteristics of the hardware simulator, the paper describes the new architecture of the digital block, designed on a Xilinx Virtex-IV FPGA. It is tested with the time-varying 3GPP TR 36.803 channel model EVA and TGn channel model E. Finally, its accuracy is analyzed.
Introduction
The Long Term Evolution (LTE) and IEEE 802.11ac Wireless Local Area Network (WLAN) standards are fourth-generation-and-beyond mobile and wireless telecommunications standards that can offer high-rate multimedia services to the general public.
Wireless communication systems can offer high data bit rates by achieving high spectral efficiency with Multiple-Input Multiple-Output (MIMO) techniques. MIMO systems use antenna arrays at both the transmitter and receiver sites to improve capacity and/or system performance. However, the transmitted electromagnetic waves interact with the propagation environment, so the main propagation parameters must be taken into account when designing future communication systems.
A wireless system can be tested either in real propagation environments or with a simulator that reproduces the behavior of the propagation channel. Tests under real conditions are difficult: outdoor tests, for instance, are affected by weather and season, which change constantly. In addition, a test conducted in one environment (city A) does not fully apply to a second, similar environment (city B). Moreover, it is usually not possible to test the worst-case situation under real conditions.
Hardware simulators make it possible to reproduce a desired type of radio channel at low cost. Moreover, they provide reproducible test conditions, so that the performance of different equipment can be compared. The channel models used by the simulator can be obtained from standard channel models, such as 3GPP TR 36.803 [8] and TGn 802.11n [9], or from real measurements conducted with the MIMO channel sounder designed and built at our laboratory.
The channel sounder is presented in [10, 11] and shown in Figure 1. In the MIMO context, few experimental results have been obtained regarding time variations, partly because of limitations of the measuring equipment [12]. In our work, time-varying channels are considered using Rayleigh fading [13, 14].
Wireless channels are commonly simulated using finite impulse response (FIR) filters, as in [15, 16]. The FIR filter performs a convolution between a channel impulse response (CIR) and the input signal: copies of the signal delayed by different amounts are weighted by the channel coefficients (tap coefficients), and the weighted components are summed. The channel coefficients are periodically updated to reflect the behavior of an actual channel. Various techniques have been widely used to implement such filters efficiently, such as distributed arithmetic (DA) and canonical signed digits (CSDs) [17].
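As a minimal sketch of the tapped-delay-line principle described above (Python with NumPy; the function name `fir_channel` and the 3-tap channel are hypothetical, chosen only for illustration), each output sample is the sum of delayed input samples weighted by the tap coefficients:

```python
import numpy as np

def fir_channel(x, h):
    """Tapped delay line: y[n] = sum_k h[k] * x[n - k]."""
    x = np.asarray(x, dtype=complex)
    h = np.asarray(h, dtype=complex)
    delay_line = np.zeros(len(h), dtype=complex)  # holds x[n], x[n-1], ..., x[n-K+1]
    y = np.empty(len(x), dtype=complex)
    for n, sample in enumerate(x):
        delay_line = np.roll(delay_line, 1)  # every stored sample moves one Ts further
        delay_line[0] = sample               # newest sample enters the line
        y[n] = np.dot(h, delay_line)         # weight each delayed copy and sum
    return y

# Hypothetical 3-tap channel with complex tap coefficients
h = np.array([1.0, 0.0, 0.5 - 0.2j])
x = np.random.default_rng(0).standard_normal(16)
y = fir_channel(x, h)
```

The explicit delay line mirrors the hardware structure (one multiplier per tap); in software, `np.convolve(x, h)[:len(x)]` gives the same result.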
However, using a FIR filter for a Single-Input Single-Output (SISO) channel simulator has some limitations. The number of multiplications by the channel coefficients and additions of the delayed signals per output sample grows linearly with the FIR filter length, so filtering a block of N samples with an N-tap filter requires on the order of N² operations. Covering a long delay spread with a large number of delay elements is therefore impractical, because the computation can no longer be performed fast enough.
With a Virtex-IV FPGA, tests show that it is not possible to implement a FIR filter with more than 192 multipliers (i.e., an impulse response with more than 192 taps).
To simulate an impulse response with more than 192 taps, Fast Fourier Transform (FFT) modules can be used instead. With a Virtex-IV FPGA, the FFT size N can be chosen up to 65536. Frequency domain architectures have therefore been proposed, as in [18, 19]. In particular, [19] presents a method that determines the parameters of a channel simulator by fitting the space-time-frequency cross-correlation matrix of the simulation model to the matrix estimated from a real-world channel; the resulting error can be significant.
Also, a proposed VLSI implementation shows that for high order MIMO arrays, frequency domain architectures are highly modular and scalable by design.
At IETR, several architectures of the digital block of a hardware simulator have been studied, in both the time and frequency domains [15, 18]. However, the previously considered frequency domain architectures operate correctly only for signals whose number of samples does not exceed N. This paper therefore presents a new frequency domain architecture that removes this limitation.
This new scheme is tested with TGn channel model E and the 3GPP TR 36.803 Extended Vehicular A (EVA) channel model. The test yields a Signal-to-Noise Ratio (SNR) of 56 dB, which is higher than the SNR of previous architectures [20].
The rest of this paper is organized as follows. Section 2 presents the channel models used for our tests. Section 3 describes in detail the new frequency domain architecture of the digital block of the hardware simulator. Section 4 presents the actual realization of the digital block: the prototyping platform is described and simulated, and the accuracy of the new architecture is analyzed. Finally, Section 5 gives concluding remarks and prospects for this work.
Channel Model
The simulator must reproduce the behavior of a MIMO propagation channel. The design of the RF blocks for the Universal Mobile Telecommunications System (UMTS) was completed in a previous work [18]. The simulator accepts input signals over a wide power range, from -50 to 33 dBm, which requires power control at the simulator inputs.
The objectives of our study mainly concern the channel model and the digital block of the MIMO simulator, as shown in Figure 2. The output signal y of the FIR filter can be expressed as a convolution, i.e., a sum of the products of the delayed input signal x and the weighting coefficients h:

$$
y(t) = x(t) \ast h(t) = \sum_{k=0}^{Tap_{Max}-1} h(k)\, x(t - kT_s) \qquad (1)
$$

where $\ast$ is the convolution operation, k is the tap index of h, $Tap_{Max}$ is the number of taps and $T_s$ is the sampling period.
In the present solution, the Fast Fourier Transform (FFT) and the Inverse Fast Fourier Transform (IFFT) are used instead: the convolution is computed as a product in the frequency domain.
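The following sketch (Python with NumPy; `fft_filter` and the random signals are illustrative assumptions) shows why FFT-based filtering can replace the FIR convolution: multiplying the spectra of x and h on an `n_fft`-point grid yields the linear convolution only if `n_fft` is at least `len(x) + len(h) - 1`. With a shorter grid the result is a circular convolution, whose tail wraps onto its first samples.

```python
import numpy as np

def fft_filter(x, h, n_fft):
    """Frequency-domain filtering: y = IFFT(FFT(x) * FFT(h)) on an n_fft-point grid."""
    X = np.fft.fft(x, n_fft)  # x zero-padded to n_fft points
    H = np.fft.fft(h, n_fft)  # channel frequency response sampled on the same grid
    return np.fft.ifft(X * H)

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
h = rng.standard_normal(29)

y_lin = np.convolve(x, h)            # 100 + 29 - 1 = 128 output samples
y_128 = fft_filter(x, h, 128).real   # long enough: equals the linear convolution
y_100 = fft_filter(x, h, 100).real   # too short: circular convolution, tail wraps onto head
```

This wrap-around is exactly what limits a frequency domain architecture whose FFT size is not matched to the length of the signal plus the impulse response.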
A MIMO channel is composed of several time-variant, correlated SISO channels. Table 1 summarizes some useful parameters of the LTE and WLAN 802.11ac standards for the considered propagation environments [10, 11]. We also introduce the method used to obtain the time-varying channel for the TGn and 3GPP models for a 2×2 MIMO propagation channel.
3GPP TR 36.803 channel model EVA
3GPP TR 36.803 channel models are used for mobile wireless applications. A set of three channel models is defined to simulate multipath fading propagation conditions; a detailed description is given in [8]. The definition of the EVA channel model is shown in Table 2.
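To relate the EVA profile to the FFT sizes used later, the sketch below maps the tap delays to sample indices at $f_s$ = 50 MHz ($T_s$ = 20 ns). The delay and power values are those of the EVA profile as published by 3GPP (e.g. TS 36.101, Annex B) and should be checked against Table 2. Rounding each delay down to a whole number of samples is our assumption, chosen because it reproduces the last excess delay of 125Ts used in this paper.

```python
import math

# EVA profile: excess delay (ns) and relative power (dB); verify against Table 2
EVA_DELAYS_NS = [0, 30, 150, 310, 370, 710, 1090, 1730, 2510]
EVA_POWERS_DB = [0.0, -1.5, -1.4, -3.6, -0.6, -9.1, -7.0, -12.0, -16.9]

TS_NS = 20  # fs = 50 MHz -> Ts = 20 ns (LTE case)

# Assumed quantization: floor(delay / Ts), using integer arithmetic to avoid rounding noise
idx = [d // TS_NS for d in EVA_DELAYS_NS]
last = idx[-1]                               # last excess delay in units of Ts
n_fft = 2 ** math.ceil(math.log2(last + 1))  # smallest power of two holding all taps

# Sparse discrete impulse response with linear tap amplitudes
h = [0.0] * (last + 1)
for i, p_db in zip(idx, EVA_POWERS_DB):
    h[i] += 10 ** (p_db / 20)
```

The smallest power of two that holds all 126 taps is 128, which matches the FFT size N = 128 used for the EVA model.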
TGn channel model E
TGn channel models [9] comprise a set of 6 profiles, labeled A to F. Each model has a number of clusters. For example, model E, which is used for indoor environments, has four clusters. Each cluster corresponds to specific tap delays, and the clusters overlap in certain cases.
In our work, tests are made with TGn channel models using the 802.11ac standard with a bandwidth of 80 MHz. The sampling frequency is $f_s$ = 180 MHz and the sampling period is $T_s = 1/f_s$. Table 3 summarizes the relative powers of the taps of the impulse responses for TGn channel model E, taking the LOS path as reference [9]. The relative powers of the taps of all impulse responses for all TGn channel models are given in [9].
Channel sounder
Channel models can also be obtained from measurements by using the time domain MIMO channel sounder designed and realized at the IETR [10] and shown in Figure 1 . The measurement campaign was carried out using this MIMO sounder for indoor, outdoor and outdoor to indoor environments as in [11] . The obtained MIMO impulse responses will be used by the hardware simulator.
Our channel sounder uses a periodic PN sequence. It offers 11.9 ns temporal resolution for a 100 MHz sounding bandwidth. The carrier frequencies used are 2.2 GHz and 3.5 GHz. The transmitter and receiver are synchronized with highly stable 10 MHz rubidium oscillators.
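The sketch below illustrates, under stated assumptions, why a periodic PN sequence is suitable for sounding: a maximal-length sequence has a two-valued periodic autocorrelation (N at zero lag, -1 elsewhere), so correlating one received period with the sequence recovers the channel taps. The 31-chip sequence generated from the polynomial $x^5 + x^2 + 1$ and the sparse test channel are hypothetical; the actual sequence length and generator of the IETR sounder are not given here.

```python
import numpy as np

def m_sequence():
    """31-chip maximal-length sequence from a[n+5] = a[n+2] XOR a[n] (x^5 + x^2 + 1)."""
    bits = [1, 0, 0, 0, 0]  # any non-zero initial state
    while len(bits) < 31:
        n = len(bits) - 5
        bits.append(bits[n + 2] ^ bits[n])
    return 1.0 - 2.0 * np.array(bits)  # map {0, 1} -> {+1, -1}

s = m_sequence()
N = len(s)

# Periodic autocorrelation: N at zero lag, -1 at every other lag
acf = np.array([np.dot(s, np.roll(s, k)) for k in range(N)])

# Hypothetical sparse channel; one received period is a circular convolution with s
h = np.zeros(N)
h[[0, 3, 7]] = [1.0, 0.5, 0.25]
y = sum(h[m] * np.roll(s, m) for m in range(N))

# Correlate with the sequence: corr[k] = (N + 1) * h[k] - sum(h), and sum(corr) = sum(h)
corr = np.array([np.dot(y, np.roll(s, k)) for k in range(N)])
h_hat = (corr + corr.sum()) / (N + 1)  # exact inversion of the two-valued autocorrelation
```

In a real sounder the correlation is usually taken without the exact correction term, since the -1 sidelobes are small compared with N for long sequences.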
Time-varying channel for TGn and 3GPP models
The IEEE 802.11n channel model was intended to simulate an indoor home or office environment in which the wireless devices are fixed but the channel is dynamic because people move in the environment [9]. This explicitly differs from outdoor mobile systems, where the user terminal moves [8]. To obtain a time-varying channel, we consider a 2×2 MIMO Rayleigh fading channel, using the same method as in [9]. The MIMO channel matrix H for each tap, at one instant, can be separated into a fixed (constant, Line-of-Sight or LOS) matrix $H_F$ and a Rayleigh (variable, Non-Line-of-Sight or NLOS) matrix $H_V$ [21]:

$$
H = \sqrt{P}\left(\sqrt{\frac{K}{K+1}}\, H_F + \sqrt{\frac{1}{K+1}}\, H_V\right)
$$
where K is the Ricean K-factor and P is the power of each tap. For 3GPP channel model EVA, P is given in Figure 4 for each of the 9 taps; for TGn channel model E, P is given in Figure 5 for each of the 18 taps. K is set to zero to obtain a Rayleigh fading channel, so H can be written as:

$$
H = \sqrt{P}\, H_V
$$

For 2 transmit and 2 receive antennas:

$$
H_V = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}
$$
where $X_{ij}$ (i-th receiving and j-th transmitting antenna) are correlated zero-mean, unit-variance complex Gaussian random variables, forming the coefficients of the variable NLOS (Rayleigh) matrix $H_V$. To correlate the $X_{ij}$ elements of the matrix X, a product-based model is used. This model assumes that the correlation coefficients are derived independently at each end of the link. It can be expressed as:

$$
X = \left[R_{rx}\right]^{1/2} H_{iid} \left[R_{tx}\right]^{1/2}
$$
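where $R_{tx}$ and $R_{rx}$ are the transmit and receive correlation matrices, respectively, and $H_{iid}$ is a matrix of independent zero-mean, unit-variance complex Gaussian random variables. Its entries undergo Rayleigh fading whose time variation depends on the speed of movement in the environment [14]. $R_{tx}$ and $R_{rx}$ can be written as:

$$
R_{tx} = \begin{bmatrix} 1 & C_{tx12} \\ C_{tx21} & 1 \end{bmatrix}, \qquad
R_{rx} = \begin{bmatrix} 1 & C_{rx12} \\ C_{rx21} & 1 \end{bmatrix}
$$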
where $C_{txij}$ are the complex correlation coefficients between the i-th and j-th transmitting antennas, derived from the angles of departure, and $C_{rxij}$ are the complex correlation coefficients between the i-th and j-th receiving antennas, derived from the angles of arrival. For a uniform linear array, the complex correlation coefficient is expressed as:

$$
\rho = R_{XX}(D) + j\, R_{XY}(D)
$$

where $D = 2\pi d/\lambda$, $d = 0.5\lambda$ is the distance between the two correlated antennas, λ is the wavelength, and $R_{XX}$ and $R_{XY}$ are the cross-correlation functions between the real parts (equal to the cross-correlation function between the imaginary parts) and between the real and imaginary parts, respectively, of the two correlated antenna signals:

$$
R_{XX}(D) = \int_{-\pi}^{\pi} \cos(D \sin\phi)\, PAS(\phi)\, d\phi, \qquad
R_{XY}(D) = \int_{-\pi}^{\pi} \sin(D \sin\phi)\, PAS(\phi)\, d\phi
$$
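A minimal sketch of the product-based (Kronecker) correlation model, in Python with NumPy, assuming the correlation coefficients are already known (the values below are hypothetical). Each realization of $H_{iid}$ is drawn independently; the Doppler shaping that makes it vary in time [14] is deliberately left out. The matrix square roots are Hermitian square roots computed by eigendecomposition.

```python
import numpy as np

def sqrtm_psd(R):
    """Hermitian square root of a positive semi-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def correlated_channel(R_rx, R_tx, n_samples, rng):
    """Draw n_samples realizations of X = R_rx^(1/2) H_iid R_tx^(1/2) for a 2x2 link."""
    A, B = sqrtm_psd(R_rx), sqrtm_psd(R_tx)
    H_iid = (rng.standard_normal((n_samples, 2, 2))
             + 1j * rng.standard_normal((n_samples, 2, 2))) / np.sqrt(2)  # unit variance
    return A @ H_iid @ B  # broadcasts over the n_samples axis

# Hypothetical correlation coefficients, for illustration only
c_tx, c_rx = 0.6 + 0.2j, 0.3 - 0.4j
R_tx = np.array([[1, c_tx], [np.conj(c_tx), 1]])
R_rx = np.array([[1, c_rx], [np.conj(c_rx), 1]])

rng = np.random.default_rng(2)
X = correlated_channel(R_rx, R_tx, 200_000, rng)
```

For a 2×2 link, the averages of $X X^H$ and $X^H X$ equal $\mathrm{tr}(R_{tx}) R_{rx} = 2R_{rx}$ and $2R_{tx}$, so dividing the empirical averages by 2 recovers the two correlation matrices.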
The calculation of the complex correlation coefficients for each tap delay is based on the PAS (Power Angular Spectrum), the AS (Angular Spread) being the second moment of the PAS. The AS values can be found in [8] for the 3GPP channel models and in [9] for the TGn channel models. The PAS closely matches a Laplacian distribution [22] [23] [24]:

$$
PAS(\phi) = \frac{1}{\sqrt{2}\,\sigma}\, e^{-\sqrt{2}\,|\phi - \phi_0|/\sigma}
$$

where σ is the standard deviation of the PAS (which corresponds to the numerical value of AS) and $\phi_0$ is the mean angle of the cluster.
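The correlation coefficient $\rho = R_{XX}(D) + jR_{XY}(D)$ can be evaluated numerically, taking $R_{XX}(D) = \int \cos(D\sin\phi)\,PAS(\phi)\,d\phi$ and $R_{XY}(D) = \int \sin(D\sin\phi)\,PAS(\phi)\,d\phi$ over $[-\pi, \pi)$ with a Laplacian PAS. This is a sketch for a single cluster. It uses a simple rectangle-rule integration and renormalizes the truncated PAS to unit power. Wrap-around of the Laplacian at ±π is ignored, and the function names are hypothetical.

```python
import numpy as np

def laplacian_pas(phi, phi0, sigma):
    """Laplacian PAS with mean angle phi0 and standard deviation sigma (radians)."""
    return np.exp(-np.sqrt(2.0) * np.abs(phi - phi0) / sigma) / (np.sqrt(2.0) * sigma)

def correlation_coefficient(phi0, sigma, d_over_lambda=0.5, n=2**18):
    """rho = R_XX(D) + j R_XY(D) between two array elements spaced d apart."""
    D = 2.0 * np.pi * d_over_lambda
    phi = np.linspace(-np.pi, np.pi, n, endpoint=False)
    dphi = 2.0 * np.pi / n
    pas = laplacian_pas(phi, phi0, sigma)
    pas = pas / (pas.sum() * dphi)  # truncate to [-pi, pi) and renormalize to unit power
    r_xx = np.sum(np.cos(D * np.sin(phi)) * pas) * dphi
    r_xy = np.sum(np.sin(D * np.sin(phi)) * pas) * dphi
    return r_xx + 1j * r_xy

rho_narrow = correlation_coefficient(np.deg2rad(30.0), np.deg2rad(1.0))
rho_broadside = correlation_coefficient(0.0, np.deg2rad(30.0))
```

For a very narrow PAS, ρ tends to the pure phase term $e^{jD\sin\phi_0}$ (here $e^{j\pi/2} = j$). For a cluster centred on broadside ($\phi_0 = 0$), the imaginary part $R_{XY}$ vanishes by symmetry.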
New Design of the Digital Block of the Hardware Simulator
This part presents an improved frequency domain architecture for a SISO channel, which can be used in streaming mode, in contrast to the simple frequency domain architecture presented in [18]. First, the error of the simple frequency domain architecture is presented. Then, the new frequency domain architecture is described in detail. Figure 9 shows the simple frequency domain and time domain architectures of the digital block of a SISO channel, which were presented in [18].
Previous Frequency Domain Architecture
The simple frequency domain architecture is tested with 3GPP channel model EVA, using a continuous Gaussian input signal x(t). This signal is long enough to exercise the FFT/IFFT blocks in streaming mode. A Gaussian signal is preferred because it is concentrated in both the time and frequency domains, so its Fourier transform can be computed by an FFT block of limited size:
where N = 128 (the power of two closest to the last excess delay given in Table 2), H is the frequency-domain representation of h (given in …), and each $w_l$ is quantized on 12 bits. The 128-point FFT splits the corresponding quantized input vector x into three parts ($x_1$, $x_2$ and $x_3$) of 128 samples each. Applying these parts to the input of a linear system with frequency response H gives three output vectors $y_1$, $y_2$ and $y_3$. To validate the streaming mode, the concatenation of these three vectors is compared with the theoretical signal y(t), as shown in Figure 11.
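The theoretical result is obtained from (1). Because each 128-sample part is filtered independently by a 128-point FFT/IFFT, the response of each part, which extends up to 125 samples beyond the end of the part, wraps around inside the same 128-point block (circular convolution) instead of being added to the beginning of the next output vector. The concatenated output therefore departs from y(t), and the simple architecture cannot be used in streaming mode.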
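The streaming failure of the simple architecture can be reproduced with a short sketch (Python with NumPy). The sparse taps reuse EVA-like sample positions at $f_s$ = 50 MHz, with hypothetical random gains. Three 128-sample parts are filtered independently by a 128-point FFT/IFFT, concatenated, and compared with the linear convolution.

```python
import numpy as np

N = 128
rng = np.random.default_rng(3)
h = np.zeros(N)
h[[0, 1, 7, 15, 18, 35, 54, 86, 125]] = rng.standard_normal(9)  # hypothetical gains, last tap at 125 Ts
H = np.fft.fft(h, N)  # frequency response used by the simple architecture

x = rng.standard_normal(3 * N)  # streamed input: parts x1, x2, x3
blocks = x.reshape(3, N)
y_blocks = np.fft.ifft(np.fft.fft(blocks, axis=1) * H, axis=1).real  # each part filtered alone
y_concat = y_blocks.reshape(-1)  # concatenation y1 | y2 | y3

y_ref = np.convolve(x, h)[: 3 * N]  # linear convolution = expected streamed output
```

The error is confined to the first 125 samples of each 128-sample block, where the folded tail of the current part lands and where the tail of the previous part is missing.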
An improved frequency domain architecture is therefore proposed. It is described in detail below and implemented on a Virtex-IV FPGA platform.
New Frequency Domain Architecture
This part presents an improved frequency domain architecture [25] which can be used in streaming mode, in contrast to the simple frequency architecture presented in Figure 9 .
The new frequency domain architecture, presented in Figure 12, uses two 256-point FFT/IFFT blocks. Successive blocks of 128 input samples are fed alternately to the two FFT modules under the control of a switch signal S. To avoid increasing the block size, one could keep 128-point FFT/IFFT blocks and split the input test vector x into six parts of N/2 = 64 samples each. However, in our case the last excess delay of the impulse response is 125Ts (Table 2), so the response to a 64-sample part spans 64 + 125 = 189 samples and does not fit in a 128-point block. Keeping 128-point blocks is therefore not possible. Instead, the input is split into 128-sample parts $x_i$, and the 256-point FFT/IFFT blocks hold each part together with its response tail (128 + 125 = 253 ≤ 256 samples).
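The streaming principle can be sketched as an overlap-add scheme. This is our reading of the architecture, not the FPGA implementation. Each 128-sample part $x_i$ is zero-padded to 256 points and filtered in the frequency domain. Its 256-sample partial response $y_i$ is then added to the output at an offset of 128 samples, so consecutive partial responses overlap by 128 samples. The alternation between the two paths (standing in for the switch signal S) changes nothing numerically in this model; in hardware it presumably gives each 256-point transform two block periods to complete. Function and variable names are hypothetical.

```python
import numpy as np

def overlap_add_two_paths(x, h, block=128, n_fft=256):
    """Overlap-add filtering of a streamed input; successive blocks alternate between two paths."""
    if len(h) + block - 1 > n_fft:
        raise ValueError("block response longer than n_fft: it would wrap around")
    H = np.fft.fft(h, n_fft)  # channel frequency response, shared by both paths
    n_blocks = -(-len(x) // block)  # ceiling division
    y = np.zeros(n_blocks * block + n_fft, dtype=complex)
    paths = []
    for i in range(n_blocks):
        paths.append(i % 2)  # switch signal S: path 0 = FFT1/IFFT1, path 1 = FFT2/IFFT2
        xi = x[i * block:(i + 1) * block]  # 128-sample part x_i
        yi = np.fft.ifft(np.fft.fft(xi, n_fft) * H)  # zero-padded to n_fft: no wrap-around
        y[i * block:i * block + n_fft] += yi  # overlapping partial responses are summed
    return y[:len(x) + len(h) - 1], paths

rng = np.random.default_rng(4)
h = np.zeros(126)  # last excess delay of 125 Ts, as for EVA at fs = 50 MHz
h[[0, 1, 7, 15, 18, 35, 54, 86, 125]] = rng.standard_normal(9)  # hypothetical tap gains
x = rng.standard_normal(1000)
y, paths = overlap_add_two_paths(x, h)
```

Unlike the simple architecture, the streamed output now equals the linear convolution for an input of any length.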
Each FFT module operates on 12-bit input samples and uses 12-bit phase factors. The switch signal S alternates between the two FFT modules, and the start input of each FFT module is triggered on the rising edge of S. The delay block compensates for the processing delay of the FFT modules and of the multipliers. Figure 13 presents the operating principle of the architecture and the result on 4W t of each partial response $y_i$. As in [15, 18], a truncation block at the output of the final digital adder reduces the word length to 14 bits so that the samples can be accepted by the Digital-to-Analog Converter (DAC).
The most direct solution is to keep the 14 most significant bits; this is called "brutal" truncation.
However, for low values at the output of the digital adder, brutal truncation feeds zero values to the DAC. A better solution is therefore the sliding window truncation presented in Figure 14, which keeps the 14 most significant bits that are actually in use.

For TGn channel model E, $N_{eff}$ = 131 samples. However, to test the new architecture, each partial input signal must be extended with a "tail" of N zeros, so the FFT modules used have 512 points. The new frequency domain architecture with TGn channel model E is presented in Figure 15.
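A minimal sketch of the two truncation strategies on integer samples follows. It assumes a 28-bit two's-complement adder output (the actual adder width is not stated here) and reads "the 14 most significant bits actually in use" as the smallest right shift that makes the sample fit in 14 signed bits. The returned shift is what the variable-gain amplifier would have to compensate after the DAC.

```python
def brutal_truncate(v, in_bits=28, out_bits=14):
    """Keep the out_bits most significant bits of an in_bits two's-complement word."""
    return v >> (in_bits - out_bits)  # arithmetic shift (floor) for negative values

def sliding_window_truncate(v, out_bits=14):
    """Shift v right by the smallest amount that makes it fit in out_bits signed bits.

    Returns (truncated sample, shift); the shift must be compensated after the DAC.
    """
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    shift = 0
    while not lo <= (v >> shift) <= hi:
        shift += 1
    return v >> shift, shift
```

For a sample value of 200 000, brutal truncation keeps 12 (i.e. 12 × 2^14 = 196 608, about 1.7 % low), whereas the sliding window keeps 6250 with a shift of 5 (6250 × 2^5 = 200 000 exactly). A value of 3000 becomes 0 with brutal truncation but is kept intact by the sliding window.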
Implementation
In order to implement the hardware simulator, the adopted solution uses a prototyping platform from Xilinx (XtremeDSP Development Virtex-IV) [7] presented in Figure 16 .
Simulation and synthesis are performed with Xilinx ISE [7] and ModelSim [26].
Description
The XtremeDSP development board features dual-channel high-performance ADCs (AD6645) and DACs (AD9772A) with 14-bit resolution, a user-programmable Virtex-IV FPGA, programmable clocks, support for an external clock, a PCI host interface, two banks of ZBT-SRAM, and JTAG interfaces.
This development kit is built around a module containing the Virtex-IV SX35 device, chosen to meet the complexity constraints. The device contains many arithmetic blocks (DSP blocks), which make it possible to implement many functions occupying most of the device.
This device allows us to implement different time domain or frequency domain architectures, and thus to reprogram the FPGA according to the selected (indoor or outdoor) environment and channel model. As the development board has 2 ADCs and 2 DACs, it can be connected to only 2 down-conversion RF units and 2 up-conversion RF units. Therefore, four SISO frequency domain blocks are required to simulate a one-way 2×2 MIMO radio channel. However, the Virtex-IV is limited to 15360 slices.
Thus, in our work, a SISO channel is simulated. A higher-order MIMO channel can be tested with shorter channel models, which reduce the size of the FFT/IFFT modules and use fewer hardware resources. Otherwise, a more powerful FPGA such as the Virtex-VII [7] is required.
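A rough check of why only one SISO channel is implemented follows. It assumes that the slice count scales linearly with the number of SISO paths and uses the 30 % slice occupation reported for the 512-point architecture (Table 4):

$$
0.30 \times 15360 \approx 4608 \ \text{slices per SISO path}, \qquad 4 \times 4608 = 18432 > 15360
$$

Under this linear-scaling assumption, the four SISO paths of a one-way 2×2 channel would not fit in the Virtex-IV SX35.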
Implementation Process
The channel frequency response profiles are stored on the hard disk of the computer, read over the PCI bus, and then written to the dual-port RAM of the FPGA. Figure 17 shows the connection between the computer and the FPGA board used to reload the coefficients.
For the 802.11ac standard, the maximum Doppler frequency is $f_d$ = 6 Hz. The refresh frequency is set to $f_{ref}$ = 18.18 Hz, i.e., a refresh period $T_{ref}$ = 55 ms, during which the four profiles must be changed. The impulse responses are represented on 32 bits (16 bits for the real part and 16 bits for the imaginary part). One bit is added to represent the addresses of the successive time-varying impulse responses. For one MIMO profile, (32 + 1) × 4 = 132 words of 32 bits, i.e., 528 bytes, are transmitted, so the data rate is 528 B / 55 ms = 9.6 KB/s.
For the LTE standard, $T_{ref}$ = 3.3 ms. Thus, (512 + 1) × 4 = 2052 words of 32 bits, i.e., 8208 bytes, must be transmitted per profile, which gives 8208 B / 3.3 ms = 2.464 MB/s.
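The profile data rates can be checked with a few lines of Python. The sketch reads the factor (n + 1) × 4 in the text as n 32-bit complex values plus one address word per SISO profile, for the four SISO profiles of a 2×2 MIMO channel. The function name is hypothetical.

```python
def profile_data_rate(n_values, n_paths, t_ref_s, word_bytes=4):
    """Bytes/s needed to reload n_paths SISO profiles every t_ref_s seconds.

    Each profile: n_values 32-bit complex words plus 1 address word, as in (n + 1) x 4.
    """
    n_words = (n_values + 1) * n_paths
    return n_words * word_bytes / t_ref_s

rate_11ac = profile_data_rate(32, 4, 55e-3)   # 132 words = 528 bytes every 55 ms
rate_lte = profile_data_rate(512, 4, 3.3e-3)  # 2052 words = 8208 bytes every 3.3 ms
```

With $T_{ref}$ = 55 ms, the 802.11ac case gives exactly 9600 B/s. With $T_{ref}$ = 3.3 ms, the LTE case gives about 2.49 MB/s; the 2.464 MB/s figure corresponds to $T_{ref}$ ≈ 3.33 ms. Either value is well below the 30 MB/s available on the PCI bus.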
The 33-bit profiles are stored in a text file on the hard disk of the computer. This file is read and sent to the memory block that supplies the simulator. The file can be read through either the USB interface or the PCI interface, both of which are available on the prototyping board.
The PCI bus was chosen to load the frequency response profiles because its speed can reach 30 MB/s. In addition, the PCI bus is 32 bits wide, so one complex sample of the frequency response is transmitted every two clock pulses. The Nallatech driver provides an IP core called "Host Interface" that reads the data from the PCI bus and stores them in its FIFO.
The "Loading profiles" module reads the sample values and distributes them into two dual-port RAM blocks, called "RAM A" and "RAM B". This module, called "BOX RAM", is the memory block of the frequency domain digital architecture.
A "ping-pong" operation between the RAM A and RAM B blocks is required to feed the two multiplexers of the first path (FFT1/IFFT1 modules) and the second path (FFT2/IFFT2 modules). The two RAM blocks allow one profile to be read while another is being loaded.
A periodic signal controls both the demultiplexer and the multiplexer. When the multiplexer selects one RAM block to read the 32 complex values of a profile frequency response, the demultiplexer selects the other RAM block to write the 32 values of the next profile.
Therefore, while a profile is used, the following profile is loaded and will be used after the update time T ref .
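The ping-pong mechanism can be modelled in a few lines of Python. This is a behavioural sketch only: the class and method names are hypothetical and do not correspond to the actual hardware modules. The periodic control signal is represented by `swap()`, called once per $T_{ref}$.

```python
class PingPongRam:
    """Behavioural model of the RAM A / RAM B ping-pong buffer."""

    def __init__(self, first_profile):
        self.banks = [list(first_profile), None]  # bank 0 = "RAM A", bank 1 = "RAM B"
        self.read_bank = 0                        # bank currently read by the FFT paths

    def write_next(self, profile):
        """Demultiplexer: the incoming profile goes to the bank that is not being read."""
        self.banks[1 - self.read_bank] = list(profile)

    def read(self):
        """Multiplexer: the datapath reads the active bank."""
        return self.banks[self.read_bank]

    def swap(self):
        """Periodic control signal, once per T_ref: exchange the roles of the two banks."""
        nxt = 1 - self.read_bank
        if self.banks[nxt] is None:
            raise RuntimeError("next profile was not loaded within T_ref")
        self.banks[self.read_bank] = None  # released bank may now be overwritten
        self.read_bank = nxt

ram = PingPongRam([1 + 0j, 2 + 0j])  # hypothetical 2-value profiles
ram.write_next([3 + 0j, 4 + 0j])     # loaded while the first profile is in use
ram.swap()
```

Marking the released bank as empty makes the sketch fail loudly if a profile is not loaded within one refresh period, which is the situation the PCI data-rate calculation is meant to rule out.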
The Virtex-IV SX35 utilization summary is given in Table 4 for the architecture with 512-point FFT/IFFT modules and in Table 5 for the architecture with 256-point FFT/IFFT modules.

Figure 18. The theoretical and Xilinx output signals, the relative error and the SNR for the frequency domain architecture using 3GPP channel model EVA.

To determine the accuracy of the digital block, the theoretical and Xilinx output signals are compared. With a Gaussian input signal, the theoretical output signal can be computed analytically. Therefore, the Gaussian input signal x(t) of (13), shown in Figure 10, is considered.
The impulse response of 3GPP channel model EVA has 9 paths. The theoretical output signal is the sum of the 9 Gaussian signals corresponding to these paths, and it is given by (1) with $Tap_{Max}$ = 9.
The relative error for each output sample is:

$$
\varepsilon(n) = \frac{|y_{th}(n) - y_{Xilinx}(n)|}{|y_{th}(n)|}
$$

Figure 18 presents the Xilinx output signal, the relative error and the SNR for LTE signals ($f_s$ = 50 MHz) using 3GPP channel model EVA.
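The accuracy metrics can be sketched as follows. The relative error is computed per sample as $|y_{th} - y_{hw}| / |y_{th}|$. The SNR definition (theoretical output power over the power of the deviation) is our assumption, since it is not written out here. The "hardware" output is only simulated, by rounding the theoretical output to a 14-bit grid, and the sparse impulse response and Gaussian width are hypothetical.

```python
import numpy as np

def relative_error(y_th, y_hw):
    """Per-sample relative error |y_th - y_hw| / |y_th|."""
    return np.abs(y_th - y_hw) / np.abs(y_th)

def snr_db(y_th, y_hw):
    """SNR in dB, treating the deviation from the theoretical output as noise (assumed definition)."""
    noise = y_th - y_hw
    return 10 * np.log10(np.sum(np.abs(y_th) ** 2) / np.sum(np.abs(noise) ** 2))

n = np.arange(512)  # sample index, Ts = 20 ns (fs = 50 MHz)
taps = {0: 1.0, 7: 0.8, 35: 0.35, 125: 0.14}  # hypothetical sparse impulse response {index: gain}
# Theoretical output: sum of delayed, weighted copies of a Gaussian pulse
y_th = sum(g * np.exp(-((n - 100 - k) / 40.0) ** 2) for k, g in taps.items())
# Stand-in for the Xilinx output: y_th rounded to a 14-bit signed grid
scale = (2 ** 13 - 1) / np.max(np.abs(y_th))
y_hw = np.round(y_th * scale) / scale

err = relative_error(y_th, y_hw)
strong = np.abs(y_th) > 0.1 * np.max(np.abs(y_th))  # samples with a significant output level
```

The relative error stays small where the output is strong and grows where the theoretical output tends to zero, which is consistent with the output-voltage thresholds used when discussing precision.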
For TGn channel model E, each impulse response has 18 paths, and the theoretical output signal is given by (1) with $Tap_{Max}$ = 18. Figure 19 presents the Xilinx output signal, the relative error and the SNR for 802.11ac signals ($f_s$ = 180 MHz) using TGn channel model E.

Table 6 gives the global values of the relative error and SNR of the considered architectures for 3GPP channel model EVA and TGn channel model E, without truncation, with sliding window truncation and with brutal truncation. The goal is to discuss the output signal of the new frequency domain architecture and the advantage of sliding window truncation. Three aspects are considered: precision, FPGA occupation and latency.
Precision
Comparing the results in Figures 18 and 19, we observe that with brutal truncation, the relative error is below 1 % when the output voltage exceeds 1.75 V, whereas with sliding window truncation it is already below 1 % when the output voltage exceeds 0.2 V.
We conclude that sliding window truncation is preferable, because it reduces the error and allows output signals as low as 0.2 V to be used. The global relative error given in Table 6 does not exceed 0.1 % with sliding window truncation, which is sufficient for the test. The SNR reaches 60 dB, 11 dB higher than with brutal truncation, and approaches both the SNR without truncation and the SNR obtained in [18] with a time domain architecture. Sliding window truncation therefore offers better precision.
However, the additional complexity introduced by sliding window truncation must also be taken into account. An analog amplifier with variable gain is needed to correct the DAC output, and the number of positions by which the window was shifted must be transmitted for each sample. Therefore, if brutal truncation already gives a sufficiently small error and a sufficiently high SNR, it is preferable, as it reduces both complexity and FPGA occupation.
In addition, for impulse responses with large attenuations, the error increases significantly. Normalizing the impulse responses and the input signal is one way to maintain high precision in this case.
FPGA Occupation
According to Tables 4 and 5, the new frequency domain architecture occupies 30 % of the slices of the Virtex-IV FPGA with 512-point FFT/IFFT modules and 26 % with 256-point FFT/IFFT modules.
The new frequency domain architecture has a higher slice occupation than the time domain architecture presented in [18]. Implementing high-order MIMO channels therefore requires a more powerful FPGA such as the Virtex-VII.
However, the new frequency domain architecture can simulate impulse responses with more than 192 taps: with a Virtex-IV FPGA, the FFT size N can be chosen up to 65536, whereas a FIR filter is limited to 192 multipliers (192 taps for the impulse response).
Latency
The new frequency domain architecture has a latency of 9 μs with 512-point FFT/IFFT modules and 7.2 μs with 256-point FFT/IFFT modules.
Conclusions
In this work, a new frequency domain architecture was proposed and analyzed. This new architecture accepts long input signals, in contrast with the previous simple frequency domain architecture proposed in [18] and presented in Figure 9. The new architecture was tested with a Gaussian input signal and with TGn channel model E and 3GPP TR 36.803 channel model EVA, and its accuracy and latency were determined.
Simulations on a Virtex-VII [7] XC7V2000T platform will allow us to simulate high-order MIMO channels. Measurement campaigns will also be carried out for various types of environments with the MIMO channel sounder built at IETR. A Graphical User Interface will also be designed to allow the user to select the propagation environment, select the channel model and reconfigure the channel parameters. The final objective of these measurements is to obtain realistic and reliable impulse responses of the MIMO channel to supply the digital block of the hardware simulator.
