Abstract-A hardware simulator reproduces a desired radio channel, making it possible to test various mobile radio systems "on the table". This paper presents a new architecture for the digital block of an outdoor-to-indoor MIMO hardware simulator. A measurement campaign was carried out at 3.5 GHz to obtain the impulse responses using a time-domain channel sounder. The measurements are processed with a high-resolution algorithm that extracts the dominant paths. The new architecture is implemented on a Xilinx Virtex-IV FPGA. Its accuracy, FPGA occupation and latency are analyzed.
I. INTRODUCTION
Multiple-Input Multiple-Output (MIMO) techniques improve the capacity and the performance of wireless communication systems. Several recently published studies present systems that reach a MIMO order of 8×8 and higher [1]. This is made possible by advances at all levels of the simulator platforms [2]. With the continuing increase in Field Programmable Gate Array (FPGA) capacity, entire baseband systems can be mapped onto faster FPGAs for more efficient prototyping and testing [3]. Some MIMO hardware simulators are offered by industrial companies such as Spirent [4], Azimuth (ACE) and Elektrobit (Propsim F8) [5], but they are quite expensive and do not cover all types of environment.
The channel models can be obtained from standard models, such as the TGn 802.11n [6] and LTE [7] models, or from measurements conducted with the MIMO channel sounder designed and built at IETR [8]. In the MIMO context, few experimental results have been obtained regarding time variations, partly because of several limitations of the channel sounding equipment [9]. However, theoretical models of time-varying channels can be obtained using Rayleigh fading [10]. At IETR, several architectures of the digital block of a hardware simulator have been studied [11, 12]. Typically, radio propagation channels are simulated using finite impulse response (FIR) filters, as in [11, 12, 14]. Fast Fourier Transform (FFT) modules can also be used, so that the convolution becomes an algebraic product in the frequency domain, as in [11, 13]. In [15], a method fitting the cross-correlation matrix to the estimated matrix of a real-world channel was presented. That solution shows that the error can be significant.
The frequency architecture considered in [11, 13] operates correctly only for signals not exceeding the FFT size. A new frequency architecture avoiding this limitation was therefore presented and tested in [16]. However, [17] and [18] show that the time domain architecture is better in terms of FPGA occupation, output error and latency. Therefore, in this paper, only the time domain architecture is considered. Recently, the channel sounder was used during a measurement campaign to characterize the penetration of outdoor-to-indoor electromagnetic waves into buildings. The measurements were made at 3.5 GHz for WiMAX networks. During the measurements, the channel was time invariant, with no people moving in the environment. Therefore, in order to simulate a time-varying channel, a Rayleigh fading method was used.
The main contributions of the paper are:
• Tests have been made for indoor [17] and outdoor [18] environments using standard channel models. However, in this paper, tests are made with real outdoor to indoor measurements.
• The time domain architecture presented in [11, 12, 14] has an occupation of 11 to 13 % of slices on the FPGA for one SISO channel. In this paper, we present a time domain architecture with an occupation of 5 % for one SISO channel and up to 80 % for MIMO 4×4.
• Studies are made relating the number of bits used for the samples of the impulse response to the error at the output, in order to identify the best trade-off between the occupation on the FPGA and the accuracy.
The rest of this paper is organized as follows. Section II presents the channel characteristics. Section III describes the new architecture and its hardware implementation. In this Section, the accuracy of the architecture is also analyzed. Section IV presents some improvement solutions. Lastly, Section V gives some concluding remarks and prospects.
II. CHANNEL CHARACTERISTICS
Few MIMO outdoor-to-indoor measurement campaigns are reported in the literature [19], and none at 3.5 GHz. For our measurements, the channel sounding bandwidth is 100 MHz and the sampling frequency $f_s$ is 200 MHz (corresponding to a sampling period $t_s$ of 5 ns). Two Uniform Circular Arrays (UCA) were developed at 3.5 GHz to characterize the 360° azimuthal double-directional channel at both link sides. The transmitter (Tx) and the receiver (Rx) each contain 2 active elements. The transmitter was placed on the rooftop of a building and the receiver was located in multiple positions in different rooms of another building. The Tx-Rx distance is about 100 m. The channel sounder provides the complex envelope $h_{ce}(t)$ of the channel impulse response. The real impulse responses used, in the band $[\Delta, \Delta + B]$, are:
$$h(t) = h_p(t)\cos(2\pi f_c t) - h_q(t)\sin(2\pi f_c t) \quad (1)$$
where $h_p(t)$ and $h_q(t)$ are the real and imaginary parts of $h_{ce}(t)$ and:
$$f_c = \Delta + \frac{B}{2} \quad (2)$$
This work is part of the PALMYRE II Project supported by Région Bretagne.
On a Virtex-IV FPGA, the number of multipliers available to a FIR filter is limited to 192. High-resolution methods have therefore been proposed [20, 21] to obtain significant impulse responses with a limited number of taps, and hence a limited number of multipliers. These methods carry a heavy computational load. Thus, a new method is used that detects the taps at which the slope of the curve changes sign. Table I shows the number of taps and the time window $W_t$ of the MIMO impulse responses. $W_t$ is equal to the delay of the last sample multiplied by $t_s$.
To simulate a time-varying channel, we consider a 2×2 MIMO Rayleigh fading channel. At a center frequency of 3.5 GHz, the Doppler spread is $f_d$ = 13 Hz for a speed of $v$ = 4 km/h. Thus, the refresh frequency $f_{ref}$ between two successive profiles is chosen as $f_{ref}$ = 28 Hz > 2$f_d$. The MIMO channel matrix $H$ can be characterized by two parameters: the power $P_c$ of the constant channel components, which corresponds to the Line-Of-Sight (LOS), and the power $P_s$ of the channel scattering components, which corresponds to the Non-Line-Of-Sight (NLOS). The ratio $P_c/P_s$ is the Ricean K-factor. Assuming that all coefficients of $H$ are Rice distributed, $H$ is expressed by:
$$H = \sqrt{P_c}\,H_F + \sqrt{P_s}\,H_V \quad (3)$$
where $H_F$ and $H_V$ are the constant and the scattered matrices, respectively. The total received power is $P = P_c + P_s$. Thus:
$$H = \sqrt{P}\left(\sqrt{\frac{K}{K+1}}\,H_F + \sqrt{\frac{1}{K+1}}\,H_V\right) \quad (4)$$
where K = 0 to obtain a Rayleigh fading channel. The normalized P is given in Fig. 1 for each tap. For 2 transmit and 2 receive antennas:
To correlate the X ij elements, a product-based model is used. This model assumes that the correlation coefficients are independently derived at each end of the link:
$H_{iid}$ is a matrix of independent, zero-mean, unit-variance, complex Gaussian random variables. $R_t$ and $R_r$ are the transmit and receive correlation matrices:
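The complex correlation coefficients of $R_t$ and $R_r$ are each expressed as:
$$\rho = R_{xx}(D) + j\,R_{xy}(D) \quad (8)$$
where $D = 2\pi d/\lambda$, $d = 0.5\lambda$ is the distance between two antennas, $\lambda$ is the wavelength, and $R_{xx}$ and $R_{xy}$ are the real and imaginary parts of the cross-correlation function of the considered correlated angles:
$$R_{xx}(D) = \int_{-\pi}^{\pi} \cos(D \sin\varphi)\,\mathrm{PAS}(\varphi)\,d\varphi \quad (9)$$
$$R_{xy}(D) = \int_{-\pi}^{\pi} \sin(D \sin\varphi)\,\mathrm{PAS}(\varphi)\,d\varphi \quad (10)$$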
The PAS (Power Angular Spectrum) closely matches the Laplacian distribution:
where σ is the standard deviation of the PAS. 
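The correlation model of this section can be illustrated numerically. The sketch below is only an assumption-based example, not the authors' implementation: it assumes a zero-mean Laplacian PAS truncated to [-π, π] and normalised to unit area, a TGn-style product model $R_r^{1/2} H_{iid} (R_t^{1/2})^T$, and a hypothetical angular spread of 0.35 rad. The function names are illustrative.

```python
import numpy as np

def laplacian_pas(phi, sigma):
    """Laplacian PAS on the grid phi, normalised to unit area (assumed form)."""
    p = np.exp(-np.sqrt(2.0) * np.abs(phi) / sigma)
    return p / (np.sum(p) * (phi[1] - phi[0]))

def correlation_coefficient(d_over_lambda, sigma, n=20001):
    """rho = R_xx(D) + j R_xy(D), D = 2*pi*d/lambda, by numerical integration."""
    phi = np.linspace(-np.pi, np.pi, n)
    dphi = phi[1] - phi[0]
    pas = laplacian_pas(phi, sigma)
    D = 2.0 * np.pi * d_over_lambda
    r_xx = np.sum(np.cos(D * np.sin(phi)) * pas) * dphi
    r_xy = np.sum(np.sin(D * np.sin(phi)) * pas) * dphi
    return r_xx + 1j * r_xy

def hermitian_sqrt(R):
    """Square root of a Hermitian positive semi-definite matrix."""
    w, v = np.linalg.eigh(R)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def correlated_rayleigh(rho_t, rho_r, rng):
    """One 2x2 scattered matrix using the assumed product (Kronecker) model."""
    R_t = np.array([[1.0, rho_t], [np.conj(rho_t), 1.0]])
    R_r = np.array([[1.0, rho_r], [np.conj(rho_r), 1.0]])
    H_iid = (rng.standard_normal((2, 2))
             + 1j * rng.standard_normal((2, 2))) / np.sqrt(2.0)
    return hermitian_sqrt(R_r) @ H_iid @ hermitian_sqrt(R_t).T

# d = 0.5 * lambda as in the text; sigma = 0.35 rad is a hypothetical value.
rho = correlation_coefficient(0.5, 0.35)
H_V = correlated_rayleigh(rho, rho, np.random.default_rng(0))
```

With a PAS that is symmetric around 0, the imaginary part $R_{xy}$ vanishes, so a non-zero mean angle would be needed to obtain a truly complex coefficient.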
III. DIGITAL BLOCK DESIGN OF THE HARDWARE SIMULATOR

A. Implementation of the architecture
The time domain architecture, which uses a number of multipliers equal to the number of taps of the impulse response, is better in terms of occupation on the FPGA. Moreover, it was shown in [17] and [18] that the time domain architecture has two other advantages: a higher SNR and a much lower latency. Thus, in this work, the time domain architecture is used for the tests.
Four SISO channels are implemented. Fig. 2 presents a FIR 180 filter with 49 multipliers (49 taps for $h_{11}$) for one SISO channel. We developed our own FIR filter instead of using the Xilinx MAC filter so that the filter coefficients can be reloaded. The general formula for a FIR 180 with 49 multipliers is:
$$y(i) = \sum_{k=1}^{49} h_q(i_k)\, x_q(i - i_k) \quad (12)$$
The index $q$ denotes quantized samples and $h_q(i_k)$ is the attenuation of the $k$-th path with the delay $i_k t_s$.
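The FIR filter described above uses one multiplier per tap rather than one per delay position. A minimal software sketch of that idea follows. It is an illustration under assumptions, not the FPGA design: the 16-bit coefficient width, the full-scale value and the function names are hypothetical, and only the 14-bit input width comes from the text.

```python
import numpy as np

def quantize(values, n_bits, full_scale=1.0):
    """Round to signed n_bits integers, saturating at the two's complement range."""
    scale = 2 ** (n_bits - 1)
    q = np.round(np.asarray(values, dtype=float) / full_scale * scale)
    return np.clip(q, -scale, scale - 1).astype(np.int64)

def sparse_fir(x_q, delays, h_q):
    """Compute y(i) = sum_k h_q(i_k) * x_q(i - i_k) with one product per tap."""
    y = np.zeros(len(x_q) + max(delays), dtype=np.int64)
    for i_k, h_k in zip(delays, h_q):
        # Each tap adds a delayed, scaled copy of the input.
        y[i_k:i_k + len(x_q)] += h_k * x_q
    return y

# Hypothetical 3-tap profile (delays in samples) and a short random input.
delays = [0, 3, 7]
h_q = quantize([0.5, -0.25, 0.125], n_bits=16)   # assumed coefficient width
x_q = quantize(np.random.default_rng(1).uniform(-0.5, 0.5, 32), n_bits=14)
y = sparse_fir(x_q, delays, h_q)
```

The result is identical to a dense convolution with zeros between the taps, which is why the number of multipliers can follow the number of taps instead of the window length.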
Due to the use of a 14-bit digital-to-analog converter (DAC), the final output must be truncated. The best solution is the sliding window truncation presented in Fig. 3, which uses the 14 most significant bits.
The prototyping board is described in [16]. The simulations are made with ISE [2] and ModelSim [22]. Fig. 5 shows the connection between the computer and the FPGA board used to reload the coefficients. The PCI bus, which has a speed of 30 MB/s, is chosen to load the profiles of the impulse responses. While one MIMO profile is in use, the next MIMO profile is loaded; it is used after the refresh period.
C. Accuracy
In order to determine the accuracy of the digital block, the theoretical output signal is compared with the Xilinx output signal. A Gaussian input signal $x(t)$, long enough to be used in streaming mode, is considered:
where $W_t$ = 900 ns (the largest $W_t$ in Table I).
The relative error is computed for each output sample by:
$$\varepsilon(i) = \frac{|E(i)|}{|Y_{theory}(i)|} \times 100\ [\%] \quad (16)$$
where $E = Y_{Xilinx} - Y_{theory}$ is the error vector. Table III shows the global values of the relative error and the global SNR.
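The error measures used in this subsection can be sketched as follows. The per-sample relative error follows the definition in the text. The global values are an assumption here: they are taken as the ratio of the Euclidean norms of the error vector and of the theoretical output, which is a common convention but is not confirmed by this text. The sample values are hypothetical.

```python
import numpy as np

def relative_error_percent(y_xilinx, y_theory):
    """Per-sample relative error in percent (y_theory must be non-zero)."""
    e = y_xilinx - y_theory
    return 100.0 * np.abs(e) / np.abs(y_theory)

def global_metrics(y_xilinx, y_theory):
    """Global relative error [%] and global SNR [dB], assuming norm ratios."""
    e = y_xilinx - y_theory
    ratio = np.linalg.norm(e) / np.linalg.norm(y_theory)
    return 100.0 * ratio, -20.0 * np.log10(ratio)

# Hypothetical values, for illustration only.
y_theory = np.array([1.0, -2.0, 4.0])
y_xilinx = np.array([1.01, -2.0, 3.96])
per_sample = relative_error_percent(y_xilinx, y_theory)
global_re, global_snr = global_metrics(y_xilinx, y_theory)
```

A per-sample measure is sensitive to samples where the theoretical output is close to zero, which is one reason a global value is also useful.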
D. Global Error Variation with Time-Varying Profiles
The time needed to simulate 500 profiles is 500/$f_{ref}$ ≈ 17.86 s. Fig. 7 shows the time variation of the average global SNR (AV SNR) of $y_1$ and $y_2$ for the 500 successive profiles. For $v$ = 4 km/h, the variation of the SNR is 0.97 dB. After varying $v$ between 0 and 9 km/h, we observe that the rate of SNR variation and the global error are proportional to the speed in the environment.
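The timing figures quoted for the time-varying profiles can be checked with a few lines. This is a simple arithmetic sketch that assumes the usual maximum Doppler shift relation $f_d = v f_c / c$; the function name is illustrative.

```python
C = 299_792_458.0  # speed of light in m/s

def doppler_spread_hz(speed_kmh, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c, with v converted from km/h."""
    return (speed_kmh / 3.6) * carrier_hz / C

f_d = doppler_spread_hz(4.0, 3.5e9)   # close to the 13 Hz quoted in Section II
f_ref = 28.0                          # refresh frequency in Hz
refresh_ok = f_ref > 2.0 * f_d        # condition f_ref > 2 f_d
duration_s = 500 / f_ref              # time covered by 500 profiles
```

The same function can be used to check whether a given refresh frequency still satisfies $f_{ref} > 2 f_d$ at other speeds.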
IV. IMPROVEMENT SOLUTIONS
The goal is to improve the precision, the FPGA occupation and the latency. Applying a Normalization Factor (NF) to the input signal significantly decreases the error. In addition, decreasing the number of bits of the impulse responses decreases both the occupation of slices in the FPGA and the latency.
A. Normalization Factor (NF)
The best solution is to multiply every sample of the input signal in the digital block by NF = $2^k$, where $k$ is the largest integer satisfying $x_{max} > 2^k \cdot x$. The input signal is limited to $[-V_m, V_m]$ with $V_m$ = 1 V. Thus, $x_{max}$ = 0.5 V is chosen to leave a sufficient margin for the input signal. However, this method requires a highly reconfigurable analog amplifier, placed after the DAC, that operates with a period shorter than 5 ns, which is hard to realize. Therefore, another solution is proposed.
Two thresholds are considered: SH = $x_{max}$ = 0.5 V and SL = 0.125 V (above 0.125 V the SNR is high, as presented in Fig. 6). If $|x|$ > SH, the signal is divided by NF = 2. In this case, a signal S = "01" at the output of the digital block is sent to a reconfigurable analog amplifier, which multiplies the output signal by 2. If $|x|$ < SL, the signal is multiplied by NF = 2 and S = "10". Table IV presents the new values of the global relative error and the global SNR. After adding the NF, the relative error decreases significantly, and in this case the sliding window truncation is not required.
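The two-threshold rule above can be written as a small decision function. This is a behavioural sketch only: the thresholds and the codes "01" and "10" come from the text, while the code "00" for the unscaled case and the function name are assumptions made for the example.

```python
SH = 0.5     # upper threshold in volts (x_max)
SL = 0.125   # lower threshold in volts

def normalize_sample(x):
    """Return the scaled sample and the control word S for the analog amplifier."""
    if abs(x) > SH:
        return x / 2.0, "01"   # input halved, output to be multiplied by 2
    if abs(x) < SL:
        return x * 2.0, "10"   # input doubled, output to be scaled back
    return x, "00"             # assumed code when no scaling is applied

scaled_high, s_high = normalize_sample(0.8)
scaled_low, s_low = normalize_sample(0.1)
scaled_mid, s_mid = normalize_sample(0.3)
```

Only two gain settings have to be handled by the amplifier in this scheme, which is far less demanding than a gain that changes by an arbitrary power of two at every sample.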
B. The Error Versus the Number of Bits of h
A study of the average global relative error (AG RE) and the average global SNR (AV SNR) versus the number of bits of $h$ is given in Fig. 8. For a number of bits of $h$ greater than 6, the AV SNR exceeds 40 dB. With 6 bits for $h$, the occupation on the FPGA is reduced from 20 % to 18 %. However, the AG RE with a brutal truncation exceeds 100 %, whereas with a sliding truncation it is 0.75 %, which is acceptable. Thus, the sliding truncation is mandatory in this case.
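The gap between the two truncations can be illustrated on integer samples. The sketch below is one possible reading, stated as an assumption: the brutal truncation keeps the top 14 bits of the full-width word, while the sliding truncation places the 14-bit window on the highest bit actually used by the block of samples and reports the shift so that the level can be restored afterwards. The 36-bit word width and the sample values are hypothetical.

```python
import numpy as np

def brutal_truncation(y, n_in, n_out=14):
    """Keep the n_out most significant bits of a signed n_in-bit word."""
    return y >> (n_in - n_out)   # arithmetic right shift on signed integers

def sliding_truncation(y, n_out=14):
    """Keep n_out bits starting at the highest bit the block actually uses.

    Returns the truncated samples and the applied shift; a gain of 2**shift
    is assumed to restore the level later in the chain.
    """
    peak = int(np.max(np.abs(y)))
    used_bits = peak.bit_length() + 1      # magnitude bits plus sign bit
    shift = max(used_bits - n_out, 0)
    return y >> shift, shift

# Hypothetical low-level output held in a 36-bit accumulator.
y = np.array([1000, -300, 25], dtype=np.int64)
y_brutal = brutal_truncation(y, n_in=36)   # almost all information is lost
y_sliding, shift = sliding_truncation(y)   # samples fit in 14 bits unchanged
```

When the coefficients use few bits, the output occupies only the lower part of the word, which is the situation where a fixed window on the top bits discards nearly everything.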
The amount of data transmitted for a profile is also reduced. The PCI bus is 32 bits wide. Thus, on each clock pulse, five samples of the impulse response are transmitted (instead of two). The number of bits at the output before the truncation is related to the number of bits of $h$:
$$n_y = n_x + n_h + n_t \quad (19)$$
where $n_y$ is the number of bits at the output, $n_h$ is the number of bits of $h$, $n_x$ = 14 is the number of bits of the input signal and $n_t$ can be expressed by:
$$n_t = \lceil \log_2(n_{tap}) \rceil \quad (20)$$
where $n_{tap}$ is the number of taps.
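The word-length bookkeeping of this subsection can be sketched numerically. The bit-growth rule used below is the usual worst-case bound for a sum of products (product width plus the bits needed to add $n_{tap}$ terms); it is an assumption for illustration, and the function names are hypothetical. The PCI packing simply counts how many whole samples fit in a 32-bit word.

```python
def output_bits(n_x, n_h, n_tap):
    """Worst-case accumulator width: n_x + n_h + ceil(log2(n_tap))."""
    n_t = (n_tap - 1).bit_length()   # equals ceil(log2(n_tap)) for n_tap >= 1
    return n_x + n_h + n_t

def samples_per_pci_word(n_h, bus_bits=32):
    """Number of whole n_h-bit samples carried by one bus word."""
    return bus_bits // n_h

bits_16 = output_bits(n_x=14, n_h=16, n_tap=49)   # assumed 16-bit coefficients
bits_6 = output_bits(n_x=14, n_h=6, n_tap=49)     # 6-bit coefficients
per_word_16 = samples_per_pci_word(16)
per_word_6 = samples_per_pci_word(6)
```

With 6-bit coefficients, five samples fit in one 32-bit word instead of two, which matches the reduction in transmitted data described above.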
V. CONCLUSION
In this paper, real impulse responses of a 2×2 MIMO channel have been obtained from an outdoor-to-indoor measurement campaign. These impulse responses have been used by the hardware simulator.
It has been shown that they have a large number of taps compared with standard channel models. Thus, the proposed architecture requires a large number of multipliers. To reduce the number of multipliers, an algorithm extracting the dominant paths has been proposed. In addition, in order to reduce the error and the occupation on the FPGA, two improvement solutions have been presented.
Simulations made using a Virtex-VII [7] XC7V2000T platform will allow us to simulate up to 16×16 MIMO channels. A graphical user interface will also be designed to allow the user to select the channel model and to reconfigure the channel parameters. The final objective of these measurements is to obtain realistic and reliable impulse responses of the MIMO channel in order to supply the digital block of the hardware simulator. 
