Tel: +33 (0)2 23 23 86 04, +33 (0)6 17 57 13 87, Bachir.habib@insa-rennes.fr, www.ietr.fr
A wireless communication system can be tested either under actual conditions or with a hardware simulator that reproduces these conditions. A hardware simulator makes it possible to freely simulate a desired radio channel and thus to test mobile radio equipment "on the table". This paper presents new architectures for the digital block of a hardware simulator of MIMO propagation channels. The simulator can be used for LTE and WLAN IEEE 802.11ac applications, in indoor and outdoor environments; in this paper, however, specific architectures of the digital block are presented for a shipboard environment. Since a hardware simulator must reproduce the behavior of the radio propagation channel, a measurement campaign has been conducted to obtain the impulse responses of the shipboard channel, using a channel sounder designed and realized at IETR. After the presentation of the channel sounder, the channel impulse responses are described.
Wireless channels are commonly simulated using finite impulse response (FIR) filters, as in [4, 20, 21]. The FIR filter performs a convolution between a channel impulse response and an input signal: the signal, delayed by different delays, is weighted by the channel coefficients (i.e. the tap coefficients), and the weighted signal components are summed. The channel coefficients are periodically updated to reflect the behavior of an actual channel. Several approaches are now widely used to implement such filters, such as distributed arithmetic (DA) and canonical signed digits (CSDs) [22].
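A minimal Python sketch of the FIR principle described above: the input is delayed, weighted by the tap coefficients and summed. Only the non-null taps are stored, as (delay, coefficient) pairs, so the number of multiplications per output sample equals the number of paths rather than the number of delay positions. The function name and the dictionary layout are illustrative assumptions, not the FPGA implementation.

```python
from typing import Sequence


def sparse_fir(x: Sequence[float], taps: dict[int, float]) -> list[float]:
    """Streaming FIR: y[n] = sum over stored taps of h(i_k) * x[n - i_k].

    `taps` maps a delay index i_k (in samples) to its coefficient h(i_k);
    null taps are simply absent, so one multiplication is done per path.
    The output has the same length as the input (streaming mode).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for delay, coeff in taps.items():
            if n - delay >= 0:  # samples before the start of x are zero
                acc += coeff * x[n - delay]
        y.append(acc)
    return y


# A two-path channel (delays 0 and 2) applied to a unit impulse.
y = sparse_fir([1, 0, 0, 0, 0], {0: 1.0, 2: 0.5})
```

Feeding a unit impulse returns the tap profile itself, which is a quick way to check that the delays and coefficients were loaded as intended.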
However, using an FIR filter in a channel simulator has a limitation: on a Virtex-IV FPGA, it is impossible to implement an FIR filter operating at a sampling frequency of 200 MHz with more than 192 multipliers (i.e. an impulse response with more than 192 taps).
To simulate an impulse response with more than 192 taps, a Fast Fourier Transform (FFT) module can be used. On a Virtex-IV FPGA, the size N of the FFT module can reach 65536 samples. Several frequency domain architectures have therefore been considered and tested [4, 19]. Moreover, a previously proposed VLSI implementation shows that, for high-order MIMO arrays, frequency domain architectures are highly modular and scalable by design.
In this paper, we study two alternative approaches. The first operates in the frequency domain, while the second operates in the time domain and is based on an FIR filter.
The main contributions of the paper are:
• The previously considered frequency domain architectures operate correctly only for signals whose number of samples does not exceed the size of the FFT block. In this study, a new frequency domain architecture [23] that avoids this limitation and a new time domain architecture are both tested for a shipboard environment.
• The time domain architecture presented in [18, 20] occupies 11 % to 13 % of the FPGA slices for one SISO channel. In this paper, we present a time domain architecture that occupies 5 % of the slices for one SISO channel and up to 80 % for a 4×4 MIMO system.
• In general, a channel impulse response can be represented in baseband by its complex envelope, or as a real signal band-limited between $f_c - B/2$ and $f_c + B/2$, where $f_c$ is the carrier frequency and $B$ is the bandwidth. In this paper, to eliminate the complex multiplications and the dependence on $f_c$, the hardware simulation operates between $\Delta$ and $B + \Delta$, where $\Delta$ depends on the band-pass filters (RF and IF). The value $\Delta$ is introduced to prevent the overlap of the positive and negative sides of the frequency response. In addition, using a real impulse response reduces the size of the FIR filters by 50 %. Thus, within the same FPGA, more SISO channels (hence, larger MIMO channels) can be simulated.
• Tests have been made for indoor [24], outdoor [25] and vehicular [26] environments using standard channel models. In this paper, however, tests are made for a shipboard environment with real channel measurements of 2×2 MIMO channels obtained with the channel sounder. Moreover, time-varying channels are obtained using Rayleigh fading.
• In this study, several improvement solutions are presented; the relation between the number of bits used for the samples of the channel impulse response and the relative error at the outputs is studied in order to identify the best trade-off between FPGA occupation and accuracy.
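The move from the complex envelope to a real response in the band $[\Delta, B + \Delta]$ can be sketched as a frequency translation followed by taking the real part. This is a minimal sketch under stated assumptions: the translation frequency is taken as $f_0 = \Delta + B/2$, no scaling factor is applied (conventions with a factor 2 or $\sqrt{2}$ also exist), and the sampling rate `fs` is assumed high enough for the band not to alias. The values of `delta` and `fs` in the example are hypothetical. Each resulting tap is real and therefore needs a single real multiplier, whereas a complex tap applied to a real signal needs two, which is one way to read the 50 % reduction mentioned above.

```python
import cmath
import math
from typing import Sequence


def real_bandpass_ir(h_ce: Sequence[complex], fs: float,
                     delta: float, bandwidth: float) -> list[float]:
    """Turn a complex-envelope impulse response into a real one whose
    spectrum occupies [delta, delta + bandwidth].

    Assumption: h[k] = Re{h_ce[k] * exp(j*2*pi*f0*k/fs)}, f0 = delta + bandwidth/2.
    """
    f0 = delta + bandwidth / 2.0  # center of the shifted band
    return [(h * cmath.exp(2j * math.pi * f0 * k / fs)).real
            for k, h in enumerate(h_ce)]


# B = 100 MHz as in the text; delta = 50 MHz and fs = 400 MHz are hypothetical.
h_real = real_bandpass_ir([1 + 0j, 1j], fs=400e6, delta=50e6, bandwidth=100e6)
```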
The rest of this paper is organized as follows. Section 2 presents the channel models used to test the proposed architectures. Section 3 describes the new architectures of the simulator in the frequency and time domains, respectively. The prototyping platform used to implement these architectures and the FPGA occupation of each architecture are also described. Section 4 presents the accuracy of the output signals when measured impulse responses are used in the digital block of the hardware simulator. Section 5 presents some improvement solutions to reduce the error, the latency and the FPGA occupation, and analyzes the accuracy of the new architecture. Lastly, Section 6 gives concluding remarks and prospects.
Channel Models
A MIMO propagation channel is composed of several time-variant, correlated SISO channels. For this MIMO channel, the signal $y_j(t)$ received on the $j$-th antenna can be calculated by a convolution in the time domain:
The associated spectrum is calculated by the Fourier transform (using FFT modules):
Depending on the considered propagation environment, Table 1 summarizes some useful parameters for the LTE standard, the WLAN 802.11ac standard and the channel sounder signals, for a specific environment on board the Armorique, which is presented in Figure 1.
where $W_{tF}$ is the value closest to $W_{t\,eff}$ that is allowed by the size $N_F = 2^n$ of the FFT modules.
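Each received signal is the sum, over the transmit antennas, of the input signals convolved with the corresponding SISO impulse responses, which is why a 2×2 MIMO channel requires four SISO channels. The following minimal sketch uses the hypothetical layout `h[j][i]` for the response from transmit antenna $i$ to receive antenna $j$ and assumes equal-length inputs and equal-length responses; it is a plain direct-form convolution, not the hardware architecture.

```python
from typing import Sequence


def convolve(x: Sequence[float], h: Sequence[float]) -> list[float]:
    """Full linear convolution, length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def mimo_outputs(x: Sequence[Sequence[float]],
                 h: Sequence[Sequence[Sequence[float]]]) -> list[list[float]]:
    """y_j = sum_i (h[j][i] * x_i) for every receive antenna j."""
    outputs = []
    for h_row in h:                          # one row per receive antenna
        acc = None
        for x_i, h_ji in zip(x, h_row):      # one SISO channel per transmit antenna
            y_ji = convolve(x_i, h_ji)
            acc = y_ji if acc is None else [a + b for a, b in zip(acc, y_ji)]
        outputs.append(acc)
    return outputs


# 2x2 example: four SISO responses, two transmitted signals.
y = mimo_outputs([[1, 0], [0, 1]], [[[1, 0], [0, 2]], [[3, 0], [0, 0]]])
```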
Two channel models are considered to cover many types of environments: the TGn channel models (indoor environments) and the LTE channel models (outdoor environments).
Moreover, using the channel sounder realized at IETR, measured impulse responses are obtained for specific environments.
In this study, measured complex impulse responses of the MIMO propagation channel obtained in a shipboard indoor metallic environment were used to supply the digital block of the channel simulator.
TGn Channel Model
TGn channel models [5] represent a set of 6 profiles, labeled A to F, which cover all the for all TGn channel models are presented in [5] by taking the LOS (Line-Of-Sight) path as reference.
LTE Channel Model
LTE channel models are used for mobile wireless applications. A set of 3 channel models is used to simulate the multipath fading propagation conditions. A detailed description is presented in [6] .
Measurement Data
Impulse responses of a MIMO channel can be obtained from measurements using a time domain channel sounder designed and realized at IETR [7]. Several measurement campaigns were carried out for indoor and outdoor environments. Recently, a measurement campaign was carried out to obtain measured MIMO impulse responses for a shipboard environment. These MIMO impulse responses will be used by the hardware simulator.
The channel sounder uses a periodic pseudo-random binary sequence. It has a temporal resolution of 11.9 ns for a 100 MHz sounding bandwidth.
Therefore:
where the two terms are, respectively, the real and imaginary parts of the complex response.
The channel sounder provides the complex envelope $h_{ce}(t)$ of the baseband signal, with a bandwidth $B = 100$ MHz and a center frequency:
The real impulse responses are obtained by:
Therefore, we can work with a real impulse response that occupies the band $[\Delta, \Delta + B]$ (Figure 4). The first results (without normalization) are obtained for a 2×2 MIMO channel. Figure 5 presents the impulse responses given by the channel sounder over $2048\,T_s$. High-resolution methods have been proposed [27, 28] to estimate the propagation parameters of this channel and to obtain significant impulse responses ($h_{Dis}$) with a limited number of taps, hence a limited number of multipliers in the FIR filter. However, these methods have a heavy computational load. Therefore, a new method is proposed, which consists of detecting the taps at which the sign of the slope of the curve changes. Figure 6 presents the impulse responses used by the simulator after discrimination, normalization and limitation of the real impulse responses between 0 and -20 dB.
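The slope-sign discrimination described above can be sketched as follows. This sketch assumes that the input is the linear power profile $|h(k)|^2$ of a measured response and that only slope changes from rising to falling (local maxima) are kept, since the text does not say whether minima are retained. The strongest kept tap is normalized to 0 dB and taps below -20 dB are discarded, mirroring the limitation between 0 and -20 dB. In this simple version the first and last samples can never be detected as maxima.

```python
import math
from typing import Sequence


def discriminate_taps(power: Sequence[float], floor_db: float = -20.0) -> dict[int, float]:
    """Return {delay_index: level_dB} for the retained taps."""
    # Keep samples where the slope changes sign from positive to negative.
    peaks = [k for k in range(1, len(power) - 1)
             if power[k] > power[k - 1] and power[k + 1] < power[k]]
    if not peaks:
        return {}
    p_max = max(power[k] for k in peaks)
    kept = {}
    for k in peaks:
        level_db = 10.0 * math.log10(power[k] / p_max)  # 0 dB for the strongest tap
        if level_db >= floor_db:                         # limitation to [floor_db, 0] dB
            kept[k] = level_db
    return kept


# Peaks at indices 1, 3 and 6; index 6 lies about 23 dB below the strongest tap.
taps = discriminate_taps([0.0, 1.0, 0.5, 0.6, 0.2, 0.001, 0.005, 0.0])
```

The number of retained taps directly sets the number of multipliers the FIR filter needs.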
Time-Varying Channels
During the measurements, the channel was time invariant, without people moving in the environment. Therefore, in order to simulate a time-varying channel, a Rayleigh fading method was used.
We define a 2×2 MIMO Rayleigh fading channel [29, 30] using the static impulse responses presented in Figure 6 . The MIMO channel matrix H can be characterized by two parameters:
1) The power $P_c$ of the constant channel components, which corresponds to the Line-Of-Sight (LOS) path.
2) The power $P_s$ of the scattered channel components, which corresponds to the Non-Line-Of-Sight (NLOS) paths.
The ratio $P_c/P_s$ is called the Ricean K-factor and is often expressed in decibels.
Assuming that all channel coefficients of the channel matrix H are Rice distributed, the MIMO channel matrix H for each tap can be expressed by:
where $H_F$ and $H_V$ are the constant and the scattered channel matrices, respectively.
The total received power is $P = P_c + P_s$. Therefore:
where K is the Ricean factor and P is the power of each tap given in Figure 6 .
Moreover, substituting (12) and (13) into (11), we obtain:
Since the measurements were taken in NLOS conditions, K is set to zero, which yields a Rayleigh fading channel, and H can be written as:
For 2 transmit and 2 receive antennas:
where $X_{ij}$ (for the $i$-th receive and $j$-th transmit antenna) are correlated, zero-mean, unit-variance complex Gaussian random variables, used as the coefficients of the variable NLOS (Rayleigh) matrix $H_V$.
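Assuming the standard Ricean form of the per-tap channel matrix, the steps above can be written out as follows (a reconstruction from the stated definitions, not necessarily the exact notation of the original equations):

```latex
\begin{aligned}
H &= \sqrt{P_c}\,H_F + \sqrt{P_s}\,H_V,
  \qquad K = \frac{P_c}{P_s}, \qquad P = P_c + P_s,\\
P_c &= \frac{K}{K+1}\,P, \qquad P_s = \frac{1}{K+1}\,P,\\
H &= \sqrt{P}\left(\sqrt{\frac{K}{K+1}}\,H_F + \sqrt{\frac{1}{K+1}}\,H_V\right),\\
K = 0 \;\Rightarrow\; H &= \sqrt{P}\,H_V
  = \sqrt{P}\begin{pmatrix} X_{11} & X_{12}\\ X_{21} & X_{22} \end{pmatrix}.
\end{aligned}
```

With $K = 0$ the constant LOS term vanishes entirely, so every tap becomes a scaled, zero-mean complex Gaussian matrix.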
To correlate the elements $X_{ij}$ of the matrix $X$, a product-based model is used. This model assumes that the correlation coefficients are independently derived at each end of the link. It can be expressed by:
$H_{iid}$ is a matrix of independent, zero-mean, unit-variance complex Gaussian random variables.
The method for generating the Rayleigh random variables is:
2) We take the complex conjugates ($x_{1c}$ and $x_{2c}$) of these sequences to generate the complex Gaussian random variables for the negative part, from $-f_d$ to 0.
3) We therefore obtain $x_1 = x_{1p} + x_{1c}$ and $x_2 = x_{2p} + x_{2c}$. $r_i$ is an element of the $H_{iid}$ matrix; it is the desired Rayleigh-distributed variable with the required temporal correlation.
$R_r$ and $R_t$ are the receive and transmit correlation matrices, respectively.
We denote by $Q_A$ and $Q_E$ the correlations between channels at two receive antennas originating from the same transmit antenna (SIMO), and by $R_A$ and $R_E$ the correlations between channels at two transmit antennas arriving at the same receive antenna (MISO). $S_A$ and $S_E$ are the cross-correlations between antennas of the same side of the link.
The use of this model has two conditions:
1) $Q_A = Q_E = Q$ and $R_A = R_E = R$: the correlations between channels at two receive (resp. transmit) antennas are independent of the considered Tx (resp. Rx) antenna, as shown in Figure 7.
2) $S_A = QR$ and $S_E = Q^{*}R$.
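A minimal sketch of the product (Kronecker) model for the 2×2 case, assuming the common form $H = R_r^{1/2} H_{iid} (R_t^{1/2})^H$ with $R_r = \begin{pmatrix}1 & Q\\ Q^* & 1\end{pmatrix}$ and $R_t = \begin{pmatrix}1 & R\\ R^* & 1\end{pmatrix}$. Cholesky factors are used instead of Hermitian square roots; this gives the same statistics because $H_{iid}$ is i.i.d. circularly symmetric Gaussian. The helper names are hypothetical, and the temporal (Doppler) correlation of $H_{iid}$ is not modeled: each draw is independent.

```python
import math
import random


def chol2(c: complex) -> list[list[complex]]:
    """Lower Cholesky factor L of [[1, c], [conj(c), 1]] (requires |c| <= 1)."""
    return [[1 + 0j, 0j],
            [complex(c).conjugate(), complex(math.sqrt(1.0 - abs(c) ** 2))]]


def matmul(a: list[list[complex]], b: list[list[complex]]) -> list[list[complex]]:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hermitian(a: list[list[complex]]) -> list[list[complex]]:
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def correlated_channel(q: complex, r: complex, rng: random.Random) -> list[list[complex]]:
    """One draw of H = L_r @ H_iid @ L_t^H; q: receive-side, r: transmit-side correlation."""
    h_iid = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)
              for _ in range(2)] for _ in range(2)]   # zero mean, unit variance
    return matmul(matmul(chol2(q), h_iid), hermitian(chol2(r)))


rng = random.Random(1)
q, r = 0.6 + 0.2j, 0.3
draws = [correlated_channel(q, r, rng) for _ in range(10000)]
# Empirical E[H11 H21*] estimates Q, and E[H12 H11*] estimates R.
est_q = sum(h[0][0] * h[1][0].conjugate() for h in draws) / len(draws)
est_r = sum(h[0][1] * h[0][0].conjugate() for h in draws) / len(draws)
```

The empirical correlations converge to $Q$ and $R$ as the number of draws grows, which is a convenient sanity check before loading the profiles into the simulator.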
$R_t$ and $R_r$ can be written as:
For a uniform linear array, the complex correlation coefficients $Q$ and $R$ are expressed as:
where $D = 2\pi d/\lambda$, $d = 0.5\lambda$ is the distance between two successive antennas, $\lambda$ is the wavelength, and $R_{xx}$ and $R_{xy}$ are the cross-correlation functions between the real parts (equal to the cross-correlation function between the imaginary parts) and between the real and imaginary parts, respectively, of the considered correlated angles:
The calculation of the complex correlation coefficients for each tap is based on the PAS (Power Angular Spectrum), the AS (Angular Spread) being the second moment of the PAS. The PAS closely matches a Laplacian distribution [31, 32]:
where $\sigma$ is the standard deviation of the PAS (which corresponds to the numerical value of the AS).
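A numerical sketch of how a correlation coefficient ($Q$ or $R$) can be obtained from a Laplacian PAS, assuming the usual forms $R_{xx}(D) = \int \cos(D\sin\varphi)\,\mathrm{PAS}(\varphi)\,d\varphi$ and $R_{xy}(D) = \int \sin(D\sin\varphi)\,\mathrm{PAS}(\varphi)\,d\varphi$, with angles measured from broadside and the PAS truncated to $[-\pi, \pi)$ and renormalized. Both integrals are evaluated together as one complex exponential with a midpoint rule; the function names and the 20° spread in the example are illustrative.

```python
import cmath
import math


def laplacian_pas(phi: float, phi0: float, sigma: float) -> float:
    """Unnormalized Laplacian PAS with mean angle phi0 and standard deviation sigma (rad)."""
    return math.exp(-math.sqrt(2.0) * abs(phi - phi0) / sigma)


def ula_correlation(D: float, phi0: float, sigma: float, n: int = 4000) -> complex:
    """Complex correlation R_xx(D) + j R_xy(D) between two ULA elements.

    The PAS is normalized to unit area over [-pi, pi) (midpoint rule).
    """
    step = 2.0 * math.pi / n
    num = 0j
    area = 0.0
    for m in range(n):
        phi = -math.pi + (m + 0.5) * step
        w = laplacian_pas(phi, phi0, sigma)
        num += cmath.exp(1j * D * math.sin(phi)) * w  # cos + j sin in one term
        area += w
    return num / area  # the common factor `step` cancels


# d = 0.5 lambda gives D = 2*pi*d/lambda = pi.
rho = ula_correlation(math.pi, phi0=0.0, sigma=math.radians(20.0))
```

For a PAS centered on broadside ($\varphi_0 = 0$) the imaginary part $R_{xy}$ vanishes by symmetry, and the correlation magnitude falls as the angular spread grows.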
Digital Block Design of the Hardware Simulator
In this section, improved frequency domain and time domain architectures are presented and implemented on a Virtex-IV FPGA.
New Frequency Domain Architecture
The new frequency domain architecture presented in Figure 8 has been verified with a Gaussian impulse response [23]. It operates correctly for signals whose number of samples exceeds $N_F$, where $N_F = 2^n$ is the size of the FFT module.
For the shipboard channel models, the largest excess delay is $180\,T_s$ (Figure 6). Thus, $N_F = 256$. However, each partial input of $N_F$ samples must be extended with a "tail" of $N_F$ null samples, as in [23], to avoid an incorrect result. Therefore, the FFT/IFFT modules operate on 512 samples.
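A minimal sketch of block convolution in the frequency domain, assuming that the architecture behaves like classic overlap-add: each block of $N_F$ input samples is padded with a tail of $N_F$ zeros, so the $2N_F$-point FFT/IFFT yields a linear rather than circular convolution, and the overlapping tails of consecutive blocks are summed. A textbook radix-2 FFT is included so the example has no dependencies. `next_pow2` reproduces the sizing above: 181 taps (delays 0 to $180\,T_s$) give $N_F = 256$, hence 512-point transforms.

```python
import cmath
from typing import Sequence


def fft(a: list[complex]) -> list[complex]:
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a: list[complex]) -> list[complex]:
    """Inverse FFT via conj(FFT(conj(a))) / n."""
    n = len(a)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in a])]


def next_pow2(m: int) -> int:
    """Smallest power of two >= m (used as N_F)."""
    n = 1
    while n < m:
        n *= 2
    return n


def overlap_add(x: Sequence[float], h: Sequence[float]) -> list[float]:
    """Linear convolution of an arbitrarily long x with h, block by block."""
    n_f = next_pow2(len(h))          # N_F: block length
    size = 2 * n_f                   # N_F samples + tail of N_F zeros
    H = fft([complex(v) for v in h] + [0j] * (size - len(h)))
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), n_f):
        block = [complex(v) for v in x[start:start + n_f]]
        X = fft(block + [0j] * (size - len(block)))
        segment = ifft([a * b for a, b in zip(X, H)])
        for k, v in enumerate(segment):
            if start + k < len(y):   # overlapping tails are summed
                y[start + k] += v.real
    return y
```

Without the zero tail, each block would wrap around inside the transform and corrupt its first samples, which is the "wrong result" the padding avoids.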
$H$ is the FFT of $h$ (given in Figure 6). It can be calculated by:
The truncation block is located at the output of the digital adder. It is needed to reduce to 14 bits the samples obtained after summing the signals computed by the IFFT blocks, so that they can be accepted by the digital-to-analog converter (DAC) while maintaining the highest accuracy.
The most immediate solution is to keep the 14 most significant bits; this is a "brutal" truncation.
However, when the output of the digital adder has a low voltage, brutal truncation feeds zeros to the input of the DAC. A better solution is therefore the sliding window truncation [23] presented in Figure 9, which keeps the 14 most significant effective bits.
In the time domain architecture, for $h_{11}$, the largest excess delay corresponds to the 178th sample. Thus, $N_T = 179$ samples.
Because $h_{11}$ has 47 paths (non-null taps), the FIR filter must use 47 multipliers. Figure 10 presents a 179-tap FIR filter with 47 multipliers. We developed our own FIR filter [25], instead of using the Xilinx MAC FIR filter, so that the FIR filter coefficients can be reloaded.
In this relation, the index q denotes quantized samples and $h_q(i_k)$ is the attenuation of the $k$-th path, with delay $i_k T_s$.
Figure 11 shows the XtremeDSP Virtex-IV board from Xilinx [3] used for the implementation of each architecture. This prototyping board is described in [23].
Implementation of Each Architecture

Description
The simulations and synthesis are made with Xilinx ISE [3] and ModelSim software [33] .
The XtremeDSP features dual-channel high performance ADCs (AD6645) and DACs (AD9772A) with 14-bit resolution, a user programmable Virtex-IV FPGA, programmable clocks, support for external clock, host interfacing PCI, two banks of ZBT-SRAM, and JTAG interfaces. 
Implementation
A) Implementation of the Frequency Domain Architecture
Four SISO channels must be implemented for a one-way 2×2 MIMO radio channel.
The utilization summary for the V4-SX35 development board shows that, in the frequency domain, four SISO channels using 512-point FFT/IFFT modules occupy more than 15,360 slices on the FPGA.
Thus, more than 100 % of the slices would be needed, so it is impossible to simulate 4 SISO channels in the frequency domain. The time domain architecture is better in terms of FPGA occupation. Moreover, in [23] and [24] we have shown that the time domain architecture has two other advantages: a higher SNR and a much lower latency.
B) Implementation of Time Domain Architecture
Thus, in this work, the time domain architecture is used for the tests. Solutions are also proposed to reduce the number of bits used in this architecture in order to decrease the latency and the FPGA occupation.
Implementation of the Impulse Responses in the Digital Block
Description
The channel impulse responses are stored on the hard disk of the computer and read via the PCI bus and then stored in the FPGA dual-port RAM. Figure 12 shows the connection between the computer and the FPGA board to reload the coefficients. The Nallatech driver presented in Figure 12 provides an IP called "Host Interface" that reads the data from the PCI bus and stores these data into the FIFO memory of the IP. The module called "Loading profiles" reads and distributes the impulse responses in RAM blocks.
This module called "BOX RAM" is the block "Memory" of the time domain architecture.
While a MIMO profile is used, the following MIMO profile is loaded and will be used after the refresh period.
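The reload scheme above, where the next MIMO profile is written while the current one is in use, behaves like a double (ping-pong) buffer. The class below is a hypothetical software model of that behavior, assuming two coefficient banks that are swapped at each refresh; it does not describe the actual "BOX RAM" logic.

```python
from typing import Sequence


class ProfileDoubleBuffer:
    """Two coefficient banks: one active (read by the filters), one being loaded."""

    def __init__(self, first_profile: Sequence[float]):
        self._banks = [list(first_profile), list(first_profile)]
        self._active = 0

    @property
    def active(self) -> list[float]:
        """Coefficients currently used by the FIR filters."""
        return self._banks[self._active]

    def load_next(self, profile: Sequence[float]) -> None:
        """Write the next profile into the idle bank only."""
        self._banks[1 - self._active] = list(profile)

    def refresh(self) -> None:
        """Swap banks at the end of the refresh period."""
        self._active = 1 - self._active


buf = ProfileDoubleBuffer([1.0, 0.5])
buf.load_next([2.0, 0.25])   # loading does not disturb the running profile
```

Because the filters never read the bank being written, a profile change takes effect atomically at the refresh instant.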
Accuracy of the Architecture
In order to determine the accuracy of the digital block, a comparison is made between the theoretical and the Xilinx output signals. To test the time domain architecture, a specific Gaussian input signal x(t) is considered. This input signal, presented in Figure 13, is long enough to be used in streaming mode (a Gaussian signal is preferred because it is well localized in both the time and frequency domains):
The relative error is given for each output sample by:
The global values of the relative error and of the SNR, computed for the output signal before and after the final truncations, are needed to evaluate the accuracy of the architecture. The global relative error is computed by:
The global SNR is computed by:
where $E = Y_{Xilinx} - Y_{theory}$ is the error vector.
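The accuracy metrics can be sketched as follows. The definitions used here are assumptions (the usual ones): a ratio of magnitudes for the per-sample relative error, and a ratio of Euclidean norms, as defined just after, for the global values. With these definitions the global SNR is simply $-20\log_{10}$ of the global relative error. The helper names are illustrative.

```python
import math
from typing import Sequence


def norm(v: Sequence[float]) -> float:
    """Euclidean norm ||v|| = sqrt(sum of v_l^2)."""
    return math.sqrt(sum(x * x for x in v))


def accuracy(y_xilinx: Sequence[float],
             y_theory: Sequence[float]) -> tuple[list[float], float, float]:
    """Per-sample relative error, global relative error and global SNR (dB)."""
    e = [a - b for a, b in zip(y_xilinx, y_theory)]          # error vector E
    per_sample = [abs(ei) / abs(yt) if yt != 0 else math.inf
                  for ei, yt in zip(e, y_theory)]
    global_re = norm(e) / norm(y_theory)                      # ||E|| / ||Y_theory||
    snr_db = 20.0 * math.log10(norm(y_theory) / norm(e))      # requires ||E|| > 0
    return per_sample, global_re, snr_db


per, g, snr = accuracy([3.03, 4.04], [3.0, 4.0])   # 1 % error everywhere
```

The per-sample error is undefined where the theoretical output is zero, which is one reason it grows large for small output values while the global metrics stay stable.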
For a given vector $X = [x_1, x_2, \dots, x_L]$, its Euclidean norm is $\|X\| = \sqrt{\sum_{l=1}^{L} x_l^2}$.
Figure 14 presents the Xilinx output signal, the SNR and the relative error. Table 3 shows the global values of the relative error and of the SNR between the Xilinx output signal and the theoretical output signal for the 2×2 MIMO time domain architecture.
The results are given without truncation, with sliding window and with brutal truncations. 
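The difference between the two truncations can be illustrated on integer accumulator words. In this sketch, brutal truncation always drops the same low-order bits, whereas the sliding variant chooses the shift from the largest magnitude in the current window so that the 14 most significant effective bits are kept; the returned shift is the gain that would have to be compensated after the DAC. The window handling and the function names are assumptions, not the design of [23].

```python
from typing import Sequence

DAC_BITS = 14  # resolution of the DAC on the XtremeDSP board


def brutal_truncation(samples: Sequence[int], in_bits: int) -> list[int]:
    """Keep the DAC_BITS most significant bits of the full in_bits word."""
    shift = in_bits - DAC_BITS
    return [s >> shift for s in samples]  # arithmetic shift on signed values


def sliding_truncation(samples: Sequence[int]) -> tuple[list[int], int]:
    """Keep the DAC_BITS most significant effective bits of a window.

    Returns the truncated samples and the applied shift (a gain of 2**shift).
    """
    peak = max(abs(s) for s in samples)
    needed = peak.bit_length() + 1           # magnitude bits plus sign bit
    shift = max(0, needed - DAC_BITS)
    return [s >> shift for s in samples], shift


small = [100, 50]                             # low-level output of a 32-bit adder
print(brutal_truncation(small, 32), sliding_truncation(small))
```

For a low-level signal the brutal version outputs only zeros, while the sliding version passes the samples unchanged, which matches the behavior described for low output voltages.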
Global Error Variation with Time-Varying Profiles
To test the simulator with time-varying 2×2 MIMO channels, 500 successive profiles are considered. For an environmental speed of 0.5 km/h, the refresh frequency is $f_{ref} = 2.5$ Hz.
Therefore, the time to simulate the 500 profiles is 200 s. Figure 15 gives the time variation of the Average Global Relative Error (AG RE) and the Average Global SNR (AG SNR) of $y_1$ and $y_2$ for the 500 successive profiles. For an environmental speed of 4 km/h, $f_{ref} = 20$ Hz, so the time to simulate the 500 profiles is 25 s. Figure 16 shows the time variation of the AG RE and the AG SNR of $y_1$ and $y_2$ for the 500 successive profiles.
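The durations quoted above follow from dividing the number of profiles by the refresh frequency. Both quoted pairs (0.5 km/h with 2.5 Hz, and 4 km/h with 20 Hz) correspond to 5 Hz per km/h, so this sketch assumes that linear ratio; it illustrates the arithmetic and is not a rule stated for other speeds.

```python
def refresh_frequency(speed_kmh: float, hz_per_kmh: float = 5.0) -> float:
    """Refresh frequency f_ref, assumed proportional to the environmental speed."""
    return hz_per_kmh * speed_kmh


def simulation_time(n_profiles: int, speed_kmh: float) -> float:
    """Time (s) needed to play n_profiles successive profiles at f_ref."""
    return n_profiles / refresh_frequency(speed_kmh)


for v in (0.5, 4.0):
    print(f"{v} km/h: f_ref = {refresh_frequency(v)} Hz, "
          f"500 profiles in {simulation_time(500, v)} s")
```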
For v = 0.5 km/h, the variation of the SNR is 2.03 dB; for v = 4 km/h, it is 2.37 dB. After several variations of the environmental speed between 0 and 9 km/h (the maximum for an indoor environment), we conclude that the rate of variation of the SNR and of the global error is proportional to the environmental speed.
Improvement Solutions
The goal is to improve: the precision, the FPGA occupation and the latency.
Applying a Normalization Factor (NF) to the input signal significantly decreases the error.
Also, decreasing the number of bits of the implemented impulse responses, denoted h in Figure 10, decreases both the slice occupation on the FPGA and the latency.
Normalization Factor (NF)
After analyzing the relative error in Figure 14, we conclude that it is high only for small values of the output signal. Therefore, to decrease the error, a solution is proposed which consists of considering two thresholds, SH and SL. SH is equal to $x_{max}$. In fact, the input and sufficient margin for the input signal. SL is set to 0.125 V (above 0.125 V, the SNR is high, as shown in Fig. 13 and Fig. 14). We notice that after adding the NF, the relative error decreases significantly. Table 4 presents the new values of the global relative error and the global SNR.
The Error Versus the Number of Bits of h
To decrease the slice occupation of the time domain architecture on the FPGA, we decrease the number of bits of h. The average global relative error as a function of the number of bits of h is given in Figure 19. We conclude that when h uses more than 5 bits, the AG RE is acceptable and the AG SNR exceeds 40 dB. With 6 bits for h, the FPGA occupation is reduced from 19 % to 17 %. However, the average global error exceeds 100 % with brutal truncation, whereas it is 0.77 % with sliding truncation, which is acceptable. Sliding truncation is therefore mandatory in this case. The STF is quantized on 28 bits and feeds a reconfigurable analog amplifier placed after the DAC to obtain the correct output signal. The amount of data transmitted for a profile is also reduced. Figure 20 presents the output signal, the relative error and the SNR using 6 bits for h with sliding window truncation. The number of bits at the output before the truncation is related to the number of bits of h:
where $n_y$ is the number of bits at the output, $n_h$ is the number of bits of h, $n_x = 14$ is the number of bits of the input signal and $n_t$ can be expressed by:
where $n_{tap}$ is the number of taps. Table 5 summarizes the global relative error and the global SNR using 6 bits for h. Reducing the number of bits of h from 14 to 6 reduces the FPGA occupation.
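A common worst-case bit-growth rule for an FIR accumulator, assumed here rather than taken from the text, is $n_y = n_x + n_h + n_t$ with $n_t = \lceil \log_2 n_{tap} \rceil$ guard bits for summing the products. Under that assumption, for the 47 paths of $h_{11}$, reducing $n_h$ from 14 to 6 bits shrinks the accumulator from 34 to 26 bits, i.e. fewer bits to carry through the adder tree and to truncate down to the 14-bit DAC.

```python
import math


def output_bits(n_x: int, n_h: int, n_tap: int) -> int:
    """Assumed full-precision accumulator width n_y = n_x + n_h + n_t."""
    n_t = math.ceil(math.log2(n_tap))  # guard bits for the sum of n_tap products
    return n_x + n_h + n_t


for n_h in (14, 6):
    print(n_h, "bits for h ->", output_bits(14, n_h, 47), "bits before truncation")
```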
Conclusion
In this paper, the measured impulse responses of a 2×2 MIMO propagation channel obtained by measurement on a shipboard environment have been presented. These impulse responses have been used in the digital block of a hardware simulator.
After a comparative study, the time domain architecture used for the design of the digital block is the best solution, especially for MIMO systems: it occupies just 19 % of the slices on the Virtex-IV FPGA and introduces a small latency of 115 ns.
Moreover, a study of the precision of the architecture for time-varying 2×2 MIMO channels has been presented. It has been shown that the global relative error does not exceed 0.9 %. Therefore, time-varying impulse responses can be used by the architecture.
Lastly, in order to reduce the error of the output signals and the FPGA occupation, two improvement solutions have been presented. The first uses a normalization factor; it reduces the global output relative error from 0.7707 % to 0.0119 % using brutal truncation.
The second varies the number of bits of impulse response. It reduces the occupation of slices on the FPGA from 19 % to 17 %.
Using a Virtex-VII XC7V2000T platform [3] will allow us to simulate up to 300 SISO channels. Measurement campaigns will also be carried out with the MIMO channel sounder realized by IETR for various types of environments. A graphical user interface will also be designed to allow the user to select the channel model and the channel parameters. The final objective of these measurements is to obtain realistic and reliable impulse responses of the MIMO channel in order to supply the digital block of the hardware simulator.
