The growing demand for high data rates in wireless communication systems drives the development of new technologies that increase the channel capacity and thus the data rate. MIMO (multiple-input multiple-output) systems are well suited to these applications. In this paper, we present a MIMO test environment for high data rate transmissions in frequency-selective environments. An overview of the testbed is given, including the analyzed algorithms, the digital signal processing with a new highly parallel processor to perform the algorithms in real time, as well as the analog front-ends. A brief overview of the influence of polarization on the channel capacity is given as well.
INTRODUCTION AND MOTIVATION
The increasing demand for high data rates in wireless communication requires efficient use of the available bandwidth. Multiple-input multiple-output (MIMO) systems exploit spatial diversity and/or spatial multiplexing and enable SDMA (space division multiple access). These systems comprise, on the one hand, point-to-point MIMO systems where antenna arrays are used at both the transmitter and the receiver site; on the other hand, multiuser MIMO systems with a multiantenna base station (or access point) and several users equipped with one or several antennas can also be considered. However, for evaluating such systems, it is essential to also analyze their efficiency with respect to implementation issues. That is, besides spectral efficiency, one needs to consider the complexity of digital signal processing as well as the requirements concerning analog components.
At the Institute of High Frequency Technology at RWTH Aachen University, a testbed for a real-time MIMO system with a broadband air interface has been developed. It has been designed for evaluation of concepts and for proposing new realizations. The testbed consists of one base station and several user terminals. The base station is equipped with an antenna array and the user terminals are equipped with single antennas. However, several users can be grouped together to form one multiantenna user station. Our research is focused on future wideband WLAN systems. This paper is organized as follows. First, a general overview of the real-time MIMO testbed is given. Second, the influence of polarization on channel capacity is described. Third, some of the algorithms and the theoretical background for signal processing are described. Fourth, the real-time implementation of those algorithms is presented and discussed. Fifth, analog components are described and their requirements arising from system configuration and used algorithms are analyzed. Finally, a summary is given and an outlook on future projects is presented.
OVERVIEW OF THE SABA REAL-TIME MIMO TESTBED
The SABA (smart antennas for broadband access) real-time MIMO testbed is designed for one base station with an array of four antennas and up to four user terminals. Depending on the configuration, four terminals can use one antenna each or several terminals can be grouped to one multiantenna terminal. The system concept of the analog hardware includes receive and transmit branches with two calibration schemes, off-line and on-line calibration, switchable attenuators to increase the power dynamic range, and antennas with switchable polarization. The digital signal processing is based on a modular FPGA system and enables real-time processing of the algorithms used.
The carrier frequency is 10.525 GHz with a bandwidth of 30 MHz. Up- and downlink use the same spectrum and are separated in the time domain (TDD, time division duplex). A hybrid time division (TD), code division (CD), and space division (SD) multiple access scheme is implemented to enable efficient medium access in the time, code, and space domains. Equalization of frequency-selective MIMO channels is realized by joint detection in the uplink. Furthermore, we use joint-predistortion at the base station, which means that the data streams for the user terminals are pre-equalized at the base station. The required channel information for the equalization is obtained by channel estimation from uplink transmission. With joint-predistortion, no additional signal processing for equalization is needed at the user terminals. On the other hand, reciprocity of channel and transceivers and a constant channel between up- and downlink are required. The underlying system model of the digital hardware allows single- and multicarrier (OFDM) transmission with the same digital hardware architecture [1]. For both transmission schemes, equalization is performed in the frequency domain. A cyclic burst transmission with a duration of 10.24 microseconds, as used in OFDM systems, simplifies signal processing in the frequency domain. The radio frequency of 10.525 GHz was chosen with a view to future standards along the lines of IEEE 802.16, which covers the frequency range between 2 and 11 GHz. It seems likely that these higher frequency bands will become important for commercial use in future WLAN systems. In Figure 1, the configuration of the SABA MIMO testbed is shown. The testbed supports up to eight transceiver modules. Each module includes a receive and transmit branch and an antenna switch with a calibration path.
In the receiver part, the 10.525 GHz RF signal is down-converted to an intermediate frequency of 175 MHz, and in the transmit part the IF signal is up-converted to the RF frequency. Each transceiver is equipped with a local oscillator that is synchronized by a PLL using a 10 MHz reference signal. The transceiver modules are connected to one central digital hardware platform by cables. The transmitter and the receiver are controlled by the central digital hardware. The receiver is started a short time after a transmission begins. The guard period and the delay between transmission and reception are dimensioned for typical WLAN indoor applications. Due to cyclic burst transmission and channel estimation, no further time synchronization techniques are required. Transmitter and receiver share one common clock, so no further frequency synchronization is needed. The TDD operation enables an alternating use of receive and transmit branches in the base station and in the user terminals. This reduces the effort for building and maintaining the testbed. Power supply for the transceivers, the control bus, the IF signals, and the 10 MHz reference signal are transmitted via cables. The maximum cable length of several tens of meters is sufficient for the investigation of indoor applications. In future steps, the cables can be replaced by a carrier and clock synchronization module.
The industrial PC rack in the left part of Figure 1 contains the digital signal processing hardware, including a host PC which is connected to two carrier boards with FPGA modules via a cPCI interface. Together the two carrier boards provide eight slots for FPGA modules. Figure 2 shows one partition for a 4 × 4 system. The two BenADDA modules are equipped with 2 DA converter channels and 2 AD converter channels. The AD converters operate at a sample rate of 100 Msps with a resolution of 14 bits, which enables undersampling of the 175 MHz IF signal. Furthermore, the module provides one FPGA and two memory blocks with a total storage capacity of 8 MBytes. The digital transceivers include pulse shaping filters, cyclic extension, and digital up- and downconversion. They are implemented on the FPGAs. The symbol rate in the baseband is 25 MHz. The BenBLUE II (BigBlue) module is equipped with two Virtex II FPGAs from Xilinx (Xilinx Inc., San Jose, Calif.) and additional memory blocks (8 MBytes); it hosts the FFT and IFFT as well as a highly parallel processor architecture to perform the algorithms in real time. With speedgrade −6, these modules provide a clock frequency of up to 200 MHz. The architecture is described in detail in Section 5. The second BenBLUE module is used to realize channel coding, data generation, and evaluation. In this configuration, the second carrier board is not required, but the modular system provides flexibility and extensibility for future investigations. The data rate between the modules on one carrier board is 12.8 Gbits/s. The carrier boards are connected via a backplane allowing a data rate of 6.4 Gbits/s.
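The undersampling figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the testbed software: the function name `alias_frequency` is hypothetical, and the burst calculation assumes that the 10.24 microsecond burst is counted in symbol intervals at the 25 MHz symbol rate.

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Return (apparent frequency in Hz, spectrum_inverted) after sampling."""
    f = f_signal_hz % f_sample_hz            # fold into [0, fs)
    if f > f_sample_hz / 2:                  # upper half of the band mirrors down
        return f_sample_hz - f, True
    return f, False

IF_HZ = 175e6          # intermediate frequency (from the text)
FS_HZ = 100e6          # ADC sample rate (from the text)
SYMBOL_RATE_HZ = 25e6  # baseband symbol rate (from the text)
BURST_S = 10.24e-6     # cyclic burst duration (from the text)

f_alias, inverted = alias_frequency(IF_HZ, FS_HZ)
# Assumption: the burst duration is measured in symbol intervals.
symbols_per_burst = round(BURST_S * SYMBOL_RATE_HZ)
samples_per_symbol = FS_HZ / SYMBOL_RATE_HZ

print(f"IF appears at {f_alias / 1e6:.1f} MHz, inverted spectrum: {inverted}")
print(f"{symbols_per_burst} symbol intervals per burst, "
      f"{samples_per_symbol:.0f} samples per symbol")
```

Under these assumptions the 175 MHz IF falls into the fourth Nyquist zone and appears at 25 MHz with an inverted spectrum. If the 30 MHz bandwidth is centered on the IF, the band edges at 160 and 190 MHz map to 40 and 10 MHz, so the whole signal fits below half the sample rate.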
The modulation for each data stream can be selected to BPSK, QPSK, 8-PSK, or 16-QAM. Spreading is realized by OVSF codes (orthogonal variable spreading factors) with spreading factors 1, 2, 4, 8, or 16. Training sequences enable necessary channel estimation.
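To illustrate the spreading step, the sketch below builds OVSF codes with the usual recursive tree construction (each code `c` produces the children `(c, c)` and `(c, -c)`) and separates two superimposed users by despreading. The helper names and the example symbols are hypothetical; the code assignment actually used in the testbed is not described in the text.

```python
def ovsf_codes(sf):
    """Return the sf OVSF codes of length sf (sf must be a power of two)."""
    if sf < 1 or sf & (sf - 1):
        raise ValueError("spreading factor must be a power of two")
    codes = [[1]]
    while len(codes) < sf:
        nxt = []
        for c in codes:
            nxt.append(c + c)                  # child 1: (c, c)
            nxt.append(c + [-x for x in c])    # child 2: (c, -c)
        codes = nxt
    return codes

def spread(symbols, code):
    """Multiply every symbol by all chips of the code."""
    return [s * chip for s in symbols for chip in code]

def despread(chips, code):
    """Correlate each group of len(code) chips with the code."""
    sf = len(code)
    return [sum(chips[i + j] * code[j] for j in range(sf)) / sf
            for i in range(0, len(chips), sf)]

codes = ovsf_codes(4)
user_a = [1 + 1j, -1 + 1j]     # example QPSK symbols (hypothetical data)
user_b = [1 - 1j, 1 + 1j]
# Both users transmit at the same time with different codes.
chips = [x + y for x, y in zip(spread(user_a, codes[1]), spread(user_b, codes[2]))]
recovered_a = despread(chips, codes[1])
recovered_b = despread(chips, codes[2])
print(recovered_a, recovered_b)
```

Because codes of the same spreading factor are mutually orthogonal, each correlation removes the other user completely in this ideal, synchronous, noise-free setting.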
INFLUENCE OF POLARIZATION ON CAPACITY
The channel capacity for different antenna configurations has also been examined [2]. For these investigations, based on indoor simulations using a ray-trace program developed in our institute, simple electric and magnetic dipoles have been used as shown on the right side of Figure 3.
For every simulation, the antenna configurations shown on the right side of Figure 3 are the same for the transmit and the receive side. The main purpose of these investigations was to compare dual-polarized with single-polarized antennas by analyzing the average channel capacity. In Figure 4, the advantage of dual-polarized antenna arrays is demonstrated. For single-polarized arrays, a distance of zero between the array elements cannot be realized, but this case is also depicted for illustration. In this simulated 2 × 2 MIMO channel, the ergodic capacity using dual-polarized elements in the high-SNR range is higher than the capacity of the single-polarized antennas. Furthermore, for small aperture lengths, the gain in average channel capacity of the dual-polarized compared to the single-polarized configuration is higher than for greater aperture lengths. As depicted in Figure 4 as well as on the left side of Figure 3, the capacity of the dual-polarized antennas is independent of the overall aperture of the array. Increasing the number of antennas of both the base and the mobile station(s), as shown in Figure 3 on the right, increases the resulting ergodic capacity of the MIMO system. On the other hand, the contribution of each antenna decreases. Therefore, there is a trade-off between the complexity of the system and the achieved MIMO channel capacity. The use of dual-polarized antenna elements allows the antennas to be turned in any direction without any noticeable performance degradation, which is of great advantage, for example, for hand-held applications. As a consequence of the advantages provided by the use of different polarizations, the SABA testbed is designed with the option of using dual-polarized antennas.
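The text does not state how the capacity values were computed. As a rough illustration only, the sketch below assumes the common equal-power expression C = log2 det(I + (SNR/2) H H^H) for a 2 × 2 channel and averages it over i.i.d. Rayleigh channels to obtain an ergodic value. Both the formula and the Rayleigh model are assumptions; the ray-tracing channels of the study are not reproduced here.

```python
import math
import random

def capacity_2x2(h, snr_linear):
    """Capacity in bit/s/Hz of a 2x2 channel h (list of rows), equal power per transmit antenna."""
    h = [[complex(v) for v in row] for row in h]
    # Entries of the Hermitian matrix G = H H^H
    g11 = abs(h[0][0]) ** 2 + abs(h[0][1]) ** 2
    g22 = abs(h[1][0]) ** 2 + abs(h[1][1]) ** 2
    g12 = h[0][0] * h[1][0].conjugate() + h[0][1] * h[1][1].conjugate()
    a = snr_linear / 2.0
    det = (1 + a * g11) * (1 + a * g22) - (a ** 2) * abs(g12) ** 2
    return math.log2(det)

def ergodic_capacity_rayleigh(snr_linear, trials=2000, seed=1):
    """Average capacity over i.i.d. complex Gaussian channels (assumed model)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    total = 0.0
    for _ in range(trials):
        h = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)]
             for _ in range(2)]
        total += capacity_2x2(h, snr_linear)
    return total / trials

snr = 10.0  # linear SNR, i.e. 10 dB
print(capacity_2x2([[1, 0], [0, 1]], snr))   # two decoupled paths
print(capacity_2x2([[1, 1], [1, 1]], snr))   # fully correlated (rank-one) channel
print(ergodic_capacity_rayleigh(snr))
```

The decoupled channel yields a higher capacity than the rank-one channel even though the latter carries more total power, which is the qualitative effect that decorrelation by orthogonal polarizations is meant to exploit.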
ALGORITHMS
Signal processing for our testbed is based on block processing in the frequency domain. As already discussed, this allows for coexistence of OFDM and single-carrier transmission with frequency-domain equalization (SC-FDE). Both schemes show strong similarities regarding performance and signal processing complexity. However, differences that exist between both transmission schemes can be analyzed and discussed using our testbed. In the following, algorithms that are used for joint detection [3] of transmitted data in the uplink and joint-predistortion in the downlink [4] are presented, starting with the fundamental system model. Throughout, lower-case letters are used for complex-valued scalars, lower-case boldface letters for complex-valued vectors, and upper-case boldface letters for complex-valued matrices. We use $(\cdot)^*$, $(\cdot)^T$, $(\cdot)^H$, $E\{\cdot\}$, and $\mathrm{tr}\{\cdot\}$ for conjugation, transposition, conjugate transposition, expectation, and trace of a matrix, respectively. The Kronecker matrix product is denoted by $\otimes$. The $n \times n$ unit matrix is denoted by $\mathbf{I}_n$. The notation $[\mathbf{X}]_{i,j}$ refers to the element in the $i$th row and $j$th column of a matrix $\mathbf{X}$.
The system model
Block processing is assumed for a multiuser MIMO system with $K$ single-antenna mobile stations and a base station deploying $M$ antenna elements. Each of the $K$ mobile stations is assigned a PSK or QAM symbol stream $\mathbf{d}^{(k)}$ of length $N$, where the stacked overall symbol vector follows as

$$\mathbf{d} = \left[\mathbf{d}^{(1)T}, \ldots, \mathbf{d}^{(K)T}\right]^T.$$

In the following, the covariance matrix of the data symbols is assumed as

$$\mathbf{R}_d = E\{\mathbf{d}\mathbf{d}^H\} = \sigma_d^2 \mathbf{I}_{KN}.$$
That is, the data symbols of each user are assumed to be temporally and spatially uncorrelated. For uplink and downlink transmission, the data symbols of all users are preprocessed before transmission. For the uplink, this preprocessing depends on whether single-carrier transmission or OFDM is used. On the other hand, the transmit symbols also depend on the choice of a spreading scheme that might be used, but this scheme is not described in this section. During downlink transmission, the symbols that are to be transmitted additionally depend on the transmission channel which is used to design filter coefficients for downlink predistortion.
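Next, a MIMO channel matrix between $K$ user antennas and $M$ base station antennas is defined in order to describe the relations between all transmit and all receive antennas. For the uplink, the channel matrix follows as

$$\mathbf{H} = \begin{bmatrix} \mathbf{H}^{(1,1)} & \cdots & \mathbf{H}^{(1,K)} \\ \vdots & \ddots & \vdots \\ \mathbf{H}^{(M,1)} & \cdots & \mathbf{H}^{(M,K)} \end{bmatrix}. \tag{2}$$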
Daniel Borkowski et al.

For all examined transmission techniques, additional transmission of a cyclic prefix of length $W - 1$ before each data stream is assumed, where $W$ denotes the channel length as a multiple of symbol intervals. The cyclic prefix has two functions. First, it prevents contamination of a block by intersymbol interference from the previous block. Second, it makes the received block appear to be periodic with period $N$. Thus, the convolution of a data stream with a complex-valued channel impulse response (CIR) $\mathbf{h}^{(m,k)} \in \mathbb{C}^W$ appears to be circular, which is essential for the proper functioning of the FFT operation. Each of the $N \times N$ dimensional subblocks $\mathbf{H}^{(m,k)}$ in (2) therefore contains the corresponding channel vector $\mathbf{h}^{(m,k)}$ in circulant form. For the downlink, transmission of a cyclic prefix of length $W - 1$ before each data stream is also assumed. At the receiver side, the cyclic prefix is discarded. The total system model and the receive vector $\mathbf{x}$ then follow for the uplink, for example, as

$$\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{n}. \tag{3}$$
The transmit symbol vector $\mathbf{s}$ is determined in the following sections. Equation (3) also contains a stacked noise vector $\mathbf{n}$. Noise is assumed temporally and spatially white throughout this paper, with zero mean and covariance matrix $\mathbf{R}_n = E\{\mathbf{n}\mathbf{n}^H\} = \sigma_{UL}^2 \mathbf{I}_{MN}$ for the uplink and $\mathbf{R}_n = E\{\mathbf{n}\mathbf{n}^H\} = \sigma_{DL}^2 \mathbf{I}_{KN}$ for the downlink.
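The role of the cyclic prefix described above can be verified numerically. The sketch below, with hypothetical helper names and example data, prepends a prefix of length W − 1 to one block, passes it through a linear convolution with a length-W impulse response, discards the prefix at the receiver, and compares the result with the circular convolution of the block and the channel.

```python
def add_cyclic_prefix(block, prefix_len):
    """Prepend the last prefix_len symbols of the block."""
    return block[-prefix_len:] + block if prefix_len else list(block)

def linear_convolve(x, h):
    """What a dispersive channel does to the transmitted sequence."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def circular_convolve(x, h):
    """Circular convolution of period len(x)."""
    n = len(x)
    return [sum(h[j] * x[(i - j) % n] for j in range(len(h))) for i in range(n)]

block = [1, -1, 1j, -1j, 1, 1, -1, -1]     # N = 8 example symbols
h = [1, 0.5, 0.25j]                         # W = 3 example channel taps
prefix_len = len(h) - 1                     # W - 1, as in the text

tx = add_cyclic_prefix(block, prefix_len)
rx = linear_convolve(tx, h)
rx_block = rx[prefix_len:prefix_len + len(block)]   # receiver discards the prefix

expected = circular_convolve(block, h)
max_error = max(abs(a - b) for a, b in zip(rx_block, expected))
print(max_error)
```

A single isolated block is used here, so the first function of the prefix (absorbing interference from the previous block) is not exercised; only the circularity property is shown.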
All transmission schemes that are examined in this paper require channel knowledge in the frequency domain. Using the Fourier matrix $\mathbf{F}_N$ of dimension $N$, with
the uplink channel matrix in the frequency domain follows
The resulting matrix $\boldsymbol{\Delta}_H$ has a block structure that is composed of diagonal submatrices. These diagonal submatrices $\boldsymbol{\Delta}_H^{(m,k)}$ in (5) contain the eigenvalues of the circulant channel submatrices $\mathbf{H}^{(m,k)}$. For implementation, the Fourier matrix and its inverse are replaced by efficient FFT and IFFT algorithms, respectively.
For the downlink, the frequency-domain channel matrix follows as $\boldsymbol{\Delta}_H^T$. That is, reciprocity between uplink and downlink channels is assumed. However, the transmission channels also contain the influence of the transceivers in the base stations and the mobile stations, which, in general, results in nonreciprocal overall channels.
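The diagonalization property stated above can be checked for a single circulant subblock. The sketch below assumes an unnormalized DFT convention and forms F C F^{-1} explicitly with a slow textbook DFT; the normalization of the Fourier matrix in the original equation is not reproduced here, and all function names are hypothetical.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circulant(h, n):
    """n x n circulant matrix whose first column is h padded with zeros."""
    first_col = list(h) + [0] * (n - len(h))
    return [[first_col[(r - c) % n] for c in range(n)] for r in range(n)]

def to_frequency_domain(C):
    """Compute F C F^-1 using DFTs of the columns and IDFTs of the rows."""
    n = len(C)
    cols = [dft([C[r][c] for r in range(n)]) for c in range(n)]   # cols[c][k] = (F C)[k][c]
    return [idft([cols[c][k] for c in range(n)]) for k in range(n)]

N = 8
h = [1, 0.5, 0.25j]                       # example impulse response, W = 3
delta = to_frequency_domain(circulant(h, N))
eigenvalues = dft(list(h) + [0] * (N - len(h)))

off_diag = max(abs(delta[r][c]) for r in range(N) for c in range(N) if r != c)
diag_err = max(abs(delta[k][k] - eigenvalues[k]) for k in range(N))
print(off_diag, diag_err)
```

The diagonal entries equal the N-point DFT of the zero-padded impulse response, which is why an FFT of the channel estimate is sufficient in practice and the matrix products never need to be formed.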
Joint detection in the uplink

Single-carrier transmission
For the single-carrier transmission scheme using frequency-domain equalization (SC-FDE), the uplink stacked transmit data vector $\mathbf{s}$ simply follows as

$$\mathbf{s} = \mathbf{d}. \tag{6}$$

Linear joint detection of the $K$ symbol streams $\mathbf{d}^{(k)}$ in the frequency domain yields symbol estimates
where the matrix $\mathbf{W}$ contains receive filter coefficients. Depending on whether zero forcing (ZF) or minimum mean square error (MMSE) optimization is applied, $\mathbf{W}$ follows as (see [5, 6])
Matrix inversion is the fundamental part of the joint detection operation. To perform it efficiently, permutation matrices can be used to transform the matrix $\boldsymbol{\Delta}_H$ into a block-diagonal form with $N$ blocks of dimension $M \times K$. For notational convenience, the description of the permutation matrices is omitted at this point. Furthermore, note that joint detection in (7) and (8) only holds if the previously assumed covariance matrices $\mathbf{R}_d$ and $\mathbf{R}_n$ are temporally white. The whole single-carrier transmission scheme is depicted in Figure 5.
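After the permutation, each subcarrier can be processed on its own M × K block. The sketch below shows this for one subcarrier and K = 2 users, using the textbook linear filters (H^H H)^{-1} H^H for ZF and (H^H H + αI)^{-1} H^H for MMSE, where α stands for the noise-to-symbol power ratio. These closed forms are an assumption standing in for the filter equation, which is not reproduced in this text; the channel values and function names are hypothetical.

```python
def detector_2users(Hn, alpha=0.0):
    """Linear receive filter (2 x M) for one subcarrier with channel Hn (M x 2).

    alpha = 0 gives zero forcing; alpha > 0 gives an MMSE-type filter.
    """
    M = len(Hn)
    # Gram matrix G = Hn^H Hn + alpha * I (2 x 2, Hermitian)
    g = [[sum(Hn[m][i].conjugate() * Hn[m][j] for m in range(M))
          + (alpha if i == j else 0.0) for j in range(2)] for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]
    # W = G^-1 Hn^H
    return [[sum(ginv[i][k] * Hn[m][k].conjugate() for k in range(2))
             for m in range(M)] for i in range(2)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Example: M = 4 base station antennas, K = 2 users, one subcarrier.
Hn = [[1 + 0.2j, 0.3 - 0.5j],
      [0.4 - 0.1j, 0.9 + 0.3j],
      [-0.2 + 0.6j, 0.1 + 0.1j],
      [0.5 + 0.5j, -0.7 + 0.2j]]
d = [1 + 1j, -1 + 1j]
x = matvec(Hn, d)                     # noise-free received values

W_zf = detector_2users(Hn)
W_mmse = detector_2users(Hn, alpha=0.1)
d_zf = matvec(W_zf, x)
gain_zf = matmul(W_zf, Hn)            # identity: interference fully removed
gain_mmse = matmul(W_mmse, Hn)        # diagonal below 1, off-diagonal nonzero
print(d_zf, gain_mmse[0][0])
```

In this sketch the ZF filter reproduces the transmitted symbols exactly in the absence of noise, while the MMSE-type filter scales the desired symbol by a factor below one and leaves residual interference, trading bias for lower noise enhancement.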
EURASIP Journal on Applied Signal Processing
Figure 5: Single-carrier transmission system in the uplink.
As described in [5, 6] for time-domain equalization, the zero-forcing-based equalizer yields unbiased symbol estimates, completely eliminating intersymbol interference (ISI) and multiple-access interference (MAI), so that the estimates contain only the desired symbols and noise. The MMSE detector, however, leads to biased symbol estimates still containing ISI and MAI. These observations also hold for the frequency-domain equalizer described in this section.
OFDM transmission
Using OFDM transmission, all data symbols of each user are transmitted in parallel over N subcarriers. Compared to the previously described single-carrier transmission, only the IFFTs are shifted from the receiving base station to the transmitting mobile stations. The resulting data streams to be transmitted can be expressed as
Joint detection for OFDM follows as
Here, the same ZF- or MMSE-based joint detection operation as described for single-carrier transmission is applied, with $\mathbf{W}$ in (10) being defined in (8).
Joint predistortion in the downlink
Single-carrier transmission
To compensate for channel influence and to allow for spatial multiplexing while combating ISI and MAI already at the transmitter, joint-predistortion is applied at the transmitter. The stacked transmit data vector s is then obtained as
where the matrix $\mathbf{W}$ contains frequency-domain transmit filter coefficients. The normalization factor $\beta$ is used in order to constrain the overall transmit energy $E_{tr} = E\{\mathbf{s}^H\mathbf{s}\}$ to $E_{tr} = NK\sigma_d^2$, where
Similar to time-domain predistortion in [7], either ZF- or Wiener-based optimization can be applied, where $\mathbf{W}$ follows as
Both approaches minimize the mean square error (MSE) under the constraint of constant transmit energy by using β. The ZF solution additionally follows a zero forcing constraint. Note that the Wiener approach requires knowledge at the base station of noise power at the terminals. At the receiver, the symbol estimates directly follow as
with the received signals merely being compensated for the power normalization at the transmitter using $\beta$. Throughout this paper, uncorrelated data symbols and equal symbol power $\sigma_d^2$ for all users are assumed. Furthermore, uncorrelated noise is assumed at the receiving base station antenna elements in the uplink, as well as at the receiving mobile stations during downlink transmission. Using these two assumptions and, additionally, assuming equal noise power for up- and downlink with $\sigma_{UL}^2 = \sigma_{DL}^2$, the predistortion matrix $\mathbf{W}$ in (13) can be directly obtained from the uplink filter matrix $\mathbf{W}$ in (8) via transposition. Thus, no additional matrix inversion is necessary.
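The transposition relationship can be illustrated for one subcarrier under the reciprocity assumption. In the sketch below the uplink ZF filter is computed once, its transpose is used as the downlink predistortion matrix, and the mobile stations only undo the scalar normalization. The per-subcarrier choice of β (expected transmit energy equal to K for unit-power symbols) is an assumption made for this sketch, since the normalization equation is not reproduced in this text; all names and channel values are hypothetical.

```python
import math

def zf_filter(Hn):
    """Uplink ZF filter (Hn^H Hn)^-1 Hn^H for an M x 2 subcarrier channel Hn."""
    M = len(Hn)
    g = [[sum(Hn[m][i].conjugate() * Hn[m][j] for m in range(M)) for j in range(2)]
         for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]
    return [[sum(ginv[i][k] * Hn[m][k].conjugate() for k in range(2))
             for m in range(M)] for i in range(2)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Uplink channel for one subcarrier: M = 3 base station antennas, K = 2 users.
Hn = [[1 + 0.2j, 0.3 - 0.5j],
      [0.4 - 0.1j, 0.9 + 0.3j],
      [-0.2 + 0.6j, 0.1 + 0.1j]]
K = 2

W_ul = zf_filter(Hn)              # K x M, computed once from the uplink estimate
P = transpose(W_ul)               # M x K downlink predistortion, no new inversion

# Assumed normalization: expected transmit energy K for unit-power symbols.
frob = sum(abs(p) ** 2 for row in P for p in row)
beta = math.sqrt(K / frob)

d = [(1 + 1j) / math.sqrt(2), (-1 - 1j) / math.sqrt(2)]   # unit-power QPSK symbols
s = [beta * v for v in matvec(P, d)]       # signal at the M base station antennas
rx = matvec(transpose(Hn), s)              # reciprocal downlink channel Hn^T
d_hat = [r / beta for r in rx]             # terminals only undo the scaling
print(d_hat)
```

Since the downlink channel is the transpose of the uplink channel, the product of channel and predistortion matrix is the transpose of the uplink product W H, which is the identity for ZF; each terminal therefore receives only its own symbol, scaled by β.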
OFDM transmission
Linear predistortion in OFDM systems can be derived easily from the preceding single-carrier concept by moving the FFTs from the transmitter to the receiver with
The same filter matrix W as in (13) and the same normalization factor β as in (12) can be used. At the receiving mobile stations, the symbol estimates follow as
This approach uses the same normalization factor $\beta$ for one data block as done for single-carrier transmission. That is, when using zero forcing, an identical signal-to-noise ratio (SNR) is obtained for all data symbols in a block. In combination with Wiener filtering, the MSE averaged over the whole data block is minimized, allowing different SNRs for different symbols. These two optimization strategies lead to a system performance where the uncoded bit error rate (BER) of the Wiener approach is worse in the high-SNR range than that of the zero forcing approach. This is contrary to the MMSE filter in the uplink, which always improves the system performance compared to zero forcing. However, when considering coded BER, the Wiener approach always outperforms zero forcing because the few symbols that might exhibit low SNR due to the Wiener filter can be corrected by using proper channel coding. In an uncoded system, those symbols might be lost and cause the degradation of the Wiener filter compared to zero forcing.
In contrast to joint-predistortion for single-carrier transmission, the OFDM-based system can normalize not only the mean transmit power of the whole block by using $\beta$. Since each subcarrier is accessible, the transmit power of each corresponding symbol can also be normalized separately. However, for this approach, the $N$ normalization factors for all subcarriers need to be known at the receiving mobile stations in order to compensate for transmit normalization. This approach is not further considered for implementation since erroneous estimation of $N$ normalization factors additionally degrades system performance.
Extensions towards successive algorithms
For the uplink, detection of $K$ symbol streams can also be achieved successively. That is, symbols of one user are estimated, quantized, fed back, and subtracted from the received signals. If quantization has been done correctly, the interference of this user is thus removed from the received signals so that the reliability of the subsequently detected symbol streams increases. This concept is termed spatial decision-feedback equalization and is part of the well-known BLAST systems. The drawback consists in error propagation when symbols to be fed back are estimated wrongly. Successive joint detection is better suited to be implemented within OFDM systems rather than within SC-FDE, because for OFDM, quantization of already detected symbols, as well as feedback and feed forward filtering, take place in the frequency domain. Thus, there is no need for additional transformation between frequency and time domain as there is for SC-FDE.
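The estimate, quantize, feed back, and subtract steps can be sketched for one subcarrier with two users and two base station antennas. The sketch assumes QPSK, a fixed detection order, a zero-forcing first stage, and a matched-filter second stage; these choices, the function names, and the numerical values are illustrative assumptions, not the scheme implemented on the testbed.

```python
def qpsk_slice(z):
    """Quantize to the nearest QPSK point (+-1 +-1j)."""
    return complex(1.0 if z.real >= 0 else -1.0, 1.0 if z.imag >= 0 else -1.0)

def successive_detect(H, x):
    """H: 2 x 2 channel of one subcarrier (rows: antennas, columns: users)."""
    (a, b), (c, d) = H
    det = a * d - b * c
    # Step 1: zero-forcing estimate of user 0 (first row of H^-1 applied to x)
    z0 = (d * x[0] - b * x[1]) / det
    d0 = qpsk_slice(z0)
    # Step 2: feed back the decision and subtract its contribution
    r = [x[0] - a * d0, x[1] - c * d0]
    # Step 3: user 1 is now interference-free; combine both antennas coherently
    z1 = (b.conjugate() * r[0] + d.conjugate() * r[1]) / (abs(b) ** 2 + abs(d) ** 2)
    return d0, qpsk_slice(z1)

H = [[1 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
sent = (1 - 1j, -1 + 1j)
noise = (0.05 - 0.02j, -0.03 + 0.04j)      # small example disturbance
x = [H[0][0] * sent[0] + H[0][1] * sent[1] + noise[0],
     H[1][0] * sent[0] + H[1][1] * sent[1] + noise[1]]

detected = successive_detect(H, x)
print(detected)
```

If the first decision were wrong, the subtraction in step 2 would add interference instead of removing it, which is the error propagation mentioned above.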
For the downlink, successive predistortion is very similar to the uplink case. The downlink filter matrices can also, under some conditions that are discussed in Section 4.3.1, be directly obtained from uplink filter matrices via transposition. However, the quantization operation is replaced in the downlink by a modulo operation, which periodically extends the complex symbol plane in order to minimize transmit power. The drawback of this approach is symbol ambiguity due to the modulo operation.
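The modulo operation mentioned above can be sketched as follows. The example assumes QPSK points at ±1 on each axis and a modulo interval of width 4 per real dimension, as is common in Tomlinson-Harashima-type precoding; the interval width, the interference value, and the function names are assumptions for illustration.

```python
import math

TAU = 4.0  # assumed modulo interval width for QPSK points at +-1 per axis

def mod_real(v, tau=TAU):
    """Fold v into the interval [-tau/2, tau/2)."""
    return v - tau * math.floor((v + tau / 2) / tau)

def mod_complex(z, tau=TAU):
    """Apply the modulo operation to real and imaginary parts separately."""
    return complex(mod_real(z.real, tau), mod_real(z.imag, tau))

symbol = 1 + 1j
interference = 5.3 - 2.7j          # hypothetical interference known at the transmitter

# Transmitter: pre-subtract the interference, then fold back to limit transmit power.
tx = mod_complex(symbol - interference)
# Channel: the interference adds up again at the receiver.
rx = tx + interference
# Receiver: the same modulo operation resolves the periodic extension.
decoded = mod_complex(rx)

print(tx, rx, decoded)
```

Without the folding step the transmitter would have to send `symbol - interference`, which has a much larger magnitude than `tx`. The price is that `rx` lies on a shifted copy of the constellation (here 4 − 4j away from the symbol), so the receiver must know the modulo interval; this is the symbol ambiguity noted above.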
In general, the main advantage concerning BER performance of both successive detection and successive predistortion is obtained in near-far scenarios where mobile terminals have different distances towards the base station. For examination of coded BER, a straightforward approach consists in sequential processing of symbol estimation and channel coding. That is, even for successive detection, symbols of all users are estimated before feeding this information to the channel coding block. A more advanced, but also more complex, approach integrates channel coding into each feedback loop of successive detection. This approach is usually termed turbo equalization.
Channel estimation
Channel estimation is achieved in the frequency domain by transmission of a preceding training block, followed by several data blocks. That is, each user transmits a cyclically extended training block containing complex-valued random symbols that are known to both the transmitter and the receiver. In the current real-time version of our testbed, training blocks of different users/transmit antennas are transmitted sequentially. For a more efficient way of channel estimation, all training sequences can be transmitted at the same time, but over different subcarriers so that there is no interference between the different training sequences. For this approach one can exploit the finite length W of the channel impulse response. Due to this assumption, each transmitter needs to transmit training symbols only over each (N/W)th subcarrier and the channel transfer function of all subcarriers in between can be estimated via interpolation.
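The interpolation argument can be made concrete: if a user's impulse response has at most W taps, its transfer function sampled at every (N/W)th subcarrier determines all N subcarriers. The sketch below shows this in the noise-free case for two users that occupy interleaved pilot subcarriers. The DFT-based interpolation, the function names, and the example channels are assumptions; the text does not specify which interpolation method is intended.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def interpolate_from_comb(pilot_values, offset, n_sub):
    """Recover all n_sub subcarriers from W equally spaced samples.

    pilot_values[k] is the transfer function at subcarrier offset + k * n_sub / W.
    """
    w = len(pilot_values)
    taps = idft(pilot_values)                       # W-point IDFT gives the W taps
    # Undo the phase ramp caused by starting the comb at 'offset'
    taps = [t * cmath.exp(2j * cmath.pi * i * offset / n_sub)
            for i, t in enumerate(taps)]
    return dft(taps + [0] * (n_sub - w))            # zero-pad and go back to frequency

N, W = 16, 4
step = N // W
h_a = [1, 0.5j, -0.25, 0.125j]                      # example channels with W taps
h_b = [0.8 - 0.1j, 0.3, 0.2j, -0.1]
H_a = dft(h_a + [0] * (N - W))                      # true transfer functions
H_b = dft(h_b + [0] * (N - W))

# User a sends pilots on subcarriers 0, 4, 8, 12; user b on 1, 5, 9, 13.
est_a = interpolate_from_comb(H_a[0::step], 0, N)
est_b = interpolate_from_comb(H_b[1::step], 1, N)

err_a = max(abs(x - y) for x, y in zip(est_a, H_a))
err_b = max(abs(x - y) for x, y in zip(est_b, H_b))
print(err_a, err_b)
```

With noise, the W-point IDFT additionally acts as a smoothing step, but the recovery is then only approximate.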
DIGITAL SIGNAL PROCESSING
Data flow
The system flow of the digital signal processing at the base station for the receive and transmit mode is shown in Figure 6 . Regarding the receive mode, the A/D converter samples and quantizes the analog signal from each antenna. After that, the digital transceiver applies a pulse shape filter (here raised cosine filter) on the data, removes the cyclic extension, and converts the signal digitally down to baseband. Dedicated FFT blocks then transform each data stream to the frequency domain. Afterwards a parallel signal processor, which is described in detail in Section 5.2, performs the equalization algorithms. Regarding single-carrier (SC-FDE) transmission, symbol decision is realized in the time domain; therefore, the data has to be transformed back to the time domain. In OFDM systems, symbol decision is made in the frequency domain; therefore, the data is routed directly to the sink. Concerning OFDM transmission in the transmit mode, the symbols are available in the frequency domain and transmitted directly to the processor. In SC-FDE, a Fourier transform is required before applying joint-predistortion in the frequency domain. The pre-equalized data streams are transformed back to the time domain before being forwarded to the digital transceiver.
For each transmission mode, the equalization is performed on the same digital hardware processor. Only the use of the FFTs and IFFTs is different. The redirection of the data flow is realized by multiplexers.
Parallel hardware architecture
The requirements for the signal processing can be derived from the used algorithms as described in Section 4. All algorithms are based on matrix operations: matrix inversion and multiplication. According to the used frequency-domain system model, groups of subcarriers can be calculated independently of each other in parallel. An efficient hardware architecture should therefore be optimized for matrix and vector operations and should provide parallel processing.
In the SABA MIMO testbed, a software approach for the digital signal processing is implemented. The benefits are (i) use of different algorithms, for example, MMSE, ZF, adaptive filters, (ii) flexibility in matrix dimension, (iii) reduction of hardware complexity.
Furthermore, a novel highly parallel hardware architecture was developed to cope with the high computational burden.
The architecture is based on the SIMD (single instruction, multiple data) principle. As shown in Figure 7, up to thirty-two processor elements work in parallel and execute the same program. A daisy chain serves each processor element with data from the FFT/IFFT. Each processor element is equipped with a local memory unit to store the data streams as well as intermediate results during calculation. To serve the arithmetical unit, three read accesses and one write access per cycle are required. Inside the arithmetical unit, a word length of 18 bits for real and imaginary parts of the complex values is used.
The arithmetical unit is optimized for vector operations and includes a MAC (multiply and accumulate) unit as well as a divider unit. The divider unit calculates the reciprocal of a real value. This operation is required, for example, to normalize intermediate results. The number of divisions in the considered algorithms is very small compared to the number of MAC operations. In this approach, eight processor elements share one divider unit. This reduces the amount of logic resources with a very small impact on the run time of the used algorithms.
Some algorithms use information of adjacent frequency points, for example, frequency tracking or interpolation-based algorithms. This contradicts the SIMD principle. To also realize these algorithms on this hardware architecture, an interconnection network was implemented. This network is realized as a daisy chain, which means that each element is connected to the adjacent one.
One common control unit is used which operates on vector commands. The unit executes the programs stored in a separate program memory and generates the same control sequence for each processor element. A vector command has a size of 64 or 128 bits and includes the control sequence for the arithmetical unit and the information needed to calculate the addresses of vectors. The integrated address generator calculates the addresses of a vector with up to 128 elements based on a start address and a jumping width.
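The address generation described above amounts to a start address plus a constant stride. The sketch below is a behavioral model only; the function name, the row-major matrix layout, and the example addresses are hypothetical and not taken from the hardware description.

```python
MAX_VECTOR_LENGTH = 128   # upper limit stated in the text

def vector_addresses(start, jump_width, length):
    """Addresses touched by one vector command: start, start + jump, ..."""
    if not 0 < length <= MAX_VECTOR_LENGTH:
        raise ValueError("vector length must be between 1 and 128")
    return [start + i * jump_width for i in range(length)]

# Example: a 4 x 4 matrix stored row by row from address 0x100 (assumed layout).
base, n = 0x100, 4
row_1 = vector_addresses(base + 1 * n, 1, n)      # consecutive words
column_2 = vector_addresses(base + 2, n, n)       # every n-th word
print([hex(a) for a in row_1], [hex(a) for a in column_2])
```

With this scheme both rows and columns of a matrix are reachable by a single vector command, which is what matrix multiplication and inversion need.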
The arithmetical unit is optimized for MAC operation. Four real multipliers and two accumulators enable one complex multiplication per cycle. The accumulators for real and imaginary parts can optionally be preloaded with any value stored in the local memory.
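The following behavioral sketch shows how four real products and two accumulators realize one complex multiply-accumulate step, and how a preload value enters the accumulation. It uses integers for clarity and ignores word lengths and pipelining; the function names are hypothetical.

```python
def complex_mac(acc_re, acc_im, a_re, a_im, b_re, b_im):
    """One MAC step: (acc_re + j acc_im) += (a_re + j a_im) * (b_re + j b_im)."""
    p0 = a_re * b_re        # four real multipliers
    p1 = a_im * b_im
    p2 = a_re * b_im
    p3 = a_im * b_re
    return acc_re + (p0 - p1), acc_im + (p2 + p3)   # two accumulators

def dot_product(a, b, preload=(0, 0)):
    """Accumulate a[i] * b[i] over two vectors of (re, im) pairs."""
    acc_re, acc_im = preload
    for (a_re, a_im), (b_re, b_im) in zip(a, b):
        acc_re, acc_im = complex_mac(acc_re, acc_im, a_re, a_im, b_re, b_im)
    return acc_re, acc_im

a = [(1, 2), (3, -1)]       # 1 + 2j, 3 - 1j
b = [(2, 0), (0, 1)]        # 2, 1j
print(dot_product(a, b))
print(dot_product(a, b, preload=(10, -10)))
```

Preloading the accumulators allows expressions of the form c + Σ a_i b_i to be evaluated without a separate addition pass.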
Fixed-point arithmetic
The whole architecture is based on fixed-point arithmetic. It is well known that fixed-point calculation can cause under- and overflows during computation. Matrix inversion is particularly critical in fixed-point arithmetic. To address this, programmable scaling units are implemented at dedicated positions in the MAC and divider units (e.g., between multiplier and accumulator in the MAC unit). The units are designed as programmable shifters which cut off the unused MSBs or LSBs. The shift length is defined in the program. Furthermore, simulations were carried out to determine an optimum word length. In Figure 8, the estimated bit error ratio (BER) of a fixed-point MMSE algorithm is compared with that of a floating-point algorithm. The simulations are based on a worst-case scenario using a system with eight receive antennas and seven transmit antennas. Moreover, correlation between the antenna elements is assumed. The floating-point algorithm achieves a BER of $10^{-2}$ at an SNR of 20 dB. The reserve parameter $B$ indicates the parameter set for the scaling units. The parameter for each scaling unit in the computing chain can be derived from the reserve parameter $B$, which is not further discussed in this paper. As Figure 8 shows, a word length $W_{mem}$ of 18 bits is required to achieve a BER comparable to the floating-point simulations.
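The effect of a programmable scaling unit between multiplier and accumulator can be modeled in a few lines. The sketch below multiplies two 18-bit signed words, drops a programmable number of LSBs from the product, and limits the result to 18 bits again. Saturation on overflow and the number formats in the example are assumptions made for this sketch; the text only states that unused MSBs or LSBs are cut off.

```python
WORD_BITS = 18   # word length used in the arithmetical unit (from the text)

def saturate(value, bits=WORD_BITS):
    """Limit a signed integer to the range of a 'bits'-wide word (assumed behavior)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def to_fixed(x, frac_bits):
    """Quantize a real number to a signed word with frac_bits fractional bits."""
    return saturate(int(round(x * (1 << frac_bits))))

def scaled_multiply(a, b, shift):
    """Full-precision product followed by a programmable right shift."""
    return saturate((a * b) >> shift)

FRAC = 15                                  # example format: 15 fractional bits
a = to_fixed(0.75, FRAC)                   # 24576
b = to_fixed(-0.5, FRAC)                   # -16384

good = scaled_multiply(a, b, shift=FRAC)   # result keeps 15 fractional bits
clipped = scaled_multiply(a, b, shift=10)  # too little scaling: overflow, clipped
print(good, good / (1 << FRAC), clipped)
```

Choosing the shift per position in the computing chain is a trade-off: a shift that is too small overflows, as in the second call, while a shift that is too large discards significant LSBs and raises the quantization noise.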
ANALOG SIGNAL PROCESSING AND CONTROLLING
MIMO algorithms also affect the analog signal circuitry. Compared to SISO systems, there are higher requirements for linearity, dynamic range, LO phase noise, and return loss suppression between antennas and transmitter outputs and receiver inputs, respectively. Some downlink predistortion schemes based on uplink channel matrix estimation require transceiver calibration in order to provide a reciprocal channel matrix [8]. The large number of adjustable parameters, for example, uplink, downlink, calibration mode, attenuation, and antenna polarization, together with the monitoring of the transceiver state, demands an efficient control procedure.
All RF circuits described below have been simulated and designed by common CAD tools and have been realized on soft substrates using SMD components.
Before the analog transceiver module is described in detail, the calibration concept is introduced.
Calibration concept
A very important issue that has to be considered in MIMO antenna systems is the calibration of the front-ends. It is known that for a bidirectional transmission, using downlink predistortion, the reciprocity of the channel has to be provided. Front-end imperfections are the reasons for the nonreciprocity of the system.
The MIMO antenna system description uses an ideal transmission scattering matrix model in the frequency range such as that shown on the left side of Figure 9 with
In this case, the reciprocity of the channel is given if the following conditions are fulfilled:
In (17), $\mathbf{S}_{BM}$ and $\mathbf{S}_{MB}$ represent the uplink and the downlink channel matrix. $\mathbf{S}_{BB}$ and $\mathbf{S}_{MM}$ contain the matrices of the reflection factors of the base and mobile station. $\mathbf{a}_{B,M}$ and $\mathbf{b}_{B,M}$ are the incoming and the outgoing waves at the base or the mobile station. In a real channel, as depicted on the right side of Figure 9, other effects like the mutual coupling of the antennas and the transceiver mismatch influence the overall performance. Regarding these effects, the matrices $\mathbf{S}_{BM}$ and $\mathbf{S}_{MB}$ result in [9]
The diagonal matrices $\mathbf{A}_{XX}$ contain the amplifier coefficients and $\mathbf{V}$, $\mathbf{U}^T$, $\mathbf{W}$, $\mathbf{X}^T$ the antenna mismatching and coupling.
Using the following equations
where the diagonal matrices R_XX contain the reflection coefficients, the reciprocity of the real channel can be achieved by the following tasks [9]. (1) Minimizing the matrices S_yy and R_xx within V, U^T, W, and X^T. This requires a very good matching of the transmit/receive (TR) modules and the antennas, depending on the antenna coupling. (2) Equalizing the responses of the transmitters and receivers to Dirac pulses.
The calibration consists of two parts. The first one is a wideband off-line calibration which is used to compensate the influence of the passive elements, the DA converter, the nonideal transformers, and the IF filter on the signal processing side. The other part is a narrowband on-line calibration of the active components which are causing changes in phase and amplitude of the signals during operation. To perform these two calibration schemes with sufficient accuracy, certain hardware requirements have to be met.
A real MIMO system was modeled and simulated using the scattering matrix form introduced above. Figure 10 shows the results of these simulations, which are based on an indoor scenario at a frequency of 10.5 GHz. The numbers of base and mobile station antennas are 8 and 4, respectively. The modulation used is QPSK. For a 1 dB degradation at a BER of 10⁻³ in comparison to an ideally calibrated and matched system, the unbalance of the magnitude must be lower than 1.4 dB and the unbalance of the phase lower than 10° between all transceivers. The transceiver return loss should be lower than −3 dB and the antenna matching and coupling should both be lower than −10 dB.
For 16-QAM modulation, the requirements for reciprocity become stricter.
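The QPSK tolerances quoted above (magnitude unbalance below 1.4 dB, phase unbalance below 10° between all transceivers) can be checked against measured transceiver gains. The following is a minimal sketch, assuming the complex gain of each transceiver is available from a calibration measurement. The function names and the input format are hypothetical and not part of the testbed software.

```python
import cmath
import math

# Tolerances quoted in the text for QPSK (1 dB degradation at a BER of 1e-3).
MAX_MAGNITUDE_UNBALANCE_DB = 1.4
MAX_PHASE_UNBALANCE_DEG = 10.0


def unbalance(gains):
    """Return (magnitude spread in dB, phase spread in degrees).

    `gains` is a list of complex transceiver gains (hypothetical measurement
    results). All values are referred to the first transceiver; the phase
    spread assumes all phases lie well within +/-180 degrees of it.
    """
    ref = gains[0]
    mags_db = [20.0 * math.log10(abs(g / ref)) for g in gains]
    phases_deg = [math.degrees(cmath.phase(g / ref)) for g in gains]
    return max(mags_db) - min(mags_db), max(phases_deg) - min(phases_deg)


def within_tolerance(gains):
    """True if both the magnitude and the phase unbalance meet the QPSK limits."""
    mag_db, phase_deg = unbalance(gains)
    return (mag_db < MAX_MAGNITUDE_UNBALANCE_DB
            and phase_deg < MAX_PHASE_UNBALANCE_DEG)
```

For 16-QAM, the two limits would have to be tightened; the text gives no numerical values for that case.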
Block diagram of the transceiver
The block diagram of Figure 11 gives an overview of the analog signal processing of a transceiver. Each transceiver of our demonstrator consists of filters, amplifiers, controlled attenuation circuits, mixers, LO frequency processing, voltage/current supply, sensors, and control and surveillance circuits.
For calibration purposes, each transceiver is provided with a reciprocal third channel. This signal path has an individually measured and recorded transmission behavior. During calibration mode, it helps to supply the receiver input or to clean up the transmitter output via the combined antenna/calibration switch.
Only one analog frequency conversion is employed, resulting in less effort for LO generation and a smaller number of components (e.g., filters and mixers). The correspondingly larger filter losses (5-8 dB) can easily be compensated by low-cost amplifiers at low power levels. In contrast to the commonly implemented complex I-Q conversion using two A/D converters, the present concept utilizes subsampling conversion, so that problems with mixer and I-Q imbalance can be avoided. After the A/D conversion, a digital signal of about 30 MHz bandwidth with a nominal resolution of 14 bits is available, which is further shifted down to the complex baseband. The same principle in reverse direction is used for the transmitter.
All transceivers must be supplied with a common phase-locked oscillator signal. At higher frequencies, base station transceivers should have short connections to their antennas, while the antennas themselves may be spaced several to many wavelengths apart. Moreover, terminal transceivers should have the same architecture as the base station transceivers. Therefore it is advisable to generate the local oscillator signal on the transceiver board. To lock the phase of the LOs, a 10 MHz reference signal is used, which is generated by a low-noise TCXO in the centralized hardware and distributed via cables. The output of the 10 GHz VCO is amplified and distributed to the three frequency converters of the transmitter, receiver, and calibration path.
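To illustrate the subsampling concept, the sketch below lists the sampling-rate ranges that allow alias-free bandpass sampling of a 30 MHz wide signal. It assumes the signal is centered at the IF of 175 MHz that is given in the description of the control units. The sampling rate actually used in the testbed is not stated in this section, so the 100 MHz value in the usage note is purely illustrative.

```python
def valid_subsampling_rates(f_center_hz, bandwidth_hz):
    """List (n, fs_min, fs_max) ranges for alias-free bandpass sampling.

    Classic bandpass-sampling condition for a band [f_low, f_high]:
        2*f_high/n <= fs <= 2*f_low/(n-1),  n = 1 .. floor(f_high/bandwidth)
    """
    f_low = f_center_hz - bandwidth_hz / 2.0
    f_high = f_center_hz + bandwidth_hz / 2.0
    n_max = int(f_high // bandwidth_hz)
    ranges = []
    for n in range(1, n_max + 1):
        fs_min = 2.0 * f_high / n
        # n == 1 is ordinary Nyquist sampling and has no upper limit.
        fs_max = float("inf") if n == 1 else 2.0 * f_low / (n - 1)
        if fs_min <= fs_max:
            ranges.append((n, fs_min, fs_max))
    return ranges


def alias_center(f_center_hz, fs_hz):
    """Frequency in [0, fs/2] at which the IF center appears after sampling."""
    f = f_center_hz % fs_hz
    return min(f, fs_hz - f)
```

For example, `valid_subsampling_rates(175e6, 30e6)` returns six ranges, the lowest being about 63.3 MHz to 64 MHz. With an assumed rate of 100 MHz, `alias_center(175e6, 100e6)` places the band center at 25 MHz, from where it can be shifted digitally to the complex baseband.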
A factor which causes a considerable degradation of system performance is phase noise (PN). Phase noise is introduced during the up- and down-conversion of the signal. It was shown by simulations in [8] that the main degradation is caused at the base station. However, when the phase noise at the base station is coherent, which means that the T/R branches share one common LO, less interference is introduced. In this case a common phase error (CPE) occurs, which can be easily estimated and compensated.
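A common phase error rotates all received symbols by the same angle, which is why it is simple to estimate. The sketch below shows one generic way to do this using known pilot symbols. The use of pilots and the function names are assumptions for illustration; the text does not specify the estimator used in the testbed.

```python
import cmath


def estimate_cpe(received_pilots, known_pilots):
    """Estimate the common phase error (in radians) from pilot symbols.

    The received pilots are correlated with the known pilots; the angle of
    the accumulated correlation is the phase rotation common to all symbols.
    """
    correlation = sum(r * complex(p).conjugate()
                      for r, p in zip(received_pilots, known_pilots))
    return cmath.phase(correlation)


def compensate_cpe(symbols, cpe_rad):
    """Derotate all symbols by the estimated common phase error."""
    correction = cmath.exp(-1j * cpe_rad)
    return [s * correction for s in symbols]
```

Summing the correlation before taking the angle averages out independent noise on the individual pilots, so the estimate improves with the number of pilots.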
Due to the calibration requirements, all transceivers should behave in the same manner under fluctuations of the power supply and temperature and under variations of the components used. To guarantee a stable operating point of the amplifiers, stabilizing circuits with mounted sensors are used. The operating point of these amplifiers can also be adjusted manually. Furthermore, these circuits are able to warn if any component is not working properly. The maximum transmitter output power is 20 dBm for linear operation, realized in five amplification stages. The minimum receiver input sensitivity is −83 dBm; the received signal is amplified by six stages up to +6 dBm. The noise figure of the receivers is about 1 dB.
Attenuators
The attenuators together cover an attenuation range of about 70 dB with a 1 dB resolution. They consist of analog and passive components, which are digitally adjusted to a certain attenuation value. Another requirement for these components is a short switching time. The reaction time of the digital control and the analog attenuation circuit must be short, that is, shorter than the guard period of the used chip. The attenuators at RF have to realize attenuation steps of about 0, 20, and 40 dB. These values are allowed to have large tolerances of several dB. However, they should be constant within one calibration period. As no precise attenuation values are required, the circuit of Figure 12 has been chosen, which is distinguished by a very simple structure using only four PIN diodes for the required three stages. The PIN diodes are connected between the microstrip transmission lines and the stubs, marked with the numbers 1 to 4 in the layout. The achieved attenuations are about 1, 22, and 42 dB with return losses at both ports better than 20 dB at all stages.
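The combination of coarse RF steps (about 0, 20, and 40 dB) with an overall range of about 70 dB at 1 dB resolution suggests a coarse/fine split. The sketch below illustrates such a split. It assumes a separate fine attenuator covering 0 to 30 dB in 1 dB steps; this value is inferred from the stated total range and is not given in the text.

```python
# Nominal RF attenuation steps from the text; the achieved values
# (about 1, 22, and 42 dB) deviate by a few dB, which is tolerated.
RF_STEPS_DB = (0, 20, 40)
# Assumption: a fine attenuator covers the remaining range in 1 dB steps.
FINE_MAX_DB = 30


def split_attenuation(target_db):
    """Split a target attenuation (0..70 dB) into (rf_step_db, fine_db)."""
    target = int(round(target_db))
    if not 0 <= target <= RF_STEPS_DB[-1] + FINE_MAX_DB:
        raise ValueError("target attenuation outside the 0..70 dB range")
    # Use the largest coarse RF step that does not exceed the target and
    # leave the remainder to the fine attenuator.
    rf_step = max(step for step in RF_STEPS_DB if step <= target)
    return rf_step, target - rf_step
```

In practice, the deviation of each RF step from its nominal value would be absorbed by the calibration, since the steps only need to stay constant within one calibration period.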
Antenna and calibration switches
Because the transmission scheme is based on TDD, the antennas must be connected alternately to the receiver input and to the transmitter output. The switches used for this purpose are designed similarly to the attenuator circuits and are digitally controlled as well. For the calibration mode, the switch has to provide a further port, serving as an input or output for the calibration RF signal. The specifications of this four-port component result from the following requirements: low insertion loss in the transmit or receive path (< 2 dB), high isolation within the corresponding off-state (> 40 dB), connection of the calibration port to the transmitter output or receiver input with controllable attenuation factors for supplying the calibration signals at different levels, sufficiently low return loss at all ports during all states of the switch, and a short switching time to fulfill the timing requirements. Figure 13 shows the antenna and calibration switch realized using microstrip technology. The switch consists of 90° hybrids that are connected in series. The PIN diodes are connected in such a way that they cause a short circuit at various places in the circuit to realize the different operation states of the switch.
Control units, system monitoring, and power supply
A microcontroller is used to monitor and control the transceivers. Today's microcontrollers are not only low-cost and very small, but also easily programmable and very fast. The microcontroller used here can process any instruction mnemonic and perform an 8×8 multiplication within a single cycle of 100 ns. The program memory is based on FLASH-ROM so that the code can be reprogrammed on demand without any need to change the circuit. A multichannel A/D-D/A converter and an EUSART (enhanced universal synchronous asynchronous receiver transmitter) are integrated in the microcontroller. Different states can be read and monitored through the analog interface, and sensor values, for example, the output power or the fluctuations of the operating voltage, can be evaluated and processed. The EUSART is a freely programmable serial interface unit which enables data transfer to the FPGA. The microcontroller receives the control information from the FPGA via this fast bidirectional serial interface and returns surveillance information to the FPGA through a feedback channel over the same interface. This enables the FPGA to supply each transceiver independently with control data. Some of the control information is listed below:
(i) operating state (down- or uplink, calibration mode), (ii) current attenuation value of the transmitter or the receiver branch, (iii) polarization state of the antenna, (iv) data delays of the cables used for synchronization and equalization in all switch states.
The surveillance information of the transceivers includes, for example, (i) the power currently transmitted by the antenna, (ii) malfunction of circuit components, (iii) fluctuations of the operating voltage, (iv) the current state of the switching circuits, (v) data failure in the system.
The serial data channels are realized with two coaxial cables, one for each direction. The data rate is 10 Mbit/s in each direction, and the data format is much the same as in the RS232 protocol. The same coaxial cables also carry the IF signals for both directions (up- and downlink). The IF signal is a passband signal at f_IF = 175 MHz with a bandwidth of f_BW = 30 MHz. The serial control signals, on the other hand, are lowpass signals with a cut-off frequency of 100 MHz, so they can be separated from the IF signals by filtering.
Additional functions have also been implemented in the microcontroller to simplify the off-line and the on-line calibration. The microcontroller is able to check every state which is needed to perform a calibration, thus making the electrical measurement of the transceivers more convenient.
The power supply of the transceivers is implemented by means of DC/DC converters. They generate several high-power output voltages from a single input voltage. Some of the advantages of such a circuit are that it does not need a negative input voltage and that even large fluctuations of the input voltage do not affect the power dissipation. In general, the power dissipation of such circuits is much lower than that of circuits with linear regulators, and the output power can be monitored more conveniently. Instead of a common two-wire line, a coaxial cable is used to supply both the input voltage and the f_ref = 10 MHz reference frequency. A filter separates the reference frequency from DC.
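The frequency multiplexing on the data cables described above (serial control data as a lowpass signal, IF as a passband signal) can be verified with a short calculation. The sketch below uses only the figures given in the text and assumes that 175 MHz is the center of the 30 MHz wide IF band. The helper names are hypothetical.

```python
F_IF_HZ = 175e6            # IF frequency from the text, taken as band center
F_BW_HZ = 30e6             # IF bandwidth from the text
CONTROL_CUTOFF_HZ = 100e6  # cut-off frequency of the serial control signal


def bands_overlap(band_a, band_b):
    """Return True if two (f_low_hz, f_high_hz) bands share any frequency."""
    return band_a[0] < band_b[1] and band_b[0] < band_a[1]


def guard_band_hz():
    """Spectral gap between the control lowpass band and the IF passband."""
    if_low = F_IF_HZ - F_BW_HZ / 2.0
    return if_low - CONTROL_CUTOFF_HZ


if_band = (F_IF_HZ - F_BW_HZ / 2.0, F_IF_HZ + F_BW_HZ / 2.0)
control_band = (0.0, CONTROL_CUTOFF_HZ)
```

Under these assumptions the IF occupies 160 MHz to 190 MHz, which leaves a gap of 60 MHz above the control signal for the separating filters.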
Realization of RF components
To demonstrate the realization and integration of the RF components, a complete RF transmitter/receiver branch is depicted in Figure 14. The operation frequency is f_RF = 10.5 GHz as mentioned above. Two different branches can be distinguished: the branch on top is the receiver and the other one is the transmitter. The receiver consists of low-noise amplifiers (LNAs) at the right input, a controlled attenuation circuit followed by another amplifier stage, and a passband filter. One more amplifier stage is at the end of the receiver branch. From there, the signal is directed to the IF mixer. The transmitter branch, beginning from the left, consists of a two-stage high-power amplifier followed by a passband filter and a balanced amplifier stage. A Wilkinson power divider connects the amplifier stage with a controlled attenuation circuit. One more divider connects the previous circuits to the final amplifier stage, which is also balanced. At the end of the circuit, a coupler allows the transmission power to be monitored.
SUMMARY AND OUTLOOK
In this paper, the real-time MIMO testbed SABA has been presented. A system concept has been described to perform equalization algorithms in the frequency domain. The digital signal processing is based on a novel highly parallel processor architecture using the SIMD principle to compute these algorithms in real time. Furthermore, the architecture enables single-carrier and OFDM transmission on the same hardware platform. A MIMO-specific calibration concept has been presented, and the concept and design of the analog hardware have been described.
The 25 MHz RF bandwidth is divided into 256 subcarriers. With 16-QAM modulation, each data stream achieves a raw data rate of 100 Mbit/s. The carrier frequency of 10.525 GHz and the exploitation of multipath fading target typical WLAN applications.
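The quoted raw data rate follows directly from the bandwidth and the modulation order. The sketch below reproduces this arithmetic under the simplifying assumption of one symbol per second per hertz and no overhead for guard intervals, pilots, or coding. The function names are illustrative.

```python
import math


def raw_data_rate_bps(bandwidth_hz, constellation_size):
    """Raw bit rate per stream, assuming one symbol per second per hertz
    of bandwidth and no guard-interval, pilot, or coding overhead."""
    bits_per_symbol = int(math.log2(constellation_size))
    return bandwidth_hz * bits_per_symbol


def subcarrier_spacing_hz(bandwidth_hz, num_subcarriers):
    """Spacing obtained when the bandwidth is split evenly over the subcarriers."""
    return bandwidth_hz / num_subcarriers


rate = raw_data_rate_bps(25e6, 16)         # 4 bits/symbol * 25 MHz = 100 Mbit/s
spacing = subcarrier_spacing_hz(25e6, 256)  # about 97.7 kHz
```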
Using this testbed, future transmission schemes for spatial multiplexing and multiuser systems can be tested and evaluated. Due to bidirectional transmission, uplink channel knowledge can be reused for downlink predistortion. Thus, among different transmission schemes, the promising approach of predistortion can be tested under realistic conditions.
The testbed is still under development. Currently, the system consists of two transmitters and two receivers enabling joint detection of two spatially multiplexed data streams. Joint predistortion can currently be used only for the same transmission direction because the switch and the controlling part for bidirectional transmission are still in progress. However, first performance measurements yield an average uncoded BER of 10⁻² for 16-QAM transmission at 25 dB SNR.
In the next step, the system will be extended to 2 × 2 bidirectional transmission and after that to a 4 × 4 MIMO system. Furthermore, channel coding algorithms will be implemented.
Since the transmission quality varies with the channel, adaptively choosing between spatial multiplexing and conventional orthogonal transmission schemes such as TDMA is desirable. Therefore, the physical layer provided by our MIMO testbed will be combined with an adaptive MAC layer. The MAC layer functionality will be implemented in the existing FPGA system and will provide real-time capability. The hardware platform for the MAC layer will be an FPGA module with two embedded PowerPCs on board. This extension is funded by the German Research Foundation (DFG).
