The design and implementation of the Virginia Tech Space-Time Advanced Radio (VT-STAR), a multiple-antenna-element space-time (ST) processing prototype testbed, are presented. The testbed is a research tool for comparing practical and theoretical performance metrics (e.g., throughput, link reliability) in different wireless channel conditions. The prototype is built around software-defined radio (SDR) concepts on a DSP platform and provides the flexibility to implement various forms of ST techniques. Different components of the system are described in detail, including the software implementation, I/O schemes with custom hardware, and data transfer mechanisms between the DSP and the host PC. Two example realizations are presented: a real-time demonstration and an offline measurement tool. Finally, some representative measurement results obtained in indoor environments are presented. These results show VT-STAR to be a promising tool for performing MIMO experiments and generating channel measurements that can complement simulation studies in this area.
INTRODUCTION
With the integration of Internet and multimedia applications in next-generation wireless communications, the demand for reliable high-data-rate services is growing rapidly. The wireless channel introduces a variety of impairments to the transmitted signal, including large-scale and small-scale fading, channel-induced intersymbol interference (ISI), noise, and multiuser interference. To mitigate these phenomena, diversity can be exploited to enhance performance over a broad range of channel realizations. Space-time coding (STC) schemes implement multiple forms of diversity by combining channel code design with the use of multiple transmit and receive antennas, thereby creating a multiple-input multiple-output (MIMO) channel. The encoded data is split into n_T streams that are transmitted simultaneously from n_T transmit antennas. The received signal is a linear superposition of these simultaneously transmitted symbols, corrupted by noise, interference, and channel-induced ISI. Space-time decoding algorithms using channel estimation techniques are incorporated at the receiver to achieve diversity and coding gains. Various techniques that exploit the capabilities of MIMO channels have been proposed in the literature. Among them, the main classes are (i) the Bell Labs layered space-time architecture (BLAST), proposed by Foschini et al. [1]; (ii) space-time trellis codes (STTC), proposed originally by Tarokh et al. [2]; and (iii) space-time block codes (STBC), proposed originally by Alamouti [3].
While BLAST technology strives towards increasing the throughput of wireless systems by an order of magnitude, space-time codes allow for improved link reliability by exploiting the spatial and temporal diversity of the MIMO channel. Space-time codes have been adopted recently by the 3G standardization committee for implementation as one of the transmit diversity modes in 3G wireless networks [4] .
A hardware platform is desirable for fully exploring the implementation details of STTC, STBC, or BLAST. While it is possible to study the performance of these algorithms in simulation, the assumptions inherent to simulation mean that the simulated performance may not match that of a practical real-world system. In most of the work in this area, researchers have assumed ideal timing and phase tracking at the receiver, as well as perfect channel estimation, in their simulations. In practical systems, however, these assumptions are not realistic. In fact, the performance of a space-time architecture relies heavily on an accurate channel tracking process [5]. In order to explore the multiple aspects of MIMO systems described above, the goal of the Virginia Tech Space-Time Advanced Radio (VT-STAR) [6] is to create a platform that allows both the evaluation of the channel and the implementation of various space-time algorithms. By employing software-defined radio (SDR) concepts, a variety of baseband configurations can be implemented in software with minimal nonprogrammable hardware; furthermore, these software modules may be leveraged by future research activities.
Several MIMO testbed systems have been reported in the recent literature, and we briefly summarize some of them here. It is possible to measure some of the characteristics of the channel using multiple elements only at the receiver. To this end, a system with 1 antenna element at the transmitter and 8 antenna elements at the receiver, denoted a 1 × 8 system, operating at 2.4 GHz was developed at Ohio State University to measure the wireless channel in receive diversity settings [7]. Another approach to measuring the characteristics of the MIMO channel is to use a wide variety of single-input single-output (SISO) measurements under static channel conditions [8]. By performing this series of measurements, the authors claim to measure the behavior of a MIMO channel. In [9], a 4 × 4 MIMO system was created that performs pseudo-parallel transmissions (a switch cycles through the antennas every 200 microseconds) and parallel reception of the signals in order to estimate the MIMO channel. Wallace and Jensen [10] implemented a 4 × 4 MIMO system with a wide variety of antenna geometries that was limited to data collection only. Yu et al. [11] reported the use of an 8 × 8 system for characterizing narrowband indoor propagation channels at 5.2 GHz. The authors in [12] report the field test of a 4 × 4 system with 30 kHz bandwidth in outdoor mobile environments; the MIMO measurements were based on transmitting separate orthogonal Walsh sequences from each transmit antenna. In [13], the spectral efficiencies for a BLAST-based communication system were verified by outdoor channel measurements using a 5 × 7 system developed at Bell Labs. The testbed used a 2.44 GHz narrowband system in which five narrowband frequencies were transmitted simultaneously from the five transmitting antennas. In [14], the authors reported a rapid prototyping system using FPGAs for implementing a 4 × 4 BLAST system over the UMTS standard with 5 MHz bandwidth.
The authors in [15] report a 1 × 8 MIMO channel measurement system that was used to emulate multiple virtual-antenna operation and to study the capacity of both frequency-flat and frequency-selective channels at 5.2 GHz. A 3 × 3 broadband 20 MHz V-BLAST-based MIMO-OFDM prototype was developed for the 802.11a standard in [16], where digital downconversion and signal conditioning were implemented on FPGA boards and the signal processing was done offline on collected data. A simple Alamouti scheme using QPSK modulation for 2 × 2 STBC transmission was prototyped on FPGA boards [17] and verified on a wireless channel emulator (rather than through real-time over-the-air experimentation). The authors in [18] presented three types of MIMO testbeds developed at UCLA, the first two of which were based on offline processing while the remaining one was implemented on ASIC chips to provide real-time operation.
Two common characteristics emerge from the review of the aforementioned testbeds. First, the majority of the prototypes reported were designed specifically for channel measurements [7, 8, 9, 10, 11, 12, 13, 14, 15, 16], in order to study the improvement in MIMO channel capacity and the effect of correlation between the antennas, and to verify simulation and analytical results. Second, the remaining prototypes were developed to showcase the requirements for implementing different MIMO algorithms in real time [17, 18]. The VT-STAR system was built both to allow channel measurements and to demonstrate real-time, reconfigurable implementation aspects on the same DSP platform through software radio concepts. A DSP-based system provides reconfigurability, rapid prototyping, and low-cost implementation, although the supported data rate may not reach that of an ASIC implementation. Because of its low cost, the VT-STAR system has also proven to be a small-budget educational tool that helps students understand practical implementation issues in MIMO systems and enhances their knowledge of capacity improvement in a real channel environment.
The remainder of the paper is organized as follows. Section 2 provides an overview of the system architecture. Sections 3 and 4 describe the transmitter and receiver architectures, respectively, addressing system operating modes, RF front ends, multichannel data conversion, and the space-time coding algorithms implemented in baseband. Section 5 presents representative capacity results measured using VT-STAR. Finally, Section 6 concludes the paper.
SYSTEM ARCHITECTURE
Some form of programmable processing is necessary to implement a variety of space-time algorithms. Two primary options offer both programmability and high performance: the field-programmable gate array (FPGA) and the digital signal processor (DSP). While an FPGA is a powerful platform that can provide higher performance than a DSP, it suffers from one major drawback: a difficult programming interface. The goal of VT-STAR is to support the work of a variety of wireless engineering researchers, and it is not reasonable to expect every user of the system to have the proficiency in very high-speed integrated circuit hardware description language (VHDL) needed to implement their algorithms on an FPGA. A DSP, on the other hand, can be programmed in a high-level language such as C, which is well understood by the vast majority of wireless engineers, and can be programmed using floating-point arithmetic, significantly reducing the complexity of the software. The drawback of a DSP is that it is not as fast or computationally efficient as an FPGA, which limits the complexity of the real-time algorithms that can be tested on the system. An efficient design would include both an FPGA and a DSP with the functions partitioned appropriately; however, the main focus in this work was on a short development cycle for the first prototype. One of the fastest floating-point platforms available, the Texas Instruments TMS320C67 DSP [19], which is capable of about 1 GFLOPS, was selected as the computational platform. Although a powerful processor was selected as the core of VT-STAR, its real-time data exchange (RTDX) feature also allows it to operate as an acquisition unit that stores the raw received data vectors. Algorithms that are beyond the capabilities of the real-time processor, or research that does not have real-time demands, may be implemented using postprocessing.
VT-STAR architecture, described in Figure 1 , is based on a 2 × 2 antenna element array, which allows the exploitation of transmit and receive diversity mechanisms at the signal processing level.
The processing on the transmitter side is carried out with a TI TMS320C67 (50 MHz, 900 MFLOPS max) DSP starter kit (DSK), while that on the receiver side is carried out with a TI TMS320C670 (33 MHz, about 1 GFLOPS) EVM. The radio frequency (RF) transmit and receive front ends accommodate a multichannel two-stage up- (and down-) conversion between the RF section, which is centered at 2050 MHz, and the baseband section. The VT-STAR operating frequency of 2050 MHz was chosen because its propagation characteristics are similar to those of the US PCS band, the worldwide 3G radio bands, and the US 2.4 GHz unlicensed band. Performance improvements demonstrated in the 2050 MHz band by VT-STAR should therefore be realizable by wireless communication systems operating in these nearby bands. The system bandwidth at the baseband level spans up to 750 kHz. This bandwidth constraint stems from the design choice of the multichannel ADC, which has a maximum sampling rate of 1.5 MSPS per channel. Four identical and time-synchronized TI THS5661 EVMs, connected to the C67 DSK through custom interface boards, perform the digital-to-analog conversion (DAC). A multichannel TI THS1206 EVM, mated to the TMS320C67 EVM without an external interface board, performs the analog-to-digital conversion (ADC) on the receiver side.
The core algorithms, implemented on TMS320C67 floating-point DSP processors, include space-time encoding along with modulation and pulse shaping at the transmitter side, and matched filtering, space-time processing, automatic gain control (AGC), channel estimation, timing recovery, and maximum likelihood decoding at the receiver side. The RTDX feature of the C67 supports host-target communications at the receiver side, and offers both real-time monitoring of physical layer parameters (e.g., bit error rate, diversity gain, constellation diagrams) and data acquisition operation. A host PC, which runs a multithreaded application to manage a Matlab session, is used to display the physical layer parameters or to perform postprocessing of stored data. Table 1 summarizes the key parameters used in the design of VT-STAR.
TRANSMITTER
The component layout of the VT-STAR transmitter is shown in Figure 2. The transmitter is composed of three separate sections: the processing core, the data interface, and the radio hardware. The processing core is a C67-based DSK, providing the processing backbone to generate baseband D-STBC-encoded symbols that are synchronously transmitted to the dual RF chains. The data interface is composed of multiple DACs, since a single multichannel commercial DAC board or EVM was not available. Four THS5661 DAC boards were operated in parallel to emulate a 4-channel DAC. The THS5661 DAC board is a relatively simple EVM running at a sampling rate of up to 100 MSPS with 12-bit input data resolution. Time synchronization between the DACs was maintained by driving them from a single clock from the DSK. Finally, the analog output signals were fed to the RF chains, where they were upconverted to the RF carrier frequency of 2.05 GHz. Phase synchronization between the RF chains was maintained by driving them with the same local oscillator (LO).
D-STBC algorithm on the transmitter
The STBC algorithm implemented on VT-STAR is the differential-STBC (D-STBC) with simple maximum likelihood (ML) detection [20] . D-STBC has the main advantage of rendering carrier phase recovery and channel estimation unnecessary. This feature allows for a far simpler implementation of STBC as the first prototype. The functional blocks that were implemented include QPSK and M mappings, differential encoding, and STBC.
Software implementation of D-STBC on the C67
Prior to implementing the algorithms on the DSP, a complete link-level simulation was developed in Matlab. The simulation tools played an important role in the design process of the radio, providing verification of system-level issues such as performance versus complexity tradeoffs. These tools also acted as a source of test vectors for validating the different DSP functional blocks, simplifying the debugging of the DSP code. The flow diagram of the software implementation on the C67 is shown in Figure 3. A pseudonoise (PN) generator was used to generate an m-length PN sequence that acted as the input information stream. The information bits are modulated by QPSK mapping and encoded by the core D-STBC processing. The resulting baseband complex symbols, I_1, Q_1 for antenna 1 and I_2, Q_2 for antenna 2, were individually pulse shaped by square-root raised-cosine (SRRC) filters with a rolloff factor of 0.35. The pulse-shaping filters are finite impulse response (FIR) filters with 19 taps. Four filters (I and Q for each of the two antennas) with an oversampling factor of 3 were implemented. Simulation results indicated that oversampling at 3 samples/symbol would suffice, resulting in less than 0.5 dB degradation compared to the performance of the system with 4 samples/symbol. The design choice of 3 samples/symbol allowed us to reduce the processing load and increase throughput with minimal penalty in performance. Finally, the filter outputs were formatted in the data-packing segment to match the output interface requirements. This segment is described in detail below.
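As an illustration of the pulse-shaping stage, the following Python sketch computes a 19-tap SRRC filter with rolloff 0.35 at 3 samples/symbol using the standard closed-form SRRC impulse response, and applies it to a zero-stuffed symbol stream. The function names `srrc_taps` and `pulse_shape`, the unit-energy normalization, and the plain direct-form convolution are assumptions made for this example; the actual VT-STAR filters were written in C67 assembly and their exact coefficients are not given here.

```python
import math

def srrc_taps(num_taps=19, sps=3, beta=0.35):
    """Square-root raised-cosine taps (unit energy); time is in symbol periods."""
    half = num_taps // 2
    taps = []
    for n in range(-half, half + 1):
        t = n / sps
        if n == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif abs(abs(4.0 * beta * t) - 1.0) < 1e-9:
            # Removable singularity at t = +/- 1/(4*beta)
            h = (beta / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * beta))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * beta)))
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            den = math.pi * t * (1.0 - (4.0 * beta * t) ** 2)
            h = num / den
        taps.append(h)
    norm = math.sqrt(sum(h * h for h in taps))
    return [h / norm for h in taps]

def pulse_shape(symbols, taps, sps=3):
    """Insert sps-1 zeros after each symbol, then apply the FIR filter."""
    up = []
    for s in symbols:
        up.append(s)
        up.extend([0.0] * (sps - 1))
    out = []
    for k in range(len(up) + len(taps) - 1):
        acc = 0.0
        for i, h in enumerate(taps):
            j = k - i
            if 0 <= j < len(up):
                acc += h * up[j]
        out.append(acc)
    return out
```

In this sketch one filter instance would be applied to each of the four real streams (I and Q for each antenna), mirroring the four filters described above.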
A parallel output scheme is necessary to maintain time synchronization across the antenna elements while transmitting I_1, Q_1 and I_2, Q_2. The C67 DSK has an external memory interface (EMIF) bus J that supports parallel I/O of a 32-bit word. Since four independent DACs have to be addressed with a single 32-bit word, the 32-bit wide I_1, Q_1, I_2, and Q_2 words were truncated to 8-bit words and then concatenated to form a single 32-bit wide transmitter (TX) word. The DACs were addressed through memory-mapped addressing in the CE1 space of the DSP. The related timing parameters, for example, hold time and rise time, were checked to ensure that they matched the timing specifications of the data converters. The write operations generate a periodic control signal that is used as an external clock for the DACs.
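The data-packing step can be sketched as follows in Python. The byte order (I_1 in the most significant byte, Q_2 in the least significant byte) and the function names are assumptions for illustration only; the actual bit assignment on the interface boards is not specified here.

```python
def pack_tx_word(i1, q1, i2, q2):
    """Concatenate four 8-bit samples into one 32-bit TX word.

    Assumed layout (hypothetical): [I1 | Q1 | I2 | Q2], I1 in bits 31..24.
    Each input is masked to its low 8 bits before packing.
    """
    word = 0
    for value in (i1, q1, i2, q2):
        word = (word << 8) | (value & 0xFF)
    return word

def unpack_tx_word(word):
    """Split a 32-bit TX word back into its four 8-bit fields."""
    return tuple((word >> shift) & 0xFF for shift in (24, 16, 8, 0))
```

Writing all four samples in a single 32-bit bus transaction is what keeps the four DAC inputs aligned in time, since they are latched by the same write strobe.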
A certain level of code optimization was achieved by writing the computationally dominant pulse-shaping filters in assembly language. Profiling was performed on the overall code for an instruction cycle of 6.7 nanoseconds, and the resulting number of clock cycles required for each functional block is shown in Figure 4. The cycle counts in Figure 4 represent the time required to process 4 information bits. The pulse-shaping block represents SRRC filtering in assembly on 24 samples and is the most computationally intensive process. The PN generator and data packing dominate the remaining processes.
Transmitter I/O mechanisms
Real-time generation and transmission of data at a constant rate was maintained through the use of both the software- and hardware-driven interrupt capabilities of the DSP BIOS configuration section of the DSK. The output scheme was based on a double output buffer concept. While one buffer was used for storing STBC-encoded symbols, the other buffer was used for transmitting previously stored symbols to the output port, and vice versa. A high-priority hardware interrupt (HWI), driven by timer 0 (T0) with time period T = 144 milliseconds, invokes an interrupt service routine (ISR) that accesses one buffer and transmits a 32-bit TX word to the interface board. Between HWI intervals, a low-priority software interrupt (SWI) performs the D-STBC encoding process and stores a 32-bit TX word in the other buffer. The timer period T is chosen such that the SWI rate is slightly faster than the HWI rate, and the SWI process waits after filling up its designated buffer until the HWI process has finished transmitting all the contents of its buffer. The DSP BIOS capabilities have been used to monitor and maintain all the process control tasks so as to achieve real-time implementation and transmission at a constant rate.
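The double-buffer handshake between the encoder and the output routine can be modeled with a short Python sketch. The class name `PingPong`, its methods, and the swap rule are hypothetical and only mimic the roles of the SWI (producer) and the timer-driven HWI (consumer); they are not the DSP BIOS code.

```python
class PingPong:
    """Two fixed-depth buffers: one is filled while the other is drained."""

    def __init__(self, depth):
        self.depth = depth
        self.buffers = [[], []]
        self.fill = 0        # index of the buffer the encoder writes to
        self.read_pos = 0    # next word to send from the other buffer

    def produce(self, word):
        """Encoder (SWI role). Returns False when it must wait for a swap."""
        buf = self.buffers[self.fill]
        if len(buf) >= self.depth:
            return False
        buf.append(word)
        return True

    def consume(self):
        """Output routine (HWI role), called once per timer tick."""
        drain = self.buffers[1 - self.fill]
        if self.read_pos < len(drain):
            word = drain[self.read_pos]
            self.read_pos += 1
            return word
        # Drain buffer exhausted: swap only if the fill buffer is complete.
        if len(self.buffers[self.fill]) == self.depth:
            drain.clear()
            self.fill = 1 - self.fill
            self.read_pos = 0
            return self.consume()
        return None  # underrun: the encoder has not kept up
```

Because the producer is slightly faster than the consumer, `produce` returning `False` corresponds to the SWI waiting, while `consume` returning `None` (an underrun) should never occur in steady state.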
Routing of digital data from the DSP to the four DACs (one each for I and Q for each antenna element) required special data distribution and custom hardware to support the distribution mechanism. Two boards were designed and fabricated to interface the nonstandard 50-mil 80-pin connector J1 on the DSK to the standard 100-mil connector on the DACs. Data routing, or signal distribution, included splitting the clock signal via a clock distribution chip (CDC), splitting the 32-bit TX word into four 8-bit words, and connecting each 8-bit word to the 8 most significant bits (MSBs) of a DAC input (each DAC is 12-bit) [21]. The write enable (XWE) signal, acting as the master clock, was distributed by the CDC, and the outputs were synchronized to within 50 nanoseconds. This is considered satisfactory, keeping in mind the long sampling interval of 144 milliseconds. An additional RC network was added to each DAC to remove the DC bias from its single-ended outputs and thus avoid transmission of the carrier signal.
Transmitter RF front end
The RF section is based on two-stage upconversion with a 68 MHz IF for each antenna element and the two RF chains were phase-synchronized with common local oscillators (LO). The upconverted signal centered on 2050 MHz RF carrier was transmitted by two vertically polarized, coplanar, quarter-wavelength monopole antennas. Monopole antennas were selected because of their simple design, demonstrating that performance gains can be realized using antennas that are practical for handheld wireless devices. Antenna spacing can be varied on the VT-STAR to test the performance of the system versus antenna spacing for different radio environments.
RECEIVER
The receiver architecture, presented in Figure 5, is composed of two RF branches, a multichannel ADC, a TMS320C67 DSP EVM, and a host PC. The receiver front end uses two vertically polarized, coplanar, quarter-wavelength monopole antennas to receive the signals, which are centered at 2050 MHz. The signals are amplified, downconverted to baseband via an IF stage, and sampled by the multichannel THS1206 ADC EVM. The C67 DSP software performs space-time decoding in real-time mode or collects raw data as a data acquisition unit in snapshot mode. The host PC is used for control of the DSP via TI's Code Composer, for display and storage of relevant physical layer parameters, and, when applicable, for postprocessing of raw data in Matlab.
Receiver RF front end
The receiver RF front end is based on two-stage downconversion with an IF of 68 MHz. The receiver RF chains were designed to accept automatic gain control (AGC) signals so that the DSP can control the gain of the RF front end. Imbalances between the I and Q channels of the chains are characterized and compensated with scaling factors at the DSP.
Receiver I/O
The multichannel THS1206 ADC EVM selected for the interface between the RF front end and the DSP has a maximum sampling rate of 1.5 MSPS/channel with a resolution of 12 bits [22, 23]. This maximum sampling rate was not utilized, since the computational complexity of the decoding algorithms would overwhelm the receiver DSP at that rate. The ADC uses an internal FIFO of variable length (up to 16 words) to store digitized received samples and generates a hardware interrupt when the FIFO is filled to a preset depth. The DSP executes an ISR to retrieve the samples from the FIFO. Real-time sample retrieval relies on an alternating double-buffer concept, with an appropriate sampling rate, similar to the one used on the transmitter side.
Receiver operating modes
The VT-STAR receiver has two modes of operation: continuous mode and data acquisition mode. In the continuous mode, the receiver DSP operates in real-time, performing full space-time demodulation and decoding and sending relevant physical layer parameters to the host PC via the RTDX.
This mode is used to demonstrate the capabilities of space-time coding and to study the interactions between space-time decoding, timing and phase recovery, and channel estimation. In the data acquisition mode, the receiver collects raw data into buffers and dumps the buffer contents to the host PC hard drive for postprocessing in Matlab. This mode is used to characterize the indoor MIMO channel in terms of spatial and temporal characteristics, achievable throughput, and link reliability. These two modes are discussed in the following subsections.
Real-time mode
This mode of operation is supported by several functional blocks: matched filtering, differential decoding, bit and block synchronization, RX combining, channel estimation, and RTDX. The raw in-phase and quadrature samples, collected from the ADC FIFO, are first processed by a square-root raised-cosine (SRRC) matched filter with rolloff factor 0.35 (the same filter specification as at the transmitter side). These filters were implemented in hand-coded assembly for speed optimization. The filtered samples are then differentially demodulated and undergo bit and block synchronization. AGC is performed with a first-order IIR filter on the differentially demodulated symbols to estimate the average gain on each channel (antenna element). Note that the AGC is performed per channel in order to compensate for chain mismatch and obtain a nominal signal level at the ADC output. Since the AGC amplifies (or attenuates) the sum of signal and noise, it does not change the SNR, and thus the combining procedure is not affected by this mechanism.
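A per-channel AGC built on a first-order IIR power estimate can be sketched as below. The smoothing constant `alpha`, the target power, and the class name `ChannelAgc` are illustrative assumptions; the filter coefficient used on VT-STAR is not stated.

```python
import math

class ChannelAgc:
    """First-order IIR power tracker that returns a normalizing gain."""

    def __init__(self, alpha=0.05, target_power=1.0, initial_power=1.0):
        self.alpha = alpha                # assumed smoothing constant
        self.target_power = target_power  # assumed nominal level
        self.power = initial_power

    def update(self, symbol):
        # p[n] = (1 - alpha) * p[n-1] + alpha * |x[n]|^2
        self.power = ((1.0 - self.alpha) * self.power
                      + self.alpha * abs(symbol) ** 2)
        if self.power <= 0.0:
            return 1.0
        return math.sqrt(self.target_power / self.power)
```

One `ChannelAgc` instance per receive antenna reflects the per-channel operation described above. The gain scales signal and noise equally, which is why the SNR seen by the combiner is unchanged.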
The block synchronization module finds the "borders" of the ST block so that the differential demodulation process can be performed correctly. The bit synchronizer determines which sample (out of the 3 samples per symbol) is the best sampling instant and decimates the signal accordingly. Both synchronization modules are based on correlation processing of known (training) sequences that are transmitted periodically. ML detection is performed by finding the constellation point that is closest (in terms of Euclidean distance) to the decision statistic after decoding and combining.
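The two operations described above, choosing the best of three sampling phases by correlation with a known training sequence and minimum-distance (ML) detection, can be illustrated with the Python sketch below. The Gray-coded bit-to-symbol map, the function names, and the use of the correlation magnitude as the timing metric are assumptions for this example rather than the documented VT-STAR design.

```python
import math

_A = 1.0 / math.sqrt(2.0)
# Assumed Gray-coded QPSK map (hypothetical bit labelling).
QPSK = {
    (0, 0): complex(_A, _A),
    (0, 1): complex(-_A, _A),
    (1, 1): complex(-_A, -_A),
    (1, 0): complex(_A, -_A),
}

def best_sampling_phase(samples, training, sps=3):
    """Return the decimation phase whose correlation with the training
    sequence has the largest magnitude."""
    scores = []
    for phase in range(sps):
        decimated = samples[phase::sps][:len(training)]
        corr = sum(r * t.conjugate() for r, t in zip(decimated, training))
        scores.append(abs(corr))
    return max(range(sps), key=lambda p: scores[p])

def ml_detect(z):
    """Return the bit pair whose constellation point is closest to z."""
    return min(QPSK, key=lambda bits: abs(z - QPSK[bits]))
```

For example, `ml_detect(0.9 + 0.8j)` selects the first-quadrant point, which is labelled `(0, 0)` under the assumed mapping.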
Channel estimation is based on generating an estimate of the transmitted symbols (ŝ_1, ŝ_2) by performing M mapping. These estimates are achieved through the following linear processing:
where r_t^j and r_{t+T}^j, for j = 1, 2, denote the 4 consecutive samples at the matched filter output.
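One way such a decision-directed linear estimate can be formed is sketched below in Python. It assumes the standard Alamouti block structure, in which receive antenna j observes r_t = h_1 s_1 + h_2 s_2 and r_{t+T} = -h_1 s_2* + h_2 s_1* (noise omitted). This signal model, the normalization by the symbol energy, and the function name are assumptions for illustration and are not necessarily the exact expression used on VT-STAR.

```python
def estimate_channel(r_t, r_t_plus_T, s1_hat, s2_hat):
    """Estimate (h1, h2) for one receive antenna from two consecutive
    matched-filter outputs and the detected symbols.

    Assumed model (hypothetical):
        r_t       =  h1 * s1        + h2 * s2
        r_t_plus_T = -h1 * conj(s2) + h2 * conj(s1)
    The symbol matrix is orthogonal, so its inverse is its conjugate
    transpose divided by |s1|^2 + |s2|^2.
    """
    energy = abs(s1_hat) ** 2 + abs(s2_hat) ** 2
    h1 = (s1_hat.conjugate() * r_t - s2_hat * r_t_plus_T) / energy
    h2 = (s2_hat.conjugate() * r_t + s1_hat * r_t_plus_T) / energy
    return h1, h2
```

Calling the function once per receive antenna (j = 1, 2) uses all four matched-filter samples and yields the four entries of the 2 × 2 channel matrix. The estimate is only as good as the symbol decisions, which is consistent with running it only when the error rate is low.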
It is important to note that the channel estimation process takes place in order to allow for monitoring of the MIMO channel. The channel estimation algorithm operates in a "decision-directed" mode, and is run only when the error rate is below an acceptable level, to avoid the error propagation problem. Figure 6 shows a flow diagram of the receiver software implemented on the C67 DSP, where the different operations described above are mapped into software. The software shown in Figure 6 was profiled, yielding the cycle counts shown in Figure 7. Profiling on the receiver DSP was based on an instruction cycle of 7.5 nanoseconds. The cycle count for the matched filtering operation differs from that of the pulse-shaping operation in Figure 4 because the matched filtering at the receiver includes additional tasks such as rearranging the filter output and decimation. After the filtering operation, the next most computationally intensive operation is the maximum likelihood (ML) detection, while the remaining STBC operations consume a small number of clock cycles.
Communications between DSP receiver and host PC
Communications between the host computer and the TI C67 EVM are performed through the EVM's real-time data exchange (RTDX) capabilities. RTDX facilitates bidirectional real-time transfer of data between the host PC and the target TI C67 EVM through the JTAG interface with minimal impact on the target application [24]. A communications protocol over the RTDX link was implemented to guarantee that no buffer overflows occur during data transfer. Acknowledgements from the host PC to the C67 DSP are received asynchronously, since the host may require a lengthy amount of time to display the received information and process the reply.
Matlab display
The data collected by the host PC is passed to the Matlab environment for postprocessing and display. A sample of the telemetry data sent from the receiver processor is shown in Figure 8 , including constellation diagrams at the matched filter output before and after the decimation process (oversampling factor = 3), AGC curves for estimated received signal power in dBm, fading profiles of the MIMO channel, diversity gain curves, and bit error rate (BER) measurements. This sample was collected from the target DSP by using synthetic data at the input to the DSP. It validates the real-time processing at the DSP and the communication protocol with the host PC via RTDX.
Data acquisition mode
In order to collect and store snapshots of data, the C67 uses its RTDX utility to perform transfer of data from the target (DSP) to the host (PC) without affecting other real-time operations on the DSP as shown in Figure 9 .
The RTDX utility provides application programming interface (API) commands to set up an RTDX channel between the DSP and the PC. Data collected in a buffer in the DSP is first passed to the target RTDX library in the form of messages consisting of a group of words. The target library then sends one message at a time to the host RTDX library by issuing a low-priority message interrupt (MSGINT) during the idle cycles of the DSP. This ensures that no data is lost or overwritten during the transfer process. The transfer takes place over the JTAG interface. The debugger controls the host RTDX library such that the messages received at the host are stored in a log file.
The data acquisition buffer depth is set to collect snapshots of 1200 samples. Four words, corresponding to the in-phase and quadrature samples for antennas 1 and 2, are stored in the buffer at each hardware interrupt from the ADC. After the buffer has been filled, its data is transferred to the RTDX target library in the form of messages consisting of 10 words. Before this RTDX transfer is initiated, the ADC interrupt is disabled, leaving the DSP idle so that the transfer can take place. The transfer is initiated by a software interrupt, generated every 1 millisecond, which sets up the RTDX channel transfer. This period is sufficient to ensure complete transfer of 10 data samples, or one message, from the target to the host library using RTDX. The ADC hardware interrupt is reenabled after all 1200 data samples in the buffer have been transferred.
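The message segmentation and the resulting transfer time per snapshot can be sketched with a few lines of Python. The sketch assumes that each of the 1200 samples occupies one word, so that a snapshot is sent as 120 ten-word messages, one per 1 millisecond software interrupt; this interpretation and the function names are assumptions for illustration.

```python
def split_into_messages(buffer, words_per_message=10):
    """Cut the acquisition buffer into fixed-size RTDX-style messages."""
    return [buffer[i:i + words_per_message]
            for i in range(0, len(buffer), words_per_message)]

def snapshot_transfer_time_ms(num_words, words_per_message=10, period_ms=1.0):
    """Time to ship one snapshot when one message is sent per period."""
    num_messages = -(-num_words // words_per_message)  # ceiling division
    return num_messages * period_ms
```

Under these assumptions a 1200-word snapshot takes about 120 milliseconds to transfer, during which the ADC interrupt stays disabled, so consecutive snapshots are not contiguous in time.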
At the host, the received data is stored in a file in .rtd format. After the specified amount of data has been collected, the debugger is used to halt the DSP. The .rtd log file is played back by the Code Composer utility and converted into a binary format by a C++ program that uses the component object model (COM) interface provided by Code Composer. This binary file contains all sample values received from the ADC and is used for postprocessing by the Matlab software.
Postprocessing
The postprocessor operates on the raw samples by passing them through a matched filter, removing residual frequency offsets, and performing correlation processing to extract the channel fade coefficients. Once the channel matrix is obtained, it is used to calculate the channel capacity for various antenna configurations (i.e., single-antenna system, transmit diversity, receive diversity, and MIMO channel).
MIMO CAPACITY MEASUREMENTS RESULTS
Some representative measurements for MIMO channel characterization performed in various indoor locations, and the resulting capacity, are presented in this section. Capacity in this case refers to throughput normalized with respect to the bandwidth, and is measured in bps/Hz. The measurements were carried out in three locations: the 478 MPRG DSP Lab, the 476 MPRG student cubicle area, and the Durham Hall 4th floor corridor, as shown in the floor plan in Figure 10. The receiver was stationary while the transmitter was placed in different locations. The total numbers of measurements were twenty, eleven, and eight for the DSP Lab, cubicle area, and corridor, respectively, where each measurement provided twenty MIMO channel estimates. The DSP Lab and the student cubicle area provided non-line-of-sight (NLOS) propagation channels, while the corridor measurements included both line-of-sight (LOS) and NLOS channels. For each measurement campaign, movement was minimized to ensure a quasistatic channel. Throughout the measurement campaign, a calibration process was carried out to guarantee a small frequency offset (on the order of 20 Hz). This residual frequency error was calculated and compensated in the postprocessing module prior to the channel estimation process, by adjusting the phase of each symbol according to its position in the buffer.
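The position-dependent phase adjustment can be illustrated as follows. The sketch assumes a known symbol rate and uses a simple lag-one correlation estimator on symbols whose modulation has already been removed; both the estimator and the function names are assumptions for this example, since the calculation method used in the postprocessor is not detailed.

```python
import cmath
import math

def estimate_offset(unmodulated, symbol_rate_hz):
    """Estimate a residual frequency offset (Hz) from the average phase
    increment between consecutive modulation-free symbols (assumed method)."""
    acc = sum(b * a.conjugate()
              for a, b in zip(unmodulated, unmodulated[1:]))
    return cmath.phase(acc) * symbol_rate_hz / (2.0 * math.pi)

def compensate_offset(symbols, offset_hz, symbol_rate_hz):
    """De-rotate symbol n by the phase accumulated at buffer position n."""
    return [s * cmath.exp(-2j * math.pi * offset_hz * n / symbol_rate_hz)
            for n, s in enumerate(symbols)]
```

With an offset on the order of 20 Hz, the phase drift between adjacent symbols is small, but over a buffer of many symbols it accumulates enough to bias the channel estimates if left uncorrected.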
To calculate the capacity of the MIMO channel, we use the key result of Foschini and Gans in [1]:

$$C = \log_2 \det\left( I_{n_R} + \frac{\mathrm{SNR}}{n_T}\, H H^{\dagger} \right) \ \text{bps/Hz},$$

where H = {h_ij} and H† is the conjugate transpose of H. Each element h_ij refers to the channel gain from the ith transmit antenna to the jth receive antenna. SNR is the signal-to-noise ratio at the jth receive branch, n_T is the number of transmit antennas, and I_{n_R} is an n_R × n_R identity matrix. Figure 11a presents the measured capacity at a particular location in an NLOS environment (DSP Lab) for a fixed 20 dB SNR, for the MIMO system as well as for each of the SISO links (C_h11, C_h12, C_h21, and C_h22). With the 2 × 2 antenna array configuration, a twofold capacity increase is observed compared to any one of the SISO channels. Such a twofold increase in normalized throughput will result in a significant increase in data rate for a wideband system. Next, we compare the capacity of the MIMO channel with the capacity achieved by a single-input multiple-output (SIMO) channel with either optimal combining (OC) or diversity selection (DS) criteria, a multiple-input single-output (MISO) channel employing transmit diversity only, and a single-input single-output (SISO) channel. Figure 11b illustrates the measured complementary cumulative distribution functions (CCDFs) for these cases. Similar to the theoretical results of [1], the MIMO channel capacity exceeds that of receive diversity (SIMO) or transmit diversity (MISO). The measured CCDF plot for MIMO generally follows the same trend as that shown in [11]. Note that receive diversity outperforms transmit diversity due to the power splitting in MISO and that, within the receive diversity schemes, optimal combining outperforms selection diversity.
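For a 2 × 2 channel, the log-det capacity expression of [1], C = log2 det(I + (SNR/n_T) H H†), can be evaluated directly, as in the Python sketch below. The function names and the closed-form 2 × 2 determinant are choices made for this example; the channel values used in the test of this sketch are synthetic and are not measured VT-STAR data.

```python
import math

def mimo_capacity_2x2(H, snr_linear, n_t=2):
    """Capacity in bps/Hz for a 2x2 channel H = [[h11, h12], [h21, h22]].

    Builds G = I + (SNR / n_T) * H * H^dagger and returns log2(det(G)).
    G is Hermitian, so det(G) = g11 * g22 - |g12|^2 is real.
    """
    (a, b), (c, d) = H
    k = snr_linear / n_t
    g11 = 1.0 + k * (abs(a) ** 2 + abs(b) ** 2)
    g22 = 1.0 + k * (abs(c) ** 2 + abs(d) ** 2)
    g12 = k * (a * c.conjugate() + b * d.conjugate())
    return math.log2(g11 * g22 - abs(g12) ** 2)

def siso_capacity(h, snr_linear):
    """Capacity in bps/Hz of a single link with complex gain h."""
    return math.log2(1.0 + snr_linear * abs(h) ** 2)
```

At 20 dB SNR (a linear SNR of 100), an ideal orthogonal channel gives 2·log2(51) ≈ 11.3 bps/Hz against log2(101) ≈ 6.7 bps/Hz for a unit-gain SISO link, whereas a fully correlated (rank-one) channel collapses to log2(201) ≈ 7.7 bps/Hz. This illustrates why antenna correlation matters for the measured capacities.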
To compare the empirical findings with theoretical ones, theoretical capacity results in the form of complementary cumulative distribution functions (CCDFs) are presented in Figure 12. We observe that only 30% of channel realizations achieve capacities comparable to those measured in our lab. The difference between the measured range of capacities in indoor environments and the theoretical performance in an ideal Rayleigh channel can be attributed to practical implementation issues such as imperfect channel estimation, frequency downconversion errors, A/D quantization noise, and finite correlation between the antenna elements, among others. Thus, it is imperative to build hardware and perform measurements in order to assess the achievable capacity improvements in real-life propagation channels. The MIMO capacity evaluations from channel measurements for the three different indoor environments are summarized in Table 2, which presents the estimated average capacity for the single channel, optimum combining (the best among the transmit/receive diversity schemes), and MIMO. The values indicate the superior performance of MIMO systems in all environments, even in the LOS case. Note that although LOS propagation causes the individual antenna elements to be more correlated than in the NLOS case, the indoor scattering environment can provide sufficient decorrelation to enable the MIMO system to perform better than SISO or SIMO in an NLOS scenario.
CONCLUSION
This paper presented the design and development of a MIMO system prototype capable of performing multiple tasks through the modification of software. We presented an overview of the VT-STAR platform, which implements both D-STBC and MIMO channel measurements. The transmitter and receiver sections of VT-STAR were examined in detail, outlining some of the challenges and design issues that needed to be resolved in the development of this prototype. The implementation of the D-STBC algorithm has verified that the algorithm is robust to arbitrary phase errors and to a frequency mismatch of 1 kHz between the local oscillators at the receiver. The D-STBC algorithm, which was designed originally for quasistatic environments, works well in slowly time-varying environments (e.g., indoor wireless communications). Capacity improvements were observed through the use of MIMO technology. VT-STAR has an open SDR 
