Abstract: Developing gigabit-speed communication systems raises many design and implementation challenges, and a prototype is important for assessing overall system performance and complexity trade-offs under real-world conditions. This contribution presents the prototype design and implementation of a time division-fourth generation (TD-4G) demonstration system, the time division duplex-gigabit speed (TDD-GBPS) system. TDD-GBPS is a multiple input multiple output orthogonal frequency division multiplexing (MIMO-OFDM) system capable of transmitting at a peak data rate of 1 Gbit/s over a 100 MHz bandwidth. The paper covers the TDD-GBPS system architecture, the key physical layer transmission technologies and the hardware implementation of the demonstration system. Testbed results are also provided to validate the system performance.
I. Introduction
Because spectrum is scarce, future wireless communication systems will require technologies with high spectral efficiency. Time-division duplex (TDD) is a promising choice to meet this requirement because it can take full advantage of unpaired spectrum [1]. TDD wireless communication systems can dynamically allocate radio resources between uplink and downlink by adjusting the proportion of uplink and downlink time slots. TDD systems are therefore inherently suited to asymmetric services, e.g., mobile Internet and web browsing. In addition, sophisticated signal preprocessing schemes such as joint transmission, smart antennas and beamforming can be exploited in TDD systems by utilizing channel reciprocity.
TDD-based wireless communication systems are currently attracting more and more attention. Time division-synchronous code division multiple access (TD-SCDMA) was proposed by the China Academy of Telecommunications Technology (CATT) and has been accepted as one of the 3G standards. TD-SCDMA adopts various advanced technologies, such as smart antennas, joint detection and software defined radio, and has been deployed by China Mobile Communications Corporation since 2008.
The evolution of TD-SCDMA is driven by the increasing demands of multimedia-based applications, which the 2 Mbit/s peak data rate of TD-SCDMA fails to meet. The evolution can be divided into four phases [2]: high speed uplink/downlink packet access (HSxPA), time division-long term evolution (TD-LTE), time division-beyond 3G (TD-B3G) and time division-fourth generation (TD-4G). The foreseeable trend is that TD-SCDMA will evolve smoothly to TD-4G, aiming for higher data rates, lower latency and higher network efficiency. Numerous technology innovations are the driving force behind this evolution. Enhanced with technologies such as adaptive modulation and coding (AMC), hybrid automatic repeat request (HARQ), orthogonal frequency division multiplexing (OFDM) and multiple input multiple output (MIMO), the evolution of TD-SCDMA is very promising. A peak data rate of 100 Mbit/s within a bandwidth of 20 MHz has already been realized in some B3G field trial systems [3].
Driven by the rapid growth of mobile applications such as video on demand and streaming media, 4G wireless communication systems are envisioned as IP-based, multi-carrier networks that provide mobile ultra-broadband (gigabit-speed) access. In the existing LTE specification Release 9, a downlink peak data rate of 1 Gbit/s and an uplink peak data rate of 500 Mbit/s have been proposed to meet the requirements of 4G systems [4]. However, gigabit speeds are challenging for wireless transmission technologies, and prototype implementations of such systems have become an excellent tool for characterizing system performance and assessing trade-offs. From 2007 to 2009, we developed a gigabit-speed wireless communication demonstration system, TDD-GBPS, supported by the National High Technology Research and Development Program (863 Program). The main physical layer transmission characteristics of the TDD-GBPS system are as follows:
• Peak data transmission rate in excess of 1 Gbit/s;
• Spectrum efficiency of up to 10 bit/s/Hz;
• Bit error rate (BER) under 10^-6 at an SNR of around 15 dB.
The remainder of the paper is structured as follows. Section II describes the TDD-GBPS system design. The key radio transmission technologies are discussed in Section III. The hardware implementation of TDD-GBPS is addressed in Section IV. Finally, conclusions are drawn in Section V.
II. TDD-GBPS System Design
The TDD-GBPS system is defined over a bandwidth of 100 MHz with N = 2048 tones and a cyclic-prefix (CP) length of 256 samples. To support spatial multiplexing, the antenna configuration of the TDD-GBPS system is set to 4 × 6. The main parameters of the TDD-GBPS system are listed in Table I.
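The numerology implied by these parameters can be checked with a few lines of arithmetic. The sketch below takes the 122.88 MHz sampling rate quoted in Section II-B and the 16-QAM, rate-3/4, four-stream configuration described in Section II-A. The number of data-carrying tones, `N_USED_TONES`, is an assumption made here for illustration (roughly 100 MHz divided by the tone spacing); it is not a value stated in the paper. The resulting rate ignores pilot, synchronization and guard overhead, so it is only a rough upper bound.

```python
# Parameters stated in the paper
SAMPLE_RATE_HZ = 122.88e6   # sampling rate (Section II-B)
N_FFT = 2048                # number of tones
N_CP = 256                  # cyclic-prefix length in samples
N_TX = 4                    # spatial streams, one per transmit antenna
BITS_PER_SYMBOL = 4         # 16-QAM
CODE_RATE = 3 / 4           # LDPC coding rate
BANDWIDTH_HZ = 100e6

# ASSUMPTION (not from the paper): number of tones that carry data,
# chosen as roughly BANDWIDTH_HZ / tone spacing.
N_USED_TONES = 1666

subcarrier_spacing_hz = SAMPLE_RATE_HZ / N_FFT
symbol_duration_s = (N_FFT + N_CP) / SAMPLE_RATE_HZ
cp_duration_s = N_CP / SAMPLE_RATE_HZ

# Information bits carried by one OFDM symbol across all spatial streams
info_bits_per_symbol = N_TX * BITS_PER_SYMBOL * CODE_RATE * N_USED_TONES

# Upper bound: no pilots, synchronization symbols or guard periods accounted for
upper_bound_rate_bps = info_bits_per_symbol / symbol_duration_s
upper_bound_efficiency = upper_bound_rate_bps / BANDWIDTH_HZ

print(f"tone spacing    : {subcarrier_spacing_hz / 1e3:.1f} kHz")
print(f"symbol duration : {symbol_duration_s * 1e6:.2f} us")
print(f"CP duration     : {cp_duration_s * 1e6:.2f} us")
print(f"rate upper bound: {upper_bound_rate_bps / 1e9:.3f} Gbit/s")
print(f"efficiency bound: {upper_bound_efficiency:.2f} bit/s/Hz")
```

Under the assumed tone count, the bound lands slightly above 1 Gbit/s, which is in line with the peak rate and the 10 bit/s/Hz spectrum efficiency reported for the system.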
A. TDD-GBPS MIMO-OFDM System Model
A generic block diagram of the TDD-GBPS uplink is given in Fig. 1. The data flow is first encoded with a low density parity check (LDPC) code at a coding rate of 3/4. The encoded bits are then scrambled, punctured, interleaved bitwise and divided into N_tx = 4 data streams, one for each transmit antenna. Each data stream is mapped to a 16-QAM constellation and fed into an OFDM modulator independently. After insertion of the CP, which is longer than the expected maximum excess delay of the channel, and digital-to-analog (D/A) conversion, the resulting signals are sent from the transmit antennas through the radio channel.
The channel between each transmitter and receiver pair is assumed to be block Rayleigh-fading and to remain constant during an OFDM symbol. Signals from different transmit antennas are received along with noise and interference; the corresponding digital baseband receiver is also shown in Fig. 1. Automatic gain control (AGC) is performed to restrict the signal power, and frames are detected by synchronization. The frequency domain symbol on the j-th receive antenna is obtained after removal of the CP and OFDM demodulation.
With the help of channel estimation, the channel frequency response (CFR) is acquired for each data-carrying tone, and a MIMO processing unit extracts soft information for LDPC decoding. Once the received signals for each transmit antenna are detected, the corresponding reverse operations of de-interleaving, de-puncturing, de-scrambling and decoding are performed.
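The per-antenna part of this transmit chain (stream splitting, 16-QAM mapping, OFDM modulation and CP insertion) can be sketched as follows. This is a minimal illustration in pure Python, not the implementation used in the testbed. The Gray bit-to-symbol mapping and the block-wise stream split are assumptions, since the paper does not specify them, and a naive inverse DFT with toy sizes stands in for the 2048-point IFFT.

```python
import cmath
import math
import random

# ASSUMPTION: Gray-coded 16-QAM, two bits for the I level and two for the Q level.
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
NORM = 1 / math.sqrt(10)  # scales the constellation to unit average power


def split_streams(bits, n_tx):
    """Divide the coded bit sequence into n_tx equal blocks (assumed split rule)."""
    per_stream = len(bits) // n_tx
    return [bits[a * per_stream:(a + 1) * per_stream] for a in range(n_tx)]


def map_16qam(bits):
    """Map a bit list (length divisible by 4) to unit-power 16-QAM symbols."""
    symbols = []
    for k in range(0, len(bits), 4):
        i_level = GRAY_LEVELS[(bits[k], bits[k + 1])]
        q_level = GRAY_LEVELS[(bits[k + 2], bits[k + 3])]
        symbols.append(NORM * complex(i_level, q_level))
    return symbols


def idft(freq):
    """Naive inverse DFT; a real modulator would use an IFFT."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ofdm_modulate(symbols, cp_len):
    """Inverse DFT, then copy the last cp_len samples to the front as the CP."""
    time_samples = idft(symbols)
    return time_samples[-cp_len:] + time_samples


random.seed(0)
N_TONES, CP_LEN, N_TX = 16, 2, 4  # toy sizes; the paper uses 2048, 256 and 4
coded_bits = [random.randint(0, 1) for _ in range(N_TX * N_TONES * 4)]
tx_signals = [ofdm_modulate(map_16qam(stream), CP_LEN)
              for stream in split_streams(coded_bits, N_TX)]
```

Each entry of `tx_signals` holds one OFDM symbol of `N_TONES + CP_LEN` samples for one transmit antenna; scrambling, puncturing and interleaving are omitted for brevity.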
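As an illustration of what the MIMO processing unit has to do on each tone, the sketch below implements a plain zero-forcing successive interference cancellation detector on top of a QR decomposition computed by Gram-Schmidt orthogonalization. It belongs to the family of detectors discussed in Section III-C but is not the fast Golden algorithm of [15]: it applies no ordering, produces hard decisions rather than soft information, and uses an unnormalized 16-QAM grid. All function names are hypothetical.

```python
import math
import random


def qr_gram_schmidt(h):
    """Thin QR of a complex matrix h (rows >= cols) by modified Gram-Schmidt.

    Returns (q_cols, r): orthonormal columns of Q and upper-triangular R with h = Q R.
    """
    rows, cols = len(h), len(h[0])
    v = [[h[i][j] for i in range(rows)] for j in range(cols)]  # working columns
    q_cols = []
    r = [[0j] * cols for _ in range(cols)]
    for j in range(cols):
        norm = math.sqrt(sum(abs(x) ** 2 for x in v[j]))
        r[j][j] = norm
        q_j = [x / norm for x in v[j]]
        q_cols.append(q_j)
        for k in range(j + 1, cols):
            r[j][k] = sum(a.conjugate() * b for a, b in zip(q_j, v[k]))
            v[k] = [b - r[j][k] * a for a, b in zip(q_j, v[k])]
    return q_cols, r


def slice_16qam(z):
    """Nearest point on the unnormalized 16-QAM grid {-3, -1, 1, 3}^2."""
    def nearest(x):
        return min((-3, -1, 1, 3), key=lambda level: abs(x - level))
    return complex(nearest(z.real), nearest(z.imag))


def zf_sic_detect(h, y):
    """Detect x from y = h x + n: rotate by Q^H, then back-substitute with slicing."""
    q_cols, r = qr_gram_schmidt(h)
    cols = len(r)
    z = [sum(a.conjugate() * b for a, b in zip(q, y)) for q in q_cols]  # z = Q^H y
    x_hat = [0j] * cols
    for j in reversed(range(cols)):  # the last stream is interference-free
        interference = sum(r[j][k] * x_hat[k] for k in range(j + 1, cols))
        x_hat[j] = slice_16qam((z[j] - interference) / r[j][j])
    return x_hat


random.seed(1)
N_RX, N_TX = 6, 4  # matches the 4 x 6 antenna configuration
h = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_TX)]
     for _ in range(N_RX)]
x = [complex(random.choice((-3, -1, 1, 3)), random.choice((-3, -1, 1, 3)))
     for _ in range(N_TX)]
y = [sum(h[i][j] * x[j] for j in range(N_TX)) for i in range(N_RX)]  # noiseless
detected = zf_sic_detect(h, y)
```

In this noiseless toy case the detector recovers the transmitted vector; with noise, the order in which streams are cancelled starts to matter, which is what the sorting schemes of Section III-C address.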
B. Radio Frame Structure
Backward compatibility with the existing 3G standards should be taken into consideration, especially in the frame structure design [3]. The parameters and characteristics of the radio frame adopted in the TDD-GBPS system are as follows:
• The duration of one radio frame is 5 ms with a sampling rate of 122.88 MHz. A peak data rate of 1 Gbit/s has been achieved over a 100 MHz bandwidth, giving a spectrum efficiency of up to 10 bit/s/Hz;
• The guard time between uplink and downlink is 20 μs, so a cell radius as large as 3 km can be supported;
• In view of the asymmetric tendency of future services, the downlink and uplink time slots (TSs) have unequal lengths and the ratio between uplink and downlink can be flexible.
As shown in Fig. 2, the synchronization symbols are placed at the beginning of each radio frame to perform timing and frequency synchronization for both downlink and uplink. The remaining eight TSs are used for data transmission: the first seven are reserved for the downlink and the last one for the uplink. In addition, two block-type pilot training symbols are inserted in each TS. To support MIMO channel estimation, a frequency division multiplexing (FDM) pilot scheme is adopted among the transmit antennas, with pilot tones inserted equidistantly across the available bandwidth as shown in Fig. 3.
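Two of the frame parameters above lend themselves to a quick check: the relation between the guard time and the supported cell radius, and the equidistant FDM pilot pattern. The sketch below assumes that the guard time must cover the round-trip propagation delay to the cell edge, and that antenna a sends pilots on tones a, a + 4, a + 8 and so on. The exact pattern of Fig. 3 is not reproduced here and virtual tones are ignored, so the interleaving rule is an illustrative assumption.

```python
SPEED_OF_LIGHT_M_S = 3.0e8
GUARD_TIME_S = 20e-6

# The guard period has to absorb the round-trip delay to the cell edge.
max_cell_radius_m = SPEED_OF_LIGHT_M_S * GUARD_TIME_S / 2


def fdm_pilot_tones(n_tones, n_tx, antenna):
    """Tones on which `antenna` sends pilots; the other antennas stay silent there.

    ASSUMPTION: a simple interleaved comb with spacing n_tx.
    """
    return list(range(antenna, n_tones, n_tx))


N_TONES, N_TX = 2048, 4
pilots = [fdm_pilot_tones(N_TONES, N_TX, a) for a in range(N_TX)]

print(f"supported cell radius: about {max_cell_radius_m / 1e3:.1f} km")
for antenna, tones in enumerate(pilots):
    print(f"antenna {antenna}: {len(tones)} pilot tones, first few {tones[:4]}")
```

With this layout each antenna sees an equidistant pilot grid of its own, so the channel from every transmit antenna can be estimated separately and then interpolated across the tones used by the other antennas.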
III. Key Radio Transmission Technologies
The focus of this section is on the individual algorithms employed in the most critical receiver components. To this end, synchronization, channel estimation, MIMO detection and LDPC decoding are discussed.
A. Synchronization
Timing and frequency synchronization are implemented in both the uplink and the downlink of the TDD-GBPS system. Frame and symbol synchronization can be accomplished by correlating the local training sequence with the sampled received signal [5], [6]. To reduce the long capture time and heavy computation associated with a long training sequence, a novel synchronization algorithm is proposed. It exploits repeated training sequences and the properties of the fast Fourier transform (FFT) to calculate the timing metric. Compared with conventional methods based on sliding correlation and matched filtering, the proposed algorithm achieves better timing capture performance with much lower complexity and is therefore suitable for hardware implementation.
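To make the role of a repeated training sequence concrete, the sketch below uses a conventional approach: a timing metric obtained by correlating two consecutive half-length windows, with the phase of that correlation giving a fractional carrier frequency offset (CFO) estimate. This is the textbook sliding-correlation method, not the FFT-based algorithm proposed for TDD-GBPS, and the sequence length, noise level and offsets are arbitrary illustrative values.

```python
import cmath
import math
import random


def timing_metric(rx, half_len):
    """Normalized squared correlation between two consecutive windows of half_len samples."""
    metrics, corrs = [], []
    for d in range(len(rx) - 2 * half_len + 1):
        first = rx[d:d + half_len]
        second = rx[d + half_len:d + 2 * half_len]
        p = sum(a.conjugate() * b for a, b in zip(first, second))
        energy = sum(abs(a) ** 2 for a in first) * sum(abs(b) ** 2 for b in second)
        metrics.append(abs(p) ** 2 / energy if energy > 0 else 0.0)
        corrs.append(p)
    return metrics, corrs


def noise(std):
    """One complex Gaussian noise sample with the given per-dimension deviation."""
    return complex(random.gauss(0, std), random.gauss(0, std))


random.seed(0)
HALF = 32                       # length of one copy of the training sequence
N = 2 * HALF                    # length of the full training symbol
TRUE_START, TRUE_CFO = 50, 0.2  # start sample; CFO in units of the tone spacing

half = [cmath.exp(2j * math.pi * random.random()) for _ in range(HALF)]
tx = [0j] * TRUE_START + half + half + [0j] * 40
rx = [s * cmath.exp(2j * math.pi * TRUE_CFO * n / N) + noise(0.02)
      for n, s in enumerate(tx)]

metrics, corrs = timing_metric(rx, HALF)
start_est = max(range(len(metrics)), key=metrics.__getitem__)

# The two copies differ by a phase of pi * CFO, so the correlation phase reveals the CFO.
cfo_est = cmath.phase(corrs[start_est]) / math.pi
```

The direct correlation costs on the order of `HALF` multiplications per candidate position, which is the kind of computational load that motivates a lower-complexity timing metric for long training sequences.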
The signal processing of the synchronization module at the receiver is shown in Fig. 4 and the simulated performance is shown in Fig. 5. It can be seen that the correct capture probability of the proposed synchronization algorithm, which uses multiple-antenna combining, is dramatically improved. Once the system starts, the signal from the receive antennas is first fed into the AGC for gain processing. When the AGC output is stable, the signal passes through the low-pass filter. The timing acquisition module then obtains the coarse timing position, and the fractional carrier frequency offset (CFO) estimation module estimates the fractional frequency offset from the training sequence. The fractional CFO compensation module then corrects this offset in the low-pass filter output. After fractional CFO compensation, the integer CFO estimation module is activated and outputs its result to the subsequent modules. Meanwhile, the timing tracking module tracks frame drift to maintain the optimal timing position and provides the synchronization state indication signal [7].
B. Channel Estimation
In most practical OFDM systems, some tones are turned off to avoid interference with adjacent systems and to ease the implementation of the spectral masking filter. In the presence of these nulled tones, also referred to as virtual tones (VTs), the information carried by a portion of the pilots cannot be acquired, which degrades the channel estimation performance significantly.
Theoretical analysis [8]-[10] shows clearly that energy leakage occurs in the presence of VTs. In particular, the CFRs of active tones close to the VTs suffer considerably from this leakage, a phenomenon known as the "edge effect". On the one hand, significant gains can be achieved with sophisticated signal processing schemes; on the other hand, the implementation complexity [11] of the algorithm must be taken into account.
Bearing the "edge effect" and complexity issues in mind, we propose an enhanced discrete Fourier transform based (DFT-based) channel estimator. The basic idea behind the proposed method is to re-satisfy the equidistant pilot spacing condition [12], [13] by creating artificial CFRs for the VTs. The algorithm steps are shown in Fig. 6 and the MSE performance curves are provided in Fig. 7. Compared with the conventional DFT-based scheme, the MSE performance of our scheme is greatly improved. In particular, the proposed channel estimator combats the negative effects caused by VTs efficiently without introducing extra computational complexity. The energy compensation and partial refinement operations can be performed simply by reusing the pre-computed least squares (LS) estimates, so that only a few memory units are required to store the coefficients. A more detailed analysis and discussion of this algorithm is provided in [8].
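The leakage caused by VTs can be reproduced with the conventional DFT-based estimator alone. The sketch below builds a short noiseless channel, forms a least squares estimate on every tone, and compares the estimator output with and without a block of nulled tones. It illustrates only the baseline scheme and the edge effect; the artificial CFR construction of the proposed method is described in [8] and is not reproduced here. The toy sizes, the pilot on every tone and the position of the nulled tones are assumptions.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive DFT or inverse DFT (O(n^2)); hardware would use an FFT core."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def dft_channel_estimate(h_ls, n_taps):
    """Conventional DFT-based estimator: keep only the first n_taps time-domain taps."""
    cir = dft(h_ls, inverse=True)
    cir = cir[:n_taps] + [0j] * (len(cir) - n_taps)
    return dft(cir)


def mse(est, ref, tones):
    """Mean squared error over the given tone indices."""
    return sum(abs(est[k] - ref[k]) ** 2 for k in tones) / len(tones)


random.seed(0)
N, TAPS = 64, 4  # toy sizes, not the TDD-GBPS values
cir_true = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(TAPS)]
cfr_true = dft(cir_true + [0j] * (N - TAPS))

# In FFT index order the band edges sit around N/2; tones 28..35 act as VTs here.
active = [k for k in range(N) if not 28 <= k < 36]
h_ls_full = list(cfr_true)  # noiseless LS estimate on every tone
h_ls_vt = [cfr_true[k] if k in active else 0j for k in range(N)]

mse_full = mse(dft_channel_estimate(h_ls_full, TAPS), cfr_true, active)
mse_vt = mse(dft_channel_estimate(h_ls_vt, TAPS), cfr_true, active)

print(f"MSE with all tones available: {mse_full:.3e}")
print(f"MSE with virtual tones      : {mse_vt:.3e}")
```

Even without noise, zeroing the VTs leaves a residual error on the active tones, because the missing samples spread energy across all time-domain taps before truncation. Supplying plausible CFR values at the VT positions is one way to restore the conditions under which the DFT-based estimator works well.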
C. MIMO Detection
MIMO is adopted to achieve high spectral efficiency in wireless transmission. Ordered zero forcing-serial interference cancellation (OZF-SIC) is one of the traditional detection algorithms widely applied in vertical-Bell Laboratories layered space-time (V-BLAST) systems; it outperforms linear detection algorithms at the cost of higher complexity. Traditional Golden ZF-SIC (traditional Golden for short) has been considered the optimal OZF-SIC method for MIMO detection. However, the iterative computation of the matrix pseudo-inverse greatly increases its complexity [14]. A fast Golden (FG) detection algorithm based on QR decomposition is therefore presented and applied in the TDD-GBPS system [15]. With a novel sorting scheme, this algorithm avoids the pseudo-inverse iterations required by traditional Golden ZF-SIC. In addition, a more cost-efficient sorting scheme and Schmidt orthogonalization detection with much lower complexity are proposed. Compared with the traditional Golden method, the performance loss of the proposed scheme is marginal.
D. LDPC Decoding
The LDPC decoding structure is shown in Fig. 8. Extrinsic information is fed into the LDPC variable node detection (VND) and check node detection (CND) units. The LDPC decoder adopted in the TDD-GBPS system is based on extrinsic information (EXIT) analysis with a novel iteration scheme. Compared with traditional iterative LDPC design, the novel scheme converts a nonlinear programming problem into a linear one, which is especially suitable for hardware implementation. The uniformly most powerful belief propagation based (UMP BP-based) decoding algorithm is adopted. UMP BP-based decoding is a simplified form of log likelihood ratio (LLR) BP decoding in which only minimum and sign operations are applied to the information passed from check nodes to variable nodes. A more detailed discussion and analysis of this topic is provided in [16].
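The minimum and sign operations at the heart of UMP BP-based decoding, commonly known as min-sum decoding, can be sketched in a few lines. The functions below show one check-node update and one variable-node update on plain Python lists. They are a generic illustration with hypothetical function names and do not reflect the iteration schedule or fixed-point format of the TDD-GBPS decoder.

```python
def check_node_update(llrs_in):
    """Min-sum check-node update: one outgoing message per connected variable node.

    llrs_in[i] is the variable-to-check LLR on edge i. The message sent back on
    edge i combines every incoming LLR except the one received on edge i.
    """
    out = []
    for i in range(len(llrs_in)):
        others = llrs_in[:i] + llrs_in[i + 1:]
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign          # product of the signs of the other inputs
        out.append(sign * min(abs(v) for v in others))
    return out


def variable_node_update(channel_llr, check_msgs):
    """Variable-node update: returns the per-edge outgoing messages and the total LLR."""
    total = channel_llr + sum(check_msgs)
    return [total - m for m in check_msgs], total


check_out = check_node_update([2.0, -0.5, 4.0, -1.5])
var_out, posterior = variable_node_update(1.0, [0.5, -2.0, 0.25])

print(check_out)   # [0.5, -1.5, 0.5, -0.5]
print(var_out)     # [-0.75, 1.75, -0.5]
print(posterior)   # -0.25
```

Because the check-node update needs only comparisons and sign flips, it avoids the hyperbolic tangent evaluations of full LLR BP decoding, which is what makes this family of decoders attractive for FPGA implementation.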

IV. TDD-GBPS Hardware Implementation
In this section, we consider the architecture and implementation of the TDD-GBPS radio transmission demonstration system.
A. TDD-GBPS Testbed
After three years of design and research, the TDD-GBPS demo system has been developed by the Wireless Technology Innovation Institute (WTI) of BUPT. The TDD-GBPS hardware testbed consists of the RF front-end, transmitters, multiple-antenna receiving units (MARUs), baseband receiving units (BRUs), MAC units (MACUs), gigabit switches, servers and a backplane network. The transmitters perform all the baseband transmission operations described in Section II. The MARUs perform synchronization, OFDM demodulation and channel estimation. The BRUs carry out MIMO detection, soft-information extraction and LDPC decoding. Through the MACUs and gigabit switches, multiple-service interfaces are provided and the TDD-GBPS system is connected to the IP network.
For the FPGA-based realization, the TDD-GBPS physical layer algorithms are implemented on Xilinx Virtex-5 FPGA chips. Twenty-five Virtex-5 SX95T chips are used in total, and the detailed resource occupation is shown in Table II.
B. TDD-GBPS Demo System
The topology diagram of the TDD-GBPS system is shown in Fig. 9. The TDD-GBPS demo system can be divided into four parts: mobile equipment (ME), access point (AP), control unit (CU) and demonstration terminal (DT). The ME consists of a mobile terminal (MT) and terminal equipment (TE), which are connected by a Gigabit network cable. The APs communicate with the ME through radio transmission channels and connect to the CUs for link-level performance surveillance. In addition, the APs can support multicast services, which are displayed on the DTs through Gigabit Ethernet.
The real TDD-GBPS demo system testbed is shown in Fig. 10. The radio transmission link-level performance is monitored by the CU as shown in Fig. 11. The correlation of the MIMO channels, the constellation, the high-definition video data rate and the BER are shown on the monitor. Note that the BER of the TDD-GBPS system is kept under 10^-6 and the 16-QAM constellation is very distinct.
V. Conclusions
In this paper, the prototype design and testbed development of the TDD-GBPS wireless communication system are presented. Promising broadband radio transmission technologies are discussed in terms of both algorithm performance and implementation complexity. The implementation of the TDD-GBPS demonstration system makes it possible to evaluate system performance and hardware implementation trade-offs under real-world conditions. The testbed results show that a considerable performance improvement can be achieved through careful design of hardware-efficient algorithms and architecture optimizations.
