Abstract-Current ultra-wideband (UWB) baseband synchronizers use a parallel architecture to meet the throughput requirement of over 500 Msamples/s, which makes low power and small area the main challenges of UWB baseband design. In this paper, a low-complexity synchronizer combining data-partition-based correlation algorithms and a dynamic-threshold design is proposed for orthogonal frequency division multiplexing based UWB systems. It provides a methodology to reduce design complexity with an acceptable performance loss. Based on the data-partition algorithms, both a single auto-correlator and a moving-average-free matched filter are developed with 528 Msamples/s throughput for the 480 Mb/s UWB design. Simulation results show that the synchronization loss can be limited to 0.8 dB in signal-to-noise ratio for 8% system packet-error rate.
I. INTRODUCTION
ORTHOGONAL frequency division multiplexing (OFDM) based ultra-wideband (UWB) technology has received attention owing to its high data rate of 480 Mb/s and its power requirement of below 323 mW [1]. In the baseband receiver, the timing and frequency synchronizer is used to detect the incoming packet and to resolve the carrier frequency offset (CFO), which is expected to be 20 ppm for UWB [2]-[6], [10]. In WLAN systems, existing synchronizers use the matched filter (MF) and the fast-Fourier-transform (FFT) symbols for accurate timing detection and fine CFO estimation [3]-[7]. However, the moving-average circuit of the MF and the registers storing the FFT symbol consume large power, e.g., 110 mW in [3]. As the system migrates to UWB, parallel architectures are exploited. References [8] and [9] use 20 and 128 parallel MFs to detect the symbol timing at 10- and 2-GHz sampling rates, respectively. Thus, achieving low power becomes the main concern in designing a UWB baseband synchronizer [8].
To achieve a power-efficient synchronizer for OFDM-based UWB systems, a novel low-complexity scheme combining a data-partition method and a dynamic-threshold design is proposed. The data-partition method reduces the amount of data used for synchronization (Sync), and thus reduces the number of register accesses and the moving-average complexity. The dynamic-threshold design adapts the threshold value of timing detection to the channel condition, thus enhancing the Sync performance. Simulation results show that the performance loss of the proposed design with 75% register reduction can be limited to 0.8 dB in signal-to-noise ratio (SNR) for 8% system packet-error rate (PER). This paper is organized as follows. The system block diagram of the UWB baseband receiver is described in Section II. The proposed low-complexity scheme is described in Section III. Simulation results are shown in Section IV. The proposed architecture and implementation results are described in Section V.
II. SYSTEM BLOCK DIAGRAM
Fig. 1 shows the system block diagram of the UWB baseband receiver, and the system parameters are listed in Table I [12]. In the receiver, after the automatic gain control (AGC) adjusts the RF gain, the proposed synchronizer begins to detect the incoming packet. The physical layer convergence protocol (PLCP) preamble transmitted at the start of each packet is used for Sync. The structure of the PLCP preamble defined in [10] is shown in Fig. 2. The preamble comprises 21 packet sequences (PS), three frame sequences (FS), and six channel-estimation sequences (CES). Within the preamble, the proposed design sequentially performs packet detection (PD), CFO estimation, FFT-window detection (FWD), and preamble-timing detection (PTD). After the synchronizer, the received signal passes through the FFT, the channel equalizer, the quadrature phase-shift keying (QPSK) demapper, the forward error control (FEC) decoder, and the de-scrambler, and then the data are sent to the medium access control (MAC).
III. ALGORITHM DESIGN

A. Data-Partition-Based Auto-Correlation
To detect the repeated PS of the incoming preamble and to estimate the CFO from the linear phase rotation it causes, auto-correlation (AC) can be used in a preamble-based OFDM system [2]-[6]. The algorithm used in existing approaches can be written as

$$A_m = \sum_{i=0}^{N-1} r_{m+1,i}\, r_{m,i}^{*} \qquad (1)$$

where $N$ is the sample amount of a repeated symbol, and $r_{m,i}$ is the received sample in the $i$th cycle of the $m$th repeated symbol. In the UWB system, the preamble comprises repeated OFDM symbols, each of which has 165 samples [10]. So $N$ is equal to 165 in the OFDM system with a 128-point FFT symbol and a 37-sample guard interval, and 165 samples are stored and multiplied in (1). To reduce the multiplications, a data-partition-based AC algorithm is proposed:

$$A_m = \sum_{i=0}^{\lfloor N/R \rfloor - 1} r_{m+1,Ri}\, r_{m,Ri}^{*} \qquad (2)$$

where $R$ is the reduction factor and $r_{m,Ri}$ are the used samples. In the proposed algorithm, the input data are partitioned into $R$ groups, and only one group of data is used. Thus the multiplications are reduced to $1/R$ of the original amount, and the registers for storing the input samples are reduced as well. The AC output power can be used to detect a valid packet. The PD rule can be written as

$$|A_m|^2 > \eta_{PD}\, P_m^2 \qquad (3)$$

where $|A_m|^2$ is the AC output power, $\eta_{PD}$ is a pre-defined threshold value, and $P_m$ is the sum of signal power of the $m$th OFDM symbol. Fig. 3 shows examples of the normalized AC power of the received signal in a high-SNR condition of an AWGN channel (better channel) and in a low-SNR condition of an indoor multipath channel for UWB systems (worse channel) [11]. The correct preamble is set to begin at 0 ns; before 0 ns only noise is received. The normalized AC power of the received noise may become higher as $R$ is increased, which means that a larger $R$ value causes PD false alarms more easily. It is therefore important to find an $R$ value that keeps the Sync performance while reducing the design complexity. The AC can also be used for CFO estimation [3]-[7].
The CFO estimate can be written as

$$\Delta\hat{f} = \frac{\angle A_m}{2\pi N T_s} \qquad (4)$$

where $\Delta\hat{f}$ is the estimated CFO, $N$ is the sample amount of an OFDM symbol, $T_s$ is the sample period, and $A_m$ is the AC result. After CFO estimation, the phase rotation caused by the CFO can be compensated, and FWD can begin without CFO distortion.
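The data-partition AC, the PD decision, and the CFO estimate described in this subsection can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the threshold `eta=0.5`, the random unit-magnitude stand-in symbol, and the 100 kHz test offset are all illustrative, and the PD rule assumes the AC power is compared against the threshold times the squared signal power.

```python
import numpy as np

N = 165            # samples per repeated preamble symbol (128-point FFT + 37 guard)
FS = 528e6         # sample rate in samples/s
TS = 1.0 / FS      # sample period in seconds


def partitioned_autocorr(prev_sym, curr_sym, R=4):
    """Correlate two consecutive symbols using only every R-th sample."""
    n_used = len(prev_sym) // R                  # 41 samples for N = 165, R = 4
    a = np.asarray(prev_sym)[::R][:n_used]
    b = np.asarray(curr_sym)[::R][:n_used]
    ac = np.sum(b * np.conj(a))                  # AC result
    power = np.sum(np.abs(b) ** 2)               # signal power over the used samples
    return ac, power


def packet_detected(ac, power, eta=0.5):
    """Assumed PD rule: AC power above eta times squared signal power."""
    return np.abs(ac) ** 2 > eta * power ** 2


def estimate_cfo(ac, n=N, ts=TS):
    """CFO in Hz from the phase of the AC result."""
    return np.angle(ac) / (2 * np.pi * n * ts)


# Noise-free toy input: one stand-in symbol repeated twice, rotated by a known CFO.
rng = np.random.default_rng(0)
symbol = np.exp(1j * 2 * np.pi * rng.random(N))  # hypothetical unit-magnitude symbol
true_cfo = 100e3                                 # Hz, illustrative value only
n_idx = np.arange(2 * N)
rx = np.tile(symbol, 2) * np.exp(1j * 2 * np.pi * true_cfo * n_idx * TS)

ac, power = partitioned_autocorr(rx[:N], rx[N:], R=4)
cfo_hat = estimate_cfo(ac)
```

Because the phase of the AC result is only unambiguous within plus or minus pi, this estimator can resolve offsets up to 1/(2 N T_s) in magnitude; skipping samples with `R` does not change that range, since the lag between correlated samples is still `N`.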
B. Moving-Average-Free MF
For correct FWD, the MF can be used [4], [5]. The algorithm used in existing approaches can be written as

$$M(k) = \sum_{i=0}^{N-1} \tilde{r}_{k+i}\, c_i^{*} \qquad (5)$$

where $N$ is the sample amount of an OFDM symbol, $k$ is the FWD timing from 0 to $N-1$, $\tilde{r}_{k+i}$ is the received sample after CFO compensation, and $c_i$ is the coefficient of the MF. The conventional MF in (5) needs a new window of received samples for every timing $k$, which requires a moving-average circuit. In the proposed MF, given by (6), the used received samples are fixed and the MF coefficients are unchanged. Since the proposed algorithm uses only a fixed set of $N$ received samples to calculate all outputs of the MF, the moving-average design is not needed. Moreover, the computation of the moving-average-free MF can be further reduced by the data-partition method. Finally, the proposed MF algorithm is given by (7), where $R$ is the reduction factor as in (2). As in (2), the multiplications and stored samples of (7) are reduced to $1/R$ of the original amounts, and the filter taps are reduced as well. The MF output power can be used for FWD. The timing at which the MF peak power appears can be written as

$$\hat{k} = \arg\max_{k} |M(k)|^2 \qquad (8)$$

where $\hat{k}$ is the timing with peak power and $|M(k)|^2$ is the MF output power. Fig. 4 shows the MF power of the received preamble under the same channel conditions as in Fig. 3. In Fig. 4, the correct FFT-window (FW) boundary is set to 0 ns. As $R$ is increased, the highest peak of the MF output power no longer always appears at the FW boundary (0 ns). To solve this problem, the sub-optimal timing location algorithm can be used [5]: the FW boundary is detected as the timing of the earliest of the searched MF peaks. As shown in Fig. 4, when $R$ is equal to 4, the correct FW boundary (0 ns) is the timing of the earlier of the 2 highest peaks. In this case we can search 2 MF peaks and detect the FW boundary at the earlier peak. The sub-optimal timing location algorithm thus adjusts the FWD result according to the chosen $R$ value.
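The "earliest of the searched peaks" rule can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name and the toy power values are hypothetical, and "peaks" are taken to be the largest individual MF power samples rather than true local maxima, so a hardware peak searcher may behave differently when one peak spans several samples.

```python
import numpy as np


def earliest_of_top_peaks(mf_power, num_peaks=2):
    """Return the earliest index among the num_peaks largest MF power values."""
    mf_power = np.asarray(mf_power, dtype=float)
    top = np.argsort(mf_power)[-num_peaks:]   # indices of the largest values
    return int(np.min(top))


# Toy MF power: the true boundary (index 2) is only the second-highest value.
mf_power = np.array([0.1, 0.2, 0.9, 0.3, 0.1, 1.0, 0.2])
fw_boundary = earliest_of_top_peaks(mf_power, num_peaks=2)
plain_argmax = int(np.argmax(mf_power))
```

In this toy case a plain arg-max picks index 5, while searching two peaks and keeping the earlier one recovers index 2, which mirrors the situation described for a reduction factor of 4.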
C. Dynamic-Threshold Design
After FWD, the synchronizer can start the PTD to find the boundary between the PS and FS of the preamble. Since the FS is the sign-inverted signal of the PS [10], we can use the sum of two consecutive AC results to detect the timing. The PTD rule can be written as

$$|A_m + A_{m+1}| < \eta_{PTD}\, P_m \qquad (9)$$

where $A_m$ is the AC result of the $m$th and $(m+1)$th OFDM symbols, $\eta_{PTD}$ is a threshold value, and $P_m$ is the sum of signal power of the $m$th OFDM symbol. If the $(m+1)$th OFDM symbol belongs to the PS and the $(m+2)$th OFDM symbol belongs to the FS, the sign inversion makes $A_{m+1}$ the sign-inverse of $A_m$. Thus $|A_m + A_{m+1}|$ becomes smaller than the product of the threshold and the sum of the signal power. For accurate PTD, a dynamic-threshold design, which adapts the $\eta_{PTD}$ value to the channel condition, is proposed. The adapted threshold can be written as

$$\eta_{PTD} = \alpha\, \frac{|A_m|}{P_m} \qquad (10)$$

where $\alpha$ is a fixed ratio that shifts the level of $|A_m|/P_m$ to perform accurate PTD, so that the threshold value is updated according to the AC result $A_m$ and the sum of signal power $P_m$. Simulation results show that the proposed dynamic-threshold design achieves lower FER and PER than the fixed-threshold designs.
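A small Python sketch may help make the PTD decision concrete. Everything specific here is an assumption made for illustration: the function name, the ratio `alpha=0.5`, the toy AC values, and in particular the update rule, which refreshes the threshold as `alpha * |A_m| / P_m` from the most recent AC result before testing the sum of two consecutive AC results.

```python
import numpy as np


def detect_ps_fs_boundary(ac, power, alpha=0.5):
    """Return the first m with |ac[m] + ac[m+1]| < eta * power[m], else None.

    eta is refreshed for each symbol as alpha * |ac[m]| / power[m]
    (an assumed dynamic-threshold update rule).
    """
    for m in range(len(ac) - 1):
        eta = alpha * abs(ac[m]) / power[m]
        if abs(ac[m] + ac[m + 1]) < eta * power[m]:
            return m
    return None


# Toy AC sequence: PS/PS correlations are positive, the PS/FS one is sign-inverted.
ac = np.array([0.9, 1.0, 0.95, -0.9, 1.0]) * 41
power = np.full(5, 41.0)
boundary = detect_ps_fs_boundary(ac, power)
```

Tying the threshold to the measured correlation level means that a weak channel, where `|A_m|` falls well below `P_m`, lowers the threshold automatically instead of relying on one fixed value for all SNR regions.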
IV. SIMULATION ANALYSIS
The system PER and FER of the proposed design are shown in this section. The simulation environment mainly comprises additive white Gaussian noise (AWGN), the CFO effect, the sampling clock offset (SCO) effect, and the indoor multipath channel [11] with a typical 5-ns RMS delay spread for the 480 Mb/s UWB system. The CFO and SCO between the transmitter and the receiver are both set to 40 ppm.
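Part of such a simulation environment can be sketched in Python as follows. This is only a partial, hedged illustration: it applies a CFO rotation and AWGN at a target SNR, and it does not model the SCO or the multipath channel. The function name, the seed, and the 4 GHz carrier frequency used to convert 40 ppm into hertz are assumptions, not values from the paper.

```python
import numpy as np


def add_cfo_and_awgn(x, snr_db, cfo_hz, fs=528e6, seed=0):
    """Rotate x by a carrier frequency offset and add complex AWGN."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)
    rotated = x * np.exp(1j * 2 * np.pi * cfo_hz * n / fs)
    sig_power = np.mean(np.abs(x) ** 2)
    noise_power = sig_power / 10 ** (snr_db / 10)
    # Split the noise power equally between the real and imaginary parts.
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(x.size) + 1j * rng.standard_normal(x.size)
    )
    return rotated + noise


cfo_hz = 40e-6 * 4e9                 # 40 ppm at an assumed 4 GHz carrier
x = np.ones(20000, dtype=complex)    # stand-in unit-power signal
y = add_cfo_and_awgn(x, snr_db=10.0, cfo_hz=cfo_hz)

# Recover the noise to check that its power matches the requested SNR.
rotation = np.exp(1j * 2 * np.pi * cfo_hz * np.arange(x.size) / 528e6)
measured_noise_power = np.mean(np.abs(y - x * rotation) ** 2)
```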
A. PER Analysis of Data-Partition-Based Design
As shown in Fig. 5, the system PER of the proposed low-complexity scheme with different reduction factors $R$ is simulated and compared with perfect Sync in the 480 Mb/s data rate mode. Compared with perfect Sync, the SNR loss for the typical 8% PER is 0.14, 0.15, 0.3, and 3.1 dB for $R$ = 1, 2, 4, and 8. The design with $R = 8$ is not efficient for achieving 8% PER. The PER curves for $R$ = 1, 2, and 4 are very close to each other, and the SNR loss becomes markedly higher when $R$ is 8.
B. FER and PER Analysis of Dynamic-Threshold Design
Fig. 6 shows the FER of the proposed dynamic-threshold design compared with fixed-threshold designs. One fixed-threshold design achieves low FER in the 0-dB SNR region, and the design with a threshold of 0.1 does so in the 2 to 6 dB SNR region. However, neither achieves the lowest FER in all SNR regions. The proposed dynamic-threshold design achieves the lowest FER in all SNR regions because of the adaptive threshold tuning. Fig. 7 shows the PER of the proposed dynamic-threshold design in the 120 Mb/s data rate mode. Since the proposed design achieves the lowest FER, it reaches 8% PER at an SNR that is 0.13 to 2.33 dB lower than that of the fixed-threshold designs.
V. ARCHITECTURE DESIGN
To efficiently achieve the 528 Msamples/s throughput required by the UWB specification, the synchronizer is designed with four parallel signal paths at a 132-MHz clock frequency. The architecture of the proposed auto-correlator with $R = 4$ is shown in Fig. 9. Since the computation of the AC is reduced to a quarter in (2), only one auto-correlator is needed instead of four parallel auto-correlators. The stored sample amount for the auto-correlator is likewise reduced to $N/4$. The architecture of the proposed MF is shown in Fig. 10. Based on (7), the needed tap number of the MF is reduced from $N$ to $N/4$. In [10], the preamble has constant magnitude and varying sign values, so the MF can be realized with an addition/subtraction design instead of signed multipliers. Like the auto-correlator, the proposed moving-average-free MF needs to store 41 samples, so the registers for storing 41 samples can be shared by the auto-correlator and the MF. Based on the proposed low-complexity scheme, the synchronizer can be realized with a single auto-correlator, the quarter-tap moving-average-free MF, and the quarter-size registers. Table II lists the hardware comparison of the proposed design and a parallel approach with 4 parallel auto-correlators, 4 parallel 165-tap MFs, and 165-sample registers. The power comparison is based on post-layout simulation at 528 Msamples/s throughput in a standard 0.18-μm CMOS process. Besides the auto-correlator, MF, and registers, the synchronizer designs also contain CFO compensators, which are realized by complex multipliers to compensate the phasor error. With the reduced auto-correlator, MF, and registers, the proposed design needs 37.6% of the gate count and consumes 43.3% of the power of the parallel approach. Table III lists the chip testing summary and Fig. 11 shows the chip microphotograph. Designed in a 0.18-μm CMOS process, the proposed synchronizer consumes 33 mW for the 480 Mb/s data rate and 528 Msamples/s throughput. It accounts for 20.4% of the OFDM receiver (RX) power.
The proposed low-power scheme reduces 26.7% of OFDM receiver power when compared with the parallel approach.
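The sizing arithmetic in this section, and the idea of replacing signed multipliers with additions and subtractions, can be checked with a brief Python sketch. The function name and the random test data are hypothetical; the coefficients are restricted to +1 and -1 to reflect the constant-magnitude, sign-only preamble, and the loop is a software stand-in for the adder/subtractor datapath rather than a description of the actual circuit.

```python
import numpy as np

PARALLEL_PATHS = 4
CLOCK_HZ = 132e6
throughput = PARALLEL_PATHS * CLOCK_HZ     # samples per second

N, R = 165, 4
stored_samples = N // R                    # samples kept in the shared registers


def sign_matched_filter(samples, signs):
    """Correlate with +/-1 coefficients using only additions and subtractions."""
    acc = 0j
    for s, sign in zip(samples, signs):
        acc = acc + s if sign > 0 else acc - s
    return acc


# Random stand-in data to compare against a multiplier-based reference.
rng = np.random.default_rng(1)
signs = rng.choice([-1, 1], size=stored_samples)
samples = rng.standard_normal(stored_samples) + 1j * rng.standard_normal(stored_samples)
out = sign_matched_filter(samples, signs)
reference = np.sum(samples * signs)
```

The add/subtract result matches the multiplier-based reference, which is why sign-only coefficients allow the multipliers to be dropped from the MF.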
VI. CONCLUSION
After algorithm introduction, performance analysis, and architecture design, a low-complexity synchronizer is proposed for the OFDM-based UWB baseband processor. Combining data-partition and dynamic-threshold schemes, the proposed design achieves 528 Msamples/s throughput to meet 120 to 480 Mb/s data rates in a 0.18-μm CMOS process. It needs 37.6% of the gate count and consumes only 43.3% of the power of the parallel approach.
