Abstract-This work addresses power reduction and performance improvement for wireless orthogonal frequency-division multiplexing (OFDM) systems using a dynamic sample-timing controller (DSTC) and a phase-tunable clock generator (PTCG). A receiver applying the proposed DSTC algorithm searches for the optimal sampling phase at the symbol rate, instead of at the Nyquist rate (or higher), which avoids the extra power consumed by high-rate operations. The proposed PTCG circuits provide the desired clock phase for optimum sampling to improve system performance. Both the DSTC and the PTCG are evaluated in a multiband OFDM (MB-OFDM) ultra-wide-band system. Simulation results indicate that the overall system performance improves by 1.7 dB in signal-to-noise ratio at a packet error rate of 8%, and the total baseband power is reduced by 40%.
I. INTRODUCTION
The tradeoff between system performance and power dissipation is one of the most critical issues in the design of wireless portable devices. Timing synchronization plays an important role in ensuring good signal decoding performance, since it determines the sampling timing and frequency of the analog-to-digital converter (ADC) for incoming signals or packets. Existing design approaches sample the incoming waveform at multiple rates (at the Nyquist rate or higher than the symbol rate [1]-[3]) using a fixed high-rate clock source that drives the ADC circuit. The high-rate samples are then processed by an interpolation algorithm [4] to yield a symbol-rate signal stream for data decoding. This methodology faces increasing difficulty in power-constrained portable devices, because both the ADC circuits and the interpolation circuits operate at a higher processing rate, resulting in higher power consumption.
To enable power reduction with symbol-rate sampling, both Mueller-Muller detection (MMD) [5] and MMD-based timing recovery methods [6] have been proposed for pulse amplitude modulation (PAM) schemes to search for the best sampling timing within a sample period. For orthogonal frequency-division multiplexing (OFDM) systems, the literature addresses timing synchronization by searching for the best block boundary of each fast Fourier transform (FFT) window [7], [8]. However, those studies [7], [8] do not guarantee that the signals in each block are sampled at the best sampling timing. Accordingly, multirate sampling schemes [1], [2] have been developed to maintain system performance, but their high-rate operations significantly increase power dissipation.
To maintain system performance while reducing power dissipation, this work presents a dynamic sample-timing control (DSTC) scheme for symbol-rate synchronization in OFDM systems, which calculates the optimal sampling timing within a symbol-period interval. Unlike multirate sampling methods [1]-[3], this DSTC requires auxiliary circuits in the clock source to generate a phase-tunable clock waveform that corresponds to the best sampling instant calculated by the DSTC. The design concept of a digitally controlled oscillator (DCO) [9] is applied to the phase-tunable clock generator (PTCG) to enable this symbol-rate DSTC [10] for low-power wireless applications.
The rest of this paper is organized as follows. Section II presents an overview of the proposed system. Section III derives the proposed DSTC algorithm. Section IV describes the design of the proposed PTCG. Section V analyzes the system performance and the hardware design complexity of the proposal, and Section VI concludes the paper.
II. SYSTEM OVERVIEW
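OFDM signals generated by an $N$-point inverse discrete Fourier transform (IDFT) and digital-to-analog conversion (DAC) are expressed as

$$x(t) = \sum_{n=0}^{N-1} x_n\,\delta(t - nT), \qquad x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k\, e^{j2\pi kn/N} \tag{1}$$

where $X_k$ is an information symbol stream encoded with phase-shift keying (PSK) or quadrature amplitude modulation (QAM), and $T$ is the sample period. During up/down conversion and analog/digital conversion, the signal suffers from nonideal hardware distortion, including the filter responses $g_T(t)$ and $g_R(t)$ of the transmitter (TX) and the receiver (RX). Therefore, the down-converted signal in a receiver and its samples are given by

$$r(t) = e^{j2\pi\Delta f\,t}\sum_{n} x_n\, g(t - nT) + w(t), \qquad g(t) = g_T(t) \ast g_R(t) \tag{2}$$

$$r_\varepsilon[k] = \int r(t)\,\delta\big(t - (k+\varepsilon)T\big)\,dt = r\big((k+\varepsilon)T\big) \tag{3}$$

where $\varepsilon$ is the sampling phase offset expressed as a fraction of the sample period, $\Delta f$ is the carrier frequency offset (CFO), $w(t)$ is band-limited additive noise, and $\delta(\cdot)$ is the impulse function. Once a packet has been detected, the DSTC is activated and commands the PTCG to generate the optimal clock phase for signal sampling in the ADCs. The samples then follow the conventional decoding flow. Fig. 1 depicts the system diagram and its operation.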
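As a quick numerical check of the signal model in (1)-(3), the sketch below builds one OFDM block with an $N$-point IDFT of QPSK symbols, filters it with a raised-cosine pulse $g$ (roll-off 0.5, the pulse used in Section III), and samples it at offset $\varepsilon$. It is a minimal sketch under stated assumptions: a single isolated block with no cyclic prefix or neighboring symbols, no noise, and a normalized CFO parameter `cfo` $= \Delta f\,T$. The helper names are hypothetical.

```python
import cmath
import math
import random


def raised_cosine(t: float, beta: float = 0.5) -> float:
    """Raised-cosine pulse g(t); t is measured in units of the sample period T."""
    if abs(t) < 1e-12:
        return 1.0
    edge = 1.0 / (2.0 * beta)
    if abs(abs(t) - edge) < 1e-12:
        # Removable singularity at |t| = T/(2*beta): limit is (pi/4) * sinc(1/(2*beta)).
        return (math.pi / 4.0) * math.sin(math.pi * edge) / (math.pi * edge)
    sinc = math.sin(math.pi * t) / (math.pi * t)
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2.0 * beta * t) ** 2)


def idft(symbols: list[complex]) -> list[complex]:
    """N-point IDFT of (1): x_n = (1/N) * sum_k X_k * exp(j*2*pi*k*n/N)."""
    n_pts = len(symbols)
    return [
        sum(X * cmath.exp(2j * math.pi * k * n / n_pts) for k, X in enumerate(symbols)) / n_pts
        for n in range(n_pts)
    ]


def sample_received(x: list[complex], eps: float, cfo: float = 0.0) -> list[complex]:
    """Noise-free r_eps[k] from (2)-(3) for one isolated block of samples x.

    cfo is the normalized carrier frequency offset Delta_f * T (cycles per sample).
    """
    out = []
    for k in range(len(x)):
        filtered = sum(xn * raised_cosine(k - n + eps) for n, xn in enumerate(x))
        out.append(cmath.exp(2j * math.pi * cfo * (k + eps)) * filtered)
    return out


random.seed(1)
X = [complex(random.choice((-1, 1)), random.choice((-1, 1))) / math.sqrt(2) for _ in range(16)]
x = idft(X)
err_on_time = max(abs(a - b) for a, b in zip(sample_received(x, 0.0), x))
err_late = max(abs(a - b) for a, b in zip(sample_received(x, 0.25), x))
print(f"eps = 0: max deviation {err_on_time:.1e}; eps = 0.25: max deviation {err_late:.3f}")
```

At $\varepsilon = 0$ the Nyquist pulse leaves the block unchanged (up to rounding), while a quarter-period offset already introduces visible ISI. A nonzero `cfo` only rotates each sample, so the sample magnitudes are unaffected.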
III. DSTC
The goal of this algorithm is to determine a signal sampling instant, with the sampling rate equal to the symbol rate $1/T$, at which the intersymbol interference (ISI) caused by the filter pulse response is minimized. Hence, the optimum sampling instant is defined as

$$\varepsilon_{\mathrm{opt}} = \arg\max_{\varepsilon}\ \mathrm{SIR}(\varepsilon), \qquad \mathrm{SIR}(\varepsilon) = \frac{|g(\varepsilon)|^2}{\sum_{m\neq 0}|g(m+\varepsilon)|^2} \tag{4}$$

where $g\big((m+\varepsilon)T\big)$ is written in the simplified notation $g(m+\varepsilon)$, and the ratio $\mathrm{SIR}(\varepsilon)$ is the signal-to-ISI power ratio. Thus, $\varepsilon_{\mathrm{opt}}$ is the offset at which the ISI power sum in the denominator of (4) is smallest relative to the main-tap power $|g(\varepsilon)|^2$; in other words, the SIR of the sampled signals is maximized when the optimum sampling instant is chosen. Here, $g(t)$ is modeled by a raised-cosine filter impulse response with a roll-off factor of 0.5, as shown in Fig. 2. An uncalibrated sampling timing error can yield poor signal integrity even in the absence of noise, so system performance degrades when the sampling time is not well calculated.
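When $g$ is known, the SIR criterion in (4) can be tabulated directly. The sketch below assumes the raised-cosine pulse with roll-off 0.5, truncates the ISI sum to $|m| \le 64$, and evaluates eight candidate offsets centered on the nominal instant. The centered window $[-T/2, T/2)$ and the function names are assumptions of this example.

```python
import math


def raised_cosine(t: float, beta: float = 0.5) -> float:
    """Raised-cosine pulse g(t); t is measured in units of the sample period T."""
    if abs(t) < 1e-12:
        return 1.0
    edge = 1.0 / (2.0 * beta)
    if abs(abs(t) - edge) < 1e-12:
        # Removable singularity at |t| = T/(2*beta).
        return (math.pi / 4.0) * math.sin(math.pi * edge) / (math.pi * edge)
    sinc = math.sin(math.pi * t) / (math.pi * t)
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2.0 * beta * t) ** 2)


def sir(eps: float, beta: float = 0.5, taps: int = 64) -> float:
    """SIR of (4): |g(eps)|^2 / sum_{m != 0} |g(m + eps)|^2, ISI sum truncated to |m| <= taps."""
    main = raised_cosine(eps, beta) ** 2
    isi = sum(raised_cosine(m + eps, beta) ** 2 for m in range(-taps, taps + 1) if m != 0)
    return math.inf if isi < 1e-20 else main / isi


# Eight candidate phases per sample period, centered on the nominal sampling instant.
phases = [p / 8 for p in range(-4, 4)]
for eps in phases:
    s = sir(eps)
    label = "inf" if math.isinf(s) else f"{10 * math.log10(s):.2f} dB"
    print(f"eps = {eps:+.3f} T -> SIR = {label}")

best = max(phases, key=sir)
print("best phase:", best)  # prints: best phase: 0.0
```

At $\varepsilon = 0$ the raised-cosine zero crossings remove all ISI, so the SIR is unbounded; it falls off symmetrically as the offset grows, which is why an uncalibrated timing error costs performance even without noise.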
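To find $\varepsilon_{\mathrm{opt}}$, the ratio in (4) cannot be calculated directly, because both the pulse response $g$ and the offset $\varepsilon$ are unknown to the receiver. Therefore, an alternative criterion that is equivalent to (4) and also hardware-realizable, the maximum absolute-squared sum, is examined. The absolute-squared sum of $K$ received samples is

$$S(\varepsilon) = \sum_{k=0}^{K-1}\big|r_\varepsilon[k]\big|^2 = \sum_{k=0}^{K-1}\Big|e^{j2\pi\Delta f(k+\varepsilon)T}\sum_{n} x_n\, g(k-n+\varepsilon) + w_\varepsilon[k]\Big|^2 \tag{5}$$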
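where $w_\varepsilon[k] = w\big((k+\varepsilon)T\big)$ is the band-limited zero-mean additive noise sampled at timing offset $\varepsilon$. The noise is assumed to be independent of the transmitted signals, so the expected absolute-squared value of the received samples is

$$E\big[|r_\varepsilon[k]|^2\big] = E\Big[\Big|\sum_{n} x_n\, g(k-n+\varepsilon)\Big|^2\Big] + \sigma_w^2 \tag{6}$$

where $\sigma_w^2$ represents the power of the band-limited (colored) noise $w_\varepsilon[k]$. The absolute-value operation removes the CFO factor, because $\big|e^{j2\pi\Delta f(k+\varepsilon)T}\big| = 1$. Therefore, the expected received signal power is composed of the transmitted signals filtered by $g$ and the band-limited noise power. The effect of $g$ on the transmitted signals can be expressed as a main signal tap $x_k\, g(\varepsilon)$ plus its filter interference $\sum_{m\neq 0} x_{k-m}\, g(m+\varepsilon)$. Moreover, the expected symbol power $E[|x_n|^2]$ may be assumed constant, say unit power, because the received signal power is adjusted by an automatic gain control (AGC) mechanism that normalizes it to the dynamic range of the ADC; for simplicity, $E[|x_n|^2] = 1$ is used below. Equation (6) becomes

$$E\big[|r_\varepsilon[k]|^2\big] = E\Big[\Big|x_k\, g(\varepsilon) + \sum_{m\neq 0} x_{k-m}\, g(m+\varepsilon)\Big|^2\Big] + \sigma_w^2 \tag{7}$$

The information symbols are assumed to be independent and zero mean, so $E[x_n x_l^{*}] = 0$ for $n \neq l$. Therefore, (7) reduces to

$$E\big[|r_\varepsilon[k]|^2\big] = |g(\varepsilon)|^2 + \sum_{m\neq 0}|g(m+\varepsilon)|^2 + \sigma_w^2 \tag{8}$$

Consequently, the expected absolute-squared value of the received signals is determined by the power of both the filter response and the noise. Based on the SIR definition in (4), (8) is rewritten as

$$E\big[|r_\varepsilon[k]|^2\big] = I(\varepsilon)\big[1 + \mathrm{SIR}(\varepsilon)\big] + \sigma_w^2, \qquad I(\varepsilon) = \sum_{m\neq 0}|g(m+\varepsilon)|^2 \tag{9}$$

where $I(\varepsilon)$ is the interference power of the filter tail.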
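The reduction from (6) to (8) can be checked by Monte Carlo simulation. The sketch assumes i.i.d. unit-power QPSK symbols (so $E[|x_n|^2] = 1$ and $E[x_n x_l^*] = 0$), white complex Gaussian noise standing in for the band-limited noise, a small normalized CFO, and a raised-cosine pulse truncated to $|m| \le 32$. The helpers `phi_theory` and `phi_empirical` are hypothetical names.

```python
import cmath
import math
import random


def raised_cosine(t: float, beta: float = 0.5) -> float:
    """Raised-cosine pulse g(t); t is measured in units of the sample period T."""
    if abs(t) < 1e-12:
        return 1.0
    edge = 1.0 / (2.0 * beta)
    if abs(abs(t) - edge) < 1e-12:
        return (math.pi / 4.0) * math.sin(math.pi * edge) / (math.pi * edge)
    sinc = math.sin(math.pi * t) / (math.pi * t)
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2.0 * beta * t) ** 2)


def phi_theory(eps: float, noise_power: float, taps: int = 32) -> float:
    """Right-hand side of (8): |g(eps)|^2 + sum_{m != 0} |g(m + eps)|^2 + sigma_w^2."""
    return sum(raised_cosine(m + eps) ** 2 for m in range(-taps, taps + 1)) + noise_power


def phi_empirical(eps: float, noise_power: float, cfo: float = 0.003,
                  n_samples: int = 4000, taps: int = 32, seed: int = 7) -> float:
    """Sample mean of |r_eps[k]|^2 over n_samples outputs (absolute-squared sum / K)."""
    rng = random.Random(seed)
    x = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
         for _ in range(n_samples + 2 * taps)]
    g = [raised_cosine(m + eps) for m in range(-taps, taps + 1)]  # g(m + eps), m = -taps..taps
    acc = 0.0
    for k in range(taps, taps + n_samples):
        filtered = sum(x[k - m] * g[m + taps] for m in range(-taps, taps + 1))  # as in (7)
        noise = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) * math.sqrt(noise_power / 2)
        r = cmath.exp(2j * math.pi * cfo * (k + eps)) * filtered + noise
        acc += abs(r) ** 2
    return acc / n_samples


for eps in (0.0, 0.25, 0.5):
    print(f"eps = {eps:.2f}: theory {phi_theory(eps, 0.1):.3f}, "
          f"simulated {phi_empirical(eps, 0.1):.3f}")
```

The simulated averages should track the closed form of (8) within Monte Carlo error; the CFO rotation drops out of every $|r_\varepsilon[k]|^2$ term, consistent with the argument above.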
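$\Phi(\varepsilon) \triangleq E\big[|r_\varepsilon[k]|^2\big]$ is defined as a characteristic function (CF) of $\varepsilon$. A CF curve with a sharper peak makes sampling timing errors easier to recognize and calibrate. Fig. 3 plots the CF curve corresponding to the raised-cosine filter of Fig. 2 in a noiseless channel, showing that the maximum of $\Phi(\varepsilon)$ occurs at the optimum sampling instant $\varepsilon_{\mathrm{opt}}$. Therefore, the search based on the SIR curve in (4) is transferred to a search for the maximum of $\Phi(\varepsilon)$, i.e.,

$$\varepsilon_{\mathrm{opt}} = \arg\max_{\varepsilon}\ \Phi(\varepsilon) \tag{10}$$

where, in practice, $\Phi(\varepsilon)$ is estimated by the absolute-squared sum in (5) divided by the number of samples.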
Each sample period is divided into eight phases, as shown in Fig. 3, as a compromise between the finite hardware resolution and the CF value degradation. Because the optimum then lies at most $T/16$ from the nearest candidate phase, the best of these eight positions always yields a CF value close to the maximum. The next section describes the design of an eight-phase clock generator.
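The eight-phase search of (10) can be sketched as follows: for each candidate clock phase, accumulate the absolute-squared sum of (5) over a block of noisy samples and keep the phase with the largest sum. The true pulse position `tau0`, the block length, the noise power, and the function names are assumptions of this example, not parameters of the design; in the actual receiver each candidate would be evaluated on preamble OFDM symbols.

```python
import math
import random


def raised_cosine(t: float, beta: float = 0.5) -> float:
    """Raised-cosine pulse g(t); t is measured in units of the sample period T."""
    if abs(t) < 1e-12:
        return 1.0
    edge = 1.0 / (2.0 * beta)
    if abs(abs(t) - edge) < 1e-12:
        return (math.pi / 4.0) * math.sin(math.pi * edge) / (math.pi * edge)
    sinc = math.sin(math.pi * t) / (math.pi * t)
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2.0 * beta * t) ** 2)


def qpsk(rng: random.Random) -> complex:
    """One unit-power QPSK symbol."""
    return complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)


def dstc_search(tau0: float, noise_power: float = 0.05, block_len: int = 4000,
                n_phases: int = 8, taps: int = 12, seed: int = 3) -> tuple[int, list[float]]:
    """Return the phase index p maximizing the absolute-squared sum, plus all sums.

    tau0 is the pulse-peak position (unknown to the receiver) in units of T.
    Each candidate phase p / n_phases is evaluated on its own block of noisy samples.
    """
    rng = random.Random(seed)
    sums = []
    for p in range(n_phases):
        eps = p / n_phases - tau0
        eps -= round(eps)  # re-index symbols so the nearest pulse tap is the main tap
        g = [raised_cosine(m + eps) for m in range(-taps, taps + 1)]
        x = [qpsk(rng) for _ in range(block_len + 2 * taps)]
        acc = 0.0
        for k in range(taps, taps + block_len):
            s = sum(x[k - m] * g[m + taps] for m in range(-taps, taps + 1))
            w = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) * math.sqrt(noise_power / 2)
            acc += abs(s + w) ** 2  # CFO omitted: the magnitude removes it anyway
        sums.append(acc)
    best = max(range(n_phases), key=sums.__getitem__)
    return best, sums


best, sums = dstc_search(tau0=0.25)
print("selected phase index:", best)
```

With `tau0 = 0.25`, phase index 2 coincides with the pulse peak, and its expected margin over the neighboring phases is several times the Monte Carlo spread for 4000-sample blocks. Much shorter blocks make the decision noisier, which is the effect captured by the decision probability discussed in Section V.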
IV. PTCG
An all-digital PTCG provides eight candidate sampling clock phases and outputs the one selected according to the $\varepsilon_{\mathrm{opt}}$ calculated in (10). The PTCG completes phase tuning within a few cycles, and its clock output remains glitch-free during the tuning period. Fig. 4 presents the proposed PTCG, which consists primarily of an all-digital phase-locked loop (ADPLL), a time-to-digital converter (TDC), and a cell-based delay line. Initially, the ADPLL locks to the target frequency with period $T$. This generated clock serves as the reference source for multiphase clock generation.
In the earlier delay-locked loop (DLL)-based multiphase clock generation approach [11], a TDC locks the delay line to a single clock period $T$, giving a delay of $T/8$ in each delay stage. In a high-speed cell-based DLL design, however, maintaining such a short delay and a high resolution simultaneously is difficult. Thus, in this design, the TDC measures three periods and makes the DLL lock to $3T$. After the DLL is locked, each delay stage presents a delay of $3T/8$. Hence, the minimum delay constraint for each delay stage ($D$) is relaxed to three times its original value. Moreover, the numerator and denominator of the delay fraction 3/8 are coprime, so the phase generated after each delay cell occupies a unique fraction of the period.

Fig. 5 shows the proposed TDC architecture. The TDC takes the input PLL528 from the ADPLL. From this PLL528, a PULSE_IN signal with a predefined pulsewidth is generated internally and fed to the TDC delay line. A flip-flop is inserted between each pair of delay elements in the delay line to latch data. All the flip-flops are triggered at the falling edge of PULSE_IN, and the latched data vector is encoded into a variable RANGE for the PTCG controller. Based on RANGE, the controller determines whether the periods of both PLL528 and PULSE_IN are correctly generated, to avoid a false lock in this loop. Then, the phase detector (PD) of the PTCG continues fine-tuning the delay of the delay elements to improve the accuracy of the output phase positions.
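The benefit of locking to $3T$ rather than $T$ can be checked with exact fractions: because 3 and 8 share no common factor, the eight stage outputs land on eight distinct eighths of the period. The sketch assumes an ideal, jitter-free delay line and uses the 528-MHz clock from the text to convert fractions to picoseconds; the function name is hypothetical.

```python
from fractions import Fraction

F_CLK_MHZ = 528               # PLL528 output frequency
N_PHASES = 8
STAGE_DELAY = Fraction(3, 8)  # per-stage delay in units of T once the DLL locks to 3T


def stage_phases(n_stages: int = N_PHASES, stage_delay: Fraction = STAGE_DELAY) -> list[Fraction]:
    """Phase (fraction of T, modulo one period) at the output of each delay stage."""
    return [(i * stage_delay) % 1 for i in range(1, n_stages + 1)]


phases = stage_phases()
period_ps = 1e6 / F_CLK_MHZ   # T in picoseconds
print([str(p) for p in phases])  # ['3/8', '3/4', '1/8', '1/2', '7/8', '1/4', '5/8', '0']
print(f"T = {period_ps:.1f} ps, stage delay = {float(STAGE_DELAY) * period_ps:.1f} ps, "
      f"phase spacing = {period_ps / N_PHASES:.1f} ps")
```

The resulting 236.7-ps spacing agrees with the roughly 237-ps phase separation reported in Section V. A hypothetical stage delay of $2T/8$, by contrast, would revisit only four distinct phases.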
As an example, the delay between PLL528 and P0, an integer number of $3T/8$ stage delays, determines the phase shift of P0 relative to PLL528, and the other clocks are generated accordingly. The PTCG takes the estimated timing error $\varepsilon$ from the DSTC, represented as Forward or Backward commands, and selects the proper clock phase for ADC sampling. To avoid glitches in CLK528, a glitch-free controller, namely a phase rotator block, cyclically converts a Forward command into several Backward commands.
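The cyclic Forward-to-Backward conversion can be modeled as modular arithmetic on the phase index PH_SEL. This is a behavioral sketch only, assuming that a Forward command advances PH_SEL by one phase and a Backward command moves it back by one; it does not model the multiplexer timing or the PH_RDY handshake, and the function names are hypothetical.

```python
N_PHASES = 8


def backward_step(sel: int) -> int:
    """One Backward command: move PH_SEL to the next earlier phase, wrapping cyclically."""
    return (sel - 1) % N_PHASES


def forward_as_backwards(sel: int, steps: int = 1) -> list[int]:
    """Realize a Forward move of `steps` phases as (N_PHASES - steps) Backward moves.

    Returns the PH_SEL values visited after each Backward step; the last one is the target.
    """
    path = []
    for _ in range((N_PHASES - steps) % N_PHASES):
        sel = backward_step(sel)
        path.append(sel)
    return path


print(forward_as_backwards(5))  # [4, 3, 2, 1, 0, 7, 6]
```

Starting from P5, one Forward step is thus realized as seven Backward steps (4, 3, 2, 1, 0, 7, 6), the same PH_SEL sequence described for Fig. 8(a). Each intermediate switch moves to an earlier phase that has already risen, which is what keeps CLK528 glitch-free.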
V. SIMULATION AND MEASUREMENT RESULTS
The proposed DSTC and PTCG [10] are evaluated in a multiband OFDM (MB-OFDM)-based ultra-wide-band (UWB) system [12] that uses a low-density parity-check (LDPC) code for error correction [13]. The signal bandwidth is 528 MHz with quadrature phase-shift keying (QPSK) and OFDM modulation, and the maximum data rate of 480 Mbps is used in the following simulations.
The dynamic timing recovery starts its search right after a packet is detected. The preamble at the beginning of each packet contains a packet synchronization sequence (Packet Sync Seq) of 21 OFDM symbols, which is applied to the DSTC as shown in Fig. 6. Each of these 21 identical OFDM symbols yields one absolute-squared sum, and the sampling time is changed only in the time slots between OFDM symbols. In other words, the PTCG changes its output clock phase only during the time slots associated with band transitions, so that all signals within an OFDM symbol are sampled with the same clock phase. Fig. 7 plots the overall system performance. The curve $\mathrm{SNR}(\varepsilon)$ in Fig. 7(a) represents the signal-to-noise ratio (SNR) required to reach a packet error rate (PER) of 8% when whole packets are sampled at a fixed, identical sampling offset $\varepsilon$. When the DSTC algorithm is applied, the optimal sampling instant is sought during the preamble, and before the end of the preamble the DSTC decides which timing instant is best for sampling. Since the DSTC operates in a noisy environment, it does not always choose the best sampling instant. Consequently, the curve $p(\varepsilon)$ represents the probability of the final decision made by the DSTC, and the SNR that the proposed system requires to reach a PER of 8% is given by

$$\mathrm{SNR}_{\mathrm{DSTC}} = \sum_{i=0}^{7} p(\varepsilon_i)\,\mathrm{SNR}(\varepsilon_i) \tag{11}$$

where $\varepsilon_i$, $i = 0, \dots, 7$, are the eight candidate phases. On the other hand, the system with the interpolation scheme takes two samples (a sample pair) within each symbol period for timing synchronization. Although the signals interpolated from the sample pairs are noise-averaged, one sample of each pair always suffers from stronger ICI effects, which degrades the signal quality. Therefore, this interpolation-based approach does not outperform our proposal, whose signals are sampled at the optimal instant.
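Weighting the required SNR at each candidate phase by the probability that the DSTC selects that phase is simple to evaluate once the two Fig. 7(a) curves are available. The numbers below are placeholders chosen only to exercise the function; they are not read from Fig. 7. A uniform decision distribution, the case attributed to interpolation schemes, is included for comparison, and the function name is hypothetical.

```python
def required_snr(snr_by_phase: list[float], prob_by_phase: list[float]) -> float:
    """Probability-weighted required SNR: sum_i p(eps_i) * SNR(eps_i)."""
    if len(snr_by_phase) != len(prob_by_phase):
        raise ValueError("one probability per phase is required")
    if abs(sum(prob_by_phase) - 1.0) > 1e-9:
        raise ValueError("decision probabilities must sum to 1")
    return sum(p * s for p, s in zip(prob_by_phase, snr_by_phase))


# Placeholder values only (NOT taken from Fig. 7): required SNR in dB per candidate phase.
snr = [8.0, 6.5, 5.5, 5.0, 5.5, 6.5, 8.0, 9.0]
dstc_prob = [0.0, 0.05, 0.2, 0.5, 0.2, 0.05, 0.0, 0.0]  # decisions concentrated near the best phase
uniform = [1 / 8] * 8                                     # no phase selection at all

print(f"DSTC-like: {required_snr(snr, dstc_prob):.2f} dB, uniform: {required_snr(snr, uniform):.2f} dB")
```

The more tightly the decision probability concentrates on the best phase, the closer the weighted SNR gets to the minimum of the per-phase curve; a uniform distribution simply averages the curve.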
Moreover, the interpolation approaches in the existing literature do not support phase tuning, so the decision probability in this case can be regarded as a uniform distribution. Fig. 7(b) plots the system performance of the proposed DSTC-PTCG and the interpolation schemes.

Fig. 8 shows both the simulated and measured waveforms of the PTCG design. This PTCG provides eight clock phases at 528 MHz, with consecutive phases separated by about 237 ps. As shown in Fig. 8(a), the output CLK528 is initially aligned to P5. When a Forward command is asserted, the output clock phase selected by the multiplexer (PH_SEL) counts down to zero and cyclically rotates back to P7 and then P6. When the target clock phase is reached, a phase-ready signal (PH_RDY) is activated to indicate that the clock has been updated to the new phase. To further explain the conversion of a Forward command into several Backward commands, assume again that P5 is initially selected as the system clock (CLK528) and that PH_SEL changes at the rising edge of the system clock, i.e., P5. If CLK528 is switched directly from P5 to P6 before P6 rises, a glitch may occur. Conversely, switching CLK528 from P5 to P4 avoids this glitch, at the cost only of a duty-cycle change in CLK528 during the phase-change intervals. The waveforms in Fig. 8(b) plot the measured clock phases. The measured RMS and jitters are 30 s and 101 ps, respectively.
The resulting PTCG power is 10.9 mW [10] in a 0.13-μm standard CMOS process. Table I summarizes both the performance and the power reduction of this work. The proposed scheme offers an SNR improvement of approximately 1.7 dB over the interpolation method. In this MB-OFDM UWB system, the symbol rate is 528 MHz, and the interpolation scheme requires a sampling rate of 1056 MHz in the ADC circuits. If the ADC circuits in [14] are considered, the estimated ADC power is reduced from 160 mW × 2 to 70 mW × 2 (for both the I and Q paths). When the baseband processor power of 31.2 mW [10] is included together with the ADC [14], this reduced sampling rate yields the baseband power saving listed in Table I. Note that the proposed symbol-rate synchronization method requires both the DSTC and PTCG circuits, which consume 1.9 and 10.9 mW, respectively. Fig. 9 presents a microphotograph of this baseband test chip.
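The ADC figures quoted above can be tallied as a rough check. The sketch assumes that only the ADC sampling rate changes and charges the full DSTC and PTCG power as overhead; it ignores the interpolator and all other blocks, so it is not intended to reproduce the 40% total baseband reduction reported for the complete system. The constant names are hypothetical.

```python
ADC_MW_PER_PATH_1056MHZ = 160.0  # ADC [14] at the interpolation scheme's 1056-MHz rate (from the text)
ADC_MW_PER_PATH_528MHZ = 70.0    # ADC [14] at the 528-MHz symbol rate (from the text)
N_PATHS = 2                      # I and Q paths
DSTC_MW = 1.9                    # DSTC power of the proposed scheme
PTCG_MW = 10.9                   # PTCG power of the proposed scheme

adc_saving_mw = N_PATHS * (ADC_MW_PER_PATH_1056MHZ - ADC_MW_PER_PATH_528MHZ)
overhead_mw = DSTC_MW + PTCG_MW
# Assumption: everything else in the baseband is unchanged between the two schemes.
net_saving_mw = adc_saving_mw - overhead_mw

print(f"ADC saving {adc_saving_mw:.1f} mW, overhead {overhead_mw:.1f} mW, net {net_saving_mw:.1f} mW")
# prints: ADC saving 180.0 mW, overhead 12.8 mW, net 167.2 mW
```

Even when the whole DSTC and PTCG power is counted against the proposal, the halved ADC sampling rate dominates the balance under these assumptions.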
VI. CONCLUSION
In this work, the DSTC and PTCG schemes are proposed to enable symbol-rate synchronization, which reduces power consumption by avoiding high-rate circuit operations. The proposal offers better signal sampling quality and improves overall system performance compared with interpolation-based solutions. In addition, its low design complexity and low power consumption make it well suited for cost-effective OFDM-based wireless communication solutions.
