I. INTRODUCTION
Orthogonal Frequency Division Multiplexing (OFDM) modulation has been considered for many wired and wireless broadband applications. Well-known advantages of OFDM systems are spectral efficiency, simple equalization, ISI (InterSymbol Interference) reduction in multipath channels, and immunity to impulse noise. Unfortunately, OFDM systems are extremely sensitive to time and frequency synchronization errors [1] - [2] .
In packet-based applications, the initial frame information provides synchronization accurate enough to demodulate the pay-load correctly. The initial synchronization requires computationally intensive algorithms that consume a significant amount of silicon area. Most existing algorithms need ad hoc hardware that is used only during the synchronization phase and remains inactive during the pay-load demodulation [3]. This paper proposes a new architecture which simultaneously provides frequency and time offset estimations using the module and phase of the carriers in the received header. Both module and phase are obtained using a CORDIC (COordinate Rotation DIgital Computer) processor [4].
With the proposed architecture no significant additional hardware is required. In fact, the largest cells involved in the synchronization process are the Fast Fourier Transform (FFT) processor (in charge of the time-to-frequency domain transformation) and two CORDIC processors. Note that the FFT processor is also used during the pay-load demodulation, while CORDIC processors are usually employed in the constellation demapping and in the channel state information (CSI) estimation.
This paper is structured as follows. Section II describes the fundamentals of synchronization in OFDM packet-based applications. Section III presents the proposed synchronization strategy, addressing the algorithms used to estimate frequency and time offsets. The proposed strategy requires CORDIC processors whose design is addressed in section IV. The complete architecture is detailed in section V. Finally, the optimum design of a CORDIC-based synchronizer is presented in section VI.
II. SYNCHRONIZATION IN OFDM

A. Synchronization errors
Synchronization between transmitter and receiver is critical to the performance of an OFDM system. Equation (1) shows the expression for the p-th transmitted OFDM symbol, s p,Tx (t), where the cyclic prefix has been omitted to simplify the expression. T is the OFDM symbol duration, N is the number of carriers, X p (k) is the transmitted constellation point carried by the k-th carrier (k = 0...N-1) of the p-th symbol, and f c,Tx is the signal center frequency [2].
The received signal is down-mixed to base band using the receiver center frequency (f c,Rx ). The frequency-demultiplexed constellation Y p (k) is obtained by means of an FFT algorithm and should be as similar as possible to the transmitted X p (k).
A frequency mismatch (∆f c = f c,Tx - f c,Rx ) between transmitter and receiver center frequencies causes a loss of orthogonality leading to a time-domain rotation of the constellation, a reduction of the available signal amplitude and, most importantly, Inter-Carrier Interference (ICI). As a result, OFDM systems are orders of magnitude more sensitive to frequency offset than single-carrier ones [5].
The frequency mismatch ∆f c is commonly studied in terms of the frequency-offset (FO) as defined in (2), where ∆f is the sub-carrier separation, and T is the OFDM symbol duration. The frequency-offset FO can also be written in terms of the inter-carrier spacing as the sum of r, an integer number of carrier spacings, and FO', a fractional part (|FO'|<0.5).
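As a small illustration of definition (2), the following sketch normalizes a center-frequency mismatch by the sub-carrier separation and splits the result into its integer and fractional parts. It assumes the decomposition FO = r + FO' implied by the text and a sub-carrier separation of ∆f = 1/T; the function names and the example values (a 100 kHz mismatch with a 312.5 kHz carrier spacing) are illustrative assumptions, not figures taken from this paper.

```python
import math

def frequency_offset(delta_fc_hz, subcarrier_spacing_hz):
    """Normalized frequency-offset FO = delta_fc / delta_f (assumed equal to delta_fc * T)."""
    return delta_fc_hz / subcarrier_spacing_hz

def split_frequency_offset(fo):
    """Split FO into an integer part r and a fractional part FO' in [-0.5, 0.5)."""
    r = math.floor(fo + 0.5)   # nearest integer number of carrier spacings
    return r, fo - r

# Hypothetical example values, chosen only for illustration.
fo = frequency_offset(100e3, 312.5e3)
r, fo_frac = split_frequency_offset(fo)
print(fo, r, fo_frac)
```

In this reading, the coarse stage would be responsible for r and the fine estimator discussed later for FO'.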
When a frequency-offset FO appears, the received constellation corresponding to the k-th carrier, Y p (k), can be expressed as (3). The transmitted constellation X p (k) is modified by the discrete-time base-band equivalent channel transfer function H p (k) and by α FO (k), which models the constellation rotation as well as the amplitude reduction due to the frequency-offset. I ICI (k,FO) is the Inter-Carrier Interference due to the loss of orthogonality. An explicit expression of the ICI term is given in [5].
A time-offset of TO samples in the FFT window at the receiver causes a phase rotation of the received constellation. Each received constellation point within the OFDM symbol is rotated by an angle that increases proportionally to the time-offset TO and the carrier index k. The received constellation Y p (k) can then be expressed as (4), where TO is the time-offset, X p (k) is the transmitted constellation and H p (k) is the channel transfer function. If TO is large enough, samples which do not belong to the transmitted symbol enter the FFT window and interfere with the received symbol, leading to an Inter-Symbol Interference term labeled I ISI (k,TO) in (4).
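The linear phase rotation described above can be checked numerically. The sketch below is a minimal illustration under simplifying assumptions: an ideal channel (H = 1), no noise, and a window shift that behaves as a pure cyclic shift (that is, no ISI term). The sign of the rotation depends on whether the window is early or late; the convention used here, like all function names, is an assumption made for the example.

```python
import cmath
import math

def dft(x):
    """Naive DFT, adequate for a small illustrative N."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT with 1/N normalization."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def cyclic_shift(x, to):
    """Idealized FFT-window shift of `to` samples (cyclic, so no ISI)."""
    return x[to:] + x[:to]

def expected_rotation(X, to):
    """Carrier k rotated by an angle proportional to both k and TO."""
    n = len(X)
    return [X[k] * cmath.exp(2j * math.pi * k * to / n) for k in range(n)]

N, TO = 16, 3
X = [cmath.exp(0.4j * k * k) for k in range(N)]   # arbitrary unit-amplitude symbol
Y = dft(cyclic_shift(idft(X), TO))
max_err = max(abs(a - b) for a, b in zip(Y, expected_rotation(X, TO)))
print(max_err)   # should be at the level of floating-point rounding
```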
B. Synchronization in packet-based OFDM applications
In packet-based applications the synchronization scheme must be fast, since the information follows closely after the packet preamble (see Fig. 1 ). Consequently, a large amount of hardware resources is normally required, which is used only during the header processing and remains inactive during the pay-load demodulation. The synchronization process includes several phases. In the first one, section A provides the packet detection, programmable gain amplifier (PGA) adjustment, and coarse timing. At the end of section A, a rough estimation of the OFDM symbol timing should be available in order to avoid inter-symbol interference due to time-offset. To this end, an auto-correlation scheme is normally selected [6]. In the second phase, synchronization algorithms estimate both fine frequency and time deviations using section B. Finally, the C-field is normally reserved for channel estimation.
If the packet size is large enough, an additional tracking phase should be included in order to preserve the initial synchronization. In some applications a coarse frequency estimation (i.e., the estimation of r in expression (2)) is also required, which can be carried out using either the B or the C section.
Well-known methods to estimate the frequency and time offsets in packet-based OFDM systems are computationally intensive, either in the time or frequency domain ([5], [7]-[11]). Fig. 2 presents the functional diagram of the synchronization strategy [16] whose hardware implementation is proposed in the following sections. It employs the reference information embedded in one symbol of section B, Y B (k), to provide simultaneously both the Estimated Frequency-Offset (EFO) and the Estimated Time-Offset (ETO), which are computed by the Frequency-Offset Estimator (FOE) and the Time-Offset Estimator (TOE), respectively.
III. SYNCHRONIZATION STRATEGY
A. Functional diagram
The main advantage of the proposed strategy is that the hardware resources used during the synchronization phase are reused in the demodulation phase. The reused cells are mainly the FFT processor, in charge of transforming the time domain sequence into the frequency domain, and the CORDIC processor, usually included to convert Cartesian coordinates into polar ones in the demapping operation. Since these cells are already included in an OFDM system, the proposed synchronization scheme is implemented without any significant additional hardware.
B. Frequency-offset Estimation algorithm
In [17] a fine frequency-offset estimator operating in the frequency domain was proposed. In this section the method in [17] is improved to save hardware resources and to provide an EFO that depends linearly on the input frequency offset.
The estimator in [17] takes advantage of the fact that, typically, a set of carriers in part B of the header (termed C Null ) are nulled in the transmitter whereas the rest contain a training sequence of known amplitude and phase. In fact, in part B of the header three out of every four carriers are nulled in the IEEE802.11a and ETSI-Hiperlan/2 standards, and one out of every two in the ETSI-Hiperman and IEEE802.16a standards.
The new frequency-offset estimator is presented in (5), where M F is a metric which depends on the frequency-offset FO, and K Foe is an application-dependent constant that converts metric values into frequency-offset estimates.
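$$\mathrm{EFO} = K_{Foe} \cdot \arctan\{M_F\} \tag{5}$$
$$M_F = \sum_{k \notin C_{Null}} |Y_B(k)| + j \sum_{k \in C_{Null}} |Y_B(k)| \tag{6}$$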
The imaginary part of M F contains the summation of the module of the carriers in C Null which were nulled in the transmitter, and whose value in the receiver is mainly due to the inter-carrier interference I ICI (k,FO) according to (3) . This term increases with the frequency-offset according to [5] .
The real part of M F contains the summation of the module of the non-null carriers, which are attenuated by the frequency-offset. Simulation results show that the phase of the complex number M F increases linearly with the frequency-offset FO.
Note that a time-offset (TO) does not affect the frequency-offset estimation provided that inter-symbol interference is avoided in the symbol Y B (k). To this end, the coarse time-offset estimation should precede the fine frequency-offset estimation, if necessary.
This new frequency-offset estimator differs from that reported in [17], where the EFO was obtained by dividing the imaginary part of the metric M F by its real part. Note that the proposed estimator does not require a divider and, unlike the original estimator in [17], depends linearly on the input frequency offset.
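The behavior of the metric M F can be illustrated with a short noise-free simulation. The sketch below assumes an ideal channel, N = 64 carriers and the simplified header X B (k) = {1,0,...,1,0}, in which every second carrier is nulled. For this toy header the time-domain symbol has only two non-zero samples, so the phase of M F can be verified by hand to equal π·FO/2 for 0 ≤ FO < 1; the constant `K_FOE_TOY = 2/π` is therefore specific to this example and is not a value taken from the paper. All function names are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def received_header(fo, n=64):
    """Toy header {1,0,...,1,0} received with normalized frequency-offset fo."""
    x_b = [1.0 if k % 2 == 0 else 0.0 for k in range(n)]
    x = idft(x_b)
    # A frequency offset appears as a progressive rotation in the time domain.
    y = [x[t] * cmath.exp(2j * math.pi * fo * t / n) for t in range(n)]
    return dft(y)

def metric_mf(y_b):
    """Real part: modules of the non-null carriers. Imaginary part: modules of C_Null."""
    re = sum(abs(y_b[k]) for k in range(len(y_b)) if k % 2 == 0)
    im = sum(abs(y_b[k]) for k in range(len(y_b)) if k % 2 == 1)
    return complex(re, im)

K_FOE_TOY = 2.0 / math.pi   # assumption: calibration valid for this toy header only

def estimate_fo(y_b, k_foe=K_FOE_TOY):
    """EFO as the scaled phase of M_F; no division is involved."""
    return k_foe * cmath.phase(metric_mf(y_b))

for fo in (0.0, 0.1, 0.25, 0.4):
    print(fo, estimate_fo(received_header(fo)))
```

Because only modules enter the metric, this sketch cannot recover the sign of the offset; how the sign is resolved in the actual system is not addressed here.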
The estimator performance is simulated in terms of its error variance, leading to expression (7) where snr is the input signal-to-noise ratio, and K is a fitting parameter that depends on the application (for instance, in ETSI-Hiperman with N=256 carriers, K=27.78dB, and in ETSI-Hiperlan/2 with N=64, K=22.72dB).
C. Time-offset Estimation algorithm
For time-offset estimation, the algorithm of references [18] and [19] has been adapted to the header format of Fig. 1 . The Estimated Time-offset (ETO) is calculated using (8), where K Toe is an application-dependent constant shown in (9) that depends on the total number of carriers N, the number of phase increments L used to estimate the time-offset, and the index difference ∆ between two consecutive non-null carriers (for most applications this distance is constant).
The metric M T is calculated according to (10), where the sequences τ Rx (k) and τ Tx (k) are the received and the transmitted phase increments, respectively, between carriers k and k+∆, with k not belonging to C Null .
In [18] the metric M T is shown to depend linearly on the time-offset TO.
The received sequence τ Rx (k) is obtained on-line according to (11), whereas the transmitted sequence τ Tx (k) shown in (12) is computed off-line. θ Rx (k) and θ Tx (k) are the received and transmitted phases of the synchronization symbol, respectively, normalized to the range ±1.
The statistical performance of this estimator can be found in reference [18]. The variance of the estimation error Var(e Eto ) is rewritten in expression (13).
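A rough sketch of a phase-increment time-offset estimator of this kind is given below. Since expressions (8) to (12) are not reproduced here, the code rests on explicit assumptions: phases are normalized by π, each increment is wrapped to ±1, M T is taken as the sum of the differences between received and transmitted increments, and K Toe is assumed to have the form N/(2·L·∆) so that the estimate is expressed in samples. The rotation sign, the header contents and all names are likewise hypothetical, and an ideal noise-free channel is assumed.

```python
import cmath
import math

def wrap(v):
    """Wrap a normalized phase (units of pi) into [-1, 1)."""
    return (v + 1.0) % 2.0 - 1.0

def phase_increments(symbol, carriers, delta):
    """Normalized phase increments between carriers k and k + delta."""
    theta = {k: cmath.phase(symbol[k]) / math.pi for k in carriers}
    return [wrap(theta[k + delta] - theta[k]) for k in carriers if k + delta in theta]

def estimate_time_offset(y_rx, x_tx, carriers, delta):
    n = len(y_rx)
    tau_rx = phase_increments(y_rx, carriers, delta)   # computed on-line
    tau_tx = phase_increments(x_tx, carriers, delta)   # could be precomputed off-line
    m_t = sum(wrap(r - t) for r, t in zip(tau_rx, tau_tx))
    k_toe = n / (2.0 * len(tau_rx) * delta)            # assumed form of K_Toe
    return k_toe * m_t

# Hypothetical header: every second carrier is non-null, with arbitrary phases.
N, DELTA, TO = 64, 2, 3
carriers = list(range(0, N, DELTA))
x_tx = [cmath.exp(0.7j * k * k) if k % DELTA == 0 else 0j for k in range(N)]
y_rx = [x_tx[k] * cmath.exp(2j * math.pi * k * TO / N) for k in range(N)]
print(estimate_time_offset(y_rx, x_tx, carriers, DELTA))
```

Under these assumptions, choosing N, L and ∆ so that N/(2·L·∆) is a power of two would turn the final scaling into a simple shift.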
IV. PHASE AND MODULE ESTIMATIONS
A. CORDIC Algorithm
The CORDIC algorithm is a well-known method to perform various arithmetic operations using only elementary shift-and-add iterations [4]. In the backward circular rotation mode (described in Table I ), the CORDIC algorithm provides both module and phase estimations of given input coordinates {x 0 ,y 0 }.
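For reference, a floating-point sketch of the circular CORDIC iteration that drives y toward zero is shown below. It follows the textbook formulation and may differ in detail from Table I; the initial ±π pre-rotation for inputs in the left half-plane and the function names are assumptions of this example. In hardware, the multiplications by 2^-i would be shifts and the arctangent values would come from a small table.

```python
import math

def cordic_gain(n_ite):
    """Scale factor accumulated by n_ite micro-rotations."""
    gain = 1.0
    for i in range(n_ite):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

def cordic_vectoring(x0, y0, n_ite):
    """Return (module scaled by the CORDIC gain, phase in radians) of (x0, y0)."""
    x, y, z = float(x0), float(y0), 0.0
    if x < 0.0:
        # Pre-rotate by +/- pi so the iterations start in the right half-plane.
        z = math.pi if y >= 0.0 else -math.pi
        x, y = -x, -y
    for i in range(n_ite):
        shift = 2.0 ** -i
        if y >= 0.0:
            # Rotate clockwise and accumulate the micro-rotation angle.
            x, y, z = x + y * shift, y - x * shift, z + math.atan(shift)
        else:
            # Rotate counter-clockwise.
            x, y, z = x - y * shift, y + x * shift, z - math.atan(shift)
    return x, z

scaled_module, phase = cordic_vectoring(-3.0, 4.0, 16)
print(scaled_module / cordic_gain(16), phase)
```

The residual phase error after n_ite iterations is bounded by the last micro-rotation, arctan(2^-(n_ite-1)).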
B. SNR due to the approximation error
The approximation error (due to a finite number of iterations N Ite ) and the finite arithmetic representation reduce the accuracy of the CORDIC algorithm [20] - [22]. In this paper this accuracy reduction will be modeled as a quantization process (Fig. 3). According to this model, the CORDIC output, both module M̂ and phase θ̂, can be approximated by the ideal output plus an additive quantization noise, i.e., M̂ = M + e Q,M and θ̂ = θ + e Q,θ . This paper focuses on the approximation error. Consequently, the quantization noise is assumed to be exclusively due to the finite number of iterations (N Ite ). Note that the number of iterations is an important design parameter, as it determines the CORDIC processor throughput in a serial architecture, or the initial latency in a pipelined one.
Like other quantization processes, a signal-to-noise ratio SNR Cordic can be defined as the ratio between the signal power, either module (M) or phase (θ), and the noise power due to, either the phase approximation error (e Q,θ ) or the module approximation error (e Q,M ). Comparing the SNR Cordic obtained for a given number of iterations N Ite to the SNR defined for a conventional quantization process, an equivalent number of bits can be independently derived for the module and phase estimations.
Assuming infinite precision arithmetic, the phase angle is approximated in [4] by a linear combination of N Ite micro-rotations controlled by a decision chain d(i), plus an approximation error due to the limited number of iterations. The approximation error is upper bounded by the last micro-rotation a(N Ite -1) and it is not correlated with the input signal. If N Ite is large enough, the arctan() function is approximated by its argument, leading to a(N Ite -1) ≈ 2^-(N Ite -1). This allows us to derive an estimated signal-to-noise ratio due to approximation error snr A , given in (14), where the error and signal variances have been calculated assuming them to be uniformly distributed in the intervals ±2^-(N Ite -1) and ±1, respectively. The resulting expression for the signal-to-noise ratio in dB is shown in (15).
Note that a uniformly distributed random variable in the interval [-1,+1] quantized with N Cordic bits yields a signal-to-noise ratio due to quantization error given by SNR Q ≈ 6 N Cordic dB. Therefore, assuming the quantization error in a CORDIC processor to be only determined by the number of iterations of the CORDIC algorithm and, according to (15), a CORDIC processor with N Cordic equivalent bits requires N Ite ≈ N Cordic +1 iterations. This approximation will be used in the rest of the paper and verified by simulation, when convenient.
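The arithmetic behind this equivalence can be sketched as follows. A variable uniformly distributed in ±a has variance a²/3, so under the stated assumptions the ratio of signal to error variance is 2^(2(N Ite -1)), or about 6.02·(N Ite -1) dB. Since (14) and (15) are not reproduced here, this is one reading of those expressions rather than a quotation, and the function names are hypothetical.

```python
import math

def snr_approx_db(n_ite):
    """SNR due to the phase approximation error, under the uniform-error assumption."""
    err_var = (2.0 ** -(n_ite - 1)) ** 2 / 3.0   # error uniform in +/- 2^-(n_ite-1)
    sig_var = 1.0 / 3.0                          # normalized phase uniform in +/- 1
    return 10.0 * math.log10(sig_var / err_var)

def iterations_for_bits(n_cordic):
    """Rule of thumb from the text: N_Ite is about N_Cordic + 1."""
    return n_cordic + 1

for n_ite in (5, 7, 9):
    print(n_ite, round(snr_approx_db(n_ite), 2))
```

Comparing this with roughly 6 dB per bit for a conventional quantizer gives the one-iteration offset between N Ite and N Cordic.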
For the module estimation, the approximation error can be shown to be correlated with the input signal, and no explicit expression for the signal-to-noise ratio can be easily derived in this case. However, simulation results show that, for a given number of iterations, the signal-to-noise ratio obtained in the module estimation is larger than that obtained in the phase estimation, so that it is normally the phase estimation that determines the minimum number of iterations required for a given signal-to-noise ratio.
V. CORDIC-BASED ARCHITECTURE
A. Architecture description
Fig. 4 shows the architecture proposed to compute the estimated frequency-offset (EFO) and the estimated time-offset (ETO) required by the synchronization strategy introduced in Section III. The architecture includes three CORDIC processors. In the upper path of Fig. 4 the ETO is obtained using expressions (8) to (12). Using only the non-null transmitted carriers, the Cordic-1 processor estimates θ Rx (k), the normalized phase of the synchronization symbol Y B (k). By a proper choice of N, L and ∆, the constant K Toe can be selected to be a power of two so that the final product by K Toe does not require a multiplier.
The lower path of Fig. 4 provides the EFO. Using every received carrier, the Cordic-2 processor provides an estimation of the module of the received symbol Y B (k).
The resulting output sequence is split into two different flows: the carriers belonging to the C Null set are accumulated to obtain the imaginary part of M F , while the remaining carriers provide the real part of M F , as shown in expression (6). Finally, the Cordic-3 processor estimates the phase of the complex value M F , which is scaled by the factor K Foe to obtain the EFO.
According to Table I , the module estimation in the CORDIC algorithm requires the CORDIC output to be scaled by the factor 1/A Ite , introducing a significant overhead when compared to the phase estimation. However, as this scaling factor A Ite only depends on the number of iterations N Ite-2 , the real and imaginary part of M F are scaled in the same way, so that the phase of M F is not affected by the value of A Ite . Therefore, the scaling operation inherent to module estimation in the CORDIC algorithm is not required in the proposed architecture.
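The scaling argument can be verified with a few lines of code. The sketch below applies the same CORDIC gain to every module before accumulation and shows that the phase of M F is unchanged; the module values and the function names are hypothetical, and the gain formula is the standard product over micro-rotations.

```python
import cmath
import math

def cordic_gain(n_ite):
    """Gain accumulated by n_ite micro-rotations (depends on n_ite only)."""
    gain = 1.0
    for i in range(n_ite):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

def mf_phase(null_modules, data_modules, gain=1.0):
    """Phase of M_F when every module carries the same multiplicative gain."""
    re = sum(gain * m for m in data_modules)   # non-null carriers
    im = sum(gain * m for m in null_modules)   # carriers in C_Null
    return cmath.phase(complex(re, im))

# Hypothetical module values, for illustration only.
null_modules = [0.10, 0.20, 0.15]
data_modules = [0.90, 1.00, 0.95]
print(mf_phase(null_modules, data_modules))
print(mf_phase(null_modules, data_modules, gain=cordic_gain(4)))
```

Both printed phases agree to within rounding, which is why the 1/A Ite correction can be omitted in the frequency-offset path.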
B. Time-offset estimation path
In the architecture of Fig. 4 , the accuracy of Cordic-1 (determined by N Ite-1 ) in the angle estimation determines, in turn, the accuracy in the time-offset estimation, which is commonly measured in terms of the synchronization error probability (P E ) versus the input signal-to-noise ratio (SNR) in an AWGN channel.
According to the architecture of Fig. 4 , there are two error sources in the time-offset estimation: the estimation error due to the AWGN channel (e Eto ), whose power is expressed in (13), and the quantization error (e QT ) due to the approximation error in the CORDIC processor. To a first-order approximation, both error terms can be considered independent.
The approximation error in the angle estimation of Cordic-1 increases the signal-to-noise ratio required to preserve a given error probability. This increment is labeled d in (16), where snr Eto is the signal-to-noise ratio required to reach a given error probability P E with an ideal angle computation in an AWGN channel, and snr QT is the signal-to-noise ratio due only to the approximation error in a CORDIC-based implementation of the time-offset estimator. Thus d is the signal-to-noise ratio degradation due to the approximation error in the CORDIC processor. The relationship between the quantization error power in the time-offset estimation and the phase error power of the Cordic-1 processor is given in (17), where K Toe and L are application-dependent factors, and e Q,θ is the approximation error in the phase estimation of Cordic-1.
From (16) and (17) a theoretical expression for the degradation d can be derived (18) , where SNR Cordic-1 is the signal-to-noise ratio due to the approximation error obtained when the Cordic-1 processor estimates the angle using N Cordic-1 equivalent bits. The time-offset estimation path has been modeled in Matlab using a fixed-point phase representation of N Cordic-1 bits. The measured synchronization error probability P E is depicted in Fig. 5 , confirming the results obtained with the proposed error model.
C. Frequency-offset estimation path
The precision of both Cordic-2 in the module estimation and Cordic-3 in the angle estimation determines the accuracy of the frequency-offset estimation, commonly measured in terms of the estimation error variance Var(e Efo ).
1) Cordic-2 tuning
The non-linear relationship between the module approximation error, due to the limited number of iterations in the Cordic-2 processor, and the frequency-offset estimation error makes it difficult to derive an analytical error model. Therefore, in this case, the number of iterations N Ite-2 has been selected by simulation.
The frequency-offset path of Fig. 4 has been simulated, including a model of the CORDIC processor which provides the module estimation with a configurable number of iterations N Ite-2 , using a generic reference symbol with N = 64 carriers and a simplified structure for the B part of the header X B (k) = {1,0,..1,0}.
Simulation results presented in Fig. 6 show the relationship existing between the estimated frequency-offset (EFO) and the input frequency-offset (FO) when no additional noise is added at the receiver input. The ideal case (N Ite-2 → ∞) is also shown for comparison purposes. From Fig. 6 it can be seen that only one iteration in the Cordic-2 processor (N Ite-2 = 1) would be enough to get a result similar to the ideal case. 
2) Cordic-3 tuning
The Cordic-3 processor provides the angle estimation of the complex number M F as expressed in (5). For the sake of simplicity, two independent error sources are now considered in the frequency-offset estimation: a) a uniformly distributed quantization error originating in the CORDIC angle approximation error, and b) the estimation error due to the AWGN channel. Then, the total error power in the proposed CORDIC-based architecture is the sum of the error variance due to the AWGN, expressed in (7), and the error variance due to the CORDIC-based implementation, Var(e QF ).
Expression (19) gives the relationship between the angle estimation error variance in Cordic-3, Var(e Q,θ ), and the total frequency-offset error variance Var(e QF ) in the CORDIC-based implementation, where the constant K Foe is the application-dependent phase to frequency-offset scaling factor.
$$\mathrm{Var}(e_{QF}) = K_{Foe}^{2} \cdot \mathrm{Var}(e_{Q,\theta}) \tag{19}$$
The SNR degradation d is calculated from (7) and (19) in the same way as it was done in the previous section. The results are shown in Table III . A quantized phase representation of N Cordic-3 equivalent bits in a commonly working range of SNR Foe = [0-10] dB has been taken into account.
Once again, for an SNR degradation lower than 0.25 dB, N Cordic-3 = 8 equivalent bits are required. As a matter of fact, in the selected SNR range, with N Cordic-3 = 8 bits the expected degradation obtained by simulation is lower than 0.1 dB.
The frequency-offset estimation path was modeled using the architecture of Fig. 4 with a quantized angle (N Cordic-3 bits). The simulation results presented in Fig. 8 show the error variance in the frequency-offset estimation versus SNR for different angle quantization levels. Simulation results confirm that the frequency-offset error obtained with an 8-bit quantized phase is very close to that of the ideal phase estimation.
VI. CORDIC IMPLEMENTATION RESULTS
Sections IV and V have shown the minimum accuracy required of the CORDIC processors in Fig. 4 , in terms of equivalent number of bits, to keep the signal-to-noise degradation below 0.25 dB. As a result, the time-offset estimator (Cordic-1) requires N Cordic-1 = 6 bits in the phase approximation, whereas the frequency-offset estimator (Cordic-2 and Cordic-3) requires at least N Cordic-3 = 8 bits in the phase estimation. Regarding the module approximation given by the Cordic-2 processor, simulation results have shown that N Ite-2 = 4 iterations are enough.
The proposed architecture has been simulated using a CORDIC processor model with a configurable number of iterations for Cordic-1 and Cordic-3 processors. Fig. 9 shows the time-offset synchronization error probability obtained when the Cordic-1 processor in the TOE path is configured with N Ite-1 = 5, 6 and 7 iterations, whereas Fig. 10 presents the frequency-offset estimation error variance when the Cordic-3 processor in the FOE path is configured with N Ite-3 = 7, 8 and 9 iterations. Selecting N Ite-1 = 7 and N Ite-3 = 8 the results achieved are very close to those obtained with an ideal phase computation.
To evaluate the complexity of the hardware required by the proposed architecture, different CORDIC processors were implemented using a pipelined architecture in a Xilinx Virtex-II FPGA, with a number of iterations varying in the range [5-9]. The CORDIC processor design rule of reference [23] has been used. Table IV presents the synthesis results obtained with Synplicity v7.2 in terms of Look-Up Table (LUT) and flip-flop (REG) usage, along with the estimation latency (T CLK is the system clock period). Additionally, the relative area occupancy in an XC2V1000 device is given in parentheses.
In Table IV the selected values for the Cordic-1 and Cordic-3 processors are marked with a grey background. Regarding the Cordic-2 processor, as stated before, 4 iterations are required, consuming less than 1% of the selected device. Finally, to reduce hardware consumption even further, since Cordic-1 estimates the phase of a set of carriers and Cordic-3 provides a single estimation at the end of the synchronization symbol, a single multiplexed CORDIC processor with 8 iterations can be used to implement both algorithms.
VII. CONCLUSIONS
A new CORDIC-based architecture that simultaneously estimates time and frequency offsets in OFDM packet-based systems by processing the received signals in the frequency domain has been proposed. For the proposed architecture, an error model has been derived in order to determine the required number of iterations of the CORDIC processors. This new approximation error model provides a signal-to-noise ratio expression that simplifies the selection of the required number of iterations.
The frequency-offset and time-offset estimation paths were theoretically analyzed and simulated to derive the optimal design of the CORDIC processors. As a result, the CORDIC-based architecture proposed in this paper for both time and frequency offset estimation requires only one 8-iteration CORDIC processor to estimate the phase, and one 4-iteration CORDIC processor to estimate the module.
