Abstract. Carrier synchronization is a vital part of every inner receiver in wireless communications. The frequency offset and phase offset of the received burst must be estimated, and the received samples must be corrected according to these estimates. An approximation of the maximum likelihood estimate of the frequency offset is given by the Fast Fourier Transform (FFT). The accuracy of the estimate, as well as the hardware complexity and throughput, depends on the number of FFT points. We demonstrate three techniques that improve the accuracy of the FFT-based frequency offset estimation or, equivalently, reduce the number of required FFT points. These three techniques, as well as the reference algorithm without improvements, combine phase offset estimation with frequency offset estimation. We present the communications performance and the FPGA implementation performance of these techniques.
Correspondence to: U. Wasenmüller (wasenmueller@eit.uni-kl.de)
Introduction
The transmission over a wireless channel results in timing, frequency and phase offsets. Sophisticated synchronization is mandatory to avoid the severe losses in communications performance caused by these offsets. In this paper we focus on the frequency and phase estimation and correction (carrier synchronization) of bursts with BPSK/QPSK modulation. In our target system these bursts are given in the complex baseband with one sample per symbol after timing synchronization has been properly carried out. An approximation of the maximum likelihood estimate of the frequency offset is the FFT, see e.g. (Mengali et al., 1997). The implementation complexity as well as the latency and throughput of the FFT depend on the number of FFT points. Usually zero padding by a factor of 2 or even 4 is applied before the FFT to reach the accuracy needed for the communications performance.
We analyze three different techniques that reduce the number of required FFT points, and thus the implementation complexity, while still obtaining the required communications performance. These techniques are extensions of an FFT-based algorithm for the combined estimation of frequency and phase offset, with the objective of improving the accuracy of the frequency offset estimation.
-The first technique is a sample rate reduction that averages the modulation-removed samples, so that the information of all received symbols is retained.
-The second technique interpolates between the bins of the FFT result.
-The third technique uses a decision directed approach after an FFT-based coarse synchronization.
Additionally, all techniques combine the estimation of frequency and phase offset, which makes a separate module for phase offset synchronization obsolete. We present the communications performance of these three techniques for different signal-to-noise ratios. We make heavy use of optimized Xilinx IP cores; hence implementation complexities and results are given for Xilinx devices, and the trade-off between communications performance and implementation performance is analyzed. The paper is structured as follows. In Sect. 2 our FFT-based reference algorithm for the combined frequency and phase offset estimation is briefly described. In Sect. 3 the essential ideas of the three techniques for reducing the number of FFT points are presented, and the additional computational effort of these techniques is derived. Sect. 4 describes the common hardware architecture and implementation details. Communications performance and implementation results are given in Sect. 5. Section 6 concludes the paper.
Base carrier synchronization algorithm
We assume that timing synchronization has been properly carried out and that the received sample sequence r is given with one sample per symbol. This sequence r is given in the complex baseband according to Eq. (1):
The sample sequence r with L elements is based on MPSK symbols s with one sample per symbol, i.e. M = 2 for BPSK and M = 4 for QPSK. The sequence r is disturbed by a noise sequence n, which represents additive white Gaussian noise (AWGN). The frequency offset $f_o$ and the phase offset $\varphi$ are considered fixed during the transmission of a burst. The symbol duration is denoted by T, and the frequency offset $f_o$ in Eq. (1) is given as a fraction of the symbol rate 1/T. The problem consists of estimating the frequency offset and the phase offset, i.e. of determining the estimates $\hat{f}_o$ and $\hat{\varphi}$. The received sample sequence r then has to be corrected according to these estimates.
In the following, the base algorithm is briefly described according to (Brack et al., 2005). For the estimation of the frequency offset, the effect of the modulation by the symbol sequence s has to be removed. We restrict our investigations to so-called non-data-aided estimation, where the unknown frequency and phase offsets are estimated without knowledge of the transmitted data symbols. In general, modulation removal is carried out with an M-th power operation on the received sample sequence r, as shown in Eq. (2):
In (Wasenmüller et al., 2004) we proposed an improved modulation removal scheme for MPSK based on investigations in (Wang et al., 2003). This scheme achieves a considerably better communications performance than the M-th power operation. Modulation removal is then performed as shown in Eq. (3):
with M being the modulation index and $\bar{r}$ denoting the resulting received sequence without modulation. An approximation of the maximum likelihood frequency estimate is obtained by applying the FFT algorithm to the sequence $\bar{r}$. The FFT implements the N-point Discrete Fourier Transform (DFT) according to Eq. (4):
The estimate $\hat{f}_o$ of the frequency offset is obtained by spectral analysis of the FFT output. In general, the FFT bin corresponding to the estimated frequency offset is given by:
Equation (5) yields the index $k_f$ of the FFT bin with the maximum amplitude; the frequency offset estimate $\hat{f}_o$ is then simply calculated as:
Each bin X(k) of the FFT represents a frequency range of $(M \cdot N \cdot T)^{-1}$ Hz, not a single distinct frequency. Hence the accuracy of the FFT-based frequency offset estimation depends on the number N of FFT points. The maximum frequency offset that can be estimated with the described method is given by:
This limit follows from the Nyquist criterion for the sequence $\bar{r}$. If the range of the actually occurring frequency offset is known in advance, the communications performance can be improved significantly by limiting the spectral analysis with a window. This windowing technique is given by Eq. (8):
The parameters $w_u$ and $w_l$ define the frequency range in which the frequency offset is searched. Setting $w_u = N/2 - 1$ and $w_l = N/2$ covers all N bins, so that the windowed search yields the FFT bin $k_f$ of Eq. (5).
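The base estimator of Eq. (2) and Eq. (4) to Eq. (8) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' bit-true model: the plain M-th power of Eq. (2) stands in for the improved modulation removal of Eq. (3), QPSK symbols are taken from {1, j, −1, −j} so that $s^M = 1$, FFT bins are indexed with a signed k from −N/2 to N/2 − 1, the estimate is assumed to be $\hat{f}_o = k_{f,w}/(M N T)$ (following the bin spacing stated above), and a direct O(N²) DFT replaces the FFT for clarity. The helper names `qpsk_burst`, `dft` and `estimate_frequency` are hypothetical.

```python
import cmath
import math
import random


def qpsk_burst(length, f_o, phi, T=1.0, noise_std=0.0, seed=0):
    """Hypothetical test burst r(k) = s(k) exp(j(2 pi f_o k T + phi)) + n(k).

    Assumes axis-aligned QPSK symbols s(k) in {1, j, -1, -j}, so s(k)**4 == 1.
    """
    rng = random.Random(seed)
    burst = []
    for k in range(length):
        s = cmath.exp(0.5j * math.pi * rng.randrange(4))
        n = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        burst.append(s * cmath.exp(1j * (2 * math.pi * f_o * k * T + phi)) + n)
    return burst


def dft(x, n_points):
    """N-point DFT (Eq. 4) of x, zero-padded to n_points; direct O(N^2) evaluation."""
    if len(x) > n_points:
        raise ValueError("sequence longer than the DFT size")
    x = list(x) + [0j] * (n_points - len(x))
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]


def estimate_frequency(r, M, N, T=1.0, w_l=None, w_u=None):
    """Return (f_hat, k_f, X) using a windowed peak search over signed bins."""
    r_bar = [x ** M for x in r]              # Eq. (2): M-th power removes the MPSK modulation
    X = dft(r_bar, N)                        # Eq. (4)
    w_l = N // 2 if w_l is None else w_l     # defaults cover all bins, i.e. Eq. (5)
    w_u = N // 2 - 1 if w_u is None else w_u
    # Eq. (8): search only bins -w_l..w_u; a negative k addresses X[k mod N]
    k_f = max(range(-w_l, w_u + 1), key=lambda k: abs(X[k % N]))
    f_hat = k_f / (M * N * T)                # each bin spans (M N T)^-1
    return f_hat, k_f, X


if __name__ == "__main__":
    M, N, L = 4, 512, 300
    f_o = 0.01                               # normalized to the symbol rate (T = 1)
    f_hat, k_f, _ = estimate_frequency(qpsk_burst(L, f_o, 0.3, noise_std=0.1), M, N)
    print(k_f, f_hat, "max |f_o| =", 1 / (2 * M))   # Nyquist limit 1/(2 M T) for r_bar
```

With a 512-point DFT and QPSK the bin spacing is 1/2048 of the symbol rate, so an offset of 0.01 falls between bins and the residual error is bounded by half a bin; this residual is what the three techniques of Sect. 3 try to reduce.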
For the estimation of the phase offset, a sample sequence without frequency offset is needed, and the effect of the modulation has to be removed again from the frequency-corrected burst. As shown in (Brack et al., 2005), the phase offset can be estimated from the FFT bin with the index $k_f$:
Equation (9) uses the result of Eq. (4) for the bin $k = k_f$. Thus frequency and phase estimation can be combined efficiently. Note that the estimated phase offset has an M-fold ambiguity, which must be resolved in a later processing step. Finally, the carrier-synchronized sequence $r_c$ is calculated according to Eq. (10).
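The combined estimation and correction can be sketched as follows. We assume that Eq. (9) has the form $\hat{\varphi} = \arg X(k_f)/M$ and that Eq. (10) rotates each sample by $e^{-j(2\pi \hat{f}_o k T + \hat{\varphi})}$; both forms, the axis-aligned QPSK constellation and the names `carrier_sync` and `phase_error_mod` are our assumptions for illustration. Because of the M-fold ambiguity, the phase error is measured modulo 2π/M.

```python
import cmath
import math
import random


def dft(x, n_points):
    """N-point DFT of x zero-padded to n_points (direct evaluation, for clarity)."""
    x = list(x) + [0j] * (n_points - len(x))
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_points) for n in range(n_points))
            for k in range(n_points)]


def carrier_sync(r, M, N, T=1.0):
    """Combined frequency/phase estimation and correction; returns (f_hat, phi_hat, r_c)."""
    X = dft([x ** M for x in r], N)                                    # Eq. (2) and Eq. (4)
    k_f = max(range(-(N // 2), N // 2), key=lambda k: abs(X[k % N]))   # Eq. (5)
    f_hat = k_f / (M * N * T)
    phi_hat = cmath.phase(X[k_f % N]) / M       # assumed form of Eq. (9), ambiguous mod 2*pi/M
    r_c = [x * cmath.exp(-1j * (2 * math.pi * f_hat * k * T + phi_hat))  # assumed Eq. (10)
           for k, x in enumerate(r)]
    return f_hat, phi_hat, r_c


def phase_error_mod(phi_hat, phi, M):
    """Distance between two phases modulo the M-fold ambiguity 2*pi/M."""
    period = 2 * math.pi / M
    d = (phi_hat - phi) % period
    return min(d, period - d)


if __name__ == "__main__":
    rng = random.Random(1)
    M, N, L, f_o, phi = 4, 512, 300, 10 / 2048, 1.0      # f_o placed exactly on bin 10
    r = [cmath.exp(0.5j * math.pi * rng.randrange(4))
         * cmath.exp(1j * (2 * math.pi * f_o * k + phi)) for k in range(L)]
    f_hat, phi_hat, r_c = carrier_sync(r, M, N)
    print(f_hat, phi_hat, phase_error_mod(phi_hat, phi, M))
```

In the example the true phase 1.0 exceeds π/M, so the estimate lands on a different branch of the ambiguity; the corrected samples are then rotated by a multiple of π/2, which is exactly what the back-end ambiguity resolution has to handle.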
Improvements
As shown in the previous section, the resolution of the estimated frequency offset depends on the number of FFT points. The computational complexity of the FFT is proportional to $N \log_2 N$. Thus the area and the achievable throughput of an FFT building block are directly related to the number of FFT points. To obtain an efficient implementation, the number of FFT points should be reduced while still achieving sufficient communications performance. In the following subsections, three techniques are presented that allow the number of FFT points to be reduced.
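As a rough worked example, assuming the cost C(N) of an N-point FFT scales exactly as $N \log_2 N$ (constant factors and the pipelined structure of the IP cores are ignored), halving the FFT size from 1024 to 512 points, the two sizes compared in Sect. 5, saves slightly more than a factor of two in arithmetic operations:

```latex
\frac{C(1024)}{C(512)} = \frac{1024 \log_2 1024}{512 \log_2 512}
                       = \frac{1024 \cdot 10}{512 \cdot 9}
                       = \frac{20}{9} \approx 2.22
```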
Technique 1: Sample Rate Reduction (SRR)
The resolution of the estimated frequency can be increased by reducing the sample rate. A reduction of the sample rate is equivalent to an increase of the parameter T in Eq. (7). Only the L elements of the received burst sequence r are available for the estimation. Thus a sample rate reduction by a factor D implies that the number of samples available for the estimation is reduced to L/D. However, reducing the number of symbols used for the estimation leads to a performance degradation. To avoid this degradation, the information of all received symbols is retained by averaging over groups of D samples of the modulation-removed sequence $\bar{r}$, given by:
Frequency and phase offset estimation is then performed with the new sequence $\bar{r}_s$ according to Eq. (4) to Eq. (9). The sample rate reduction (SRR) technique of Eq. (11) implies a different interpretation of the FFT bins: the frequency range of an FFT bin is now $(M \cdot N \cdot T \cdot D)^{-1}$ Hz, and the maximum frequency offset that can be estimated is reduced by the factor D.
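A minimal sketch of the SRR frequency estimate follows. We assume that Eq. (11) is a plain mean over non-overlapping groups of D consecutive modulation-removed samples (a trailing incomplete group is dropped); the names `srr_average` and `srr_estimate` are hypothetical, the phase correction of Eq. (12) is not modelled, and the axis-aligned QPSK constellation is again an assumption.

```python
import cmath
import math
import random


def dft(x, n_points):
    """N-point DFT of x zero-padded to n_points (direct evaluation, for clarity)."""
    x = list(x) + [0j] * (n_points - len(x))
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_points) for n in range(n_points))
            for k in range(n_points)]


def srr_average(r_bar, D):
    """Assumed form of Eq. (11): mean of each group of D consecutive samples."""
    return [sum(r_bar[m * D:(m + 1) * D]) / D for m in range(len(r_bar) // D)]


def srr_estimate(r, M, N, D, T=1.0):
    """Frequency estimate with sample rate reduction by D; returns (f_hat, f_max)."""
    r_bar_s = srr_average([x ** M for x in r], D)       # L/D samples instead of L
    X = dft(r_bar_s, N)
    k_f = max(range(-(N // 2), N // 2), key=lambda k: abs(X[k % N]))
    f_hat = k_f / (M * N * T * D)                       # bin spacing is now (M N T D)^-1
    f_max = 1 / (2 * M * T * D)                         # unambiguous range shrinks by D
    return f_hat, f_max


if __name__ == "__main__":
    rng = random.Random(2)
    M, N, D, L = 4, 256, 2, 300
    f_o = 0.002
    r = [cmath.exp(0.5j * math.pi * rng.randrange(4))
         * cmath.exp(1j * (2 * math.pi * f_o * k + 0.3)) for k in range(L)]
    print(srr_estimate(r, M, N, D))
```

With N = 256 and D = 2, the bin spacing equals that of the base algorithm with N = 512, at the price of halving the maximum offset from 1/(2MT) to 1/(2MTD).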
The phase estimation according to Eq. (9) yields the phase with respect to the decimated sequence $\bar{r}_s$. This must be corrected; thus the estimated phase offset is given by:
Technique 2: Interpolation (INT)
As mentioned in the explanation of the base algorithm for FFT-based carrier synchronization, the resolution of the frequency estimation is limited by the number of FFT points. To improve the resolution of a given discrete Fourier transform, the neighbouring FFT bins of bin $k_f$ are taken into account as well. Our technique uses a parabolic interpolation over three bins of the discrete Fourier transform: the behaviour of the Fourier transform around the bin with the maximum energy is approximated by a parabola. The parabola is determined by the energy $E_C$ of bin $k_f$, the energy $E_L$ of its left neighbour bin, and the energy $E_R$ of its right neighbour bin. The argument of the maximum of this parabola is given by:
By construction, the fractional bin offset δ at which this parabola peaks lies in the range $-1/2 \le \delta \le 1/2$. This implies that the interpolation improves the estimate of the base algorithm by at most half a bin. For the correction of the frequency offset, a virtual bin $k_f + \delta$ is used. The phase estimation must also be adapted to this virtual bin. The Fourier transform at the virtual bin is calculated by linear interpolation between the corresponding bins. For positive δ, the value of the virtual bin $X(k_f + \delta)$ is calculated as:
Applying the arctan function to the interpolated value of the virtual bin gives the phase offset estimate $\hat{\varphi}$ for the interpolation technique.
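The interpolation step can be sketched as below. We assume the standard three-point parabolic fit through $(-1, E_L)$, $(0, E_C)$, $(1, E_R)$, whose vertex is $\delta = (E_L - E_R) / (2(E_L - 2E_C + E_R))$; this is our reading of the construction described above, not a verbatim copy of the paper's equation. The negative-δ branch of `virtual_bin` mirrors the positive case given in the text and is likewise an assumption; all function names are hypothetical.

```python
import cmath
import math


def dft(x, n_points):
    """N-point DFT of x zero-padded to n_points (direct evaluation, for clarity)."""
    x = list(x) + [0j] * (n_points - len(x))
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_points) for n in range(n_points))
            for k in range(n_points)]


def parabolic_offset(e_l, e_c, e_r):
    """Vertex of the parabola through (-1, e_l), (0, e_c), (1, e_r).

    If e_c is the largest of the three energies, the result lies in [-1/2, 1/2].
    """
    denom = e_l - 2.0 * e_c + e_r
    if denom == 0.0:                   # flat neighbourhood: no refinement possible
        return 0.0
    return 0.5 * (e_l - e_r) / denom


def virtual_bin(X, k_f, delta):
    """Linear interpolation of X at the virtual bin k_f + delta."""
    N = len(X)
    if delta >= 0:
        return (1 - delta) * X[k_f % N] + delta * X[(k_f + 1) % N]
    return (1 + delta) * X[k_f % N] - delta * X[(k_f - 1) % N]


def int_estimate(r, M, N, T=1.0):
    """Frequency and phase estimate with parabolic interpolation; returns (f_hat, phi_hat)."""
    X = dft([x ** M for x in r], N)
    k_f = max(range(-(N // 2), N // 2), key=lambda k: abs(X[k % N]))
    e_l, e_c, e_r = (abs(X[(k_f + d) % N]) ** 2 for d in (-1, 0, 1))   # bin energies
    delta = parabolic_offset(e_l, e_c, e_r)
    f_hat = (k_f + delta) / (M * N * T)
    phi_hat = cmath.phase(virtual_bin(X, k_f, delta)) / M
    return f_hat, phi_hat
```

For a noise-free tone 0.3 bins away from the nearest bin (N = 512, L = 300), the parabola pulls the estimate most of the way towards the true frequency, so the residual error is clearly below the 0.3-bin error of the plain peak search.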
Technique 3: Decision Directed Approach (DD)
The idea of any decision directed approach to parameter estimation in synchronization is to use estimates of the unknown symbols. The decision directed technique investigated in this paper obtains these estimates by demodulating the carrier-synchronized received samples. In the subsequent processing, the residual frequency and phase offsets of the carrier-synchronized received sample sequence are estimated and the sample sequence is corrected. The estimated symbols are treated as known symbols and are used to assist the estimation. The frequency offset could be estimated from known symbols with several well-known techniques, see e.g. (H. Meyer et al., 1997). For the base algorithm, the residual frequency offset $|f_o - \hat{f}_o|$ will be quite small. Therefore a carrier synchronization technique can be applied that is also used in so-called turbo synchronization (Alles, M. et al., 2007). The estimate of the transmitted symbol sequence s is denoted by $\hat{s}$. The modulation removal for the further estimation is done in a data-aided way for the carrier-synchronized sequence $r_c$.
The additional frequency and phase estimation is carried out with the sequence $\bar{r}_c$. By averaging over the first half and over the last half of the sequence, a measure $z_1$ for the phase offset of the first half and a measure $z_2$ for the phase offset of the last half of $\bar{r}_c$ are calculated:
Applying the arctan operation to $z_1$ and $z_2$ gives estimates of the phase of the first half and of the second half of the sequence $\bar{r}_c$. The residual frequency offset is estimated from the difference of the phases of the first and second half:
The new phase estimate is given by averaging over the whole sequence $\bar{r}_c$. The algorithm outlined in Eq. 17 to Eq. 19 requires as a prerequisite that the residual frequency offset is small, i.e. the phase increase over L/2 elements of the sequence $r_c$ must be less than π. If this condition is not fulfilled, the algorithm produces considerable variations in the frequency estimate.
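The decision directed refinement can be sketched as follows, assuming hard QPSK decisions on the hypothetical constellation {1, j, −1, −j} and data-aided modulation removal by multiplication with $\hat{s}^*$. Since the centres of the two halves are L/2 samples apart, the phase difference of $z_1$ and $z_2$ divided by $2\pi (L/2) T$ gives the residual frequency. The phase of the whole-burst average refers to the burst centre; referring it back to k = 0 is our own choice, not necessarily the paper's. The prerequisite from the text reduces to $|\Delta f| < 1/(LT)$; for L = 300 this is about $3.3 \cdot 10^{-3}/T$, well above the half-bin residual $1/(2 \cdot 4 \cdot 512\, T) \approx 2.4 \cdot 10^{-4}/T$ of a 512-point FFT with QPSK.

```python
import cmath
import math
import random


def qpsk_decide(x):
    """Hard decision to the nearest symbol of the assumed constellation {1, j, -1, -j}."""
    m = round(cmath.phase(x) / (math.pi / 2)) % 4
    return cmath.exp(0.5j * math.pi * m)


def dd_refine(r_c, T=1.0):
    """Residual (frequency, phase) estimate from the half-burst averages z1 and z2."""
    L = len(r_c)
    half = L // 2
    r_bar_c = [x * qpsk_decide(x).conjugate() for x in r_c]   # data-aided modulation removal
    z1 = sum(r_bar_c[:half])                  # first half
    z2 = sum(r_bar_c[half:2 * half])          # second half (last sample dropped if L is odd)
    d_phase = cmath.phase(z2 * z1.conjugate())                 # arg z2 - arg z1, wrapped
    df_hat = d_phase / (2 * math.pi * half * T)                # half centres are L/2 apart
    # The whole-burst average has its phase reference at the burst centre (L - 1)/2;
    # here it is referred back to k = 0 (our choice).
    phi_hat = cmath.phase(sum(r_bar_c)) - math.pi * df_hat * T * (L - 1)
    return df_hat, phi_hat


def dd_condition_holds(df, L, T=1.0):
    """Prerequisite: phase increase over L/2 samples below pi, i.e. |df| < 1/(L T)."""
    return abs(2 * math.pi * df * T * L / 2) < math.pi


if __name__ == "__main__":
    rng = random.Random(3)
    L, df, dphi = 300, 2e-4, 0.1              # small residual offsets after the FFT stage
    r_c = [cmath.exp(0.5j * math.pi * rng.randrange(4))
           * cmath.exp(1j * (2 * math.pi * df * k + dphi)) for k in range(L)]
    print(dd_refine(r_c), dd_condition_holds(df, L))
```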
Hardware architecture and implementation

Architecture
The base hardware architecture of the analyzed techniques is shown in Fig. 1. As mentioned, we assume that the processing steps of matched filtering and timing synchronization have been properly carried out. In addition, an automatic gain control (AGC) unit is required to adapt the input sample bit width to the core input bit width without limiting the dynamic range. The back-end must deal with the M-fold phase ambiguity introduced by Eq. (9). The building blocks of the core correspond directly to the base algorithm described in Sect. 2. The modulation removal component realizes Eq. (3) and provides the input for the FFT operation given in Eq. (4). The spectral analysis component provides the estimates of the frequency and phase offset according to Eq. (5) and Eq. (9). The RAM buffer is used to guarantee constant throughput by hiding the FFT latency.
Implementation
We used synthesizable VHDL to implement the architecture shown in Fig. 1 for the investigated techniques. For rapid development and to minimize debugging effort, we relied heavily on IP cores included in the Xilinx Core Generator 9.1. We also used specific Xilinx resources such as the internal multipliers (MULT) and block RAM (BRAM) available on the Spartan3-E FPGA. The modulation removal in Eq. (3) uses polar coordinates, whereas the input samples r are given in Cartesian coordinates. The calculation of arg(r) and |r| can be done with the CORDIC algorithm (Volder, J., 1959), which is available as an IP core. We used the fully pipelined version of this CORDIC IP core to achieve a throughput of one sample per cycle. To realize the sample rate reduction technique (see Eq. 11), groups of D modulation-removed samples must additionally be summed. This is accomplished by some control logic, two adders and additional BRAM storage. For all techniques, the resulting sample sequence $\bar{r}$ without modulation has to be transformed back to Cartesian coordinates for the FFT calculation (see Eq. 4). This is accomplished with a sine/cosine look-up table (SCL), also available as an IP core, and internal multipliers that realize the $e^{jx}$ operation. The FFT needed for frequency and phase estimation is realized with another Xilinx IP core, configured to sustain the selected maximum burst length. Three Xilinx FFT IP cores are available, which differ in throughput, latency and required resources. Because the desired throughput was again one sample per cycle, the resulting core is fully pipelined and occupies most of the area. The windowing feature of Eq. (8) for evaluating $k_{f,w}$ is realized in the spectral analysis block by a maximum absolute value search restricted to the selected FFT bins. This can be implemented very efficiently using only counters and comparators.
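To illustrate what the CORDIC core computes for the modulation removal, the following floating-point sketch implements Volder's vectoring mode, which rotates the input onto the positive real axis and accumulates the rotation angle. It is not a model of the Xilinx core, which operates on fixed-point words; the iteration count of 16 and the function name are our assumptions. In a fully pipelined realization each loop iteration typically becomes one pipeline stage, which is how one sample per cycle can be sustained, whereas a serial core reuses a single stage over several cycles.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Return (|x + jy|, arg(x + jy)) with floating-point CORDIC in vectoring mode."""
    offset = 0.0
    if x < 0:                                  # CORDIC converges only for |angle| < ~99.9 deg,
        x, y, offset = -x, -y, math.pi         # so rotate the left half-plane by 180 deg first
    z = 0.0
    for i in range(iterations):
        step = 2.0 ** -i                       # shift-and-add factor of stage i
        if y >= 0:                             # rotate clockwise to drive y towards 0
            x, y, z = x + y * step, y - x * step, z + math.atan(step)
        else:                                  # rotate counter-clockwise
            x, y, z = x - y * step, y + x * step, z - math.atan(step)
    # The micro-rotations scale the magnitude by K = prod(sqrt(1 + 2^-2i)) ~ 1.6468
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    angle = z + offset
    if angle > math.pi:                        # wrap into (-pi, pi]
        angle -= 2.0 * math.pi
    return x / gain, angle


if __name__ == "__main__":
    print(cordic_vectoring(-3.0, 4.0), (5.0, math.atan2(4.0, -3.0)))
```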
After determining the frequency bin $k_{f,w}$, the spectral analysis block estimates the phase offset according to Eq. (9). Again, the argument function needed for this equation is realized with a CORDIC IP core. Because only one argument calculation per phase estimate has to be performed, a rather small serial CORDIC core without the ability to calculate |r| can be used. The implementation of the INT technique requires some additional computations between the search for bin $k_f$ and the argument calculation for phase estimation. Additionally, the right and left neighbour bins of $k_{f,w}$ must be selected and stored. The realization of Eq. (14) requires a division, which is also available as a Xilinx IP core. Furthermore, some adders and control logic are required to implement Eq. (15).
In the frequency and phase correction component rotation of the samples as described in Eq. (10) is accomplished with complex number multiplications and a second SCL. The decision directed technique requires additional hardware in this 
Results
Communications performance
In this section we present communications performance simulations based on a bit-true C model. All graphs are obtained using an additive white Gaussian noise (AWGN) channel model. A reference graph labelled "ideal" allows comparison of the simulation results with perfect carrier synchronization, reflecting the performance limit of the AWGN channel. Figures 2 and 3 show the bit error rate (BER) for a burst of 300 QPSK-modulated samples obtained by the base algorithm with a 512-point FFT and a 1024-point FFT, respectively. Two graphs result from a uniformly distributed frequency offset in the range [0.01, 0.02]. The other two graphs, with the suffix WC, show the BER for a worst-case scenario in which the frequency offset lies exactly between two FFT bins. In the high SNR region the 512-point FFT shows a degradation of up to 0.4 dB; in the low SNR region a degradation of 0.1 dB can be observed. These simulations demonstrate the need to improve the results of the 512-point FFT. Figures 4 and 5 show the performance of the DD and the INT technique. With a 512-point FFT, both techniques offer the same performance as the base algorithm with a 1024-point FFT. A very slight performance degradation in comparison to the base algorithm can be observed in the low SNR range. This behavior results from the fact that both techniques can only improve the frequency offset estimation within a small range, as explained in Sect. 3. In low and high
For the performance evaluation of the SRR technique, its lower maximum correctable frequency offset must be taken into account. Hence we applied a window according to Eq. (8) to the FFT results of the reference technique. Fig. 6 shows the performance of the SRR technique with D = 2 for frequency offsets of 0.01 and 0.03. A severe degradation can be observed for the higher frequency offset.
This behavior results from the fact that successive modulation-removed samples exhibit a larger phase difference at higher frequency offsets, so that the averaging operation of Eq. (11) reduces the energy of the resulting sample sequence $\bar{r}_s$.
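The energy reduction can be quantified in an idealized setting. Assuming noise-free, unit-magnitude modulation-removed samples whose phase advances by $\theta = 2\pi M f_o T$ per sample, and assuming Eq. (11) is a plain mean over D samples, the energy of the average relative to a single sample is $|\sin(D\theta/2) / (D \sin(\theta/2))|^2$. The hypothetical helper below evaluates this directly from the sum, taking QPSK (M = 4) for the two offsets of Fig. 6.

```python
import cmath
import math


def srr_energy_factor(f_o, M, D, T=1.0):
    """Energy of a D-sample average relative to one sample, for a noise-free unit-magnitude
    modulation-removed sequence whose phase advances by 2*pi*M*f_o*T per sample."""
    theta = 2 * math.pi * M * f_o * T
    avg = sum(cmath.exp(1j * theta * i) for i in range(D)) / D
    return abs(avg) ** 2


if __name__ == "__main__":
    for f_o in (0.01, 0.03):                   # offsets of Fig. 6, QPSK (M = 4) assumed
        g = srr_energy_factor(f_o, M=4, D=2)
        print(f_o, round(10 * math.log10(g), 2), "dB")
```

For D = 2 the factor reduces to $\cos^2(\pi M f_o T)$, i.e. roughly −0.07 dB at $f_o = 0.01$ and −0.63 dB at $f_o = 0.03$ in this idealized model. The sketch only illustrates the energy loss named in the text; it does not reproduce the BER curves of Fig. 6.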
Implementation results
The target device was a low-cost Xilinx Spartan3-E FPGA. Synthesis as well as place and route were carried out with the Xilinx ISE 9.1 software tools. Table 1 shows the Xilinx resources of the different techniques using a 512-point and a 1024-point FFT building block, respectively. The achievable clock frequency of 95 MHz is identical for all techniques. The throughput of all investigated techniques is inversely proportional to the number of FFT points. Hence the throughput of the designs with a 512-point FFT is twice that of the designs based on a 1024-point FFT. In Table 1 the Xilinx resources of the reference technique with a 1024-point FFT are compared with the resources of the other techniques based on a 512-point FFT.
Conclusions
The SRR technique exhibits the lowest implementation complexity of the analyzed techniques. However, the usability of this technique is limited by the maximum occurring frequency offset and by the performance requirements of the application. The other two techniques offer communications performance and implementation complexity comparable to the reference technique. However, both techniques double the achievable throughput and thus provide clearly better efficiency. The DD technique requires more implementation resources than the INT technique and provides only negligibly better communications performance. The three demonstrated techniques enlarge the design space for FFT-based carrier synchronization, so that the implementation can be matched to the throughput and communications performance requirements.
