Abstract-This paper deals with the analysis and design of efficient non-data-aided clock and carrier (frequency/phase) synchronization algorithms intended for use in satellite digital video broadcasting systems employing turbo-coding techniques to enhance power efficiency. The problem is challenging in view of both the extremely low signal-to-noise ratio, typical of turbo-code operation, and the very short time allocated to the acquisition of synchronization parameters. These constraints rule out most conventional clock/carrier recovery schemes and demand a careful search for specific, highly efficient algorithms. In the paper, we propose and analyze a clock/carrier synchronization scheme capable of operating at signal-to-noise ratios as low as 1 dB with a lock-in delay not exceeding 50 ms.
I. INTRODUCTION
THE GROWING demand for digital satellite communication services has in recent years fostered the development of modulation and coding techniques with enhanced spectrum and power efficiency. Focusing on satellite broadcast services, a major concern for system designers and manufacturers is the implementation of domestic receivers, to be deployed at the users' premises, capable of operating at progressively lower signal-to-noise ratios (SNRs). Any reduction in the operating SNR cuts down on the cost of the receiving equipment, since easier-to-install, smaller rooftop antennas can be envisaged, as well as less demanding front-end radio-frequency (RF) circuitry. In this perspective, recently devised turbo-codes [1], [2] can be regarded as good candidate coding techniques for application in the above-mentioned scenario, since they allow for substantial power saving with respect to other conventional coding techniques, at the expense of an increased system complexity and a larger decoding delay. As is known, the latter aspect may have a nonnegligible impact in systems featuring a bidirectional data flow, such as those envisaging the transmission of voice. In the application at hand, however, where interaction between the user and service provider is either absent or reduced to a minimum, the presence of the above delay is to be regarded as a minor drawback.
An important issue related to the use of turbo-codes pertains to the possibility of recovering accurate clock and carrier (frequency/phase) references at the receiver. The extremely low signal-to-noise ratio typical of their operation rules out in fact many conventional synchronization algorithms, since they fail to attain a reasonable tradeoff between the time required to estimate the synchronization parameters (denoted in the following as lock-in or acquisition time) and estimation accuracy. In particular, when considering satellite digital video broadcasting (DVB) service, which is the focus of this paper, the acquisition time is to be strictly upper-bounded to avoid the unpleasant delays that may arise as the user switches between different channels. On the other hand, when acquisition is over, residual synchronization errors must be kept small enough so as not to significantly impact the receiver performance. The above two conflicting requirements push toward the search for maximally efficient synchronization algorithms, or, borrowing some terminology from parameter estimation theory, toward schemes whose accuracy lies close to the Cramér-Rao bound (CRB) [3].
This paper is concerned with the analysis and design of the whole clock/carrier synchronization section of a receiver operating in the above outlined scenario. In particular, we assume quaternary phase-shift keying (QPSK) as the modulation format and 1/2 as the code rate, and we choose a very low value of the signal-to-noise ratio (around 1 dB), typical of turbo-code operation. In this framework, we search for synchronization schemes that may jointly provide acquisition times not exceeding a small fraction of a second and, at the same time, a tracking accuracy ensuring a negligible impact on the bit error rate (BER) at the turbo-decoder output.
Manuscript received February 13, 2001; revised June 16, 2001. The authors are with the Dipartimento di Ingegneria della Informazione, Università di Pisa, Pisa I-56122, Italy (e-mail: antonio.damico@iet.unipi.it).
Publisher Item Identifier S 0733-8716(01)10304-5.
We show that a possible solution to the above issues consists of cascading three specific synchronization schemes taken up (with some modifications) from the literature, the first of which, used for clock recovery, is similar to that proposed in [4] , whereas the other two provide carrier frequency and phase recovery, respectively, and are adapted versions of the algorithms discussed in [5] and [6] . We show how the three mentioned synchronization devices can be designed so as to jointly provide the target overall synchronization accuracy at the nominal operating point. In particular, we discuss the mutual interactions arising among the three algorithms, namely, the effect exerted by clock errors on carrier recovery as well as the impact of residual frequency errors on phase estimation. As is shown in the following, consideration of the latter aspects leads to the formulation of some useful criteria on how the total available acquisition time can be partitioned among the various synchronization tasks. In the next section, we outline the signal and channel models. We also set out the basic assumptions on both operating conditions and target performance. In Section III, we discuss the clock/carrier recovery schemes and specify the criteria to be used for selecting the key parameters in each of the three synchronization blocks. In Section IV, we assess the performance of the above algorithms in terms of acquisition time and synchronization accuracy and also discuss their impact on the receiver BER. A brief summary concludes this paper in Section V.
II. SYSTEM OUTLINE AND BASIC ASSUMPTIONS
Fig. 1 shows the functional block diagram of the transmitter. The source data rate is assumed to be 2 Mb/s and the turbo encoding scheme is the one described in [2], whose structure is illustrated in Fig. 2. It consists of a standard connection of two identical binary rate-1/2 recursive systematic convolutional (RSC) encoders, an -bit random interleaver (where ), and a puncturing block. Both the recursive encoders have generator matrix , where and . The puncturer deletes all even-indexed parity bits from the top RSC encoder and all odd-indexed parity bits from the bottom one. The resulting overall code rate is 1/2. The encoder output bits are mapped onto QPSK channel symbols $c_i$ belonging to the alphabet . Under the above assumptions, the resulting channel symbol rate is 2 Mbaud. Using complex envelope notation, the transmitted signal is
$$s(t) = \sum_i c_i\, g(t - iT) \tag{1}$$
where $g(t)$ is the transmitter impulse response, with Nyquist root-raised-cosine spectrum with rolloff factor , and $T$ is the symbol spacing. From the above assumptions, $T$ is fixed at 0.5 μs. The noise-corrupted received waveform can be written as
$$r(t) = e^{j(2\pi\nu t + \theta)} \sum_i c_i\, g(t - iT - \tau) + w(t) \tag{2}$$
where $\nu$ denotes the residual frequency offset after baseband conversion, $\theta$ is the carrier phase, $\tau$ is the channel delay, and $w(t)$ is complex-valued noise with independent Gaussian components, each having two-sided spectral density $N_0$. The operating signal-to-noise ratio is set at 1 dB. We also assume that the frequency offset does not exceed 10% of the baud rate, i.e., $|\nu T| \le 0.1$. This assumption is commonly met in practical implementations of DVB.
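The rate figures quoted in this section follow from simple arithmetic, which the short Python sketch below reproduces. The function name `symbol_rate` is introduced here for illustration only; the input values (2 Mb/s, rate 1/2, QPSK with 2 bits per symbol, offset bounded by 10% of the baud rate) are those stated in the text.

```python
def symbol_rate(bit_rate, code_rate, bits_per_symbol):
    """Channel symbol rate in baud for a coded link (illustrative helper)."""
    coded_bit_rate = bit_rate / code_rate      # bits/s after encoding
    return coded_bit_rate / bits_per_symbol    # symbols/s after mapping

baud = symbol_rate(bit_rate=2e6, code_rate=0.5, bits_per_symbol=2)
T = 1.0 / baud                 # symbol spacing in seconds
max_offset_hz = 0.1 * baud     # frequency offset bound: 10% of the baud rate

print(f"symbol rate = {baud / 1e6:.1f} Mbaud")
print(f"T = {T * 1e6:.2f} us")
print(f"max frequency offset = {max_offset_hz / 1e3:.0f} kHz")
```

With rate-1/2 coding and 2 bits per QPSK symbol, each channel symbol carries one information bit, so the symbol rate equals the source bit rate.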
The receiver section involving the circuits for the estimation of the timing, frequency, and phase parameters $\tau$, $\nu$, and $\theta$ is shown in Fig. 3. The output of the anti-aliasing filter (AAF) is sampled at the rate $1/T_s$ and the resulting sequence $x(mT_s)$ is fed to the timing recovery block. The reason for designating clock recovery as the first task to carry out is twofold. On one hand, it is suggested by the existence of symbol synchronizers capable of accurate operation even in the presence of uncompensated frequency offsets up to 10-20% of the baud rate. On the other hand, it allows us to resort to very efficient clock-aided frequency estimators operating on samples at symbol rate.
The timing recovery subsystem accomplishes two different tasks: first, it estimates the delay parameter $\tau$ by means of a timing error detector (TED), and second, using this estimate, it interpolates the sequence $x(mT_s)$ to produce a new sequence $y(t_n)$ at twice the symbol rate. The interpolated sequence feeds the first matched filter MF1, which is implemented as a $T/2$-spaced finite impulse response filter. The matched filter output is denoted $x_1(k)$. We point out that MF1 need only be operated at baud rate. We also note that due to the presence of the frequency offset, this filter is not exactly matched to the input signal. However, as far as frequency recovery is concerned, it can be regarded as approximately matched as long as the offset does not exceed the limits specified above.
The samples from MF1 are processed by the frequency estimator (FE), whose purpose is to measure the residual frequency offset $\nu$. The estimate of $\nu$ is used to compensate for frequency errors in the samples before they are passed to the second matched filter MF2. The matched filter MF2 produces in turn a symbol-rate sequence $x_2(k)$, which feeds the phase estimation subsystem. The estimates produced here by the phase estimator (PE) undergo an unwrapping operation before being employed for phase compensation in the signal samples entering the decoder.
Some comments on the use of two matched filters are in order. The first matched filter is necessary to ensure proper operation of the frequency estimator, which in principle requires ISI-free samples. As mentioned earlier, this condition is not exactly met due to the presence of the frequency offset, which introduces a slight mismatch in the filter. However, this mismatch was observed (via simulation) not to affect frequency recovery in a significant manner. The second matched filter has been introduced in order to recover the above mismatch before the frequency-compensated samples are fed to the turbo decoder. Indeed, it is worth noting that, depending on the system operating point, the turbo decoder operation may be significantly affected by small variations (on the order of a fraction of dB) of the signal-to-noise-plus-interference ratio. We also notice that the aggravation in system complexity introduced by the second matched filter is marginal. 
III. SYNCHRONIZATION ALGORITHMS

A. Clock Synchronization
As illustrated in Fig. 3, the TED and the interpolator are connected in a feedback loop arrangement. To carry out interpolation, a third-order polynomial is utilized, using the procedure outlined in [7] and [8], which is briefly reviewed hereafter. The interpolator task is to produce the sequence
$$y(t_n) = \sum_{i=-2}^{1} C_i(\mu_n)\, x[(m_n - i)T_s] \tag{3}$$
where $m_n$ is called the basepoint index
$$m_n = \mathrm{int}[t_n/T_s] \tag{4}$$
$\mu_n$ the fractional interval
$$\mu_n = t_n/T_s - m_n \tag{5}$$
with $0 \le \mu_n < 1$, and $t_n$ denotes the $n$th interpolation epoch. The coefficients $C_i(\mu_n)$ are third-order polynomials in $\mu_n$, whose expressions can be found in [8].
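As an illustration of third-order polynomial interpolation driven by a basepoint index and a fractional interval, the following Python sketch uses the cubic Lagrange coefficients over four neighbouring samples. This is an assumption made for the example: the paper takes its coefficient expressions from [8], which may differ in form. The function names are hypothetical.

```python
import math

def split_epoch(t, Ts):
    """Split an interpolation epoch t into basepoint index m and fractional interval mu."""
    m = int(math.floor(t / Ts))
    mu = t / Ts - m            # 0 <= mu < 1
    return m, mu

def cubic_lagrange_coeffs(mu):
    """Cubic Lagrange weights for samples at offsets -1, 0, +1, +2 from the basepoint."""
    c_m1 = -mu * (mu - 1.0) * (mu - 2.0) / 6.0
    c_0 = (mu + 1.0) * (mu - 1.0) * (mu - 2.0) / 2.0
    c_p1 = -(mu + 1.0) * mu * (mu - 2.0) / 2.0
    c_p2 = (mu + 1.0) * mu * (mu - 1.0) / 6.0
    return c_m1, c_0, c_p1, c_p2

def interpolate(x, t, Ts):
    """Interpolate the Ts-spaced sequence x at time t (requires 1 <= m <= len(x) - 3)."""
    m, mu = split_epoch(t, Ts)
    c_m1, c_0, c_p1, c_p2 = cubic_lagrange_coeffs(mu)
    return c_m1 * x[m - 1] + c_0 * x[m] + c_p1 * x[m + 1] + c_p2 * x[m + 2]

# Usage: samples of a cubic polynomial are interpolated exactly (up to rounding).
Ts = 1.0
x = [k ** 3 - 2.0 * k for k in range(8)]
y = interpolate(x, 3.25, Ts)
```

Because the weights are cubic polynomials in the fractional interval, they can be evaluated with a handful of multiplications per output sample, which is what makes this kind of interpolator attractive inside a timing loop.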
The TED produces an error signal $e(k)$, which is used to compute the sampling epochs $t_n$. The error signal is generated according to the Gardner algorithm [4]
$$e(k) = \mathrm{Re}\{[y(t_{2k-2}) - y(t_{2k})]\, y^*(t_{2k-1})\} \tag{6}$$
It is worth noting that two different time indexes $k$ and $n$ are used for $e(k)$ and $t_n$, respectively. This reflects the fact that the error signal is computed at symbol rate, whereas the interpolation epoch is updated at twice that rate according to
$$t_{n+1} = t_n + T/2 + \gamma\, e'(n) \tag{7}$$
where the auxiliary error signal $e'(n)$ is defined as
$$e'(n) = \begin{cases} e(n/2), & n \text{ even} \\ 0, & n \text{ odd} \end{cases} \tag{8}$$
and $\gamma$ is the step-size. The step-size and the loop noise bandwidth $B_L$ are related as
$$B_L T = \frac{\gamma A}{2(2 - \gamma A)} \tag{9}$$
where $A$ is the slope of the S-curve at the stable equilibrium point. Usually, $\gamma A$ is much less than unity and (9) reduces to
$$B_L T \approx \frac{\gamma A}{4} \tag{10}$$
Equation (10) shows that the loop noise bandwidth is approximately proportional to the step-size $\gamma$, whose value must be properly selected in order to ensure an acceptable acquisition time with a small timing variance.
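A minimal Python sketch of the Gardner detector may help fix ideas. It assumes the common form in which the error is the real part of the difference between two consecutive on-time samples multiplied by the conjugate of the mid-symbol sample; the sign convention and the function names are assumptions of this example. The second pair of functions compares the standard first-order-loop bandwidth relation with its small-gain approximation, again assumed here rather than taken from the paper.

```python
def gardner_error(y_prev, y_mid, y_curr):
    """Gardner timing error from two on-time samples and the sample midway between them."""
    return ((y_prev - y_curr) * y_mid.conjugate()).real

def loop_bandwidth(gamma_a):
    """Normalized noise bandwidth B_L*T of a first-order loop with gain gamma*A (assumed form)."""
    return gamma_a / (2.0 * (2.0 - gamma_a))

def loop_bandwidth_approx(gamma_a):
    """Small-gain approximation, valid when gamma*A is much less than unity."""
    return gamma_a / 4.0

# Usage: a +1 -> -1 transition sampled with ideal, late, and early timing.
e_ideal = gardner_error(1 + 0j, 0j, -1 + 0j)          # mid sample on the zero crossing
e_late = gardner_error(1 + 0j, -0.2 + 0j, -1 + 0j)    # mid sample past the zero crossing
e_early = gardner_error(1 + 0j, 0.2 + 0j, -1 + 0j)    # mid sample before the zero crossing
```

The detector needs no carrier phase knowledge, since a common phase rotation of the three samples cancels in the product; this is why it can run ahead of the carrier recovery stages.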
The sequence $t_n$ in (7) needs further processing to produce a control signal for the interpolator. In fact, from (7), it is seen that $t_n$ increases unboundedly with $n$, and this may lead to numerical overflow. A method to circumvent this problem is described in [7].
It is seen that the S-curve is barely affected by the frequency offset even when . Fig. 5 shows the timing mean-square error (MSE) plotted versus the signal-to-noise ratio for the same values of the frequency offset. The equivalent noise bandwidth of the timing loop is set at (corresponding to ). As in the previous case, we find that the impact of the frequency offset on the timing loop performance is negligible. We observe that for dB, the standard deviation of the timing error is about 1% of the symbol period $T$. The impact of such a small jitter on the phase and frequency estimator performance and on the receiver BER can be considered negligible.
In Fig. 6 , the timing acquisition transients are shown for both of the above values of . Here we have set , , and
, where denotes the initial value of . We see that the acquisition time amounts to approximately 2.5 10 symbols, corresponding to 12.5 ms.
From the previous discussion, it can be concluded that a loop noise bandwidth could be adequate to ensure good timing accuracy and short timing acquisitions.
B. Frequency Offset Estimation
Estimation of the frequency offset is the most critical problem insofar as the accuracy of the FE strongly affects the operation of the phase synchronizer. The starting point for the design of the FE is to set an upper limit to the overall rms phase error, so as to ensure a negligible BER degradation. The above limit is established with the aid of Fig. 7, which shows the BER performance of the turbo decoder in the presence of random phase jitter. The latter is modeled as a zero-mean Gaussian random variable with variance $\sigma_\theta^2$. The figure shows plots of the BER as a function of the signal-to-noise ratio, for $\sigma_\theta$ equal to 0 (i.e., ideal phase recovery), 2°, and 3°. We notice that the curve pertaining to $\sigma_\theta = 2°$ is almost overlapped with that of the error-free case, while with $\sigma_\theta = 3°$ we observe a slight degradation (around 0.1 dB at BER ). Thus a proper choice of the above limit to rms phase tracking errors is on the order of 2° to 3°. As will be shown in the next section, achieving an rms phase tracking error limited to a few degrees at a signal-to-noise ratio of 1 dB entails the use of quite long observation intervals for the phase estimator, typically on the order of thousands of symbols. Furthermore, to make the impact of the frequency synchronizer on the overall phase error acceptable, the residual rms frequency error is to be strictly bounded to avoid excessive phase rotation during the PE observation interval. As a rule of thumb, we require the frequency error not to exceed, say, one-hundredth of the inverse of the PE observation length. When the latter is thousands of symbols long, the above condition leads to an FE accuracy around $10^{-5}/T$ (one-hundredth of the inverse of a 1000-symbol interval). At this point, we are faced with the issue of selecting a frequency estimation algorithm capable of attaining the above tracking accuracy with an observation length as short as a few tens of milliseconds. None of the closed-loop FEs described in the literature [9]-[16] can satisfy such strict requirements.
With feedback loops, in fact, an rms frequency error of 10 would only be achieved with an equivalent noise bandwidth smaller than 10 . This, in turn, would entail an acquisition time largely exceeding the limit indicated in our design requirements.
In conclusion, a feedforward estimator is called for, arranged as shown in Fig. 3, where the samples processed by the FE are synchronous with the transmission clock. As mentioned above, if the frequency offset is smaller than $0.1/T$, the intersymbol interference due to the receive filter mismatching can be neglected and, in consequence, the samples $x_1(k)$ at the output of the first matched filter can be written as
$$x_1(k) = c_k\, e^{j(2\pi\nu kT + \theta)} + n(k) \tag{11}$$
where $n(k)$ is white Gaussian noise. The FE outputs a frequency offset estimate $\hat{\nu}$ based on the observation of a block of $L_0$ samples $x_1(k)$. We now briefly outline the processing steps leading to this estimate. First, paralleling the approach proposed in [6], the modulation is wiped out from $x_1(k)$ by means of the nonlinear transformation
$$z(k) = F[\rho(k)]\, e^{j4\phi(k)} \tag{12}$$
where
$$\rho(k) = |x_1(k)| \tag{13}$$
$$\phi(k) = \arg\{x_1(k)\} \tag{14}$$
$$F[\rho(k)] = \rho^m(k) \tag{15}$$
and $m$ is an integer. In the following, we assume the value of $m$ that was found to yield the best estimation accuracy at low signal-to-noise ratios.
The sequence $z(k)$ is now further processed to estimate $\nu$. Many of the algorithms known in the literature [17]-[21] do not work properly at low values of the signal-to-noise ratio because of the threshold phenomenon. This consists of the occurrence of large, spurious frequency errors (outliers) when the signal-to-noise ratio drops below some threshold. The phenomenon is illustrated in Fig. 8, which shows the frequency error variance as a function of the signal-to-noise ratio for the Kay algorithm [17]. We see that the algorithm works properly as long as dB, and in fact it achieves the modified Cramér-Rao bound (MCRB) [22]. However, for dB, the algorithm performance deteriorates considerably because of the presence of outliers. The simulations have been run with an observation interval of 64 symbols. Unfortunately, the threshold of the Kay algorithm cannot be lowered by increasing the number of $T$-spaced signal samples involved in the estimation process.
Improved performance is provided by the Rife and Boorstyn (R&B) method [5], which has a threshold as low as desired provided that the observation interval is sufficiently long. This is confirmed by the simulations in Figs. 9 and 10, which show that the threshold can be lowered below 1 dB provided that . Moreover, it is noted that use of the latter observation length ensures an rms frequency estimation error lower than $10^{-5}/T$. The R&B algorithm is based on the exhaustive search for the maximum of a periodogram. More specifically, the discrete Fourier transform (DFT) of the sequence $z(k)$ is first computed
$$Z(\omega) = \sum_{k=0}^{L_0-1} z(k)\, e^{-j\omega k} \tag{16}$$
and then that value of $\omega$, say $\hat{\omega}$, is sought that maximizes the function $|Z(\omega)|$. The frequency offset estimate is finally obtained as
$$\hat{\nu} = \frac{\hat{\omega}}{8\pi T} \tag{17}$$
The search for the peak of the periodogram is carried out in two steps. The first step (referred to as coarse search) involves the calculation of $|Z(\omega)|$ over $L_0$ values of $\omega$, with uniform spacing $2\pi/L_0$, i.e., at the radian frequencies
$$\omega_n = \frac{2\pi n}{L_0}, \quad n = 0, 1, \ldots, L_0 - 1 \tag{18}$$
This calculation is conveniently carried out by means of a fast Fourier transform (FFT). The four largest values of $|Z(\omega_n)|$ (say, those with indexes $n_1$, $n_2$, $n_3$, and $n_4$) are retained for further processing.
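The second step (fine search) begins with the calculation of $Z(\omega)$ at the radian frequencies
$$\omega_{n_i}^{\pm} = \omega_{n_i} \pm \frac{\pi}{L_0}, \quad i = 1, 2, 3, 4 \tag{19}$$
which are located at the left and the right side of $\omega_{n_i}$. To see how this can be done without a substantial increase in complexity, let us concentrate on $Z(\omega_{n_i}^{+})$. Its calculation may be carried out using a well-known [23, pp. 394-398] interpolation formula, which allows one to express $Z(\omega)$ in terms of its samples $Z(\omega_n)$
$$Z(\omega) = \sum_{n=0}^{L_0-1} Z(\omega_n)\, P(\omega - \omega_n) \tag{20}$$
where
$$P(\omega) = \frac{1}{L_0} \sum_{k=0}^{L_0-1} e^{-j\omega k} \tag{21}$$
Performing some algebraic manipulations, (20) can be written as
$$Z(\omega_{n_i}^{+}) = \frac{2}{L_0} \sum_{m=-L_0/2+1}^{L_0/2} \frac{Z(\omega_{|n_i+m|_{L_0}})}{1 - e^{j\pi(2m-1)/L_0}} \tag{22}$$
where the notation $|x|_{L_0}$ denotes "$x$ modulo $L_0$." An advantage of (22) over (16) is that the former can be approximated by truncating the summation to the first few terms. In other words, we use as a good approximation of $Z(\omega_{n_i}^{+})$ the following expression:
$$Z(\omega_{n_i}^{+}) \approx \frac{2}{L_0} \sum_{m=-Q+1}^{Q} \frac{Z(\omega_{|n_i+m|_{L_0}})}{1 - e^{j\pi(2m-1)/L_0}} \tag{23}$$
where $Q$ is much smaller than $L_0$. A similar approach leads to
$$Z(\omega_{n_i}^{-}) \approx \frac{2}{L_0} \sum_{m=-Q}^{Q-1} \frac{Z(\omega_{|n_i+m|_{L_0}})}{1 - e^{j\pi(2m+1)/L_0}} \tag{24}$$
Once the values of $Z(\omega_{n_i}^{\pm})$, $i = 1, \ldots, 4$, are available, the maximum among the 12 quantities $|Z(\omega_{n_i})|$, $|Z(\omega_{n_i}^{+})|$, and $|Z(\omega_{n_i}^{-})|$, $i = 1, \ldots, 4$, is looked for. Suppose that the maximum is obtained for $\omega = \omega_M$. Then, a parabola is drawn through $|Z(\omega_M - \pi/L_0)|$, $|Z(\omega_M)|$, and $|Z(\omega_M + \pi/L_0)|$ and the location $\hat{\omega}$ of the maximum of the parabola is computed. It results that $\hat{\omega}$ is given by
$$\hat{\omega} = \omega_M + \frac{\pi}{2L_0} \cdot \frac{|Z(\omega_M - \pi/L_0)| - |Z(\omega_M + \pi/L_0)|}{|Z(\omega_M - \pi/L_0)| - 2|Z(\omega_M)| + |Z(\omega_M + \pi/L_0)|} \tag{25}$$
Actually, simulations indicate that the above parabolic interpolation is not strictly necessary to attain good accuracy. Quite often, it is sufficient to set
$$\hat{\omega} = \omega_M \tag{26}$$
Simulation results have shown that the estimation accuracy attained with the chosen value of $Q$ in (23) and (24) is comparable with that achievable through zero-padding the sequence $z(k)$ (as recommended in [5]) with 8192 zeroes and taking, in the coarse search, a 16 384-point FFT, while allowing substantial computational savings.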
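The coarse/fine periodogram search can be sketched in a few lines of Python. The example below is illustrative only and makes several assumptions that are not taken from the paper: the modulation is removed by raising each sample to the fourth power, the QPSK alphabet is {1, j, -1, -j}, the half-bin values are computed directly from the definition of the transform rather than through a truncated interpolation sum, and a plain O(L²) DFT replaces the FFT to keep the code dependency-free. All function names are hypothetical.

```python
import cmath
import math

def dtft(z, omega):
    """Z(omega) = sum_k z[k] * exp(-j*omega*k)."""
    return sum(zk * cmath.exp(-1j * omega * k) for k, zk in enumerate(z))

def periodogram_peak(z, n_candidates=4):
    """Radian frequency (per sample) maximising |Z(omega)|, by coarse and fine search."""
    L = len(z)
    step = math.pi / L                         # half of the coarse spacing 2*pi/L
    # Coarse search on the L-point grid; keep the largest few bins.
    coarse = [abs(dtft(z, 2.0 * math.pi * n / L)) for n in range(L)]
    best = sorted(range(L), key=lambda n: coarse[n], reverse=True)[:n_candidates]
    # Fine search: each retained bin plus its left and right half-bin neighbours.
    w_peak, a_peak = 0.0, -1.0
    for n in best:
        w0 = 2.0 * math.pi * n / L
        for w in (w0 - step, w0, w0 + step):
            a = abs(dtft(z, w))
            if a > a_peak:
                w_peak, a_peak = w, a
    # Parabolic refinement through the peak and its two half-bin neighbours.
    a_m = abs(dtft(z, w_peak - step))
    a_p = abs(dtft(z, w_peak + step))
    curvature = a_m - 2.0 * a_peak + a_p
    if curvature < 0.0:
        w_peak += 0.5 * step * (a_m - a_p) / curvature
    return (w_peak + math.pi) % (2.0 * math.pi) - math.pi   # map to [-pi, pi)

def estimate_offset(x):
    """Normalized frequency offset nu*T from symbol-rate QPSK samples x."""
    z = [xk ** 4 for xk in x]                  # fourth power removes the QPSK modulation
    return periodogram_peak(z) / (8.0 * math.pi)   # fourth power scales frequency by 4

# Usage: noise-free QPSK samples with nu*T = 0.01 and an arbitrary phase.
nu_T, theta = 0.01, 0.3
alphabet = [1, 1j, -1, -1j]                    # assumed alphabet, so that c**4 == 1
x = [alphabet[(3 * k * k + k) % 4] * cmath.exp(1j * (2 * math.pi * nu_T * k + theta))
     for k in range(64)]
nu_hat = estimate_offset(x)
```

Because the fourth power multiplies the frequency by four, the estimate is unambiguous only for normalized offsets below 1/8 in magnitude, which is compatible with the 10% bound assumed for the offset.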
C. Carrier Phase Estimation
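The $k$th sample at the input of the phase estimator has the form
$$x_2(k) = c_k\, e^{j(2\pi\Delta\nu kT + \theta)} + n(k) \tag{27}$$
where $n(k)$ is complex-valued white Gaussian noise and $\Delta\nu$ is the residual frequency error. The task of the estimator is to compute the argument of the complex exponential in (27), i.e.,
$$\psi(k) = 2\pi\Delta\nu kT + \theta \tag{28}$$
One of the most efficient techniques for this purpose is the Viterbi and Viterbi (V&V) algorithm [6]. The output from MF2 is first fed to a nonlinearity, yielding
$$z(k) = F[\rho(k)]\, e^{j4\phi(k)} \tag{29}$$
where
$$\rho(k) = |x_2(k)| \tag{30}$$
$$\phi(k) = \arg\{x_2(k)\} \tag{31}$$
and $F[\rho(k)]$ represents a nonlinear transformation of $\rho(k)$. As was the case with frequency estimation, a good choice for $F[\cdot]$ is
$$F[\rho(k)] = \rho^m(k) \tag{32}$$
with the same exponent $m$ as in (15). The $z(k)$ are used to estimate the residual phase offset according to the V&V algorithm
$$\hat{\theta} = \frac{1}{4} \arg\left\{ \sum_{k=-L}^{L} z(k) \right\} \tag{33}$$
where $2L+1$ is the number of $T$-spaced samples involved in the estimation process.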
Such an estimate is to be referred to the middle instant in the observation interval or to the central element in the sequence $x_2(-L), \ldots, x_2(0), \ldots, x_2(L)$. The performance of the algorithm (33) depends on the signal-to-noise ratio, the duration of the observation interval $(2L+1)T$, and the residual frequency error $\Delta\nu$. In Fig. 11, some curves of the phase error variance versus $L$ are illustrated for different values of $\Delta\nu$ and for dB. The curves have been plotted using the theoretical results from [6]. We see that for a fixed value of $\Delta\nu$, the variance decreases as $L$ increases, reaching a minimum for an optimal value of $L$ (which depends on $\Delta\nu$).
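The feedforward phase estimate can be illustrated with a short Python sketch. It assumes the QPSK alphabet {1, j, -1, -j}, for which the fourth power of every symbol equals one (for an alphabet rotated by 45° an extra constant offset would have to be removed), and a power-law magnitude weighting whose exponent `m` is a free parameter of this example rather than the value used in the paper.

```python
import cmath
import math

def vv_phase_estimate(x, m=0):
    """Fourth-power feedforward phase estimate over the block x; result lies in (-pi/4, pi/4]."""
    acc = 0j
    for xk in x:
        rho, phi = abs(xk), cmath.phase(xk)
        acc += (rho ** m) * cmath.exp(4j * phi)   # modulation-free phasor at 4x the phase
    return cmath.phase(acc) / 4.0

# Usage: noise-free block of 2L+1 = 41 symbols rotated by a known phase.
theta = 0.2
alphabet = [1, 1j, -1, -1j]                      # assumed alphabet, so that c**4 == 1
x = [alphabet[(5 * k + k * k) % 4] * cmath.exp(1j * theta) for k in range(41)]
theta_hat = vv_phase_estimate(x)
```

The division by four is what confines the estimate to a quarter of the circle and creates the fourfold phase ambiguity that the unwrapping stage has to resolve.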
The choice of $L$ is based on the following considerations. It turns out that, at dB, the BER performance of the turbo decoder is barely affected by a phase error if its standard deviation is on the order of 2° to 3° (see Fig. 7). This means that a phase error variance of about $3 \times 10^{-3}$ rad$^2$ would be adequate. Since the maximum frequency error is not larger than $10^{-5}/T$, it is easily inferred from Fig. 11 that the choice will ensure the desired phase error variance. Accordingly, in the following, we assume that . As the function $\arg\{\cdot\}$ in (33) takes on values belonging to the interval $[-\pi, \pi)$, the phase estimate is "wrapped" in $[-\pi/4, \pi/4)$, giving rise to a phase ambiguity problem [24]. This phase ambiguity can be solved by means of the phase unwrapping procedure depicted in Fig. 12 [25]. As is seen, the output of the phase unwrapper is given by
$$\hat{\theta}_u(k) = \hat{\theta}_u(k-1) + \mathrm{SAW}[\hat{\theta}(k) - \hat{\theta}_u(k-1)] \tag{34}$$
where $\mathrm{SAW}[\cdot]$ is a sawtooth nonlinearity which reduces its argument to the interval $[-\pi/4, \pi/4)$. The unwrapped phase estimate is used to correct the phase of the samples prior to decoding.
It is seen from (33) that the computation of a single phase estimate requires $2L$ complex additions. The computational load due to phase estimation can be reduced if the estimate is computed block-wise rather than sample-by-sample, i.e., the $k$th estimate is used to correct the phase of a whole block of signal samples $x_2(k-D), \ldots, x_2(k), \ldots, x_2(k+D)$, where $D$ is a design parameter. In practice, the sequence $x_2(k)$ is partitioned into partially overlapping $(2L+1)$-long blocks, as shown in Fig. 13, with an overlap region of length $2(L-D)$ samples. Phase estimation is carried out based on the $2L+1$ samples belonging to each block to attain the specified accuracy, but is used to correct only the $2D+1$ central samples in each block. It is apparent that the phase error variance will be maximum for the first and the last sample of the subblock (both at a distance $D$ from the central sample). The expression of the maximum variance is
$$\sigma_{\max}^2 = \sigma_\theta^2 + 4\pi^2 D^2 T^2 \sigma_\nu^2 \tag{35}$$
where $\sigma_\theta^2$ is the phase error variance for the central sample and $\sigma_\nu^2$ is the frequency error variance. In our case, $\sigma_\theta^2$ is about $3 \times 10^{-3}$ rad$^2$, whereas $T^2\sigma_\nu^2$ is smaller than $10^{-10}$. Thus, a reasonable choice for $D$ is . The resulting maximum variance is less than $3.1 \times 10^{-3}$ rad$^2$.
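The unwrapping recursion and the edge-of-block variance can be sketched as follows in Python. The recursion is the usual one, in which each new estimate is compared with the previous unwrapped value and the difference is folded into the interval of width π/2 before being accumulated; this form is assumed here. In the variance helper, the normalized frequency error variance of 1e-10 and the half-block length D = 150 are hypothetical values chosen only to exercise the formula, not figures taken from the paper.

```python
import math

def saw(phi, half_width=math.pi / 4):
    """Sawtooth nonlinearity: fold phi into [-half_width, half_width)."""
    return (phi + half_width) % (2.0 * half_width) - half_width

def unwrap(estimates):
    """Unwrap a sequence of phase estimates that are each confined to [-pi/4, pi/4)."""
    unwrapped = []
    for est in estimates:
        if unwrapped:
            est = unwrapped[-1] + saw(est - unwrapped[-1])
        unwrapped.append(est)
    return unwrapped

def max_phase_variance(var_theta, var_nu_T, d):
    """Phase error variance d symbols away from the block centre (rad^2).

    var_nu_T is the variance of the frequency error normalized to the symbol rate.
    """
    return var_theta + (2.0 * math.pi * d) ** 2 * var_nu_T

# Usage: a slow phase ramp is wrapped by the estimator and then recovered.
true_phase = [0.1 * k for k in range(21)]          # 0 to 2 rad in small steps
wrapped = [saw(p) for p in true_phase]
recovered = unwrap(wrapped)

# Hypothetical numbers: 3e-3 rad^2 at the centre, 1e-10 normalized variance, D = 150.
sigma2_max = max_phase_variance(3e-3, 1e-10, 150)
```

The recursion tracks correctly only as long as the true phase changes by less than π/4 between consecutive estimates; larger jumps, typically caused by noise, show up as cycle slips.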
IV. PERFORMANCE ASSESSMENT
The receiver performance in the presence of synchronization errors has been assessed by computer simulation. In particular, the effect of the clock synchronizer on the frequency and phase estimators has been investigated (the results of Sections III-B and -C were obtained under the assumption of an ideal clock reference). Also, the impact of synchronization on the error performance of the turbo decoder has been established. The following assumptions have been made.
1) The AAF is an eight-pole Butterworth filter with bandwidth .
2) The equivalent noise bandwidth of the timing loop is .
3) The frequency observation length is .
4) The phase observation length is .
Computer simulations have shown that the presence of the timing synchronizer does not substantially affect the performance of the frequency estimator as compared with ideal timing. A similar conclusion holds for the phase estimator, as confirmed by Fig. 14, in which the dots represent simulation results (obtained in the presence of both the clock and frequency synchronizers) and the solid line is the theoretical jitter variance of the phase estimator with ideal clock and frequency recovery. Fig. 15 shows the simulated BER performance for two different values of the phase observation length. For comparison, the same figure shows the BER of the turbo decoder under the assumption of ideal clock, frequency, and phase references. As expected, the choice guarantees negligible BER degradation with respect to coherent detection. On the other hand, simulation results show that the decoder performance becomes poor when this parameter is reduced below 500. Furthermore, shortening the phase observation interval too much may be dangerous because it could increase the cycle slip rate of the phase synchronizer [26] above acceptable limits.
Finally, some comments on the total acquisition time of the system are in order. We have seen that, for the loop noise bandwidth selected in Section III-A, clock acquisition takes approximately $2.5 \times 10^4$ symbol intervals (12.5 ms). Frequency estimation requires that the clock estimator has reached a steady-state condition. Thus, frequency acquisition is completed only after the clock acquisition time plus the frequency observation interval. Phase recovery can be accomplished when frequency and clock references are available, and it needs the time $(2L+1)T$. The total acquisition time is then the sum of these three contributions. With the parameter values indicated above, it is smaller than $3.5 \times 10^4$ signaling intervals, which corresponds to 17.5 ms (much smaller than the 50 ms indicated in the design requirements).
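The acquisition budget is a plain sum of three sequential stages, as the following Python sketch shows. The clock acquisition length (2.5 × 10⁴ symbols), the symbol period (0.5 μs), and the 50 ms requirement come from the text. The frequency observation length of 8192 symbols is an assumption suggested by the remark on zero-padding to a 16 384-point FFT, and the phase observation length of 1001 symbols is purely hypothetical.

```python
T = 0.5e-6                   # symbol period in seconds (from the text)
REQUIREMENT_MS = 50.0        # lock-in limit from the design requirements

clock_symbols = 25_000       # clock acquisition, about 2.5e4 symbols (from the text)
freq_symbols = 8_192         # frequency observation length (assumption)
phase_symbols = 1_001        # phase observation length 2L+1 (hypothetical)

# The three stages run one after the other, so their durations add up.
total_symbols = clock_symbols + freq_symbols + phase_symbols
total_ms = total_symbols * T * 1e3

print(f"total = {total_symbols} symbols = {total_ms:.2f} ms")
print("within requirement:", total_ms < REQUIREMENT_MS)
```

Under these assumptions the clock loop accounts for roughly three quarters of the budget, so any further reduction of the lock-in time would have to come mainly from the timing recovery stage.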
V. CONCLUSIONS
We have demonstrated the feasibility of non-data-aided carrier and clock synchronization in the extreme operating conditions where the signal-to-noise ratio is as low as 1 dB and the lock-in time is constrained not to exceed 50 ms. The above scenario is likely to be encountered in near-future satellite-based DVB systems employing turbo-codes to enhance transmission power efficiency. The structure we propose consists of the cascade of three functional blocks, the first of which is devoted to clock recovery and is based on a first-order feedback control loop with an error detector of the type proposed by Gardner in [4], while the other two blocks are for carrier frequency and phase synchronization, respectively, and use the algorithms discussed in [5] and [6]. The performance of the above schemes has been assessed in terms of residual rms error in the estimation of the synchronization parameters, as well as of their impact on the turbo decoder error rate.
