Abstract-In traditional receiver architectures, symbol acquisition and tracking are performed using phase-lock techniques that are independent of the channel-code decoding process. In [1], feedback from the constraint-node side of a bipartite graph is used to estimate symbol frequency and timing offset in a baseband pilotless transmission. In [2], soft-information feedback from an LDPC decoder is used to recover carrier phase information under the assumption of perfect symbol timing. In this paper we address the problem of joint carrier-phase and symbol-timing recovery. The proposed system performs within 0.3 dB of the code performance obtained with perfect knowledge of carrier phase and symbol timing.
I. INTRODUCTION
Recent advances in iteratively decoded channel codes such as LDPC codes make it possible to operate at capacity-approaching SNRs. This places more stringent requirements on the timing and phase recovery portions of receivers, which must successfully acquire and track symbol and carrier information at these lower SNRs. Acquisition and tracking have traditionally been performed independently of channel decoding. However, the LDPC decoding process provides information that can be used by a timing recovery circuit to enable significantly improved performance relative to a system where no such information is present.
The idea of coupling LDPC decoding with timing recovery has been explored in the past [3], [4]. Previous treatments in the literature addressing joint LDPC decoding and timing recovery have focused on the use of output codewords produced as the iterations progress. By contrast, we exploit the information available from the metrics computed at the constraint nodes of an LDPC code during the decoding process. In addition, we use a waveform model that more directly captures the distortions induced by relative transmitter/receiver motion and other receiver-side timing errors. This model was introduced in [1] under the assumption of perfect carrier information.
A significant research effort is underway in the area of joint decoding and carrier phase estimation. As clearly explained by Noels et al. [5], two somewhat distinct groups of joint decoding and synchronization algorithms have evolved. The first group approaches the parameter estimation problem by modifying iterative detection/decoding algorithms and the corresponding Tanner graphs to include parameter estimation. A partial list of work on this approach includes [6]-[9]. Of particular interest has been the work of Colavolpe et al. [7], where phase-tracking processing nodes were introduced in the iterative decoding graph. Dauwels et al. [9] also investigated specially adapted message-passing update rules. The second group of algorithms interchanges messages between an independent phase estimation block and an essentially unmodified iterative decoder. The resulting architectures are often said to employ turbo synchronization. Noels et al. [5] have done a careful study of the mathematical interpretation of turbo synchronization algorithms by means of the expectation-maximization (EM) algorithm. Algorithms of this type can be found in [10]-[13].
The research in this paper was performed with the support of the Office of Naval Research (Contract number N00014-06-1-0253), the NSF (Grant numbers CCR-0120778 and CCF-0541453), ST Microelectronics and the State of California through UC Discovery Grant COM 103-10142.
In [6] the authors show that pilotless techniques are more efficient at lower SNRs, where the pilot insertion loss is considerable. In this work we use the pilotless turbo-synchronization technique described in [2] and present a carrier recovery circuit that is able to handle cases of imperfect symbol timing information. The proposed technique has the potentially attractive feature that little modification is required to either the iterative decoder or the carrier and timing recovery blocks. For carrier phase synchronization, the work leverages the fact that LDPC symbol estimates can "wipe off" modulated symbols in a decision-directed carrier recovery loop, enhancing the carrier information so that a classic residual-carrier phase-locked loop (PLL) is able to provide increasingly accurate phase estimates over LDPC iterations. The proposed method incurs a latency penalty (by way of increased iterations) as carrier phase and timing information is acquired. However, complexity in terms of system description and area (in the case of a real-time implementation) remains similar to that of state-of-the-art residual carrier and timing recovery techniques currently used in NASA's deep-space network.
The rest of this paper is organized as follows. The next section provides a detailed description of the transmitter and receiver models and gives an overview of the joint parameter estimation process. In Section III, the circuit for symbol timing estimation is introduced. A digital implementation of the carrier synchronization circuit is illustrated in Section IV. Section V presents numerical results derived from a simulation of the BPSK scheme with a particular LDPC code. Finally, Section VI documents our conclusions.
II. TRANSMITTER AND RECEIVER MODELS
On the transmitter side, we consider a baseband signal composed of N root-raised-cosine pulses h_RRC(t), transmitted at multiples of a symbol interval T and scaled by d_i ∈ {±1}:
Multiplication by a sinusoidal carrier signal yields the transmitted waveform:
where P is the signal power. 
where T_s is the sampling period and the frequency offset F_PPM is measured in parts per million. The received waveform can be modeled as:
where
θ_c is the carrier phase and n(t) is a bandpass AWGN process. A block diagram of the decoding circuit along with the BPSK digital implementation is shown in Fig. 1. The input signal y_Rx(t) is converted to baseband and low-pass filtered to remove frequencies at 2w_c, which yields:
The in-phase/quadrature (I&Q) signal components in (3) are then sampled and matched filtered, resulting in two digital signals:
The "symbol timing" recovery process described in Section III is now initialized. After the symbol-timing block corrects time delays, random walks and sampling frequency errors, parameter information is interchanged in an iterative fashion with the "carrier synchronization" block described in Section IV to complete the iterative parameter estimation process.
III. SYMBOL TIMING RECOVERY
In Fig. 2 we illustrate the receiver architecture, which exploits feedback from the LDPC decoder to manage symbol timing errors. The received waveform is initially sampled at intervals of T_s and stored in a buffer. The interpolator computes interpolants at intervals of T_i using linear interpolation; these are then used in the matched filtering process [14]. In this work, we use T_i = T̂/2 and T_s = T̂/4, where T̂ is the receiver-side assumption of the transmitter symbol period T (i.e. the symbol period that would be seen by the receiver in the absence of any timing perturbations).
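The linear interpolation step can be illustrated with a minimal sketch. The function names, the convention of expressing time in units of the input sample period, and the example step of 2.0 (the nominal ratio T_i/T_s implied by the values above) are assumptions made for illustration; the actual interpolator is driven by the NCO in Fig. 2.

```python
import math

def linear_interpolate(samples, t):
    """Return the linearly interpolated value at fractional index t.

    t is measured in units of the input sample period (an assumed convention).
    """
    n = int(math.floor(t))          # index of the sample just before t
    mu = t - n                      # fractional distance toward the next sample
    if n < 0 or n + 1 >= len(samples):
        raise IndexError("t lies outside the interpolable range")
    return samples[n] + mu * (samples[n + 1] - samples[n])

def resample(samples, start, step, count):
    """Compute `count` interpolants beginning at `start`, spaced `step` apart."""
    return [linear_interpolate(samples, start + k * step) for k in range(count)]

# Hypothetical usage: a timing offset of 0.25 input samples, nominal step of 2.
interpolants = resample([0.0, 1.0, 2.0, 3.0, 4.0], start=0.25, step=2.0, count=2)
```

A timing correction then amounts to changing `start` (constant delay) or `step` (symbol-frequency offset) between passes.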
The timing recovery circuit from Fig. 2 consists of two loops. Loop 1 is first executed to iteratively recover constant time phase and symbol-frequency offsets. The phase error estimator provides the interpolator (after the matched filter) with a time offset, which is used to correct the constant time delay. The symbol-frequency estimator provides a frequency control word which is resampled at a rate of 1/T s and fed to the numerically controlled oscillator (NCO).
Both the constant time delay and sampling frequency offset estimation processes use information from the iterative channel decoder based on the percentage of satisfied LDPC constraints. The utility of this metric as a feedback mechanism is illustrated for the case of symbol-frequency offsets in Fig. 3, which shows the average percentage of satisfied constraints as a function of frequency estimation error for different SNRs (E_b/N_0) and numbers of LDPC iterations. A similar plot, with similar tradeoffs, can be constructed for the relationship between the constant time delay estimation error and satisfied LDPC constraints.
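As a rough illustration of the feedback metric, the fraction of satisfied constraints can be computed from hard decisions and the parity-check structure. The sketch below assumes each check is given as a list of variable-node indices and uses a tiny made-up set of checks, not the code used in this paper.

```python
def satisfied_fraction(checks, bits):
    """Fraction of parity checks satisfied by the hard-decision bits.

    checks: list of checks, each a list of variable-node indices (assumed layout).
    bits:   list of 0/1 hard decisions.
    """
    satisfied = sum(1 for row in checks
                    if sum(bits[j] for j in row) % 2 == 0)
    return satisfied / len(checks)

# Hypothetical toy parity checks on 7 bits (illustration only).
toy_checks = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
clean = satisfied_fraction(toy_checks, [0, 0, 0, 0, 0, 0, 0])
one_error = satisfied_fraction(toy_checks, [1, 0, 0, 0, 0, 0, 0])
```

In the toy case a single bit error violates every check that contains that bit, so the metric drops as hypotheses move away from the correct timing parameters.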
In [1], phase and symbol-frequency estimates are generated iteratively using a window search method. An initial window and step size are chosen, and a fixed number of LDPC iterations is performed at each hypothesis point. For example, to estimate a symbol-frequency offset of ±2000 ppm (i.e. ±0.2%), an initial step size of 400 ppm can be used without significant performance degradation. As long as the frequency offset is contained within the initial search window, the algorithm will converge with an accuracy that increases with increasing SNR. The complexity of this method grows linearly with the width of the range of frequency offsets contained in the initial search window. It is possible to track waveforms where both time delays and symbol-frequency offsets are present by means of a two-dimensional search strategy. For the purposes of this paper, when a time delay is imposed we assume it is limited to ±0.5T. This is effectively the same as assuming that some other mechanism has provided frame synchronization.
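The window search can be sketched as a coarse-to-fine grid search over hypotheses. In the sketch below the refinement schedule (`rounds`, `shrink`) and the `toy_metric` function are assumptions for illustration: in the receiver, the metric would be obtained by running a fixed number of LDPC iterations at each hypothesis and counting satisfied constraints, and the schedule used in [1] may differ.

```python
def window_search(metric, center, half_width, step, rounds=4, shrink=4.0):
    """Maximize `metric` over a grid, then refine around the best point."""
    best = center
    for _ in range(rounds):
        n = int(round(half_width / step))
        candidates = [best + i * step for i in range(-n, n + 1)]
        best = max(candidates, key=metric)
        # Narrow the window to one old step and use a finer grid next round.
        half_width = step
        step = step / shrink
    return best

def toy_metric(ppm, true_ppm=1234.0):
    """Hypothetical stand-in for the satisfied-constraint fraction."""
    return 0.5 + 0.5 / (1.0 + ((ppm - true_ppm) / 300.0) ** 2)

# +/-2000 ppm window with a 400 ppm initial step, as in the example above.
estimate = window_search(toy_metric, center=0.0, half_width=2000.0, step=400.0)
```

The number of metric evaluations in the first round is proportional to the window width, which matches the linear complexity growth noted above.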
After large-scale phase and frequency errors have been identified in loop 1, loop 2 is used to handle random walks, correct residual time delay and sampling frequency errors, and perform the remaining LDPC decoding. A conventional first-order PLL-based circuit with a decision-directed Mueller-Müller timing error detector (M&M TED) [15] is used in loop 2. After every LDPC iteration, the M&M TED is provided with the symbols decoded by the LDPC decoder, analogous to the approach of Barry et al. [3].
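For reference, a minimal sketch of a decision-directed M&M error computation followed by a first-order update is given below. It uses the common textbook form e[k] = d[k-1]·y[k] − d[k]·y[k-1]; the sign convention, the loop gain, and the function names are assumptions and are not taken from the circuit in Fig. 2.

```python
def mm_ted(samples, decisions):
    """Mueller-Muller timing errors, one per symbol pair.

    samples:   matched-filter outputs at symbol rate.
    decisions: +/-1 symbol decisions (here, those fed back by the decoder).
    """
    return [decisions[k - 1] * samples[k] - decisions[k] * samples[k - 1]
            for k in range(1, len(samples))]

def first_order_update(tau, errors, gain=0.01):
    """First-order loop: accumulate scaled errors into the timing estimate."""
    for e in errors:
        tau += gain * e  # sign and gain are assumed conventions
    return tau

# With ideal samples (equal to the decisions) the detector output is zero.
ideal = mm_ted([1.0, -1.0, -1.0, 1.0], [1, -1, -1, 1])
```

Feeding the detector with decoder decisions instead of slicer decisions is what makes the loop benefit from each additional LDPC iteration.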
At this point, an updated version of the signals z_c and z_s is sent to the carrier-phase recovery loop to produce a new estimate θ̂_c. As shown in Fig. 1(a), this information is then fed to the LDPC decoder to continue with the iterative parameter recovery process. From this point forward, every update from the carrier-phase estimation loop is followed by an update from "loop 2" in the symbol-timing circuit in an iterative fashion.
IV. CARRIER PHASE SYNCHRONIZATION
The carrier recovery circuit for BPSK modulation used in this work is the decision-directed carrier synchronization (DDCS) circuit originally proposed in [2]. This circuit converts the received modulated carrier to an unmodulated carrier (pure tone) before applying it to a phase-tracking loop. This is done by multiplying z_c
which is then input to a second-order digital PLL whose NCO produces an estimate of the carrier phase denoted by θ̂_c
, respectively, and then differencing the results of these products provides the error signal:
where, as before, φ = θ_c − θ̂_c denotes the phase error in the loop.
The performance of carrier-phase synchronization loops is commonly expressed as a function of the "loop SNR" (L_SNR). For a PLL-based system, this can be expressed as:
$$L_{SNR} = \frac{P_c}{N_o B_L} \qquad (4)$$
where P_c is the carrier power, N_o is the noise PSD and B_L is the loop bandwidth [16]. The degradation of L_SNR performance in the case of BPSK is represented by a quantity called the "squaring loss", which measures the degradation of the receiver signal-to-noise ratio (SNR) and is associated with the mean-squared phase error of the loop. At low symbol SNR, the squaring loss of an I&Q loop, such as the Costas loop, can be severe enough to prevent tracking:
where P_t is the total transmitted power, N_o is the noise PSD, B_L is the loop bandwidth and R_d is the data SNR at the input of the receiver. Note that (5) is independent of the iteration process.
If the data sequence and its timing parameters were completely known, then a BPSK signal could be converted to a pure tone simply by multiplying the BPSK signal by the data waveform. One could then track the unmodulated carrier with improved performance by use of a PLL, which, as (4) shows, does not exhibit squaring loss. Short of complete knowledge of the data waveform and in the presence of noise, the best approximation of a pure tone is obtained by feeding back an estimate of the data waveform corresponding to tentative decisions on the data symbols.
Although initially available data-waveform estimates (ŷ[k]) are generally of low quality, they can be used to initiate the carrier synchronization process by reducing the number of data transitions at the input. Once phase lock is achieved, the improved phase estimates can be fed back to the data detector, yielding improved symbol estimates for feedback, and thereby achieving even better phase tracking. This iterative process eventually leads to virtual elimination of squaring loss, so that the performance of the system approaches that of a phase-locked loop operating on an unmodulated carrier signal. For the proposed system we have that:
where A^2/σ^2 represents the decoder soft estimate of the data SNR. We can see from (6) that as the iteration proceeds, the estimated data SNR increases and the squaring loss decreases accordingly. By comparison, for a Costas loop, the expression for the squaring loss in (5) remains fixed, independent of the iteration process, for a given symbol SNR.
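The wipe-off idea can be sketched in a few lines: multiplying the baseband samples by symbol estimates removes the BPSK modulation, and a second-order loop then tracks the resulting tone. The sketch below is a simplified, noise-free illustration with perfect symbol estimates; the complex representation z = z_c + j·z_s, the proportional-plus-integral loop filter, and the gains `kp` and `ki` are assumptions and do not reproduce the loop filter of Fig. 1(b).

```python
import cmath
import math
import random

def wipe_off(z, d_hat):
    """Remove BPSK modulation by multiplying each sample by its +/-1 estimate."""
    return [zk * dk for zk, dk in zip(z, d_hat)]

def pll_track(u, kp=0.05, ki=0.001):
    """Second-order (proportional-plus-integral) digital PLL on a tone."""
    theta_hat = 0.0
    integrator = 0.0
    for uk in u:
        # Phase detector: imaginary part of the de-rotated sample,
        # proportional to sin(phase error) for a unit-amplitude tone.
        err = (uk * cmath.exp(-1j * theta_hat)).imag
        integrator += ki * err
        theta_hat += kp * err + integrator   # NCO phase update
    return theta_hat

# Hypothetical experiment: 2000 random BPSK symbols, carrier phase pi/4.
random.seed(0)
theta_c = math.pi / 4
d = [random.choice((-1, 1)) for _ in range(2000)]
z = [dk * cmath.exp(1j * theta_c) for dk in d]
theta_est = pll_track(wipe_off(z, d))
```

With imperfect symbol estimates some samples keep the wrong sign, which reduces the effective tone amplitude; this is the mechanism by which better decoder estimates improve the loop over iterations.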
Another important difference between these two circuits is that, unlike the Costas loop, the DDCS circuit operates at baseband. This greatly reduces circuit complexity, since high-frequency processing of the received signal is not required. Fig. 4 compares the L_SNR performance for both loops under the assumption of perfect symbol information [2], using a rate-1/2 irregular LDPC code of length n = 1944. An integrator was added to the output of the traditional Costas circuit to reduce the jitter in the phase estimates. For the DDCS system, channel observations are updated on every iteration. The Costas loop, on the other hand, is independent of the decoder's decisions. This implies that for the Costas case, the horizontal axis of Fig. 4 in fact represents the number of times that each block (of size n) is processed by the loop. For the DDCS circuit, steady state is reached after 10 iterations (10 × 1944 = 19440 total symbols processed). The Costas loop converged to its steady-state operation after reprocessing each block of 1944 symbols approximately 20 times (for 38880 total symbol observations). The speed of convergence is highly dependent on the gains of the loop filter shown in Fig. 1(b). A second-order filter with transfer function
V. NUMERICAL RESULTS
We have evaluated the performance of the all-digital BPSK approach, assuming perfect knowledge of the carrier frequency and simulating the signals in (3). Joint parameter estimation and decoding were performed using a rate-1/2 (1944, 972) irregular LDPC code developed in [17] and currently in the IEEE 802.11n standard. After a complex rotation to resolve phase ambiguity (discussed below), the signals z_c and z_s are multiplied by the decoder output ŷ to form u_c and u_s. As described in previous sections and shown in [2], if even a small fraction of the modulated symbols in a block is successfully removed from the PLL input, the loop can begin to produce a reasonable phase estimate, even at relatively low SNRs. We have found that the estimation/decoding process can be successfully started by assigning ŷ to the signal z_c or z_s with the highest energy (subsequent iterations derive ŷ from the decoder). After this assignment, the PLL in the carrier synchronization loop operates once across all symbols in a codeword. LDPC decoder log-likelihood ratio inputs are then produced by combining the updated PLL phase estimates with z_c and z_s:
In order to remove residual timing errors, "loop 2" from the symbol timing circuit in Fig. 2 is updated after a new carrier-phase estimate has been generated.
We conclude this section by noting that phase ambiguity (for offsets greater than ±π/2) can be resolved by first measuring the average power, across a single codeblock, of the signals z_c and z_s. If the sine component (z_s) has average power greater than the cosine component (z_c), then these two components are swapped. This procedure may leave (or induce) a remaining error of π radians. To resolve this ambiguity we run a single PLL pass followed by several (up to 4) LDPC iterations. The orientation that produces the maximum number of satisfied odd-degree check equations is selected and the decoding procedure is reinitialized.¹ Similar techniques are proposed in [10], [11].
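The reason only odd-degree checks are informative can be seen in a small sketch: a rotation by π inverts every BPSK decision, which leaves the parity of an even number of bits unchanged but flips the parity of an odd number. The toy checks, the use of hard-decision bits, and the function names below are assumptions for illustration; in the receiver, the comparison is made after a PLL pass and a few LDPC iterations.

```python
def check_status(checks, bits):
    """Return one True/False per check: True if the parity is even."""
    return [sum(bits[j] for j in row) % 2 == 0 for row in checks]

def resolve_pi_ambiguity(checks, bits):
    """Keep or invert all bits, whichever satisfies more odd-degree checks."""
    odd_checks = [row for row in checks if len(row) % 2 == 1]
    inverted = [1 - b for b in bits]
    keep_score = sum(check_status(odd_checks, bits))
    flip_score = sum(check_status(odd_checks, inverted))
    return bits if keep_score >= flip_score else inverted

# Hypothetical toy checks: two of degree 3 (odd) and one of degree 4 (even).
toy_checks = [[0, 1, 2], [2, 3, 4], [0, 1, 3, 4]]
codeword = [1, 1, 0, 0, 0]               # satisfies all three toy checks
rotated = [1 - b for b in codeword]      # decisions after a pi rotation
status_after_rotation = check_status(toy_checks, rotated)
recovered = resolve_pi_ambiguity(toy_checks, rotated)
```

In this toy case the rotated word fails both odd-degree checks while the even-degree check stays satisfied, so counting odd-degree checks recovers the original orientation.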
Results in Fig. 5 for a carrier phase offset φ = θ − θ̂ = π/4, a symbol-frequency offset of ±2000 ppm, a time delay of ±0.5T and a random walk of σ_d/T = 0.5% show a degradation smaller than 0.3 dB from the code performance where carrier phase and symbol timing are known perfectly.
VI. CONCLUSION
We have demonstrated a means for improving the symbol timing and carrier-phase estimation for iteratively decoded BPSK using information derived from an LDPC decoder. For carrier synchronization, the signal modulation is removed prior to the carrier tracking operation. The motivation for doing this is to overcome the penalty in noisy reference loss attributed to the large squaring loss at low SNRs that is characteristic of traditional BPSK carrier sync loops such as the Costas-type loop. The scheme described in this paper makes use of soft-decision information and does not require estimating the decoder error probability. A pilotless symbol timing recovery architecture for tracking time delay, frequency offsets and random walks using LDPC feedback was also presented. This window search method significantly reduces the number of iterations needed in [1]. Performance within 0.3 dB of the "genie-aided" performance can be achieved for large time delays, frequency timing offsets and any carrier phase offset.
¹ Even-degree checks remain satisfied under a rotation of all inputs by π.
