Abstract: This paper describes an adaptive software-aided technique for phase error reduction in digital carrier recovery (CR) for high-order Quadrature Amplitude Modulation (QAM). Simulations and analytical results illustrate a phase error variance improvement of at least 30 dB compared to conventional CR, leading to more than 3 dB processing gain enhancement for high-order QAM, using the simple 4th-power phase estimation technique. The new technique can be employed to improve the performance of any CR loop, at the cost of a slight complexity overhead that is independent of the modulation order.
Introduction
High-order Quadrature Amplitude Modulation (QAM) has become a common choice for high-capacity digital communication systems that require highly bandwidth-efficient modulation. For these applications, accurate carrier recovery (CR) with a low phase error is required to allow coherent demodulation.
New phase estimation schemes with reduced phase error are introduced in [1, 2] for high-order QAM. In [3], a nonlinear least mean square (LMS) phase and frequency estimator is used in a feed-forward carrier recovery architecture. In [4], a modification of the 128-QAM constellation diagram is utilized to reduce the phase error.
This paper presents a software-aided technique for reducing the phase error in digital CR architectures. Analytical and simulation results are given for high-order QAM, illustrating an improvement in phase error variance of at least 30 dB compared to conventional CR loops based on 4th-power phase estimation. This leads to more than 3 dB enhancement in the system processing gain at the cost of a slight increase in complexity.
Proposed CR architecture
The proposed CR architecture is shown in Figure 1 (a), in which a direct digital synthesizer (DDS) is utilized to generate the orthogonal carriers. After the I and Q components of the received signal are extracted, the phase error of the complex baseband signal is estimated and, after filtering, used to compensate the carrier frequency offset. In the proposed architecture, we utilize a software-aided limit estimator and a limiter to decrease the variations of the loop filter output. Further, the limiter reduces the output range of the phase estimation (PE) block, which significantly improves the phase error performance.
In this CR loop we have utilized the simple, well-known 4th-power method for phase estimation. Since this method introduces a large phase variance for high-order QAM schemes, a constellation subset that contains a fraction of the input symbols is often used for phase estimation [5]. To minimize the effect of the low-SNR input symbols on the phase estimation accuracy, we utilize the set of 12 outermost constellation points.
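The subset-based 4th-power estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the loop: the 256-QAM grid with levels ±1, ±3, ..., ±15, the function names, and the magnitude threshold used to pick the outer points are all hypothetical choices made for the example.

```python
import cmath


def outer_points_256qam():
    """Return the 12 largest-magnitude points of an assumed 256-QAM grid
    with I/Q levels -15, -13, ..., 13, 15 (4 corners plus their 8 neighbours)."""
    levels = range(-15, 16, 2)
    points = [complex(i, q) for i in levels for q in levels]
    points.sort(key=abs, reverse=True)
    return points[:12]


def fourth_power_phase(symbols, radius_threshold):
    """Estimate the carrier phase offset in radians.

    Only symbols with |r| >= radius_threshold contribute (assumed way of
    selecting the outer subset). Valid for offsets within +/- pi/4.
    """
    acc = sum(r ** 4 for r in symbols if abs(r) >= radius_threshold)
    if acc == 0:
        return 0.0  # no usable symbols in this block
    # For this subset the sum of a^4 is negative real at zero offset,
    # so the sign is flipped before taking the angle.
    return cmath.phase(-acc) / 4.0


# Noise-free check: rotate the 12 outer points by a known offset.
theta = 0.05
rx = [a * cmath.exp(1j * theta) for a in outer_points_256qam()]
estimate = fourth_power_phase(rx, radius_threshold=19.5)
```

The threshold of 19.5 lies between the magnitude of the 12 outer points (at least about 19.85 on this grid) and the next ring of points, so only the intended subset is used.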
Because of the limiter, the proposed architecture does not need a complicated loop filter. Thus, a simple single-pole IIR filter with unity DC gain is employed in the CR loop as follows.
where p is the location of the single real pole. The DDS block is also characterized by the following equations:
where f_s is the sampling frequency, n is the number of bits of the DDS tuning word T_W, and Δf_out is the frequency resolution of the DDS output.
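As a sketch of these two blocks, a single-pole IIR filter with unity DC gain is commonly written H(z) = (1 - p)/(1 - p z^-1), and an n-bit DDS commonly satisfies f_out = T_W f_s / 2^n with resolution Δf_out = f_s / 2^n. These are standard textbook forms assumed here for illustration; the function names and the numeric values are hypothetical.

```python
def single_pole_lowpass(samples, p):
    """Filter samples with y[n] = p*y[n-1] + (1-p)*x[n].

    This realises the assumed H(z) = (1-p)/(1 - p z^-1), so H(1) = 1.
    """
    y = 0.0
    out = []
    for x in samples:
        y = p * y + (1.0 - p) * x
        out.append(y)
    return out


def dds_resolution(f_s, n_bits):
    """Frequency step of an n-bit DDS clocked at f_s (assumed standard form)."""
    return f_s / 2 ** n_bits


def dds_tuning_word(f_out, f_s, n_bits):
    """Nearest integer tuning word for a requested output frequency."""
    return round(f_out * 2 ** n_bits / f_s)


def dds_output_frequency(tuning_word, f_s, n_bits):
    """Output frequency produced by a given tuning word."""
    return tuning_word * f_s / 2 ** n_bits


# Example values (hypothetical): 100 MHz clock, 32-bit tuning word.
step_response = single_pole_lowpass([1.0] * 500, p=0.9)
t_w = dds_tuning_word(10e6, 100e6, 32)
f_actual = dds_output_frequency(t_w, 100e6, 32)
```

The step response settles at 1, which is the unity DC gain property, and the frequency actually produced differs from the request by at most half the DDS resolution.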
The CR architecture in Figure 1 (a) is obtained by inserting a non-linear limiter block into the conventional loop. This limiter can be regarded as a digital quantizer with an interval of ΔL defined by:
where n_q is the number of quantization bits. The main role of the limiter block is to improve the phase error in the CR loop, given that this block is carefully controlled by a software-aided limit estimator.
where x(n) denotes the input symbols to the mean calculator, and sv(k) and mv(k) are the sum and mean values of N_mc input symbols, respectively. The limit estimator uses a two-state adaptive algorithm to derive the upper limit (UL) and lower limit (LL) of the limiter block. In the first state, the algorithm waits for an initial lock (I-Lock) of the loop, without applying any upper or lower limits. In this state, a variation of ±100 ΔL between consecutive calculation steps is allowed. To detect the I-Lock state, the algorithm requires the variations of the mean values to remain below 100 ΔL for 50 consecutive steps. When the I-Lock state is established, the algorithm estimates the carrier offset frequency (Δf_c,est) by averaging the mean values of the previous steps using (2) as follows:
where k is the DC loop gain and mv is the average of the mean values calculated during the past 50 steps.
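The first state of the limit estimator can be sketched as below. It is a hedged illustration: the function names are hypothetical, the "variation" is assumed to mean the absolute difference between consecutive mean values, and the conversion of the averaged mean into a frequency offset (which depends on the loop gain and the DDS equation) is deliberately left out.

```python
def block_means(samples, n_mc):
    """Return (sv, mv): sums and means over consecutive blocks of n_mc samples."""
    sums, means = [], []
    for start in range(0, len(samples) - n_mc + 1, n_mc):
        sv = sum(samples[start:start + n_mc])
        sums.append(sv)
        means.append(sv / n_mc)
    return sums, means


def detect_initial_lock(means, delta_l, max_step=100, hold_steps=50):
    """Return the step index at which I-Lock is declared, or None.

    Lock is declared once |mv(k) - mv(k-1)| stays below max_step * delta_l
    for hold_steps consecutive steps (assumed reading of the criterion).
    """
    run = 0
    for k in range(1, len(means)):
        if abs(means[k] - means[k - 1]) < max_step * delta_l:
            run += 1
        else:
            run = 0
        if run >= hold_steps:
            return k
    return None


def average_of_recent_means(means, lock_index, hold_steps=50):
    """Average the mean values over the steps that led to I-Lock."""
    window = means[lock_index - hold_steps + 1:lock_index + 1]
    return sum(window) / len(window)


# Example: one large jump, then a settled loop output.
mv_trace = [1000.0, 0.0] + [0.5] * 60
lock_at = detect_initial_lock(mv_trace, delta_l=0.01)
mv_avg = average_of_recent_means(mv_trace, lock_at)
```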
In this state the DC loop gain is increased temporarily so that a wider frequency offset range is supported for compensation. After tuning the DDS center frequency, the algorithm enters the second state in which an adaptive lock (A-Lock) is acquired. In this state, the limit values are set as:
in which N_th, the limit coefficient, determines the threshold span as a multiple of ΔL. At the beginning of this state, the limit values are set according to:
In this state, the limit thresholds are adapted until they settle at an optimum level. In each 50-step iteration, the maximum absolute value of mv (mv_max) is calculated as an estimate of the carrier frequency estimation error variance. If this value is less than N_th ΔL, the new limit coefficient is calculated as follows:
Using this equation, the limits are gradually reduced to reach an optimum value considering the frequency error variance. It should be noted, however, that if mv_max exceeds 2N_th ΔL in an iteration, the N_th reduction process rolls back using the following equation:
A flowchart for the proposed algorithm with more details is depicted in Figure 2. The algorithm guarantees negligible output phase error for the loop even in low signal-to-noise conditions when an initial stable lock fails to establish for high-order QAM due to high phase error, as reported in [5].
Fig. 2. Proposed lock detection and limit estimation algorithm flowchart
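The decision logic of the A-Lock state can be sketched as follows. Only the two comparisons (mv_max below N_th ΔL, and mv_max above 2N_th ΔL) come from the description above. The actual reduction and roll-back rules are not reproduced here, so they are passed in as functions, and the halving and doubling defaults are purely hypothetical placeholders. Symmetric limits around zero are likewise an assumption.

```python
def update_limit_coefficient(n_th, mv_window, delta_l,
                             shrink=lambda n: max(1, n // 2),
                             grow=lambda n: 2 * n):
    """Run one A-Lock iteration over a window of mean values (e.g. 50 steps).

    shrink and grow are HYPOTHETICAL stand-ins for the reduction and
    roll-back rules; replace them with the real update equations.
    """
    mv_max = max(abs(mv) for mv in mv_window)
    if mv_max < n_th * delta_l:
        return shrink(n_th)      # well inside the limits: tighten them
    if mv_max > 2 * n_th * delta_l:
        return grow(n_th)        # limits too tight: roll back
    return n_th                  # otherwise keep the current coefficient


def limits(n_th, delta_l):
    """Return (UL, LL), assuming limits symmetric about zero."""
    return n_th * delta_l, -n_th * delta_l


# Example: a quiet window lets the coefficient shrink from 100 to 50.
n_th = update_limit_coefficient(100, [0.1, -0.2, 0.05], delta_l=0.01)
upper, lower = limits(n_th, 0.01)
```

Passing the update rules as arguments keeps the control flow separate from the specific equations, which makes it easy to experiment with different reduction schedules.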
Performance analysis of digital CR loop
In this section, we analyze the performance of the conventional digital CR loop shown in Figure 1 (b). Then, based on this analysis, the proposed CR architecture is analyzed and compared to the conventional one. In Figure 1 (b), PE is the phase estimator with a DC gain of k_PFD, LPF is the loop low-pass filter, and k_1 is the trimming loop gain. The transfer function of this loop is:
The DDS is modeled as a simple integrator with the following transfer function:
We consider a single-pole low-pass filter for the loop:
where α is the distance of the single filter pole from the unit circle. Substituting (12) and (13) in (11), we have:
This shows second-order behavior for the loop. The frequency response of the loop can also be obtained by setting z = e^{jω} in (14):
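One consistent way to write out this chain of substitutions is sketched below. It assumes a unity-feedback loop, a DDS modeled as a delayed integrator D(z), and a unity-DC-gain low-pass filter F(z) whose pole sits at 1 - α. The symbols D(z), F(z), and the lumped gain K are introduced only for this illustration, and the exact forms used for the loop may differ by constant gains or delays.

```latex
\begin{align*}
H(z) &= \frac{K\,F(z)\,D(z)}{1 + K\,F(z)\,D(z)}, \qquad K = k_{PFD}\,k_1, \\
D(z) &= \frac{z^{-1}}{1 - z^{-1}}, \qquad
F(z) = \frac{\alpha}{1 - (1-\alpha)\,z^{-1}}, \\
H(z) &= \frac{K\alpha\,z^{-1}}{1 - (2 - \alpha - K\alpha)\,z^{-1} + (1-\alpha)\,z^{-2}}, \\
H(e^{j\omega}) &= \frac{K\alpha\,e^{-j\omega}}{1 - (2 - \alpha - K\alpha)\,e^{-j\omega} + (1-\alpha)\,e^{-j2\omega}}.
\end{align*}
```

The denominator is quadratic in z^{-1}, which is the second-order behavior noted above, and at z = 1 the expression reduces to H(1) = 1.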
As discussed in previous sections, the main source of phase error is the variation of the PE block output, which can be modeled as an independent additive noise source, x_n, shown in Figure 1 (c).
The noise power is calculated from the probability distribution function (PDF) of the constellation subset points obtained by the 4th-power transformation. Assume the points are x_{n_k}, k ∈ {1, 2, ..., q}, where q denotes the number of points. The noise power can be computed from:
where f_{X_n}(x_{n_k}) represents the PDF and x̄_{n_k} denotes the mean of the points. Now the output phase error power spectral density around the center frequency can be calculated using (15):
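The two calculations can be sketched numerically as follows. The noise power is taken as the variance of a discrete distribution, and the output spectral density is assumed to be the noise power shaped by the squared magnitude of the loop response. The loop response used here is an assumed form (delayed-integrator DDS, single-pole filter with pole at 1 - α, lumped gain k_gain), and all names and values are hypothetical.

```python
import cmath


def noise_power(values, probabilities):
    """Variance of a discrete distribution: sum of p_k * (x_k - mean)^2."""
    mean = sum(p * x for x, p in zip(values, probabilities))
    return sum(p * (x - mean) ** 2 for x, p in zip(values, probabilities))


def loop_response(omega, k_gain, alpha):
    """Assumed closed-loop response H(e^{j*omega}) of the second-order loop."""
    z1 = cmath.exp(-1j * omega)  # z^-1 evaluated on the unit circle
    num = k_gain * alpha * z1
    den = 1 - (2 - alpha - k_gain * alpha) * z1 + (1 - alpha) * z1 ** 2
    return num / den


def phase_error_psd(omega, sigma2, k_gain, alpha):
    """Noise power shaped by |H|^2 (assumes noise referred to the loop input)."""
    return abs(loop_response(omega, k_gain, alpha)) ** 2 * sigma2


# Example with two equiprobable points and hypothetical loop parameters.
sigma2 = noise_power([-1.0, 1.0], [0.5, 0.5])
psd_dc = phase_error_psd(0.0, sigma2, k_gain=0.1, alpha=0.05)
psd_hi = phase_error_psd(cmath.pi, sigma2, k_gain=0.1, alpha=0.05)
```

Under these assumptions the loop passes noise unattenuated at zero frequency offset and suppresses it strongly near half the sampling rate.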
Based on the analysis of the CR loop of Figure 1 (b), we next calculate the power spectral density of the output phase error for the proposed algorithm shown in Figure 1 (a). In this architecture, the noise generated at the output of the PE, modeled by the additive noise x_n, is much smaller after passing through the limiter than in the conventional architecture. The noise decreases as the loop passes through the adaptive lock states, where the upper and lower bounds of the limiter get closer. Finally, in the A-Lock state the noise power in (16) is very small. Considering Figure 1 (c), we can derive an equation for the output phase error using (15).
where x_{n-lim} and σ²_{xn-lim} denote, respectively, the limiter output and its power.
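The effect of the limiter on the noise power can be illustrated with a small numerical sketch. The sample values and limits below are made up for the example, and the limiter is assumed to be a simple clamp between LL and UL.

```python
def clip(values, lower, upper):
    """Clamp every value to the interval [lower, upper]."""
    return [max(lower, min(upper, v)) for v in values]


def power(values):
    """Mean-square deviation of the values about their sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical PE output samples, before and after a limiter with UL = 2, LL = -2.
pe_out = [-8.0, -3.0, -1.0, 0.0, 1.0, 3.0, 8.0]
limited = clip(pe_out, -2.0, 2.0)

power_before = power(pe_out)
power_after = power(limited)
```

Tightening the limits removes the large excursions that dominate the variance, which is the mechanism by which the limiter output power falls below that of the unlimited PE output.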
Simulation results
A Simulink model was designed for a complete transceiver loop with the CR architecture in Figure 1 [...] Performance simulation results for a 256-QAM system are shown in Figure 3 (b). In this figure, more than 6 dB system gain improvement can be [...]
Although the simulation results are presented for square QAM, the proposed technique can be employed for cross QAM by using polarity-decision-based algorithms for phase estimation, as shown in [6].
Summary and conclusions
This paper presents the design and analysis of a new adaptive architecture for reducing the phase error of digital carrier recovery (CR) algorithms, based on a software-aided technique for high-order QAM constellations. The results of analysis and simulations show a reduction of more than 30 dB in the phase error variance, which leads to more than 3 dB processing gain improvement for high-order QAM schemes. The hardware complexity overhead is small and independent of the CR loop filter and modulation order. This technique can be employed with any other CR loop architecture to improve the phase error characteristics for high-order modulations.
