ABSTRACT
Introduction
Coherent communication offers superior error-rate performance [1 Chap. 5], at the cost of increased receiver complexity, since the receiver contains several PLLs. With the advent of faster and better A/D converters, FPGAs, and ASICs, attention in the past two decades has turned from analog to digital implementations [2] [3] [4] [5], i.e., implementations in which the received IF or near-baseband signal is sampled and the entire demodulation process is performed digitally. However, many PLLs are implemented neither completely in analog nor completely in digital form. Taking the carrier PLL as an example, a hybrid implementation is shown in Fig. 1. Hybrid implementations are particularly suited to high data rates because the sampling rate of a hybrid receiver can be as low as 1 sample/symbol, whereas completely digital demodulation needs at least 2 samples/symbol, and more likely around 2.5 samples/symbol, for adequate performance [5]. Moreover, in hybrid receivers the downconversion is done in the analog domain, which simplifies the receiver's digital portion.
The advantages of the hybrid topology over analog implementations include: (a) better repeatability and more precise filter specifications; (b) the possibility of arbitrarily complicated phase detector and filter structures; (c) the long-term stability and phase noise characteristics achievable with DDS chips, which are difficult to attain using a VCO; (d) enhanced testability and probing. This paper focuses on fixed-point hardware implementations, for two reasons. First, such implementations have distinct performance advantages, since they can generally be made to operate faster than software and/or floating-point implementations. Second, designing a fixed-point hardware system poses more intriguing challenges: whereas a software system can be implemented in a high-level language, a fixed-point hardware design requires the designer to explicitly address issues such as scaling and overflow, logic resource usage, and the implementation of mathematical operations.
This paper is a shortened and revised summary of an invited workshop [6], [7], to which the reader is referred for a more detailed treatment. In this paper we treat only the carrier PLL, though the derivations are general and can be applied to other PLLs [6], [7].
Analog PLL Theory
As a preliminary step before discussing hybrid PLLs, we now establish a step-by-step design procedure for analog PLLs, which we shall subsequently use as the basis for a step-by-step hybrid PLL design procedure. The PLL's purpose is to produce an output signal $y_o(t)$ that is synchronized in frequency and phase with the input signal $y_i(t)$. The fact that both signals contain the term $\omega_i t$ does not necessarily imply that their frequencies are equal; we have simply incorporated the phase difference induced by the frequency error, $\Delta\omega \cdot t$, into $\theta_o(t)$ (see [8 Sec. 3.1]).
If we assume that the loop filter suppresses the double-frequency term, the effective phase detector output is:
Eq. (6) is problematic because it is a nonlinear differential equation (due to the function PD). For the current analysis, we assume that the phase detector has a sinusoidal characteristic, i.e., that:
$PD(\theta_e) = K_d \sin\theta_e$ (7)
$K_d$ is called the phase detector gain (for example, for a multiplier phase detector we have from (2) that $K_d = 0.5AB$). As we shall see, the assumption of a sinusoidal phase detector characteristic entails no real loss of generality: linear-model analysis assumes that the PLL is locked, in which case the shape of the phase detector curve is irrelevant and the only important parameter is $K_d$ [6].
The loop filter
The majority of PLLs are 2nd-order loops, for which the loop filters are extremely simple [9 Sec. 2.4]. We assume that the loop filter is:
For discussion of other loop filters see [7], [6], and . The filter of (8) is easily implemented as a resistor-capacitor network [9 Chap. 2], but we will not discuss this implementation, since we are interested in a digital implementation (see Fig. 1).
Nonlinear PLL equation
We can arrive at a nonlinear differential equation for the analog PLL. Define for convenience the loop gain as
We can then use (6)-(8), along with the equivalence of the Laplace variable and the time-domain derivative operator (that is, $s \leftrightarrow d/dt$), to arrive at [8 Sec. 3.5]:
Linearized PLL model
Eq. (9) is a nonlinear differential equation that is difficult to solve analytically (although it is very easy to simulate). For theoretical analysis, we assume that the PLL is locked and that the phase error is small enough that $\sin\theta_e \approx \theta_e$, which linearizes (9):
Solutions of the linearized PLL equation
We can solve (10) in the time domain [9 Chap. 4], [8 Chap. 5], but it is more instructive to solve it in the Laplace domain. In the Laplace domain, the VCO transfer function is (from inspection of (5)) $K_V/s$.
The open loop function is (see Fig. 4 ):
The closed loop function of the PLL linear model (Fig. 4) is:
This is the transfer function of a second-order system, whose denominator is often written in the form
$Q(s) = s^2 + 2\zeta\omega_n s + \omega_n^2$ (13)
where ζ is the damping ratio and $\omega_n$ is the natural (radian) frequency. ζ and $\omega_n$ have been studied extensively in the control literature [9] [10] [11] [12]. Comparing (12) and (13), we find that:
and using (12), (13), and (14):
and by substituting (14) and (15) into (12) we get:
As is customary, we assume a high-gain loop, as defined by:
There is no real loss of generality in making this assumption, as virtually all PLLs (indeed, all control loops) are designed to have a high loop gain. This is because a high loop gain makes the loop bandwidth and damping factor relatively insensitive to variations in the gains $K_d$, $K_a$, and $K_V$ [10 Chap. 14]. How strongly (17) must hold is usually also determined by the allowable steady-state error [6], [8 Chap. 5]. For example, for a 2nd-order PLL with F(s) of (8), the closed-loop transfer function becomes
$H(s) = \dfrac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$ (18)
which is a transfer function form that has been analyzed extensively in the PLL and control systems literature [9] [10] [11] [12].
Choice of Analog PLL Parameters
Choosing ζ
The choice of ζ is determined by our desire to achieve the fastest PLL response but with minimal overshoot. Moreover, we would like ζ to be optimal in terms of noise performance. We choose ζ=0.95 because: (a) it is a good compromise between the optimal values for several optimization criteria [9 
Finding the optimal ω n
The choice of $\omega_n$ in coherent wireless communications is usually determined by the desire to achieve minimal phase-error jitter $\overline{\theta_e^2}$. This is because phase jitter directly affects the error rate [
Optimizing for minimal jitter means choosing $\omega_n$ so that the jitter is minimized. For detailed discussions see [15] and [8 Chap. 8].
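The jitter trade-off can be illustrated numerically. The sketch below is not the procedure of [15] or [8]; it assumes (hypothetically) a linear loop with closed-loop response $(2\zeta\omega_n s + \omega_n^2)/(s^2 + 2\zeta\omega_n s + \omega_n^2)$, a thermal contribution $B_L/(C/N_0)$ with no squaring loss, and a carrier phase-noise profile falling as $1/f^2$, anchored at the DVB-S2 figure of -68 dBc/Hz at 1 kHz quoted in the hybrid-implementation section. Under those assumptions the total jitter has the form $a/f_n + b f_n$, whose minimum lies at $f_n = \sqrt{a/b}$. The C/N0 value is an arbitrary illustrative choice.

```python
import math


def loop_noise_bandwidth_hz(f_n, zeta):
    """One-sided noise bandwidth B_L (Hz) of the assumed closed-loop response
    H(s) = (2*zeta*wn*s + wn^2) / (s^2 + 2*zeta*wn*s + wn^2), wn = 2*pi*f_n."""
    return math.pi * f_n * (zeta + 1.0 / (4.0 * zeta))


def jitter_terms(f_n, zeta, cn0_hz, l0_dbc, f0_hz):
    """Return (thermal, phase_noise) contributions to the phase-error variance, rad^2.

    Assumptions (illustrative only):
      thermal     = B_L / (C/N0)                    (linear PLL, no squaring loss)
      phase noise = integral of S_phi(f) * |1 - H(j*2*pi*f)|^2 df, with
                    S_phi(f) = 2 * L0 * (f0 / f)^2 (single-sided, 1/f^2 slope),
                    which evaluates in closed form to pi * L0 * f0^2 / (2 * zeta * f_n).
    """
    l0 = 10.0 ** (l0_dbc / 10.0)  # dBc/Hz -> linear
    thermal = loop_noise_bandwidth_hz(f_n, zeta) / cn0_hz
    phase_noise = math.pi * l0 * f0_hz ** 2 / (2.0 * zeta * f_n)
    return thermal, phase_noise


def total_jitter(f_n, zeta, cn0_hz, l0_dbc, f0_hz):
    """Total phase-error variance (rad^2) under the assumptions above."""
    return sum(jitter_terms(f_n, zeta, cn0_hz, l0_dbc, f0_hz))


def optimal_f_n(zeta, cn0_hz, l0_dbc, f0_hz):
    """Total jitter = a/f_n + b*f_n, minimized at f_n = sqrt(a/b)."""
    l0 = 10.0 ** (l0_dbc / 10.0)
    a = math.pi * l0 * f0_hz ** 2 / (2.0 * zeta)
    b = math.pi * (zeta + 1.0 / (4.0 * zeta)) / cn0_hz
    return math.sqrt(a / b)


if __name__ == "__main__":
    zeta = 0.95
    cn0 = 10.0 ** (80.0 / 10.0)  # assumed C/N0 of 80 dB-Hz
    f_opt = optimal_f_n(zeta, cn0, l0_dbc=-68.0, f0_hz=1e3)
    sigma = math.sqrt(total_jitter(f_opt, zeta, cn0, -68.0, 1e3))
    print(f"optimal f_n = {f_opt:.0f} Hz, rms jitter = {math.degrees(sigma):.2f} deg")
```

At the optimum the two contributions are equal, which is a quick sanity check on any such model; a measured phase-noise profile would replace the closed-form integral with a numerical one.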
A Step-by-Step Analog PLL Design Procedure
We now present a step-by-step analog PLL design procedure. Assumed given are the input carrier phase noise characteristics and the input SNR (signal-to-noise ratio), denoted $SNR_i$ [16 Chap. 6], [6]. Since the PLL must operate over a range of $SNR_i$ values, we design it for the worst case (i.e., the lowest $SNR_i$). The procedure is shown in Fig. 5. The most difficult step is determining $\omega_n$, since this involves addressing the input carrier phase noise and the effects of thermal noise [6, 8 Chap. 8, 15 Chap. 10]. Fig. 5 was developed with no specific modulation in mind and is applicable to any modulation type. The only modulation-dependent variables are (a) the phase detector and its gain $K_d$, and (b) the squaring loss (which influences [9 Sec. 11 [17], [18], [19], [2 Chap. 5, 6].
Hybrid Implementation of the PLL
Preliminary assumptions
This paper's goal is to develop a procedure for the design of fixed-point hardware hybrid PLLs. We assume a high data rate (e.g., above 1 MHz) since, as discussed in the introduction, it is at such data rates that hybrid receivers are a particularly attractive design choice. A phase detector in digital communications usually produces a new phase estimate for every symbol [2], [3], [20], and thus the phase detector output sample rate is assumed to be of the same order of magnitude as the symbol rate. Denoting the phase detector output sample rate as $f_p = 1/T_p$, we have $1/T_p \sim 1/T$, where $1/T$ is the symbol rate and "~" denotes equal orders of magnitude. Again, this rate is assumed to be at least 1 MHz. Carriers used in coherent communications generally have phase noise whose content can be considered non-negligible only up to an offset of at most several kHz from the carrier. For example, the DVB-S2 standard [21] specifies -68 dBc/Hz at a 1 kHz offset from the carrier. The PLL's natural frequency in Hz, $f_n = \omega_n/2\pi$, is in general of the same order of magnitude as the bandwidth of the significant phase-noise content of the received carrier. While $f_n$ depends on the receiver, we can safely say that its order of magnitude is at most several kHz. An important conclusion is therefore that
$f_p \gg f_n$ (19)
Relation (19) is crucial, since it is why the analysis of the PLL discussed in this paper differs from that of an arbitrary hybrid control loop.
The Direct Digital Synthesizer (DDS)
The Direct Digital Synthesizer (DDS) in Fig. 1 is a chip that accepts a digital control word (the tuning word) at its input and outputs a continuous-time sinusoidal waveform whose frequency corresponds to that tuning word. The DDS consists of a Numerically Controlled Oscillator (NCO) followed by a Digital-to-Analog Converter (DAC), followed by a smoothing filter that rejects the image frequencies appearing at the DAC output [14 App. A], [22]. The waveform produced by the DDS is always phase-continuous [22], regardless of the frequencies through which its output transitions. The linearized PLL model for Fig. 1 is shown in Fig. 6. An example DDS chip is the AD9851 manufactured by Analog Devices [22]. As seen in Fig. 7, the tuning word controls the output frequency by determining how much the phase of the output wave increases on each rising edge of the reference clock. Denoting this clock rate as $f_D = 1/T_D$, the number of bits in the phase accumulator register as N, and the tuning word as $D_w$, the output frequency is
$f_w = \dfrac{D_w f_D}{2^N}$ (20)
Eq. (20) is only valid if the Nyquist criterion is observed, i.e., if there are more than two samples per period of the output sinusoid. This means that (20) holds if:
$D_w < 2^{N-1}$ (21)
Thus the largest tuning word that can produce an unaliased sinusoid is $D_w = 2^{N-1} - 1$, and that waveform has a frequency of
$f_w = \dfrac{(2^{N-1}-1) f_D}{2^N} \approx \dfrac{f_D}{2}$
However, this limit cannot be achieved in practice, because the post-DAC filter needed to suppress the first image frequency (which lies at $f_D - f_w$ [22]) would be unrealizable. A more reasonable limit is $f_{w,\max} \approx 0.4 f_D$.
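Equations (20) and (21), together with the practical $0.4 f_D$ limit, translate directly into a small helper. The sketch below is hypothetical: the 32-bit accumulator and 180 MHz reference clock are illustrative values (of the kind found on parts such as the AD9851; check the datasheet for real limits), and the accumulator model ignores the phase-to-amplitude lookup and the DAC.

```python
def tuning_word(f_out_hz, f_clk_hz, n_bits, max_fraction=0.4):
    """Nearest tuning word D_w for f_out = D_w * f_D / 2^N (Eq. (20)).

    max_fraction=0.4 enforces the practical limit f_w,max ~ 0.4 f_D, which is
    stricter than the Nyquist bound of Eq. (21)."""
    if not 0.0 <= f_out_hz <= max_fraction * f_clk_hz:
        raise ValueError("requested frequency exceeds the practical DDS limit")
    return round(f_out_hz * (1 << n_bits) / f_clk_hz)


def output_frequency(d_w, f_clk_hz, n_bits):
    """Eq. (20): frequency actually produced by tuning word d_w."""
    return d_w * f_clk_hz / (1 << n_bits)


def phase_accumulator(tuning_words, n_bits):
    """Accumulator contents after each reference-clock edge (one word per clock).

    A new tuning word only changes the per-clock phase increment; the
    accumulated phase itself never jumps, which is why the DDS output is
    phase-continuous."""
    mask = (1 << n_bits) - 1
    acc, history = 0, []
    for d_w in tuning_words:
        acc = (acc + d_w) & mask  # N-bit register wraps modulo 2^N
        history.append(acc)
    return history


if __name__ == "__main__":
    f_d, n = 180e6, 32               # illustrative clock rate and accumulator width
    d_w = tuning_word(70e6, f_d, n)
    print(d_w, output_frequency(d_w, f_d, n))
    print("frequency resolution:", f_d / 2**n, "Hz")
```

The frequency resolution $f_D/2^N$ also bounds how finely the loop filter output can steer the DDS, which matters when scaling the loop filter output to the tuning word.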
Modeling of the DDS
The tuning word of the DDS is updated at a rate of $f_u = 1/T_u$ Hz that is much slower than $f_D$, because several NCO samples must be traversed between updates of the tuning word for those updates to be expressed at the DDS output. Thus we may write:
$f_u \ll f_D$ (22)
The operation of the DDS within a PLL is illustrated in Fig. 8. The mathematical model of the DDS is shown in Fig. 9. To see why this model is valid, we compare the behavior of the systems in Fig. 9 and Fig. 8 (right) and observe that it is identical. It is emphasized that neither the DAC nor the VCO in Fig. 9 is in fact present in the DDS; they are purely virtual mathematical constructs. Consequently, we can define the parameters of the DAC and VCO as we wish, so long as we ensure that the input and output of the model in Fig. 9 behave identically to those of the DDS. With that constraint in mind, we now characterize the virtual DAC and VCO. First, since the DDS is updated at a rate of $f_u$, this is the conversion rate of the virtual DAC in Fig. 9. Furthermore, since the DDS output frequency remains constant between tuning word updates, the DAC in Fig. 9 must be of the ZOH (Zero-Order Hold) type [11 Chap. 11]. We further decide that the DAC has unit gain, i.e., its output (which has the dimensions of volts) is simply $D_w$ volts when it has the tuning word $D_w$ at its input. Since the DAC is updated every $T_u$ seconds, we have [6], [7]:
The DAC receives a digital signal as its input and outputs a continuous-time signal, and hence, strictly speaking, it cannot be described by a transfer function. To circumvent this obstacle, the DAC is modeled by a system with the transfer function (23).
Taking the Laplace transform of both sides of (24), we have:
Remembering that we have set the DAC in Fig. 9 to have unit gain, and comparing this to the transfer function form of a VCO, which is $K_V/s$, we have by inspection:
$K_V = \dfrac{2\pi f_D}{2^N}$ (26)
Equations (23) and (26) allow us to express Fig. 9 in Laplace terms, as shown in Fig. 10. Hybrid PLL analysis is now facilitated by inserting the model of Fig. 10 into Fig. 6, which results in Fig. 11.
There is an unresolved issue in Fig. 11: the sampling rates at the input and output of B(z) are $f_p$ and $f_u$, respectively, and these are not necessarily equal. To establish a relationship between $f_p$ and $f_u$, recall that $f_p \gg f_n$. Thus, we make the assumption that:
$f_p \ge f_u \gg f_n$ (27)
The magnitude discrepancy expressed in (27) can be quite dramatic. For example, consider a data system with a symbol rate of 100 MHz ($f_p = 100\cdot10^6$ Hz). A typical value for $f_n$ is $f_n \approx 2000$ Hz. The DDS update rate (as we shall see shortly) would be, say, $f_u = 2\cdot10^6$ Hz. Given the magnitude difference between $f_p$ and $f_u$, we desire two things. First, we wish to find the minimum $f_u$ that allows the PLL to function acceptably. Second, we wish to implement the loop filter at the much lower clock rate of $f_u$ (rather than at rate $f_p$). The sampling rate of the data at the input of B(z) must therefore be reduced through decimation, as shown in Fig. 12. The decimation ratio in Fig. 12 is
$M = f_p / f_u$ (28)
In Fig. 12 we assume implicitly that the spectral content of ( )
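The claim that a unit-gain ZOH DAC followed by a VCO reproduces the DDS can be checked numerically. The sketch below assumes $K_V = 2\pi f_D/2^N$ rad/s per unit of tuning word (our reading of (20) in radian units) and an integer number of reference-clock cycles per update period, $f_D/f_u$. It compares the accumulator phase of an idealized NCO with the phase obtained by integrating the held tuning word through the virtual VCO.

```python
import math


def nco_phase(tuning_words, clocks_per_update, n_bits):
    """Idealized DDS: the accumulator adds D_w on every reference clock and wraps
    modulo 2^N; each tuning word is held for clocks_per_update clocks (T_u = that / f_D).
    Returns the final output phase in radians, in [0, 2*pi)."""
    modulus = 1 << n_bits
    acc = 0
    for d_w in tuning_words:
        for _ in range(clocks_per_update):
            acc = (acc + d_w) % modulus
    return 2.0 * math.pi * acc / modulus


def virtual_model_phase(tuning_words, clocks_per_update, f_clk_hz, n_bits):
    """Virtual unit-gain ZOH DAC (D_w 'volts' held for T_u seconds) followed by a
    VCO with assumed gain K_V = 2*pi*f_D / 2^N: phase = K_V * integral of the staircase."""
    k_v = 2.0 * math.pi * f_clk_hz / (1 << n_bits)
    t_u = clocks_per_update / f_clk_hz
    phase = sum(k_v * d_w * t_u for d_w in tuning_words)
    return phase % (2.0 * math.pi)


def wrapped_difference(a, b):
    """Smallest signed difference between two angles, in radians."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi


if __name__ == "__main__":
    words = [1_000_000_000, 1_000_050_000, 999_980_000, 1_000_020_000]
    f_d, r, n = 180e6, 90, 32        # illustrative: f_u = f_D / 90 = 2 MHz
    actual = nco_phase(words, r, n)
    model = virtual_model_phase(words, r, f_d, n)
    print(actual, model, wrapped_difference(actual, model))
```

The agreement holds only at update instants; between updates both models advance linearly, which is exactly the ZOH behavior assumed for the virtual DAC.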
Calculation of the Digital Loop Filter
The models obtained in the previous sections now allow us to compute B(z) so that a desired closed loop transfer function for the PLL is achieved. In this section we assume that the virtual DAC (which is part of the DDS equivalent model), the decimation filter, and the decimator are all ideal (later we derive conditions that ensure these assumptions are appropriate). Under those assumptions, the sequence of operations (sampling at rate $f_p$, decimation filter, decimation by M) is equivalent to sampling directly at rate $f_u$, which reduces Fig. 12 to Fig. 13.
Comparing Fig. 13 to Fig. 4, it is seen by inspection that $\hat F(s)$ in Fig. 13 corresponds to F(s) in Fig. 4. Since all other loop components are equal, if we can design a filter B(z) such that $\hat F(s) = F(s)$, the closed loop transfer functions of both systems will be identical.
Consequently, from inspection of the structure of $\hat F(s)$, it is evident that B(z) may be deduced from F(s) by using the bilinear transformation method [23 Sec. 7.1]. We now proceed to find B(z).
Now let us define ( )( ) . The direct-form II implementation [23 Chap. 6] is shown in Fig. 14. 
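Because (8) is not reproduced in this section, the sketch below assumes, purely for illustration, the common active proportional-plus-integral filter $F(s) = (1 + s\tau_2)/(s\tau_1)$ with a hypothetical overall loop gain K (in 1/s). The bilinear substitution $s \to (2/T_u)(1 - z^{-1})/(1 + z^{-1})$ then gives $B(z) = (b_0 + b_1 z^{-1})/(1 - z^{-1})$, which a direct-form II structure realizes with a single state register. The filter runs at the DDS update rate, so the sample period is $T_u$.

```python
import cmath
import math


def pi_time_constants(f_n, zeta, loop_gain):
    """For the ASSUMED active PI filter in a loop of gain K (1/s):
    wn^2 = K / tau1 and 2*zeta*wn = K * tau2 / tau1."""
    w_n = 2.0 * math.pi * f_n
    return loop_gain / w_n ** 2, 2.0 * zeta / w_n  # (tau1, tau2)


def bilinear_pi(tau1, tau2, t_u):
    """Bilinear transform of F(s) = (1 + s*tau2)/(s*tau1) with sample period t_u.
    Returns (b0, b1) for B(z) = (b0 + b1*z^-1) / (1 - z^-1)."""
    b0 = (t_u + 2.0 * tau2) / (2.0 * tau1)
    b1 = (t_u - 2.0 * tau2) / (2.0 * tau1)
    return b0, b1


class DirectFormII:
    """Direct-form II realization of B(z) = (b0 + b1*z^-1) / (1 - z^-1)."""

    def __init__(self, b0, b1):
        self.b0, self.b1 = b0, b1
        self.w_prev = 0.0  # the single delay register w[n-1]

    def step(self, x):
        w = x + self.w_prev                       # pole at z = 1 (accumulator)
        y = self.b0 * w + self.b1 * self.w_prev   # zero from the numerator
        self.w_prev = w
        return y


def relative_error(tau1, tau2, t_u, f_hz):
    """|B(e^{j 2 pi f T_u}) / F(j 2 pi f) - 1| at analog frequency f_hz."""
    s = 2j * math.pi * f_hz
    z_inv = cmath.exp(-s * t_u)
    b0, b1 = bilinear_pi(tau1, tau2, t_u)
    analog = (1 + s * tau2) / (s * tau1)
    digital = (b0 + b1 * z_inv) / (1 - z_inv)
    return abs(digital / analog - 1)


if __name__ == "__main__":
    tau1, tau2 = pi_time_constants(f_n=2000.0, zeta=0.95, loop_gain=1e6)  # K is hypothetical
    t_u = 1 / 2e6
    print(bilinear_pi(tau1, tau2, t_u), relative_error(tau1, tau2, t_u, 2000.0))
```

Because $f_u \gg f_n$, the frequency warping of the bilinear transform is negligible near the loop bandwidth; in fixed-point hardware the main concern becomes the small difference $b_0 + b_1 = T_u/\tau_1$, which sets the integrator word length.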
Determination of DDS Update Rate
The derivations of the previous section assumed that the (virtual) DAC in Fig. 9 is ideal. However, that DAC is not ideal: it is of the ZOH type (Fig. 10). This profoundly affects the PLL and requires $f_u$ to exceed a certain minimum, which will now be calculated. Modifying Fig. 13 to account for the DAC's non-ideality, while still assuming ideal decimation, results in Fig. 15. While amplitude interference from aliases passed by the non-ideal virtual DAC will degrade performance, it will not in general lead to PLL instability [6]. In contrast, the effect on the phase response of the signals in the PLL is much more important and leads to a higher minimum $f_u$. We analyze this effect by considering its impact on the Phase Margin (PM) of the PLL.
We now assume that the aliases contribute negligibly to the signal at the output of the DAC (see [6] for justification). Furthermore, we assume that the ZOH magnitude distortion of the primary reconstructed signal is negligible (this holds because $f_u \gg f_{\max}$; see Sec. 6 and [6]). Under these assumptions, we can reduce Fig. 15 to Fig. 16. A comparison of Fig. 16, Fig. 13, and Fig. 4 shows that in Fig. 16 the open loop transfer function will be:
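The phase margin of a PLL is defined as the number of degrees above -180° that the phase of the open loop response possesses when its magnitude is unity [13 Chap. 2]. For the systems of Fig. 13 and Fig. 4, denote the frequency at which the open loop magnitude is unity (the crossover frequency) as $f_C$. Since the ZOH magnitude distortion has been neglected, the ZOH in Fig. 16 changes only the phase of the open loop response, so $|\hat G(j2\pi f_C)| = |G(j2\pi f_C)| = 1$; that is, the crossover frequency remains unchanged.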
Evaluating the phase margin of $\hat G(s)$, we have:
We thus observe a decrease in the phase margin of $180 f_C T_u$ degrees. A decrease in the phase margin is detrimental, since it may cause an underdamped response of the PLL or even outright instability [13 Chap. 2]. Thus we must find a condition on $f_u$ under which this decrease is tolerable. If we are willing to tolerate a phase margin decrease of $d_A$ degrees, then from (34), approximating the crossover frequency as $f_C \approx 2\zeta f_n$:
$f_u \ge \dfrac{180 f_C}{d_A} \approx \dfrac{360\,\zeta f_n}{d_A}$ (37)
Exact linear-systems analysis of models that include terms of the form $e^{-s\tau}$ is not possible with rational transfer-function methods, because this term is not a polynomial (rational) function of s [10 Sec. 7.12]. Instead, we compute the effect on the PM and then, using the relationship $\zeta \approx 0.01\cdot\text{PM}$ (PM in degrees), define an effective damping factor $\zeta_{\text{eff}} = 0.01\cdot\text{PM}$. We can then think of the PLL as a second-order system with damping factor $\zeta_{\text{eff}}$. The allowable phase margin degradation is determined by the values of $\zeta_{\text{eff}}$ that the designer is willing to accept (usually $0.5 \le \zeta_{\text{eff}} \le 0.75$). Hence the design starts by determining the acceptable range for $\zeta_{\text{eff}}$. This range, taking into account other PM degradation sources (such as latency, see Sec. 10) and the fact that the initial PM is at most 75°, determines the allowable PM degradation due to the DDS, $d_A$, which in turn determines the DDS update rate via (37). Acceptable values of ζ are $0.8 \le \zeta \le 1.3$ [9 Chap. 7], so there is little freedom in ζ in (37), and the main free parameter is $f_u$. For example, to limit the degradation to $d_A = 3^\circ$ with ζ = 0.95, (37) gives $f_u > 114 f_n$. Once $f_u$ is determined, the phase margin is computed and the system is treated as a second-order system with natural frequency $f_n$ and damping factor $\zeta_{\text{eff}} = 0.01\cdot\text{PM}$.
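The phase-margin bookkeeping above can be sketched numerically. The sketch assumes the high-gain second-order open loop $G(s) = (2\zeta\omega_n s + \omega_n^2)/s^2$ (an assumption here) and models the ZOH's phase effect as the loss of $180 f_C T_u$ degrees at the crossover frequency $f_C$ stated above. It computes the exact crossover, the resulting PM, $\zeta_{\text{eff}} = 0.01\cdot\text{PM}$, and the minimum update rate, both with the exact $f_C$ and with the approximation $f_C \approx 2\zeta f_n$, which reproduces the $f_u > 114 f_n$ example.

```python
import math


def crossover_ratio(zeta):
    """f_C / f_n where |G(j 2 pi f_C)| = 1 for G(s) = (2*zeta*wn*s + wn^2) / s^2."""
    return math.sqrt(2.0 * zeta ** 2 + math.sqrt(4.0 * zeta ** 4 + 1.0))


def phase_margin_deg(zeta, f_n, f_u=None):
    """PM of G(s); if f_u is given, subtract the ZOH loss of 180 * f_C * T_u degrees."""
    f_c = crossover_ratio(zeta) * f_n
    pm = math.degrees(math.atan(2.0 * zeta * f_c / f_n))  # phase of G is atan(.) - 180
    if f_u is not None:
        pm -= 180.0 * f_c / f_u
    return pm


def min_update_rate(zeta, f_n, d_a_deg, exact_crossover=False):
    """Smallest f_u keeping the ZOH loss 180 * f_C / f_u at or below d_a_deg."""
    ratio = crossover_ratio(zeta) if exact_crossover else 2.0 * zeta
    return 180.0 * ratio * f_n / d_a_deg


if __name__ == "__main__":
    zeta, f_n = 0.95, 2000.0
    for exact in (False, True):
        f_u = min_update_rate(zeta, f_n, 3.0, exact)
        pm = phase_margin_deg(zeta, f_n, f_u)
        print(f"exact={exact}: f_u/f_n = {f_u / f_n:.1f}, PM = {pm:.1f} deg, zeta_eff = {0.01 * pm:.2f}")
```

For ζ = 0.95 this model gives an undegraded PM of about 75°, consistent with the upper bound quoted above; the exact crossover asks for a slightly higher $f_u$ than the $2\zeta f_n$ approximation.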
Implementation of the Decimation Filter
We shall now analyze the decimation filter and find a structure for its efficient implementation in hardware. The decimation filter $h_{DF}(n)$ is an FIR filter [24] whose length we denote as L, and thus its output y(k) as a function of its input x(n) is:
$y(k) = \sum_{n=0}^{L-1} h_{DF}(n)\, x(kM - n)$ (38)
The direct-form [23 Chap. 6] realization of (38) often requires too many resources to be implemented in fixed-point hardware, because it has many multipliers and adders that need to operate at rate $f_p$. Some improvement can be attained by using alternate topologies [24 Figs. 6.28, 6.31], but even those structures often require excessive resources. To make an efficient hardware implementation possible, consider the unit-gain rectangular window filter, that is:
$h_{DF}(n) = 1/L, \quad 0 \le n \le L-1$ (39)
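which has the transfer function [6], [7]:
$H_{DF}(e^{j\Omega}) = \dfrac{1}{L}\,\dfrac{\sin(\Omega L/2)}{\sin(\Omega/2)}\, e^{-j\Omega (L-1)/2}$ (40)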
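which has a (one-sided) passband of $[0, 2\pi/L]$, so for a passband of $[0, \pi/M]$ we choose L = 2M, and from (40):
$H_{DF}(e^{j\Omega}) = \dfrac{1}{2M}\,\dfrac{\sin(\Omega M)}{\sin(\Omega/2)}\, e^{-j\Omega (2M-1)/2}$ (41)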
As (41) shows, there is some distortion in the passband, and the sidelobes are relatively high. However, it can be shown [6] that these effects cause an increase of at most 10.9% (≈0.5 dB) in the noise power inside the PLL, which is acceptable for nearly all applications. Moreover, dramatic savings in logic resources can be attained if (39) is implemented wisely, as we now show. Combining (38) and (39), and since L = 2M:
$y(k) = \dfrac{1}{2M} \sum_{n=0}^{2M-1} x(kM - n)$ (42)
This requires a division by 2M; however, if 2M is chosen to be a power of 2, division by 2M can be approximated by discarding the lower $\log_2(2M)$ bits, which is a trivial operation. A simplified diagram of the implementation of the Integrate and Dump (IAD) module is shown in Fig. 18. Note that the division by 2M is done within the IAD module (in order to reduce the number of bits required for the output register). Also, the output rate of each IAD is $1/(2T_u)$, not $1/T_u$, since the IAD is used in conjunction with a second IAD module (as in Fig. 17), which produces an output sample at a rate of $1/(2T_u)$ Hz halfway between each pair of samples of the first IAD module. The "control logic" cloud tells the IAD module when to "integrate" and when to "dump", and thus must work in tandem with the adjacent IAD module; hence it is advantageous to implement the structure of Fig. 17 as a single module. The "control logic" cloud is essentially a carefully controlled counter that controls the timing of signals within the IAD modules, and it is easy to implement in hardware.
In conclusion, we can implement a nearly ideal decimation filter with only two adders (one per IAD), 4 registers (2 per IAD), and some simple control logic. This is a huge reduction in logic resources compared to the direct-form implementation.
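The two-IAD structure can be modeled in software to confirm that it computes the same outputs as the rectangular decimation filter of (38)-(39) with L = 2M. The sketch below is a behavioral model, not the Fig. 17/18 logic: integer samples stand in for fixed-point words, Python's arithmetic right shift plays the role of discarding the lower $\log_2(2M)$ bits (it floors toward minus infinity, as truncating a two's-complement word does), and the second IAD simply starts M samples after the first.

```python
import random


def iad_pair_decimate(x, M):
    """Two interleaved integrate-and-dump modules, each summing 2M samples.

    IAD 0 integrates samples [0, 2M), [2M, 4M), ...; IAD 1 is offset by M and
    integrates [M, 3M), [3M, 5M), .... Each dumps at rate f_p/(2M) = 1/(2 T_u),
    so together they produce one output per M inputs (rate f_u)."""
    length = 2 * M
    shift = length.bit_length() - 1
    if length != 1 << shift:
        raise ValueError("2M must be a power of two")
    acc = [0, 0]        # accumulator register of each IAD
    count = [0, -M]     # negative count: IAD 1 has not started yet
    out = []
    for sample in x:
        for i in (0, 1):
            if count[i] < 0:
                count[i] += 1
                continue
            acc[i] += sample
            count[i] += 1
            if count[i] == length:            # "dump": scale by 1/2M and reset
                out.append(acc[i] >> shift)   # discard the lower log2(2M) bits
                acc[i], count[i] = 0, 0
    return out


def direct_decimate(x, M):
    """Reference: y(j) = floor(sum of x[jM .. jM + 2M - 1] / 2M)."""
    length = 2 * M
    return [sum(x[j * M: j * M + length]) // length
            for j in range((len(x) - length) // M + 1)]


if __name__ == "__main__":
    rng = random.Random(1)
    samples = [rng.randint(-512, 511) for _ in range(1000)]   # 10-bit signed inputs
    assert iad_pair_decimate(samples, 8) == direct_decimate(samples, 8)
    print(iad_pair_decimate(samples, 8)[:5])
```

Each accumulator needs only $\log_2(2M)$ more bits than the input word, which is the register-width saving that motivates doing the scaling inside the IAD.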
Effects of Implementation Latency
In Section 8 we found that the update rate of the DDS has a measurable impact on the PLL's phase margin. This is actually a special case of the effect of a delay element within a control loop on the latter's phase margin.
Assume we have designed the decimation filter as per (41). We can incorporate this filter into the model of Fig. 12, as shown in Fig. 19. The effects of the decimation filter's magnitude response on the loop are negligible (see Sec. 9 and [6]) and are thus neglected here. The delay introduced by the decimation filter, whose transform is $e^{-j\Omega(2M-1)/2}$, must however be taken into account. Since the filter operates at rate $f_p = 1/T_p$, this delay corresponds in the Laplace domain to $e^{-sT_p(2M-1)/2}$. We thus have the PLL model of Fig. 20. We also insert into the open loop transfer function a pure delay $e^{-sT_I}$ that models any implementation latencies totaling $T_I$ seconds (e.g., delays associated with the FPGA or ASIC data path). The revised model is shown in Fig. 21. The impact of the delays in Fig. 21 is best analyzed through their effect on the phase margin. Define the total implementation delay as:
(44)
Similarly to (31)-(37), it is easily shown that to bound the latency's impact on the PM by $d_L$ degrees, we must have [6], [7]:
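The latency bound can be sketched in the same way as the DDS bound. A pure delay $e^{-sT}$ has unit magnitude and phase $-2\pi f T$ radians, so it costs $360 f_C T$ degrees of phase margin at the crossover $f_C$. The sketch below takes the delay to be the decimation filter's group delay $(2M-1)T_p/2$ plus the extra latency $T_I$, and uses the approximation $f_C \approx 2\zeta f_n$ as an assumption; the numeric values in the usage example (including $T_I = 1\,\mu$s) are hypothetical.

```python
def latency_pm_loss_deg(f_c_hz, t_p, M, t_i):
    """Phase-margin loss (degrees) from the total delay T = (2M - 1) T_p / 2 + T_I."""
    total_delay = 0.5 * (2 * M - 1) * t_p + t_i
    return 360.0 * f_c_hz * total_delay


def max_total_delay(f_c_hz, d_l_deg):
    """Largest total delay whose PM loss 360 * f_C * T does not exceed d_l_deg."""
    return d_l_deg / (360.0 * f_c_hz)


if __name__ == "__main__":
    zeta, f_n = 0.95, 2000.0
    f_c = 2.0 * zeta * f_n                  # approximation f_C ~ 2 zeta f_n
    f_p, M, t_i = 100e6, 64, 1e-6           # hypothetical rates and FPGA latency
    loss = latency_pm_loss_deg(f_c, 1.0 / f_p, M, t_i)
    print(f"PM loss = {loss:.2f} deg, budget for d_L = 3 deg: {max_total_delay(f_c, 3.0) * 1e6:.2f} us")
```

Note that the decimation filter's delay grows with M, so a larger decimation ratio (a lower $f_u$) trades a smaller loop-filter clock rate against both the ZOH loss and this latency loss.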
Procedure for Designing a Hybrid PLL
We are now in a position to outline a step-by-step procedure for designing a hybrid PLL; the procedure is shown in Fig. 23. The steps in Fig. 5 and Fig. 23 can easily be performed by a computer program, and the resulting values can be directly incorporated into the design of the ASIC or FPGA. A schematic of the digital logic inside the FPGA or ASIC is shown in Fig. 24. Some explanation may be in order regarding the addition of $w_{CEN}$ to the output of the loop filter: the PLL provides only the correction to the DDS tuning word that is needed to maintain lock, whereas the tuning word $w_{CEN}$ corresponds to the center frequency of the DDS (i.e., the center frequency of the VCO in the DDS equivalent model of Fig. 9). For a detailed example of the application of the design procedure to the design of an actual receiver, the reader is encouraged to consult [6 Chap. 9]. Finally, further analysis can be found in [6] and [7], which may be obtained by contacting the author.
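Since the procedure lends itself to a computer program, one possible chaining of the calculations in this paper is sketched below. It is a hypothetical illustration, not a transcription of Fig. 23: it assumes the high-gain second-order open loop $(2\zeta\omega_n s + \omega_n^2)/s^2$, the approximation $f_C \approx 2\zeta f_n$ when sizing $f_u$, a power-of-two decimation ratio M (so that 2M suits the IAD modules), and illustrative numbers throughout. In the Fig. 24 datapath the word sent to the DDS would then be $w_{CEN}$ plus the loop-filter output.

```python
import math


def design_hybrid_pll(f_n, zeta, f_p, d_a_deg, t_i, f_d, n_bits, f_center):
    """Chain the sizing rules of this paper under the stated (hypothetical) assumptions."""
    # 1. Minimum DDS update rate from the ZOH phase loss 180 * f_C / f_u <= d_A.
    f_u_min = 180.0 * (2.0 * zeta * f_n) / d_a_deg
    if f_p < f_u_min:
        raise ValueError("phase detector rate is below the required update rate")
    # 2. Largest power-of-two decimation ratio M that still gives f_u = f_p / M >= f_u_min.
    M = 1
    while f_p / (2 * M) >= f_u_min:
        M *= 2
    f_u = f_p / M
    # 3. Phase-margin budget at the exact crossover of G(s) = (2 zeta wn s + wn^2) / s^2.
    f_c = f_n * math.sqrt(2.0 * zeta ** 2 + math.sqrt(4.0 * zeta ** 4 + 1.0))
    pm0 = math.degrees(math.atan(2.0 * zeta * f_c / f_n))
    dds_loss = 180.0 * f_c / f_u
    latency_loss = 360.0 * f_c * (0.5 * (2 * M - 1) / f_p + t_i)
    pm = pm0 - dds_loss - latency_loss
    # 4. Center-frequency tuning word, to which the loop-filter output is added.
    w_cen = round(f_center * (1 << n_bits) / f_d)
    return {"M": M, "f_u": f_u, "pm_deg": pm, "zeta_eff": 0.01 * pm, "w_cen": w_cen}


if __name__ == "__main__":
    result = design_hybrid_pll(f_n=2000.0, zeta=0.95, f_p=100e6, d_a_deg=3.0,
                               t_i=1e-6, f_d=180e6, n_bits=32, f_center=70e6)
    for key, value in result.items():
        print(key, value)
```

Choosing the largest admissible M minimizes the loop-filter clock rate but maximizes the decimation-filter delay; a designer could instead pick a smaller M when the latency budget is tight.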
