Bang-bang phase-locked loops (BBPLLs) are inherently nonlinear due to the hard nonlinearity introduced by the binary phase detector (BPD). This paper provides an exact statistical analysis of the steady-state timing jitter in a firstorder BBPLL when the reference clock is subject to accumulative jitter. By elaborating on the analogy of viewing a first-order BBPLL as a single-integration delta modulator (DM) in the phase domain, we are able to relate hunting jitter and slew-rate limiting in a BBPLL to granular noise and slope overload in a DM. The stochastic timing-jitter behavior is modeled as a signdependent random walk, for which we obtain the asymptotic characteristic function and analytical expressions for the first four cumulants. These expressions are applied to the BBPLL to statistically analyze the static timing offset and the RMS timing jitter, including the effect of a frequency offset. The analysis shows that the RMS timing jitter is constant for small RMS clock jitter and grows quadratically with large RMS clock jitter, and that there exists an optimal bang-bang phase step for minimum RMS timing jitter. Computing the kurtosis reveals the effect of the BPD nonlinearity: the timing jitter is largely non-Gaussian.
I. INTRODUCTION
B ANG-BANG phase-locked loops (BBPLLs) have been widely used for clock and data recovery (CDR) in serial data links [1] , primarily due to their high-speed capabilities and inherent sampling phase alignment [2] . A typical implementation based on the charge-pump (CP) architecture is shown in Fig. 1(a) . The distinct feature of BBPLLs is the binary phase detector (BPD) which binary quantizes the phase difference between input data and voltage-controlled oscillator (VCO) clock, generating only early/late phase-error information for the loop filter (LF). To suppress patterndependent jitter in a CDR application, the BPD is usually a tristate realization such as the Alexander topology [3] . BBPLLs have also been demonstrated for high-bandwidth digital frequency synthesis [4] , [5] . (DBBPLL) implementation is shown in Fig. 1(b) , which employs a D flip-flop as BPD [4] . Instead of the CP-based LF driving the VCO, a digital LF tunes the frequency of a digitally controlled oscillator (DCO) by a digital control word. Although the bang-bang loop principle has been successfully applied, a thorough understanding of the loop behavior is far from complete. The main reason is that the binary quantization introduced in the PD makes the loop inherently nonlinear, thus complicating its analysis. Traditionally, a wide class of PLLs can be linearized in steady state, allowing linear transfer functions to be applied in the analysis [6] . In BBPLLs, however, the hard nonlinearity introduced by the BPD causes limit cycles in steady state. In this case, a nonlinear analysis has been applied to derive conditions on the loop stability [7] , and to investigate slew-rate limiting when the reference clock is frequency-modulated [8] . In practice, phase noise on the clock sources causes jitter on the clock edges, mainly in the form of non-accumulative jitter (white phase noise) and accumulative jitter (random-walk phase noise) [9] , [10] . Since jitter effectively smoothes the binary PD characteristic [11] , it is common to linearize the nonlinear loop [12] and apply linear transfer functions in the analysis [13] . Unlike a linear PD, the gain in a linearized BPD depends on the timing jitter at its input. For non-accumulative jitter, initial investigations considered the BPD as a stand-alone device [14] , and only recently have the loop dynamics been taken into account to obtain a more accurate BPD gain expression [15] .
Despite its widespread use, linearizing the binary loop precludes a thorough understanding of the nonlinear phasejitter behavior. This is true particularly if clock jitter in the loop is small, in which case limit cycles and jitter interact strongly. The limits of a linear BBPLL analysis were pointed out in [13] , which motivates our nonlinear stochastic analysis.
A. Contribution of this Paper and Relation to Other Work
Recently, Markov theory has been applied to more accurately model the timing jitter in a first-order BBPLL [15] - [18] , an approach known from early investigations into digital PLLs [19] . For a reference clock with non-accumulative jitter, Da Dalt [15] modeled the timing jitter as a Markov chain and derived a general BPD gain expression. By modeling the loop as a delayed Markov chain, Chun and Kennedy [16] gave an extension to a DBBPLL with nonzero loop delay and evaluated the timing-jitter performance. For a reference clock with accumulative jitter, our approach in [17] was to model the timing jitter as a discrete-time Markov process. A numerical solution of the Chapman-Kolmogorov equation allowed us to compute the steady-state timing jitter probability density function (PDF) and reveal its non-Gaussianity. An extension to a DBBPLL with nonzero loop delay was given in [18] .
The aim of this paper is to provide an exact statistical analysis of the steady-state timing jitter in a first-order BBPLL when the reference clock is subject to accumulative jitter, thus analytically verifying our results in [17] and complementing previous work on the non-accumulative jitter case [15] , [16] .
The first contribution is to elaborate on the analogy between BBPLLs and delta modulation, which has appeared in the literature [1] , [20] , [21] but has never been fully exploited. In Sec. II a detailed comparison of the difference equations of both systems shows why and to what extent a first-order BBPLL can be viewed as a single-integration delta modulator (DM) in the phase domain. The analogy provides an intuitive description of the nonlinear loop behavior, and allows us to relate dither/hunting jitter and slew-rate limiting in a BBPLL to granular noise and slope overload in a DM.
The main contribution of this paper is the analysis of a sign-dependent random walk (SDRW) and its application to a first-order BBPLL. In Sec. III we formally define the SDRW as a RW whose step distribution depends on the sign of the RW's current position; a similar model was considered in [22] , [23] . It will be shown that the SDRW is a suitable model for the loop's statistical timing-jitter behavior. The analogy with a DM enables us to apply and generalize existing theory on delta modulation. In particular, extending Fine's results in [24] , we investigate the limiting behavior of the SDRW by deriving its asymptotic characteristic function, from which analytical expressions for the first four cumulants will be obtained. In Sec. IV the cumulant expressions will be applied to statistically analyze the timing jitter. We will show how varying the RMS clock jitter and the frequency offset influences the static timing offset; that the RMS timing jitter is constant for small RMS clock jitter and grows quadratically with large RMS clock jitter; and that there exists an optimal bang-bang phase step for minimum RMS timing jitter. Computing the kurtosis reveals that the timing jitter is largely non-Gaussian.
II. FIRST-ORDER BBPLL AS DM IN THE PHASE DOMAIN
This section elaborates on the analogy of viewing a firstorder BBPLL as a single-integration DM in the phase domain. We compare in detail the difference equations of both systems and use delta modulation terminology [25] - [27] to provide an intuitive description of the nonlinear BBPLL behavior.
A. Delta Modulation: Principle and Discrete-Time Model
A DM operates a 1-bit quantizer, a sampler and an integrator inside a feedback loop to provide a staircase approximation of an oversampled analog signal, as shown in Fig. 2 for a sinusoidal signal x(t) [25] . At every sampling period T s , the staircase signalx(t) that approximates x(t) is increased or decreased by the quantization step size δ. In this manner, a DM tracks an analog signal by changing the steps in the direction of the signal's slope, a behavior also exhibited by a BBPLL.
The performance of a DM is limited by two types of distortion [26] . Quantization distortion (granular noise) is caused by the granularity of the quantizer and occurs wheñ x(t) hunts around x(t). Slope overload distortion is due to the DM's limited tracking speed and occurs when the slope of x(t) exceeds the slope δ/T s ofx(t). For a given sampling period T s , a small δ will reduce granular noise but easily lead to slope overload; a large δ will allow the DM to track a fast varying signal but at the cost of increased granular noise. Minimizing both distortions results in a trade-off in selecting δ, a trade-off that also exists in a BBPLL.
Although implemented mostly with analog circuits and thus operated in continuous-time, a DM can be equally described at the sampling instants by the discrete-time model shown in Fig. 3 [27] . Given the sequence of input samples x n = x(nT s ) for n ≥ 0, taken from x(t) every T s seconds (oversampling), a DM generates the staircase approximation recursively bỹ
wherex n is the staircase value for nT s ≤ t < (n + 1)T s and the quantizer is modeled by the signum function, defined as sgn x = 1 for x ≥ 0 and sgn x = − 1 for x < 0.
The difference between the current input sample x n and the previous staircase approximationx n−1 is binary quantized into ±δ, and added tox n−1 to form the approximation for the next input sample. The accumulation of the quantizer output values is represented by the discrete-time integrator with transfer function z −1 /(1 − z −1 ), the additional delay being part of any sampled feedback loop. Viewing the valuex n−1 that is subtracted from x n as a predictionx n =x n−1 , we can define Discrete−time integrator the prediction error
which expresses the amount by which the input may be predicted exactly. Recursively, the sequence of predictions {x n } is produced bŷ
and the corresponding prediction error sequence {e n } by
Equation (4) illustrates the dependence of the prediction error on the discrete-time derivative 1 x n+1 − x n and thus on the slope of the analog signal x(t). In particular, slope overload is prevented if the no-overload condition [26] max
is satisfied, a condition that determines slew-rate limiting in the BBPLL.
B. Phase-Domain Model of a First-Order BBPLL
For a first-order BBPLL, both architectures in Fig. 1 may be represented by the block diagram in Fig. 4 , assuming a 100% data transition density for the CDR loop in Fig. 1 (a) [2] . We now rederive the difference equation governing the phase error in the phase domain [2] , [7] , following a derivation similar to the one in [19] . To emphasize its sampling nature, the BPD is represented as a sampler whose input signal (the reference clock) is sampled by the VCO clock. Since the loop is first order, the LF consists of a proportional path with gain coefficient K P . The VCO is modeled as a linear block, with nominal frequency f 0 and frequency gain K F .
The reference clock signal is a square wave of the form v r (t) = sgn[sin(ω 0 t + θ r (t))]
which alternates between +1 and −1. We assume that its frequency be equal to the nominal VCO frequency ω 0 = 2πf 0 (locked loop); any phase and frequency deviations will be incorporated into the excess phase θ r (t), such as random phase noise and an actual frequency offset between both clocks. The reference clock is sampled by the VCO clock; the sampling instants are the times of occurrence of the rising VCO clock edges. Denoting the time of the nth sampling instant by t n , the sample value taken at t n is the BPD output
where θ r,n = θ r (t n ) is the sampled reference clock phase. The BPD output is either +1 or −1, indicating whether the reference clock has been sampled late or early with respect to its nth rising clock edge; no attempt is made to measure the actual time deviation from the clock edge, a feature significantly different from a linear PD [1] . The BPD output drives the VCO through the LF gain K P , changing the VCO frequency so as to bring the sampling instants closer to the rising reference clock edges. Since the BPD values are binary, the VCO toggles between the two frequencies f 0 +f bb and f 0 −f bb which are set by the bang-bang frequency step f bb = K P K F . During the nth update period-the time between the consecutive sampling instants t n and t n+1 -the VCO operates at the frequency f 0 + f bb ε n , and so produces the nonuniform sampling instants
for n ≥ 0. In a practical application, the bang-bang frequency step f bb is much smaller than the nominal VCO frequency f 0 , in a CDR application typically around 0.1% [2] . Therefore, f bb /f 0 ≪ 1, and with the approximation 1/(1 + x) ∼ = 1 − x for x close to zero we can write (8) as
where θ bb = 2πf bb /f 0 is the bang-bang phase step of the loop. Writing (9) as a sum, assuming t 0 = 0, and plugging the result into (7) yields ε n = sgn sin θ r,n − θ bb
The term involving the sum is the VCO phase θ v,n during the nth update period. Writing it recursively as
shows that the BPD output ε n causes the VCO phase to ramp up or down by θ bb during every update period. Now define the phase error φ n as
so that ε n = sgn φ n for −π ≤ φ n ≤ π from (10). Thus, with (11) and (12) we obtain the difference equation which governs the phase-error evolution as a function of the reference clock phase samples. The BBPLL operation in the phase domain, as expressed by (11)- (13) , is summarized by the discrete-time model in Fig. 5 .
C. First-Order BBPLL as DM in the Phase Domain
Comparing the models in Fig. 3 and Fig. 5 shows that for the given approximation, a first-order BBPLL can be viewed as performing single-integration delta modulation in the phase domain. In a DM, the sequence {x n } produced by (3) forms a sequence of predictions for the input samples {x n }. Changes in the prediction occur in steps of the step size δ, and the introduced error is the prediction error e n . By analogy, in a first-order BBPLL, the sequence of VCO phases {θ v,n } produced by (11) forms a sequence of predictions for the reference clock phase samples {θ r,n }. Changes in the prediction occur in steps of the bang-bang phase step θ bb , and the phase error φ n is the equivalent of the prediction error e n in the DM. Furthermore, the sampling frequency f s in the DM corresponds to the nominal VCO frequency f 0 in the BBPLL. Since a practical DM contains a zero-order hold circuit in its feedback loop [25] , the staircase signal in Fig. 2 must be replaced by a signal with ramps, so that the behavior of both systems also correspond in continuous-time (compare Fig. 8 in [2] ).
Referring to [28] , Walker [2] provides an analogy between a second-order BBPLL and a sigma-delta modulator (SDM), showing that the proportional path performs first-order SD modulation on the frequency offset. The analogy with a DM has previously only been mentioned. Greshishchev et. al. [1] point out the similarity of a second-order binary PLL to a double-integration DM with prediction. In a slightly different context, an analogy with a DM has been established for the delay/phase-locked loop with BPD proposed in [20] . Interestingly, Muller and Leblebici [21] note that the formula for the onset of slew-rate limiting in a first-order BBPLL, which is known from [2] , corresponds to the slope overload condition in a DM, which we will explicitly show below.
Although a SDM and a DM are intimately related [29] , the argument to interpret the BBPLL in favor of a DM is the tracking behavior that is fundamental to both systems. Clearly, tracking performed by a DM in the voltage domain corresponds to tracking performed by a BBPLL in the phase domain. This implies that the distortions due to quantization described in Fig. 2 also occur in a BBPLL. Here, granular noise is commonly called self-generated hunting jitter [2] , [21] or dither jitter 2 [14] , [30] , referring to the VCO phase hunting or dithering around the reference clock phase. Slope overload refers to slewing or slew-rate limiting [20] , [31] , usually in the context of tracking a modulated reference clock. From the analogy with a DM, slewing in a first-order BBPLL is prevented if the derivative of θ r (t) satisfies condition (5):
As an example, consider a sinusoidally phase-modulated reference clock with excess phase θ r (t) = A sin(2πf m t), where A is the modulation amplitude and f m is the modulation frequency. Using (14) , slewing is prevented if A < f bb /f m , which is the formula given in [2] , [8] , [21] . Similar to the trade-off in selecting the step size δ in a DM, there is a tradeoff in selecting the bang-bang phase step θ bb . A small θ bb will reduce hunting jitter, but it will also restrict the maximum modulation frequency in order to avoid additional jitter from slope overload; a large θ bb will allow a higher modulation frequency, but it will incur increased hunting jitter due to the coarser phase updates. In Sec. IV-D we will see that this tradeoff also exists in a statistical sense-in choosing the optimal phase step to obtain the minimum RMS timing jitter.
D. Stochastic Difference Equation
Having elaborated on the analogy between a BBPLL and a DM, we now consider the stochastic difference equation that describes the stochastic phase-jitter process in the presence of accumulative reference clock jitter, complementing previous work on the non-accumulative jitter case [15] , [16] . As in these references we assume ideal PLL building blocks, but we also consider a frequency offset ∆f between reference clock and VCO clock, which almost always occurs in practice [6] . Hence, the excess phase of the reference clock is θ r (t) = ∆ωt+ω 0 α(t), where ∆ω = 2π∆f . The phase-noise term α(t) is a (nonstationary) Wiener process with linearly increasing variance, which gives rise to the Gaussian accumulative jitter [10] . Sampling θ r (t) at the nonuniform time instants yields
where θ r,n = θ r (t n ) as before. Although α(t) has stationary increments [10] , the nonuniform sampling instants cause the increment process α(t n+1 ) − α(t n ) to be nonstationary. But since the deviations from the uniform sampling instants are small, we can use the uniform sampling approximation t n+1 = t n + T 0 [2] to approximate the increment process by
where the jitter random variable (RV) ξ n is Gaussian distributed with mean zero and variance σ 2 (the standard deviation σ will be called RMS clock jitter) and the sequence {ξ n } is independent, identically distributed (IID). Replacing the increment process in (15) by ξ n , substituting (9) for t n+1 − t n and plugging the result into (13) gives the stochastic difference equation describing the phase jitter:
Here θ ′ bb = (1 + ∆f /f 0 )θ bb is the modified bang-bang phase step; neglected in [2] , [7] , it accounts for the phase update due to the frequency offset ∆f and was considered by Hsieh and Yang [32] (see also [19] for the zero-crossing DPLL). Nonetheless, since ∆f is typically much smaller than f 0 [6] we assume θ ′ bb ∼ = θ bb in the following.
To be consistent with the aforementioned literature, the remainder of this paper treats the equivalent timing-jitter model. Since the timing jitter at the nth sampling instant is ∆t n = φ n /ω 0 , we obtain from (16) the stochastic difference equation
with ∆T = ∆f /f 2 0 being the period deviation. The period quantization of the VCO clock is K = K P K T (loop gain), where the VCO period gain [9] . For convenience we use the terminology introduced for the phasedomain model and call ∆T the frequency offset and K the bang-bang phase step.
Given the stability condition |∆T | < K [7] , we assume without loss of generality that ∆T is positive and irrational; the latter assumption is justified because the natural tolerances in an implementation make the exact frequency of the reference clock and the free-running VCO clock unknown, and with probability 1 they will be irrational [6] , [7] . For simplicity we take ∆T ∈ [0, K) and let ∆T = 0 denote the case when both clock frequencies are practically identical, i.e., when ∆T is a small irrational number.
Given the stochastic difference equation (17), the statistical analysis of the steady-state timing jitter will be based on the SDRW theory developed in the following section.
III. SIGN-DEPENDENT RANDOM WALK THEORY
The SDRW studied in this section is defined as a RW whose transition probability (or step distribution) depends on the sign of the RW's current position. To investigate the asymptotic behavior of the SDRW, we derive its limiting characteristic function (CF), from which analytical expressions for the first four cumulants will be obtained.
A. Sign-Dependent Random Walk Model
First, we recall the definition of a RW [33] . Let ξ, ξ 1 , ξ 2 , . . . be a sequence of IID RVs with distribution function F . Let S 0 = 0, and consider the nth partial sum of the RVs, recursively defined as
for n ≥ 0. The sequence of RVs {S n } is called a RW starting at the origin, and ξ is called the step RV (or step) with the step distribution F . Now let ξ ± , ξ ± 1 , ξ ± 2 , . . . be two sequences of IID RVs with respective distribution functions F ± . Let U 0 = 0, and define in accordance with (18) a sequence recursively as
for n ≥ 0. The sequence of RVs {U n } is called a signdependent RW starting at the origin because the RV realizing the next step depends on the sign 3 of U n . Accordingly, ξ + and ξ − are called the positive and negative step RVs and correspond to the steps taken in the positive and negative halfline, respectively. Throughout this section we assume that the distribution functions F ± (i) are continuous, and satisfy 0 < F ± (0) < 1 so that the steps taken in either half-line may be both positive and negative; (ii) possess moments of sufficiently high order so that the cumulants derived later exist; (iii) have mean values satisfying µ + < 0 and µ − > 0 (drift conditions) so that the SDRW exhibits a convergent limiting behavior.
Although no proof for the statement in (iii) is given, the assumed drift conditions may be intuitively justified as follows. A RW has the property (see Theorem 1 in Appendix A) that if the mean of the step ξ is positive (µ > 0), the RW will drift to ∞ with probability 1. Similarly, if the mean of the step is negative (µ < 0), the RW will drift to −∞ with probability 1. This behavior implies for the SDRW that if µ + < 0 and µ − > 0, the SDRW will drift to −∞ when in the positive halfline, and to ∞ when in the negative half-line, so that it will eventually fluctuate around the origin, exhibiting a convergent limiting behavior.
We mention that two similar models emphasizing the sign dependency have recently been treated in the literature. Carlsund [22] considers a random walk on the integers with signdependent transition probabilities. In a similar work [23] , Lefebvre treats a Wiener process with infinitesimal parameters (mean and variance) depending on the sign of the process, a model he calls an asymmetric or sign-dependent Wiener process. Both models are different from the SDRW defined above, and only the first-passage time problem is studied.
Finally, to place the BBPLL in the context of the SDRW, consider the timing-jitter model (17) and split the signum function according to whether ∆t n is nonnegative or negative:
Since the jitter RV ξ n has zero mean, the model (20) can be obtained from the SDRW model (19) by setting µ + = ∆T − K, µ − = ∆T + K and σ 2 ± = σ 2 , where the latter is the variance of the Gaussian reference clock jitter. Given these parameters, we may therefore view the timing-jitter process as performing a sign-dependent Gaussian random walk (SDGRW)-that is, a SDRW with Gaussian step distributions. This relation leads us to investigate the limiting behavior of the SDRW in the following, and apply the obtained results to the steady-state behavior of the timing jitter in Sec. IV.
B. Limiting CF
The limiting behavior of the SDRW is obtained from the convergence of the sequence of RVs {U n } to some limiting RV U as n → ∞. Given the various modes of convergence, we will consider convergence in distribution. Denoting the distribution of U n as F Un , the sequence {U n } is said to converge in distribution to U if the sequence of distributions {F Un } converges to some limiting distribution F U [33] . As in [24] , it is more convenient to investigate the convergence of the sequence of CFs {φ Un } to some limiting CF φ U , where the CF φ Un of the SDRW at epoch n is defined as
with z being a real variable and E denoting statistical expectation. (For notational simplicity, the argument z of a CF will be omitted throughout.) We now derive the limiting CF φ U by extending Fine's result for a DM [24] . In the following, RVs with subscript ′′ + ′′ ( ′′ − ′′ ) will refer to a RW with step RV ξ + (ξ − ). (For example, M + + is the maximum of a RW with step RV ξ + ). To obtain a recursion for φ Un , write (19) as
andǓ n = min{0, U n } is the negative part of U n with the CF
we have 
a recursion for φ Un . Because we are interested in the limiting behavior of the sequence {U n }, we let n → ∞ in (26) to obtain the limiting CF φ U . As shown in Appendix A, the limiting CF is given by
where
As in [24] , because φ U is a product of CFs, the limiting RV U can be decomposed into the sum of independent RVs
where M + + is distributed as the maximum of a RW with step RV ξ + , and M − − is distributed as the minimum of a RW with step RV ξ − (see Appendix A).
An illustrative interpretation of X can be given if ξ ± have identical distribution functions up to a shift of mean. Then, by writing φ ξ± = exp(izµ ± )φ ξ , where the step RV ξ has zero mean, the CF φ X in (28) becomes
Since the fraction is the CF of a uniform distribution [34, p.1880], we have the decomposition X = η + ξ, where η is uniformly distributed on [µ + , µ − ], so that finally
If, in addition, we assume µ + = −K and µ − = K where K > 0, then η is uniformly distributed on [−K, K] and we obtain Fine's (31) (after subtracting ξ) [24] . Besides providing insight into the limiting behavior of the SDRW, the decomposition into a sum of independent RVs simplifies the computation of the cumulants in the sequel.
C. Cumulants of the SDRW
Moments and cumulants describe the statistical properties of a RV or, equivalently, its distribution. Given a RV X with CF φ X , its kth moment EX k is defined as
and its kth cumulant c k,X as
where log φ X is the cumulant generating function 
and hence, by the cumulant definition (33) ,
Here, the last equality is due to the fact that E((Ŝ + n ) k ) is the kth moment ofŜ + n ({S + n } being a RW with step RV ξ + ) whose distribution is concentrated on [0, ∞). Similarly, from (72) the cumulant generating function of the minimum
from which we find the cumulants
Here, the last equality is due to the fact that E((Š − n ) k ) is the kth moment ofŠ − n ({S − n } being a RW with step RV ξ − ) whose distribution is concentrated on (−∞, 0].
The cumulants c k,X will be obtained from the corresponding moments EX k . In particular, evaluating the kth derivative of φ X in (28) at zero and using the moment definition (32), we obtain
Note that the first k moments of X exist if the first k + 1 moments of ξ ± exist (see also assumption (ii)). Inserting (39) into the moment-cumulant transformations [34, p.371] , and the result into (34), gives the following expressions for the first four cumulants of the SDRW:
• First cumulant (mean):
• Second cumulant (variance):
• Third cumulant (third central moment):
• Fourth cumulant:
Equations (40)-(43) express the first four cumulants in terms of the moments of the step RVs ξ ± and the cumulants of the maximum and minimum RVs M + + and M − − , which are given by (36) and (38).
D. Cumulants of the SDGRW
In general, given arbitrary step distributions F ± , it appears difficult to further simplify (40)-(43) because the n-fold convolution F ± n (x) = P (ξ ± 1 + . . . + ξ ± n < x) in (36) and (38) may not be expressible in simple terms. For Gaussian step distributions, however, the convolution is easily performed, leading to the explicit cumulant expressions derived in the following. The Gaussianity of the step distributions also suggests the name SDGRW introduced above, in analogy with the widely used Gaussian random walk [33] .
To derive the cumulants of the SDGRW, let ξ 1 , ξ 2 , . . . be a sequence of IID RVs with Gaussian distribution function F . Given the mean µ and the variance σ 2 , the n-fold convolution F n (x) = P (ξ 1 + . . . + ξ n < x) is Gaussian with mean nµ and variance nσ 2 , and so the integrals in (36) and (38) can be evaluated explicitly. Since F has the density
we have from (36) that
and from (38), after the change of variable x → −x, that
and
with erfc being the complementary error function. The Gaussian distribution has the property that higher-order moments can be expressed in terms of the mean µ and the variance σ 2 since the latter two uniquely describe the distribution. In particular, the first five moments of the Gaussian distributed RV ξ are given by Eξ 2 = µ 2 + σ 2 , Eξ 3 = µ 3 + 3µσ 2 , Eξ 4 = µ 4 +6µ 2 σ 2 +3σ 4 and Eξ 5 = µ 5 +10µ 3 σ 2 +15µσ 4 [34, p.713]. Substitution of these moments together with (45)-(47) into (40)-(43) yields the following expressions for the first four cumulants of the SDGRW:
These expressions generalize those in [24] , [29] , [35] .
IV. STATISTICAL ANALYSIS OF FIRST-ORDER BBPLL USING SIGN-DEPENDENT RANDOM WALK THEORY
The SDRW theory developed in the previous section will now be applied to statistically analyze the steady-state timing jitter in a first-order BBPLL. As mentioned above, the timingjitter model (20) can be recovered from the SDRW model (19) by assuming the step RVs ξ ± to be Gaussian distributed with means µ + = ∆T − K and µ − = ∆T + K and equal variances σ 2 ± = σ 2 . Assuming ideal PLL blocks, the timing-jitter process may therefore be viewed as a SDGRW, and the requirements µ + < 0 and µ − > 0 for the SDGRW to exhibit a convergent limiting behavior agree with the condition |∆T | < K for the loop to be stable. Hence, we can immediately apply the cumulant expressions (53)-(56) in the analysis. In particular, the first cumulant (mean) and the second cumulant (variance) correspond to the static timing offset and the RMS timing jitter, respectively, and quantify the timing-jitter performance; the fourth cumulant is used in the definition of the kurtosis and quantifies the non-Gaussianity of the timing-jitter PDF.
The analytical cumulant expressions will be compared against Monte-Carlo simulation of (17), verifying also our numerical results in [17] . For the each simulation, 10 7 realizations of length 100 were generated and the statistics (histogram and cumulants) were computed for the last time instant. The mean, variance and kurtosis were computed using the corresponding MATLAB built-in functions. To illustrate the following discussion, Fig. 6 plots several timing-jitter PDFs obtained from the numerical method described in [17] .
A. Decomposition of the Steady-State Timing Jitter ∆t
We begin the analysis by recalling that (31) decomposes the limiting behavior of the SDRW into a sum of statistically independent contributions. Applying this decomposition to the steady-state timing jitter provides valuable insight into its nonlinear statistical behavior. More specifically, the steadystate timing jitter ∆t can be decomposed as
where the RV η is uniformly distributed on [∆T −K, ∆T +K] and the jitter RV ξ is Gaussian distributed with mean zero and variance σ 2 . The RV o is distributed (in the sense of equality in distribution) as the sum of the maximum and minimum of a RW as discussed above. Similar to the interpretation for a DM given in [24] and [29] , the decomposition (57) suggests interpreting the timing jitter in terms of the Gaussian reference clock jitter and additionally introduced static and dynamic jitter components, as further discussed in Sec. II-C. In particular, due to its uniform distribution we refer to the RV η as self-generated hunting jitter (static component), a term which was used by Walker [2] but without giving a precise definition. Hunting jitter is introduced by the coarseness of the binary PD characteristic and is characterized by the VCO phase hunting randomly around the jittered reference clock phase. In line with the term overload noise used in DM theory [25] , we refer to the RV o as overload jitter (dynamic component). Overload jitter is introduced by the inability of the VCO phase to track the reference clock phase faster than in steps of K, and is characterized by a sequence of equal VCO phase updates. Note also that the hunting jitter η does not depend on ξ, whereas the overload jitter o does depend on ξ (via the RW maximum and minimum, both of which depend on ξ). This implies that the timing jitter ∆t will be dominated by hunting jitter for small σ and by overload jitter for large σ. Moreover, η being independent of ξ also means that ∆t = η for σ = 0.
In other words, for a jitter-free reference clock, the timing jitter is uniformly distributed on [∆T − K, ∆T + K]. In the following we will repeatedly refer to the decomposition (57) to attribute observed statistical properties to the underlying RVs.
B. Static Timing Offset ∆t stat
A PLL synchronizes, in both phase and frequency, the feedback clock with the reference clock by bringing their clock edges in close alignment. The average time (phase) difference between the two clocks in the locked state is the static timing offset (static phase error). A static timing offset is a significant problem in a BBPLL-based CDR circuit because it results in the incoming data no longer being sampled at the center of the data eye, thus increasing the bit error rate [3] .
In our first-order loop, a static timing offset is caused by a frequency difference ∆T > 0 between the reference clock and the VCO clock; the LF needs to have a nonzero output to tune the VCO to its correct average frequency. When an analog LF is used, a static timing offset may be caused by a mismatch between the two charge-pump currents, which can be considered as an additional frequency offset. Moreover, the presence of reference clock jitter causes any static timing offset to increase from its zero-jitter level [6] . We define the static timing offset ∆t stat as the statistical expectation of the timing jitter in steady state, for which we obtain from (53) the expression
where G 1 is given by (48) and (49). Figure 7 plots ∆t stat of (59) as a function of ∆T with parameter σ. The agreement between theory and simulation verifies the obtained analytical expression.
For ∆T = 0, the static timing offset is zero independent of σ; the VCO clock samples the reference clock equally likely before and after the reference clock edges. The corresponding timing-jitter PDFs (0; σ) in Fig. 6 are symmetric about zero and thus have zero mean.
For ∆T > 0 and σ = 0, the static timing offset equals the frequency offset, as shown by the dashed line ∆t stat = ∆T ; a larger frequency offset requires a larger average BPD output to tune the VCO to its correct average frequency. The result for ∆t stat also follows from (58) since the mean of a RV uniformly distributed on [∆T − K, ∆T + K] is ∆T .
For ∆T > 0 and σ > 0, the combined effect of frequency offset and clock jitter causes the static timing offset to increase from its jitter-free level, as seen by the curves exceeding the dashed line. For small-enough σ, though, the static timing offset does not increase notably, even for a moderate frequency offset; this may be attributed to the dominant hunting jitter in this case. Note that since K = 1 in the figure, the loop is stable for |∆T | < 1, which explains the increase of the curves as ∆T tends to 1.
C. RMS timing jitter σ ∆t
For the loop design, a quantity of main interest is the timingjitter variance σ 2 ∆t ; in practice, it is common to consider the standard deviation σ ∆t and speak of the RMS timing jitter. Design questions to be addressed are how much jitter is transferred from the reference clock, how much jitter is generated by the loop itself and what is the minimum RMS timing jitter. An answer to these questions is given by (54), from which we obtain the expression
where G 2 is given by (48) and (50). Although this formula is exact, its dependency on the parameters K, ∆T and σ via the infinite series G 2 complicates an intuitive explanation. An approximate formula providing a simple rule of thumb for the loop design can be derived by upper bounding the infinite series. As shown in Appendix B, the result is
A comparison of the exact expression (60) with the approximation (61) and the simulation results is shown in Fig. 8 . The agreement between theory and simulation verifies the obtained analytical expressions, and suggests the use of (61) for the following explanations.
The timing-jitter behavior in different regions of the plot can be interpreted using the decomposition (57). For small σ, the hunting jitter dominates so that σ ∆t is approximately constant. Indeed, as σ tends to zero, σ ∆t approaches the small-σ asymptote K/ √ 3 which corresponds to the standard deviation of the uniformly distributed RV in (58). Increasing σ causes σ ∆t to rise, since the effect of the Gaussian clock jitter and the resulting overload jitter becomes gradually apparent. For large σ, overload jitter dominates, and σ ∆t shows a linear increase with σ on the logarithmic scale. Because the last term in (61) is dominant in this case, we obtain the largeσ asymptote which is shown by the dashed lines. Interestingly, σ ∆t increases proportional to the clock jitter variance σ 2 , with the proportionality factor depending on K and ∆T . In particular, zero frequency offset gives the asymptote σ 2 /( √ 2K) reported in [18] . Contrasting this asymptote with the asymptote σ for the non-accumulative jitter case [16] shows that the nonlinear loop reacts differently to different clock jitter. In either jitter case, however, the RMS timing jitter σ ∆t is independent of the loop delay in this large-σ regime (compare Fig. 7 in [16] with Fig. 6 in [18] ). In other words, even in the presence of a loop delay, the SDRW model does capture the stochastic timing jitter dynamics in this regime, and the timing jitter performance can be predicted using (63).
D. Optimal BB Phase Step for Minimum RMS Timing Jitter
In Fig. 8 the RMS timing jitter σ ∆t was plotted as function of σ for a normalized bang-bang phase step (K = 1). In practice, the RMS clock jitter σ is given, and the designer has to choose K such that the loop fulfills the timing-jitter specifications, possibly achieving minimal jitter. Formula (61) demonstrates the limits to the minimum RMS timing jitter: for any fixed σ and ∆T , there is a fundamental trade-off between hunting jitter and overload jitter. Clearly, a small K will yield small hunting jitter but large overload jitter, while a large K will yield small overload jitter but large hunting jitter.
This trade-off is illustrated in Fig. 9 , which plots σ ∆t of (60) as a function of K for zero and nonzero frequency offset. (Recall from Sec. II-D that a zero frequency offset means a small irrational number close to zero). It can be seen that for any fixed σ and ∆T , there exists an optimal bang-bang phase step K opt that gives the minimum RMS timing jitter σ ∆t,min (symbol markers); the minima were obtained by numerically finding the minimum of (60). Predominance of hunting jitter and overload jitter can be ascribed to different regions of the plot. For K > K opt , the timing jitter is dominated by hunting jitter, the fundamental lower bound of which is K/ √ 3 (dashed asymptote). For K < K opt , the timing jitter is dominated by overload jitter and increases rapidly as K approaches the stability boundary at ∆T (vertical dashed lines), obtained from the stability condition |∆T | < K. The figure also shows that Fig. 9 . Trade-off between hunting jitter and overload jitter leads to an optimal bang-bang phase step for minimum RMS timing jitter (symbol markers).
for small σ, the minima are more sensitive to changes in K than for large σ; this may be crucial in a DBBPLL where changes in K due to quantization of the DLF gain K P can degrade the timing-jitter performance. Notice finally that the minima become independent of ∆T with increasing σ, in which case σ ∆t,min is limited only by the clock jitter and not by a frequency offset.
To further illustrate the ultimate limits to the timing-jitter performance, we can derive an approximate closed-form formula for K opt which allows us to analytically predict σ ∆t,min . In particular, setting the derivative of (61) with respect to K to zero yields gives an analytical prediction for σ ∆t,min , which is not explicitly expressed here but which is plotted in Fig. 10 as a function of σ. The numerical (exact) results were again obtained by numerically solving for the minimum of (60), and show good agreement with the theory. The figure illustrates that as σ becomes smaller, σ ∆t,min attains the value ∆T / √ 3 of the hunting jitter (dashed asymptotes). This value can be derived by setting σ = 0 in (64) and plugging the resulting optimal step size K opt = ∆T into the hunting jitter formula K/ √ 3. This result also shows that, as a function of σ, the ultimate lower limit on the minimum RMS timing jitter can be obtained from the case ∆T = 0. Setting the derivative of (62) with respect to K to zero gives K opt ≈ 1.107σ, which means that for zero frequency offset, the optimal bang-bang phase step is approximately equal to the RMS clock jitter. Substituting this result into (62) gives the lower limit σ ∆t,min ≈ 1.348σ, which is just the line ∆T = 0 in the figure.
E. Kurtosis of ∆t
To provide further insight into the loop's nonlinear statistical behavior, we now compute the kurtosis of the timing jitter ∆t. The kurtosis measures the degree by which a PDF deviates from Gaussianity, and is positive for a pointy PDF and negative for a flat-topped PDF. Since the clock jitter is assumed to be Gaussian distributed, a nonzero kurtosis of ∆t means a non-Gaussian timing-jitter PDF, and thus reveals the influence of the BPD nonlinearity.
For the timing jitter, the kurtosis is defined as [34, p.1009 ]
where the fourth cumulant c 4,∆t is obtained from (56) as
given by (48) and (52). The kurtosis kurt ∆t as a function of σ is plotted in Fig. 11 , showing again good agreement with the simulation results.
For small σ, the kurtosis is negative and independent of ∆T ; this is due to the uniform-like (flat-topped) PDF caused by the dominant hunting jitter. Indeed, as σ tends to zero, the kurtosis tends to the value −1.2 for a uniform distribution [34, p.1881 ]. The slight dependence of the kurtosis on σ is due to the Gaussian jitter smoothing the edges of the uniform-like PDF, as shown by the PDF (0; 0.2) in Fig. 6 .
For large σ, intuition suggests that the large reference clock jitter will linearize the binary PD characteristic so that the loop behaves linearly. The positive kurtosis in this region proves the opposite: the loop dynamics is still nonlinear, making the timing-jitter PDF non-Gaussian (see the PDF (0; 3) in Fig. 6 ). Contrast this with the non-accumulative jitter case in [15] , [16] , for which it can be shown that the timing-jitter PDF is indeed Gaussian in this regime, leading to a linear 4 loop behavior.
Since the timing-jitter PDF is symmetric for ∆T = 0, it also follows from the figure that for one particular value σ * , the timing-jitter PDF will in fact be Gaussian, namely for the value of σ giving zero kurtosis (the zero crossing of the corresponding curve). In this case, the loop behaves effectively linearly, the Gaussian clock jitter with RMS σ * being transformed into the Gaussian timing jitter with RMS σ * ∆t . For ∆T = 0, finding the zero of (66) numerically and 4 Recall that a linear system transforms a Gaussian PDF into another Gaussian PDF. Fig. 11 . Kurtosis of the timing jitter, kurt ∆t , versus RMS clock jitter σ for K = 1.
plugging the result into (60) gives σ * ≈ 0.83K and the RMS timing jitter σ * ∆t ≈ 1.02K. As a rule, the timing-jitter PDF is well approximated by a Gaussian PDF for σ close to σ * , but it differs significantly for all other values of σ.
V. CONCLUSIONS
Viewing a first-order BBPLL as a single-integration DM in the phase domain allows us to explain the bang-bang loop behavior using existing delta-modulation terminology and theory, linking hunting jitter and slew-rate limiting in a BBPLL to granular noise and slope overload in a DM.
When the reference clock is contaminated by accumulative jitter, the stochastic evolution of the timing jitter can be modeled as a SDRW. Analytical expressions for the first four cumulants of the SDRW have been obtained and applied to the BBPLL to statistically quantify the steady-state timing jitter. The main results can be summarized as follows:
• The steady-state timing jitter can be decomposed into three statistically independent components: accumulative jitter of the reference clock, hunting jitter and overload jitter. The latter two are static and dynamic components that are due to the binary phase-error quantization and that also occur in a DM [24] . • The static timing offset described by (59) equals the frequency offset for a jitter-free reference clock. Although the presence of clock jitter causes the timing offset to exceed its jitter-free value, the increased timing offset stays within limits, even for a moderate frequency offset, when the clock jitter is not too large; this behavior is due to the self-generated hunting jitter. • Formula (61) explains intuitively how the RMS timing jitter depends on the bang-bang phase step, the RMS clock jitter and the frequency offset. For a fixed bangbang phase step, the RMS timing jitter is constant for small RMS clock jitter and grows quadratically with large RMS clock jitter. For fixed RMS clock jitter, the hunting jitter is proportional to the bang-bang phase step, while the overload jitter is inversely proportional to it-a behavior also displayed by a DM [24] .
• The opposing dependence of hunting jitter and overload jitter on the bang-bang phase step entails a trade-off in choosing the optimal phase step for minimum RMS timing jitter. For zero frequency offset, the optimal phase step is approximately equal to the RMS clock jitter. • Computing the kurtosis has revealed the effect of the BPD nonlinearity: the timing-jitter PDF is Gaussian-like for σ close to some value σ * (for which it is in fact Gaussian), but it is distinctively non-Gaussian for all other σ values. The advantage of having treated the SDRW in the general form (19) is that non-ideal factors can be considered in the analysis. Mismatches between the two charge-pump currents or between the two output levels of the BPD can be accounted for by assuming appropriate mean values for the step RVs. Different reference clock jitter distributions can be taken into account by choosing appropriate step distributions. Flicker noise, which corresponds to accumulative jitter with correlation [9] , can be modeled by assuming each sequence of step RVs to be correlated, but separate cumulant expressions need to be derived in this case. Non-zero loop delay cannot be fully modeled by the SDRW, but as was pointed in Sec. IV-C, the model does give a good prediction of the RMS timing jitter when the RMS reference clock jitter is large. To conclude, although the obtained analytical expressions enable an exact statistical loop analysis, the SDRW model is limited in that it only applies to a first-order loop subject to accumulative reference clock jitter.
The analysis of the present paper and of [15] , [16] , [18] may be extended into several directions. First, since a practical oscillator exhibits both non-accumulative and accumulative jitter [9] , a more accurate loop analysis needs to consider the combined effect of both types of jitter, as well as the jitter from the VCO. Second, since BBPLLs are typically implemented as second-order loops [4] , [5] , the dynamics due to the LF integral path must be taken into account, particularly when the loop stability factor [2] is small. Further to this point, elaborating on the analogy between second-order BBPLLs and double-integration delta modulation with prediction [1] may turn out to be fruitful, in that existing delta-modulation theory [25] could also be applied to second-order loops. Despite the current restriction to first-order BBPLLs, the application of Markov theory has demonstrated a more accurate timing-jitter description by revealing statistical features that remain hidden from a linear analysis.
APPENDIX A DERIVATION OF THE CHARACTERISTIC FUNCTION (27)
The CF (27) derived in the following generalizes the CF given by Fine in [24] . Because the derivation largely follows his approach, we present only those parts that lead to the generalization.
We begin by recalling some facts about the RW defined in (18) [33] , [36] . To introduce the RVs used in the derivation, consider the sequence of points (n, S n ) for n ≥ 1, and define the first ascending ladder point (T + , H + ) as the first term in this sequence for which S n ≥ 0. Then, the first ascending ladder epoch T + = min{n > 0 : S n ≥ 0} marks the epoch at which the RW first enters the non-negative half-line, and the first ascending ladder height H + = S T + marks the point of first entry. Similarly, define the first descending ladder epoch T − = min{n > 0 : S n < 0} and the first descending ladder height H − = S T − , with the analogous interpretation of entry into the negative half-line. Because either half-line does not need to be entered at all, the above defined RVs are possibly defective, with the defects defined as
Furthermore, let M + = max 0≤n<∞ S n be the (possibly infinite) maximum of the RW, and M − = min 0≤n<∞ S n the (possibly infinite) minimum. The following two types of RWs are of interest to us. 
Finally, two identities for the CF of the maximum M + and the minimum M − will be required. 
whereŜ n = max{0, S n }. Similarly, provided the minimum M − > −∞, we have
whereŠ n = min{0, S n }.
To begin the derivation, consider the recursion for the CF in (26) . Using φ Un+1 = φ + Un+1 + φ − Un+1 and assuming that φ U exists and that φ + Un → φ + U and φ − Un → φ − U as n → ∞, we obtain
This equation reduces to Fine's (20) if φ ξ+ = exp(−izK)φ ξ and φ ξ− = exp(izK)φ ξ , corresponding to the model (20) with ∆T = 0. Since (73) is a Wiener-Hopf equation, the general procedure for its solution is the product factorization of the factors 1 − φ ξ± . As in [24] , by applying the Wiener-Hopf decomposition [33, p.571] , these factors can be written as
where φ H + ± and φ H − ± are the CFs of the first ascending and descending ladder heights of a RW with steps ξ ± , respectively. Defining the complex functions
where X ± ≡ X ± (z), the CF φ U can be expressed as [24] 
where X + (0) = −iEH + − /p + + and X − (0) = −iEH − + /p − − , with the defects p + + and p − − defined in (67) and (68). For the denominator in (76), the latter two equations yield
(77)
Consider now a RW with step RV ξ + , for which µ + < 0. By Theorem 1, the RW drifts to −∞ and p + + EH − + = Eξ + = µ + from (69). Similarly, a RW with step RV ξ − , for which µ − > 0, drifts to ∞ and p − − EH + − = Eξ − = µ − from (70). Thus, we can write (77) as
For the numerator in (76), substitute the decomposition (74) into (75) to obtain
.
Finally, plugging (78) and (79) into (76) gives the limiting CF of the SDRW in (27) , which generalizes Fine's (28) [24] . The fact that the second and third term in (27) are the CFs of the RW maximum M + + and minimum M − − follows from Theorem 2.
APPENDIX B DERIVATION OF THE APPROXIMATION FOR THE RMS TIMING JITTER IN (61)
The idea to obtain the approximation (61) is to use an integral test. is finite, in which case the series is upper bounded by f (1)+L.
Since G 2 in (48) is of the form (80), we can upper bound the variance σ 2 ∆t in (60) by applying the integral test as follows:
∞ n=1 g 2 (n, x) < g 2 (1, x) + To show that g 2 (t, x) is monotonically decreasing in t on [1, ∞) for every x > 0, define y = n/2x > 0 and write (50) as g 2 (y) = y 2 + 1 2 erfc (y) − y √ π e −y 2 .
Differentiating this equation with respect to y (denoted by prime) and using erfc ′ (y) = −2 exp(−y 2 )/ √ π gives g ′ 2 (y) = 2y erfc (y) − y √ π e −y 2 .
The bound erfc (y) < y exp(−y 2 )/ √ π for y > 0 [34, p.562] implies that g ′ 2 (y) < 0 and thus that g 2 (t, x) is monotonically decreasing in t on (0, ∞) for every x > 0. Formal integration in (82) yields
(85) so that the variance (60) can be upper bounded by
(86) Simulations show that this upper bound is very tight; it is therefore convenient to simplify (86) further. The simple approximate formula for σ 2 ∆t is obtained by using the asymptotic expansion [34, p.562] 
Replacing the second erfc in (85) by the first two terms of this expansion shows that H 2 can be approximated by
Replacing H 2 in (86) by (88) and omitting erfc gives the approximation for the variance in (61).
