Abstract: Data-dependent jitter limits the bit-error rate (BER) performance of broadband communication systems and aggravates synchronization in phase- and delay-locked loops used for data recovery. A method for calculating the data-dependent jitter in broadband systems from the pulse response is discussed. The impact of jitter on conventional clock and data recovery circuits is studied in the time and frequency domain. The deterministic nature of data-dependent jitter suggests equalization techniques suitable for high-speed circuits. Two equalizer circuit implementations are presented. The first is a SiGe clock and data recovery circuit modified to incorporate a deterministic jitter equalizer. This circuit demonstrates the reduction of jitter in the recovered clock. The second circuit is a MOS implementation of a jitter equalizer with independent control of the rising and falling edge timing. This equalizer demonstrates improvement of the timing margins that achieve 10^-12 BER from 30 to 52 ps at 10 Gb/s.
I. INTRODUCTION
Timing jitter is a serious signal integrity issue in high-speed digital design. Bandwidth demands are driving circuit speeds above conventional package and transmission line bandwidths. Increasingly, the channel behavior must be considered and compensated appropriately to reach the highest information capacity. Significant attention has recently been dedicated toward pushing data rates over legacy FR-4 toward 10 Gb/s [1]-[3]. Noise considerations dictate the choice of equalization technique. While traditional choices for channel compensation manage the effect of intersymbol interference (ISI) on the data eye [4], this work focuses on the generation and compensation of data-dependent jitter (DDJ). Jitter is the timing deviation of the data transitions relative to a reference clock. It reduces the horizontal opening of the data eye and thereby increases the bit-error rate (BER).
Timing jitter is categorized into random jitter (RJ) and deterministic jitter (DJ). RJ results from the translation of random voltage noise into timing fluctuations due to buffering [5] or phase noise of the transmitter and receiver [6], [7]. On the other hand, DJ has distinct circuit origins and is correlated to limited bandwidth, signal reflection, duty cycle distortion, or power supply noise [8]. Depending on the source, DJ is classified into subcategories. Data-dependent jitter (DDJ) is a prominent form of DJ caused by a particular pattern in the transmitted data symbols. Insufficient bandwidth and signal reflection preserve memory of the previous data and affect future data transitions. Studies on various DJ properties are presented in [8]-[13]. To illustrate the impact of DDJ on the data eye, Fig. 1 demonstrates 10-Gb/s data transmitted over RG-58 cable. The cable bandwidth decreases with increasing length. After 60 inches of cable, the data suffers from both ISI and DDJ. The root mean square (rms) jitter is 3.70 ps after 60 inches and increases to 5.50 ps after transmission over an additional 24 inches. The additional length reduces the bandwidth of the link and increases the jitter. Qualitatively, the DDJ in the two eyes demonstrates a particular structure. In the first plot, the jitter has two dominant peaks. In the second plot, the spread consists of four distinct peaks. This paper reviews the generation of DDJ [9]-[11].
Ultimately, DJ degrades the BER of a serial link. BER requirements in modern high-speed serial links compel limiting the accumulation of jitter [14]. Furthermore, clock and data recovery (CDR) circuits, commonly implemented as phase-locked loops (PLLs), transfer jitter to the sampling clock [15]. The feedback mechanism of the CDR converts any DJ into RJ on the sampling clock and this sampling disturbance causes an additional BER penalty. The impact of RJ on charge-pump PLLs has been previously studied [16]. Jitter can be rejected by the dynamics of the feedback loop and several authors have addressed characterization and measurement of the PLL jitter [17]-[21]. Recent work on phase-locked loop circuits has demonstrated loop optimization for minimizing the clock jitter [22], [23]. At 10 Gb/s, CDR designs have focused on low jitter for SONET applications [24] and phase detector jitter generation [25].
Instead, this work offers a circuit technique to minimize DDJ. The behavior of DDJ for general pulse responses is reviewed in Section II and the analysis of DDJ suggests leveraging the deterministic timing to dynamically adjust the receiver response. We relate DDJ to BER in Section III. The impact of DDJ on phase-locked loops is discussed in Section IV. Our proposed deterministic jitter equalization technique is presented in Section V. In Section VI, two silicon implementations of the DDJ equalizers are demonstrated that operate at 10 Gb/s. The first is part of a charge-pump PLL that compensates the effect of DDJ and lowers the timing jitter of the recovered clock [26]. In the second implementation, independent control of the DJ on rising and falling edges of the data is demonstrated and the BER improvement is measured [27]. Testing and performance results for both integrated circuits are presented in Section VII.
II. ANALYSIS OF DATA-DEPENDENT JITTER
The response of a causal system with finite bandwidth to an NRZ sequence is determined not only by the current bit but also by the previous sequence of bits. Effectively, the system integrates the received signal and retains some memory of the previous bits. At each transition, the sequence of previous bits shifts the output amplitude and changes the relative time at which the signal crosses a decision threshold. This effect is illustrated in. This timing deviation is unique to a particular data sequence and is referred to as data-dependent jitter (DDJ). We separate the nomenclature for DDJ from ISI. Though the origins of DDJ and ISI are both related to the bandwidth and signal distortion, the effects on the data eye are independent, since ISI and DDJ pertain to voltage and timing margins, respectively, and occur at different times. It is possible to generate data eyes with no ISI and a large amount of DDJ and vice versa.
The aggregate response of the channel includes the transmitter and receiver and determines the timing deviation due to the previous data sequence. In the absence of noise, a received NRZ data signal, , comprises the convolution of data symbols, , and the received pulse response, , with pulse-width . The threshold crossing time, , is determined for arbitrary sequences of previous bits from (1) where is the decision voltage threshold. The implicit dependence of in the argument of the pulse response complicates the general solution of the DDJ behavior. In the following section, we summarize the first-order response for which (1) can be solved analytically [9] . Additionally, DDJ estimation for an arbitrary pulse response is demonstrated.
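The threshold-crossing condition described above can be explored numerically even when no closed form is available. The following is a minimal sketch, not the authors' procedure: the names `first_order_pulse`, `received_signal`, and `crossing_time` are hypothetical, the single-pole pulse is only a stand-in for a measured pulse response, and the 100-ps bit period, 5-GHz cutoff, and 0.5 threshold are assumed values. The received signal is built by superposing shifted pulse responses, and the crossing inside the current bit period is located by bisection.

```python
import math

def first_order_pulse(t, T, tau):
    """Response of a single-pole low-pass filter to a unit pulse of width T starting at t = 0."""
    def step(x):
        return 1.0 - math.exp(-x / tau) if x > 0.0 else 0.0
    return step(t) - step(t - T)

def received_signal(t, bits, pulse, T):
    """Superpose pulse responses. bits[-1] is the current bit (starts at t = 0),
    bits[-2] started at t = -T, and so on."""
    n = len(bits)
    return sum(b * pulse(t + (n - 1 - i) * T) for i, b in enumerate(bits))

def crossing_time(bits, pulse, T, v_th=0.5, iterations=80):
    """Bisection search for the threshold crossing inside the current bit period (0, T)."""
    f = lambda t: received_signal(t, bits, pulse, T) - v_th
    lo, hi = 0.0, T
    if f(lo) * f(hi) > 0.0:
        raise ValueError("no threshold crossing inside the bit period")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Assumed example values: 10 Gb/s (T = 100 ps) and a 5-GHz single-pole cutoff.
T = 100e-12
tau = 1.0 / (2.0 * math.pi * 5e9)
pulse = lambda t: first_order_pulse(t, T, tau)

t_late = crossing_time([0, 0, 1], pulse, T)   # 001: no transition before the current edge
t_early = crossing_time([1, 0, 1], pulse, T)  # 101: a transition one bit earlier
```

Replacing `pulse` with an interpolated measured pulse response would give the crossing times for an arbitrary channel under the same superposition assumption.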
A. Single Pole Response
Often, a first-order low-pass response characterizes the channel or amplifier. In this case, the response is described by (2) where is the cutoff frequency of the filter, . In [9] , we solve for the variation in the threshold crossing time as a function of the pole location, the voltage threshold, and the particular bit sequence. The threshold crossing time is (3) where is a parameter that relates the cutoff frequency and the bit rate, . Fig. 2 graphs the threshold crossing times in (3) for different bit sequences. Near the origin, the bandwidth of the system is wide and the threshold crossing times occur together. As increases, splits into different groups depending on the sequence. This self-similar splitting process continues until the bandwidth is constrained such that the response does not reach the voltage threshold in . Note that if is the current bit, the previous bits correspond to . To reduce the infinite number of in (3), the bit sequences that cause the significant variation of are determined. Notably, the exponential dependence on implies that the penultimate bit has the most significant impact on the threshold crossing time. Therefore, the sequences 010 and 101 arrive early and the sequences 001 and 110 arrive later. These two distinct sets of sequences approximate the DDJ in a first-order system at most practical bandwidths. In [9] , further reduction in bandwidth is shown to split each group of threshold crossing times into additional groups. This dissolution of the threshold crossing times from two into four groups is observed in Fig. 1 .
To develop an expression for the separation between the two dominant DDJ peaks, we calculate the mean for several data sequences using (3) . The DDJ is quantified by this separation and is referred to by .
where and are the average arrival times for the 101 (010) and 001 (110) sequences, respectively. This value is demonstrated qualitatively in Fig. 3 . 
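For a single-pole channel, the dependence of the crossing time on the bit history can be written out directly. The sketch below is an independent derivation under assumed notation, not a transcription of (3) or (4): bit levels are 0 and 1, `alpha = exp(-T/tau)`, the current edge is a rising edge, and the function names and the truncation depth are hypothetical. It averages the crossing times of the late (...001) and early (...101) groups over equiprobable older bits and returns their separation.

```python
import itertools
import math

def crossing_time_first_order(prev_bits, T, tau, v_th=0.5):
    """Crossing time of a rising edge in a single-pole channel.
    prev_bits[0] is the bit just before the current bit, prev_bits[1] the one
    before that, and so on. prev_bits[0] must be 0 for a rising edge."""
    alpha = math.exp(-T / tau)
    # Output voltage at the start of the current bit, set by the bit history.
    v0 = (1.0 - alpha) * sum(b * alpha**k for k, b in enumerate(prev_bits))
    return tau * math.log((1.0 - v0) / (1.0 - v_th))

def ddj_peak_separation(T, tau, v_th=0.5, depth=8):
    """Mean crossing time of the ...001 group minus that of the ...101 group,
    averaged over all equiprobable histories of `depth` older bits."""
    late, early = [], []
    for older in itertools.product((0, 1), repeat=depth):
        late.append(crossing_time_first_order((0, 0) + older, T, tau, v_th))
        early.append(crossing_time_first_order((0, 1) + older, T, tau, v_th))
    return sum(late) / len(late) - sum(early) / len(early)

# Assumed example values: 10 Gb/s with 5-GHz and 3-GHz single-pole cutoffs.
T = 100e-12
tau_wide = 1.0 / (2.0 * math.pi * 5e9)
tau_narrow = 1.0 / (2.0 * math.pi * 3e9)
sep_wide = ddj_peak_separation(T, tau_wide)
sep_narrow = ddj_peak_separation(T, tau_narrow)
```

Under these assumptions the separation grows as the bandwidth shrinks, which matches the qualitative trend described for Fig. 1 and Fig. 2.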
B. Arbitrary Response
For general channel responses, (1) cannot be solved explicitly for in a closed form solution. Linearization techniques such as the Taylor series expansion approximate the DDJ [9], [10]. A perturbation technique is presented in [11]. A first-order Taylor series is sufficient for smooth pulse responses. (5) where is the derivative of with respect to time. The estimated threshold crossing time, , is derived from the time the step response reaches . To simplify notation, we define and . The relationship between these coefficients and the pulse response is illustrated in Fig. 4. Substituting (5) into (1), the DDJ is (6) The waveform shape of determines the DDJ properties. Qualitatively, the denominator contains the slope and the numerator contains the pulse value. Equation (6) simplifies if we consider only the penultimate bit as for the first-order response. For 001 and 110 sequences, is zero since, by definition, . For the 101 and 010 sequences (7) This expression is useful for predicting the DDJ peaks when the penultimate bit has the strongest impact on the threshold crossing time. However, this assumption depends strongly on the channel. For instance, reflections might cause the dominant bit to occur several bits later. An involved study of the dominant bit is demonstrated in [11]. The following section describes the statistical impact of DDJ on clock and data recovery.
III. DDJ IMPACT ON DATA RECOVERY
Ultimately, concern about jitter is motivated by the BER of the system. Consequently, our deterministic model for the data transition timing is used to derive the statistical impact of DDJ on bit errors. Several different sources of jitter contribute to reducing the eye timing margins. Since DDJ is deterministic, it can be separated from other sources of jitter. In this paper, all data sequences are assumed equally probable and the data transitions jump approximately between the and .
When two peaks are dominant, the DDJ probability density function (pdf) consists of the double Dirac function as modeled in [8]-[13] (8)
When the impact of additional bits is significant, the pdf in (8) can be extended to additional discrete peaks using the threshold crossing times calculated in (3) or (6) . If all patterns are equiprobable, the magnitude of each peak is identical.
For the pdf in (8), the rms, , and peak-to-peak (pp), , values for the jitter are useful statistical metrics. is also called absolute jitter since it is compared to a reference [28]. Cycle-to-cycle jitter is another useful metric and can be calculated with Markov chain modeling [29]. The rms jitter is a measure of the expected eye opening that achieves a particular BER [8]. gives the absolute range for the transition arrival. Clearly, the is always smaller than the . From (8), the rms and peak-to-peak DDJ are (9) The eye closure is directly related to the amount of DDJ introduced in the communication link.
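As a worked check of the statistics of a double Dirac pdf, the following short derivation uses assumed notation that may differ from the symbols in (8) and (9): $\Delta t_c$ denotes the separation between the two peaks, the peaks are taken as equiprobable, and the pdf is centered on its mean.

```latex
\begin{align}
  f_{\mathrm{DDJ}}(t) &= \tfrac{1}{2}\,\delta\!\left(t - \tfrac{\Delta t_c}{2}\right)
                       + \tfrac{1}{2}\,\delta\!\left(t + \tfrac{\Delta t_c}{2}\right), \\
  \sigma_{\mathrm{DDJ}}^{2} &= \int_{-\infty}^{\infty} t^{2} f_{\mathrm{DDJ}}(t)\,dt
                       = \tfrac{1}{2}\left(\tfrac{\Delta t_c}{2}\right)^{2}
                       + \tfrac{1}{2}\left(\tfrac{\Delta t_c}{2}\right)^{2}
                       = \frac{\Delta t_c^{2}}{4}, \\
  \sigma_{\mathrm{DDJ}} &= \frac{\Delta t_c}{2}, \qquad
  t_{\mathrm{pp,DDJ}} = \Delta t_c .
\end{align}
```

Under these assumptions the rms value is half the peak-to-peak value, consistent with the statement that the rms jitter is always the smaller of the two.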
A. Jitter Statistics With Random Jitter
Uncorrelated sources of jitter (thermal noise, phase noise, shot noise, buffer noise) are associated with rms and peak-to-peak jitter statistics. These random jitter sources are often characterized with a Gaussian distribution. The Gaussian pdf has a clearly defined standard deviation but is unbounded. Therefore, peak-to-peak values also become unbounded. When RJ and DDJ are independent, the total jitter (TJ) pdf is the convolution of the RJ and DDJ pdfs (10) Therefore, the RJ is mapped onto each DDJ peak, as illustrated in Fig. 3. Now, the statistics for DDJ and RJ are compounded to increase the BER. For instance, the rms and peak-to-peak TJ are (11) The BER degradation due to DDJ is illustrated using a bathtub curve, shown in Fig. 5. Near the data transition, the BER is extremely sensitive to jitter. This plot consists of curves with and without DDJ. A sampling point that achieves 10 BER when no DDJ is present will be degraded by six orders of magnitude when DDJ is present. Experimental results describing the relationship between DDJ and BER are described in [13].
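One side of a bathtub curve can be sketched from the convolution described above. This is a simplified illustration under stated assumptions rather than the model behind Fig. 5: the DDJ pdf is taken as two equiprobable peaks separated by `ddj_sep`, the RJ is Gaussian with standard deviation `rj_rms`, and only one edge of the eye is considered with the transition density ignored. The function names are hypothetical.

```python
import math

def q_function(x):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tj_rms(rj_rms, ddj_sep):
    """rms of the total jitter for Gaussian RJ convolved with two equiprobable
    DDJ peaks separated by ddj_sep (variances of independent sources add)."""
    return math.hypot(rj_rms, ddj_sep / 2.0)

def edge_error_probability(t_s, rj_rms, ddj_sep):
    """Probability that a transition arrives later than the sampling instant t_s,
    measured from the mean transition time. Each DDJ peak carries its own
    Gaussian RJ distribution."""
    early_peak = q_function((t_s + ddj_sep / 2.0) / rj_rms)
    late_peak = q_function((t_s - ddj_sep / 2.0) / rj_rms)
    return 0.5 * (early_peak + late_peak)

# Assumed example values in picoseconds.
without_ddj = edge_error_probability(30.0, 4.75, 0.0)
with_ddj = edge_error_probability(30.0, 4.75, 26.0)
```

Sweeping `t_s` across the bit period and plotting both results on a logarithmic axis gives curves of the kind a bathtub plot shows, with the DDJ case sitting above the RJ-only case at every sampling time.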
B. Jitter Statistics With Voltage Threshold Offset
In real receivers, there exist voltage offsets and device mismatches that do not allow exact sampling at the voltage threshold. Presumably, the definition of DDJ in (3) and (6) is sensitive to the voltage threshold. In the presence of an offset, and will increase as seen in Fig. 3 . The spread of transition times at the threshold is larger when a voltage offset is present. This voltage threshold offset is duty-cycle distortion (DCD).
In Appendix A, we find the pdf for DDJ in the presence of an offset voltage threshold, . The voltage offset creates an additional source of jitter that is independent of DDJ. The standard deviation and peak-to-peak values of this DCD jitter are (12) Equations (12) and (9) allow the relative impact of DCD jitter and DDJ to be compared. If is given by (7), then the DCD jitter dominates if is much smaller than . A small voltage threshold offset can dominate the impact of DJ.
IV. JITTER IN CLOCK AND DATA RECOVERY CIRCUITS
CDR circuits lock the phase of the received data to a local clock with a PLL, as shown in Fig. 6. The phase deviations due to DDJ become random timing jitter of the output clock. Alternatively, CDR circuits are implemented with a delay-locked loop (DLL) when the frequency is known or is not needed. Both systems rely on phase-locking to the data transitions and are, therefore, sensitive to DDJ. The impact of DDJ on PLL performance depends on the feedback dynamics and the contribution of other jitter sources. To study the effect of DDJ on PLL/DLLs, we consider the behavior of DDJ in the frequency domain and derive the power spectral density (PSD) for the input jitter.
The jitter PSD for the DDJ is derived in Appendix B and is expressed as (13) The influence of this jitter PSD can be applied to the PLL response to determine the output jitter. The closed loop transfer function from the phase detector input to the PLL output is (14) where and for a charge-pump PLL [19]. These dynamics describe a low-pass filter. The transfer function in (14) and jitter PSD determine the PLL output jitter. For jitter at the input of the phase detector, the output jitter PSD is (15) To relate the jitter PSD to the output timing jitter, the Wiener-Khinchin theorem translates the jitter PSD to the time domain as described in [30]. (16) where represents the growth in the variance after an initial timing edge. Analytical results for (16) are possible if the loop dynamics are constrained. In the following equations, we assume that the loop is critically damped, i.e., the damping factor is unity. For other damping factors, the timing jitter can be solved numerically. For the contribution of DDJ, the variance of the PLL output timing jitter, , evolves according to (17) The output timing jitter due to DDJ is bounded and depends on the bandwidth of the loop and input DDJ (18) Typically, jitter specifications for a CDR circuit determine the acceptable cutoff frequency. Additionally, higher-order filters are used to attenuate the input jitter [22]. Nevertheless, reducing the input timing jitter is useful in many applications. To put this jitter into perspective, the phase noise of the VCO has also been shown to have a bounded variance in [19] and [28]. For a charge-pump PLL, the output jitter due to the white noise in the voltage controlled oscillator approaches (19) Consequently, the contribution of the DDJ dominates the phase noise of the local oscillator if (20) Often, careful VCO design implies that the right-hand side is small and this condition holds. On longer time scales, transmitter flicker noise dominates the jitter variance since this source of jitter is actually unbounded [28].
V. DATA-DEPENDENT JITTER EQUALIZATION
The relationship between the data sequence and DDJ suggests that the effect of the channel might be mitigated. Consequently, jitter penalties can be substantially reduced. In this section, we propose the implementation of deterministic jitter equalization (DJE) schemes for DDJ.
Removing the presence of DDJ involves minimization of (3) and (6) for any bit sequence. A simple solution to this problem for decision feedback equalization (DFE) is presented in [4] . The pulse response, in this case, is studied at the threshold crossing voltage. Eliminating DDJ implies setting the numerator of (6) to zero. Consequently, (21) where are the compensation coefficients. Assuming , we approach DDJ compensation by detecting transitions as opposed to particular bits. Therefore, a bit-wise operation can be performed on the previous bits to determine when transitions occurred. Using an exclusive-or (XOR) operator , transitions are detected and the response for complementary bit sequences is identical as discussed in Section II (i.e., 101 and 010 are compensated identically). We define a transition coefficient,
. Notice that and, additionally, implies that a transition occurs at the current bit. Applying this operation to (21) (22) By construction and the signal is compensated with the voltage values (23) This DDJ compensation scheme is reminiscent of DFE algorithms [4] except that the coefficients are calculated at the transitions instead of the center of the data eye. However, an important distinction of DJE is that the compensation can be applied in the time domain. This offers the advantage of just affecting the phase characteristics of the signal. Furthermore, time compensation is necessary in situations where the linearization of the signal near the threshold crossing is not valid. For DDJ described in (6) (24) Now the equalization scheme generates a time delay adjustment. Perfect compensation in this situation is impossible because we have one parameter to adjust for two unknowns, and . A useful approximation is derived when the slope decays rapidly (i.e., , for ) and the coefficient can be calculated as (25) A DJE architecture based on this construction is shown in Fig. 7 . This scheme generalizes the circuit discussed in [26] . Transitions are calculated through a cascade of bit-period delays. Since the delays occur after a decision is made, these delays can be implemented as digital gates. When a transition is detected, the compensation coefficient is added to other weighted transition detection signals and this value adjusts the receiver delay before the next transition reaches the threshold detector.
When and are not independent, as for the first-order response, the dominant time-delay compensation can be determined exactly from (3) (26) Voltage compensation is introduced with the logarithm because of the dependence on transition history. If only the penultimate bit is compensated, the compensation coefficient is in the time domain (27) This result will be used to demonstrate the eye improvement of DJE schemes. Finally, the detection of prior rising or falling edges allows individual compensation of the consecutive falling and rising edges, respectively. The illustration in Fig. 8 demonstrates how an AND gate detects either a rising or falling edge after the data value is decided. Parallel adjustments are provided and combined in a variable time delay element. This scheme is presented in [27] and is particularly useful when a nonlinear response introduces different DJ on the rising and falling edges.
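The single-tap case, in which only the penultimate bit is compensated, can be illustrated with a behavioral model. The sketch below is not the circuit of Fig. 7 and does not reproduce (27): it assumes a single-pole channel, a 0.5 threshold, a 3-GHz cutoff at 10 Gb/s, and a tap value derived here from the three-bit 001/101 crossing times. All function names are hypothetical. A transition one bit earlier is detected with an XOR of the two previous decisions, and the early-arriving edge is delayed by the tap value.

```python
import math
import random

def crossing_time(history, T, tau):
    """Threshold (0.5) crossing time of an edge in a single-pole channel.
    history[0] is the current bit and history[1:] are the earlier bits, newest first.
    Falling edges are mapped onto rising edges by complementing the history."""
    cur = history[0]
    prev = [b if cur == 1 else 1 - b for b in history[1:]]
    alpha = math.exp(-T / tau)
    v0 = (1.0 - alpha) * sum(b * alpha**k for k, b in enumerate(prev))
    return tau * math.log((1.0 - v0) / 0.5)

def simulate_dje(n_bits, T, tau, depth=12, seed=1):
    """Return the crossing times before and after single-tap time equalization."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    alpha = math.exp(-T / tau)
    # Assumed tap: separation of the 001 and 101 crossing times for a 3-bit history.
    tap_delay = -tau * math.log(1.0 - (1.0 - alpha) * alpha)
    raw, equalized = [], []
    for n in range(depth, n_bits):
        if bits[n] == bits[n - 1]:
            continue  # no transition at the current bit
        history = [bits[n - k] for k in range(depth + 1)]
        t_c = crossing_time(history, T, tau)
        prior_transition = bits[n - 1] ^ bits[n - 2]  # XOR transition detector
        raw.append(t_c)
        equalized.append(t_c + tap_delay * prior_transition)
    return raw, equalized

def peak_to_peak(values):
    return max(values) - min(values)

T = 100e-12
tau = 1.0 / (2.0 * math.pi * 3e9)
raw, equalized = simulate_dje(4000, T, tau)
pp_raw, pp_eq = peak_to_peak(raw), peak_to_peak(equalized)
```

In this model the residual spread after equalization comes from bits older than the penultimate one, which a single tap cannot address; additional taps in the cascade of delays would be needed to reduce it further.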
A. Eye Improvement With DJE in a First-Order Channel
While it may seem that DJE only works on open data eyes, this technique has limited capability for completely closed eyes. This is illustrated in this section for the first-order system. In Fig. 9 , the data eye is shown with and without DDJ compensation at and . For the higher cutoff frequency, the transitions converge when (27) is used as the compensation scheme. Removing the timing deviations also enhances the voltage margins of the data eye and improves both the timing and voltage margins.
For the lower cutoff frequency, the first-order response closes the data eye. When DJE is introduced, the data eye is re-opened. Consequently, the maximum bit rate is increased before the eye is completely closed with DJE. To quantify this theoretical eye opening, we simulate the voltage and timing margin enhancement for a cutoff frequency of 1 GHz and show the results as a function of bit rate in Fig. 10. The timing margin is calculated at the voltage threshold and the sampling time for the voltage margins is calculated in the center of the timing margin. Interestingly, the DJE is capable of keeping the data eye open to more than 12 Gb/s, an improvement of 3 Gb/s over the case without DJE.
B. Comparison to Decision Feedback Equalization
The operation of the DJE resembles DFE insofar as previous decisions about the data are used to dynamically adjust the receiver response. Additionally, many of the implementation issues associated with DJE are analogous to issues pertaining to DFE. For instance, DJE implementations are subject to similar critical path timing requirements as DFE [31]. Several distinctions between the schemes are noteworthy. First, the feedback of the DJE carries information about transitions rather than specific bits. This information is inferred from bit-period-delayed bit samples. Analog feedback could also be implemented without actually sampling the data to compensate the DDJ.
Second, the DJE relies on a time delay adjustment in the receiver instead of varying the decision voltage threshold. In some situations, these two processes are similar. For instance, small variations of delay can be viewed as shifting the voltage threshold. However, when the transition of the data edge is nonmonotonic, these approaches are essentially different because the linearization of the slope does not translate into a unique correspondence between the voltage and time-delay. Furthermore, in situations where the compensation is required over a time interval greater than the rising or falling edge, a true time delay is required instead of simply a voltage threshold adjustment.
DJE can be implemented in the transmitter or the receiver. When DJE is implemented as a pre-emphasis technique, the DDJ peaks at the transmitter are reversed such that the effect of the channel is to neutralize the DDJ at the receiver and create consistent transitions. The peak power constraints of the output driver de-emphasize the transmitted signal but do not limit DJE techniques. DJE can improve the timing margins of the eye by manipulating the transmit clock which, in principle, does not incur significant power consumption.
In the receiver, DJE could enhance the performance of DFE techniques. In general, DFE techniques are well-suited for channels with strong attenuation while DJE is useful for dispersive channels. In [2] and [31], the DFE is implemented by multiplexing between two voltage comparators with different voltage thresholds. Once the decision on the previous bit is complete, the output of one of these comparators is selected for the next bit. We could imagine the same process in the time domain. Every bit is sampled at two different sampling times and, on the basis of the previously detected bit, one sample or the other is chosen. Ultimately, combining both techniques to adjust both the voltage threshold and the transition (or sampling) time on a bit-by-bit basis could provide the optimal BER performance.
VI. CIRCUIT IMPLEMENTATIONS
The discussion of data-dependent jitter has identified a technique for removing jitter from the data signal. This improves the timing margins of the recovered data and reduces the jitter of the output clock. Two integrated circuits have been designed to test different aspects of DJE for DDJ. The first circuit consists of a DJE that is integrated with a CDR circuit. The second circuit is a MOSFET implementation of a DJE with independent control of the rising and falling edges of the data eye.
A. DJE and CDR (DJE CDR)
A chip microphotograph for the first design is provided in Fig. 11. The fabrication technology is IBM 7HP with 0.18-µm feature size and a bipolar device f_T of 120 GHz. The circuit consists of the DJE, a PLL, and 50-Ω output drivers. A circuit schematic of the DJE CDR is illustrated in Fig. 12. The DJE corrects for jitter caused by the penultimate bit. It functions essentially as described in the dark portion of the schematic in Fig. 7.
The PLL is designed with a modified Hogge phase detector (PD) [17] , [32] . The Hogge PD comprises cascaded DFFs. The output of each flip-flop is a bit period shifted version of the received signal. The DJE taps the input of the Hogge phase detector and the first DFF as demonstrated in Fig. 12 . Since the delays depend on the DFF driven by the recovered clock, the equalization only operates correctly when the PLL is locked. The delay stage is based on the current-starved differential pair suggested in [33] . Each path is driven by the transition-detection multiplier implemented with an XOR logic gate.
The operation of the DJE requires that the current is fully steered within one bit period. Since detection of a previous transition compensates the timing of the current transition, the transition must be detected and the delay adjusted within a symbol period. While the use of look-ahead logic suggested in [34] can relax the feedback requirement, it comes at the cost of power and complexity.
The circuit demonstrated in Fig. 12 only requires one logical gate delay. As this XOR is an emitter-coupled logic (ECL) gate, the gate delay is about 20 ps in a 0.18-µm SiGe technology. Faster speeds are possible in this technology given additional power consumption [35]. Since the delay is based on current steering, the delay stage adjusts quickly to the transition data. nMOS transistors provide smooth transition of the current between the differential pairs. The amount of variation between the two delay values is controlled by the equalizer tuning voltage. The nMOS transistor acts as a current bypass to the transition-detection multiplier. When the applied voltage is high, the current is steered through the bypass transistor. Otherwise, the transistor controls how much current is switched between the two transistors. Since the ECL gate only provides around 300 mV of signal swing, the nMOS transistors are sized to provide the desired current variation when the transition detection is asserted. In simulation, the delay variation is 10 ps.
The phase detector drives a differential charge pump (CP) with a current of 400 µA. An on-chip loop filter is designed with a bandwidth of about 50 MHz and damping factor of about 0.7. The loop capacitor is, therefore, around 1 pF and the series resistance is around 4 kΩ. The loop filter generates a differential control voltage for the complementary cross-coupled oscillator. The control voltage adjusts the frequency through complementary MOS varactors. The advantage of the differential tuning is rejection of common-mode noise. Several PLL designs have demonstrated the benefit of differential tuning [36], [37]. However, the common-mode must be set with feedback as shown in Fig. 12. The tuning range of the oscillator is 9-11.5 GHz to provide robustness to process variations. The output of the oscillator is buffered and drives the phase detector and a 50-Ω buffer.
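The quoted loop-filter values can be checked for self-consistency with a short calculation. This sketch assumes a second-order charge-pump PLL with a series RC loop filter, for which the damping factor and natural frequency are related by zeta = omega_n * R * C / 2, and it interprets the quoted bandwidth as the natural frequency. The function name is hypothetical, and any ripple-suppression capacitor or other parasitic is ignored.

```python
import math

def natural_frequency_hz(zeta, resistance, capacitance):
    """Natural frequency implied by zeta = omega_n * R * C / 2 for a series RC loop filter."""
    omega_n = 2.0 * zeta / (resistance * capacitance)
    return omega_n / (2.0 * math.pi)

# Values quoted in the text: damping factor 0.7, R around 4 kOhm, C around 1 pF.
f_n = natural_frequency_hz(0.7, 4e3, 1e-12)
```

Under these assumptions the result is roughly 56 MHz, which is in line with the stated loop bandwidth of about 50 MHz.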
The chip area measures 1.2 mm by 1 mm including the loop filter and pads. The circuit consumes 70 mA from a 3.5-V supply. The DJE area is 100 µm by 80 µm and draws 10 mA of this current.
B. CMOS DJE
The second circuit is implemented with 0.12-µm MOSFETs in IBM 8HP. A chip microphotograph is provided in Fig. 13. The circuit schematic is illustrated in Fig. 14. This design was intended to remove the restrictions on implementing the DJE within a CDR circuit and uses variable delay stages for buffering. As this was the initial fabrication of the 8HP process (which combines both 0.12-µm bipolar devices and CMOS devices), the MOSFET models were expected to exhibit some process variations. To satisfy the maximum feedback propagation delay, the topology uses look-ahead feedback that introduces a delay to compensate for the propagation delay, . Consequently, this approach targets DJ accumulation in high-speed circuits.
High-speed current-mode logic (CML) AND gates detect transitions at 10 Gb/s. This logic gate approach is more robust to process variations and is sufficient for a proof-of-concept of the DJE. The implementation of the AND gate is demonstrated in Fig. 14. The CML gates generate logical values when the rising or falling edges occur and this logical value is weighted to adjust the variable time delay.
The variable time delay is demonstrated in Fig. 14 and is the nMOS analog to the cross-coupled latch used in the ECL implementation. Additionally, the delay control is provided for two different control signals. This avoids explicitly combining the falling and rising edge control signals before the delay stage and lowers the feedback latency. Each delay signal occurs exclusively since rising and falling edges do not occur simultaneously. The rising and falling edge detection signals are combined in the tail of each differential pair. The DJE area is 130 µm × 80 µm. From simulation, the circuit draws 20 mA per channel. Additional current consumption supports the output buffering as well as additional circuits tested in [27].
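The logic behind the separate rising and falling edge controls can be sketched at the bit level. This is a behavioral illustration only, with hypothetical names and weights, and it does not model the CML gates or the delay stage of Fig. 14. Two AND operations on the two most recent decisions flag a prior rising or falling edge, the flags are mutually exclusive, and each selects its own delay weight for the edge that follows.

```python
def edge_flags(d1, d2):
    """d1 is the most recent decided bit and d2 the bit before it.
    Returns (rising, falling) flags for the edge between d2 and d1."""
    rising = d1 & (1 - d2)    # AND of d1 with the complement of d2
    falling = (1 - d1) & d2   # AND of the complement of d1 with d2
    return rising, falling

def delay_adjustment(d1, d2, weight_after_rise, weight_after_fall):
    """Delay applied to the next edge. A prior rising edge can only be followed
    by a falling edge, and vice versa, so the two weights act on opposite edges."""
    rising, falling = edge_flags(d1, d2)
    return rising * weight_after_rise + falling * weight_after_fall

# Hypothetical weights in seconds, chosen only to show independent control.
adjust_after_rise = delay_adjustment(1, 0, 3e-12, 5e-12)
adjust_after_fall = delay_adjustment(0, 1, 3e-12, 5e-12)
```

Because the two flags never assert together, summing the weighted terms is equivalent to selecting one of them, which mirrors the statement that each delay signal occurs exclusively.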
VII. RESULTS
To test the DJE described in the previous sections, we need to introduce a controllable amount of DDJ. Some circuit alternatives are illustrated in Fig. 15. Ease of implementation and testing issues determine the best scheme. A bandwidth-constrained buffer stage is shown in Fig. 15(a). The dominant time constant introduces a first-order response and the DDJ is calculated from (4). This scheme is easily implemented on-chip at high speeds to avoid reflections.
Next, DDJ can be introduced between differential transmission lines through cable attenuation and, consequently, bandwidth reduction as shown in Fig. 15(b) and demonstrated in Fig. 1. At high speeds, the lines are impedance-matched and reflections are typically attenuated over the length of the fiber. However, it is difficult to change how much DDJ is introduced.
Finally, the circuit in Fig. 15(c) correlates the data through a one-bit delay. The multiplier controls the sign and amplitude to create either a positive or negative replica of the original bit. The bit and bit-delayed signal are coupled through transmission lines. The coupling advances or slows the data transitions through the mode between the lines [27], [38]. This coupling behavior shifts the time of flight for the transition on the microstrip line. The amount of DDJ that is introduced is given by (28) where the time constant is a high-pass filter time constant defined in [27] and the numerator and denominator are the peak-to-peak signal swings. Adjusting the swing gives a linear adjustment of the DDJ peak separation. Using a 10-Gb/s Anritsu MP1763C pulse pattern generator (PPG), the differential output swing of the PPG can be controlled independently. One data output is introduced directly to the coupled transmission lines. The complementary data output is delayed by one bit using wide-band phase shifters before entering the coupled transmission lines. Varying the complementary output amplitude allows manipulation of (28). In Table I, the total rms jitter is measured from the data eye for various ratios. The rms RJ is 4.75 ps and is discounted from the total jitter according to (11). Then using (9), the distance between the DDJ threshold crossing time peaks is calculated. Two eyes are shown in Fig. 16 with different ratios of (28). The DDJ peaks in the two eyes are separated in time by 8.5 and 26 ps, respectively.
Comparing these values for the DDJ peaks with the anticipated values in Table I demonstrates close agreement with the theory, with less than 0.4 ps deviation. This scheme provides a flexible platform to control the amount of DDJ with little impact on the ISI.
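The procedure used for Table I, removing the random jitter from the measured total and converting the remaining DDJ to a peak separation, can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes that (11) is a root-sum-square combination of uncorrelated RJ and DDJ and that (9) describes two equiprobable DDJ peaks, so that the rms value is half the peak separation. The function names and the 6.4-ps reading are hypothetical.

```python
import math

RJ_RMS_PS = 4.75  # jitter floor that the text discounts from the total jitter


def ddj_rms_from_total(total_rms_ps, rj_rms_ps=RJ_RMS_PS):
    """Remove uncorrelated RJ from the total rms jitter.

    ASSUMPTION: total^2 = RJ^2 + DDJ^2 (root-sum-square), as (11) is taken to state.
    """
    if total_rms_ps < rj_rms_ps:
        raise ValueError("total rms jitter cannot be smaller than the RJ floor")
    return math.sqrt(total_rms_ps ** 2 - rj_rms_ps ** 2)


def peak_separation_from_rms(ddj_rms_ps):
    """Convert DDJ rms to peak separation.

    ASSUMPTION: two equiprobable peaks separated by d have an rms of d / 2.
    """
    return 2.0 * ddj_rms_ps


# Hypothetical total-jitter reading; this is not a value taken from Table I.
total_rms_ps = 6.4
ddj_rms_ps = ddj_rms_from_total(total_rms_ps)
separation_ps = peak_separation_from_rms(ddj_rms_ps)
print(f"DDJ rms = {ddj_rms_ps:.2f} ps, peak separation = {separation_ps:.2f} ps")
```

Under these assumptions, the 8.5-ps and 26-ps peak separations reported for the two eyes would correspond to DDJ rms values of 4.25 ps and 13 ps.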
These results predict the rejection of DDJ through the CDR. The natural frequency of our loop filter is near 50 MHz and the damping factor is about 0.7. Therefore, the equations expressed in Section IV for this damping factor approximate the expected results. In this case, the long-term jitter is anticipated to be (29) from (17). In Table I, this relationship is demonstrated across the range of the DDJ inputs.
A. DJE CDR
For the DJE CDR, DDJ is introduced with the schematic in Fig. 15(c). Several steps are taken to demonstrate the performance of the DJE CDR. First, the bias voltage of the VCO is scanned to determine the lowest phase noise. The phase noise is measured with an RDL NTS-1000B while the CDR is locked to a periodic sequence at 5 Gb/s and is shown in Fig. 17. The phase noise under this condition is more than 10 dB below the phase noise when the PRBS is applied. Additionally, the phase noise of the PPG is measured with an alternating one and zero pattern to demonstrate the noise contributed by the CDR circuit and the VCO. The phase noise increases above 100 kHz due to the additional impact of the VCO phase noise.
Next, the phase noise is measured with a PRBS sequence, both without any DDJ compensation and with the maximum compensation. Below 1 MHz (the cutoff of the phase noise analyzer), the phase noise is generally improved and is decreased by 4 dB at the 100 kHz offset when the DDJ is compensated.
The data eye and recovered clock are illustrated in Fig. 18 without the influence of DDJ. Using an Agilent 81600B wide-bandwidth oscilloscope, the timing statistics are collected from 5000 histogram points. The rms jitter of the recovered data is slightly greater than that of the recovered clock due to DDJ in the output driver. The rms jitter on the recovered clock is comparable to the jitter reported at 10 Gb/s in [24], 0.78 ps, or [25], 0.95 ps. The timing jitter is recorded for the DDJ conditions described in Table I and illustrated in Fig. 19 for the jitter of the recovered clock and data. Increasing the amount of DJE lowers the jitter by 0.3 ps.
The predicted jitter due to the input DDJ is on the order of the measured jitter. However, the total variation reflected in the curves in Fig. 19 is not as great as the variation in Table I. One explanation is that other sources of jitter, such as the noise added by the operation of the DJE, limit the achievable improvement. This tradeoff is described at the end of this section.
Finally, a bathtub curve is generated for the DDJ and ISI of the data eye to demonstrate the BER improvement. The sampling voltage and time are scanned with an Anritsu MP1764C error detector and the BER at each sampling point is recorded. The collection of these BER measurements forms two bathtub curves in Fig. 20. For this bathtub curve, we switched our testing environment to the 60 inches of RG-58 cable used to motivate this discussion of DDJ in Fig. 1. Notably, the data is retimed internally in the Hogge phase detector. Therefore, the bathtub curve in this case demonstrates the ability of the PLL to reduce the contribution of DDJ to the recovered clock timing jitter. Interestingly, improvement was measured in both the voltage and timing margins. The timing margins at 10 BER were increased by about 3 ps while the voltage margins increased around 10 mV at 10 BER.
B. CMOS DJE
To introduce DDJ in this implementation, on-chip capacitively loaded amplifiers, as described in Fig. 13(a), were used to simplify testing. A 2.5-mA buffer is loaded with three large, laser-trimmable capacitors. The capacitors are metal-insulator-metal (MIM) located near the top analog metal (AM) layer, which is easy to trim. Each capacitor loads the amplifier with 91 fF. Using the first-order relationships for jitter in Section II, we predict the DDJ introduced due to this loading at 10 Gb/s in Table II using (4). The largest load capacitance is used for testing the DJE at 10 Gb/s. The data eye measurement results for the DJE are shown in Fig. 21 and in Table III at 10 Gb/s. Four eyes are demonstrated to show the independent equalization of the rising and falling edges. The individual compensation of the rising and falling edges reduced by similar amounts. was clearly improved entirely by rising edge equalization. The compensation of the rising edge was slightly better since the rising edge suffers from more DJ than the falling edge. The additional DJ on the rising edge is a circuit-induced asymmetry that occurs in the rising edge equalization path but not in the falling edge path.
To analyze these results, we assume that the minimum rms jitter recorded in the data eye was contributed solely by random jitter in the circuit, ps. To understand the expected rms jitter when only one edge contributes to the DDJ, as we observe in Fig. 21, the pdf for DDJ derived in Section II is modified to study DDJ when only one edge is compensated. Now (30) where the minimum DDJ occurs when the compensated edge occurs between the two edges that suffer from additional DDJ. Consequently, the rms jitter can be expressed as (31). Notably, this is 1.4 times smaller than if the DDJ is present on both edges. With (31), the contribution of DDJ is determined in Table IV. The RJ component is discounted to determine and the threshold crossing time variation. This expression gives consistent expectations for the threshold crossing time deviation. The calculation of indicates that the actual DDJ is twice as great as the DDJ contributed from the capacitive load in Table II. Additional sources of parasitics were studied to determine the source of 7 ps of DDJ. The layout contained a long connection loading a 500-µA buffer with 38.5 fF. This implies that ps and . Therefore, this buffer introduced an additional 5.6 ps of DDJ, accounting for a significant portion of the additional DDJ. This example illustrates interconnect challenges for signal integrity on-chip.
Finally, the BER bathtub curve is obtained directly with the MP1764C error detector to verify the BER improvement of the DJE and the results are plotted in Fig. 22. The BER demonstrates that equalizing the individual edges resulted in similar eye opening. The equalization of both edges increased the eye opening that achieved a BER of $10^{-12}$ from 30 to 52 ps over the 100-ps unit interval.
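The factor of 1.4 quoted for the case in which only one edge is compensated can be checked numerically with a small discrete jitter distribution. The sketch below is an illustration under stated assumptions and does not reproduce (30) or (31). It assumes that the uncompensated edge keeps two equiprobable peaks separated by the original spacing, and that the compensated edge collapses to a single peak midway between them. The function names and the 10-ps separation are hypothetical.

```python
import math


def rms_of_pmf(points):
    """Return the rms deviation about the mean for (time_ps, probability) pairs."""
    total_p = sum(p for _, p in points)
    mean = sum(t * p for t, p in points) / total_p
    variance = sum(p * (t - mean) ** 2 for t, p in points) / total_p
    return math.sqrt(variance)


def both_edges_pmf(sep_ps):
    """ASSUMPTION: rising and falling edges share two equiprobable DDJ peaks."""
    return [(-sep_ps / 2, 0.5), (sep_ps / 2, 0.5)]


def one_edge_compensated_pmf(sep_ps):
    """ASSUMPTION: the compensated edge (half of all transitions) sits midway
    between the two peaks of the uncompensated edge (one quarter each)."""
    return [(0.0, 0.5), (-sep_ps / 2, 0.25), (sep_ps / 2, 0.25)]


sep_ps = 10.0  # illustrative peak separation, not a measured value
rms_both = rms_of_pmf(both_edges_pmf(sep_ps))
rms_one = rms_of_pmf(one_edge_compensated_pmf(sep_ps))
print(f"both edges: {rms_both:.3f} ps, one edge compensated: {rms_one:.3f} ps")
print(f"ratio: {rms_both / rms_one:.3f}")
```

Under these assumptions the ratio is the square root of two, which agrees with the factor of 1.4 stated in the text.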
C. Tradeoff Between Data-Dependent Jitter Compensation and Random Jitter
In the described DJE schemes, delay variation is introduced through a buffer stage. Therefore, a design tradeoff exists between compensating DDJ and introducing additional random jitter. From [5], the random jitter introduced through a CMOS buffer stage is (32) where is the load capacitance, is the stage bias current, and is a bias-dependent term. The delay of the stage, if slew-rate limited as in our cross-coupled stages, is where is the logic swing. Therefore, delay variation is achieved by varying the stage current between and .
where is the minimum stage delay and is the percentage delay variation. This delay stage introduces random jitter that depends on the delay variation. From (32), the RJ is (34), where this is expressed with the percentage of delay variation and the minimum buffer jitter. The total jitter with DJE is expressed using (11) as (35). The expression in (34) is substituted into (35) and we find (36). If the DDJ is removed, i.e., , the minimum jitter in (36) becomes (37). For the TJ to be less than our original DDJ, (37) should be less than . Therefore,
where is assumed to be much less than . The strength of the inequality in (38) determines the effectiveness of DJE. If is large relative to , the TJ should reduce dramatically with DJE. If is relatively small, little improvement will result in the overall TJ.
This analysis provides one explanation for the slight improvement in Fig. 17 and Fig. 19. In particular, the noise of the cross-coupled delay stage adds significant jitter and reduces the benefit of DJE. Furthermore, the delay stage is susceptible to power-supply variations, which could add an additional jitter penalty. Consequently, (38) is a lower-bound criterion for introducing DJE.
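The tradeoff can be illustrated numerically with a simplified model. The sketch below is not the authors' derivation, and it does not reproduce (32) to (38). It assumes that the delay-stage RJ grows in proportion to the stage delay, so that a fractional delay variation x raises the RJ to (1 + x) times its minimum value. It also assumes that RJ and residual DDJ combine as a root-sum-square. All names and numerical values are hypothetical.

```python
import math


def stage_rj(rj_min_ps, delay_variation):
    """RJ of the delay stage at a given fractional delay variation.

    ASSUMPTION: RJ scales with stage delay, i.e. rj = (1 + x) * rj_min.
    """
    return rj_min_ps * (1.0 + delay_variation)


def total_jitter(rj_ps, ddj_ps):
    """ASSUMPTION: uncorrelated RJ and DDJ add as a root-sum-square."""
    return math.hypot(rj_ps, ddj_ps)


def dje_helps(ddj_rms_ps, rj_min_ps, delay_variation):
    """True if the jitter left after fully removing DDJ is below the original DDJ."""
    return total_jitter(stage_rj(rj_min_ps, delay_variation), 0.0) < ddj_rms_ps


# Hypothetical operating points, chosen only to show both outcomes.
for ddj_rms_ps in (1.0, 5.0):
    verdict = dje_helps(ddj_rms_ps, rj_min_ps=1.5, delay_variation=0.2)
    print(f"DDJ rms = {ddj_rms_ps} ps: DJE reduces total jitter -> {verdict}")
```

In this model the equalizer is only worthwhile when the original DDJ is large compared with the RJ of the delay stage, which is the qualitative conclusion drawn from (38).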
VIII. CONCLUSION
An analysis of data-dependent jitter in general LTI systems is presented. Features of DDJ are highlighted in the time domain to discuss the impact of DDJ on BER and are related to the frequency domain to discuss implications for clock and data recovery circuits. We have proposed the use of deterministic jitter equalization for DDJ and studied the potential performance improvement. This study includes a comparison to decision-feedback equalization. Deterministic jitter equalization may enhance the performance of DFE techniques in future serial transceivers. Two circuit implementations are presented to demonstrate the design and performance of deterministic jitter equalization for data-dependent jitter. While both of these circuits have been oriented toward the first post-cursor DDJ, the technique could be applied to more general equalization schemes that involve long-latency reflections. The measured performance of the integrated circuits demonstrates that deterministic jitter equalization can improve timing jitter as well as the BER performance of broadband communication systems.
APPENDIX A
When the effect of the voltage threshold offset is taken into account, the DDJ pdf becomes (39), which reduces to (8) if the offset is zero. The peaks are listed in order of the two falling edges and the two rising edges. Note that is the same definition used in (7) where . This describes the minimum amount of DDJ. In this case, the rms and peak-to-peak jitter are (40)
The effect of the voltage threshold offset can be isolated from the DDJ. The variances of the DDJ and the DCD add in (40). This implies that DDJ and DCD are uncorrelated. The DCD jitter effect is unchanged regardless of how much DDJ is present. Therefore, the individual variance and peak-to-peak values describe the duty-cycle distortion (DCD) jitter in the serial link.
APPENDIX B
Using the variance of DDJ from (9), we construct a discrete autocorrelation function describing the phase variance of the input data. This construction of the autocorrelation from the variance is simplified but relevant for determining the general behavior of the DDJ in the frequency domain.
(41) where . When multiple DJ peaks are present in the jitter pdf, the autocorrelation may have nonzero correlation over several data periods. Markov models for the phase progression can be constructed to show the complete autocorrelation and impact on cycle-to-cycle behavior [29] .
Using the Wiener-Khinchin theorem [39], the autocorrelation is related to an average power spectral density (PSD) using the discrete-time Fourier transform (42). This reflects a white noise jitter PSD at the input of the PLL. However, the PSD is discrete and, consequently, it must be transformed to a continuous-time PSD in the PLL through the sampling process in the phase detector [39] (43)
Since the low-pass filter bandwidth of the PLL is much smaller than , we can approximate (43) as
This frequency domain expression allows study of the impact of DDJ on the PLL output jitter.
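As an illustration of how a white jitter PSD at the PLL input translates into output jitter, the following sketch estimates the ratio of output to input rms jitter for a second-order loop. It is a rough estimate under stated assumptions and is not the expression derived in the paper. It assumes a jitter transfer function of the form H(s) = (2ζω_n s + ω_n²)/(s² + 2ζω_n s + ω_n²), whose one-sided noise bandwidth is the standard result (ω_n/2)(ζ + 1/(4ζ)). It also assumes a flat two-sided input PSD equal to the input variance times the bit period, and a loop bandwidth far below the bit rate. The function names are hypothetical; the 50-MHz natural frequency and 0.7 damping factor are the loop values quoted in the text, and the 100-ps bit period corresponds to 10 Gb/s.

```python
import math


def loop_noise_bandwidth_hz(f_n_hz, zeta):
    """One-sided noise bandwidth of the assumed second-order jitter transfer function."""
    w_n = 2.0 * math.pi * f_n_hz
    return 0.5 * w_n * (zeta + 1.0 / (4.0 * zeta))


def output_rms_ratio(f_n_hz, zeta, bit_period_s):
    """Ratio of output to input rms jitter for white input jitter.

    ASSUMPTION: two-sided input PSD = sigma_in^2 * T, so the output variance is
    sigma_in^2 * T * 2 * B_L. Valid only when B_L is much smaller than 1 / T.
    """
    b_l = loop_noise_bandwidth_hz(f_n_hz, zeta)
    return math.sqrt(2.0 * b_l * bit_period_s)


ratio = output_rms_ratio(f_n_hz=50e6, zeta=0.7, bit_period_s=100e-12)
print(f"output rms / input rms = {ratio:.3f}")
```

With these values the loop would pass roughly 18% of the input DDJ rms to the recovered clock. The estimate depends directly on the assumed form of the transfer function.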
