Abstract-A novel approach to equalization of high-speed serial links combines both amplitude pre-emphasis to correct for intersymbol interference and phase pre-emphasis to compensate for deterministic jitter, in particular, data-dependent jitter. Phase pre-emphasis augments the performance of low-power transmitters in bandwidth-limited channels. The transmitter circuit is implemented in a 90-nm bulk CMOS process and reduces power consumption by pushing CMOS static logic to the output stage, a 4:1 output multiplexer. The received signal jitter over a cable is reduced from 16.15 ps to 10.29 ps with only phase pre-emphasis at the transmitter. The jitter is reduced by 3.6 ps over an FR-4 backplane interconnect. A transmitter without phase pre-emphasis consumes 18 mW of power at 6 Gb/s and 600 mVpp output swing, a power budget of 3 mW/Gb/s, while a transmitter with phase pre-emphasis consumes 24 mW, a budget of 4 mW/Gb/s.
I. INTRODUCTION

EQUALIZATION of high-speed serial links has evolved to compensate intersymbol interference (ISI) caused by frequency-dependent attenuation and reflections found in interconnects. Pre-emphasis-based equalization in the transmitter and decision feedback equalization in the receiver figure prominently in overcoming signal degradation and improving the bit-error rate (BER) [1]-[3]. Currently, one challenge of equalization is minimizing power consumption while still improving signal integrity in the presence of attenuation and reflections. This design targets low power consumption while offering equalization appropriate for shorter (under 16 inches), less dispersive interconnects.
In this transmitter, we expand the notion of pre-emphasis beyond amplitude compensation of ISI and introduce phase pre-emphasis for compensating data-dependent jitter (DDJ). DDJ compensation has been demonstrated to increase the timing margins in clock and data recovery [4], [5]. Unlike random jitter (RJ), DDJ can be addressed by exploiting the relationship between the data sequence and the timing deviation [6], [7]. Earlier work on magnetic write heads proposed compensating transition timing deviations caused by long sequences of ones and zeros through the addition of isolated pulses during data runs [8], [9].
Transmitters must drive enough power over lossy interconnects to meet minimum receiver sensitivity requirements. Finding new methods to lower power consumption in serial links has been explored through low-common-mode signaling [10] and through low-supply operation [11]. Amplitude pre-emphasis distorts the signal to compensate ISI introduced by the bandwidth limitations of the interconnect [12], [13]. The number of taps of amplitude pre-emphasis depends on the channel quality and bit rate. Further improvement of the signal integrity motivates the addition of phase pre-emphasis. Since DDJ compensation is implemented through adjusting the transition times of the data, it can be introduced without significant additional power consumption in the transmitter. The effectiveness of combining amplitude and phase pre-emphasis depends on the behavior of the channel response.
In Section II, we summarize the operation of one-tap amplitude pre-emphasis in the context of modern bandwidth limitations in transmission lines. Section III introduces the notion of phase pre-emphasis and discusses a general algorithm for reducing DDJ at the receiver. In Section IV, we discuss the implementation of a 6-Gb/s transmitter with amplitude and phase pre-emphasis. Finally, the hardware results are discussed in Section V, where we illustrate the performance over both a bandwidth-limited cable and a backplane interconnect.
II. AMPLITUDE PRE-EMPHASIS
Transmission lines are often limited by frequency-dependent skin-effect and dielectric losses. Skin effect arises from nonuniform electric fields in conductors. At high frequencies, the resistance of the wire increases and is accompanied by an effective internal inductance. The skin loss can be expressed as (1) where $l$ is the wire length, $\mu$ is the permeability, and $\sigma$ is the conductivity. Interestingly, the phase shift and amplitude attenuation are identical for skin effect. Additionally, the dielectric material of the transmission line causes loss, expressed as (2) where $\epsilon_r$ is the dielectric constant, $\tan\delta$ is the loss tangent of the material, and $c$ is the speed of light. A thorough discussion of the skin and dielectric losses in modern materials is given by Deutsch [14]. Fig. 1 compares skin and dielectric loss to the measurement of 96 inches of RG-58 cable. At 3 GHz, one-half the bit rate of 6 Gb/s, the loss of the cable is roughly 4 dB. Recent equalizer circuits have been introduced to compensate for cable loss [15]. In high-speed backplanes, via stubs and connectors cause signal reflections at gigahertz frequencies. These reflections also cause dispersion and, consequently, ISI in the link. For modern backplanes, these mechanisms limit the bandwidth of interconnects to around 3 GHz, depending on length. Interconnect bandwidth decreases with longer length and with the number of discontinuities. In Fig. 1, the frequency response of a Tyco backplane with Hm-Zd connectors rolls off much faster due to these reflections. The loss over this channel is close to 10 dB at 3 GHz.
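For orientation, commonly quoted textbook forms of the two loss mechanisms are sketched below. These are not necessarily the exact expressions (1) and (2) of this work: the characteristic impedance $Z_0$ and the current-carrying conductor perimeter $p$ are assumed quantities introduced only for illustration, and the dielectric term shows attenuation only.

```latex
% Illustrative textbook forms (assumed, not taken from the source).
% Z_0: characteristic impedance, p: conductor perimeter (both hypothetical here).
H_{s}(f) = \exp\!\left[-(1+j)\,\frac{l}{2 Z_0\, p}\,\sqrt{\frac{\pi f \mu}{\sigma}}\right],
\qquad
\left|H_{d}(f)\right| = \exp\!\left[-\frac{\pi f \sqrt{\epsilon_r}\,\tan\delta}{c}\; l\right]
```

In these forms the factor $(1+j)$ expresses the equal attenuation and phase shift of the skin effect, the skin loss in decibels grows as $\sqrt{f}$, and the dielectric loss in decibels grows linearly with $f$.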
Amplitude pre-emphasis compensates the frequency-dependent loss. The block diagram for a one-tap feed-forward amplitude pre-emphasis scheme is shown in Fig. 2. The circuit consists of two cross-coupled drivers that compete to drive the line. The amplitude pre-emphasis driver transmits data delayed by one bit period, $T$. The amplitude pre-emphasis gain, $\alpha$, is the ratio of the current in the pre-emphasis driver to the current in the main driver. The transfer function for this stage is expressed as
$$H(s) = 1 - \alpha e^{-sT}. \tag{3}$$
A first-order Pade approximation is used to express the time delay as a rational function, $e^{-sT} \approx (1 - sT/2)/(1 + sT/2)$. The transfer function for the scheme is written as
$$H(s) \approx (1-\alpha)\,\frac{1 + \dfrac{1+\alpha}{1-\alpha}\,\dfrac{sT}{2}}{1 + \dfrac{sT}{2}}. \tag{4}$$
This transfer function reveals that the amplitude pre-emphasis scheme reduces the DC gain. For this reason, amplitude pre-emphasis is often considered "de-emphasis." The DC gain can be increased to maintain a constant voltage swing by increasing the bias current and power consumption. High-frequency amplification is introduced through a zero that depends on $\alpha$. The transfer function has an additional pole at $s = -2/T$. Other poles in an actual implementation limit the all-pass behavior. In Fig. 2, the operation of amplitude pre-emphasis creates four distinct transmit levels in the data eye. The higher levels are transmitted when there is a data transition. Since the transmitter operates from a fixed voltage supply, a maximum peak power limits the swing of the higher levels. One-tap amplitude pre-emphasis tends to provide some compensation for the frequency-dependent losses. However, more robust approaches use multiple taps to implement the transmit finite-impulse response (FIR) filter at the expense of additional power consumption [1], [2], [13].
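The behavior of the one-tap scheme can be checked numerically. The following is a minimal sketch, assuming normalized symbols of +1/-1 and a discrete-time model y[n] = a[n] - alpha*a[n-1]; the names `alpha`, `preemphasize`, and `gain` are hypothetical and chosen only for this illustration.

```python
import cmath
import math

def preemphasize(bits, alpha):
    """One-tap feed-forward pre-emphasis: y[n] = a[n] - alpha * a[n-1]."""
    symbols = [1.0 if b else -1.0 for b in bits]
    return [cur - alpha * prev for prev, cur in zip(symbols, symbols[1:])]

def gain(alpha, f_norm):
    """Magnitude of 1 - alpha*exp(-j*2*pi*f*T), with f_norm = f*T (0.5 is Nyquist)."""
    return abs(1.0 - alpha * cmath.exp(-2j * math.pi * f_norm))

alpha = 0.25  # assumed pre-emphasis gain, for illustration only
bits = [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1]

# Four distinct levels appear: the larger pair on transitions, the smaller pair on runs.
levels = sorted(set(preemphasize(bits, alpha)))
print(levels)

# The gain is reduced at DC and boosted at the Nyquist frequency.
print(gain(alpha, 0.0), gain(alpha, 0.5))
```

In this model the low-frequency gain is 1 - alpha and the gain at half the bit rate is 1 + alpha, which is the sense in which the scheme trades DC swing for high-frequency boost.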
III. PHASE PRE-EMPHASIS
To improve the timing margins of the data eye, phase pre-emphasis is introduced to the transmitted signal to compensate the effects of data-dependent jitter. Data-dependent jitter, the result of ISI at the transition times, is a form of deterministic jitter (DJ) that limits the timing margins of the received data eye. In [6] and [7], the relationship between the link characteristic and the DDJ is established. In particular, it is shown analytically that there are multiple DDJ peaks related to when the previous transition occurred. If we assume that the channel is linear, the timing deviation of complementary signals, e.g., the 101 and 010 data sequences, is identical. In [5], we demonstrated the adjustment of the data transition timing at the receiver to compensate for the effect of DDJ. This work focuses on DDJ compensation in the transmitter, where there are several implementation advantages.
Transitions that have occurred recently tend to strongly impact the DDJ. This is described in [6], where the DDJ resulting from a first-order response comprises diminishing contributions for each previous transition. In Fig. 3, the transitions of a first-order data eye are magnified. The timing deviations due to previous transitions are highlighted. Each previous transition is assigned its own mean timing deviation. For instance, the first of these describes the DDJ contribution of the transition between the previous and the penultimate bit. The figure illustrates that the DDJ contribution tapers off quickly. After the third transition, the contribution is small compared to the bit period. Phase pre-emphasis manipulates the data transition timing to neutralize DDJ.
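The diminishing contributions can be illustrated with a small numerical model. This is a sketch under stated assumptions, not the analysis of [6]: the channel is an ideal first-order low-pass with time constant `tau`, the levels are 0 and 1 with a threshold of 0.5, and bits older than the given sequence are assumed equal to its first bit unless `v_init` is supplied. All names are hypothetical.

```python
import math

def crossing_time(seq, tau, T=1.0, v_init=None):
    """Time after the final transition at which the response crosses 0.5.

    seq is a bit string such as "0010"; its last two bits must differ.
    """
    bits = [int(ch) for ch in seq]
    *history, new_bit = bits
    if new_bit == history[-1]:
        raise ValueError("sequence must end in a transition")
    r = math.exp(-T / tau)
    v = float(history[0]) if v_init is None else v_init
    for b in history:
        v = b + (v - b) * r  # settle toward b for one bit period
    # Distance to the new level decays as exp(-t/tau); the crossing is at distance 0.5.
    return tau * math.log(2.0 * abs(v - new_bit))

tau = 0.5  # assumed time constant, in bit periods
times = {s: crossing_time(s, tau) for s in ("0010", "1010", "0110", "1110")}
for s, t in times.items():
    print(s, round(t, 4))
```

With these assumptions, the sequence ending an isolated pulse crosses earliest and the sequence ending a long run crosses latest. A difference in the bit immediately preceding the run shifts the crossing far more than a difference one bit earlier.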
The relationship between different data sequences and the threshold crossing time is illustrated in Fig. 4. A few data sequences are plotted to demonstrate the associated timing deviation and to illustrate the operation of the pre-emphasis scheme. Detection of each previous transition is used to compensate the current transition time. The detection of each previous transition is computed for the current bit with an XOR operation on the transmitted data. From derivations in [5], the compensated transmit time is calculated through the following algorithm: (5) For example, the 0010 sequence results in the fastest threshold crossing time for a first-order system. For this sequence, both transition-detection bits are 1, implying that we will introduce the largest delay to compensate the fast transition for this sequence. On the other hand, the 1110 sequence results in the slowest threshold crossing time. For this sequence, both transition-detection bits will be 0 and the transmitter will not introduce delay in the transmitted bit timing. In this example, we illustrate two 1110 curves corresponding to whether the initial condition, the unshown previous bit, is 0 or 1. Ideally, compensation is introduced for each previous transition until the difference in DDJ contributed by these initial conditions is negligible.
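The examples above can be reproduced with a short sketch. It assumes a particular detection rule, the XOR of the bit just before the current transition with each earlier bit, chosen only because it matches the worked examples (0010 gives two asserted bits, 1110 gives none). It also assumes that the per-transition delays simply add. Neither assumption is confirmed as the exact form of (5), and the names `detect`, `launch_delay`, and `dt` are hypothetical.

```python
def detect(seq, depth=2):
    """Detection bits for a bit string such as "0010".

    The last character is the bit being launched and the one before it is
    the reference bit. ASSUMPTION: bit k is ref XOR (bit k periods before ref).
    """
    bits = [int(ch) for ch in seq]
    ref = bits[-2]
    return [ref ^ bits[-2 - k] for k in range(1, depth + 1)]

def launch_delay(seq, dt):
    """Added transmit delay: the sum of dt[k] for every asserted detection bit."""
    return sum(d for x, d in zip(detect(seq, len(dt)), dt) if x)

dt = [9.0, 3.0]  # ps, hypothetical coefficients for the two most recent transitions

for seq in ("0010", "1010", "0110", "1110"):
    print(seq, detect(seq), launch_delay(seq, dt))
```

Under these assumptions the fastest sequence receives the full delay and the slowest receives none, so the compensated crossing times are pulled toward the slowest case.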
In Fig. 5, we demonstrate heuristically how phase pre-emphasis, along with amplitude pre-emphasis, introduces DDJ and ISI to the signal that are then removed by the loss mechanisms in the serial link. The combination of both approaches can further improve signal integrity. In essence, the feed-forward amplitude pre-emphasis is a symbol-spaced FIR filter. The addition of phase pre-emphasis introduces an approximation for a half symbol-spaced FIR filter. While the use of amplitude pre-emphasis alone can provide some improvement in the DDJ, the symbol-spaced FIR filter cannot generally adjust the DDJ and the ISI to be simultaneously zero. Consequently, the use of half symbol-spaced FIR filters is essential to minimize both the DDJ and ISI.
Finally, for the purposes of this work, the coefficients of the equalizer are assumed to be adjusted ad hoc. The DDJ in simple linear time-invariant (LTI) systems can be solved exactly to calculate the necessary timing compensation, but generally some pulse response characterization or equalizer adaptation is required to adjust both phase and amplitude pre-emphasis coefficients. The coefficients for the phase pre-emphasis could be compensated by sampling the timing deviation at the receiver for particular transmitted data sequences. For instance, in Fig. 4 the 0110 and 1010 sequences each introduce a single, distinct delay. At the receiver, these particular sequences could be detected and a timing interval could be calculated from the mean transition timing. The coefficients would be transmitted back to the transmitter in a back-channel scheme [3].
IV. CIRCUIT IMPLEMENTATION
The architecture for the low-power phase and amplitude pre-emphasis transmitter is illustrated in Fig. 6. The design is based on a 4:1 output multiplexer that provides amplitude pre-emphasis, combinatorial logic for phase pre-emphasis, delay generation cells for controlling the clock edges of the multiplexer, and duty-cycle control for each clock phase, which is useful for counteracting process variations. Each of the quarter-rate clock phases is ANDed with its neighbor to generate a 25% duty cycle clock at the 4:1 multiplexer.
The transmitter is implemented in IBM CMOS 9SF, a 90-nm bulk triple-well CMOS technology. Static logic in this technology operates over 2 Gb/s, and, consequently, can directly drive a 4:1 output multiplexer at serial rates faster than 8 Gb/s. In this implementation, we targeted 6-Gb/s operation. The output multiplexer is a current-mode logic (CML) stage which is linear, operates at a low supply voltage, and has a relatively fixed large-signal output impedance to avoid source reflections present in some low-power transmit schemes [10].
The 4:1 output multiplexer provides advantages when considering amplitude pre-emphasis [13]. The output multiplexer schematic is illustrated in Fig. 7. Each bit is transmitted sequentially and is available for three additional bit periods during which the next bit must be set up. Using two cross-coupled multiplexers, we can implement one-tap amplitude pre-emphasis. The first multiplexer transmits the original bit and the second multiplexer is cross-coupled to invert the bit during the following period. Sequential clock phases trigger the two multiplexers to provide one bit delay. Therefore, the 4:1 multiplexer adds amplitude pre-emphasis to the output data without requiring additional circuitry and power to latch and hold the data. The only cost of this scheme is the additional pre-driver current required to drive twice the capacitance at the input of the output stage. However, this pair of cross-coupled multiplexers also reduces the ISI and DDJ inherent in the output multiplexer architecture. The gate-drain capacitance of the output stage provides a parasitic path for energy coupling between the input and output as shown in Fig. 7. With the cross-coupled multiplexer, the gate-drain capacitance is neutralized. The accuracy of this neutralization is subject to the limits of process variations and mismatch for these large output multiplexer transistors [16]. The output swing is designed for 600 mVpp differential, a tradeoff between the link sensitivity requirement and the headroom restrictions of a 1-V supply.
The phase pre-emphasis combinatorial logic is shown in Fig. 8. The phase pre-emphasis calculation for data-dependent jitter compensation results in three differential transition detection control bits for each of the two delay generation cells. Additionally, the combinatorial logic includes a 4-bit interface to provide for a sign adjustment in the DDJ compensation.
A 3-GHz quadrature differential clock generates the four phases used in a quarter-rate architecture. The first and third clock phases are initially fully differential and control the timing of the first and third bits. The same holds for the second and fourth clock phases. Therefore, the transition detection output of the phase pre-emphasis combinatorial logic is multiplexed to a 3-GHz rate and controls the rising and falling edges of the clock for the first and third bits, respectively. This process is illustrated in Fig. 9. For instance, the rising edge of one differential clock controls the output timing of the first bit while the falling edge of this clock controls the output timing of the third bit. Consequently, the transition multiplexer introduces different phase pre-emphasis for the first and third bits on the rising and falling edges individually. The multiplexers are modulated with the quadrature clock to ensure the appropriate setup time.
The transition multiplexer output signals switch two independent delay generation cells that handle each of the quadrature differential clocks. These delay generation cells, shown in Fig. 10, are designed with fully differential CML logic to benefit from the power supply rejection on the clocks. Each delay generation cell consists of a cascade of three 3-bit programmable delay cells. The delay generation is provided by switching between two versions of the clock: one with a programmable delay and one with a nominal delay. Depending on the DDJ sign bit, the programmable delay is introduced when a transition is detected in the transmitted data. Each consecutive delay cell is used to handle the timing deviation corresponding to a previous transition. The programmable delay cell is designed to provide 3 ps of delay for each digital bit. Consequently, the maximum delay that can be implemented is 24 ps. The output buffer of the delay generation cell is designed as a bandpass buffer stage to reject low-frequency noise. Since the phase pre-emphasis scheme is implemented by modulating the clock phases as opposed to modulating the data edges as in [4], bandpass buffering can be used to pass only frequency content around the 3-GHz clock. The bandwidth of the bandpass must meet the phase modulation requirements for the clock. A source-degenerated pMOS driver is implemented to provide low-frequency noise rejection.
At this point, the fully differential clocks are split into four different clock phases and duty-cycle distortion (DCD) is compensated. DCD is an additional source of DJ and must be eliminated in 4:1 multiplexers. DCD control is implemented through four individual pathways, each with four control bits as illustrated in Fig. 6. Independent DCD control allows adjustment for process and transistor matching variations that influence the duty cycle of each data path [13]. Since the DCD is a static error while DDJ compensation is a dynamic adjustment, the DCD correction will not interfere with the DDJ adjustment on the clock. The compensation of DCD for one clock phase is illustrated in Fig. 9. The DCD circuit is based on a current-starved inverter that is digitally programmable as shown in Fig. 10.
Finally, neighboring clock phases are ANDed together. This ideally converts each clock phase to a 25% duty cycle. Notably, ANDing neighboring clock phases is useful for the phase pre-emphasis scheme. As shown in Fig. 9, the DDJ timing compensation is introduced to the clock phase that ends the transmission of the current bit. This phase also triggers the beginning of the consecutive bit. Hence, the clock hand-off is seamless, avoiding any clock overlap issues that might otherwise arise from introducing timing variations on four different clock phases.
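The effect of ANDing neighboring phases can be illustrated with an idealized discrete-time model. This is a sketch assuming perfect 50% duty-cycle phases spaced by a quarter of the clock period; the function names and the number of time steps are hypothetical.

```python
def phase(k, steps=400):
    """Ideal 50% duty-cycle clock over one period, delayed by k quarter periods."""
    shift = k * steps // 4
    return [1 if ((i - shift) % steps) < steps // 2 else 0 for i in range(steps)]

def and_neighbors(phases):
    """AND each clock phase with the next one, wrapping around at the end."""
    n = len(phases)
    return [[a & b for a, b in zip(phases[k], phases[(k + 1) % n])]
            for k in range(n)]

phases = [phase(k) for k in range(4)]
pulses = and_neighbors(phases)

# Each resulting select pulse is high for one quarter of the clock period.
duty = [sum(p) / len(p) for p in pulses]
print(duty)

# Exactly one pulse is high at every instant: no overlap and no gap.
one_hot = all(sum(column) == 1 for column in zip(*pulses))
print(one_hot)
```

In this model the falling edge that closes one pulse and the rising edge that opens the next belong to complementary phases of the same differential clock, which is consistent with the seamless hand-off described above.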
V. RESULTS
Two break-out sites are shown in Fig. 11. The top break-out is the entire output transmitter without the phase pre-emphasis capability. The bottom site is the entire transmitter with the phase pre-emphasis capability. The total area of the transmitters is 240 µm by 150 µm, roughly the area required for two pads. The transmitters also include a two-wire interface for programming the DCD and DDJ delay cells.
The individual operation of the amplitude and phase pre-emphasis is demonstrated in Fig. 12. The first set of eyes shows amplitude pre-emphasis with increasing pre-emphasis current at 6 Gb/s. The pre-emphasis gain is set through a reference current that is recorded for each data eye. As shown in the figure, the minimum swing decreases while the maximum swing increases with amplitude pre-emphasis current. The second row of data eyes shows the use of phase pre-emphasis. The DDJ introduced to the signal depends on the digital code used to program the delay generation scheme. The first transition is compensated and the two resulting threshold crossing times are observed from the jitter histogram. The separation between the peaks increases with the code value. The variation in the total jitter is measured as a function of the pre-emphasis code in Fig. 13 at 6 Gb/s. The DDJ component is normalized out of this total jitter and the time delay compensation can be calculated from the DDJ peak separation [5]. The linearity of the time delay compensation is assessed from the slope of this curve as a function of the digital code. Finally, the bottom row of data eyes demonstrates the operation up to 10 Gb/s of the transmitter without phase pre-emphasis. The DDJ compensation is limited to 6 Gb/s due to the bandpass amplifier in the delay generation circuit. The use of amplitude pre-emphasis opens the data eye slightly at 10 Gb/s to counteract the output bandwidth of the transmitter stage. Four consecutive eyes are shown to demonstrate the relative DCD matching between each of the four paths in the multiplexer.
Fig. 12. Data eyes at 6 Gb/s and 10 Gb/s demonstrating amplitude and phase pre-emphasis. The first row shows the amplitude pre-emphasis at 6 Gb/s as a function of pre-emphasis current. The second row illustrates the phase pre-emphasis as a function of the compensation code. Finally, the last row shows the operation of the transmitter at 10 Gb/s.
To test the transmitter featuring phase and amplitude pre-emphasis, a pseudo-random bit sequence at 6 Gb/s was passed through two test channels. The first channel, 96 inches of RG-58 cable, has 4 dB of loss at 3 GHz. Three data eyes are illustrated in Fig. 14. The first is the eye without any compensation. The second eye is compensated using the first-transition phase pre-emphasis. The rms jitter reduces from 16.15 ps to 11.06 ps when the DDJ code is 011. The jitter statistics are collected over 5 k points. The third eye includes first- and second-transition phase pre-emphasis and the rms jitter drops to 10.29 ps with the DDJ code for the second transition set to 001. The RJ is measured from a periodic pattern and has a jitter of 8.06 ps. The BER bathtub curve demonstrates that at 10 BER the timing margin increases from 62 ps to 95 ps with first-transition phase pre-emphasis, and compensating the second transition opens the bathtub by an additional 6 ps.
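As a rough reading of these numbers, the deterministic part of the rms jitter can be estimated by removing the measured RJ in quadrature. This is a sketch that assumes the random and deterministic components are independent so that their rms values add in quadrature; that decomposition is an assumption of this example and is not stated in the measurements above. The function name is hypothetical.

```python
import math

def deterministic_rms(total_rms_ps, rj_rms_ps):
    """Residual rms jitter after removing RJ in quadrature (independence assumed)."""
    return math.sqrt(total_rms_ps ** 2 - rj_rms_ps ** 2)

rj = 8.06  # ps, RJ measured from a periodic pattern
uncompensated = deterministic_rms(16.15, rj)
first_only = deterministic_rms(11.06, rj)
first_and_second = deterministic_rms(10.29, rj)

print(round(uncompensated, 2), round(first_only, 2), round(first_and_second, 2))
```

Under this assumption the residual deterministic jitter falls from about 14 ps to about 6.4 ps, so most of the remaining jitter after compensation would be attributable to RJ.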
The second channel, a 16-inch FR-4 backplane interconnect with Tyco Hm-Zd connectors, has 10 dB of loss at 3 GHz. The frequency-dependent attenuation in this case is reflected in the closed data eye at 6 Gb/s shown in Fig. 15. Amplitude pre-emphasis is used exclusively in the first data eye. With the data eye open, phase pre-emphasis is used to open the eye further. The rms jitter reduces from 13.84 ps to 10.24 ps using first-transition phase pre-emphasis. The change in the rms jitter is demonstrated as a function of the pre-emphasis (DDJ,1) code in Table I. Clearly, DDJ,1 minimizes the rms jitter in the data eye. The improvement at 10 BER is shown across sampling times and voltage thresholds. Notably, the phase pre-emphasis opens the data eye slightly in the time domain and voltage domain.
The transmitter nominally operates at 1.0 V but can operate from 0.8 to 1.2 V, offering a tradeoff between power consumption and performance. The transmitter without phase pre-emphasis consumes a minimum of 18 mW of power at 6 Gb/s with a 600 mVpp swing, giving a power budget of 3 mW/Gb/s. The output multiplexer draws 12 mA while the data buffers and clock driver, including DCD control and phase ANDing, consume the remaining 6 mA from a 1-V supply. The power consumption of the two-wire interface and clock generation is not included in this power budget. Amplitude pre-emphasis increases the power budget through the current drawn through the amplitude pre-emphasis multiplexer. The transmitter with phase pre-emphasis consumes a minimum of 24 mW of power at 6 Gb/s, giving a power budget of 4 mW/Gb/s. This additional power is consumed primarily in the custom-designed CML delay generation stages and the phase pre-emphasis combinatorial logic. The power consumption is scanned with the bias current in Fig. 16 to show the achievable differential signal swing and power consumption. The voltage swing is shown on the right axis and the power consumption on the left axis. As expected, the voltage swing tracks the power consumption for both implementations of the transmitter. The transmitter with phase pre-emphasis demands roughly 1 mW/Gb/s more over the entire output swing range. However, the power consumption of the phase pre-emphasis scheme is not fundamentally limited to 1 mW/Gb/s and migration of the delay generation from CML logic to static logic should realize additional power advantages.
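The power budget figures follow from simple arithmetic, sketched below. The function names are hypothetical, and the inputs are the supply voltage, currents, and powers quoted above.

```python
def power_mw(supply_v, currents_ma):
    """Total power in mW drawn from one supply by several current branches."""
    return supply_v * sum(currents_ma)

def budget_mw_per_gbps(power, rate_gbps):
    """Power normalized to data rate; 1 mW/Gb/s equals 1 pJ per bit."""
    return power / rate_gbps

# Output multiplexer (12 mA) plus data buffers and clock driver (6 mA) at 1 V.
base = power_mw(1.0, [12.0, 6.0])
print(base, budget_mw_per_gbps(base, 6.0))

# Transmitter with phase pre-emphasis, using the quoted 24 mW total.
print(budget_mw_per_gbps(24.0, 6.0))
```

The difference between the two budgets is the 1 mW/Gb/s attributed to the delay generation stages and the phase pre-emphasis logic.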
VI. CONCLUSION
This work describes a novel equalization technique for amplitude and phase pre-emphasis in bandwidth-limited interconnects. Phase pre-emphasis is introduced to compensate for data-dependent jitter introduced over the channel. Combining amplitude and phase pre-emphasis gives the flexibility to tailor the signal integrity of the data eye. The implementation of this amplitude and phase pre-emphasis transmitter is demonstrated in 90-nm CMOS. The architecture builds upon a 4:1 multiplexer that allows for efficient implementation of amplitude pre-emphasis. The transmitter consumes between 18 and 24 mW of power at 6 Gb/s, giving a power budget of 3-4 mW/Gb/s/Channel. The transmitter operation is demonstrated over 96 inches of cable as well as a 16-inch backplane interconnect with connectors.
