Abstract-By using the clock timing control at transmitter (TX), the crosstalk-induced jitter (CIJ) is compensated for in the 2-bit parallel data transmission through the coupled microstrip lines on a printed circuit board (PCB). Compared to the authors' prior work, the delay block circuit is simplified by combining a delay block with a minimal number of stages and a 3-to-1 multiplexer. The delay block generates three clock signals with different delays corresponding to the channel delays of the three signal modes. The 3-to-1 multiplexer selects one of the three clock signals for TX timing depending on the signal mode. The TX is implemented by using a 0.18 μm CMOS process. The measurement shows that the TX reduces the RX jitter by about 38 ps at the data rates from 2.6 Gbps to 3.8 Gbps. Compared to the authors' prior work, the amount of RX jitter reduction increases from 28 ps to 38 ps by using the improved implementation.
I. INTRODUCTION
Multi-parallel transmission lines are widely used in DRAMs to increase the memory bandwidth at a given data rate. The crosstalk between the multi-parallel transmission lines induces crosstalk-induced jitter (CIJ) at the receiver (RX), which is a bottleneck in increasing the transmission data rate [1] [2] [3] [4] [6] [7] [8]. When two signals propagate through two coupled microstrip transmission lines, the propagation velocity of the odd mode is slightly faster than that of the static mode. Similarly, the propagation velocity of the even mode is slightly slower than that of the static mode [6]. This difference in the propagation velocity induces CIJ at RX.
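The relation between mode velocity and arrival time can be illustrated with a short sketch. It assumes the usual quasi-TEM relation t = L·sqrt(eps_eff)/c for each signal mode; the effective permittivities and the line length below are hypothetical placeholders chosen only to show the ordering of the delays, not values taken from this work.

```python
# Illustrative sketch: mode-dependent flight time on a coupled microstrip pair.
# All numeric values are hypothetical placeholders, not data from this work.
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def flight_time_ps(length_m: float, eps_eff: float) -> float:
    """Propagation delay in ps, assuming t = L * sqrt(eps_eff) / c."""
    return length_m * math.sqrt(eps_eff) / C0 * 1e12


# Hypothetical effective permittivities. A lower eps_eff means a faster mode,
# so the odd mode is fastest and the even mode is slowest, as stated in the text.
EPS_EFF = {"odd": 2.9, "static": 3.1, "even": 3.3}
LENGTH_M = 0.1  # hypothetical line length in meters

delays = {mode: flight_time_ps(LENGTH_M, eps) for mode, eps in EPS_EFF.items()}

# The spread between the slowest and fastest modes appears as CIJ at RX.
cij_ps = delays["even"] - delays["odd"]
```

With these placeholder values, the spread between the even-mode and odd-mode arrival times is the quantity that shows up as jitter at RX.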
Several approaches have been published to reduce crosstalk and CIJ in the multi-parallel transmission lines, by using either the line patterns in PCB [1] [2] [3] [4] [5] or the circuit techniques [6] [7] [8]. In [6] (the authors' prior work), the signals are sent out at the transmitter (TX) at different times depending on the signal modes (odd, even, static), so that the signals with different modes arrive at RX at the same time. The odd mode signals are sent out slightly later than the static mode signals. The even mode signals are sent out slightly earlier than the static mode signals. In [7], the received analog signal at RX is delayed by different amounts depending on the signal modes, so that the signals with different modes arrive at the sampling circuit of RX at the same time. A variable analog delay line and a mode detector circuit are used in [7]. In [8], the staggered (90° out of phase) data are transmitted along two adjacent transmission lines to reduce the CIJ effect.
In this work, the CIJ component is compensated for at TX, in the same way as in the authors' prior work [6]. Because the signal at TX is digital and free from attenuation and distortion, it is much easier to process than the received analog signal at RX [7]. The difference in the channel propagation delay among the signal modes, which gives rise to CIJ, is compensated for by adjusting the clock timing for data at TX. Although this method is the same as that in [6], the implementation circuit is improved significantly so that the jitter reduction of this work is increased to 38 ps from 28 ps in [6].
Section II describes the CIJ compensation scheme using the clock timing control at TX. The circuit implementation is shown in Section III. Section IV shows the measurement results. Section V concludes this work.
II. CIJ COMPENSATION USING TX CLOCK TIMING
In this work, different clock edges are used for data sampling at TX according to the different signal modes. The clock edge for the even mode signal (E) is set earlier than that of the static mode signal (S), while the clock edge for the odd mode signal (O) is set later than that of the static mode signal. If the timing difference among these clock edges is adjusted to be the same as the difference in the channel propagation delay among the signal modes, all the signals with different signal modes arrive at the RX side at the same time. This maximizes the eye opening at RX.
Fig. 2(a) shows the block diagram of the transmitter (TX) proposed in this work for a pair of microstrip lines. It consists of a delay block for the clock signal (Fixed DelayBlock), a mode detector (ModeDetector), a 3-to-1 multiplexer (3to1Mux) and two output sampling circuits (DFF1, DFF2). Fixed DelayBlock generates three clock signals (CLKD) with slightly different edge timings for the three signal modes. ModeDetector determines the signal mode (MODE[1:0]) by referring to the two digital input signals (D1 and D2) applied to a pair of microstrip lines [6]. 3to1Mux generates the final TX timing clock (CLKT) for the output drivers by selecting one of the three CLKD signals according to the signal mode (MODE[1:0]). The TX data timing is performed by applying the clock signal (CLKT) to the two output drivers with D flip-flops.
The basic principle of this work (the CIJ reduction by the TX clock timing) is the same as that of the authors' prior work [6]. This work simplified the variable delay circuit (Variable DelayBlock) in Fig. 2(b) by using a combination of a fixed delay circuit (Fixed DelayBlock) and a multiplexer (3to1Mux) in Fig. 2 [6].
3to1Mux determines CLKT (the clock for 4to1Mux) by selecting one of the three CLKD according to MODE[1:0]. The TX data timing is performed by applying the shared clock (CLKT) to the two D flip-flops (DFF). The output driver is implemented by using a single-ended NMOS open drain driver with an on-die termination resistor (ODT).
Fig. 4 shows the receiver circuit of this work, which is made up of simple circuits to monitor the eye shmoo plot and BER by controlling the Ext_Vref level and the Ext_CLK_RX delay. The PreAmp is implemented by using an NMOS differential pair with an input ODT.
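The TX-side mode detection and clock selection described above can be sketched behaviorally as follows. This is a minimal model under stated assumptions, not the circuit of this work: the mode is assumed to be classified from the transition directions of D1 and D2 (same direction is even, opposite directions is odd, anything else is static), the even and odd channel delays are assumed to be spaced symmetrically around the static delay, and all names and numeric values (`detect_mode`, `launch_offset_ps`, `delta_ps`, `static_ps`) are hypothetical.

```python
# Behavioral sketch of mode detection and mode-dependent TX clock selection.
# The classification rule, symmetry assumption, and all names are hypothetical.
from enum import Enum
from typing import Tuple


class Mode(Enum):
    EVEN = "even"
    STATIC = "static"
    ODD = "odd"


def detect_mode(prev: Tuple[int, int], curr: Tuple[int, int]) -> Mode:
    """Classify the signal mode from the previous and current (D1, D2) bits."""
    d1 = curr[0] - prev[0]  # +1 rising, -1 falling, 0 no transition
    d2 = curr[1] - prev[1]
    if d1 != 0 and d1 == d2:
        return Mode.EVEN    # both lines switch in the same direction
    if d1 != 0 and d1 == -d2:
        return Mode.ODD     # the lines switch in opposite directions
    return Mode.STATIC      # at most one line switches


def launch_offset_ps(mode: Mode, delta_ps: float) -> float:
    """Clock selection (3to1Mux role): even launches early, odd launches late."""
    return {Mode.EVEN: -delta_ps, Mode.STATIC: 0.0, Mode.ODD: +delta_ps}[mode]


def channel_delay_ps(mode: Mode, static_ps: float, delta_ps: float) -> float:
    """Assumed channel model: even is slower and odd is faster than static."""
    return static_ps + {Mode.EVEN: +delta_ps, Mode.STATIC: 0.0, Mode.ODD: -delta_ps}[mode]


def arrival_ps(mode: Mode, static_ps: float, delta_ps: float) -> float:
    """Arrival time at RX relative to the static-mode launch edge."""
    return launch_offset_ps(mode, delta_ps) + channel_delay_ps(mode, static_ps, delta_ps)
```

When the launch offset matches the channel delay difference, `arrival_ps` returns the same value for all three modes, which is the condition the text gives for maximizing the eye opening at RX.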
DelayBlock
Fig. 5 shows the circuit schematic of DelayBlock, which generates three delayed clock signals CLKD (CLKD_Even, CLKD_Static, CLKD_Odd) for each phase of the 4-phase clock signal CLK. The amount of delay for each signal mode is determined by the number of unit NMOS capacitors connected to the inverter of DelayBlock. To control the amount of delay linearly, a 9-bit thermometer code (DC[8:0]) is used to control the number of unit capacitors.
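The thermometer-coded capacitor loading can be sketched with a simple linear model. The sketch assumes that each connected unit capacitor adds a fixed delay increment and that the CLKD_Even, CLKD_Static, and CLKD_Odd paths load 0, 1, and 2 unit capacitors per active code bit, respectively; the increment `UNIT_DELAY_PS` and the helper names are hypothetical and are not taken from the schematic.

```python
# Linear behavioral model of the DelayBlock capacitor loading.
# UNIT_DELAY_PS and the path weights are assumptions for illustration only.
UNIT_DELAY_PS = 2.0  # hypothetical delay added per connected unit capacitor
PATH_WEIGHT = {"CLKD_Even": 0, "CLKD_Static": 1, "CLKD_Odd": 2}


def thermometer(n: int, width: int = 9) -> int:
    """Return a width-bit thermometer code with the n lowest bits set."""
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and width")
    return (1 << n) - 1


def active_units(code: int) -> int:
    """Count the set bits, i.e. the unit capacitors enabled by the code."""
    return bin(code).count("1")


def clkd_delays_ps(code: int, base_ps: float = 0.0) -> dict:
    """Delay of each CLKD output for a given DC[8:0] code."""
    n = active_units(code)
    return {
        name: base_ps + weight * n * UNIT_DELAY_PS
        for name, weight in PATH_WEIGHT.items()
    }


# Example: four unit capacitors enabled (DC[8:0] = 0b000001111).
example = clkd_delays_ps(thermometer(4))
```

Because every additional code bit enables one more unit capacitor, the delay in this model grows in equal steps with the code, which is the reason a thermometer code gives linear control.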
To make the delay between the even and odd mode clock signals (CLKD_Even and CLKD_Odd) twice that between the even and static mode clock signals (CLKD_Even and CLKD_Static), the number of unit capacitors connected in the CLKD_Odd path (x2) is set to twice that in the CLKD_Static path (x1) for the same DC[8:0] code. Compared to the authors' previous work [6], the number of delay stages is reduced significantly in this work. This significantly reduces the amount of jitter generated in the delay block.
IV. MEASUREMENT RESULTS
Fig. 7 shows the test setup of this work, which consists of the TX and RX chips with the on-die termination resistors and the two parallel data channels. The data channels consist of 2-coupled microstrip lines on FR4 PCB. The microstrip lines are 5 inches long.
The microstrip lines are deliberately kept rather short to minimize ISI (inter-symbol interference). The spacing between the microstrip lines is minimized to the limit of the PCB fabrication process. A 2^7-1 PRBS data pattern is applied to TX. A sampling oscilloscope (86100C) is used to measure the eye-diagrams at the TX and RX chip pins.
Fig. 8 shows the measured eye-diagrams, with (w/) and without (w/o) the CIJ compensation at the data rate of 2.6 Gbps. The CIJ compensation reduced the RX jitter by 41.1 ps from 88.9 ps to 47.8 ps. The control code CIJ_AMOUNT[3:0] was externally adjusted to minimize the RX jitter.
Fig. 9 shows the RX eye-diagrams of Fig. 8, which are enlarged in the time domain at two horizontal scales (65 ps and 20 ps per division). In the jitter histogram measured at the mid-voltage level, three distinct peaks can be observed for the case without the CIJ compensation. With the CIJ compensation, only a single peak can be observed in the jitter histogram. Based on these observations, we can conclude that the CIJ compensation eliminates the CIJ component of the jitter completely. The RX jitter is reduced from 88.9 ps to 47.8 ps by the CIJ compensation. The voltage margin, measured at the center time of the eye-diagram, is reduced slightly from 400 mV to 394 mV.
Fig. 10 shows the measured RX eye-diagram enlarged in the time domain at the higher data rates. At 3.2 Gbps, the RX jitter is reduced from 88.9 ps to 51.1 ps by the CIJ compensation. At 3.8 Gbps, it is reduced from 88.4 ps to 53.3 ps.
Fig. 11 presents the measured RX jitter at different data rates, with and without the CIJ compensation. Because the CIJ only depends on the geometry of the coupled transmission line [6], the measured CIJ (Δ) does not show any significant changes with the data rate. The CIJ values calculated from the measurements shown in Fig. 11 are almost constant at 38 ± 3.1 ps for the data rates from 2.6 Gbps to 3.8 Gbps.
Fig. 12 shows the measured shmoo plot at the data rate of 3.8 Gbps. An Agilent N4903A high-performance serial BERT is used to measure the BER threshold points (1E-12) by controlling the delay time (Ext_CLK_RX in Fig. 4) in the x-axis and the threshold voltage of the preamp (Ext_VREF in Fig. 4) in the y-axis. The comparison of the two cases with and without the CIJ compensation in Fig. 12 shows that the voltage margins are the same and the time margin is increased by 35 ps, about 13% of the unit interval (UI), from 155 ps to 190 ps. Due to the clock timing control scheme, the BER opening range is shifted to the right in the time axis for the case with the CIJ compensation.
Fig. 13 shows the measured bathtub curve at the Ext_Vref value of 1.41 V (mid-voltage level), chosen to maximize the time margin. The CIJ compensation increased the eye opening with BER less than 1E-12 by 0.13 UI, from 0.59 UI to 0.72 UI, at the data rate of 3.8 Gbps.
Table 1 compares the performance of this work and the authors' prior work [6]. The prior work [6] has a 4-inch-long 3-channel architecture, while this work has a 5-inch-long 2-channel architecture for data channels. The RX jitter reduction was improved to 38 ps in this work compared to 28 ps in [6]. This difference is considered to be due to the improved circuit implementation of this work, especially the reduction of delay stages in DelayBlock (Fig. 3).
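The arithmetic behind the quoted jitter reduction and time margins can be checked with a short script. It uses only the three RX jitter pairs quoted in the text (Fig. 11 may contain additional data points that are not reproduced here), and the variable and function names are chosen for illustration.

```python
# Check of the quoted measurement arithmetic, using only numbers from the text.
# Data rate (Gbps) -> (RX jitter without, RX jitter with) compensation, in ps.
RX_JITTER_PS = {2.6: (88.9, 47.8), 3.2: (88.9, 51.1), 3.8: (88.4, 53.3)}

# Jitter reduction (the CIJ value, delta) at each data rate.
reductions = {
    rate: round(without - with_comp, 1)
    for rate, (without, with_comp) in RX_JITTER_PS.items()
}

# Mean reduction and the largest deviation from that mean.
mean_ps = sum(reductions.values()) / len(reductions)
spread_ps = max(abs(r - mean_ps) for r in reductions.values())


def to_ui(time_ps: float, rate_gbps: float) -> float:
    """Convert a time in ps to unit intervals; one UI is 1000 / rate_gbps ps."""
    return time_ps * rate_gbps / 1000.0


# Time margins from the shmoo plot at 3.8 Gbps, expressed in UI.
margin_without_ui = to_ui(155.0, 3.8)
margin_with_ui = to_ui(190.0, 3.8)
```

For these three data points the mean reduction and its largest deviation agree with the quoted 38 ± 3.1 ps, and the 155 ps and 190 ps time margins convert to about 0.59 UI and 0.72 UI at 3.8 Gbps, matching the bathtub-curve values.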
V. CONCLUSIONS
A CIJ compensation circuit is proposed for a pair of parallel microstrip lines; it sends data earlier or later at the transmitter depending on the signal mode. The scheme uses a delay block with a minimal number of delay stages, which generates three different sampling clocks (CLKD) in advance, and a 3-to-1 multiplexer that selects one of them as CLKT. Compared to the authors' prior work [6], the Variable DelayBlock circuit is replaced by a combination of simplified circuits (Fixed DelayBlock and a 3to1Mux). It is implemented by using a 0.18 μm CMOS process. The proposed transmitter works at the data rates from 2.6 Gbps to 3.8 Gbps and reduces the RX jitter by about 38 ps with the 5-inch-long microstrip lines on FR4 PCB, which have the minimum allowed spacing between transmission lines to maximize CIJ. Compared to the authors' prior work, the amount of RX jitter reduction was increased from 28 ps to 38 ps.
ACKNOWLEDGMENTS
This research was supported by WCU (World Class University) program through the National Research Foundation of Korea funded by the Ministry of 
