Abstract-This paper proposes a tree-topology multiplexer (MUX) that employs a multiphase low-frequency clock rather than a high-frequency clock. Analysis and simulation results show that the proposed design can achieve higher bandwidth and is less sensitive to process variations than the conventional single-stage MUX. To verify its feasibility, the proposed design is integrated with a multiphase phase-locked loop and a low-voltage differential signaling driver in a 0.18-μm CMOS technology. Measured results indicate that the proposed design can operate at up to 7 gigabits/s under a 0.3-UI jitter limitation.
I. INTRODUCTION

SERIALIZED data transmission systems are usually adopted when the ratio of the on-chip data bandwidth to the off-chip I/O pin count becomes large. Multiplexers (MUX) and demultiplexers (DEMUX) are applied to convert parallel low-speed data into serial high-speed data or vice versa. Conventionally, there are tree-type [1] and single-stage [2] MUX architectures.
A tree-type MUX, as shown in Fig. 1, is composed of multiple 2-1 MUX cells organized in a tree structure. It requires a high-frequency clock, at half the data rate, for the final stage. The clock is then divided to control the successive stages. At each stage, D-type flip-flops (DFFs) latch the data temporarily so that the two input data are out of phase. This guarantees sufficient setup and hold time for the output switch, which allows high bandwidth. However, the bandwidth demands on the clock buffers and registers result in extra power consumption and circuit area.
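As a rough illustration of the clocking burden described above, the sketch below lists the clock frequency that each stage of a conventional tree-type MUX would need. It assumes that the clock is divided by two from one stage to the next, which the text does not state explicitly, and the function name `tree_mux_stage_clocks` is made up for this example.

```python
import math


def tree_mux_stage_clocks(num_inputs, data_rate_hz):
    """Return the per-stage clock frequencies, output stage first.

    Assumptions (not taken from the paper): num_inputs is a power of two,
    and the clock is divided by two between adjacent stages.
    """
    stages = int(math.log2(num_inputs))
    if stages < 1 or 2 ** stages != num_inputs:
        raise ValueError("num_inputs must be a power of two and at least 2")
    # The output stage runs at half the data rate; each earlier stage at half of that.
    return [data_rate_hz / 2 ** (k + 1) for k in range(stages)]


# Hypothetical 8-1 MUX at 8 gigabits/s
clocks = tree_mux_stage_clocks(8, 8e9)
print([f / 1e9 for f in clocks])  # [4.0, 2.0, 1.0] (in GHz)
```

The fastest clock in this list is the one that the multiphase approach avoids generating and distributing.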
A single-stage MUX, as shown in Fig. 2, is composed of multiple open-drain NAND cells and is driven by a low-speed multiphase clock. As a result, its area and power consumption are lower than those of a tree-type MUX. However, because of the large parasitic loading at its output node, its speed is also lower.
A multiphase clock generator is usually implemented by a multistage ring oscillator (OSC), whereas a high-frequency clock generator is normally implemented by an LC-tank OSC. Multiphase clock generators are likely to have wider frequency ranges than high-frequency clock generators [3], [4]. Low-cost and wide-range transceivers can be implemented by using multiphase clock generators [5]-[7]. However, as stated earlier, the speed limitation is the main drawback.
In this paper, we propose a multiphase-clock-based tree-topology MUX in order to achieve high speed and low power at the same time. The same 2-1 MUXs are used both as MUX cells and as the clock deskew module to eliminate the skew between the data paths and the clock paths. Without retiming DFFs, the area overhead and power consumption can be reduced.
This paper is organized as follows. Section II describes the proposed MUX architecture and its detailed operations. Section III analyzes the proposed MUX and compares its jitter performance with that of a single-stage MUX, both mathematically and through simulation. Section IV shows the chip implementation and measured results. Finally, Section V concludes this paper.
1549-8328/$25.00 © 2009 IEEE
II. PROPOSED MUX
Fig. 3 shows the proposed MUX structure and its timing diagram. The structure is similar to that of a tree-type MUX, with multiple 2-1 MUXs organized in a binary tree. Note that no retiming DFF exists in the proposed MUX. The major distinguishing feature is the use of low-speed multiphase clocks for the tree-type MUX. The parasitic parameters at each stage are minimized by multiplexing only two inputs, so high bandwidth is achieved. Unlike that of the single-stage MUX, the performance of the tree-type MUX remains the same regardless of the number of inputs. The intersymbol interference (ISI) remains unchanged because the output parasitic effects are constant. Note that a single-stage MUX deteriorates as the number of inputs increases.
Although the proposed tree-type structure solves the speed limitation and alleviates the jitter problem, it still has several drawbacks. The delay path mismatch creates deterministic jitter, as shown in Fig. 4. Let $T_D$ and $T_C$ denote the delays from the data input and the control input of a 2-1 MUX cell to its output, respectively. The data therefore reach the output with different delays, depending on which clock controls them. For example, the delay associated with the edge of clock phase 2b is $T_C + 2T_D$, while the delay of D0, or of the edge of clock phase 0, is $T_C$. This mismatch is transformed into a data period variation. For the 8-1 MUX in Fig. 4(b), the data periods are $T + 2T_D$ and $T - 2T_D$. Here, $T$ is the data period, and the delay skew is $2T_D$. For a general $N$-1 MUX, the maximal skew can be derived as $(\log_2 N - 1)\,T_D$. Fig. 5 shows the jitter caused by such a period variation.
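To make the period variation concrete, the sketch below evaluates the skew for a few tree sizes. It is only an illustration under stated assumptions: the maximal skew is taken to be (log2 N - 1) data-input delays of a 2-1 MUX cell, which matches the 8-1 example above, and the 20-ps cell delay and 8-gigabits/s data rate are hypothetical values, not figures from the paper.

```python
import math


def max_skew(num_inputs, data_input_delay):
    """Maximal delay skew of an uncompensated N-1 tree MUX.

    Assumption: an edge entering at the first stage crosses
    (log2 N - 1) more data paths than an edge entering at the output stage.
    """
    stages = int(math.log2(num_inputs))
    if stages < 1 or 2 ** stages != num_inputs:
        raise ValueError("num_inputs must be a power of two and at least 2")
    return (stages - 1) * data_input_delay


def period_extremes(data_period, skew):
    """Longest and shortest data periods produced by a given skew."""
    return data_period + skew, data_period - skew


cell_delay = 20e-12   # hypothetical data-input delay of one 2-1 MUX cell
period = 125e-12      # bit period at a hypothetical 8 gigabits/s
for n in (8, 16, 32):
    skew = max_skew(n, cell_delay)
    longest, shortest = period_extremes(period, skew)
    print(f"N={n}: skew={skew * 1e12:.0f} ps, "
          f"periods={longest * 1e12:.0f}/{shortest * 1e12:.0f} ps")
```

With these illustrative numbers the skew grows by one cell delay each time the number of inputs doubles, which is why the mismatch has to be compensated rather than tolerated.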
In order to solve this delay mismatch problem, delay-matching buffers are inserted, as shown in Fig. 6. The delay-matching buffers are exactly the same as the 2-1 MUX cells used in the data path. Their purpose is to balance the skew of the data-input delay in each stage of the data path. By letting the clocks go through the same MUX cells, the skews are compensated. Since the tree-type MUX cells and the delay-matching buffers are identical, the design is less sensitive to process, voltage, and temperature variations. This is verified by the analysis and simulation later in this paper.
III. TIMING JITTER ANALYSIS
The bandwidth of the MUX is determined by the jitter performance in addition to the 3-dB bandwidth of the MUX cell. The sources of deterministic jitter include process variations, simultaneous switching noise (SSN), and ISI. Process variation causes mismatch between control phases. The SSN caused by the large current change during the transition generates power supply noise. ISI becomes significant when the data transition time is close to or larger than the data period.
In order to compare the bandwidth, the jitter performances of the proposed MUX and the single-stage MUX are analyzed under the influence of process variation and ISI.
A. Jitter Caused by Process Variation
Process variation affects many aspects of a circuit. Among them, the performance of the transistors and their associated parasitic capacitances are closely related to the jitter performance. For a 2-1 MUX cell, its driven node can be modeled as a simple one-pole system. Let $t_d$ denote the delay time for a signal to reach 50% of its amplitude in a one-pole system. Then, $t_d$ can be derived as follows:
$$t_d = RC \ln 2 \approx 0.69\,RC \tag{1}$$
The delay is linearly proportional to the time constant $RC$. Here, $R$ is the channel turn-on resistance of the driving transistor, and $C$ is the total loading capacitance. Since $R$ and $C$ are changed under process variation, the delay time variation can be derived, to first order, as follows:
$$\Delta t_d \approx 0.69\,(C\,\Delta R + R\,\Delta C) \tag{2}$$
The delay variation $\Delta t_d$ can be regarded as jitter for the following reasons. In a MUX, data pass through different paths, and the variations in the path delays create timing jitter. According to the statistical analysis of process variations, the variation of the channel resistance greatly exceeds that of the total parasitic capacitance. Therefore, it is concluded that $\Delta t_d$ is dominated by the resistance variation $\Delta R$ and the total capacitance $C$. Consequently, for a conventional single-stage MUX, as shown in Fig. 7, the jitter is derived as (3). The three capacitances in (3) are the parasitic capacitances of the pull-up PMOS, the pull-down NMOS, and the output load, respectively, and $N$ is the number of multiplexing inputs. The sum inside the parentheses is the total capacitance at the output node, and $\Delta R$ is the variation of the channel resistance of the driving transistors.
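The one-pole delay model lends itself to a quick numerical check. The sketch below is a minimal illustration, assuming an ideal step input and hypothetical values of the resistance and capacitance; it computes the 50% delay as RC ln 2 and compares the first-order variation estimate with the exact change in delay.

```python
import math

LN2 = math.log(2.0)


def delay_50(r_ohm, c_farad):
    """50% delay of a one-pole RC node driven by an ideal step."""
    return LN2 * r_ohm * c_farad


def delay_variation(r_ohm, c_farad, d_r, d_c):
    """First-order estimate of the delay change for small d_r and d_c."""
    return LN2 * (c_farad * d_r + r_ohm * d_c)


# Hypothetical operating point: 1 kOhm and 50 fF, with +10% R and +2% C
r, c = 1e3, 50e-15
d_r, d_c = 0.10 * r, 0.02 * c

exact = delay_50(r + d_r, c + d_c) - delay_50(r, c)
approx = delay_variation(r, c, d_r, d_c)
print(f"nominal delay   : {delay_50(r, c) * 1e12:.2f} ps")
print(f"exact variation : {exact * 1e12:.3f} ps")
print(f"first-order est.: {approx * 1e12:.3f} ps")
```

In this example the resistance term contributes most of the variation, which is consistent with the argument that the resistance variation dominates the jitter.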
For the proposed MUX, the jitter is the accumulation of the jitter in the stages that the signal passes through, as shown in Fig. 8 and expressed in (4) and (5). The capacitances in (4) and (5) are the gate capacitances, and $\Delta R$ is the variation of the channel resistance of the driving MOS. The total capacitance in the brackets can be regarded as the total capacitance on the data path. Note that we assume that all nodes are driven by transistors of the same size. Since the single-stage MUX has a parallel structure, its total capacitance is proportional to $N$. However, a tree structure has a $\log_2 N$ complexity. For large $N$, the proposed structure has a smaller jitter.
The upper half of Table I lists the sizes and extracted capacitances used in both MUXs when simulating the jitter caused by process variation. The lower half lists the total capacitance, obtained by (3) and (5), for MUXs with different numbers of inputs (8, 16, and 32). As one can see, single-stage MUXs have less jitter when $N$ is small, whereas tree-type MUXs are better when $N$ is large. For $N = 16$, they have the same jitter performance. Fig. 9 shows the results of Monte Carlo simulation using HSPICE. Thirty samples are taken and averaged for each case. As one can see, the proposed MUX equals the single-stage MUX when $N = 16$. It is much better when $N = 32$, as suggested in Table I. Of course, the single-stage MUX is better when $N = 8$.
B. ISI Jitter Analysis
Fig. 10 shows the simulated eye diagram. The jitter is caused by ISI effects. Here, $t_r$ and $t_f$ are the times at which the rising and falling output waveforms pass through the 1/2 level, and the jitter follows from these two crossing times. To calculate it, the $s$-domain and time-domain transfer functions, namely, $H(s)$ and $h(t)$, respectively, must be obtained first. The impulse responses of the MUX system are given in (6) and (7). Once the transfer function is known, $t_r$ and $t_f$ can be solved with mathematical software such as MATLAB, and the jitter is then obtained.
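As an illustration of how such crossing times can be solved numerically, the sketch below uses a deliberately simplified model rather than the transfer functions of (6) and (7): a single-pole output node with time constant `tau`, a swing normalized to the range 0 to 1, and a threshold of 1/2. It compares a rising edge that starts from a fully settled low level with one that follows a single low bit, and takes the difference between the two crossing times as the ISI jitter. The model, the function names, and the numerical values are assumptions made for this example, and a bisection search in Python stands in for MATLAB.

```python
import math


def crossing_time(start_level, tau, threshold=0.5):
    """Time for v(t) = 1 - (1 - start_level) * exp(-t / tau) to reach threshold.

    Solved by bisection so that the same routine would also work for a
    response without a closed-form inverse.
    """
    if start_level >= threshold:
        raise ValueError("waveform never dropped below the threshold")

    def v(t):
        return 1.0 - (1.0 - start_level) * math.exp(-t / tau)

    lo, hi = 0.0, 50.0 * tau  # v(lo) < threshold < v(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if v(mid) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def isi_jitter(tau, bit_period):
    """Spread of the rising-edge crossing time caused by the preceding bits."""
    settled = crossing_time(0.0, tau)          # edge after a long run of zeros
    residual = math.exp(-bit_period / tau)     # level left after a single zero bit
    early = crossing_time(residual, tau)       # edge after a single zero bit
    return settled - early


tau = 30e-12  # hypothetical output time constant
for rate in (2.5e9, 5e9, 7e9, 8e9):
    period = 1.0 / rate
    print(f"{rate / 1e9:.1f} gigabits/s: {isi_jitter(tau, period) / period:.4f} UI")
```

Under this simplified model the jitter grows quickly once the bit period approaches the time constant, which is the qualitative behavior that the analysis in this section quantifies.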
C. ISI Jitter Calculation for the Single-Stage MUX
As shown in Fig. 7, the $s$-domain and time-domain transfer functions of a single-stage MUX are given in (8) and (9), respectively. The time constants at the phase input and the data output are given in (10) and (11), respectively. Substituting (9) into (6), the impulse response of the single-stage MUX is obtained as (12). Substituting (12) into (7), the rising and falling crossing times can be obtained from (13) and (14).
With (11) and (12), by using MATLAB, one can obtain the rising and falling crossing times that satisfy the equations. Again, the ISI jitter can be obtained.
D. ISI Jitter Calculation for the Proposed MUX
The index in (19) is a positive integer. By (6) and (7), we are able to obtain (21) and (22), which are similar to (13) and (14). Similarly, by using MATLAB, one is able to obtain the rising and falling crossing times that satisfy (21) and (22). As a result, the jitter caused by ISI is obtained.
E. Simulation Results of Jitter Caused by ISI
According to the same settings as in Table I, Fig. 11 shows the simulated and calculated jitters for MUXs of different topologies, numbers of inputs, and data rates. The $x$-axis is the data rate, and the $y$-axis is the jitter in unit intervals (UIs). The dotted lines show the results obtained by (13)-(14) for the single-stage MUX and (21)-(22) for the proposed MUX. The standard tree-type MUX in Fig. 1 is also included.
First of all, the analyzed results match well with the simulated results. Second, the proposed MUX has less jitter than the single-stage MUX at the same data rate. Third, the proposed MUX can operate at higher data rates than the single-stage one. Also note that, for the proposed MUX, the ISI jitter increases linearly with the number of stages, i.e., $\log_2 N$, whereas it is linearly proportional to $N$ for the single-stage one.
Compared with the proposed MUX, the standard tree-type MUX has better jitter performance because of the retiming at its output stage. However, its power consumption is another issue.
F. Power Consumption
Fig. 12 shows the circuit structures of the different MUX architectures. For each architecture, the figure gives the number of stages; Cell No is the number of cells used in a stage; and Cell Size is the size scaling of a stage relative to the output stage. For example, for an 8-1 tree-type MUX, the cell sizes are scaled as (1, 1/2, and 1/4) according to the data rate. For logic gates, the currents are normalized to that of a single selector as in (23), which relates the currents of the AND gates, DFFs, buffers, and selectors. For the standard tree-type MUX, the circuit sizes are halved, and the total number of blocks is doubled, stage by stage. Hence, the total current in each stage remains the same, as expressed in (24) and (25). For the single-stage MUX, the sizes of the clock buffer and data registers are 1/2 and $1/N$ of the selector according to their loading effects and operation frequency, respectively. The number of data registers is $N$. Thus, the total current is given by (26).
For the proposed MUX, the size scaling of all the selectors is similar to that for the standard tree-type MUX. The total current is given by (27). Fig. 13 shows the SPICE simulation results of the current consumption for the three MUX architectures. The numbers of inputs are 8, 16, and 32. The total current is dominated by the static current. See also Table II.
IV. CHIP IMPLEMENTATION AND MEASURED RESULTS
A multiphase phase-locked loop (PLL) [8] is used to generate eight-phase clock signals with a wide frequency range. The proposed MUX serializes 8-bit parallel single-ended data into differential outputs at a data rate that is eight times the frequency of the PLL. For off-chip driving, two multistage current-mode buffers are inserted for the MUX and the PLL, as shown in Figs. 15 and 16. The last stage is a low-voltage differential signaling (LVDS) driver [9]. The 50-Ω termination is achieved by a parallel connection of a 112-Ω on-chip poly resistor and the 90-Ω turn-on resistance of the data switches of the LVDS driver.
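The termination value can be checked with a short parallel-resistance calculation. The sketch below is only a sanity check of the arithmetic; the helper name `parallel` is made up for this example.

```python
def parallel(*resistances_ohm):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances_ohm)


# On-chip poly resistor in parallel with the switch turn-on resistance
r_term = parallel(112.0, 90.0)
print(round(r_term, 1))  # 49.9
```

The result is within about 0.2% of the 50-Ω target.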
The predriver outputs two pairs of differential signals to control the four data switches of the LVDS driver. Since P- and N-type switches have different input capacitances, the predrivers are organized differently, i.e., two stages for the N-type switches and three stages for the P-type ones, as shown in Fig. 15. Different predriver stages have different circuits to meet their functional demands. Their circuit diagrams are shown in Fig. 16.
Fig. 17 shows the chip photograph. The chip is fabricated in a TSMC 0.18-μm CMOS process. The PLL and the MUX occupy areas of 0.264 and 0.029 mm², respectively. The measurement is performed on a PCB made of Rogers material. The Agilent 81130A generates the reference clock for the PLL, and the Agilent 11801C measures the eye diagrams. The measurement focuses on verifying the analysis and simulations of the output timing jitter of the proposed MUX at different data rates. Thus, the reference clock was swept from 19.53 to 62.5 MHz, which allows the PLL to oscillate from 312.5 MHz to 1 GHz. As a result, the MUX operates at bit rates from 2.5 to 8 gigabits/s.
Fig. 18 shows the measured jitter at different data rates. PLL and TX denote the jitters measured at the PLL output and the TX output, respectively. The dotted line represents the jitter limit of 0.3 UI set by many serial I/O standards. As one can see, below 7 gigabits/s, the jitter is dominated by the PLL jitter. Normally, a ring-oscillator-type PLL has higher jitter at low frequency. Above 7 gigabits/s, the jitter is dominated by the MUX. These measured results match the simulated results shown in Fig. 11. Both indicate that, above 7 gigabits/s, the jitter begins to rise exponentially due to ISI effects.
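The frequency plan of the measurement can be reproduced with simple arithmetic. The sketch below assumes a PLL multiplication factor of 16, which is inferred from the quoted reference and PLL frequencies rather than stated in the text, together with the eight-fold serialization of the 8-1 MUX. The constant and function names are hypothetical.

```python
PLL_MULTIPLIER = 16  # assumption: inferred from 312.5 MHz / 19.53 MHz
MUX_RATIO = 8        # the 8-1 MUX outputs eight bits per PLL clock cycle


def data_rate_bps(ref_clock_hz):
    """Serial data rate for a given reference clock frequency."""
    return ref_clock_hz * PLL_MULTIPLIER * MUX_RATIO


for ref_mhz in (19.53, 62.5):
    pll_mhz = ref_mhz * PLL_MULTIPLIER
    rate_gbps = data_rate_bps(ref_mhz * 1e6) / 1e9
    print(f"ref {ref_mhz} MHz -> PLL {pll_mhz:.1f} MHz -> {rate_gbps:.2f} gigabits/s")
```

Under this assumption, the two ends of the reference sweep map onto the 2.5 and 8 gigabits/s limits of the measured range.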
With the limitation of 0.3 UI, the maximal operation speed is 7 gigabits/s. Fig. 19 shows the output data-eye diagram at 7 gigabits/s. The data transition time is 70 ps, and the amplitude is 400 mV. Table III summarizes the performance of the test chip. The area and power consumption for the MUX, PLL, PRBS, and LVDS are listed individually. The jitters for the MUX and PLL are also individually listed. At 2.5 and 7 gigabits/s, the peak-to-peak jitters are 92.8 and 42.1 ps, or 0.24 and 0.29 UI, respectively.
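The conversion between peak-to-peak jitter in picoseconds and in unit intervals is a simple ratio to the bit period. The sketch below illustrates it for the 7-gigabits/s figure quoted above; the function name `jitter_in_ui` is hypothetical.

```python
def jitter_in_ui(jitter_s, data_rate_bps):
    """Express a jitter value as a fraction of one unit interval (bit period)."""
    return jitter_s * data_rate_bps


ui = jitter_in_ui(42.1e-12, 7e9)
print(f"{ui:.2f} UI")  # 0.29 UI
```

The value stays just under the 0.3-UI limit, consistent with 7 gigabits/s being the maximal operating speed.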
V. CONCLUSION
In this paper, we have proposed a MUX in tree topology that uses a multiphase low-frequency clock, which is normally applicable to single-stage MUXs only. The parasitic effects at each stage are minimized by multiplexing only two inputs. Therefore, the jitter caused by process variation and ISI is reduced, and the data rate is increased. This has been confirmed by both the mathematical analysis and the circuit-level simulation.
The proposed MUX, with a PLL and LVDS drivers, has been designed and implemented in a TSMC 0.18-μm 1P6M CMOS process. It consumes 30 mW of power at a data rate of 5 gigabits/s. It is able to operate at up to 7 gigabits/s with a peak-to-peak jitter of 42.1 ps, or 0.29 UI. Measured results, as well as simulated ones, suggest that the jitter is dominated by ISI effects when the data rate exceeds 7 gigabits/s. Otherwise, it is dominated by the PLL.
