40-Gb/s 2:1 Multiplexer and 1:2 Demultiplexer in 120-nm Standard CMOS
I. INTRODUCTION
TODAY'S serial data communication systems operate at throughputs between 10 and 40 Gb/s. Up to now, communications integrated circuits (ICs) operating at such high speeds were engineered using GaAs, InP, or SiGe bipolar technologies. Heavy emphasis was placed on finding the right match between circuit techniques and fabrication technology.
This work demonstrates CMOS to be a viable alternative for broad-band circuit design at 10+ Gb/s. The approach is very economical because of the lower production costs, higher yield, and integration density. Recent achievements in CMOS multiplexer and demultiplexer designs which fully exploit the speed potential of a 120-nm standard CMOS technology are presented. Advanced circuit techniques and a state-of-the-art fabrication process are combined to extend speed limits.
Data multiplexers (MUX) and data demultiplexers (DEMUX) are key blocks in high-speed data communication systems. Current 2:1 MUX designs already achieve operating speeds of 20+ Gb/s in CMOS [1], [2]. A 30-Gb/s DEMUX in CMOS has been reported [3], and a complete 10-Gb/s transmitter/receiver with an integrated 16:1 MUX/DEMUX in CMOS has been published [4], [5].
We have designed a 40-Gb/s 2:1 MUX and 1:2 DEMUX using a 120-nm standard CMOS process with six-layer copper metallization. The manufactured nMOS transistors have a cutoff frequency $f_T$ of 100 GHz and a maximum oscillation frequency $f_{max}$ of 50 GHz [6]. All subcircuits of the MUX and DEMUX ICs use current-mode logic (CML) with differential signals. Compared to conventional static CMOS logic, CML allows a reduction of the internal voltage swing, which is essential for high switching speeds. To reach 40 Gb/s, the 2:1 multiplexer uses shunt and series inductive peaking, which nearly doubles the bandwidth.
II. 2:1 MUX CIRCUIT
The 2:1 MUX IC (Fig. 1) consists of a master-slave flip-flop (MS-FF), a master-slave-master flip-flop (MSM-FF), and a multiplexer stage (MUX 2:1). The 2:1 MUX IC features two in-phase differential 20-Gb/s input signals, D1 and D2. The in-phase data input is necessary for higher integration levels such as a 4:1 multiplexer. The 90° phase shift between the MUX stage inputs is achieved by adding an extra latch to one path (MSM-FF). Each latch adds a delay of 90° to the data. The latches apply a high voltage swing of 600 mV to the MUX stage inputs, which is necessary due to the low gain of the transistors at such high frequencies. Finally, the data streams D1 and D2 are multiplexed by the 2:1 MUX stage to a 40-Gb/s output data stream. The 2:1 MUX IC uses no output buffer. Fig. 2 shows the schematic diagram of the MUX circuit. Like all CML gates, the MUX works as a current switch. All transistors of the MUX circuit are nMOS devices because of their higher speed compared to pMOS transistors. Except for the current-source transistors, low-$V_T$ devices with gate lengths of 120 nm are used.
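The interleaving performed by the 2:1 MUX stage can be illustrated with a purely behavioral sketch. It ignores all analog effects (swing, latch delay, bandwidth) and only shows how two half-rate streams become one full-rate stream. The function name `mux_2to1` and the choice that D1 is selected first are assumptions made for illustration, not details taken from the circuit.

```python
def mux_2to1(d1, d2):
    """Interleave two equal-length half-rate bit streams into one full-rate stream.

    Behavioral model only: d1 is assumed to be selected during the first
    clock half-period and d2 during the second.
    """
    if len(d1) != len(d2):
        raise ValueError("input streams must have equal length")
    out = []
    for b1, b2 in zip(d1, d2):
        out.append(b1)  # bit taken from D1
        out.append(b2)  # bit taken from D2, half a clock period later
    return out


if __name__ == "__main__":
    # Two 3-bit half-rate streams produce one 6-bit full-rate stream.
    print(mux_2to1([1, 0, 1], [0, 0, 1]))
```

With two 20-Gb/s inputs, the output of such a model carries twice as many bits per unit time, which corresponds to the 40-Gb/s stream described above.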
The current source consists of two stacked nMOS transistors with a gate length of 0.18 µm. A regular-$V_T$ device and a low-$V_T$ device are connected in series. This configuration increases the output resistance of the current source. Neglecting the body effect, the output resistance of the stacked current source can be written as

$$r_{out} = r_o \, (2 + g_m r_o) \qquad (1)$$

where $r_o$ is the output resistance of M7 and M8, assuming that they have the same output resistance, and $g_m$ is the transconductance of the upper transistor of the stack. Fig. 3 shows the drain current versus drain-source voltage (load voltage) of the current source. The stacked current source has flat current-source behavior. The main disadvantage of stacked current sources is the higher operating voltage needed to keep the devices in saturation. However, the minimum operating voltage is very close to that of a conventional current mirror.
In the design of the multiplexer stage, the stacked current source with device gate length of 180 nm has been chosen. The current source works above 350-mV load voltage.
The MUX stage uses series gating between clock and data inputs. All transistors in the MUX stage data path are of the same size and are 3/5 the width of the clock transistors. The lower width of the data path devices reduces the parasitic capacitance on the output. The MUX uses 70-Ω polysilicon load resistors as low-capacitance loads. The tail current is set to 7 mA. The dc level of the sinusoidal clock signal is .
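As a rough check of the numbers above, the single-ended output swing of a CML stage is the tail current times the effective load resistance. The sketch below assumes full current switching and, for the loaded case, a dc-coupled external 50-Ω load in parallel with the on-chip resistor. Both are simplifying assumptions; the measurements reported later indicate that full current switching is not reached at 40 Gb/s.

```python
def parallel(r1, r2):
    """Resistance of two resistors in parallel (ohms)."""
    return r1 * r2 / (r1 + r2)


def cml_swing(i_tail, r_load):
    """Single-ended swing (volts) of a CML stage, assuming the whole
    tail current is steered from one branch to the other."""
    return i_tail * r_load


I_TAIL = 7e-3   # tail current from the text (A)
R_LOAD = 70.0   # on-chip polysilicon load from the text (ohms)
R_EXT = 50.0    # external load; dc coupling is an assumption (ohms)

swing_unloaded = cml_swing(I_TAIL, R_LOAD)
swing_loaded = cml_swing(I_TAIL, parallel(R_LOAD, R_EXT))

if __name__ == "__main__":
    print(f"unloaded swing: {swing_unloaded * 1e3:.0f} mV")
    print(f"loaded swing:   {swing_loaded * 1e3:.0f} mV")
```

The loaded value shows why the load resistance is a trade-off: a larger resistor raises the swing but moves the output further from a 50-Ω match.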
III. INDUCTIVE NETWORK
Inductive networks can be used to increase the bandwidth of CML circuits. The most common technique is shunt peaking, in which an inductance is connected in series with the load resistor of the CML circuit. This technique can increase the bandwidth of the circuit by approximately 80% if an overshoot of 8% is acceptable and ideal inductors are used. The inductor can be realized as a bond inductance or as an on-chip inductor. Quality factors (Q factors) of on-chip inductors are lower than those of bond inductances; therefore, the latter are often used for peaking [7]. On-chip inductors have also been proposed for use as shunt peaking inductors [8], [9].
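The bandwidth gain from shunt peaking can be explored with the textbook model of an ideal inductor L in series with a load resistor R, driving a single capacitance C. This is a simplified sketch, not the fifth-order output network of the actual circuit. The ratio m = L/(R²C) and the function name `bandwidth_ratio` are notation introduced here for illustration. With x = ωRC, the normalized magnitude is |H|² = (1 + m²x²) / ((1 − mx²)² + x²), and setting it to 1/2 gives a quadratic in x².

```python
import math


def bandwidth_ratio(m):
    """3-dB bandwidth of an ideal shunt-peaked RC load, relative to the
    unpeaked bandwidth 1/(R*C).

    m = L / (R**2 * C) is the peaking ratio (illustrative notation).
    Solves m^2*x^4 + (1 - 2m - 2m^2)*x^2 - 1 = 0 for x = w*R*C.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    if m == 0:
        return 1.0  # no inductor: plain RC low-pass
    a = m * m
    b = 1.0 - 2.0 * m - 2.0 * m * m
    x_squared = (-b + math.sqrt(b * b + 4.0 * a)) / (2.0 * a)
    return math.sqrt(x_squared)


if __name__ == "__main__":
    for m in (0.0, 0.2, 0.41, 0.71):
        print(f"m = {m:4.2f}  bandwidth ratio = {bandwidth_ratio(m):.2f}")
```

In this idealized model the ratio approaches the range quoted above for ideal inductors; losses in real on-chip inductors reduce the achievable gain.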
However, additional pads for bond inductors, or on-chip inductors, require large chip area. Inductors therefore cannot be used in every stage, because the required chip area would be unacceptable. In the case of multiplexers, it makes sense to use inductive networks only at the fastest multiplexer of the system. On-chip inductors have low Q factors due to the limited conductivity of the metal, substrate loss, and parasitic capacitance. With on-chip inductors, shunt peaking can improve the bandwidth by approximately 50%.
Another technique to enhance the bandwidth of CML circuits is series peaking, in which an inductance is connected in series with the output of the CML circuit. The output network acts as a filter which consists of various parasitic capacitances, load resistors, bond inductances, and on-chip inductors. Series peaking is worthwhile only in combination with shunt peaking: used alone, it gives little bandwidth enhancement, but combined with shunt peaking it improves the bandwidth by approximately another 45%. Together, series and shunt peaking can nearly double the bandwidth of a CML circuit. The 2:1 MUX IC uses both peaking techniques to achieve high operating speeds. Fig. 4 shows the equivalent circuit of the output network of a CML circuit with shunt and series peaking. The output network is a fifth-order system consisting of various parasitics, load resistors, a shunt peaking inductor, and a series peaking inductor. The shunt peaking inductor is connected in series with the load resistor, while the series peaking inductor is connected between the circuit and the pads. The differential signals allow mutual coupling of the inductors, which is represented by coupling coefficients in Fig. 4.
A. Shunt and Series Peaking
The load resistors of the CML circuit are realized as polysilicon resistors. The load resistors are 70 Ω, which is a compromise between high voltage swing and reasonable output matching. The external load is 50 Ω. The series resistance of the nonideal shunt peaking inductor adds to the load resistance. The parasitic capacitance at the circuit output, shown in Fig. 4, is the sum

$$C_{par} = C_{tr} + C_{res} + C_{int} \qquad (2)$$

where $C_{tr}$ are the transistor parasitics, $C_{res}$ are the parasitics of the polysilicon load resistor, and $C_{int}$ are the interconnect parasitics. The pad capacitance is separated from this parasitic capacitance by the series inductor. The nonideal series inductor has a series resistance due to the limited conductivity of the metal. The bonding wires are also shown in Fig. 4. The wires are mutually coupled, so the effective bond inductance seen on each side depends on the coupling between them.
The coupling coefficient of the bonding wires in our setup is in the range of . The parasitic capacitances, the pad capacitance, and the bond inductance are kept as small as possible. The shunt peaking inductor and the series peaking inductor can then be optimized to improve the bandwidth.
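For reference, the general relation between mutual coupling and the effective inductance of a differentially driven pair can be written as follows. This is the standard coupled-inductor result, not necessarily the exact expression or sign convention used for the bond wires here; the symbols $L_1$, $L_2$, $M$, $k$, and $L_{eff}$ are notation introduced for this sketch.

```latex
% Mutual inductance of two coupled inductors with coupling coefficient k
M = k \sqrt{L_1 L_2}

% Two equal inductors (L_1 = L_2 = L) carrying a purely differential signal:
% the sign depends on whether the fluxes of the two branches add or cancel.
L_{eff} = L \pm M = L \, (1 \pm k)
```

When the fluxes of the two branches cancel, coupling lowers the effective inductance; when they add, as in a symmetric spiral driven differentially, coupling raises it.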
IV. PEAKING INDUCTORS
The design considerations for inductive peaking so far treat the inductor as an ideal inductance with a series resistance. This model illustrates how to optimize the inductance values of the shunt and series peaking inductors. An on-chip inductor has a more complex model and behaves differently than an ideal inductor. Many investigations have been carried out to characterize inductors precisely at high frequencies [10], [11]. In each application, different behaviors of the inductors have been observed.
The multiplexer shown in Fig. 2 is a fully differential design with fully symmetric peaking inductors. Fig. 5 shows the layouts of the shunt peaking inductor and the series peaking inductor used in the design of the 2:1 MUX IC.
The shunt peaking inductor shown in Fig. 5(a) has a center tap where the supply voltage is applied. The two ports are connected to the load resistors of the circuit shown in Fig. 2. In the equivalent circuit of the inductor, separate elements represent the substrate, the substrate parasitics and losses, and the interwinding capacitance. The parameters are extracted with an in-house tool [11] which uses two cores [12], [13] to extract the inductance and the parasitic capacitances.
The series peaking inductor is shown in Fig. 5(b). Its ports are connected to the drains of transistors M1-M4 of the multiplexer circuit shown in Fig. 2. The coil has three turns and consists of Metal 6 and Metal 5 connected in parallel. The inner diameter is 30 µm and the outer diameter is 62 µm. Fig. 7 shows the equivalent circuit of the symmetric series peaking inductor which is used for the simulation. The symmetric inductor has a coupling coefficient of . The inductance is nH. The effective inductance seen on each side is nH. The series peaking coil is connected in series with the output. Therefore, it is highly desirable to obtain a series inductor with a high Q factor. The series resistance causes additional losses in the output transfer function of the multiplexer and should, therefore, be kept as small as possible. The oxide capacitance can be added to the parasitic capacitances of Fig. 4. A high substrate resistance minimizes the loss due to substrate coupling.
On the other hand, the series resistance of the shunt peaking inductor is in series with the load resistor of the multiplexer and is, therefore, not critical. The quality factor of the shunt peaking inductor is mainly determined by . It is not necessary to optimize the shunt peaking inductor for a high Q factor. Fig. 8 shows the simulated Q factor of the shunt peaking coil and the series peaking coil versus frequency using the models of Fig. 6 and Fig. 7. In the operating mode of the shunt peaking inductor, the parasitics of the center tap (Fig. 6) are shorted to the supply voltage. Therefore, the shunt peaking inductor has a high Q factor of at 20 GHz. In contrast, all parasitics of the series peaking inductor lower the Q factor. The series peaking inductor has a simulated Q factor of at a frequency of 20 GHz.
A. Simulations Using Inductor Models for Shunt and Series Peaking
The models for the shunt peaking coil and the series peaking coil extracted from the inductor geometries can be used in SPICE simulations. The models allow optimization of the circuit. Fig. 9 shows the simulation of the small-signal transfer function of the output network when the inductor models of Fig. 6 and Fig. 7 are used. The simulations are produced with fF, nH, , and . The logarithmic frequency axis in Fig. 9 is normalized to GHz which is the 3-dB cutoff frequency when no inductive peaking is used. A bandwidth improvement of 50% can be achieved when shunt peaking is used. The simulations show that a bandwidth of GHz can be achieved by using shunt and series peaking.
However, the bandwidth of the output network is not the only quantity that must be observed. It can be shown that a constant group delay is highly desirable for digital communications [8], [14]. In the output network, only the peaking inductors are varied and, therefore, a Bessel characteristic, which would be optimum, is not possible. Fig. 10 shows the simulated group delay of the output network when the extended inductor models are used. A flat group delay is highly desirable in order to avoid intersymbol interference. However, the group delay is not flat in any case at frequencies above . If shunt and series peaking are used, the group delay has a simulated delta of ps up to .

The effect of shunt and series peaking in combination can be seen in the step response of the multiplexer circuit. While the transfer function only shows the small-signal behavior, the step response shows the large-signal behavior of the multiplexer circuit. On the clock input of the MUX circuit, a step signal with 2 × 400 mV is applied. Fig. 11 shows the step response of the MUX circuit (single-ended signal). The simulated rise time (20%-80%) when no inductive peaking is used is ps. When shunt peaking is used, the rise time is ps, which is 26% lower. When series and shunt peaking are used, the rise time is ps, which is 31% lower than without peaking.
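The trade-off between bandwidth and group-delay flatness can be illustrated with the idealized second-order shunt-peaked load (ideal inductor L in series with R, single capacitance C). This is a sketch under those assumptions and does not reproduce the extracted inductor models or the fifth-order output network. The normalized frequency x = ωRC, the ratio m = L/(R²C), and the function name `group_delay_norm` are notation introduced here. The delay follows from differentiating the phase of H(jx) = (1 + jmx) / (1 − mx² + jx).

```python
def group_delay_norm(x, m):
    """Group delay, in units of R*C, of an ideal shunt-peaked RC load.

    x = w*R*C is the normalized angular frequency and
    m = L/(R**2 * C) the peaking ratio (illustrative notation).
    """
    # Contribution of the two poles: d/dx of atan2(x, 1 - m*x^2)
    pole_term = (1.0 + m * x * x) / ((1.0 - m * x * x) ** 2 + x * x)
    # Contribution of the zero: d/dx of atan(m*x), which reduces the delay
    zero_term = m / (1.0 + (m * x) ** 2)
    return pole_term - zero_term


if __name__ == "__main__":
    # Compare how much the delay varies over the band 0..1/(RC)
    # for a few peaking ratios.
    for m in (0.0, 0.3, 0.7):
        delays = [group_delay_norm(i / 100.0, m) for i in range(101)]
        spread = max(delays) - min(delays)
        print(f"m = {m:3.1f}  delay spread over band = {spread:.3f} RC")
```

A small spread means the delay is nearly constant over the band, which is the property that limits intersymbol interference.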
Series and shunt peaking in combination cause an overshoot of about 6% in the step response. In reality, a rectangular clock never occurs when the circuit operates at high data rates. Clock and data signals are sinusoidal waveforms.
V. 1:2 DEMUX CIRCUIT
The 1:2 DEMUX IC (Fig. 12) consists of two MS-FFs and output buffers (BUF). When a 40-Gb/s data stream is applied, the MS-FFs are clocked at 20 GHz. To sample every bit of the 40-Gb/s input data, the clock of one MS-FF is in phase while the other one is inverted. A separate buffer for each output decouples the MS-FFs from the 50-Ω environment.
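The 1:2 demultiplexing described above, with one flip-flop sampling on the in-phase clock and the other on the inverted clock, amounts to splitting the full-rate stream into its even and odd bits. The following behavioral sketch shows only that bit-level function and none of the circuit timing. The function name `demux_1to2` and the assignment of even bits to the first output are assumptions for illustration.

```python
def demux_1to2(bits):
    """Split a full-rate bit stream into two half-rate streams.

    Behavioral model only: the first output is assumed to come from the
    flip-flop on the in-phase clock (even-indexed bits), the second from
    the flip-flop on the inverted clock (odd-indexed bits).
    """
    return bits[0::2], bits[1::2]


if __name__ == "__main__":
    q1, q2 = demux_1to2([1, 0, 0, 0, 1, 1])
    print(q1, q2)
```

Each output carries half the bits of the input, so a 40-Gb/s input yields two 20-Gb/s outputs.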
The MS-FF (Fig. 13) consists of two latches connected in series. The latches are realized in the well-proven CML design. As in the MUX, all transistors in the core are low-$V_T$ 120-nm nMOS devices. The latches use series gating between clock and data inputs. All data path transistors are of the same size and are 3/5 the width of the clock transistors. Polysilicon resistors are used as loads. No inductive peaking is used because the large number of latches would require a large chip area for the inductors. Nevertheless, latches can make use of all advantages discussed in Section IV. One latch consumes 7 mA. The clock input is realized with 100-Ω on-chip resistors, which are connected to a dc level shifter. To get the demultiplexed data off the chip, a line driver is used. It consists of a pair of differential amplifier stages. Fig. 14 shows the schematic diagram of the line driver.
In each stage, the tail current is three times the current of the previous stage. The first stage offers a high voltage swing of 660 mV. This high voltage swing drives the second stage, which works as a limiting amplifier. The last differential amplifier is designed to provide enough voltage swing with a 50-Ω load. To achieve full voltage swing, at least 10-GHz bandwidth is needed for the output buffers. To enhance the bandwidth of the output stage, shunt peaking is used. The advantages of shunt peaking are considered in Section III. The shunt peaking inductor is of the same kind used in the 2:1 MUX IC. A drawing of the inductor layout is shown in Fig. 5(a).
VI. TECHNOLOGY
The circuit is fabricated in a 120-nm CMOS technology with six-layer copper metallization and silicon-oxide dielectric ( ). The chip has a size of mm and is determined by the pad frame. The active area is only a fraction of the total chip area. Due to fill structures, the chip micrograph shows only the topmost metal layer. The manufactured nMOS transistors have a cutoff frequency of 100 GHz and a maximum oscillation frequency of 50 GHz [6] .
VII. EXPERIMENTAL RESULTS
Fig. 15 shows the high-frequency test fixture of the 2:1 multiplexer IC. The 2:1 MUX IC is mounted on a mm microwave ceramic. We use SMA connectors and coupled microstrip lines on the substrate. The chip is bonded to the ceramic substrate with a wedge-wedge bonder. The chip is mounted in a cavity so that the surface of the chip is level with the surface of the ceramic substrate. This allows very short bond wires, which increases the bandwidth of the MUX. The MUX has a chip size of mm . Fig. 16 shows a chip micrograph of the MUX.
The multiplexer is tested with two pseudorandom bit sequences (PRBS of ). The input voltage swing is mV . The sinusoidal clock signal has a voltage swing of mV . Fig. 17 shows the measured eye diagram of the differential output signal at a data rate of 40 Gb/s. The measured eye opening is mV on an external 50-Ω load. The bounce at the top and the bottom of the eye shows that the circuit is at its limit and full current switching is not achieved. The transistor models are too optimistic at such high speeds. The 2:1 MUX draws 66 mA at 1.5-V supply voltage.
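Pseudorandom bit sequences of the kind used as test patterns are commonly generated with a linear feedback shift register (LFSR). The sketch below generates a 7-bit sequence with a period of 127 bits. The sequence length, the feedback taps, the seed, and the function name `prbs7` are all assumptions chosen for illustration; they are not the pattern settings used in the measurement.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a 7-bit maximal-length LFSR sequence.

    Assumed feedback: XOR of the two most significant state bits,
    giving a period of 2**7 - 1 = 127 bits. The seed must be nonzero,
    because the all-zero state never changes.
    """
    if not 0 < seed < 128:
        raise ValueError("seed must be a nonzero 7-bit value")
    state = seed
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # feedback taps
        state = ((state << 1) | new_bit) & 0x7F      # shift in the new bit
        out.append(new_bit)
    return out


if __name__ == "__main__":
    seq = prbs7(254)
    print("repeats after 127 bits:", seq[:127] == seq[127:])
    print("ones per period:", sum(seq[:127]))
```

Two such sequences, offset against each other, could serve as the half-rate inputs of a behavioral multiplexer model.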
For measurements, the 1:2 DEMUX IC is mounted on a mm 0.51-mm RO4003 microwave substrate with SMA connectors. The demultiplexer is tested with two PRBS, running at 20 Gb/s, which are then multiplexed by a SiGe 2:1 MUX [15] to get a 40-Gb/s data stream with sufficient voltage swing. The input signal voltage swing is mV . The sinusoidal clock signal has a voltage swing of mV . The DEMUX has a chip size of mm . Fig. 18 shows a chip micrograph of the DEMUX. The peaking coils cause a rapid change in the group delay at the cutoff frequency, which results in intersymbol interference. Fig. 19 (top) shows the 40-Gb/s input eye diagram applied to the differential DEMUX input. The 1:2 DEMUX draws 72 mA at 1.5-V supply voltage. Table I summarizes the features of the chips.
VIII. CONCLUSION
CMOS has been demonstrated to be a viable technology for high-bit-rate broad-band circuit design at 10+ Gb/s. CMOS offers low production costs, high yield, and integration density.
A 40-Gb/s 2:1 MUX IC which fully exploits the high-speed potential of a 120-nm standard CMOS technology has been presented. The multiplexer is a fully differential design realized in CML. The MUX draws 66 mA from a 1.5-V supply and consumes 100 mW. To enhance the bandwidth, the 2:1 MUX IC uses inductive peaking extensively. The advantages of series peaking in combination with shunt peaking were demonstrated.
Inductive peaking can significantly increase the speed of the multiplexer. Simulations show that shunt peaking alone reduces the rise time by 26% compared with the nonpeaked multiplexer. With shunt and series peaking combined, the rise time is reduced by 31%.
Fully symmetric on-chip inductors with high Q factors were used as peaking coils. Precise extraction of the inductor parasitics is necessary to get reliable models for the circuit simulation.
A companion 40-Gb/s 1:2 DEMUX IC was presented. The data latch used in the DEMUX IC uses no inductive peaking and is realized in CML. Output buffers drive the demultiplexed data off the chip. The DEMUX draws 72 mA from a 1.5-V supply and consumes 108 mW.
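The power figures quoted for both chips follow directly from supply current times supply voltage. The short check below uses only the values stated in the text; the helper name `power_mw` is introduced here for illustration.

```python
def power_mw(current_ma, supply_v):
    """DC power in milliwatts from supply current (mA) and voltage (V)."""
    return current_ma * supply_v


mux_power = power_mw(66, 1.5)    # 2:1 MUX: 66 mA at 1.5 V
demux_power = power_mw(72, 1.5)  # 1:2 DEMUX: 72 mA at 1.5 V

if __name__ == "__main__":
    print(f"MUX:   {mux_power:.0f} mW")
    print(f"DEMUX: {demux_power:.0f} mW")
```

The MUX product is 99 mW, which the text rounds to 100 mW; the DEMUX value of 108 mW matches exactly.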
