I. INTRODUCTION
Silicon germanium (SiGe) heterojunction bipolar transistors (HBTs) have become attractive for high-speed and millimeter-wave (mm-wave) applications. The high cutoff frequency of advanced SiGe HBTs, comparable to that of III-V devices, has led to a number of record-breaking building blocks being reported in pure bipolar technologies [1], [2]. Furthermore, SiGe HBTs can be readily integrated with deep-submicrometer CMOS in a BiCMOS technology, paving the way for highly integrated broadband transceivers. However, power dissipation remains a concern in these systems, particularly from the packaging perspective, as heat removal considerations limit the achievable level of single-chip integration. To date, little attention has been paid to methods for reducing power consumption in high-speed SiGe HBT designs.
The main obstacle to lowering power dissipation in these circuits remains the base-emitter voltage $V_{BE}$ of the SiGe HBT, which is presently approaching 1 V when the device is biased at the peak-$f_T$ current density. High-speed digital building blocks usually consist of cascades of emitter followers and bipolar differential pairs. These topologies limit the available voltage headroom and result in supply voltages of 3.3 V for emitter-coupled logic (ECL) [2] and 5 V or higher in [1]. Supply voltages encountered in MOS current-mode logic (CML) circuits are typically 1.5 V or lower for designs implemented in 130-nm technologies. When biased at peak $f_T$, standard and low-threshold 130-nm nMOSFETs require gate-to-source voltages of around 800 and 650 mV, respectively. Hence, replacing HBTs with MOSFETs is a logical option for reducing the supply voltage. As seen in Fig. 1, the $f_T$ and $f_{MAX}$ of a 130-nm nMOSFET are higher than those of 150-GHz SiGe HBTs for bias voltages ($V_{GS}$ or $V_{BE}$) below around 600 mV. This marks a reversal of trends from the 0.5-µm technology node [3] and further supports the use of MOSFETs in low-voltage high-speed applications. Still, even in 90-nm technologies, where reported $f_T$ and $f_{MAX}$ values rival those obtained in SiGe HBTs [4], performance in benchmark high-speed digital circuits such as multiplexers [5] lags that of SiGe implementations [2]. It must be demonstrated that MOSFETs can replace HBTs without sacrificing speed.
This paper reports on an effective combination of HBTs and MOSFETs in a high-speed logic family that allows operation from lower supply voltages than pure bipolar topologies while maintaining the speed of SiGe HBT ECL. Section II examines the advantages and limitations of both the MOS CML and HBT ECL families. MOS CML design is discussed, along with techniques for improving speed by minimizing the voltage swing. Cascode nMOS, bipolar, and BiCMOS topologies for high-speed digital and mm-wave applications are investigated in Section III, which leads to the introduction of a true BiCMOS high-speed logic family. This topology, which places n-channel MOSFETs on the high-speed signal and clock paths, takes full advantage of the best features of the MOSFET and of the SiGe HBT to produce a hybrid topology that is faster than its individual components. Section IV presents the design of a 45-Gb/s BiCMOS decision circuit. The circuit consists of a BiCMOS full-rate retiming flip-flop, a high-sensitivity transimpedance preamplifier, a tuned clock buffer with an integrated mm-wave transformer, and a broadband 50-Ω output driver. Experimental results for the decision circuit, implemented in a 130-nm SiGe BiCMOS technology [6], are reported in Section V.
0018-9200/$20.00 © 2005 IEEE
II. COMPARISON OF HBT AND MOS CURRENT-MODE LOGIC
While $f_T$ and $f_{MAX}$ provide useful figures of merit for device performance, these metrics are not representative of the speed of digital circuits implemented in a given technology. Instead, it is well known that the attainable data rate in HBT or MOS broadband circuits is limited by the RC time constants of the circuit [7]. Thus, optimizing the performance of high-speed digital blocks becomes an exercise in minimizing these time constants.
The basic inverter (INV), shown in Fig. 2 for both HBT and MOS CML, serves as a useful reference for comparing high-speed digital circuit performance. Summing the open-circuit time constants for a chain of HBT or MOS CML INVs with a stage-to-stage size scaling factor of $k$ provides a useful metric for comparing the digital speed of each technology [8]:
(1)
Here, $\Delta V = I_T R_L$ is the (single-ended) voltage swing, given by the product of the tail current $I_T$ and the load resistance $R_L$. $R_b$ and $R_g$ represent the base resistance and gate resistance of the SiGe HBT and nMOSFET, respectively, while $A_v$ is the low-frequency, small-signal voltage gain of the inverter. In both expressions, the first term describes the time constant at the output of the inverter and is approximately equal to the voltage swing divided by the intrinsic slew rate of the device. Since the collector-to-substrate capacitance $C_{cs}$ of the SiGe HBT is smaller than the corresponding drain-to-bulk capacitance $C_{db}$ of the MOSFET [9], the output time constant is lower in the SiGe HBT CML inverter for identical tail currents. Parasitic interconnect capacitance also loads the output and degrades this time constant in both MOS and HBT circuits. The second terms in (1) and (2) account for the input time constant and, for small $k$, are dominated by the $R_b C_{bc}$ and $R_g C_{gd}$ components, respectively, where $C_{bc}$ and $C_{gd}$ are the base-collector and gate-drain capacitances. The small-signal voltage gain exacerbates this time constant through Miller multiplication, especially in HBT INVs. However, this problem is alleviated by cascode inverter topologies, which are often used in clocked CML circuits. In SiGe HBTs, the intrinsic and extrinsic components of the base resistance can be reduced by increasing the emitter length or by using multiple emitter stripes, but both measures increase the base-collector capacitance. Hence, the $R_b C_{bc}$ time constant is approximately constant for a given bipolar technology node, and reducing it has recently been identified as a critical challenge in high-speed SiGe HBT design [10]. For n-channel MOSFETs, the gate resistance of a multifinger device contacted on one side of the gate is determined from the polysilicon sheet resistance $R_{sh}$, the number of gate fingers $N_f$, the finger width $W_f$, the gate length $L_g$, the polysilicon-to-metal1 contact resistance $R_{con}$, and the number of contacts per gate finger $n_{con}$ as

$$R_g = \frac{1}{N_f}\left(\frac{R_{sh}\,W_f}{3\,L_g} + \frac{R_{con}}{n_{con}}\right) \qquad (3)$$
For a given total gate width $W = N_f W_f$, the gate resistance can be reduced by increasing $N_f$. Since the gate periphery remains constant, the gate-to-drain capacitance $C_{gd}$ is unchanged. Therefore, unlike the equivalent $R_b C_{bc}$ time constant in SiGe HBTs, the $R_g C_{gd}$ time constant of the n-channel MOSFET can be reduced through layout optimization.
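As a rough illustration of why the finger count matters, the sketch below evaluates the gate resistance of a fixed-width device for several finger counts. It assumes the common one-sided-contact expression $R_g = (1/N_f)(R_{sh} W_f/(3 L_g) + R_{con}/n_{con})$ with $W_f = W/N_f$; the sheet resistance, contact resistance, gate length, and total width are hypothetical placeholders, not data for the technology in [6].

```python
def gate_resistance(r_sh, w_total_um, n_fingers, l_g_um, r_con, n_con):
    """Gate resistance (ohms) of a multifinger MOSFET contacted on one side.

    Assumes R_g = (1/N_f) * (R_sh * W_f / (3 * L_g) + R_con / n_con),
    with the total width W split evenly into N_f fingers of width W_f.
    """
    w_f = w_total_um / n_fingers
    # Distributed poly resistance of one finger (1/3 factor for one-sided
    # contact) plus the contact resistance of n_con parallel contacts.
    per_finger = r_sh * w_f / (3.0 * l_g_um) + r_con / n_con
    # All fingers are driven in parallel.
    return per_finger / n_fingers


# Hypothetical values: 8 ohm/sq silicided poly, 10-ohm contact, 0.12-um gate.
for n_f in (5, 10, 20):
    r_g = gate_resistance(r_sh=8.0, w_total_um=20.0, n_fingers=n_f,
                          l_g_um=0.12, r_con=10.0, n_con=1)
    print(f"N_f = {n_f:2d}, W_f = {20.0 / n_f:.1f} um, R_g = {r_g:.2f} ohm")
```

With these placeholder values, $R_g$ drops from about 19.8 Ω to about 1.6 Ω as the same 20-µm width is split into more, narrower fingers, while the total width, and hence $C_{gd}$, stays fixed.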
Minimizing the voltage swing can further improve the delays given in (1) and (2). This voltage swing is determined by the minimum voltage required to fully switch the tail current in a differential pair. In bipolar CML, this minimum voltage is theoretically limited to about four times the thermal voltage $V_T$ [11], but the voltage swing is typically chosen in the 200–300-mV range to ensure operation over all process and temperature corners [12] and to compensate for the voltage drop across the parasitic emitter resistance. Note that this choice of voltage swing also sets $A_v$: since $A_v = g_m R_L = \Delta V/(2V_T)$ for a balanced bipolar pair, it lies between roughly 4 and 6 at room temperature. In MOS CML designs, the required voltage swing is strongly dependent on the bias point. The effective gate voltage $V_{eff}$ at which the peak $f_T$ of the MOSFET occurs scales with technology, becoming smaller with every new generation. However, as the simulated data collected over three technology nodes in Fig. 3 indicate, the peak-$f_T$ current density $J_{pf_T}$ remains approximately constant (between 0.3 and 0.4 mA/µm) as the technology scales. This is true for all new MOS generations as a result of the constant-field scaling that has been applied since the 0.5-µm technology node [3], [8]. Consequently, a current-density-centric design philosophy, similar to that commonly employed in bipolar designs [12], is more appropriate for foundry-independent design of MOS high-speed circuits than the $V_{eff}$-centric design philosophy found in even the most recent textbooks (e.g., [11]). In a current-density-centric CML design, the gate width $W$ of the MOSFET is sized relative to the tail current $I_T$ of the differential pair such that the device is biased at one-half of its peak-$f_T$ current density:

$$W = \frac{I_T}{J_{pf_T}} \qquad (4)$$

When the tail current is equally split between the transistors in the differential pair, each device then carries $J_{pf_T}/2$, which corresponds to a $V_{eff}$ of around 300 mV in a 130-nm technology. Biasing at a higher current density degrades the circuit performance, as explained below.
Fig. 4 shows the transconductance $g_m$ of a 130-nm nMOSFET as a function of gate voltage. Its shape is similar to that of the $f_T$ dependence on $V_{eff}$ and exhibits three regions, typical of all deep-submicrometer technologies. At low effective gate voltages, the device follows the classical square-law model and the transconductance varies linearly with $V_{eff}$ [8]. When biased in this region, the differential input voltage required to completely switch the MOS differential pair is given by [11]

$$\Delta V = \sqrt{2}\,V_{eff} \qquad (5)$$

where $V_{eff}$ is the effective gate voltage of each transistor when the tail current is split equally.
As $V_{eff}$ is increased beyond the value at which peak $f_T$ occurs, the large vertical electric field leads to significant mobility degradation. As a result, $g_m$ levels off and eventually decreases at large gate voltages. If the MOSFETs are biased in this region, the drain current varies roughly linearly with $V_{eff}$, and the voltage swing required for full switching can be approximated by

$$\Delta V \approx 2\,V_{eff} \qquad (6)$$

Thus, in the second bias region, both the scalar and the $V_{eff}$ terms are larger, so a greater input voltage is needed to switch the differential pair. Consequently, for a given tail current, both terms in (2) increase, and the bandwidth of the MOS INV is degraded when it is biased in this region. It should be pointed out that the bias point selected by (4) falls in a third bias region, where the transition from the square-law region to the high-vertical-field regime occurs. The actual voltage swing required for complete switching in this region lies between the values predicted by (5) and (6). Nevertheless, these expressions, in conjunction with (2), highlight the need to minimize the voltage swing in order to improve digital circuit speed.
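To see why the required swing grows outside the square-law region, the sketch below numerically finds the differential input that steers the entire tail current into one transistor for two idealized device models. The first is a square-law device. The second is a device whose current is linear in overdrive, a common first-order stand-in for strong mobility degradation that is assumed here rather than taken from the paper. Both models are normalized so that each transistor carries $I_T/2$ at overdrive $V_{eff}$ when the pair is balanced. The 1-mA tail current is hypothetical, and the 300-mV $V_{eff}$ follows the 130-nm value quoted above.

```python
import math


def full_switching_input(drain_current, i_tail, v_eff, rel_tol=1e-9):
    """Differential input (V) at which one device carries (nearly) all of i_tail.

    drain_current(v_ov) returns one transistor's current at overdrive v_ov and
    must be zero for v_ov <= 0. v_eff (balanced overdrive) only bounds the search.
    """

    def off_device_current(v_id):
        # The pair's total current rises monotonically with the common-mode
        # overdrive x, so bisect for the x at which it equals the tail current.
        lo, hi = -v_id, 10.0 * v_eff + v_id
        for _ in range(100):
            x = 0.5 * (lo + hi)
            total = drain_current(x + v_id / 2) + drain_current(x - v_id / 2)
            if total > i_tail:
                hi = x
            else:
                lo = x
        return drain_current(0.5 * (lo + hi) - v_id / 2)

    # The current left in the "off" device falls monotonically with v_id:
    # bisect for the point where it drops below rel_tol * i_tail.
    lo, hi = 0.0, 4.0 * v_eff
    for _ in range(100):
        v_id = 0.5 * (lo + hi)
        if off_device_current(v_id) > rel_tol * i_tail:
            lo = v_id
        else:
            hi = v_id
    return hi


I_TAIL = 1e-3  # A (hypothetical)
V_EFF = 0.3    # V, balanced overdrive quoted for 130 nm


def square_law(v_ov):
    # I_D = (k/2) v_ov^2, normalized so that I_D = I_TAIL / 2 at v_ov = V_EFF
    return I_TAIL * v_ov ** 2 / (2 * V_EFF ** 2) if v_ov > 0 else 0.0


def linear_in_overdrive(v_ov):
    # Constant-g_m device (assumed high-field approximation), same normalization
    return I_TAIL * v_ov / (2 * V_EFF) if v_ov > 0 else 0.0


print(f"square law: {full_switching_input(square_law, I_TAIL, V_EFF):.3f} V "
      f"(sqrt(2) * V_eff = {math.sqrt(2) * V_EFF:.3f} V)")
print(f"linear:     {full_switching_input(linear_in_overdrive, I_TAIL, V_EFF):.3f} V "
      f"(2 * V_eff = {2 * V_EFF:.3f} V)")
```

The square-law device switches fully at about 0.424 V, and the linear device needs about 0.600 V for the same balanced overdrive. A real device biased by the current-density rule sits between these two limits.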
It should be noted that, unlike that of HBT CML, the minimum MOS CML voltage swing given by (5) scales with the technology node, because the effective gate voltage at the optimal bias point decreases with every generation. For example, the voltage swing required in a 90-nm CMOS technology should be smaller than at the 130-nm node. While scaling will be advantageous for future MOS generations, overcoming the intrinsic slew-rate limitation remains a challenge for MOS CML design. Still, with proper attention, MOS CML can be made to profit from its low input time constant: the contribution of the gate resistance can be diminished through layout techniques, and the Miller effect is reduced by the low MOSFET gain. Note, however, that the gain must still be greater than unity to maintain regenerative properties within the logic family.
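The bias rules of this section reduce to simple arithmetic. The sketch assumes an ideal balanced bipolar pair with no emitter degeneration, so that $g_m = (I_T/2)/V_T$ and $A_v = g_m R_L = \Delta V/(2V_T)$. It sizes the MOSFET so that each device carries half its peak-$f_T$ current density when the tail current splits equally, i.e. $W = I_T/J_{pf_T}$. The 3-mA tail current is a hypothetical example.

```python
THERMAL_VOLTAGE_V = 0.02585  # kT/q at 300 K


def bipolar_inv_gain(swing_v, v_t=THERMAL_VOLTAGE_V):
    """Small-signal gain of a balanced bipolar CML inverter (no degeneration).

    Each HBT carries I_T/2, so g_m = I_T / (2 V_T) and
    A_v = g_m * R_L = (I_T * R_L) / (2 V_T) = swing / (2 V_T).
    """
    return swing_v / (2.0 * v_t)


def mos_gate_width_um(i_tail_ma, j_peak_ft_ma_per_um):
    """Gate width that puts each device at half its peak-fT current density
    when the tail current splits equally: I_T / (2 W) = J_pfT / 2."""
    return i_tail_ma / j_peak_ft_ma_per_um


for swing in (0.2, 0.3):
    print(f"swing {swing * 1e3:.0f} mV -> A_v = {bipolar_inv_gain(swing):.1f}")

# Hypothetical 3-mA tail current; peak-fT density 0.3-0.4 mA/um (Fig. 3 range).
for j_pft in (0.3, 0.4):
    print(f"J_pfT = {j_pft} mA/um -> W = {mos_gate_width_um(3.0, j_pft):.1f} um")
```

For 200- and 300-mV swings the gain comes out at about 3.9 and 5.8. Because $J_{pf_T}$ stays near 0.3 to 0.4 mA/µm across nodes, the same width rule carries over to other MOS generations, which is the motivation for the current-density-centric approach.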
III. BiCMOS HIGH-SPEED TOPOLOGIES
As mentioned earlier, supply voltages can be reduced by integrating MOSFETs into bipolar high-speed digital circuits. CML and ECL circuits such as multiplexers and latches consist of transistor stacks similar to cascodes, so it is instructive to examine the frequency response of cascode amplifiers to determine the most efficient way of incorporating MOSFETs. The four cascode structures available in a BiCMOS technology are depicted in Fig. 5, and the corresponding sums of open-circuit time constants are given by (7)–(10). As was the case for the HBT and MOS inverters analyzed earlier, the first two terms in (7)–(10) represent the output and input time constants, respectively. Similarly, the former is lower with an HBT common-base output stage, while the latter can be reduced with a MOS common-source input stage. The third term accounts for the intermediate time constant at the collector (or drain) of the common-emitter (or common-source) input transistor and is inversely proportional to the $g_m$ of the common-base (or common-gate) transistor. Fig. 6 illustrates the time-constant contributions to the delay of each cascode amplifier as estimated from (7)–(10). Upon inspection, the BiCMOS cascode with a MOS common-source and an HBT common-base transistor minimizes each of these terms and is expected to have the best frequency response.
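The open-circuit time-constant (OCTC) estimate behind (7)–(10) can be sketched as follows: the 3-dB bandwidth is approximated as $1/(2\pi\sum_i \tau_i)$. The three entries mirror the input, output, and intermediate-node terms described above. The picosecond values are hypothetical placeholders, not the values plotted in Fig. 6, chosen only to show how a large intermediate time constant, as with a MOS common-gate device, pulls the bandwidth down.

```python
import math


def octc_bandwidth_ghz(time_constants_ps):
    """First-order 3-dB bandwidth (GHz) from open-circuit time constants (ps).

    OCTC approximation: f_3dB ~= 1 / (2 * pi * sum(tau_i)). It assumes a
    dominant pole and is usually a conservative estimate.
    """
    total_s = sum(time_constants_ps.values()) * 1e-12
    return 1.0 / (2.0 * math.pi * total_s) / 1e9


# Hypothetical time constants, in picoseconds.
small_intermediate = {"input": 1.5, "output": 1.2, "intermediate": 0.5}
large_intermediate = {"input": 1.5, "output": 1.8, "intermediate": 2.2}

for name, taus in (("small", small_intermediate), ("large", large_intermediate)):
    f3db = octc_bandwidth_ghz(taus)
    dominant = max(taus, key=taus.get)
    print(f"{name}: f_3dB ~ {f3db:.1f} GHz, largest term: {dominant}")
```

With these placeholder numbers the estimate falls from about 49.7 GHz to about 28.9 GHz once the intermediate node dominates, which is the qualitative trend the text attributes to cascodes with a MOS common-gate stage.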
Fig. 6. Time-constant contributions to the delay of each cascode, estimated from (7)–(10) with a loading factor k = 1: the input and output time constants, and the time constant at the collector (drain) of the common-emitter (common-source) transistor.
To verify this assertion, the cascode structures were fabricated in a production 130-nm SiGe BiCMOS technology [6]. Each device was sized such that it reaches its peak $f_T$ at a bias current of 6 mA. Gate finger widths of 2 µm were used for the MOSFETs as a good compromise that lowers the gate resistance without degrading $f_T$ through an increased contribution of the gate-to-well capacitance to the total gate capacitance. Small-signal S-parameters were measured up to 50 GHz for each amplifier, and the transducer gain is plotted in Fig. 7. The measured 3-dB bandwidths of the HBT, HBT-MOS, and MOS cascodes are 25, 14.5, and 23.5 GHz, respectively, while that of the BiCMOS cascode exceeds the 50-GHz measurement capability. Despite operating from a lower supply voltage, the BiCMOS cascode outperforms even the HBT cascode, showing the benefit of combining the low gate resistance of the MOSFET with the small collector-to-substrate capacitance of the SiGe HBT. The latter is much lower than the corresponding drain-to-bulk capacitance of the nMOSFET, which is responsible for the lower 3-dB bandwidths of the two cascode topologies with MOS common-gate output transistors. An additional concern with the HBT-MOS and MOS cascodes is the intermediate time constant at the source of the common-gate transistor, which is comparable to the input and output time constants and further degrades both the bandwidth and the stability of topologies with a MOS common-gate stage. In fact, the detrimental impact of the intermediate time constant in the MOS cascode has led to the insertion of inductors at this node in certain MOS CML designs [13].
It is well known that the BiCMOS topology has excellent stability, prompting its use in lower-frequency applications such as folded-cascode op amps [11].
It could be argued that the larger gain of the HBT cascode might be traded off for bandwidth exceeding that of the BiCMOS cascode. However, this is not the case: as seen in Fig. 8, the measured maximum available gain (MAG) of the BiCMOS cascode, which exceeds 20 dB at 50 GHz, is comparable to that of the HBT cascode. This makes the BiCMOS topology well suited for numerous high-speed and mm-wave applications, including low-noise amplifiers, voltage-controlled oscillators, broadband amplifiers, and high-speed digital logic. The last of these is explored in the following section.
IV. 45-Gb/s DECISION CIRCUIT DESIGN
The role of the decision circuit is to retime data, potentially at low input levels, using a low-jitter full-rate clock, making it one of the most difficult blocks to design in a broadband serial receiver. A block diagram of the 45-Gb/s decision circuit is shown in Fig. 9. It consists of a BiCMOS D-flip-flop (DFF), a high-sensitivity transimpedance preamplifier, a tuned clock buffer with an integrated transformer, and a broadband 50-Ω output driver. The design of each of these blocks is discussed in this section.
A. BiCMOS D-Flip-Flop
The BiCMOS topology discussed in Section III in the context of CML inverters can readily be applied to more complex digital circuits such as latches or selectors. The BiCMOS implementation of a D-latch is illustrated in Fig. 10. Note that the highest-frequency signal, the full-rate clock, is applied to the input of the device with the lower $f_T$, the nMOS differential pair. A smaller input time constant at this node is more important for maximizing the switching speed than the $f_T$ of the transistor. SiGe HBTs are used for the upper-level data inputs, as their high slew rate results in fast rise and fall times. MOS source-follower (SF) stages are employed instead of HBT emitter followers (EFs) to allow operation from a 2.5-V supply. This is particularly important in the feedback path, since the $V_{GS}$ or $V_{BE}$ of the follower limits the voltage headroom. Low-threshold nMOSFETs would allow the supply voltage to be reduced further, to 1.8 V. With a 2.5-V supply, sufficient voltage headroom is available to use HBT EFs along the clock path to extend the frequency response beyond 50 GHz, but SFs would be required with a 1.8-V supply.
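At the logic level, a flip-flop can be built by cascading two such latches clocked in antiphase (a master-slave arrangement). The behavioral sketch below assumes that arrangement for illustration. It models only the logic function (transparent versus hold), not the CML circuit, its delays, or its voltage levels, and the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DLatch:
    """Behavioral D-latch: transparent while enabled, holds its output otherwise."""
    q: int = 0

    def update(self, d: int, enable: bool) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Hypothetical positive-edge DFF built from two latches in antiphase."""

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def step(self, d: int, clk: int) -> int:
        # Clock low: master tracks D, slave holds the previous bit.
        # Clock high: master holds, slave passes the master's stored bit.
        m = self.master.update(d, enable=(clk == 0))
        return self.slave.update(m, enable=(clk == 1))


ff = MasterSlaveDFF()
ff.step(1, clk=0)         # master acquires a 1 while the clock is low
print(ff.step(1, clk=1))  # 1: passed to the output on the rising edge
print(ff.step(0, clk=1))  # 1: a data change while the clock is high is ignored
print(ff.step(0, clk=0))  # 1: output held while the master acquires the 0
print(ff.step(0, clk=1))  # 0: next rising edge
```

The output changes only on rising clock edges, which is what allows a full-rate clock to remove timing jitter from the incoming data.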
B. High-Sensitivity Transimpedance Preamplifier
Traditionally, INV, EF-INV, or EF-Cherry-Hooper topologies have been preferred for high-speed preamplifiers (e.g., [14]). All require on-chip 50-Ω matching resistors, which add noise and reduce the sensitivity of the preamplifier. While the use of EFs improves bandwidth, this form of series feedback at the input increases the already large optimal noise impedance of the transistors in the INV or Cherry-Hooper gain stages. Unless excessively large bias currents and transistor sizes are employed, a noise-impedance mismatch results when these preamplifiers are placed in a 50-Ω environment, which further degrades the sensitivity.
The transimpedance stage of Fig. 11 is instead employed as the preamplifier of the decision circuit. Amplifiers with transimpedance feedback have only recently been considered as low-noise voltage preamplifiers [15]. The use of shunt feedback lowers the optimal noise impedance, which improves sensitivity in a 50-Ω environment with minimal current consumption. An appropriate choice of the loop gain and feedback resistor results in a broadband impedance match given by (11). A number of mm-wave inductors [16] are employed throughout the circuit to extend the bandwidth without excessive power dissipation. The 40-fF pad capacitance is absorbed into an artificial transmission line with an input inductor [17], while a feedback inductor filters high-frequency noise from the feedback resistor [15]. Another inductor peaks the output node to ensure proper spacing of the open-loop poles for a maximally flat frequency response. The outputs are taken from the collectors of Q1 and Q2 to improve gain. A split-resistor load alleviates headroom concerns, with the ratio of R2 to R1 set to unity for maximum bandwidth [18]. A bipolar INV stage follows the transimpedance stage for additional gain.
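How the input match depends on the loop gain can be illustrated with the first-order shunt-feedback relation $R_{in} \approx R_F/(1+A)$, where $A$ is the magnitude of the open-loop voltage gain of the inverting stage. This textbook approximation is assumed here and is not necessarily the exact form of (11): it ignores the input and feedback inductors and the device input impedance. The gain values are hypothetical.

```python
def input_resistance(r_f, gain):
    """Input resistance of an ideal inverting amplifier with shunt feedback R_F.

    Assumes infinite amplifier input impedance and negligible output
    impedance: R_in = R_F / (1 + A).
    """
    return r_f / (1.0 + gain)


def feedback_resistor_for_match(r_target, gain):
    """R_F that gives R_in = r_target for an open-loop gain magnitude A."""
    return r_target * (1.0 + gain)


r_f = feedback_resistor_for_match(50.0, gain=3.0)  # 200 ohm for A = 3
for a in (2.5, 3.0, 3.5):
    print(f"A = {a}: R_in = {input_resistance(r_f, a):.1f} ohm")
```

A ±0.5 change in gain moves the match between roughly 44 Ω and 57 Ω, which is why the loop gain and $R_F$ must be chosen together.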
C. Clock Buffer With Integrated Transformer
The 45-GHz clock is the highest-frequency signal in the decision circuit, making the design of the clock buffer challenging. Moreover, differential clock signals are required for high-speed logic, but only single-ended mm-wave signal sources are available for testing purposes. At lower frequencies, single-ended-to-differential conversion can be achieved with an active balun or a differential pair with the unused input terminated off-chip. However, at mm-wave frequencies, poor common-mode rejection renders this approach less effective, as the differential outputs exhibit amplitude mismatch and phase misalignment. Previous designs in the mm-wave regime have relied on expensive off-chip techniques [1] or on on-chip components such as rat-race couplers at 80 GHz [19] to perform single-ended-to-differential conversion. Even at 80 GHz, a quarter-wavelength in silicon dioxide is about 470 µm, making the rat-race approach quite area-intensive. In this work, the first silicon-based mm-wave monolithic transformer is used to generate differential clock signals from a single-ended signal source. The transformer, whose die photo is shown in Fig. 12, consists of two coupled symmetric inductors and occupies an area of 45 µm × 45 µm, about 1/100th of the area of the 80-GHz rat-race coupler [19]. The schematic of the clock buffer is shown in Fig. 13. Two tuned stages are cascaded for additional gain to compensate for the limited signal-source power and for losses in the cabling and transformer. The clock buffer uses the BiCMOS cascode introduced in Section III because of its excellent high-frequency performance. Small series resistors intentionally degrade the inductor quality factor, broadening the bandwidth of the otherwise narrowband topology to allow testing over a wide range of data rates.
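The 470-µm figure follows from $\lambda/4 = c/(4 f \sqrt{\varepsilon_r})$. The sketch assumes $\varepsilon_r = 3.9$ for silicon dioxide and a homogeneous dielectric, ignoring fringing fields into the passivation and substrate, so it is only an estimate.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def quarter_wavelength_um(freq_hz, eps_r):
    """Quarter of the guided wavelength in a homogeneous dielectric, in um."""
    wavelength_m = SPEED_OF_LIGHT_M_S / (freq_hz * math.sqrt(eps_r))
    return wavelength_m / 4.0 * 1e6


print(f"{quarter_wavelength_um(80e9, 3.9):.0f} um")  # ~474 um at 80 GHz
print(f"{quarter_wavelength_um(45e9, 3.9):.0f} um")  # ~843 um at 45 GHz
```

At the 45-GHz clock frequency of this design, a quarter-wave section would be even longer, which makes the 45 µm × 45 µm transformer all the more attractive.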
D. 50-Ω Output Driver
A broadband output buffer was designed to drive external 50-Ω loads for testing purposes. The schematic is shown in Fig. 14.
V. EXPERIMENTAL RESULTS
The 45-Gb/s decision circuit was implemented in the 0.13-µm SiGe BiCMOS process mentioned earlier. The chip microphotograph is shown in Fig. 15; the circuit employs a total of 24 mm-wave inductors and one mm-wave transformer. To obtain an adequate self-resonant frequency for mm-wave applications, it is critical to minimize the inductor footprint over the substrate [15]. As each passive device is at most 45 µm × 45 µm, it is feasible to integrate such a large number of them on a single die without consuming an exorbitant area. The total chip area is 1.0 mm × 0.8 mm.
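A quick upper bound supports the area argument: even if all 25 passives used the full 45 µm × 45 µm footprint, they would cover only a few percent of the die. The sketch ignores the keep-out spacing around each passive, which in practice adds to the occupied area.

```python
N_INDUCTORS = 24
N_TRANSFORMERS = 1
FOOTPRINT_UM2 = 45 * 45       # upper bound per passive device
CHIP_AREA_UM2 = 1000 * 800    # 1.0 mm x 0.8 mm die

passive_area_um2 = (N_INDUCTORS + N_TRANSFORMERS) * FOOTPRINT_UM2
fraction = passive_area_um2 / CHIP_AREA_UM2
print(f"{passive_area_um2} um^2 of passives = {fraction:.1%} of the die")
# 50625 um^2 of passives = 6.3% of the die
```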
A separate test chip was also fabricated, consisting of the transimpedance preamplifier and the broadband output driver. Fig. 16 shows the single-ended S-parameters of this differential test structure, measured on-wafer up to 50 GHz with the unused input and output terminated off-chip in 50 Ω. The 3-dB bandwidth is approximately 45 GHz, and the input and output reflection coefficients (S11 and S22) are both below -10 dB over the entire measured frequency range. Because feedback is used to set the input impedance, the single-ended and differential impedance matches are not identical. If a signal is applied only to the positive input of Fig. 11 and the negative input is terminated, only a portion of this signal is amplified and fed back to the positive input; the remainder is injected through the emitters of Q1 and Q2 into the other half-circuit of the differential amplifier. Nonetheless, although this amplifier was designed for a differential 100-Ω match, single-ended measurements still show a close match to the 50-Ω signal source.
Eye diagrams for the 45-Gb/s decision circuit were also measured on-wafer using 65-GHz GGB probes, 12-in 2.4-mm cables, and an Agilent 86100A DCA with 86118A 70-GHz dual remote sampling heads and an external precision timebase. As the Anritsu MUX delivers a fixed 2-V swing to a 50-Ω load, power attenuators were inserted between the output of the MUX and the input of the decision circuit to reduce the signal level to between 40 and 200 mV. This signal was applied to one side of the differential circuit, with the unused input terminated off-chip in 50 Ω. As shown in Figs. 17–19, operation up to 45 Gb/s was verified by applying a PRBS pattern and a 45-GHz clock from an Anritsu 43.5-Gb/s MP1801A MUX and pattern generator to the data and clock inputs, respectively. These measurements exceed the factory-specified range of the Anritsu MUX, which led to a slight increase in both the input clock and data jitter at 45 Gb/s. Fig. 17 presents 40-Gb/s input and output eye diagrams for a 60-mV single-ended (30 mV per side) input signal, indicating excellent sensitivity due to the low-noise input transimpedance stage. The high dynamic range of this input stage also allows for larger input amplitudes, as demonstrated in Fig. 18. Finally, a 45-Gb/s output eye is shown in Fig. 19. Although the high slew rate suggests that operation could exceed 45 Gb/s, the bandwidth of the tuned clock buffer limits testing: in 48-Gb/s measurements, significant jitter was observed at the output, indicating that the clock no longer performs the retiming function. Circuit performance, including a breakdown of power consumption per block, is summarized in Table I.
VI. CONCLUSION
A novel BiCMOS logic topology, based on the BiCMOS cascode, has been introduced that takes full advantage of the best features of both the n-channel MOSFET and the SiGe HBT to maximize high-speed performance. The large gain and bandwidth of the BiCMOS cascode make it well suited for mm-wave applications, such as the tuned 45-GHz clock buffer employed in this decision circuit. When applied to high-speed logic circuits, the topology allows the supply voltage to be reduced relative to pure SiGe HBT implementations without compromising speed. A 45-GHz retiming flip-flop that consumes 58 mW from a 2.5-V supply was implemented in a 0.13-µm SiGe BiCMOS process using this logic family. CML 60-Gb/s circuits with 30-GHz clocks have recently been reported in 90-nm CMOS [5], suggesting that 40-Gb/s full-rate retiming may not be feasible in CMOS until the 65-nm technology node. The full-rate 45-GHz clock in this work is applied to a 130-nm nMOS differential pair, making this DFF the fastest digital circuit using MOSFETs reported to date. Furthermore, previous demonstrations of full-rate retiming at 43 Gb/s in comparable SiGe technologies have operated from supply voltages of at least 3.6 V [20]. These results indicate that the proposed BiCMOS logic topology is two generations ahead of pure CMOS while operating from lower supply voltages than ECL SiGe HBT implementations.
From 1986 to 2000, he was with Nortel Networks, Ottawa, ON, Canada, during which time he was involved in the areas of semiconductor manufacturing, BiCMOS process development, yield enhancement, device characterization, and reliability. In 2000, he joined STMicroelectronics, Ottawa, where he currently heads an integrated circuits design team. His scientific interests include high-speed broadband and millimeter-wave integrated circuits.
