A 40-Gb/s decision circuit operating from a 2.5-V supply is reported. It includes a flip-flop, a broadband transimpedance preamplifier, a tuned 40-GHz clock buffer, and a 50-Ω output driver. The flip-flop features a novel BiCMOS CML logic topology, which allows lower supply voltages than pure bipolar implementations without compromising speed. A mm-wave transformer performs single-ended-to-differential conversion along the 40-GHz clock path.
Introduction
SiGe bipolar technology has become a popular choice for broadband circuits due to the high $f_T$ of the SiGe HBT and the reliability of the mature silicon-based processing technology. However, the high $V_{BE}$ of the SiGe HBT hinders low-voltage operation. In this paper, a novel BiCMOS logic family is presented that reduces supply voltages through effective use of n-MOSFETs in the high-speed data and clock paths.
MM-Wave BiCMOS Cascode Topologies
Recently, record-breaking high-speed building blocks have been demonstrated in SiGe bipolar technologies [1, 2]. While the performance rivals that found in III-V technologies, the power consumption of these circuits hinders the high levels of integration required for single-chip transceivers. The main obstacle to reducing power consumption remains the $V_{BE}$ of the SiGe HBT, which is close to 1 V when biased at peak-$f_T$ current densities. Cascades of emitter followers and bipolar differential pairs, common to high-speed building blocks, limit voltage headroom and can result in supply voltages of 3.3 V for ECL [2] and 5 V or higher in E²CL designs [1]. Additionally, the performance of modern ECL building blocks is limited by RC delays rather than by the forward transit time. The single largest contribution to gate delay is from the $R_B C_{BC}$ time constant [3]. For a given technology, this product cannot be reduced through layout techniques, since any increase in total emitter length to minimize base resistance results in a commensurate rise in base-collector capacitance.
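The layout independence of this product can be made explicit with a first-order scaling argument. The per-unit-length quantities $r_b$ (in Ω·µm) and $c_{bc}$ (in fF/µm) and the total emitter length $l_E$ are symbols introduced here for illustration; the sketch assumes both parasitics scale ideally with emitter length and neglects fixed contributions.

$$
R_B \approx \frac{r_b}{l_E}, \qquad C_{BC} \approx c_{bc}\, l_E
\quad\Longrightarrow\quad
R_B C_{BC} \approx r_b\, c_{bc}
$$

Under these assumptions the emitter length cancels, so the time constant is set by the technology rather than by device sizing.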
Incorporation of n-MOSFETs into high-speed bipolar topologies can simultaneously alleviate both the [...]

where $W_f$ is the MOS gate finger width, $N_f$ is the number of fingers, $L$ is the polysilicon gate length, $\rho_{poly}$ is the silicided polysilicon sheet resistance, $R_{cont}$ is the polysilicon-to-metal1 contact resistance, and $N_{cont}$ is the number of contacts per gate finger. For a given total gate width, gate resistance can be reduced by increasing $N_f$ without increasing $C_{GD}$, since the total gate-drain periphery is unchanged. This yields a lower input time constant than that of the SiGe HBT. Even for the lowest reported base resistance of 100 Ω·µm [3] and for typical gate polysilicon sheet resistances of 5 Ω per square, the $R_G C_{GD}$ time constant is at least one order of magnitude lower than the $R_B C_{BC}$ time constant.

To validate these findings, test structures were fabricated in a 0.13-µm SiGe BiCMOS technology, with peak-$f_T$ values of 90 GHz at 0.25 mA/µm and 160 GHz at 1.4 mA/µm for the n-MOSFET and SiGe HBT, respectively [4]. The MOS and HBT devices were sized for peak $f_T$ at a bias of 6 mA. Measured $S_{21}$ [...]

Open-circuit time constant analysis shows that the improved bandwidth is due to both the low gate resistance at the input of the amplifier and the low collector-to-substrate capacitance ($C_{CS}$) of the SiGe HBT. The latter is considerably less than the drain-bulk capacitance of the MOSFET for the same tail current. In the HBT-MOS and MOS cascodes, the intermediate time constant at the source of M2 is comparable to the input and output time constants, and leads to severe degradation of stability and bandwidth. While it is textbook knowledge that the BiCMOS cascode improves op amp stability [5], this work is, to the authors' best knowledge, the first to exploit the benefits of lower gate resistance to improve bandwidth.
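The effect of finger count on gate resistance can be illustrated numerically. The sketch below assumes one commonly used expression for a gate contacted on one side, $R_G = (R_{cont}/N_{cont} + \rho_{poly} W_f / (3L)) / N_f$; this form is an assumption and may differ from the authors' equation. The 24-µm total width follows from 6 mA at 0.25 mA/µm, and the 5-Ω-per-square sheet resistance and 0.13-µm gate length come from the text. The 10-Ω contact resistance and the single contact per finger are hypothetical values.

```python
def gate_resistance(total_width_um, n_fingers, gate_length_um=0.13,
                    rho_poly_ohm_sq=5.0, r_contact_ohm=10.0, n_contacts=1):
    """Gate resistance in ohms for a multi-finger MOSFET.

    Assumed model (not taken from the source): each finger contributes its
    contact resistance plus one third of its end-to-end polysilicon
    resistance, and the fingers are connected in parallel.
    """
    finger_width_um = total_width_um / n_fingers
    # Distributed polysilicon resistance of one finger, contacted on one side.
    r_poly = rho_poly_ohm_sq * finger_width_um / (3.0 * gate_length_um)
    r_finger = r_contact_ohm / n_contacts + r_poly
    return r_finger / n_fingers


if __name__ == "__main__":
    total_width_um = 6.0 / 0.25  # 6 mA at 0.25 mA/um gives 24 um
    for n_fingers in (6, 12, 24):
        r_g = gate_resistance(total_width_um, n_fingers)
        print(f"N_f = {n_fingers:2d}: R_G = {r_g:.2f} ohm")
```

Because the total gate width is held fixed, the gate-drain periphery (and hence $C_{GD}$) is unchanged in this model while $R_G$ falls as the finger count rises.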
Its high bandwidth and MAG make the BiCMOS cascode well suited for numerous mm-wave applications, including low-noise amplifiers, voltage-controlled oscillators, folded-cascode op amps, and low-voltage high-speed logic. The last of these is discussed next.
Proposed BiCMOS Logic
A BiCMOS high-speed logic family is now derived from the BiCMOS cascode discussed in the previous section, and is illustrated in the latch of Fig. 4. Departing from previous conventions in bipolar ECL/CML design, the highest-frequency signal is applied to the input of the lower-$f_T$ device, the n-MOS differential pair. The lower input time constant at this node is more important in achieving high bandwidth than the transistor $f_T$. While the latch is used as an example, this logic family can more generally be applied to other high-speed building blocks such as multiplexers, where the highest-frequency input is the full-rate clock. SiGe HBTs are used for the upper-level data inputs. In digital applications where the output is slew-rate limited, the low $C_{CS}$ of the SiGe HBT results in fast rise and fall times. MOS source followers are used instead of emitter followers to further reduce the supply voltage. Low-threshold n-MOSFETs would further reduce the supply voltage to 1.8 V. With a 2.5-V supply, sufficient headroom is available for emitter followers along the clock path to extend the frequency response beyond 50 GHz, but source followers would be required with a 1.8-V supply.
40-Gb/s Decision Circuit
The 40-Gb/s decision circuit, consisting of a high-sensitivity input stage, 50-Ω output driver, clock buffer, and BiCMOS DFF, is shown in Fig. 5. The DFF is implemented by placing two BiCMOS D-latches of Fig. 4 in a master-slave configuration. The input stage, shown in Fig. 6, is based on a transimpedance feedback amplifier which, while typically employed as a current-to-voltage amplifier, has only recently been considered for use as a voltage preamplifier [6]. Transimpedance feedback lowers the optimal noise impedance, which improves sensitivity in a 50-Ω environment. Appropri-

2004 Symposium on VLSI Circuits Digest of Technical Papers

The outputs are taken from the collectors of the common-emitter stages to improve gain, and the split resistor load R1 and R2 alleviates headroom concerns. The ratio of R2 to R1 is set to unity for maximum bandwidth [7]. A bipolar differential pair stage follows the TIA stage for additional gain.
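The master-slave arrangement of two D-latches can be sketched behaviorally. This is a purely logical model, not a circuit simulation; the class names and the clock polarity (master transparent while the clock is low, slave transparent while it is high) are assumptions made for illustration.

```python
class DLatch:
    """Level-sensitive latch: follows D while enabled, holds otherwise."""

    def __init__(self, initial=0):
        self.q = initial

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Two latches on opposite clock phases form an edge-triggered flip-flop."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        # Assumed polarity: master is transparent while the clock is low.
        self.master.update(d, enable=not clk)
        # The slave passes the master's held value while the clock is high.
        return self.slave.update(self.master.q, enable=bool(clk))


def retime(samples):
    """Apply a sequence of (data, clock) samples and return the outputs."""
    dff = MasterSlaveDFF()
    return [dff.update(d, clk) for d, clk in samples]


if __name__ == "__main__":
    samples = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]
    print(retime(samples))  # prints [0, 1, 1, 1, 0]
```

In this model the output changes only after a low-to-high clock transition, and data changes while the clock is high are ignored, which is the retiming behavior a decision circuit relies on.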
The 40-GHz clock is the highest-frequency signal in the decision circuit, making design of a clock buffer challenging. The difficulty is compounded by the unavailability of the differential mm-wave signal sources needed for CML applications. Previous designs in the mm-wave regime have relied either on expensive off-chip techniques [1] or on area-intensive on-chip rat-race couplers [8] to perform single-ended-to-differential conversion. In this work, the first silicon-based mm-wave monolithic transformer is used to generate differential clock signals from a single-ended signal source. The transformer consists of two coupled symmetric inductors and occupies an area of 45 µm × 45 µm, which is about 1/100th of the area of the 80-GHz rat-race coupler [8]. The schematic of the clock buffer is shown in Fig. 7. Two tuned stages are cascaded for additional gain to compensate for limited signal source power and for losses in the cabling and transformer. The series addition of small resistors intentionally degrades inductor Q and improves the bandwidth of the otherwise narrowband topology. From the previous discussion, the BiCMOS cascode is best suited as an amplifier in this frequency range.
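The trade between inductor Q and bandwidth can be illustrated with a first-order estimate. The sketch below assumes a simple tuned load whose loss is dominated by the series resistance of the inductor, so that $Q \approx 2\pi f_0 L / R_s$ and the 3-dB bandwidth is roughly $f_0 / Q$; transistor output resistance and capacitor loss are ignored. The 100-pH inductance and the resistance values are hypothetical and are not taken from the design.

```python
import math


def loaded_q(f0_hz, inductance_h, r_series_ohm):
    """Q of an inductor with series loss, evaluated at the tuning frequency."""
    return 2.0 * math.pi * f0_hz * inductance_h / r_series_ohm


def bandwidth_hz(f0_hz, inductance_h, r_series_ohm):
    """Approximate 3-dB bandwidth of the tuned load, f0 / Q."""
    return f0_hz / loaded_q(f0_hz, inductance_h, r_series_ohm)


if __name__ == "__main__":
    f0_hz = 40e9            # clock frequency from the text
    inductance_h = 100e-12  # hypothetical load inductance
    for r_series_ohm in (2.0, 5.0):  # hypothetical: inductor alone, then with added resistor
        q = loaded_q(f0_hz, inductance_h, r_series_ohm)
        bw_ghz = bandwidth_hz(f0_hz, inductance_h, r_series_ohm) / 1e9
        print(f"R_s = {r_series_ohm:.1f} ohm: Q = {q:.1f}, BW = {bw_ghz:.1f} GHz")
```

In this approximation the bandwidth is proportional to the total series resistance, so a small added resistor widens the response at the cost of peak gain.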
Experimental Results
The 40-Gb/s decision circuit was implemented in the 0.13-µm SiGe BiCMOS process mentioned earlier. The chip, shown in the microphotograph of Fig. 8, occupies an area of 1.0 mm × 0.8 mm. It contains 24 mm-wave inductors and one transformer (a record for mm-wave silicon circuits), each occupying less than 45 µm × 45 µm. The circuit consumes 347 mW from a nominal 2.5-V supply, and is functional at supply voltages as low as 2.2 V. The DFF, input preamplifier, clock buffer, and output driver nominally consume 117 mW, 70 mW, 68 mW, and 92 mW, respectively, for an output swing of 400 mVp-p per side. The flip-flop was also measured to operate at half power (58 mW) with less than 5% degradation in speed. Fig. 9 shows single-ended S-parameters measured to 50 GHz for a separate test structure consisting of the broadband input stage and the 50-Ω output driver. The 3-dB bandwidth is over 50 GHz, and the input and output return losses are better than 10 dB up to 50 GHz.
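The reported power figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the per-block numbers quoted above; the dictionary keys and function names are chosen here for illustration.

```python
# Nominal per-block power in mW, as quoted in the text.
BLOCK_POWER_MW = {
    "DFF": 117,
    "input preamplifier": 70,
    "clock buffer": 68,
    "output driver": 92,
}
SUPPLY_V = 2.5


def total_power_mw(blocks):
    """Sum the per-block power consumption in mW."""
    return sum(blocks.values())


def supply_current_ma(power_mw, supply_v):
    """Total supply current in mA implied by the power and supply voltage."""
    return power_mw / supply_v


if __name__ == "__main__":
    total = total_power_mw(BLOCK_POWER_MW)
    print(f"total power: {total} mW")  # prints 347 mW
    print(f"supply current: {supply_current_ma(total, SUPPLY_V):.1f} mA")
    print(f"half-power DFF: {BLOCK_POWER_MW['DFF'] / 2:.1f} mW")
```

The block figures sum to the quoted 347 mW, and half of the DFF's 117 mW is 58.5 mW, consistent with the 58-mW half-power measurement.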
Eye diagrams for the 40-Gb/s decision circuit were measured on-wafer. For initial testing, a 12.5-Gb/s $2^{31}-1$ pseudo-random data stream was applied to one of the differential inputs, with the other input terminated in 50 Ω. A 37.5-GHz clock signal was used so as to achieve synchronization with one of the harmonics of the clock provided by the 12.5-Gb/s [...] By adjusting the tail current of the DFF latches, the rise and fall times can be varied between 6.5 ps and 9 ps, adequate for 50-Gb/s operation [9].
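The timing relationships in this test setup can be worked out directly. The sketch below assumes the data source provides a 12.5-GHz reference clock, which the text implies but does not state; the function names are illustrative.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one bit in picoseconds for a bit rate given in Gb/s."""
    return 1000.0 / bit_rate_gbps


def clock_harmonic(clock_ghz, reference_ghz):
    """Ratio of the applied clock to the (assumed) reference clock."""
    return clock_ghz / reference_ghz


if __name__ == "__main__":
    # 37.5 GHz is the third harmonic of an assumed 12.5-GHz reference.
    print(f"harmonic number: {clock_harmonic(37.5, 12.5):.0f}")
    print(f"pattern length: {2**31 - 1} bits")
    for rate_gbps in (40.0, 50.0):
        ui_ps = unit_interval_ps(rate_gbps)
        for edge_ps in (6.5, 9.0):
            fraction = edge_ps / ui_ps
            print(f"{rate_gbps:.0f} Gb/s: UI = {ui_ps:.0f} ps, "
                  f"{edge_ps} ps edge = {fraction:.2f} UI")
```

At 50 Gb/s the unit interval is 20 ps, so the measured 6.5-ps to 9-ps edges occupy less than half of a bit period.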
Also, the quality of the output eye diagram is maintained when the output swing is changed from 150 mVp-p to 400 mVp-p per side. Maximum output swing is still observed with a 40-mV signal applied to one input and the other input terminated in 50 Ω, indicating high sensitivity due to the transimpedance input stage.
Conclusions
A novel BiCMOS logic family has been proposed that reduces the supply voltage from 3.3 V to 2.5 V while maintaining the speed of pure SiGe HBT ECL circuits. It benefits from the low input time constant of the n-MOSFET and the low output capacitance of the SiGe HBT. A 40-GHz retiming flip-flop that consumes 58 mW from a 2.5-V supply was implemented using this logic family in a 0.13-µm SiGe BiCMOS process. To the authors' best knowledge, full-rate retiming at 40 Gb/s has only been demonstrated in SiGe BiCMOS circuits from a supply voltage of -5.2 V [10]. CMOS CML circuits with 25-GHz clocks have recently been reported in 90-nm CMOS [11], suggesting that 40-Gb/s full-rate retiming in CMOS may not be feasible until the 65-nm technology node. These results indicate that the proposed BiCMOS logic topology is two generations ahead of pure CMOS while at the same time operating from less than half the supply voltage of ECL SiGe HBT implementations.
